Re: Top kernel oopses/warnings for the week of May 30th 2008

Previous thread: [PATCH] 8250: fix break handling for Intel 82571 by Aristeu Rozanski on Friday, May 30, 2008 - 12:25 pm. (1 message)

Next thread: [git pull] Input updates for 2.6.26-rc4 by Dmitry Torokhov on Friday, May 30, 2008 - 12:41 pm. (1 message)
To: Linux Kernel Mailing List <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Hugh Dickins <hugh@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 12:39 pm

The http://www.kerneloops.org website collects kernel oops and
warning reports from various mailing lists and bugzillas, as well as
through a client that users can install to auto-submit oopses.
Below is a top 10 list of the traces collected in the last 7 days.
(Reports prior to 2.6.23 have been omitted in collecting the top 10)

This week, a total of 3670 oopses and warnings have been reported,
compared to 3029 reports in the previous week.

In addition to Fedora, Debian has now included the client application in its
default GUI install targets; thanks a lot for that!

This week, based on feedback, I've split the report into "untainted"
and "caused by proprietary drivers". Let me know if I should continue
doing this or if the old format was better.

As an experiment (on request) I've exported the database to text files (one file
per report) and stuck them in a git repository. You can take a look with
    git clone git://www.kerneloops.org/
Suggestions for improving the format of this are obviously very welcome, as are
"yes useful" and "no not useful" comments. Again, this is an experiment; if it's
not seen as useful I may discontinue it.

Per file statistics
1427 kernel/sysctl.c
238 fs/sysfs/dir.c
206 fs/buffer.c
167 security/selinux/hooks.c
84 kernel/spinlock.c
53 net/mac80211/main.c
48 mm/highmem.c
30 net/core/sock.c
26 net/bluetooth/rfcomm/sock.c
26 drivers/media/video/saa7134/saa7134-cards.c
24 mm/rmap.c
23 kernel/softirq.c

Seen with untainted systems
---------------------------
Rank 2: sysfs_add_one (warning)
Reported 243 times (759 total reports)
Duplicated sysfs entries, various drivers including USB
This warning was last seen in version 2.6.26-rc3, and first seen in 2.6.24-rc6.
More info: http://www.kerneloops.org/searchweek.php?search=sysfs_add_one

Rank 3: mark_buffer_dirty (warning)
Reported 222 times (759 total reports)
EXT3 bug while hot-removing a USB device
This warning was last seen in version 2.6.25.3, and first seen in 2.6.24-rc6.
More in...

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Hugh Dickins <hugh@...>, Jeff Garzik <jeff@...>
Date: Sunday, June 1, 2008 - 8:02 pm

This is a shining example of why people should avoid binary drivers. I'd
guess that the bug is related to the new 64-bit capability code.

It'd be really interesting to know what this driver is doing with
capabilities in the first place.

If anyone is using this driver, the output of the following command as a
non-root user from gnome-terminal or similar may be of interest:

$ cat /proc/self/status |grep ^Cap

It should generally be all zeroes.
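
For anyone who would rather collect this programmatically, here is a minimal sketch of the same check in C. It assumes the Cap* lines in /proc/self/status have the form "CapEff:<tab><hex mask>"; the helper names parse_cap_line() and print_own_caps() are made up for illustration and are not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "CapXxx:\t<hex>" line as found in
 * /proc/<pid>/status. Returns 1 and fills in name/mask for a capability
 * line, 0 for any other line.
 */
int parse_cap_line(const char *line, char name[8], unsigned long long *mask)
{
    assert(line != NULL && name != NULL && mask != NULL);
    if (strncmp(line, "Cap", 3) != 0)
        return 0;
    /* Up to 7 characters of name before the colon, then a hex mask. */
    return sscanf(line, "%7[^:]: %llx", name, mask) == 2;
}

/*
 * Print every capability mask of the calling process.
 * Returns the number of Cap* lines found, or -1 if the file cannot be read.
 */
int print_own_caps(void)
{
    char line[256], name[8];
    unsigned long long mask;
    int found = 0;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_cap_line(line, name, &mask)) {
            printf("%s = %#llx\n", name, mask);
            found++;
        }
    }
    fclose(f);
    return found;
}
```

Calling print_own_caps() from a small program run as a non-root user gives the same information as the grep above, one mask per line.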

- James
--
James Morris
<jmorris@namei.org>
--

To: James Morris <jmorris@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Hugh Dickins <hugh@...>, Jeff Garzik <jeff@...>
Date: Sunday, June 1, 2008 - 10:27 pm

it's easy; it's making the user root via the following function:

void ATI_API_CALL KCL_PosixSecurityCapSetEffectiveVector(KCL_TYPE_Cap cap)
{
    cap_t(current->cap_effective) = cap;
}

--

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, May 30, 2008 - 6:34 pm

Hi Arjan,

It seems like a strange coincidence that the first two entries have
the same number of total reports. Is this really the case or is there
something mixed up?

All the best,
Jochen
--
http://seehuhn.de/
--

To: Jochen Voß <jochen.voss@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, May 30, 2008 - 6:36 pm

coincidence; I just recreated a new version of the report (so several hours later) and they're 1 apart now.
--

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 3:19 pm

Hugh
--

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>, Dave Jones <davej@...>
Date: Monday, June 2, 2008 - 7:44 pm

Though I've spent quite a while poring over it, I regret to say I
haven't got much beyond the obvious with this BUG_ON(!PageHighMem)
in set_page_address() called from flush_all_zero_pkmaps().

It appears to be a corruption of the start of the pkmap_page_table,
but not a random corruption: entries of the form 0x378xxxxx through
0x37Bxxxxx where they need to be 0x38xxxxxx or more to be highmem.
(I say appears because the compiler is reusing %eax a lot, there's
no trace on the stack or in registers of what pte was actually read.)
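
To make the 0x38xxxxxx threshold concrete, here is a small userspace sketch. It assumes 4 KiB pages, 32-bit non-PAE ptes, and lowmem ending at 896 MiB (physical 0x38000000), which is the usual x86-32 3G/1G layout; those constants and the function names are illustrative assumptions, not values taken from the reports.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration only (see the text above). */
#define SKETCH_PAGE_SHIFT    12
#define SKETCH_HIGHMEM_START 0x38000000u   /* 896 MiB */

/* Physical address of the frame a pte points at, with flag bits dropped. */
uint32_t pte_phys(uint32_t pte)
{
    return pte & ~((1u << SKETCH_PAGE_SHIFT) - 1);
}

/* Would the frame behind this pte count as highmem under the assumptions? */
int pte_points_to_highmem(uint32_t pte)
{
    /* Sanity check on the assumed boundary: 896 MiB == 0x38000000. */
    assert(SKETCH_HIGHMEM_START == (896u << 20));
    return pte_phys(pte) >= SKETCH_HIGHMEM_START;
}
```

Under these assumptions any entry in the 0x378xxxxx to 0x37Bxxxxx range points just below the boundary, so a highmem check on the page it refers to fails.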

In every case except the 17141 nfsd one, it's found at the start of
the table, when flush_all_zero_pkmaps() is called for the very first
time (I'm guessing that from the fact that they're all failing on the
second entry, which preincrementation of the index made the first one
used). Whereas 17141 nfsd finds a 0x00000xxx some way into the page
table, quite possibly later on: may have a very different cause.

Do we have any idea whether all or most of these come from a single
machine? That would of course be a very different (less interesting)
story from if they're spread out over lots of machines.

I didn't notice anything suspicious in the Fedora patches to 2.6.25,
but I haven't heard (Google hasn't shown) any such problem outside
of these kerneloops from Fedora 9. Is it showing up on Rawhide at
all? If so, then we could devise some debug to include in coming
kernels to help shed more light on it.

Veering off at a tangent away from the oops: I was rather sobered
to see all those traces of execve using kmap, I thought we were
avoiding kmap like BKL in common paths these days (though it is
convenient for symlinks). Would a patch something like that
below, copying the filemap.c trick, be welcome?

Hugh

--- 2.6.26-rc4/fs/exec.c 2008-05-26 20:00:39.000000000 +0100
+++ linux/fs/exec.c 2008-06-02 11:18:32.000000000 +0100
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/init.h>
#include <linux/pagemap.h>
+#include &...

To: Hugh Dickins <hugh@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>, Dave Jones <davej@...>
Date: Monday, June 9, 2008 - 12:32 pm

FYI, i stuck this into -tip for testing and after some time i started
getting:

[ 8.540917] Freeing unused kernel memory: 304k freed
[ 12.368096] BUG: scheduling while atomic: ifup-eth/1820/0x10000001
[ 12.374144] Modules linked in:
[ 12.377175] Pid: 1820, comm: ifup-eth Not tainted 2.6.26-rc5-00029-ga252672-dirty #3490
[ 12.384031] [<c0131a39>] __schedule_bug+0x59/0x60
[ 12.388031] [<c06b1375>] schedule+0x465/0x8c0
[ 12.392031] [<c013eecf>] ? update_process_times+0x4f/0x60
[ 12.396031] [<c013b50f>] ? irq_exit+0x3f/0x70
[ 12.400451] [<c012164b>] ? smp_apic_timer_interrupt+0x5b/0x90
[ 12.406248] [<c0117038>] ? apic_timer_interrupt+0x28/0x30
[ 12.411702] [<c0131a58>] __cond_resched+0x18/0x30
[ 12.416466] [<c06b1838>] _cond_resched+0x28/0x30
[ 12.421141] [<c03720bb>] strnlen_user+0x2b/0x60
[ 12.425728] [<c018dd53>] copy_strings+0x63/0x210
[ 12.430403] [<c018f986>] do_execve+0x176/0x200
[ 12.434903] [<c0372007>] ? strncpy_from_user+0x37/0x60
[ 12.440031] [<c0114ade>] sys_execve+0x2e/0x60
[ 12.444447] [<c01165ae>] sysenter_past_esp+0x6a/0x90
[ 12.449469] =======================
[ 12.736676] eth1: link down
[ 12.736919] ADDRCONF(NETDEV_UP): eth1: link is not ready

it would occur about every 10 bootups with the same config. Bisection
led me to your patch.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>, Dave Jones <davej@...>
Date: Tuesday, June 10, 2008 - 8:42 am

Right, that would be with CONFIG_PREEMPT_VOLUNTARY. Or in my case
with CONFIG_DEBUG_SPINLOCK_SLEEP, strnlen_user's might_sleep gives
BUG: sleeping function called from invalid context...

At first I thought it was just falling foul of our zeal for might_sleep.
But no, the warning is correct: the get_user(str) and strnlen_user(str)
can perfectly well fault, but my suggested patch lets them be called
with a kmap_atomic outstanding.
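
The problem can be modelled in userspace with a toy preempt counter. Everything below is a stand-in written for illustration (the toy_* names are invented and the bodies are not the kernel implementations); it only shows why a call that may sleep trips the check when it runs between the atomic map and unmap.

```c
#include <assert.h>

/* Toy stand-ins for the kernel primitives; not the real implementations. */
static int toy_preempt_count;     /* models preempt_count() */
static int toy_sleep_violations;  /* sleeping calls made while atomic */

void toy_kmap_atomic(void)
{
    toy_preempt_count++;          /* atomic section begins */
}

void toy_kunmap_atomic(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;          /* atomic section ends */
}

/* Models might_sleep(): records a violation instead of printing a BUG. */
void toy_might_sleep(void)
{
    if (toy_preempt_count != 0)
        toy_sleep_violations++;
}

/* Models a user access that can fault, and therefore sleep. */
unsigned toy_strnlen_user(const char *s, unsigned max)
{
    unsigned n = 0;

    toy_might_sleep();
    while (n < max && s[n])
        n++;
    return n;
}

/* User access made while the atomic map is outstanding. */
int access_with_map_held(const char *arg)
{
    toy_sleep_violations = 0;
    toy_kmap_atomic();
    (void)toy_strnlen_user(arg, 64);
    toy_kunmap_atomic();
    return toy_sleep_violations;
}

/* For contrast only: the access happens before the atomic map is taken. */
int access_before_map(const char *arg)
{
    toy_sleep_violations = 0;
    (void)toy_strnlen_user(arg, 64);
    toy_kmap_atomic();
    toy_kunmap_atomic();
    return toy_sleep_violations;
}
```

The second function is not offered as a fix for copy_strings(); it is there only to show that the ordering is what matters.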

I doubt it would be cost-effective to kunmap_atomic for each little
string there. I don't see a quick and effective way to fix it up.
I don't have the patience to go about adding get_user_inatomic and
strnlen_user_inatomic, there's more urgent things to be doing.

It would be nice to use a per-process kmap; or use an efficient
one-page mapping in the exec'ers userspace; or maybe just having
a kunmap_and_flush would help (to slow the cycling around pkmap
page table), though it would still involve the global spinlock.

Sorry, no quick and effective fix: please just drop the patch.

Thanks,
Hugh
--

To: Hugh Dickins <hugh@...>
Cc: <arjan@...>, <linux-kernel@...>, <torvalds@...>, <mingo@...>, <greg@...>, <jeff@...>, <davej@...>
Date: Monday, June 2, 2008 - 8:00 pm

On Tue, 3 Jun 2008 00:44:38 +0100 (BST)

eek.

/*
* Are we running in atomic context? WARNING: this macro cannot
* always detect atomic context; in particular, it cannot know about
* held spinlocks in non-preemptible kernels. Thus it should not be
* used in the general case to determine whether sleeping is possible.
* Do not use in_atomic() in driver code.
*/
#define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)
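
A small userspace sketch of that test, applied to an explicit count. The two constants are assumptions for illustration: PREEMPT_ACTIVE as 0x10000000 (the x86 value in kernels of this era, consistent with the 0x10000001 printed in the "scheduling while atomic" report in this thread) and an in-atomic base of 0. The sketch_* names are invented.

```c
#include <assert.h>

/* Assumed values for illustration; see the text above. */
#define SKETCH_PREEMPT_ACTIVE        0x10000000u
#define SKETCH_PREEMPT_INATOMIC_BASE 0u

/* Same expression as the in_atomic() macro, on an explicit count. */
int sketch_in_atomic(unsigned preempt_count)
{
    return (preempt_count & ~SKETCH_PREEMPT_ACTIVE)
            != SKETCH_PREEMPT_INATOMIC_BASE;
}

/*
 * Toy spin_lock(): on a preemptible kernel the count is bumped, on a
 * non-preemptible one it is left alone. That is the blind spot the
 * comment warns about.
 */
unsigned sketch_spin_lock(unsigned preempt_count, int config_preempt)
{
    assert(config_preempt == 0 || config_preempt == 1);
    return config_preempt ? preempt_count + 1 : preempt_count;
}
```

With these definitions a count of 0x10000001 reads as atomic, while a lock taken with config_preempt set to 0 leaves the count at its base and goes undetected.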

--

To: Andrew Morton <akpm@...>
Cc: <arjan@...>, <linux-kernel@...>, <torvalds@...>, <mingo@...>, <greg@...>, <jeff@...>, <davej@...>
Date: Monday, June 2, 2008 - 8:41 pm

Yes, that comment is all about how a common function cannot be expected
to guess whether it's being called in atomic context or not; but we
know that we don't have any spinlocks held here, therefore it's okay.

Or do you consider fs/exec.c a driver, and shouldn't set bad example?
It is exactly the test that do_page_fault() makes at the other end,
when deciding whether it can handle the fault.

Originally I had a bool atomic there instead. I switched over to
testing in_atomic() itself because I had it in mind to suggest another
patch: it has long seemed wrong to me that we should have to disable
preemption and fault handling there, when often (on many architectures,
or on many pages) it's unnecessary.

So I'd like to change (the various implementations of) kmap_atomic()
to use pagefault_disable() only when the page actually is in highmem.

Hugh
--

To: Hugh Dickins <hugh@...>
Cc: <arjan@...>, <linux-kernel@...>, <torvalds@...>, <mingo@...>, <greg@...>, <jeff@...>, <davej@...>
Date: Monday, June 2, 2008 - 9:19 pm

Well, if you're sure.. I didn't look very closely (sorry), nor did you
explain very closely.

I think doing this sort of thing is OK in fs/exec.c from the
should-we-be-doing-this-in-there POV, but it should have suitable comments.

So... places like file_read_actor() would be given an open-coded
pagefault_disable() so we preserve our implicit boolean-passing down to
do_page_fault()?

One of the reasons why we (I?) left kmap_atomic() doing
pagefault_disable() for all pages was testing coverage: not many
developers test with highmem nowadays so there's a high risk (almost a
certainty) that people will start adding can-schedule code inside their
kmap_atomic() regions. Probably it's not a terribly good reason...
--

To: Hugh Dickins <hugh@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 5:43 pm

It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather that
part. You can see it from the code portion: the "<0f> 0b" gives it away
(that's the ud2 opcode).
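
A gatherer could apply the same trick mechanically. The sketch below assumes the usual oops "Code:" line layout, where the byte at the faulting instruction pointer is printed in angle brackets; the function name is made up and this is not how the kerneloops scripts actually do it.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if the byte marked as the faulting instruction pointer
 * (the one in angle brackets) starts a ud2 instruction, i.e. the
 * bytes 0f 0b, and 0 otherwise.
 */
int code_line_faults_on_ud2(const char *code_line)
{
    assert(code_line != NULL);
    return strstr(code_line, "<0f> 0b") != NULL;
}
```

This only says that a BUG()-style trap was hit; telling which BUG_ON() in the function it was still needs the file and line, or a look at the surrounding code bytes.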

There's two BUG_ON()'s in that function, and I think it's the second one,
based on at least the code generation that my particular compiler version
gets. IOW, it would be the

BUG_ON(list_empty(&page_address_pool));

thing.

Why would we run out of the page-address pool? Or perhaps the right
question is what actually protects us from _not_ running out?

We seem to depend on the page_address_pool always being in sync with the
pkmap_count[] array, but the fact is, they are not protected by the same
locks. The array is protected by kmap_lock, and the page_address_pool is
protected by the "pool_lock".

And even if they were to nest properly (I don't think they do), we
actually do the list_empty(&page_address_pool) outside the pool lock,
so...

I dunno. That code is really messy. Why does it have two locks for the
data structures when it then seems to absolutely require that they are
always coherent? And if we want to have separate locks, we cannot require
that they are in lock-step, perhaps we should have more pages in the
page_address_pool than strictly required since they may not be 1:1?

I do hate that mm/highmem.c mess, but I also wonder what made it start to
trigger if it's a bug there. That code hasn't changed in ages, afaik.

I don't think this is Hugh's fault, but on the other hand I think it would
be great if Hugh looked at it. I think most of that code predates even the
BK repo - because I'm not finding any history for it even in the
historical archives. Who dares look at it?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 6:00 pm

ok for some it did gather this information, and it is

kernel BUG at mm/highmem.c:319!

--

To: Arjan van de Ven <arjan@...>
Cc: Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 6:30 pm

That's just _odd_. The call chain actually has kmap() in it, and kmap
does:

	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);

so if it's the one at line 319, which says

BUG_ON(!PageHighMem(page));

then I wonder what happened to that PageHighMem() test of the page in
between..

Ahh.. Not the same "page". It looks like it's in the
flush_all_zero_pkmaps() path, and it's clearing some _other_ page in the
pkmap table in order to make room for the new one. So the page that causes
problems is from here:

page = pte_page(pkmap_page_table[i]);

rather than the one we're trying to map.

Not that it explains the BUG_ON(). We should only insert page table
entries into the pkmap_page_table[] array in map_new_virtual(), which in
turn is only called from kmap_high(), which in turn means that *those*
pages have also gone through the PageHighMem() test.

So it sounds like we either
 - have corruption in pkmap_page_table[]
 - or pte_page() doesn't reverse mk_pte(page) properly, and one or the
   other is broken.
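
The second possibility can be illustrated with a toy round trip. The sketch assumes 4 KiB pages and 32-bit ptes with the frame number in the upper 20 bits; sketch_mk_pte() and sketch_pte_pfn() are invented stand-ins, not the kernel's mk_pte() and pte_page().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PROT_MASK  ((1u << SKETCH_PAGE_SHIFT) - 1)

/* Toy mk_pte(): frame number in the high bits, flags in the low 12. */
uint32_t sketch_mk_pte(uint32_t pfn, uint32_t prot)
{
    assert(pfn < (1u << (32 - SKETCH_PAGE_SHIFT)));
    return (pfn << SKETCH_PAGE_SHIFT) | (prot & SKETCH_PROT_MASK);
}

/* Toy pte_page(): recover the frame number using the given address mask. */
uint32_t sketch_pte_pfn(uint32_t pte, uint32_t addr_mask)
{
    return (pte & addr_mask) >> SKETCH_PAGE_SHIFT;
}
```

With the correct mask (~SKETCH_PROT_MASK) the frame number survives the round trip. With a deliberately wrong, purely hypothetical mask such as 0x0ffff000, frame 0x38123 comes back as 0x08123, so a highmem page would be mistaken for a lowmem one.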

Does anybody know if the fc9 x86-32 kernel is built with PAE enabled?
Might this be another PAE bit-masking bug and thus possibly fixed by the
PTE_MASK changes?

Linus

--

To: Linus Torvalds <torvalds@...>
Cc: Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 6:34 pm

versions that do identify themselves as "2.6.25-14.fc9.i686.PAE", and these ones didn't,
(they're all in the "2.6.25-14.fc9.i686" form) so this is a kernel without PAE.
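
The same classification could be done when sorting reports. A minimal sketch, assuming the only signal is a trailing ".PAE" in the release string as in the Fedora names above; the function name is made up.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the kernel release string ends in ".PAE", 0 otherwise. */
int release_is_pae(const char *release)
{
    static const char suffix[] = ".PAE";
    size_t slen = sizeof suffix - 1;
    size_t len;

    assert(release != NULL);
    len = strlen(release);
    return len >= slen && strcmp(release + len - slen, suffix) == 0;
}
```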
--

To: Arjan van de Ven <arjan@...>
Cc: Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 6:55 pm

Hmm. Every single one is that one kernel version or 2.6.25.3-18.fc9.i686,
and with that many reports I'd have expected it from other kernels too.
What was the previous popular fc9 kernel (I assume it was 2.6.25-based
too?), and what changed?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Arjan van de Ven <arjan@...>, Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 8:41 pm

On Fri, May 30, 2008 at 03:55:25PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 30 May 2008, Arjan van de Ven wrote:
> >
> > versions that do identify themselves as "2.6.25-14.fc9.i686.PAE", and
> > these ones didn't, (they're all in the "2.6.25-14.fc9.i686" form) so
> > this is a kernel without PAE.
>
> Hmm. Every single one is that one kernel version or 2.6.25.3-18.fc9.i686,
> and with that many reports I'd have expected it from other kernels too.
> What was the previous popular fc9 kernel (I assume it was 2.6.25-based
> too?), and what changed?

-14 is the version that we released F9 with, which explains its popularity.
-18 was the first update we pushed out within the first few days..

The earlier f9 builds were only beaten on by people testing our development
tree, which is nowhere near as many as what jump on a proper release.

Dave

--
http://www.codemonkey.org.uk
--

To: Linus Torvalds <torvalds@...>
Cc: Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 5:49 pm

I've seen it a few more times the last few weeks, I'll dig into how that is happening.
Maybe we changed the bug_on text to miss my regexps ;(
(it's only about 1000 lines of perl, so what can go wrong in that ;-)
--

To: Linus Torvalds <torvalds@...>
Cc: Hugh Dickins <hugh@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Greg KH <greg@...>, Jeff Garzik <jeff@...>
Date: Friday, May 30, 2008 - 6:17 pm

ok it was a bug I already fixed a few days ago; any reports from the last 2 or 3 days shouldn't have this.
--
