The http://www.kerneloops.org website collects kernel oops and warning reports from various mailing lists and bugzillas as well as with a client users can install to auto-submit oopses.

Below is a top 10 list of the traces collected in the last 7 days. (Reports prior to 2.6.23 have been omitted in collecting the top 10)

This week, a total of 3670 oopses and warnings have been reported, compared to 3029 reports in the previous week.

In addition to Fedora, Debian now has included the client application in their default GUI install targets, thanks a lot for that!

This week, based on feedback, I've split the report into "untainted" and "caused by proprietary drivers". Let me know if I should continue doing this or if the old format was better.

As an experiment (on request) I've exported the database to text files (one file per report) and stuck it in a git repository. You can take a look with

    git clone git://www.kerneloops.org/

Suggestions for improving the format of this are obviously very welcome, as are "yes useful" and "no not useful" comments. Again, this is an experiment, if it's not seen as useful I may discontinue it.

Per file statistics

    1427 kernel/sysctl.c
     238 fs/sysfs/dir.c
     206 fs/buffer.c
     167 security/selinux/hooks.c
      84 kernel/spinlock.c
      53 net/mac80211/main.c
      48 mm/highmem.c
      30 net/core/sock.c
      26 net/bluetooth/rfcomm/sock.c
      26 drivers/media/video/saa7134/saa7134-cards.c
      24 mm/rmap.c
      23 kernel/softirq.c

Seen with untainted systems
---------------------------

Rank 2: sysfs_add_one (warning)
    Reported 243 times (759 total reports)
    Duplicated sysfs entries, various drivers including USB
    This warning was last seen in version 2.6.26-rc3, and first seen in 2.6.24-rc6.
    More info: http://www.kerneloops.org/searchweek.php?search=sysfs_add_one

Rank 3: mark_buffer_dirty (warning)
    Reported 222 times (759 total reports)
    EXT3 bug while hot-removing a USB device
    This warning was last seen in version 2.6.25.3, and first seen in ...
It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather that part. You can see it from the code portion: the "<0f> 0b" gives it away (that's the ud2 opcode).

There's two BUG_ON()'s in that function, and I think it's the second one, based on at least the code generation that my particular compiler version gets. IOW, it would be the

    BUG_ON(list_empty(&page_address_pool));

thing.

Why would we run out of the page-address pool? Or perhaps the right question is what actually protects us from _not_ running out?

We seem to depend on the page_address_pool always being in sync with the pkmap_count[] array, but the fact is, they are not protected by the same locks. The array is protected by kmap_lock, and the page_address_pool is protected by the "pool_lock".

And even if they were to nest properly (I don't think they do), we actually do the list_empty(&page_address_pool) outside the pool lock, so...

I dunno. That code is really messy. Why does it have two locks for the data structures when it then seems to absolutely require that they are always coherent? And if we want to have separate locks, we cannot require that they are in lock-step, perhaps we should have more pages in the page_address_pool than strictly required since they may not be 1:1?

I do hate that mm/highmem.c mess, but I also wonder what made it start to trigger if it's a bug there. That code hasn't changed in ages, afaik.

I don't think this is Hugh's fault, but on the other hand I think it would be great if Hugh looked at it. I think most of that code predates even the BK repo - because I'm not finding any history for it even in the historical archives.

Who dares look at it?

Linus
--
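For anyone scripting over the collected reports, the "<0f> 0b" observation above can be turned into a trivial check. This is only a sketch: the function name is made up for illustration, and it assumes the usual x86 oops layout where the byte at the faulting EIP is wrapped in angle brackets on the "Code:" line.

```c
#include <string.h>

/*
 * Return 1 if the "Code:" line of an x86 oops shows the faulting
 * instruction to be ud2 (bytes 0f 0b), which is what BUG()/BUG_ON()
 * compile to on x86. The oops printer marks the byte at the faulting
 * EIP with <>, so "<0f> 0b" means execution stopped exactly on a ud2.
 *
 * oops_code_is_ud2() is a hypothetical helper name, not a kernel API.
 */
static int oops_code_is_ud2(const char *code_line)
{
	return strstr(code_line, "<0f> 0b") != NULL;
}
```

A plain "0f 0b" elsewhere on the line does not count, since only the bracketed byte is the one the CPU was executing when it trapped.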
I've seen it a few more times the last few weeks, I'll dig into how that is happening. Maybe we changed the bug_on text to miss my regexps ;( (it's only about 1000 lines of perl, so what can go wrong in that ;-) --
ok it was a bug I already fixed a few days ago; any reports from the last 2 or 3 days shouldn't have this. --
ok for some it did gather this information, and it is kernel BUG at mm/highmem.c:319! --
That's just _odd_.

The call chain actually has kmap() in it, and kmap does:

    if (!PageHighMem(page))
        return page_address(page);
    return kmap_high(page);

so if it's the one at line 319, which says

    BUG_ON(!PageHighMem(page));

then I wonder what happened to that PageHighMem() test of the page in between..

Ahh.. Not the same "page". It looks like it's in the flush_all_zero_pkmaps() path, and it's clearing some _other_ page in the pkmap table in order to make room for the new one. So the page that causes problems is from here:

    page = pte_page(pkmap_page_table[i]);

rather than the one we're trying to map.

Not that it explains the BUG_ON(). We should only insert page table entries into the pkmap_page_table[] array in map_new_virtual(), which in turn is only called from kmap_high(), which in turn means that *those* pages have also gone through the PageHighMem() test.

So it sounds like we either

 - have corruption in pkmap_page_table[]

 - or pte_page() doesn't reverse mk_pte(page) properly, and one or the other is broken.

Does anybody know if the fc9 x86-32 kernel is built with PAE enabled? Might this be another PAE bit-masking bug and thus possibly fixed by the PTE_MASK changes?

Linus
--
versions that do identify themselves as "2.6.25-14.fc9.i686.PAE", and these ones didn't, (they're all in the "2.6.25-14.fc9.i686" form) so this is a kernel without PAE. --
Hmm. Every single one is that one kernel version or 2.6.25.3-18.fc9.i686, and with that many reports I'd have expected it from other kernels too. What was the previous popular fc9 kernel (I assume it was 2.6.25-based too?), and what changed? Linus --
On Fri, May 30, 2008 at 03:55:25PM -0700, Linus Torvalds wrote:
 >
 >
 > On Fri, 30 May 2008, Arjan van de Ven wrote:
 > >
 > > versions that do identify themselves as "2.6.25-14.fc9.i686.PAE", and
 > > these ones didn't, (they're all in the "2.6.25-14.fc9.i686" form) so
 > > this is a kernel without PAE.
 >
 > Hmm. Every single one is that one kernel version or 2.6.25.3-18.fc9.i686,
 > and with that many reports I'd have expected it from other kernels too.
 > What was the previous popular fc9 kernel (I assume it was 2.6.25-based
 > too?), and what changed?

-14 is the version that we released F9 with, which explains its popularity.
-18 was the first update we pushed out within the first few days..

The earlier f9 builds were only beaten on by people testing our development tree, which is nowhere near as many as what jump on a proper release.

Dave

-- 
http://www.codemonkey.org.uk
--
Though I've spent quite a while poring over it, I regret to say I haven't got much beyond the obvious with this BUG_ON(!PageHighMem) in set_page_address() called from flush_all_zero_pkmaps().

It appears to be a corruption of the start of the pkmap_page_table, but not a random corruption: entries of the form 0x378xxxxx through 0x37Bxxxxx where they need to be 0x38xxxxxx or more to be highmem. (I say appears because the compiler is reusing %eax a lot, there's no trace on the stack or in registers of what pte was actually read.)

In every case except the 17141 nfsd one, it's found at the start of the table, when flush_all_zero_pkmaps() is called for the very first time (I'm guessing that from the fact that they're all failing on the second entry, which preincrementation of the index made the first one used). Whereas 17141 nfsd finds a 0x00000xxx some way into the page table, quite possibly later on: may have a very different cause.

Do we have any idea whether all or most of these come from a single machine? That would of course be a very different (less interesting) story from if they're spread out over lots of machines.

I didn't notice anything suspicious in the Fedora patches to 2.6.25, but I haven't heard (Google hasn't shown) any such problem outside of these kerneloops from Fedora 9. Is it showing up on Rawhide at all? If so, then we could devise some debug to include in coming kernels to help shed more light on it.

Veering off at a tangent away from the oops: I was rather sobered to see all those traces of execve using kmap, I thought we were avoiding kmap like BKL in common paths these days (though it is convenient for symlinks). Would a patch something like that below, copying the filemap.c trick, be welcome?

Hugh

--- 2.6.26-rc4/fs/exec.c 2008-05-26 20:00:39.000000000 +0100
+++ linux/fs/exec.c 2008-06-02 11:18:32.000000000 +0100
@@ -33,6 +33,7 @@
 #include <linux/string.h>
 #include <linux/init.h>
 #include <linux/pagemap.h>
+#include ...
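As an aside on the 0x38xxxxxx threshold in the message above: the check being applied to the suspect pkmap entries can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code: it assumes a non-PAE 32-bit x86 kernel with the default 3G/1G split (so lowmem ends at 896MB, i.e. physical 0x38000000), and the macro and function names are invented for the illustration.

```c
#include <stdint.h>

/* Assumption: default 3G/1G split on 32-bit x86, lowmem ends at 896MB. */
#define LOWMEM_END_PHYS 0x38000000u   /* 896 * 1024 * 1024 */

/* Assumption: non-PAE pte, top 20 bits are the page frame address. */
#define PTE_ADDR_MASK   0xfffff000u

/*
 * Return 1 if a non-PAE pte value points at a physical page in
 * highmem, 0 if it points into lowmem. Hypothetical helper, for
 * eyeballing values pulled out of oops reports.
 */
static int pte_points_to_highmem(uint32_t pte)
{
	return (pte & PTE_ADDR_MASK) >= LOWMEM_END_PHYS;
}
```

With these assumptions, anything in the 0x378xxxxx through 0x37Bxxxxx range is classed as lowmem, which is why such an entry in pkmap_page_table[] would trip BUG_ON(!PageHighMem(page)).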
On Tue, 3 Jun 2008 00:44:38 +0100 (BST)

eek.

/*
 * Are we running in atomic context? WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels. Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.
 */
#define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)

--
Yes, that comment is all about how a common function cannot be expected to guess whether it's being called in atomic context or not; but we know that we don't have any spinlocks held here, therefore it's okay. Or do you consider fs/exec.c a driver, and shouldn't set a bad example?

It is exactly the test that do_page_fault() makes at the other end, when deciding whether it can handle the fault.

Originally I had a bool atomic there instead. I switched over to testing in_atomic() itself because I had it in mind to suggest another patch: it has long seemed wrong to me that we should have to disable preemption and fault handling there, when often (on many architectures, or on many pages) it's unnecessary. So I'd like to change (the various implementations of) kmap_atomic() to use pagefault_disable() only when the page actually is in highmem.

Hugh
--
Well, if you're sure.. I didn't look very closely (sorry), nor did you explain very closely.

I think doing this sort of thing is OK in fs/exec.c from the should-we-be-doing-this-in-there POV, but it should have suitable comments.

So... places like file_read_actor() would be given an open-coded pagefault_disable() so we preserve our implicit boolean-passing down to do_page_fault()?

One of the reasons why we (I?) left kmap_atomic() doing pagefault_disable() for all pages was testing coverage: not many developers test with highmem nowadays so there's a high risk (almost a certainty) that people will start adding can-schedule code inside their kmap_atomic() regions. Probably it's not a terribly good reason...
--
FYI, i stuck this into -tip for testing and after some time i started getting:

[ 8.540917] Freeing unused kernel memory: 304k freed
[ 12.368096] BUG: scheduling while atomic: ifup-eth/1820/0x10000001
[ 12.374144] Modules linked in:
[ 12.377175] Pid: 1820, comm: ifup-eth Not tainted 2.6.26-rc5-00029-ga252672-dirty #3490
[ 12.384031] [<c0131a39>] __schedule_bug+0x59/0x60
[ 12.388031] [<c06b1375>] schedule+0x465/0x8c0
[ 12.392031] [<c013eecf>] ? update_process_times+0x4f/0x60
[ 12.396031] [<c013b50f>] ? irq_exit+0x3f/0x70
[ 12.400451] [<c012164b>] ? smp_apic_timer_interrupt+0x5b/0x90
[ 12.406248] [<c0117038>] ? apic_timer_interrupt+0x28/0x30
[ 12.411702] [<c0131a58>] __cond_resched+0x18/0x30
[ 12.416466] [<c06b1838>] _cond_resched+0x28/0x30
[ 12.421141] [<c03720bb>] strnlen_user+0x2b/0x60
[ 12.425728] [<c018dd53>] copy_strings+0x63/0x210
[ 12.430403] [<c018f986>] do_execve+0x176/0x200
[ 12.434903] [<c0372007>] ? strncpy_from_user+0x37/0x60
[ 12.440031] [<c0114ade>] sys_execve+0x2e/0x60
[ 12.444447] [<c01165ae>] sysenter_past_esp+0x6a/0x90
[ 12.449469] =======================
[ 12.736676] eth1: link down
[ 12.736919] ADDRCONF(NETDEV_UP): eth1: link is not ready

it would occur about every 10 bootups with the same config. Bisection led me to your patch.

Ingo
--
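To tie the trace above back to the in_atomic() macro quoted earlier in the thread: the last field of the "scheduling while atomic" line (0x10000001 here) is the preempt count at the time, and the macro's test can be replayed on it in userspace. This is only a model under assumptions: PREEMPT_ACTIVE is taken to be 0x10000000 (the x86 value as far as I know for kernels of this vintage), PREEMPT_INATOMIC_BASE is taken to be 0, and model_in_atomic() is an invented name.

```c
/* Assumption: x86 value of PREEMPT_ACTIVE for kernels of this era. */
#define PREEMPT_ACTIVE        0x10000000u

/* Assumption: base of 0, i.e. no BKL contribution to the count. */
#define PREEMPT_INATOMIC_BASE 0u

/*
 * Userspace model of the in_atomic() test applied to a raw
 * preempt_count value, e.g. one copied out of an oops message.
 * Returns 1 if the count says "atomic", 0 otherwise.
 */
static int model_in_atomic(unsigned int preempt_count)
{
	return (preempt_count & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE;
}
```

Under those assumptions 0x10000001 is PREEMPT_ACTIVE plus a count of one, so the model reports atomic context, which would be consistent with a single kmap_atomic() being outstanding when strnlen_user() reached cond_resched().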
Right, that would be with CONFIG_PREEMPT_VOLUNTARY. Or in my case with CONFIG_DEBUG_SPINLOCK_SLEEP, strnlen_user's might_sleep gives BUG: sleeping function called from invalid context...

At first I thought it was just falling foul of our zeal for might_sleep. But no, the warning is correct: the get_user(str) and strnlen_user(str) can perfectly well fault, but my suggested patch lets them be called with a kmap_atomic outstanding. I doubt it would be cost-effective to kunmap_atomic for each little string there.

I don't see a quick and effective way to fix it up. I don't have the patience to go about adding get_user_inatomic and strnlen_user_inatomic, there's more urgent things to be doing.

It would be nice to use a per-process kmap; or use an efficient one-page mapping in the exec'ers userspace; or maybe just having a kunmap_and_flush would help (to slow the cycling around pkmap page table), though it would still involve the global spinlock.

Sorry, no quick and effective fix: please just drop the patch.

Thanks,
Hugh
--
Hi Arjan, It seems like a strange coincidence that the first two entries have the same number of total reports. Is this really the case or is there something mixed up? All the best, Jochen -- http://seehuhn.de/ --
coincidence; I just recreated a new version of the report (so several hours later) and they're 1 apart now. --
This is a shining example of why people should avoid binary drivers.

I'd guess that the bug is related to the new 64-bit capability code. It'd be really interesting to know what this driver is doing with capabilities in the first place.

If anyone is using this driver, the output of the following command as a non-root user from gnome-terminal or similar may be of interest:

    $ cat /proc/self/status |grep ^Cap

It should generally be all zeroes.

- James
-- 
James Morris <jmorris@namei.org>
--
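If someone wants to run the same check from a program rather than by eye, the test on each status line could look roughly like the sketch below. The helper name is hypothetical, and it deliberately only looks at the CapInh, CapPrm and CapEff lines: the bounding-set line, where the kernel prints one, is normally non-zero even for unprivileged users.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if `line` is a CapInh/CapPrm/CapEff line from
 * /proc/<pid>/status whose hex value is non-zero, 0 otherwise.
 * Any other line (including other Cap* lines) returns 0.
 *
 * cap_line_is_nonzero() is a made-up helper, not an existing API.
 */
static int cap_line_is_nonzero(const char *line)
{
	if (strncmp(line, "CapInh:", 7) != 0 &&
	    strncmp(line, "CapPrm:", 7) != 0 &&
	    strncmp(line, "CapEff:", 7) != 0)
		return 0;

	/* strtoull() skips the tab after the colon and parses the hex mask. */
	return strtoull(line + 7, NULL, 16) != 0;
}
```

Feeding it each line read with fgets() from /proc/self/status in a non-root session, any line that returns 1 would be worth reporting.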
it's easy; it's making the user root via the following function:
void ATI_API_CALL KCL_PosixSecurityCapSetEffectiveVector(KCL_TYPE_Cap cap)
{
	cap_t(current->cap_effective) = cap;
}
--
