Greetings! Is it possible for a faulty ath5k module to affect diverse parts of the kernel, causing different oopses that reference things as different as cdrom_ioctl, find_vma, i915_gem_object_get_pages, get_vfs_caps_from_disk, warn_slowpath_common (ath5k_tasklet_rx), proc_lookup_de and _spin_lock(ext4_getattr) [soft lockup]? The errors happen while capturing packets with kismet during the night. They aren't easy to reproduce: sometimes they've already happened when I check the computer in the morning, and sometimes they require stirring up the computer a bit, such as by starting X. I ran memtest on the machine for 7 h and no error was detected. I've been trying different kernels, but the main one is 2.6.32-21-generic, Ubuntu flavour, with linux-backports-modules-wireless-lucid-generic installed. I would use 2.6.34-rc5 vanilla if shutdown and suspend worked ;) As such: a) is it "normal"? I thought modules were somewhat isolated these days. b) any advice for debugging? I've yet to have two oopses that are similar... Thank you in advance, -- Pedro --
It is certainly possible -- improper setup of DMA could be scribbling over memory that doesn't belong to ath5k. That sort of thing can be ugly to track down... You mentioned Kismet. Do you experience this sort of problem when using ath5k for "normal" purposes (e.g. browsing the web)? Or only when using it to monitor the network? John -- John W. Linville <linville@tuxdriver.com> Someday the world will need a hero, and you might be all we have. Be ready. --
Advice for debugging: turn on slub/slab debug options, and possibly kmemcheck. kmemcheck was very helpful for me last time I had such a corruption issue. -- Bob Copeland %% www.bobcopeland.com --
Do you have more than ~2.5-3.5 GB of RAM (enough to make some RAM non-32-bit-DMA-accessible) with swiotlb enabled? If so, are you seeing "DMA: Out of SW-IOMMU space" kernel messages? --
I have 2 GB RAM + 5 GB swap (got fed up with hibernation not working sometimes), so no such errors. --
For some reason I am unable to get a kernel I compiled myself to boot. Since I was unable to compile a kernel with those debugging options, I just turned on slub_debug in GRUB. I got a few messages, which I am pasting here only partially to check their significance. If they are significant I'll post them in full:
[ 2658.663308] =============================================================================
[ 2658.663424] BUG kmalloc-4096: Poison overwritten
[ 2658.663483] -----------------------------------------------------------------------------
[ 2658.663486]
[ 2658.663606] INFO: 0xed3db0c0-0xed3db0cf. First byte 0xc4 instead of 0x6b
[ 2658.663698] INFO: Allocated in ath_rxbuf_alloc+0x30/0xa0 [ath] age=7117 cpu=0 pid=0
[ 2658.663799] INFO: Freed in skb_release_data+0x70/0xa0 age=0 cpu=0 pid=0
[ 2658.663882] INFO: Slab 0xc15fbb00 objects=7 used=5 fp=0xed3db090 flags=0x400040c3
[ 2658.663975] INFO: Object 0xed3db090 @offset=12432 fp=0xed3da060
[ 2658.663977]
[ 2658.664069] Bytes b4 0xed3db080: 00 00 00 00 61 ff 08 00 5a 5a 5a 5a 5a 5a 5a 5a ....a<FF>..ZZZZZZZZ
[ 2658.664258] Object 0xed3db090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
(a lot of lines with memory contents omitted)
[ 2658.667258] Redzone 0xed3dc090: bb bb bb bb <BB><BB><BB><BB>
[ 2658.667258] Padding 0xed3dc0b8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[ 2658.667258] Pid: 0, comm: swapper Not tainted 2.6.32-21-generic #32-Ubuntu
[ 2658.667258] Call Trace:
[ 2658.667258] [<c01faf63>] print_trailer+0xd3/0x120
[ 2658.667258] [<c01fb07c>] check_bytes_and_report+0xcc/0xf0
[ 2658.667258] [<c01fc021>] check_object+0x1a1/0x1e0
[ 2658.667258] [<c01fcc98>] alloc_debug_processing+0xc8/0x190
(...)
[ 2658.667258] FIX kmalloc-4096: Restoring 0xed3db0c0-0xed3db0cf=0x6b
[ 2658.667258]
[ 2658.667258] FIX kmalloc-4096: Marking all objects used
[ 4689.941595]
And I've two or three more similar ...
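Since several similar reports are mentioned, it may help to reduce each one to the few fields that matter when comparing them. The following is a minimal sketch, not an existing tool: the function name `summarize_slub_report` and the `SAMPLE` excerpt (four lines copied from the report above) are illustrative choices. The byte values in `POISON_BYTES` are the ones defined in the kernel's include/linux/poison.h.

```python
import re

# Byte patterns used by the slab debugging code (include/linux/poison.h).
POISON_BYTES = {
    0x6B: "POISON_FREE: object has been freed",
    0x5A: "POISON_INUSE: allocated but uninitialised, or padding",
    0xA5: "POISON_END: last byte of a poisoned object",
    0xBB: "SLUB_RED_INACTIVE: red zone of a free object",
    0xCC: "SLUB_RED_ACTIVE: red zone of an allocated object",
}

BUG_RE = re.compile(r"BUG (\S+): (.+)")
RANGE_RE = re.compile(
    r"INFO: 0x([0-9a-f]+)-0x([0-9a-f]+)\. "
    r"First byte 0x([0-9a-f]+) instead of 0x([0-9a-f]+)"
)
ALLOC_RE = re.compile(r"INFO: Allocated in (\S+)")
FREED_RE = re.compile(r"INFO: Freed in (\S+)")


def summarize_slub_report(text):
    """Return the key fields of one slub_debug report as a dict."""
    summary = {}
    bug = BUG_RE.search(text)
    if bug:
        summary["cache"] = bug.group(1)
        summary["problem"] = bug.group(2).strip()
    rng = RANGE_RE.search(text)
    if rng:
        start, end = int(rng.group(1), 16), int(rng.group(2), 16)
        summary["corrupt_bytes"] = end - start + 1  # range is inclusive
        summary["found"] = int(rng.group(3), 16)
        summary["expected"] = int(rng.group(4), 16)
        summary["expected_meaning"] = POISON_BYTES.get(
            summary["expected"], "unknown"
        )
    alloc = ALLOC_RE.search(text)
    if alloc:
        summary["allocated_in"] = alloc.group(1)
    freed = FREED_RE.search(text)
    if freed:
        summary["freed_in"] = freed.group(1)
    return summary


# Excerpt of the report quoted in the message above.
SAMPLE = """\
[ 2658.663424] BUG kmalloc-4096: Poison overwritten
[ 2658.663606] INFO: 0xed3db0c0-0xed3db0cf. First byte 0xc4 instead of 0x6b
[ 2658.663698] INFO: Allocated in ath_rxbuf_alloc+0x30/0xa0 [ath] age=7117 cpu=0 pid=0
[ 2658.663799] INFO: Freed in skb_release_data+0x70/0xa0 age=0 cpu=0 pid=0
"""

if __name__ == "__main__":
    for key, value in summarize_slub_report(SAMPLE).items():
        print(f"{key}: {value}")
```

For this excerpt the summary says that 16 bytes of an already-freed kmalloc-4096 object (expected fill 0x6b) were overwritten, and that the object was last allocated by ath_rxbuf_alloc, i.e. it was a receive buffer.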
Yes, please post the whole message. -- Bob Copeland %% www.bobcopeland.com --
Please post all the info you have. Full crash log and full slub-debug report are both very interesting. Not sure what exactly linux-backports-modules-wireless-lucid-generic is, but one way to narrow it down is to NOT use that and see if it still crashes. You might find your distro config in /boot/config-*; you could copy that to .config in your kernel directory and try to compile (and boot) a more recent kernel with that. Good luck, Vegard --
Mashup of two different mails with identical info. I *believe* it includes compat-wireless. As I've yet to fully understand what compat-wireless is, I'm following your advice. I get an error booting, as if I'd forgotten to compile in my disk controller's driver; since I used the .config of the distro's kernel I doubt that (and I checked). I must be missing some hocus-pocus in using Ubuntu/Debian's `make-kpkg'. Ubuntu provides mainline kernels; I'm currently on the one from two or three days ago, since the current one seems to cause X issues. My plan is to boot it (linux-image-2.6.34-999-generic_2.6.34-999.201004211003_i386) just for doing these tests. Will do ;) I left the three extra lines at the top to provide a "timeline" between leaving kismet running and the first error.
[ 109.876670] device wlan1 entered promiscuous mode
[ 109.882126] device wlan1mon entered promiscuous mode
[ 311.656242] usb 5-1: USB disconnect, address 2
[ 2658.663308] =============================================================================
[ 2658.663424] BUG kmalloc-4096: Poison overwritten
[ 2658.663483] -----------------------------------------------------------------------------
[ 2658.663486]
[ 2658.663606] INFO: 0xed3db0c0-0xed3db0cf. First byte 0xc4 instead of 0x6b
[ 2658.663698] INFO: Allocated in ath_rxbuf_alloc+0x30/0xa0 [ath] age=7117 cpu=0 pid=0
[ 2658.663799] INFO: Freed in skb_release_data+0x70/0xa0 age=0 cpu=0 pid=0
[ 2658.663882] INFO: Slab 0xc15fbb00 objects=7 used=5 fp=0xed3db090 flags=0x400040c3
[ 2658.663975] INFO: Object 0xed3db090 @offset=12432 fp=0xed3da060
[ 2658.663977]
[ 2658.664069] Bytes b4 0xed3db080: 00 00 00 00 61 ff 08 00 5a 5a 5a 5a 5a 5a 5a 5a ....a<FF>..ZZZZZZZZ
[ 2658.664258] Object 0xed3db090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 2658.664443] Object 0xed3db0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 2658.664629] Object 0xed3db0b0: 6b 6b 6b 6b 6b 6b 6b 6b ...
OK, there are 4 messages here; two of them are definitely 802.11 beacons, which would point the finger squarely at ath5k. We had reports of this some time ago, but a few things got rewritten around that time. I'd guess ... These two are definitely beacons. -- Bob Copeland %% www.bobcopeland.com --
These are CTS frames. So I think ath5k is to blame too :/. regards, -- js --
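For readers wondering how overwritten bytes can be recognised as beacons or CTS frames: the first octet of an 802.11 frame carries the protocol version (bits 0-1), the frame type (bits 2-3) and the subtype (bits 4-7). The sketch below decodes that octet; the function name `decode_frame_control` is illustrative, and only a handful of subtypes are listed. It is not a description of how the frames in this thread were actually identified.

```python
# Frame types from the 802.11 frame control field (bits 2-3 of the first octet).
FRAME_TYPES = {0: "management", 1: "control", 2: "data"}

# A few (type, subtype) pairs; the table is deliberately incomplete.
SUBTYPES = {
    (0, 8): "beacon",
    (1, 11): "RTS",
    (1, 12): "CTS",
    (1, 13): "ACK",
}


def decode_frame_control(first_byte):
    """Decode the first octet of an 802.11 frame into (version, type, subtype)."""
    version = first_byte & 0x03
    ftype = (first_byte >> 2) & 0x03
    subtype = (first_byte >> 4) & 0x0F
    type_name = FRAME_TYPES.get(ftype, "reserved")
    subtype_name = SUBTYPES.get((ftype, subtype), f"subtype {subtype}")
    return version, type_name, subtype_name


if __name__ == "__main__":
    for octet in (0x80, 0xC4):
        print(hex(octet), decode_frame_control(octet))
```

Note that 0xc4, the "First byte" reported by slub_debug earlier in the thread, decodes as a control frame with subtype 12, which is consistent with a CTS frame; 0x80 is the first octet of a beacon.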
Do note that's on a 2.6.32 kernel. However, I have a new dmesg/syslog from a 2.6.34-rc5-daily kernel which shows the same kind of issues. It's 400 kB in size... Do you want me to send it to the mailing list? Or should I open a bug report on the kernel bugzilla and post it there? Or both? -- Pedro --
Now live on... https://bugzilla.kernel.org/show_bug.cgi?id=15861 -- Pedro --
On Saturday, 24 April 2010 18:28:36, Pedro Francisco wrote: -snip I am now able to compile my own kernels using `make all deb-pkg' on the vanilla source, plus some additional hocus-pocus to create the initramfs, after having finally given up on the Debian way (make-kpkg). On Friday, 23 April 2010 17:37:03, me@bobcopeland.com wrote: I can't seem to find the "kmemcheck: trap use of uninitialized memory" option in make menuconfig, section Kernel Hacking, though a search shows it should be there. Will it be useful if I run such tests now that I've opened a bug report with data from slub_debug? If so, how can I enable that feature? P.S.: since I'm only planning to use this kernel for the tests, I don't mind enabling options which hurt interactivity, as long as a terminal works well. Thanks again, -- Pedro --
It was "CONFIG_FUNCTION_TRACER" which was on; incidentally, that dependency doesn't appear in the "Depends on" line, so I had to do a web search. Will post the results tomorrow, unless there aren't any. -- Pedro --
Most likely you need to disable "Optimize for size", under "General setup" in the main menu to make it appear. But it could also be one of the other dependencies; check the "Depends on" line of KMEMCHECK when you do the search. Vegard --
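The two options named in this exchange as hiding kmemcheck from menuconfig can also be checked directly in a .config file. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `SAMPLE_CONFIG` is made-up input rather than the poster's real configuration, and the blocker list contains only the two options mentioned in the thread, not the full dependency list of KMEMCHECK.

```python
# Options the thread reports as preventing KMEMCHECK from being selectable.
KMEMCHECK_BLOCKERS = ("CONFIG_CC_OPTIMIZE_FOR_SIZE", "CONFIG_FUNCTION_TRACER")


def parse_kernel_config(text):
    """Parse .config text into a dict of option name -> value ('n' if unset)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Lines of the form "# CONFIG_FOO is not set" mean the option is off.
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def kmemcheck_blockers(options):
    """Return the blocker options that are currently enabled (=y)."""
    return [name for name in KMEMCHECK_BLOCKERS if options.get(name, "n") == "y"]


# Hypothetical excerpt, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_FUNCTION_TRACER=y
"""

if __name__ == "__main__":
    enabled = kmemcheck_blockers(parse_kernel_config(SAMPLE_CONFIG))
    print("Disable before enabling kmemcheck:", ", ".join(enabled) or "nothing")
```

To try it on a real tree, read the text of the .config in the kernel source directory (or the distro file under /boot) and pass it to `parse_kernel_config`.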
Yes. A bug in a kernel module could corrupt any memory. If you know a working version of the kernel, try a git bisect; otherwise just post your dmesg with the crash etc. regards, dan carpenter --
I don't. I doubt the dmesg will be of significance; as I said, they're never the same. However, I posted the dmesgs here [ http://ubuntuforums.org/showthread.php?t=1457568 ] prior to asking for help on LKML. The kernel version is Ubuntu's 2.6.32, not vanilla. I believe nothing significant can be extracted from them. Am I correct? -- Pedro --
Well, they have the 'M' taint for "Machine Check Exception", so that probably means the hardware is failing. regards, --
Only on two of them. When I get a "BUG: Unable to handle paging request", the system seems to get stuck at 100% CPU, and after a while the CPU overheats. I assumed those Machine Check Exceptions were due to the overheating caused by the 100% CPU usage caused by the errors. If that is the cause, the CPU throttles to prevent further overheating, so it shouldn't affect anything. -- Pedro --
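The taint letters discussed in the last two messages appear on the "Pid: ..." line of each oops, after "Tainted:". Below is a minimal sketch for decoding them. The letter table covers the flags used by kernels of the 2.6.32 era (newer kernels define more), the function name `decode_taint` is illustrative, and the second example line is hypothetical rather than taken from the posted logs.

```python
import re

# Taint letters printed by kernels of the 2.6.32 era.
# 'G' only means "no proprietary module loaded" and is therefore not listed.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "S": "SMP kernel on hardware not designed for SMP",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page state detected",
    "U": "taint requested by userspace",
    "D": "kernel has oopsed or hit a BUG before",
    "A": "ACPI table overridden",
    "W": "a kernel warning was issued earlier",
    "C": "staging driver loaded",
}

TAINT_RE = re.compile(r"Tainted: ([A-Z ]+)")


def decode_taint(line):
    """Return descriptions of the taint flags found on an oops header line."""
    match = TAINT_RE.search(line)
    if not match:  # "Not tainted" or no taint field at all
        return []
    return [f"{c}: {TAINT_FLAGS[c]}" for c in match.group(1) if c in TAINT_FLAGS]


if __name__ == "__main__":
    # First line is from the slub_debug report in this thread.
    print(decode_taint("Pid: 0, comm: swapper Not tainted 2.6.32-21-generic #32-Ubuntu"))
    # Hypothetical line showing the 'M' flag discussed above.
    print(decode_taint("Pid: 1234, comm: kismet Tainted: G   M       2.6.32-21-generic"))
```

Taint flags stay set until reboot, so an 'M' on a later oops only says that a machine check happened at some point since boot, not that it caused that particular oops.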
