Saw this oops on my test machine this morning. I rebooted the machine
last night and hadn't done anything on it other than log in this
morning. The kernel here is based on Steve French's git tree, which is
based on Linus' tree as of Sunday, Aug 8th. The last non-cifs commit is:
commit 45d7f32c7a43cbb9592886d38190e379e2eb2226
Merge: 53bcef6 ab11b48
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun Aug 8 10:10:11 2010 -0700
Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
I also have some cifs patches in this kernel, but the cifs module
wasn't even plugged in at the time, and the patches don't affect
anything else. The host is a KVM guest. Let me know if you need other
info:
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 0
Modules linked in: nfsd lockd nfs_acl exportfs rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc ipv6 i2c_piix4 virtio_net i2c_core virtio_balloon floppy joydev pcspkr microcode virtio_blk virtio_pci virtio_ring virtio [last unloaded: mperf]
Pid: 2708, comm: gzip Not tainted 2.6.35+ #1 /
RIP: 0010:[<ffffffff81223830>] [<ffffffff81223830>] __call_for_each_cic+0x21/0x3f
RSP: 0018:ffff88003cea1e38 EFLAGS: 00010202
RAX: 00000001012070a8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff88003ab1ce80
RDX: 00000001012070ab RSI: ffff8800047d1260 RDI: 0000000000000286
RBP: ffff88003cea1e58 R08: 0000000000000286 R09: ffff88003cea1da8
R10: ffff88003a75a9e8 R11: ffff88003cea1e08 R12: ffff88003a75a9c0
R13: ffffffff8122387c R14: ffff88003e678000 R15: 0000000000000001
FS: 00007f6f4dc8b720(0000) GS:ffff880004600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000032c80a92c0 CR3: 0000000001a43000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gzip (pid: 2708, threadinfo ffff88003cea0000, task ffff88003cf5a440)
Stack:
...

This looks a lot like this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=577968

See also:
http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&...

It's been around since 2.6.30.8, according to kerneloops.org. If you find that you have a reliable way of reproducing the issue, that would be great.

Cheers,
--
On Tue, 10 Aug 2010 10:22:41 -0400

OK, thanks. No clear reproducer so far. This morning was the first time I've seen it, and it was on the console of my Rawhide machine. The last thing I did with it was reboot it last night. I suspect that the gzip process came from a cron job or something.

--
Jeff Layton <jlayton@redhat.com>
--
What version did you hit it on?

--
Jens Axboe
--
On Tue, 10 Aug 2010 12:10:05 -0400

It was a kernel built out of git, based on Steve French's git tree. The last commit from Linus in it was 45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of that consisted of patches that only touched cifs code, and cifs.ko hadn't been plugged in since the machine was rebooted.

--
Jeff Layton <jlayton@redhat.com>
--
OK. That bug is pretty elusive; so far I haven't been able to figure out what the heck is going on here, and my attempts at reproducing it have all failed. The reports so far seem to have the cron component in common. Does Fedora ionice some cron jobs or anything like that? Or use CLONE_IO?

--
Jens Axboe
--
On Tue, 10 Aug 2010 19:58:41 -0400

Yes. I sort of doubt anything there would use CLONE_IO, but ionice is definitely used. Fedora uses anacron. I don't see any explicit calls to gzip in there, but it's possible something else is calling it:

# grep ionice /etc/cron.*/*
/etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
/etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1

# cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1         5    cron.daily      nice run-parts /etc/cron.daily
7         25   cron.weekly     nice run-parts /etc/cron.weekly
@monthly  45   cron.monthly    nice run-parts /etc/cron.monthly

--
Jeff Layton <jlayton@redhat.com>
--
ionice must be a deciding factor in this, perhaps coupled with something else. Otherwise, we would be seeing a lot more of these.

--
Jens Axboe
--
Well, what's really strange is that this is only affecting F14. I'm installing a system, and I'll see if I can't reproduce it.

Cheers,
Jeff
--
