Re: cfq: oops in __call_for_each_cic

From: Jeff Layton
Date: Tuesday, August 10, 2010 - 3:40 am

Saw this oops on my test machine this morning. I rebooted the machine
last night and hadn't done anything on it other than log in this
morning. The kernel here is based on Steve French's git tree, which is
based on Linus' as of Sunday Aug 8th. Last non-cifs commit is:

commit 45d7f32c7a43cbb9592886d38190e379e2eb2226
Merge: 53bcef6 ab11b48
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Aug 8 10:10:11 2010 -0700

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile

I also have some cifs patches in this kernel, but the cifs module
wasn't even plugged in at the time, and the patches don't affect
anything else. The host is a KVM guest. Let me know if you need other
info:

general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 0 
Modules linked in: nfsd lockd nfs_acl exportfs rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc ipv6 i2c_piix4 virtio_net i2c_core virtio_balloon floppy joydev pcspkr microcode virtio_blk virtio_pci virtio_ring virtio [last unloaded: mperf]

Pid: 2708, comm: gzip Not tainted 2.6.35+ #1 /
RIP: 0010:[<ffffffff81223830>]  [<ffffffff81223830>] __call_for_each_cic+0x21/0x3f
RSP: 0018:ffff88003cea1e38  EFLAGS: 00010202
RAX: 00000001012070a8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff88003ab1ce80
RDX: 00000001012070ab RSI: ffff8800047d1260 RDI: 0000000000000286
RBP: ffff88003cea1e58 R08: 0000000000000286 R09: ffff88003cea1da8
R10: ffff88003a75a9e8 R11: ffff88003cea1e08 R12: ffff88003a75a9c0
R13: ffffffff8122387c R14: ffff88003e678000 R15: 0000000000000001
FS:  00007f6f4dc8b720(0000) GS:ffff880004600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000032c80a92c0 CR3: 0000000001a43000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gzip (pid: 2708, threadinfo ffff88003cea0000, task ffff88003cf5a440)
Stack:
 ...
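
Note on the register dump above: RBX holds 6b6b6b6b6b6b6b6b. The byte 0x6b is POISON_FREE in include/linux/poison.h, the pattern the slab debug code writes over freed objects, so if slab poisoning was enabled in this build the value suggests a pointer loaded from memory that had already been freed. Such a value is also a non-canonical x86-64 address, which would fit a general protection fault rather than a page fault. The following is a minimal sketch for spotting such registers in an oops; the function name and regular expression are illustrative, not part of any kernel tooling.

```python
import re

# Poison bytes as defined in include/linux/poison.h (used by slab debugging).
POISON_BYTES = {
    0x6b: "POISON_FREE (object already freed)",
    0x5a: "POISON_INUSE (allocated, never initialized)",
}

# Matches "RBX: 6b6b6b6b6b6b6b6b"-style fields; segment-prefixed fields
# such as "RSP: 0018:..." are deliberately not matched.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return {register: description} for registers filled with one poison byte."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        raw = bytes.fromhex(value)
        if len(set(raw)) == 1 and raw[0] in POISON_BYTES:
            hits[name] = POISON_BYTES[raw[0]]
    return hits


# Three register lines copied from the oops in this thread.
OOPS = """\
RAX: 00000001012070a8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff88003ab1ce80
RDX: 00000001012070ab RSI: ffff8800047d1260 RDI: 0000000000000286
RBP: ffff88003cea1e58 R08: 0000000000000286 R09: ffff88003cea1da8
"""

print(poisoned_registers(OOPS))
```

Run against the lines above, only RBX is flagged. This says nothing about which object was freed or who freed it; it only narrows the fault to a likely use-after-free.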
From: Jeff Moyer
Date: Tuesday, August 10, 2010 - 7:22 am

This looks a lot like this bug:
  https://bugzilla.redhat.com/show_bug.cgi?id=577968

See also:
  http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&...

It's been around since 2.6.30.8 according to kerneloops.org.  If you
find that you have a reliable way of reproducing the issue, that would
be great.

Cheers,
--

From: Jeff Layton
Date: Tuesday, August 10, 2010 - 7:27 am

On Tue, 10 Aug 2010 10:22:41 -0400, Jeff Moyer wrote:

Ok, thanks -- no clear reproducer so far. This morning was the
first time I've seen it and it was on the console of my rawhide
machine. The last thing I did with it was reboot it last night. I
suspect that the gzip process came from a cron job or something.

-- 
Jeff Layton <jlayton@redhat.com>
--

From: Jens Axboe
Date: Tuesday, August 10, 2010 - 9:10 am

What version did you hit it on?

-- 
Jens Axboe

--

From: Jeff Layton
Date: Tuesday, August 10, 2010 - 9:35 am

On Tue, 10 Aug 2010 12:10:05 -0400, Jens Axboe wrote:

It was a kernel built out of git, based on Steve French's git tree. The
last commit from Linus in it was
45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of
that was patches that only touched cifs code. cifs.ko hadn't been
plugged in since it was rebooted.

-- 
Jeff Layton <jlayton@redhat.com>
--

From: Jens Axboe
Date: Tuesday, August 10, 2010 - 4:58 pm

OK. That bug is pretty elusive, so far I haven't been able to figure
out what the heck is going on here and my attempts at reproducing
have all failed. The reports so far seem to have the cron component
in common. Does fedora ionice some cron jobs or anything like that?
Or use CLONE_IO?

-- 
Jens Axboe

--

From: Jeff Layton
Date: Tuesday, August 10, 2010 - 6:23 pm

On Tue, 10 Aug 2010 19:58:41 -0400, Jens Axboe wrote:

Yes. I sort of doubt anything there would use CLONE_IO, but ionice is
definitely used. Fedora uses anacron. I don't see any explicit calls to
gzip in there, but it's possible something else is calling it:

# grep ionice /etc/cron.*/*
/etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
/etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1

# cat /etc/anacrontab 
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1	5	cron.daily		nice run-parts /etc/cron.daily
7	25	cron.weekly		nice run-parts /etc/cron.weekly
@monthly 45	cron.monthly		nice run-parts /etc/cron.monthly

-- 
Jeff Layton <jlayton@redhat.com>
--

From: Jens Axboe
Date: Wednesday, August 11, 2010 - 6:23 am

ionice must be a deciding factor in this, perhaps coupled with something
else. Otherwise we would be seeing a lot more of these.

-- 
Jens Axboe

--

From: Jeff Moyer
Date: Wednesday, August 11, 2010 - 8:41 am

Well, what's really strange is that this is only affecting f14.  I'm
installing a system and I'll see if I can't reproduce it.

Cheers,
Jeff
--
