login
Header Space

 
 

Re: [BUG] soft lockup detected with ipcs

Previous thread: [PATCH 2.6.25-rc3] processor_idle: fixed CPU 'active state' stably remain C0 by cat /proc by youquan_song on Friday, February 29, 2008 - 5:42 am. (1 message)

Next thread: [PATCH] VIRTIO NET: Enable netpoll interface for netconsole logging by Amit Shah on Friday, February 29, 2008 - 6:54 am. (3 messages)
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, February 29, 2008 - 6:46 am

Hi,

when I run 'ipcs' my system freeze up immediatelly.
I was able to get kernel BUG message once, I think it is not
printed out all the time it freeze.

I tried on 2.6.24, 2.6.20 and 2.6.18.
I attached screenshot from 2.6.18 freeze and config.

regards,
jirka
To: Jiri Olsa <olsajiri@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, February 29, 2008 - 7:43 am

Could you please turn all the lock debugging options in your .config on 
(most importantly CONFIG_PROVE_LOCKING, and all the other lock debugging 
options might come handy too) and try again, to see if we get any debug 
error messages? I'd guess that someone is holding mqueue_inode_info-&gt;lock 
for too long or there is some AB-BA deadlock on it, which lockdep and 
friends might be able to diagnose.

-- 
Jiri Kosina
SUSE Labs

--
To: Jiri Kosina <jkosina@...>, Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Sunday, March 2, 2008 - 1:45 pm

I got more logs via netconsole, first I ran ipcs it segfaulted next run
the system freezed.
I attached also the current config.
To: Jiri Olsa <olsajiri@...>
Cc: Jiri Kosina <jkosina@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Sunday, March 2, 2008 - 4:02 pm

Looks like you got a use-after free when lockdep was playing with a
spinlock which is taken in sys_shmctl() or one of its inlined callees.

Does setting CONFIG_LOCKDEP=n prevent this from happening?
--
To: Jiri Olsa <olsajiri@...>, Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Sunday, March 2, 2008 - 5:23 pm

Did you run ipcs without any arguments? If so, it should only call 
shmctl(SHM_INFO) and shmctl(SHM_STAT), so the only two spinlocks involved 
should be either shmem_inode_info-&gt;lock in shm_get_stat() or 
kern_ipc_perm-&gt;out_lock in ipc_lock().

Could you please try what is the output with the attached patch below, so 
that we know which spinlock buggers?

Also, could you please provide strace of the first (segfaulting) 
invocation of ipcs?


BTW any idea where does the '2b' come from? (it's single-bit flip).

-- 
Jiri Kosina
SUSE Labs
--
To: Jiri Olsa <olsajiri@...>, Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Sunday, March 2, 2008 - 5:26 pm

Uhm, and hereby I attach the patch.

diff --git a/ipc/shm.c b/ipc/shm.c
index c47e872..2abfb70 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -615,6 +615,7 @@ static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
 			*rss += (HPAGE_SIZE/PAGE_SIZE)*mapping-&gt;nrpages;
 		} else {
 			struct shmem_inode_info *info = SHMEM_I(inode);
+			printk(KERN_DEBUG "info: %p\n", info);
 			spin_lock(&amp;info-&gt;lock);
 			*rss += inode-&gt;i_mapping-&gt;nrpages;
 			*swp += info-&gt;swapped;
diff --git a/ipc/util.c b/ipc/util.c
index fd1b50d..5a0d4d2 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -662,6 +662,7 @@ struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
 
 	up_read(&amp;ids-&gt;rw_mutex);
 
+	printk(KERN_DEBUG "out: %p\n", out);
 	spin_lock(&amp;out-&gt;lock);
 	
 	/* ipc_rmid() may have already freed the ID while ipc_lock
--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 8:00 am

I'll try the patch, so far I attached the ipcs strace and adjacent kernel logs.
To: Jiri Olsa <olsajiri@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 8:06 am

Thanks for the strace.

The kernel oopses showing the soft lockup happen at the time of the first 
segfaulting run of ipcs, or the second one that hangs?

The oopses are different compared to the ones you posted previously -- 
those were clearly poiting to use-after-free, these show spinlock lockup.

-- 
Jiri Kosina
SUSE Labs
--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 8:19 am

ipcs.kernel.out.1 is the kernel log from the time strace ipcs was done.
ipcs.kernel.out.2 is from the next run of ipcs that stuck,
there's no overlay between those logs.

When the kernel freeze I'm getting logs with the spinlock lockup.. and it's repeating.
--
To: Jiri Olsa <olsajiri@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 8:28 am

Thanks. Have you tried turning lockdep off, if that still triggers?

Aparently the first BUG leaves the spin lock locked, and all subsequent 
accessess to it lock the machine hard. We'll need to see what goes wrong 
inside lockdep.

Could you please run

	addr2line -e vmlinux c0133034

(passing the path to the vmlinux binary of the crashing kernel to the -e 
parameter).

-- 
Jiri Kosina
SUSE Labs
--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 8:38 am

I've rebuilt the kernel meanwhile, so dont know how accurate this output is:

/home/jirka/xxx/kernel/linux-2.6-stable/include/asm/atomic_32.h:97

Anyway, I'll run the test again with your patch, and then we can run the addrline again...
I'll try to turn the lockdep off as well.
--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 10:04 am

I applied you diff and ran the test again..


your patch with LOCKDEP enabled:

ipcs.strace.1		- strace of ipcs from the first run
ipcs.kernel.out.1 	- kernel logs from 1st strace ipcs run
ipcs.kernel.out.2 	- kernel logs from 2nd strace ipcs run - system freezed


your patch with LOCKDEP disabled:

ipcs.kernel.out-NO_LOCKDEP.1 - kernel logs from 1st strace ipcs run - system freezed


maybe I overlooked smth, but following is the only way I found 
how to disable LOCKDEP, hope it is fine... let me know:

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80b7ba4..2bfd8dc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -43,7 +43,7 @@ config GENERIC_CLOCKEVENTS_BROADCAST
 
 config LOCKDEP_SUPPORT
        bool
-       default y
+       default n
 
 config STACKTRACE_SUPPORT
        bool
To: Jiri Olsa <olsajiri@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Monday, March 3, 2008 - 10:45 am

OK, so the actual NULL pointer dereference happens inside atomic_inc(), 
during incrementing of counter-&gt;v (as %esi == 0xffffffff). So apparently 
someone corrupted spinlock_t for shmem_inode_info.lock, which is being 
obtained in shm_get_stat().

-- 
Jiri Kosina
SUSE Labs
--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Tuesday, March 4, 2008 - 5:56 am

I'm striping down the config to hopefully find out the cause...
anything else I can try? besides code study :)

thanks
--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Nadia Derbey <Nadia.Derbey@...>, <peifferp@...>
Date: Tuesday, March 4, 2008 - 8:57 am

I found a configuration which works for me, attaching both configs, bad and good one
hope this helps
To: Jiri Olsa <olsajiri@...>
Cc: <linux-kernel@...>
Date: Friday, February 29, 2008 - 6:57 am

how weird.  Lots of people are surely running ipcs and not seeing this.

Can you suggest what's different about your setup?
--
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, February 29, 2008 - 7:01 am

no idea :) I noticed like 2 weeks ago... I probably changed some config,
but I dont remember what it was, I'll try to figure out.

Anyway I can reproduce this immediatelly, so I can run any diag you'd need to see.

jirka




--
To: Jiri Olsa <olsajiri@...>
Cc: <linux-kernel@...>
Date: Friday, February 29, 2008 - 7:29 am

[Empty message]
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, February 29, 2008 - 7:35 am

I'll try to get the netconsole..

The trigger is simple just type ipcs and press enter :)
I noticed that running from xterm freeze all the time.
It wont freeze up sometimes running directly from terminal.

I also tried to reproduce it in qemu but with no luck.
--
Previous thread: [PATCH 2.6.25-rc3] processor_idle: fixed CPU 'active state' stably remain C0 by cat /proc by youquan_song on Friday, February 29, 2008 - 5:42 am. (1 message)

Next thread: [PATCH] VIRTIO NET: Enable netpoll interface for netconsole logging by Amit Shah on Friday, February 29, 2008 - 6:54 am. (3 messages)
speck-geostationary