Hi,
when I run 'ipcs' my system freeze up immediatelly.
I was able to get kernel BUG message once, I think it is not
printed out all the time it freeze.I tried on 2.6.24, 2.6.20 and 2.6.18.
I attached screenshot from 2.6.18 freeze and config.regards,
jirka
Could you please turn all the lock debugging options in your .config on
(most importantly CONFIG_PROVE_LOCKING, and all the other lock debugging
options might come handy too) and try again, to see if we get any debug
error messages? I'd guess that someone is holding mqueue_inode_info->lock
for too long or there is some AB-BA deadlock on it, which lockdep and
friends might be able to diagnose.--
Jiri Kosina
SUSE Labs--
I got more logs via netconsole, first I ran ipcs it segfaulted next run
the system freezed.
I attached also the current config.
Looks like you got a use-after free when lockdep was playing with a
spinlock which is taken in sys_shmctl() or one of its inlined callees.Does setting CONFIG_LOCKDEP=n prevent this from happening?
--
Did you run ipcs without any arguments? If so, it should only call
shmctl(SHM_INFO) and shmctl(SHM_STAT), so the only two spinlocks involved
should be either shmem_inode_info->lock in shm_get_stat() or
kern_ipc_perm->out_lock in ipc_lock().Could you please try what is the output with the attached patch below, so
that we know which spinlock buggers?Also, could you please provide strace of the first (segfaulting)
invocation of ipcs?BTW any idea where does the '2b' come from? (it's single-bit flip).
--
Jiri Kosina
SUSE Labs
--
Uhm, and hereby I attach the patch.
diff --git a/ipc/shm.c b/ipc/shm.c
index c47e872..2abfb70 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -615,6 +615,7 @@ static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
*rss += (HPAGE_SIZE/PAGE_SIZE)*mapping->nrpages;
} else {
struct shmem_inode_info *info = SHMEM_I(inode);
+ printk(KERN_DEBUG "info: %p\n", info);
spin_lock(&info->lock);
*rss += inode->i_mapping->nrpages;
*swp += info->swapped;
diff --git a/ipc/util.c b/ipc/util.c
index fd1b50d..5a0d4d2 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -662,6 +662,7 @@ struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)up_read(&ids->rw_mutex);
+ printk(KERN_DEBUG "out: %p\n", out);
spin_lock(&out->lock);/* ipc_rmid() may have already freed the ID while ipc_lock
--
I'll try the patch, so far I attached the ipcs strace and adjacent kernel logs.
Thanks for the strace.
The kernel oopses showing the soft lockup happen at the time of the first
segfaulting run of ipcs, or the second one that hangs?The oopses are different compared to the ones you posted previously --
those were clearly poiting to use-after-free, these show spinlock lockup.--
Jiri Kosina
SUSE Labs
--
ipcs.kernel.out.1 is the kernel log from the time strace ipcs was done.
ipcs.kernel.out.2 is from the next run of ipcs that stuck,
there's no overlay between those logs.When the kernel freeze I'm getting logs with the spinlock lockup.. and it's repeating.
--
Thanks. Have you tried turning lockdep off, if that still triggers?
Aparently the first BUG leaves the spin lock locked, and all subsequent
accessess to it lock the machine hard. We'll need to see what goes wrong
inside lockdep.Could you please run
addr2line -e vmlinux c0133034
(passing the path to the vmlinux binary of the crashing kernel to the -e
parameter).--
Jiri Kosina
SUSE Labs
--
I've rebuilt the kernel meanwhile, so dont know how accurate this output is:
/home/jirka/xxx/kernel/linux-2.6-stable/include/asm/atomic_32.h:97
Anyway, I'll run the test again with your patch, and then we can run the addrline again...
I'll try to turn the lockdep off as well.
--
I applied you diff and ran the test again..
your patch with LOCKDEP enabled:
ipcs.strace.1 - strace of ipcs from the first run
ipcs.kernel.out.1 - kernel logs from 1st strace ipcs run
ipcs.kernel.out.2 - kernel logs from 2nd strace ipcs run - system freezedyour patch with LOCKDEP disabled:
ipcs.kernel.out-NO_LOCKDEP.1 - kernel logs from 1st strace ipcs run - system freezed
maybe I overlooked smth, but following is the only way I found
how to disable LOCKDEP, hope it is fine... let me know:diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80b7ba4..2bfd8dc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -43,7 +43,7 @@ config GENERIC_CLOCKEVENTS_BROADCASTconfig LOCKDEP_SUPPORT
bool
- default y
+ default nconfig STACKTRACE_SUPPORT
bool
OK, so the actual NULL pointer dereference happens inside atomic_inc(),
during incrementing of counter->v (as %esi == 0xffffffff). So apparently
someone corrupted spinlock_t for shmem_inode_info.lock, which is being
obtained in shm_get_stat().--
Jiri Kosina
SUSE Labs
--
I'm striping down the config to hopefully find out the cause...
anything else I can try? besides code study :)thanks
--
I found a configuration which works for me, attaching both configs, bad and good one
hope this helps
how weird. Lots of people are surely running ipcs and not seeing this.
Can you suggest what's different about your setup?
--
no idea :) I noticed like 2 weeks ago... I probably changed some config,
but I dont remember what it was, I'll try to figure out.Anyway I can reproduce this immediatelly, so I can run any diag you'd need to see.
jirka
--
I'll try to get the netconsole..
The trigger is simple just type ipcs and press enter :)
I noticed that running from xterm freeze all the time.
It wont freeze up sometimes running directly from terminal.I also tried to reproduce it in qemu but with no luck.
--
| Trent Piepho | [PATCH] [POWERPC] Improve (in|out)_beXX() asm code |
| Stoyan Gaydarov | From 2.4 to 2.6 to 2.7? |
| Andi Kleen | [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu |
| Greg Kroah-Hartman | [PATCH 013/196] Documentation: Replace obsolete "driverfs" with "sysfs". |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | Re: HTB accuracy for high speed |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | Re: [GIT]: Networking |
