Hi, when I run 'ipcs' my system freeze up immediatelly. I was able to get kernel BUG message once, I think it is not printed out all the time it freeze. I tried on 2.6.24, 2.6.20 and 2.6.18. I attached screenshot from 2.6.18 freeze and config. regards, jirka
Could you please turn all the lock debugging options in your .config on (most importantly CONFIG_PROVE_LOCKING, and all the other lock debugging options might come handy too) and try again, to see if we get any debug error messages? I'd guess that someone is holding mqueue_inode_info->lock for too long or there is some AB-BA deadlock on it, which lockdep and friends might be able to diagnose. -- Jiri Kosina SUSE Labs --
I got more logs via netconsole, first I ran ipcs it segfaulted next run the system freezed. I attached also the current config.
Looks like you got a use-after free when lockdep was playing with a spinlock which is taken in sys_shmctl() or one of its inlined callees. Does setting CONFIG_LOCKDEP=n prevent this from happening? --
Did you run ipcs without any arguments? If so, it should only call shmctl(SHM_INFO) and shmctl(SHM_STAT), so the only two spinlocks involved should be either shmem_inode_info->lock in shm_get_stat() or kern_ipc_perm->out_lock in ipc_lock(). Could you please try what is the output with the attached patch below, so that we know which spinlock buggers? Also, could you please provide strace of the first (segfaulting) invocation of ipcs? BTW any idea where does the '2b' come from? (it's single-bit flip). -- Jiri Kosina SUSE Labs --
Uhm, and hereby I attach the patch.
diff --git a/ipc/shm.c b/ipc/shm.c
index c47e872..2abfb70 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -615,6 +615,7 @@ static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
*rss += (HPAGE_SIZE/PAGE_SIZE)*mapping->nrpages;
} else {
struct shmem_inode_info *info = SHMEM_I(inode);
+ printk(KERN_DEBUG "info: %p\n", info);
spin_lock(&info->lock);
*rss += inode->i_mapping->nrpages;
*swp += info->swapped;
diff --git a/ipc/util.c b/ipc/util.c
index fd1b50d..5a0d4d2 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -662,6 +662,7 @@ struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
up_read(&ids->rw_mutex);
+ printk(KERN_DEBUG "out: %p\n", out);
spin_lock(&out->lock);
/* ipc_rmid() may have already freed the ID while ipc_lock
--I'll try the patch, so far I attached the ipcs strace and adjacent kernel logs.
Thanks for the strace. The kernel oopses showing the soft lockup happen at the time of the first segfaulting run of ipcs, or the second one that hangs? The oopses are different compared to the ones you posted previously -- those were clearly poiting to use-after-free, these show spinlock lockup. -- Jiri Kosina SUSE Labs --
ipcs.kernel.out.1 is the kernel log from the time strace ipcs was done. ipcs.kernel.out.2 is from the next run of ipcs that stuck, there's no overlay between those logs. When the kernel freeze I'm getting logs with the spinlock lockup.. and it's repeating. --
Thanks. Have you tried turning lockdep off, if that still triggers? Aparently the first BUG leaves the spin lock locked, and all subsequent accessess to it lock the machine hard. We'll need to see what goes wrong inside lockdep. Could you please run addr2line -e vmlinux c0133034 (passing the path to the vmlinux binary of the crashing kernel to the -e parameter). -- Jiri Kosina SUSE Labs --
I've rebuilt the kernel meanwhile, so dont know how accurate this output is: /home/jirka/xxx/kernel/linux-2.6-stable/include/asm/atomic_32.h:97 Anyway, I'll run the test again with your patch, and then we can run the addrline again... I'll try to turn the lockdep off as well. --
I applied you diff and ran the test again..
your patch with LOCKDEP enabled:
ipcs.strace.1 - strace of ipcs from the first run
ipcs.kernel.out.1 - kernel logs from 1st strace ipcs run
ipcs.kernel.out.2 - kernel logs from 2nd strace ipcs run - system freezed
your patch with LOCKDEP disabled:
ipcs.kernel.out-NO_LOCKDEP.1 - kernel logs from 1st strace ipcs run - system freezed
maybe I overlooked smth, but following is the only way I found
how to disable LOCKDEP, hope it is fine... let me know:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80b7ba4..2bfd8dc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -43,7 +43,7 @@ config GENERIC_CLOCKEVENTS_BROADCAST
config LOCKDEP_SUPPORT
bool
- default y
+ default n
config STACKTRACE_SUPPORT
boolOK, so the actual NULL pointer dereference happens inside atomic_inc(), during incrementing of counter->v (as %esi == 0xffffffff). So apparently someone corrupted spinlock_t for shmem_inode_info.lock, which is being obtained in shm_get_stat(). -- Jiri Kosina SUSE Labs --
I'm striping down the config to hopefully find out the cause... anything else I can try? besides code study :) thanks --
I found a configuration which works for me, attaching both configs, bad and good one hope this helps
how weird. Lots of people are surely running ipcs and not seeing this. Can you suggest what's different about your setup? --
no idea :) I noticed like 2 weeks ago... I probably changed some config, but I dont remember what it was, I'll try to figure out. Anyway I can reproduce this immediatelly, so I can run any diag you'd need to see. jirka --
I'll try to get the netconsole.. The trigger is simple just type ipcs and press enter :) I noticed that running from xterm freeze all the time. It wont freeze up sometimes running directly from terminal. I also tried to reproduce it in qemu but with no luck. --
| Jeff Garzik | Re: Wasting our Freedom |
| Chuck Ebbert | Why do so many machines need "noapic"? |
| Mathieu Desnoyers | [RFC patch 08/18] cnt32_to_63 should use smp_rmb() |
| Richard Hughes | Add INPUT support to toshiba_acpi |
git: | |
| Jan | [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file |
| Elijah Newren | Trying to use git-filter-branch to compress history by removing large, obsolete bi... |
| Thomas Koch | is gitosis secure? |
| Matthieu Moy | git push to a non-bare repository |
| frantisek holop | booting openbsd on eee without cd-rom |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Otto Moerbeek | Re: identifying sparse files and get ride of them trick available? |
| Renaud Allard | very weak bridge performance |
| Linux Kernel Mailing List | [ALSA] hda: Added new IDT codec family |
| Linux Kernel Mailing List | usb-storage: clean up unusual_devs.h |
| Linux Kernel Mailing List | USB: Enhance usage of pm_message_t |
| Linux Kernel Mailing List | resource: allow MMIO exclusivity for device drivers |
