Hi Andrew,

I was running kernbench on top of 2.6.22-rc6-mm1 and I got a Hangcheck alert (This is when kernbench reached make -j). Also make -j is hanging. (Also checked, it doesn't hang on 2.6.22)

This is my /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) MP CPU 2.50GHz
stepping : 5
cpu MHz : 2500.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts sync_rdtsc cid xtpr
bogomips : 4982.85
clflush size : 64

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) MP CPU 2.50GHz
stepping : 5
cpu MHz : 2500.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts sync_rdtsc cid xtpr
bogomips : 4976.29
clflush size : 64

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) MP CPU 2.50GHz
stepping : 5
cpu MHz : 2500.000
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts sync_rdtsc cid xtpr
bogomips : 4976.25
clflush size : 64

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) MP CPU 2.50GHz
stepping : 5
cpu MHz : ...
hm, never had a report of that before. It's the first time I've seen hangcheck produce anything useful, frankly.

Please try to capture the full sysrq-T output when it is hung.
-
Available at http://dhaval.giani.googlepages.com/sysrq-t-trace.bz2

In the meantime I will go and check if it was there in 2.6.22-rc4-mm2

--
regards,
Dhaval

I would like to change the world but they don't give me the source code!
-
Hi Andrew,

It is hanging with 2.6.22-rc4-mm2 as well as on the latest git on kernel.org (2.6.22-git10). I will get back to you with more information as soon as I have it.

Thanks
--
regards,
Dhaval

I would like to change the world but they don't give me the source code!
-
Hi Andrew,
I've got a crash dump and stack traces. They are as follows (The trace
is on 2.6.22-git10)
(gdb) thread 1
[Switching to thread 1 (process 8096)]#0 delay_tsc (loops=1)
at include/asm/msr.h:64
64 {
(gdb) bt
#0 delay_tsc (loops=1) at include/asm/msr.h:64
#1 0xc0245130 in __delay (loops=Variable "loops" is not available.
) at arch/i386/lib/delay.c:74
#2 0xc0247115 in __spin_lock_debug (lock=0xc0564480)
at lib/spinlock_debug.c:111
#3 0xc02471cc in _raw_spin_lock (lock=0xc0564480) at lib/spinlock_debug.c:132
#4 0xc041ad3e in _spin_lock_irq (lock=0xc0564480) at kernel/spinlock.c:105
#5 0xc015ff2c in shrink_active_list (nr_pages=32, zone=0xc0563300,
sc=0xd65b3e60, priority=5) at mm/vmscan.c:926
#6 0xc01602a3 in shrink_zone (priority=5, zone=0xc0563300, sc=0xd65b3e60)
at mm/vmscan.c:1044
#7 0xc016036c in shrink_zones (priority=5, zones=0xc056584c, sc=0xd65b3e60)
at mm/vmscan.c:1101
#8 0xc0160488 in try_to_free_pages (zones=0xc056584c, order=Variable "order" is not available.
)
at mm/vmscan.c:1153
#9 0xc015c190 in __alloc_pages (gfp_mask=688338, order=0, zonelist=0xc0565848)
at mm/page_alloc.c:1336
#10 0xc0165285 in do_anonymous_page (mm=0xe4498280, vma=0xd3afef3c,
address=3083890688, page_table=0xd65ca838, pmd=0xe3e33df0, write_access=1)
at include/linux/gfp.h:100
#11 0xc0165a58 in __handle_mm_fault (mm=0xe4498280, vma=0xd3afef3c,
address=3083890688, write_access=1) at mm/memory.c:2549
#12 0xc041c984 in do_page_fault (regs=0xd65b3fb8, error_code=6)
at include/linux/mm.h:776
#13 0xc041b37a in error_code () at include/linux/sched.h:13
#14 0x0000006c in ?? ()
#15 0x0000001b in ?? ()
#16 0x00000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (process 7371)]#0 __spin_lock_debug (lock=0xc0564480)
at include/asm/spinlock.h:88
88 {
(gdb) bt
#0 __spin_lock_debug (lock=0xc0564480) at include/asm/spinlock.h:88
#1 0xc02471cc in _raw_spin_lock (lock=0xc0564480) at lib/spinlock_debug.c:132
#2 ...

Looks interesting.

--
regards,
Dhaval

I would like to change the world but they don't give me the source code!
-
Softlockup is broken in 2.6.22.
=======================================================================
Subject: fix the softlockup watchdog to actually work
From: Ingo Molnar <mingo@elte.hu>
this Xen related commit:
commit 966812dc98e6a7fcdf759cbfa0efab77500a8868
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Tue May 8 00:28:02 2007 -0700
Ignore stolen time in the softlockup watchdog
broke the softlockup watchdog to never report any lockups. (!)
print_timestamp defaults to 0, this makes the following condition
always true:
if (print_timestamp < (touch_timestamp + 1) ||
and we'll in essence never report soft lockups.
apparently the functionality of the soft lockup watchdog was never
actually tested with that patch applied ...
[this is -stable material too.]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/softlockup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -79,10 +79,11 @@ void softlockup_tick(void)
 	print_timestamp = per_cpu(print_timestamp, this_cpu);

 	/* report at most once a second */
-	if (print_timestamp < (touch_timestamp + 1) ||
-		did_panic ||
-			!per_cpu(watchdog_task, this_cpu))
+	if ((print_timestamp >= touch_timestamp &&
+			print_timestamp < (touch_timestamp + 1)) ||
+			did_panic || !per_cpu(watchdog_task, this_cpu)) {
 		return;
+	}

 	/* do not print during early bootup: */
 	if (unlikely(system_state != SYSTEM_RUNNING)) {
-
Hi Chuck,

The system does not hang on 2.6.22. It is however still hanging with yesterday's git plus Ingo's softlockup patch.

--
regards,
Dhaval

I would like to change the world but they don't give me the source code!
-
[refer http://marc.info/?l=linux-kernel&m=118474574807055 for complete report of this bug]

Ingo,

Dhaval tracked the root cause of this problem to be in cfs (btw cfs patches weren't git-bisect safe).

Basically, the "make -s -j" workload hung the machine, leading to a lot of OOM killings. This was on an 8-cpu machine with no swap space configured and 4GB RAM. The same workload works "fine" (runs to completion) on 2.6.22.

I played with the scheduler tunables a bit and found that the problem goes away if I set sched_granularity_ns to 100ms (default value 32ms).

So my theory is this: a 32ms preemption granularity is too low a value for any compile thread to make "useful" progress. As a result of this rapid context switching, the job retiral rate slows down compared to the job arrival rate. This builds up job pressure on the system very quickly (more quickly than may have happened with a 100ms default granularity_ns or the 2.6.22 kernel), [...] value, this may be seen as a regression.

Perhaps these new tunables in cfs are something for users to become used to and tune to the appropriate setting for their system. It would have been nice for the kernel to auto-tune the settings based on workload, but I guess that's harder.

--
Regards,
vatsa
-
while i agree that the 32msec was too low, i think the problem is that "make -s -j" is a workload that has no guarantee of "success" on that system. The box does not have enough RAM to service it and does not have enough swap to survive it.

In make -j, jobs are started without any throttling whatsoever. _Any_ control mechanism within the kernel can act as an "accidental throttle": for example IO could artificially slow it down to reduce job rate and keep RAM usage below the critical level. Or a kernel bug could cause tasks to be delayed and thus let the make -j "succeed". Or some bad kernel inefficiency in sys_fork() could have this effect too.

It is very important that we don't look at every random number that a system can produce as a "benchmark", we really have to [...]

yep - 32msecs was too low, please try -rc1 too: i've increased the granularity limit so it should be larger than 32ms. Reduce CONFIG_HZ as [...]

By increasing the granularity the timings change - one can imagine workloads where _reducing_ the granularity would result in an effective throttling of the workload. I'm sure a workload could be constructed on the old scheduler too where its 100 msecs isn't enough either, only 200msecs would help. That thinking never ends - you cannot tune non-throttled workloads. We've got to be really careful about this.

Ingo
-
