i have obsvered this bug from 2.6.27-rc1 to rc7
ok so i did git bisection which pointed:-
commit 3ed3f06295e69700fa808396f7b350bff2b69de0
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed Jun 4 01:00:47 2008 +0400
x86: nmi - consolidate nmi_watchdog_default for 32bit mode
64bit mode bootstrap code does set nmi_watchdog to NMI_NONE
by default and doing the same on 32bit mode is safe too.
Such an action saves us from several #ifdef.
git -bisection log.
git-bisect start
# bad: [6e86841d05f371b5b9b86ce76c02aaee83352298] Linux 2.6.27-rc1
git-bisect bad 6e86841d05f371b5b9b86ce76c02aaee83352298
# good: [bce7f793daec3e65ec5c5705d2457b81fe7b5725] Linux 2.6.26
git-bisect good bce7f793daec3e65ec5c5705d2457b81fe7b5725
# bad: [d20b27478d6ccf7c4c8de4f09db2bdbaec82a6c0] V4L/DVB (8415):
gspca: Infinite loop in i2c_w() of etoms.
git-bisect bad d20b27478d6ccf7c4c8de4f09db2bdbaec82a6c0
# bad: [666484f0250db2e016948d63b3ef33e202e3b8d0] Merge branch
'core/softirq' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git-bisect bad 666484f0250db2e016948d63b3ef33e202e3b8d0
# bad: [d59fdcf2ac501de99c3dfb452af5e254d4342886] Merge commit
'v2.6.26' into x86/core
git-bisect bad d59fdcf2ac501de99c3dfb452af5e254d4342886
# good: [3de352bbd86f890dd0c5e1c09a6a1b0b29e0f8ce] Merge branch
'x86/mpparse' into x86/devel
git-bisect good 3de352bbd86f890dd0c5e1c09a6a1b0b29e0f8ce
# bad: [b4df32f4aeef8794d0135fc8dc250acb44cfee60] x86: fix warning in
e820_reserve_resources with 32bit
git-bisect bad b4df32f4aeef8794d0135fc8dc250acb44cfee60
# bad: [7f0be02c5ed1deb04c54c6a17f412e04f417df11] x86: move
boot_params declaring to setup.c
git-bisect bad 7f0be02c5ed1deb04c54c6a17f412e04f417df11
# bad: [4b62ac9a2b859f932afd5625362c927111b7dd9b] Merge branch
'x86/nmi' into x86/devel
git-bisect bad 4b62ac9a2b859f932afd5625362c927111b7dd9b
# bad: [f781b03c4b1c713ac000877c8bbc31fc4164a29b] x86:
touch_nmi_watchdog(): reset alert counters for supported nmi_watchdog
modes ...[Tej - Thu, Sep 25, 2008 at 08:05:23PM +0530] | On 9/11/08, Jason Wessel <jason.wessel@windriver.com> wrote: | > Tej wrote: | >> observed a panic on 2.6.27-rc6 caused by enabling | >> "CONFIG_KGDB_TESTS_ON_BOOT" option | >> | >> panic logged using serial console and .config are attached. | >> | >> Linux version 2.6.27-rc6 (root@luser-desktop) (gcc version 4.2.3 | > (Ubuntu 4.2.3-8 | > [clip] | >> kgdb: Registered I/O driver kgdbts. | >> kgdbts:RUN plant and detach test | >> kgdbts:RUN sw breakpoint test | >> kgdbts:RUN bad memory access test | >> kgdbts:RUN singlestep test 1000 iterations | >> kgdbts:RUN singlestep [0/1000] | >> kgdbts:RUN singlestep [100/1000] | >> kgdbts: BP mismatch c0109c02 expected c02ad250 | > | > | > | > So if we break this down into what it means, the kgdb test got an | > exception and stopped somewhere other than where it placed the | > breakpoint. In this case 0xc0109c02 which you can see right after the | > text "BP mismatch". | | i have obsvered this bug from 2.6.27-rc1 to rc7 | however the 2.6.26 was fine | | | > | > This address is in your stack trace below (read on) | | > | > | >> ------------[ cut here ]------------ | >> WARNING: at drivers/misc/kgdbts.c:302 check_and_rewind_pc+0xb2/0xe0() | >> Modules linked in: | >> Pid: 0, comm: swapper Not tainted 2.6.27-rc6 #16 | >> [<c0125d04>] warn_on_slowpath+0x54/0x70 | >> [<c01260f0>] ? __call_console_drivers+0x60/0x70 | >> [<c013c15b>] ? up+0x2b/0x40 | >> [<c0126634>] ? release_console_sem+0x1a4/0x1c0 | >> [<c01548f7>] ? __rmqueue_smallest+0xb7/0x130 | >> [<c02445e7>] ? __copy_to_user_ll+0x57/0x60 | >> [<c0153dc0>] ? probe_kernel_write+0x20/0x40 | >> [<c02ad250>] ? kgdbts_break_test+0x0/0x30 | >> [<c0126bab>] ? printk+0x1b/0x20 | >> [<c02ad250>] ? kgdbts_break_test+0x0/0x30 | >> [<c02adfb2>] check_and_rewind_pc+0xb2/0xe0 | >> [<c0109c02>] ? mwait_idle+0x32/0x40 | > | > | > So if you were to do: | > gdb ./vmlinux | > i line *0xc0109c02 | > | > It will print ...
[Cyrill Gorcunov - Thu, Sep 25, 2008 at 06:56:35PM +0400] | [Tej - Thu, Sep 25, 2008 at 08:05:23PM +0530] | | On 9/11/08, Jason Wessel <jason.wessel@windriver.com> wrote: | | > Tej wrote: | | >> observed a panic on 2.6.27-rc6 caused by enabling | | >> "CONFIG_KGDB_TESTS_ON_BOOT" option | | >> | | >> panic logged using serial console and .config are attached. | | >> | | >> Linux version 2.6.27-rc6 (root@luser-desktop) (gcc version 4.2.3 | | > (Ubuntu 4.2.3-8 | | > [clip] | | >> kgdb: Registered I/O driver kgdbts. | | >> kgdbts:RUN plant and detach test | | >> kgdbts:RUN sw breakpoint test | | >> kgdbts:RUN bad memory access test | | >> kgdbts:RUN singlestep test 1000 iterations | | >> kgdbts:RUN singlestep [0/1000] | | >> kgdbts:RUN singlestep [100/1000] | | >> kgdbts: BP mismatch c0109c02 expected c02ad250 | | > | | > | | > | | > So if we break this down into what it means, the kgdb test got an | | > exception and stopped somewhere other than where it placed the | | > breakpoint. In this case 0xc0109c02 which you can see right after the | | > text "BP mismatch". | | | | i have obsvered this bug from 2.6.27-rc1 to rc7 | | however the 2.6.26 was fine | | | | | | > | | > This address is in your stack trace below (read on) | | | | > | | > | | >> ------------[ cut here ]------------ | | >> WARNING: at drivers/misc/kgdbts.c:302 check_and_rewind_pc+0xb2/0xe0() | | >> Modules linked in: | | >> Pid: 0, comm: swapper Not tainted 2.6.27-rc6 #16 | | >> [<c0125d04>] warn_on_slowpath+0x54/0x70 | | >> [<c01260f0>] ? __call_console_drivers+0x60/0x70 | | >> [<c013c15b>] ? up+0x2b/0x40 | | >> [<c0126634>] ? release_console_sem+0x1a4/0x1c0 | | >> [<c01548f7>] ? __rmqueue_smallest+0xb7/0x130 | | >> [<c02445e7>] ? __copy_to_user_ll+0x57/0x60 | | >> [<c0153dc0>] ? probe_kernel_write+0x20/0x40 | | >> [<c02ad250>] ? kgdbts_break_test+0x0/0x30 | | >> [<c0126bab>] ? printk+0x1b/0x20 | | >> [<c02ad250>] ? kgdbts_break_test+0x0/0x30 | | >> [<c02adfb2>] ...
[Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530] ... | > | > If assume that git bisect is right - it's hardly possible | | i have bisected it twice n jason mentioned that it looks like some | kind of nmi race condition. | | but i will bisect it again in rc7 and let you know. | | > that proc_nmi_enabled() related changes introduced this | > error (ie this commit) - kgdb seems to be not | > touching /proc/sys/kernel/nmi while doing self-tests. | > | > - Cyrill - | > | There was a race in nmi code indeed which was fixed recently in -tip tree but it was leading to use unitialized vars. So give me some time to digg a bit. I'll do it tomorrow and write you. Thanks a lot for pointing out this problem! - Cyrill - --
[Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530] ... Btw Tej, could you please apply this patch and check if it help. http://lkml.org/lkml/2008/9/22/250 Aristeu Rozanski fixed one race window which take place on unpredicted nmi's. - Cyrill - --
[Cyrill Gorcunov - Thu, Sep 25, 2008 at 11:39:56PM +0400] | [Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530] | ... | | Btw Tej, could you please apply this patch | and check if it help. | | http://lkml.org/lkml/2008/9/22/250 | | Aristeu Rozanski fixed one race window which take | place on unpredicted nmi's. | | - Cyrill - Btw Tej, could you post boot log if kernel were started as root=/dev/sda1 ro debug apic=debug kgdbts=V2 splash console=tty0 console=ttyS0, Ie your original option "quiet" is to be replace with "debug apic=debug kgdbts=V2". Thanks! - Cyrill - --
sorry for this long pause. machine boots fine with the options specified, and logs are collected as follows. some warning related to sysctl table checking, but its related to libvirtd (KVM) at the end. 7b0084f7ffff0000ffff0000#2f Singlestep stopped at IP: c02a5e91 get8: $Z0,c02a5e90,1#3c put8: $OK#9a get9: $c#63 kgdbts: breakpoint complete put9: $T05thread:0000000000000001;#a7 get10: $g#67 put10: $105f6dc08600000000c01001e403000068672ac04c1f84f700000000e8030000915e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#0a Stopped at IP: c02a5e91 get11: $G105f6dc08600000000c01001e4030000000000004c1f84f700000000e8030000905e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#cf put11: $OK#9a get12: $D#44 put12: $OK#9a kgdbts: breakpoint complete get0: $?#3f put0: $S05#b8 get1: $Z0,c02a5e90,1#3c put1: $OK#9a get2: $c#63 put2: $T05thread:0000000000000001;#a7 get3: $g#67 put3: $000000000000000084030000e503000063672ac04c1f84f700000000e8030000915e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#29 Stopped at IP: c02a5e91 get4: $G000000000000000084030000e5030000000000004c1f84f700000000e8030000905e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#f3 put4: $OK#9a get5: $z0,c02a5e90,1#5c put5: $OK#9a get6: $s#73 put6: $T05thread:0000000000000001;#a7 get7: $g#67 put7: $000000000000000084030000e50300004c1f84f74c1f84f700000000e8030000915e2ac04603000060000000680000007b0000007b0084f7ffff0000ffff0000#30 Singlestep stopped at IP: c02a5e91 get8: $Z0,c02a5e90,1#3c put8: $OK#9a get9: $c#63 kgdbts: breakpoint complete put9: $T05thread:0000000000000001;#a7 get10: $g#67 put10: $105f6dc08600000000c01001e503000068672ac04c1f84f700000000e8030000915e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#0b Stopped at IP: c02a5e91 get11: $G105f6dc08600000000c01001e5030000000000004c1f84f700000000e8030000905e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#d0 put11: $OK#9a get12: $D#44 put12: $OK#9a kgdbts: breakpoint complete get0: ...
[Tej - Fri, Oct 03, 2008 at 04:18:55PM +0530] | On 9/26/08, Cyrill Gorcunov <gorcunov@gmail.com> wrote: | > [Cyrill Gorcunov - Thu, Sep 25, 2008 at 11:39:56PM +0400] | > | [Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530] | > | ... | > | | > | Btw Tej, could you please apply this patch | > | and check if it help. | > | | > | http://lkml.org/lkml/2008/9/22/250 | > | | > | Aristeu Rozanski fixed one race window which take | > | place on unpredicted nmi's. | > | | > | - Cyrill - | > | > Btw Tej, could you post boot log if kernel were started as | > | > root=/dev/sda1 ro debug apic=debug kgdbts=V2 splash console=tty0 | > console=ttyS0, | sorry for this long pause. | | machine boots fine with the options specified, and logs are collected | as follows. | some warning related to sysctl table checking, but its related to | libvirtd (KVM) at the end. | ok, lets assume NMI race is fixed - Cyrill - --
