Re: Kernel PANIC with 2.6.27-rc6

Previous thread: bug report by Todor Gyumyushev on Thursday, September 25, 2008 - 7:08 am. (1 message)

Next thread: ext3 read only handling by Pavel Machek on Thursday, September 25, 2008 - 8:03 am. (2 messages)
From: Tej
Date: Thursday, September 25, 2008 - 7:35 am

i have obsvered this  bug from 2.6.27-rc1 to rc7

ok so i did git bisection which pointed:-

commit 3ed3f06295e69700fa808396f7b350bff2b69de0
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date:   Wed Jun 4 01:00:47 2008 +0400

    x86: nmi - consolidate nmi_watchdog_default for 32bit mode

    64bit mode bootstrap code does set nmi_watchdog to NMI_NONE
    by default and doing the same on 32bit mode is safe too.
    Such an action saves us from several #ifdef.



git -bisection log.

git-bisect start
# bad: [6e86841d05f371b5b9b86ce76c02aaee83352298] Linux 2.6.27-rc1
git-bisect bad 6e86841d05f371b5b9b86ce76c02aaee83352298
# good: [bce7f793daec3e65ec5c5705d2457b81fe7b5725] Linux 2.6.26
git-bisect good bce7f793daec3e65ec5c5705d2457b81fe7b5725
# bad: [d20b27478d6ccf7c4c8de4f09db2bdbaec82a6c0] V4L/DVB (8415):
gspca: Infinite loop in i2c_w() of etoms.
git-bisect bad d20b27478d6ccf7c4c8de4f09db2bdbaec82a6c0
# bad: [666484f0250db2e016948d63b3ef33e202e3b8d0] Merge branch
'core/softirq' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git-bisect bad 666484f0250db2e016948d63b3ef33e202e3b8d0
# bad: [d59fdcf2ac501de99c3dfb452af5e254d4342886] Merge commit
'v2.6.26' into x86/core
git-bisect bad d59fdcf2ac501de99c3dfb452af5e254d4342886
# good: [3de352bbd86f890dd0c5e1c09a6a1b0b29e0f8ce] Merge branch
'x86/mpparse' into x86/devel
git-bisect good 3de352bbd86f890dd0c5e1c09a6a1b0b29e0f8ce
# bad: [b4df32f4aeef8794d0135fc8dc250acb44cfee60] x86: fix warning in
e820_reserve_resources with 32bit
git-bisect bad b4df32f4aeef8794d0135fc8dc250acb44cfee60
# bad: [7f0be02c5ed1deb04c54c6a17f412e04f417df11] x86: move
boot_params declaring to setup.c
git-bisect bad 7f0be02c5ed1deb04c54c6a17f412e04f417df11
# bad: [4b62ac9a2b859f932afd5625362c927111b7dd9b] Merge branch
'x86/nmi' into x86/devel
git-bisect bad 4b62ac9a2b859f932afd5625362c927111b7dd9b
# bad: [f781b03c4b1c713ac000877c8bbc31fc4164a29b] x86:
touch_nmi_watchdog(): reset alert counters for supported nmi_watchdog
modes ...
From: Cyrill Gorcunov
Date: Thursday, September 25, 2008 - 7:56 am

[Tej - Thu, Sep 25, 2008 at 08:05:23PM +0530]
| On 9/11/08, Jason Wessel <jason.wessel@windriver.com> wrote:
| > Tej wrote:
| >> observed a panic on 2.6.27-rc6 caused by  enabling
| >> "CONFIG_KGDB_TESTS_ON_BOOT" option
| >>
| >> panic logged using serial console and .config are attached.
| >>
| >> Linux version 2.6.27-rc6 (root@luser-desktop) (gcc version 4.2.3
| > (Ubuntu 4.2.3-8
| > [clip]
| >> kgdb: Registered I/O driver kgdbts.
| >> kgdbts:RUN plant and detach test
| >> kgdbts:RUN sw breakpoint test
| >> kgdbts:RUN bad memory access test
| >> kgdbts:RUN singlestep test 1000 iterations
| >> kgdbts:RUN singlestep [0/1000]
| >> kgdbts:RUN singlestep [100/1000]
| >> kgdbts: BP mismatch c0109c02 expected c02ad250
| >
| >
| >
| > So if we break this down into what it means, the kgdb test got an
| > exception and stopped somewhere other than where it placed the
| > breakpoint.  In this case 0xc0109c02 which you can see right after the
| > text "BP mismatch".
| 
| i have obsvered this  bug from 2.6.27-rc1 to rc7
| however the 2.6.26 was fine
| 
| 
| >
| > This address is in your stack trace below (read on)
| 
| >
| >
| >> ------------[ cut here ]------------
| >> WARNING: at drivers/misc/kgdbts.c:302 check_and_rewind_pc+0xb2/0xe0()
| >> Modules linked in:
| >> Pid: 0, comm: swapper Not tainted 2.6.27-rc6 #16
| >>  [<c0125d04>] warn_on_slowpath+0x54/0x70
| >>  [<c01260f0>] ? __call_console_drivers+0x60/0x70
| >>  [<c013c15b>] ? up+0x2b/0x40
| >>  [<c0126634>] ? release_console_sem+0x1a4/0x1c0
| >>  [<c01548f7>] ? __rmqueue_smallest+0xb7/0x130
| >>  [<c02445e7>] ? __copy_to_user_ll+0x57/0x60
| >>  [<c0153dc0>] ? probe_kernel_write+0x20/0x40
| >>  [<c02ad250>] ? kgdbts_break_test+0x0/0x30
| >>  [<c0126bab>] ? printk+0x1b/0x20
| >>  [<c02ad250>] ? kgdbts_break_test+0x0/0x30
| >>  [<c02adfb2>] check_and_rewind_pc+0xb2/0xe0
| >>  [<c0109c02>] ? mwait_idle+0x32/0x40
| >
| >
| > So if you were to do:
| > gdb ./vmlinux
| > i line *0xc0109c02
| >
| > It will print ...
From: Cyrill Gorcunov
Date: Thursday, September 25, 2008 - 8:47 am

[Cyrill Gorcunov - Thu, Sep 25, 2008 at 06:56:35PM +0400]
| [Tej - Thu, Sep 25, 2008 at 08:05:23PM +0530]
| | On 9/11/08, Jason Wessel <jason.wessel@windriver.com> wrote:
| | > Tej wrote:
| | >> observed a panic on 2.6.27-rc6 caused by  enabling
| | >> "CONFIG_KGDB_TESTS_ON_BOOT" option
| | >>
| | >> panic logged using serial console and .config are attached.
| | >>
| | >> Linux version 2.6.27-rc6 (root@luser-desktop) (gcc version 4.2.3
| | > (Ubuntu 4.2.3-8
| | > [clip]
| | >> kgdb: Registered I/O driver kgdbts.
| | >> kgdbts:RUN plant and detach test
| | >> kgdbts:RUN sw breakpoint test
| | >> kgdbts:RUN bad memory access test
| | >> kgdbts:RUN singlestep test 1000 iterations
| | >> kgdbts:RUN singlestep [0/1000]
| | >> kgdbts:RUN singlestep [100/1000]
| | >> kgdbts: BP mismatch c0109c02 expected c02ad250
| | >
| | >
| | >
| | > So if we break this down into what it means, the kgdb test got an
| | > exception and stopped somewhere other than where it placed the
| | > breakpoint.  In this case 0xc0109c02 which you can see right after the
| | > text "BP mismatch".
| | 
| | i have obsvered this  bug from 2.6.27-rc1 to rc7
| | however the 2.6.26 was fine
| | 
| | 
| | >
| | > This address is in your stack trace below (read on)
| | 
| | >
| | >
| | >> ------------[ cut here ]------------
| | >> WARNING: at drivers/misc/kgdbts.c:302 check_and_rewind_pc+0xb2/0xe0()
| | >> Modules linked in:
| | >> Pid: 0, comm: swapper Not tainted 2.6.27-rc6 #16
| | >>  [<c0125d04>] warn_on_slowpath+0x54/0x70
| | >>  [<c01260f0>] ? __call_console_drivers+0x60/0x70
| | >>  [<c013c15b>] ? up+0x2b/0x40
| | >>  [<c0126634>] ? release_console_sem+0x1a4/0x1c0
| | >>  [<c01548f7>] ? __rmqueue_smallest+0xb7/0x130
| | >>  [<c02445e7>] ? __copy_to_user_ll+0x57/0x60
| | >>  [<c0153dc0>] ? probe_kernel_write+0x20/0x40
| | >>  [<c02ad250>] ? kgdbts_break_test+0x0/0x30
| | >>  [<c0126bab>] ? printk+0x1b/0x20
| | >>  [<c02ad250>] ? kgdbts_break_test+0x0/0x30
| | >>  [<c02adfb2>] ...
From: Tej
Date: Thursday, September 25, 2008 - 11:34 am

i have bisected it twice n jason mentioned that it looks like some
kind of nmi race condition.

--

From: Cyrill Gorcunov
Date: Thursday, September 25, 2008 - 12:27 pm

[Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530]
...
| >
| > If assume that git bisect is right - it's hardly possible
| 
| i have bisected it twice n jason mentioned that it looks like some
| kind of nmi race condition.
| 
| but i will bisect it again in rc7 and let you know.
| 
| > that proc_nmi_enabled() related changes introduced this
| > error (ie this commit) - kgdb seems to be not
| > touching /proc/sys/kernel/nmi while doing self-tests.
| >
| > 		- Cyrill -
| >
| 

There was a race in nmi code indeed which was fixed recently in
-tip tree but it was leading to use unitialized vars. So give
me some time to digg a bit. I'll do it tomorrow and write you.
Thanks a lot for pointing out this problem!

		- Cyrill -
--

From: Cyrill Gorcunov
Date: Thursday, September 25, 2008 - 12:39 pm

[Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530]
...

Btw Tej, could you please apply this patch
and check if it help.

	http://lkml.org/lkml/2008/9/22/250

Aristeu Rozanski fixed one race window which take
place on unpredicted nmi's.

		- Cyrill -
--

From: Cyrill Gorcunov
Date: Friday, September 26, 2008 - 1:50 am

[Cyrill Gorcunov - Thu, Sep 25, 2008 at 11:39:56PM +0400]
| [Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530]
| ...
| 
| Btw Tej, could you please apply this patch
| and check if it help.
| 
| 	http://lkml.org/lkml/2008/9/22/250
| 
| Aristeu Rozanski fixed one race window which take
| place on unpredicted nmi's.
| 
| 		- Cyrill -

Btw Tej, could you post boot log if kernel were started as

	root=/dev/sda1 ro debug apic=debug kgdbts=V2 splash console=tty0 console=ttyS0,

Ie your original option "quiet" is to be replace with "debug apic=debug kgdbts=V2".
Thanks!

		- Cyrill -
--

From: Tej
Date: Friday, October 3, 2008 - 3:48 am

sorry for this long pause.

machine boots fine with the options specified, and logs are collected
as follows.
some warning related to sysctl table checking, but its related to
libvirtd (KVM) at the end.

7b0084f7ffff0000ffff0000#2f
Singlestep stopped at IP: c02a5e91
get8: $Z0,c02a5e90,1#3c
put8: $OK#9a
get9: $c#63
kgdbts: breakpoint complete
put9: $T05thread:0000000000000001;#a7
get10: $g#67
put10: $105f6dc08600000000c01001e403000068672ac04c1f84f700000000e8030000915e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#0a
Stopped at IP: c02a5e91
get11: $G105f6dc08600000000c01001e4030000000000004c1f84f700000000e8030000905e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#cf
put11: $OK#9a
get12: $D#44
put12: $OK#9a
kgdbts: breakpoint complete
get0: $?#3f
put0: $S05#b8
get1: $Z0,c02a5e90,1#3c
put1: $OK#9a
get2: $c#63
put2: $T05thread:0000000000000001;#a7
get3: $g#67
put3: $000000000000000084030000e503000063672ac04c1f84f700000000e8030000915e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#29
Stopped at IP: c02a5e91
get4: $G000000000000000084030000e5030000000000004c1f84f700000000e8030000905e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#f3
put4: $OK#9a
get5: $z0,c02a5e90,1#5c
put5: $OK#9a
get6: $s#73
put6: $T05thread:0000000000000001;#a7
get7: $g#67
put7: $000000000000000084030000e50300004c1f84f74c1f84f700000000e8030000915e2ac04603000060000000680000007b0000007b0084f7ffff0000ffff0000#30
Singlestep stopped at IP: c02a5e91
get8: $Z0,c02a5e90,1#3c
put8: $OK#9a
get9: $c#63
kgdbts: breakpoint complete
put9: $T05thread:0000000000000001;#a7
get10: $g#67
put10: $105f6dc08600000000c01001e503000068672ac04c1f84f700000000e8030000915e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#0b
Stopped at IP: c02a5e91
get11: $G105f6dc08600000000c01001e5030000000000004c1f84f700000000e8030000905e2ac04602000060000000680000007b0084f77b0011c0ffff0000ffff0000#d0
put11: $OK#9a
get12: $D#44
put12: $OK#9a
kgdbts: breakpoint complete
get0: ...
From: Cyrill Gorcunov
Date: Friday, October 3, 2008 - 7:01 am

[Tej - Fri, Oct 03, 2008 at 04:18:55PM +0530]
| On 9/26/08, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
| > [Cyrill Gorcunov - Thu, Sep 25, 2008 at 11:39:56PM +0400]
| > | [Tej - Fri, Sep 26, 2008 at 12:04:30AM +0530]
| > | ...
| > |
| > | Btw Tej, could you please apply this patch
| > | and check if it help.
| > |
| > | 	http://lkml.org/lkml/2008/9/22/250
| > |
| > | Aristeu Rozanski fixed one race window which take
| > | place on unpredicted nmi's.
| > |
| > | 		- Cyrill -
| >
| > Btw Tej, could you post boot log if kernel were started as
| >
| > 	root=/dev/sda1 ro debug apic=debug kgdbts=V2 splash console=tty0
| > console=ttyS0,
| sorry for this long pause.
| 
| machine boots fine with the options specified, and logs are collected
| as follows.
| some warning related to sysctl table checking, but its related to
| libvirtd (KVM) at the end.
|

ok, lets assume NMI race is fixed

		- Cyrill -
--

Previous thread: bug report by Todor Gyumyushev on Thursday, September 25, 2008 - 7:08 am. (1 message)

Next thread: ext3 read only handling by Pavel Machek on Thursday, September 25, 2008 - 8:03 am. (2 messages)