Re: getting false SIGTRAP breakpoints in kernel i.e. kernel hung unless gdb remotely attached on x86 & cont is issued

From: Denis Joseph Barrow
Date: Wednesday, September 17, 2008 - 6:47 am

Hi ladies/gentlemen,
The kernel I'm running gdb with is 2.6.27-rc4
The false SIGTRAP is occurring in ia32_sysenter_target in arch/x86/kernel/entry_32.S:303
when gdb is stepped from the user process as described below

To reproduce
compile kernel with kgdb support 
compile my randsleep program attached using the .mk script
as root
attach randsleep to an idle serial port e.g. /dev/ttyS0 by typing
randsleep /dev/ttyS0
from another bash shell type
ps -aux | grep randsleep

gdb ./randsleep
attach <pid of randsleep>

You should get messages from gdb like
Attaching to program: /home/djbarrow/devel2/randsleep/randsleep, process 6397
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0xb7fda430 in __kernel_vsyscall ()

Now type step.

The machine is now hung until gdb attaches remotely.


-- 
best regards,
D.J. Barrow

Your example does not indicate how or why you set up kgdb.  kgdb can be
compiled into the kernel, but it should not have any effect whatsoever
unless it is configured for use because it will not register to receive
any of the breakpoint or single stepping traps.

Perhaps there is more to the description of your problem?


--

From: Denis Joseph Barrow
Date: Wednesday, September 17, 2008 - 7:20 am

Hi Jason,
The problem, I believe, is very reproducible.
I'm doing nothing special with kgdb, just using it to help me with 
3g modem driver development & my driver wasn't loaded when the problem occurred.
I have the following command in my /boot/grub/menu.lst kernel parameter to enable gdb.
kgdboc=/dev/ttyS0,115200 maxcpus=1
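
To check whether kgdb is actually configured on a given boot, one option is to look for the kgdboc= parameter in the kernel command line. The following is only a minimal sketch; the helper name find_kgdboc is made up for illustration and is not part of kgdb or the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the first "kgdboc=" parameter
 * in a kernel command line into out (at most outlen - 1 bytes).
 * Returns 1 if the parameter is present, 0 otherwise. */
int find_kgdboc(const char *cmdline, char *out, size_t outlen)
{
    static const char key[] = "kgdboc=";
    const size_t keylen = sizeof(key) - 1;
    const char *p = cmdline;

    if (outlen == 0)
        return 0;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the match only at the start of a parameter. */
        if (p == cmdline || p[-1] == ' ') {
            /* The value runs up to the next space or newline. */
            size_t n = strcspn(p + keylen, " \n");
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, p + keylen, n);
            out[n] = '\0';
            return 1;
        }
        p += keylen;
    }
    return 0;
}
```

On a running system the string to pass in could be read from /proc/cmdline. With the parameters quoted above, the value copied out would be "/dev/ttyS0,115200".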

When I follow the steps mentioned from a console, I get a message saying the kernel is waiting for gdb to attach.
I'm familiar with kgdb, have been using it for years & know enough to be sure this is undesired behaviour.



-- 
best regards,
D.J. Barrow
--


It can be reproduced quite easily as it is a generic problem that


This was the key detail that was missing.  Along with the program and
other gdb details provided the source of the problem was not too hard
to track down.

When you attach to the running program with ptrace (via gdb), it
interrupts the system call and executing the high level "step" will
result in gdb executing a number of instruction step operations to try
to get back to an instruction which corresponds to the next valid line
of high level source code.

It was the 3rd or 4th instruction step that jumped back into the
kernel space because gdb ultimately tries to single step a system call
in your example.  For the kernel, single stepping a system call is a
special operation in that the system call must appear to complete
atomically and the user space ends up on the next user space assembly
instruction after the system call.  Behind the scenes the kernel
executes the system call and tracks this condition.
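
The instruction stepping described above can be reproduced without gdb. The following is a minimal sketch, assuming Linux on x86, that forks a child, single-steps it with PTRACE_SINGLESTEP until it exits, and counts the stops. The function name count_single_steps is made up for illustration; this is not how gdb is implemented and it is not the kgdb patch.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: single-step a forked child until it exits.
 * Returns the number of single-step stops seen, or -1 if tracing
 * could not be set up or a ptrace request failed. */
long count_single_steps(long max_steps)
{
    int status;
    long steps = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent takes over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        (void)getppid();        /* a system call that gets stepped over */
        _exit(0);
    }

    /* Parent: wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;              /* child exited: tracing is unavailable */

    while (steps < max_steps) {
        /* Run exactly one instruction (or one whole system call). */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0 ||
            waitpid(pid, &status, 0) < 0) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return steps;       /* child finished */
        steps++;                /* child stopped after one step */
    }

    /* Step limit reached: clean up the child. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return steps;
}
```

From user space each system call the child makes shows up as a single stop, with the next stop reported at the following user space instruction, which is the behaviour described above.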

It appears kgdb needs to account for this condition as well, by simply
ignoring it when it occurs.

Please try the attached patch, as it will hopefully address the
problem.

Jason.

Hi Jason,
Sorry for nitpicking & a big thanks for your patch.
While this patch stops the big problem, the kernel halting, gdb
debugging of the userland code still doesn't behave correctly
now. Trying to stepi over a sysenter call in gdb doesn't return
to the gdb debugger; ctrl-c in the debugger still works, however.
Some code probably also needs to be fixed in arch/x86/kernel/ptrace.c
or ideally the generic kernel/ptrace.c. Seeing as this works
with gdb on a normal kernel, it's not a gdb issue, even if
it can be kludge-fixed there.
I'm running GNU gdb 6.8-debian from ubuntu 8.04 hardy heron






-- 
best regards,
D.J. Barrow
--


The patch I sent is not yet included in the kgdb stream because I was
waiting for further comment.  I do not see any issues however in the
case that you describe, so I will describe how I tested it and then
perhaps you can explain further the case that does not work.

Given that I still had your test program, I used it to set a
breakpoint at read(), after attaching to the running process as you
described previously.  At this point kgdboc is loaded and configured.

gdb commands:

att PID_OF_randsleep
break read
continue

At this point, to hit the breakpoint, I had to type some input on the
ttyS0 to which randsleep was connected.  At that point I
was able to step the system call with enough "si" commands.  In the
example shown below I did have to provide some more input after the
"si" for address 0xffffe419 so that the system call would come out of
the sleep state because it happened to be a blocking read.
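
Whether a read() would sleep like this can be checked beforehand with poll(). The following is a minimal sketch; the helper name input_ready is made up for illustration and is not taken from randsleep.

```c
#include <poll.h>

/* Hypothetical helper: return 1 if fd has data (or end-of-file) to read
 * within timeout_ms milliseconds, 0 if a read() would still block,
 * and -1 if poll() itself failed. */
int input_ready(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & (POLLIN | POLLHUP))) ? 1 : 0;
}
```

Called with a timeout of 0 on the descriptor for the serial port, a result of 0 would mean that stepping over the sysenter will not return until more input arrives.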

Example:
(gdb) continue
Continuing.

Breakpoint 1, 0x08056b30 in read ()
(gdb) si
0x08056b38 in read ()
(gdb)
0x08056b3a in __read_nocancel ()
(gdb)
0x08056b3b in __read_nocancel ()
(gdb)
0x08056b3f in __read_nocancel ()
(gdb)
0x08056b43 in __read_nocancel ()
(gdb)
0x08056b47 in __read_nocancel ()
(gdb)
0x08056b4c in __read_nocancel ()
(gdb)
0xffffe414 in __kernel_vsyscall ()
(gdb) disas $pc $pc+8
Dump of assembler code from 0xffffe414 to 0xffffe41c:
0xffffe414 <__kernel_vsyscall+0>:       push   %ecx
0xffffe415 <__kernel_vsyscall+1>:       push   %edx
0xffffe416 <__kernel_vsyscall+2>:       push   %ebp
0xffffe417 <__kernel_vsyscall+3>:       mov    %esp,%ebp
0xffffe419 <__kernel_vsyscall+5>:       sysenter
0xffffe41b <__kernel_vsyscall+7>:       nop  
End of assembler dump.
(gdb) si
0xffffe415 in __kernel_vsyscall ()
(gdb)
0xffffe416 in __kernel_vsyscall ()
(gdb)
0xffffe417 in __kernel_vsyscall ()
(gdb)
0xffffe419 in __kernel_vsyscall ()
(gdb)
0xffffe424 in __kernel_vsyscall ()
(gdb)


At least this case ...

Sorry sorry Jason,
Better wipe the cobwebs gathering in my brain.
I just realised that your program is behaving perfectly.
The code I was testing will suspend in the read syscall forever
because it has no bytes to read.
I have now also tested it with code that doesn't suspend & it works
perfectly.

The patch gets my full blessing even if my blessing is unimportant.



-- 
best regards,
D.J. Barrow
--


Full regression testing has turned up another problem.  If you single
step a system call with kgdb which is the same system call that ptrace
is stepping, kgdb silently fails.

The complete fix to both problems is implemented in this patch, and it
has been regression tested across all the architectures.

This patch will be queued to the 2.6.27 fixes.

Jason.

From: Denis Joseph Barrow
Date: Wednesday, September 17, 2008 - 8:40 am

Hi Jason,
Just to be complete the kernel config might be useful in reproducing the bug.
I'm running Ubuntu 8.04 - the Hardy Heron - released in April 2008.
				


-- 
best regards,
D.J. Barrow