Hi ladies/gentlemen,

The kernel I'm running gdb with is 2.6.27-rc4.

The false SIGTRAP is occurring in ia32_sysenter_target in
arch/x86/kernel/entry_32.S:303 when gdb is stepped from the user
process as described below.

To reproduce:
compile the kernel with kgdb support
compile my randsleep program (attached) using the .mk script
as root, attach randsleep to an idle serial port, e.g. /dev/ttyS0, by typing
    randsleep /dev/ttyS0
from another bash shell type
    ps -aux | grep randsleep
    gdb ./randsleep
    attach <pid of randsleep>

You should get messages from gdb like
    Attaching to program: /home/djbarrow/devel2/randsleep/randsleep, process 6397
    Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
    Loaded symbols for /lib/tls/i686/cmov/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7fda430 in __kernel_vsyscall ()

Now type step.
The machine is now hung until gdb attaches remotely.

--
best regards,
D.J. Barrow
Your example does not indicate how or why you set up kgdb.

kgdb can be compiled into the kernel, but it should not have any
effect whatsoever unless it is configured for use, because until then
it does not register to receive any of the breakpoint or single-stepping
traps.

Perhaps there is more to the description of your problem?
Hi Jason,

The problem, I believe, is very reproducible.
I'm doing nothing special with kgdb, just using it to help me with
3g modem driver development, and my driver wasn't loaded when the
problem occurred.

I have the following kernel parameters in my /boot/grub/menu.lst to
enable gdb:
    kgdboc=/dev/ttyS0,115200 maxcpus=1

When I do the steps mentioned while in a console, I get a message
saying the kernel is waiting for gdb to attach.

I'm familiar with kgdb, have been using it for years, and know enough
to be sure this is undesired behaviour.

--
best regards,
D.J. Barrow
> It can be reproduced quite easily as it is a generic problem that

This was the key detail that was missing. Along with the program and
the other gdb details provided, the source of the problem was not too
hard to track down.

When you attach to the running program with ptrace (via gdb), it
interrupts the system call. Executing the high-level "step" then makes
gdb issue a number of instruction-step operations to try to get back
to an instruction which corresponds to the next valid line of
high-level source code. It was the 3rd or 4th instruction step that
jumped back into kernel space, because in your example gdb ultimately
tries to single step a system call.

For the kernel, single stepping a system call is a special operation:
the system call must appear to complete atomically, and user space
ends up on the next user-space assembly instruction after the system
call. Behind the scenes the kernel executes the system call and tracks
this condition.

It appears kgdb needs to account for this condition as well, by simply
ignoring it when it occurs.

Please try the attached patch, as it will hopefully address the problem.

Jason.
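The "single step over a system call" behaviour described above can be observed from user space with plain ptrace(2). What follows is only a minimal sketch, not code from this thread: count_single_steps() is a hypothetical helper name, and the child's exit code of 7 is an arbitrary choice for illustration. The parent steps a forked child one instruction at a time. Each system call the child makes (here, at least the final exit) is consumed by a single PTRACE_SINGLESTEP request, so the tracer only ever sees the child stopped at a user-space instruction.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, single-step it until it exits, and
 * return the number of single-step stops observed (or -1 on error).
 * *exit_code receives the child's exit status on success.
 */
long count_single_steps(int *exit_code)
{
    const long max_steps = 50000000L;   /* safety cap, arbitrary */
    long steps = 0;
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(7);                        /* ends in a real system call */
    }

    /* Parent: wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    while (steps < max_steps) {
        /* Resume for exactly one instruction; signal 0 = deliver nothing. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
            break;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status)) {
            *exit_code = WEXITSTATUS(status);
            return steps;
        }
        if (WIFSIGNALED(status))
            return -1;
        steps++;                         /* stopped again in user space */
    }

    /* Error or cap reached: do not leave a stopped child behind. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -1;
}
```

Calling count_single_steps() from a small main() and printing the result shows how many user-space instructions were stepped. The exact number depends on the C library and the architecture.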
Hi Jason,

Sorry for nitpicking, and a big thanks for your patch.

While this patch stops the big problem, the kernel halting, gdb
debugging of the userland code still doesn't behave correctly.
Trying to stepi over a sysenter call in gdb doesn't return to the
gdb prompt. Ctrl-C in the debugger still works, however.

Some code probably also needs to be fixed in arch/x86/kernel/ptrace.c
or, ideally, the generic kernel/ptrace.c. Seeing as this works with gdb
on a normal kernel, it's not a gdb issue, even if it can be kludge-fixed
there.

I'm running GNU gdb 6.8-debian from Ubuntu 8.04 Hardy Heron.

--
best regards,
D.J. Barrow
The patch I sent is not yet included in the kgdb stream because I was
waiting for further comment.

I do not see any issues, however, in the case that you describe, so I
will describe how I tested it and then perhaps you can explain further
the case that does not work.

Given that I still had your test program, I used it to set a
breakpoint at read(), after attaching to the running process as you
described previously. At this point kgdboc is loaded and configured.

gdb commands:
    att PID_OF_randsleep
    break read
    continue

At this point, to hit the breakpoint, I had to type some input on the
ttyS0 where randsleep was connected. I was then able to step the
system call with enough "si" commands. In the example shown below I
did have to provide some more input after the "si" for address
0xffffe419 so that the system call would come out of the sleep state,
because it happened to be a blocking read.

Example:

(gdb) continue
Continuing.

Breakpoint 1, 0x08056b30 in read ()
(gdb) si
0x08056b38 in read ()
(gdb)
0x08056b3a in __read_nocancel ()
(gdb)
0x08056b3b in __read_nocancel ()
(gdb)
0x08056b3f in __read_nocancel ()
(gdb)
0x08056b43 in __read_nocancel ()
(gdb)
0x08056b47 in __read_nocancel ()
(gdb)
0x08056b4c in __read_nocancel ()
(gdb)
0xffffe414 in __kernel_vsyscall ()
(gdb) disas $pc $pc+8
Dump of assembler code from 0xffffe414 to 0xffffe41c:
0xffffe414 <__kernel_vsyscall+0>:       push   %ecx
0xffffe415 <__kernel_vsyscall+1>:       push   %edx
0xffffe416 <__kernel_vsyscall+2>:       push   %ebp
0xffffe417 <__kernel_vsyscall+3>:       mov    %esp,%ebp
0xffffe419 <__kernel_vsyscall+5>:       sysenter
0xffffe41b <__kernel_vsyscall+7>:       nop
End of assembler dump.
(gdb) si
0xffffe415 in __kernel_vsyscall ()
(gdb)
0xffffe416 in __kernel_vsyscall ()
(gdb)
0xffffe417 in __kernel_vsyscall ()
(gdb)
0xffffe419 in __kernel_vsyscall ()
(gdb)
0xffffe424 in __kernel_vsyscall ()
(gdb)

At least this case ...
Sorry sorry Jason,

Better wipe the cobwebs gathering in my brain.
I just realised that your program is behaving perfectly.
The code I was testing will suspend in the read syscall forever because
it has no bytes to read. I have now also tested it with code that
doesn't suspend, and it works perfectly.

The patch gets my full blessing, even if my blessing is unimportant.

--
best regards,
D.J. Barrow
Full regression testing has turned up another problem. If you single
step a system call with kgdb, and it is the same system call that
ptrace is stepping, kgdb silently fails.

The complete fix to both problems is implemented in this patch, and it
has been regression tested across all the architectures.

This patch will be queued to the 2.6.27 fixes.

Jason.
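The two failure modes discussed in this thread can be summarised as a small decision table. The sketch below is a user-space model only, not the patch itself (which is not reproduced here). The type, field, and function names are all hypothetical, and the handling of a trap that neither side requested is an assumption. It captures the ordering implied above: a step that kgdb itself requested must reach kgdb even when ptrace is stepping the same system call, while a step that only ptrace requested must be left alone.

```c
#include <stdbool.h>

/* Hypothetical outcome of a debug (single-step) trap taken in the kernel. */
enum trap_action {
    TRAP_LEAVE_TO_PTRACE,   /* kgdb ignores it; normal ptrace handling runs */
    TRAP_ENTER_KGDB         /* kgdb stops and reports to the remote gdb */
};

/* Hypothetical snapshot of who asked for the single step. */
struct debug_trap {
    bool kgdb_stepping;     /* kgdb itself requested this single step */
    bool ptrace_stepping;   /* the current task is being stepped via ptrace */
};

enum trap_action classify_debug_trap(const struct debug_trap *t)
{
    /* Second problem: kgdb's own step must win, even if ptrace is
     * stepping the same system call. */
    if (t->kgdb_stepping)
        return TRAP_ENTER_KGDB;

    /* First problem: a user-space debugger stepping over a system call
     * is not kgdb's business. */
    if (t->ptrace_stepping)
        return TRAP_LEAVE_TO_PTRACE;

    /* Assumption: any other kernel debug trap belongs to kgdb. */
    return TRAP_ENTER_KGDB;
}
```

In this model the original hang corresponds to returning TRAP_ENTER_KGDB for a trap where only ptrace_stepping is set.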
Hi Jason, Just to be complete the kernel config might be useful in reproducing the bug. I'm running Ubuntu 8.04 - the Hardy Heron - released in April 2008. -- best regards, D.J. Barrow
