I sent Jan & Jason earlier a quick review of the kgdb code currently in git-x86#mm. The kgdb code in git-x86#mm is right now is totally broken of course because the CFI annotations in assembler code are gone now, but they are needed for gdb use. And the bogus fault handling code is still in there, but Jason apprently fixed that one. Here are the other commments I wrote just in case someone is interested. Overall I would say there is quite a lot of stuff to clean up still. <snip> ok, doing a review on the code in git-x86#mm. I have not read everything in detail, just a quick skip over. My personal test for clean kernel debugger integration is if the debugger could be made a loadable (but not unloadable) module with minor/no effort. How far away is your code from that? The 32bit in_exception_stack code looks weird. Are you sure you cannot just use the pt_regs passed to the die notifier? It might be safer to explicitely disable interrupts early in the notifier. You should probably use simple_strtoul() instead of inventing an own hex parser in kgdb.c. And sprintf instead of an own hex writer. In general more use sprintf would probably shorten a lot of the parser code. The num_online_cpu()s check in getthread() looks weird. What is that supposed to do? kgdb_softlock_notify: I think the right way to handle this is just calling touch_softlockup_watchdog() (or perhaps better touch_nmi_watchdog) Actually it should be only needed once when you exit the debugger for soft lockup, but regularly for nmi watchdog. On x86 it is ok, but on other architectures i'm suspect the SMP synchronization with atomics might benefit from more memory barriers. gdb_cmd_reboot: always calling emergency_sync looks dangerous. In fact i suspect if you hold the other CPUs the changes are good it will lock up. You should at least offer a option to not call that (or better don't do it by default). Even machine_start is likely lock up actually if the other CPUs are truly halted (as th...
.. Speaking of which.. the kernel implementation of snprintf() seems to have a bug somewhere, in that it returns an incorrect count in some situations -- mostly around where the buffer is too small to hold the data being written. There's an off-by-one bug there somewhere, but I have not had time yet to track it down more precisely. Cheers --
This loss of CFI data is unfortunate for other potential consumers too, such as systemtap, which intends to supplement its current stack backtracing heuristics with CFI data. - FChE --
yes, but we removed those patches meanwhile because Roland McGrath has
agreed to clean up the CFI mess. (Thanks Roland! We were very close to
nuking them altogether because the unreadability they introduced was
causing real bugs.)
[ btw., Andi's characterisation of it being "totally broken" is
incorrect and alarmist. It only means that the "backtrace" command
will not work in kgdb in certain types of contexts (only the current
function is displayed, not the parent function(s)), but there's no
"breakage", just less information available to the debugger. ]
in any case, while there are still holes in the frame annotations and it
all needs a good bit of cleanup, that is a largely separate issue from
whether kgdb is merge-worthy or not.
I'd like to thank Andi for his kgdb review efforts and comments.
Unfortunately they were not done against kgdb-light which is being
offered for upstream merge, but against an older version of kgdb. (Sorry
about the 12 hour delay in my reply - Andi's mail did not have any of
the kgdb or x86 persons Cc:-ed so it was not immediately noticed on a
busy Monday, only in the lower-prio lkml scans.)
after a quick glance i'm happy to conclude that all but one of Andi's
review comments were already addressed in kgdb-light -v7 AFAICS, and the
only one not addressed (emergency_resync()) is now addressed in -v8.
the -v8 kgdb-light tree (against v2.6.25-rc1) can be pulled from:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git
shortlog, diffstat and patch attached below. Tip of the -v8 tree is
commit f20d70a6dd5785c9dd4c8215daf21f34365ce1c7.
In any case, if there are any open issues we are very much ready and
willing to address them now or after any potential upstream merge of
this codebase. This has meanwhile become one of the best-reviewed pieces
of kernel code in living memory ;-)
Ingo
------------------>
Ingo Molnar (3):
pids: add pid_max prototype
uaccess: add probe_kernel......dst and src are twisted. Fix below (should make sw-breakpoints work
again).
Jan
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index dce89d1..10fe113 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -153,14 +153,14 @@ int __weak kgdb_validate_break_address(unsigned long addr)
{
char tmp_variable[BREAK_INSTR_SIZE];
- return probe_kernel_read((char *)addr, tmp_variable, BREAK_INSTR_SIZE);
+ return probe_kernel_read(tmp_variable, (char *)addr, BREAK_INSTR_SIZE);
}
int __weak kgdb_arch_set_breakpoint(unsigned long addr, char *saved_instr)
{
int err;
- err = probe_kernel_read((char *)addr, saved_instr, BREAK_INSTR_SIZE);
+ err = probe_kernel_read(saved_instr, (char *)addr, BREAK_INSTR_SIZE);
if (err)
return err;
--thanks, applied :-) Merged, uploaded the updated tree to the usual place. New tip of head is 59b64c80d706ae3504ccc76a9c00cdfea4a10701. Ingo --
> after a quick glance i'm happy to conclude that all but one of Andi's At least the weird in_interrupt_stack() code seems to be still in there. I also still see a lot of open coded hex manipulation. The synchronization code looks as bad as it was before. The getthread magic pids are also still in there. What is that good for? The notifiers are also still run with interrupts enabled, so could in theory be preempted (only in some very exotic cases though) The cpu hotplug races are still there I think I have not rechecked everything else. -Andi --
i've uploaded kgdb-light -v9, it can be pulled from:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git
find the shortlog, diffstat, full patch below.
There are a number of changes in -v9 - a patch from Jason to clean up
kernel-messages-over-the-kgdb-console behavior and a bugfix from Jan.
fixed. (That code is gone now - the pt_regs passed in to the notifier
points to the right kind of register context already, so the whole
shadow task business on 64-bit was an unnecessary complication - as you
suspected it. With it went another bunch of kernel/kgdb.c complication
i reworked and cleaned up all the kgdb locking code completely. No more
NR_CPUS spinlocks, there's now proper use of barriers, etc. No more
silly "timeouts" for locking either ... There's now a very well-defined
and clean locking state-machine for the primary CPU and the secondary
previously Mark indicated some sort of sprintf return value breakage he
observed, and kgdb would rely on sprintf return values so i'm inclined
i fixed them too.
thanks for your suggestions, anything else you can think of?
Ingo
------------------>
Ingo Molnar (3):
pids: add pid_max prototype
uaccess: add probe_kernel_write()
x86: kgdb support
Jason Wessel (3):
kgdb: core
consoles: polling support, kgdboc
kgdb: document parameters
Documentation/kernel-parameters.txt | 5 +
arch/x86/Kconfig | 1 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/kgdb.c | 555 ++++++++++++
drivers/char/tty_io.c | 47 +
drivers/serial/8250.c | 58 ++
drivers/serial/Kconfig | 3 +
drivers/serial/Makefile | 1 +
drivers/serial/kgdboc.c | 163 ++++
drivers/serial/serial_core.c | 70 ++-
include/asm-x86/kgdb.h | 81 ++
include/linux/kgdb.h | 271 ++++++
include/linux/pid.h | ...First nobody answered the "kgdb clean enough for a module"
A timeout for waiting for other CPUs is actually not a bad idea for a
debugger. After all you still want to debug even if some other CPUs
Pretty much all the proc output and sysfs show functions rely on these return
The 64bit gdb actually supports segment registers:
static int amd64_linux_gregset64_reg_offset[] =
{
RAX * 8, RBX * 8, /* %rax, %rbx */
RCX * 8, RDX * 8, /* %rcx, %rdx */
RSI * 8, RDI * 8, /* %rsi, %rdi */
RBP * 8, RSP * 8, /* %rbp, %rsp */
R8 * 8, R9 * 8, /* %r8 ... */
R10 * 8, R11 * 8,
R12 * 8, R13 * 8,
R14 * 8, R15 * 8, /* ... %r15 */
RIP * 8, EFLAGS * 8, /* %rip, %eflags */
CS * 8, SS * 8, /* %cs, %ss */
DS * 8, ES * 8, /* %ds, %es */
FS * 8, GS * 8, /* %fs, %gs */
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1,
ORIG_RAX * 8
You silently overwrite any user ptrace hw breakpoints right? To do it cleanly
All the kerneldoc comments are useless if you don't add the file
I don't think that case should happen during roundup for once. If it happens
something is wrong.
That seems weird. I think other parts try to support user mode debugging
too. In theory there is no reason it shouldn't be able to do this
Ok I have not checked, but I hope you have strong protections against
reentry of the debugger just in case an exception here falls down to
That wrapper should not be needed; everybody can use instruction_pointer()
Hmm, so idle thread numbers depend on a sysctl? That seems weird.
It's still likely to deadlock on MP I think
[...] Haven't read further.
-Andi
--this is kgdb-light, version -v10 (against Linus-latest), and can be pulled from: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git shortlog, diffstat and full patch can be found further below. in -v10 i have depleted Andi Kleen's (known ;) pool of last-minute correct (just like SysRq-B can hang) - in -v10 i've fixed this by it's a sensible check, it validates a remote-provided "tid" against the range we accept. (It's a bit superfluous as find_task_by_pid() is safe enough.) yes, i thought about that too yesterday, but was unsure how the rest of the code would code with that. It's no big issue and i have no strong feelings either way. GDB wants to track threads, but on the Linux side each idle task has PID GDB simply needs an ID it can work with - there's nothing weird about as Roland explained it, the simplicity and of this code intentional. That's wrong: in-source documentation is never useless. If it happens to meet DocBook style that's an added bonus. If the DocBook template is no, not all architectures have it. This is a weak alias that is The currently submitted code does not try to support user mode debugging. It might be a nice add-on later, if done cleanly and yes, this means if the user has configured kcrash as well then that will i disagree that a kernel debugger should necessarily be a kernel module. It might be nice at a future stage, but right now it would just i went for correctness and simplicity first. If a system is hung, the debugging CPU might hang too at any time. A timeout on the other hand introduces the possibility of a 'dead' CPU just coming back to life after the 'timeout', corrupting debugger data. So for now the rule is very simple. you are wrong, as procfs does not rely on it for a protocol stack. It relies on it mostly for user-space "this many bytes were processed" return values and those values are often wrong and user-space just handles it gratiously. Anyway,...
Hi, I am just trying to play with this, I am not able to make it work. Assume I am able with git, I use it every day on my software, then I am correctly tracking and building your branch on git.kernel.org. My config has both KGDB and KGDB_SERIAL_CONSOLE. kernel command line has kgdbwait kgdboc=ttyS0,115200. linux console is not on serial but if do not enable KGDB_SERIAL_CONSOLE I am then unable to hit SysRq+g. If instead I use linux serial console, gdb messes with getty and things quickly get worse. 1. at boot, kernel waits for a gdb connection 2. gdb connection is established from another box 3. gdb breaks in 4. hit continue at gdb prompt 5. target box returns to life. everything looks ok (it is 2.6.25-rc1!) 6. SysRq+h on the target correctly shows the help on the console 7. SysRq+g on the target, it prints "SysRq : GDB" and hangs forever 8. gdb on the host does not break in is really that simple, am i overlooking anything? thank you. regards, Domenico # # Automatically generated make config: don't edit # Linux kernel version: 2.6.25-rc1 # Mon Feb 11 11:34:22 2008 # CONFIG_PARISC=y CONFIG_MMU=y CONFIG_STACK_GROWSUP=y CONFIG_RWSEM_GENERIC_SPINLOCK=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_GENERIC_FIND_NEXT_BIT=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_TIME=y CONFIG_TIME_LOW_RES=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_IRQ_PER_CPU=y CONFIG_ARCH_SUPPORTS_AOUT=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y # CONFIG_IKCONFIG_PROC is not set CONFIG_LOG_BUF_SHIFT=15 # CONFIG_CGROUPS is ...
I doubt the .config you posted was the one you used, because there were no KGDB options specified at all. If you want to be using your vga device as the console that is ok. You can still activate the sysrq with running the command: echo g > /proc/sysrq-trigger You might consider doing "set debug remote 1" in the gdb session if you want to see what is going on. After the continue, the next time you break in with the debugger it should send $T.........etc which is a stop packet. Jason. --
indeed it was of a parisc.. :) here is the right one: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.25-rc1 # Tue Feb 12 13:11:26 2008 # # CONFIG_64BIT is not set CONFIG_X86_32=y # CONFIG_X86_64 is not set CONFIG_X86=y # CONFIG_GENERIC_LOCKBREAK is not set CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_FAST_CMPXCHG_LOCAL=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y # CONFIG_GENERIC_GPIO is not set CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y # CONFIG_GENERIC_TIME_VSYSCALL is not set CONFIG_ARCH_HAS_CPU_RELAX=y # CONFIG_HAVE_SETUP_PER_CPU_AREA is not set CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y # CONFIG_ZONE_DMA32 is not set CONFIG_ARCH_POPULATES_NODE_MAP=y # CONFIG_AUDIT_ARCH is not set CONFIG_ARCH_SUPPORTS_AOUT=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_X86_BIOS_REBOOT=y CONFIG_KTIME_SCALAR=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y # CONFIG_IKCONFIG_PROC is not set CONFIG_LOG_BUF_SHIFT=14 # CONFIG_CGROUPS is not set # CONFIG_FAIR_GROUP_SCHED is not se...
Can't be very many because oprofile needs it and it works on most archs now. Anyways, the right thing is to just add it to the architectures Hmm, I'm pretty sure I saw some code that was only useful for the user case. Perhaps you should take all that out then if it doesn't work anyways? Seems wrong intended behaviour to me. If kgdb is active it should have priority The question is less about actually having it as a module, but just if the interfaces are clean enough to allow it as a module. If all code is correct, it likely won't need a debugger. Ok it just makes kgdb useless for a wide range of kernel problems. If the return value of read() is wrong not even cat(1) will work correctly, but lose bytes. You're saying cat is not supposed to work on /proc? That's certainly Sure a lot of ugly code works in practice. If it's clean and maintaintable code is another question of course. -Andi --
i'm glad that there are no other valid-looking (to me) objections left against kgdb-light -v10, other than "it would be nice to have more kgdb (It seems we do have a fundamental disagreement about how kgdb should act: in my opinion it should never break a correct system, while in your opinion apparently it's OK to compromise on that to make debugging easier. (find below more detailed arguments about this.) If you do not change your position about that issue we'll have to agree to disagree about that point.) So unless i forgot about something (please yell if so), it seems to me kgdb is now pretty ready for an upstream merge. here are my replies to the most recent points you brought up [in order of how i perceive the topic's importance]: i gave you very specific technological reasons for why we dont want to do spinning for now: we dont _ever_ want to break a correctly working system with kgdb. A valid counter-argument is _not_ to argue "but it would be nice to have if the system is broken in X, Y and Z ways" (like you did), but to point it out why the behavior we chose is wrong on a correctly working system. Yes, a buggy system might misbehave in various ways but my primary interest is in keeping correctly working systems correct. And note that kgdb is not just a "debugger", it's a system inspection tool. An intelligent, human-controlled printk. A kernel internals learning tool. An extension to the kernel console concept. Yes, people frequently use it for debugging too, but the other uses are actually but your contention is simply wrong. Most of our debugging infrastructure is non-modular for a good reason. Modularization increases complexity and that's exactly the wrong direction for it's not reimplemented - kgdb_arch_pc() does not directly map to that's already all taken out in -v10. (if you find any leftovers then that's the approach we are taking: be as unintrusive as possible. This means that the notifier here is registered at th...
This is not a technical argument, but I am not a big fan of hard hanging the system if you cannot sync all the CPUs. The original intent was to at least provide a sync error message to the end user after some reasonable time. Then allow someone to collect any data you can get and you basically have to reboot. The reboot was never forced, but assumed the end users of this knew what they were doing in the first place. Certainly in a completely working system where you use kgdb only for inspection this is not an issue, unless you use a breakpoint or single step one of the smp_call functions. As we all know there are lots of We might be best served to add a comment to explain the purpose of kgdb_arch_pc() and put it in the optional implementation function headers in include/linux/kgdb.h On some archs certain exceptions do not report the address that the exception occurred at when you call instruction_pointer(). This optional function allows for an arch to perform a "fixup" to get the address the exception actually occurred at. Kgdb requires the actual exception address so a sanity check can be performed to make sure kgdb did not hit an exception while in a chunk of code kgdb requires for its functionality. If you hit one of these conditions kgdb makes its best attempt to try to "patch the wound" inflicted by shooting yourself but at least you get notified vs a silent hang :-) Jason. --
That was for the old longjmp checks, but that code is long gone isn't it In what cases is that still needed? -Andi --
No this is a kgdb integrity check. Basically when you reach this chunk of code it is before the hand off to the source debugger. We cannot continue because there was a breakpoint in a part of the system kgdb was using while doing its normal work. The reality is that KGDB is not self contained. It relies on some external I/O methods, atomic page fault handling and some other pieces. If you take an exception there, the kgdb integrity check absolutely needs to fail. It was the source of some great pain for folks who tried to use kgdb for more than an inspection tool. There are some parts of the system This is a totally stupid test case but it should illustrate the problem. - connect to kgdb - place a break point at kgdb_disable_hw_debug() - continue - Now try to break in again You will see the kgdb integrity check fail because you put a breakpoint in a place that kgdb actually needs to perform its work. In this case kgdb will clean up the exception and restore operation and you will be ok. The kgdboc has a much smaller swath of code that you cannot debug vs something like kgdboe where there are lot more places you cannot debug due to the amount of code to services the ethernet device, irq sync etc... This check is absolutely required to help prevent silent death via dumb breakpoints or stepping around in random places (which is ill advised anyway). Jason. --
Don't you just need a simple recursion counter for this? I cannot think of a reliable simple way to check for this using the instruction pointer. The only way would be to use explicit annotations for all possible code similar to what kprobes does (__kprobes), but that would be hugely intrusive all over the tree. With the recursion counter the only problem would be someone setting a break point on the early notifier code itself that contains a recursion check, but that would be only a few lines of code so the risk of someone setting a break point exactly there would I believe you, but I don't believe that your implementation of this handles all cases. -Andi --
It is more than a simple recursion check (which is already in the code) because there are some conditions we can recover from. I'd rather not crash the system out if it can be recovered. kgdb_reenter_check() has the "simple" recursion check with the exception_level variable. It also has several special checks against the breakpoint situation I described before as well as trying to remove all the breakpoints. Ultimately if all those checks fails, it results in panic, because at the end of the day that is all you can do if you re-enter the debugger from the debugger (IE: you are toast). I don't see any reason to change the code there unless it can be further improved some way. We want to fail loudly and verbosely when this kind of thing happens so we can determine if it was the end user that caused the problem or if there is a real defect. Jason. --
Ok I'm trying to understand the code as you describe it. As far
as I can see (in kgdb-light-v10) it is:
+ addr = kgdb_arch_pc(ks->ex_vector, ks->linux_regs);
+ kgdb_deactivate_sw_breakpoints();
+
+ /*
+ * If the break point removed ok at the place exception
+ * occurred, try to recover and print a warning to the end
+ * user because the user planted a breakpoint in a place that
+ * KGDB needs in order to function.
+ */
+ if (kgdb_remove_sw_break(addr) == 0) {
and
+static int kgdb_remove_sw_break(unsigned long addr)
+{
+ int i;
+
+ for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
+ if ((kgdb_break[i].state == BP_SET) &&
+ (kgdb_break[i].bpt_addr == addr)) {
+ kgdb_break[i].state = BP_REMOVED;
+ return 0;
+ }
+ }
+ return -ENOENT;
correct?
I don't think that code does what you describe at all. Are you
sure we're talking about the same thing?
There is certainly no real protection against break points in
debugger code in there as far as I can see (except for the reentry
counter)
-Andi
--Perhaps we are talking about different things. I will agree that there is no protection against putting a breakpoint in some code that is part of the debugger. That was certainly not what I was talking about. You might be able to perform some protections at the breakpoint set time or request to single step, but that is much more complex than is worth it for the time being. The recursion check guards against basic stupidity or accidental stepping out of a frame you didn't mean to. Also there are a lot of non-obvious code paths that can get executed via kgdb such as the fault handlers. The simple recursion check covers enough cases that you don't want to live without it. If you want to improve it please provide some patches or further elaborate on what needs to be fixed. Jason --
I don't know -- I have not reread everything. Please don't consider my comments as approval of the code base. I still think it does quite a lot of dubious and ugly things overall and should get far more clean up Stopping all CPUs for indefinite time very much seems like "breaking a correctly working system" to me. In a correctly working system The only way I know of to do that is gdb vmlinux /proc/kcore For that gdb vmlinux /proc/kcore already works fine. Or fireproxy. The main complexity in module handling is handling (or rather preventing) module unload. I explicitely excluded that in my earlier mail. Module loading on the other hand tends to be relatively easy. I did a modular kernel debugger on my own some time ago and once the interfaces were clean it was very simple. I think the reverse is true too -- if having it as a module is easy then the interfaces are clean too. That is why I asked for it. It's a good basic If that is true then it is definitely misnamed and likely Yeah, it is consistently wrong agreed. -Andi --
well, this is a small detail, but still you are wrong, and on a correctly working system this will not occur. (if yes, tell me how) KGDB does a very straightforward "all CPUs enter controlled state" transition when the session begins, and at the end an "all CPUs continue" transition. I'm not sure what you mean exactly under "stopping all CPUs for indefinite amount of time" (your statement is sufficiently vague to be hard to counter via specifics) - that does not happen, unless the system is so buggy that a CPU is not able to process an NMI anymore [which is rather rare] - in that case the whole system is likely locked up anyway. In that case the simplest and safest behavior is what kgdb-light does currently: it will only proceed if all CPUs have responded. Note that you are wrong to suggest that "KGDB locks up", the system _has already locked up_. yes, we could "time out" and force a KGDB session even if some CPUs do not respond. But it's obviously not a completely safe system state, because other CPUs might be changing things under the feet of the debugger. So the safest first-level approach is to not enter the debugger in this case. Ingo --
Quite frankly, I don't see why the kernel kgdb layer should have *any* code like this at all. The one who is actually debugging is the one who should decide which CPU's get stopped, and which don't. I realize that the gdb remote protocol is probably a piece of crap and cannot handle that, but hey, that's not my problem, and more importantly, I don't think it's even a *remotely* valid reason for making bad decisions in the kernel. gdb was still open source last time I saw, and I think it's reasonable to just say: - the kgdb commands should always act on the *current* CPU only - add one command that says "switch over to CPU #n" which just releases the current CPU and sends an IPI to that CPU #n (no timeouts, no synchronous waiting, no nothing - it's like a "continue", but with a "try to get the other CPU to stop" Yes, other CPU's will obviously often end up stopping due to waiting for some spinlock or other if we stop one, but that's a separate issue, and quite often it might be sufficient - and what we want. And yes, you'd likely have to add some support to gdb to make this _usable_, but now all that usability crap, all those timeouts for "stop all CPU's" are now in user space on the _debugger_ side. That can be as fancy as it wants to be. And maybe this isn't realistic. I'm not saying "we _must_ do it this way", I just want to say that the kernel kgdb layer should be as thin ass humanly possible, and maybe the right thing to do is to simply totally punt on the whole "stop other cpu's" issue and make it a debugger-side question. In other words, is it perhaps possible to just *get*rid*of* that "kgdb_active" and "nmicallback" and the whole multi-CPU roundup? Just use a kgdb spinlock around the stuff that actually sends and receives individual packets, and expect the debugger side to sort them out (yeah, I suspect this involves having to add the CPU ID to each packet). Linus --
The problem I see here is that the kernel tends to get badly confused if one CPU just stops responding. At some point someone does an global IPI and that then hangs. You would need to hotunplug the CPU which is theoretically possible, but quite intrusive. Or maybe the "isolate CPUs in cpusets" frame work someone posted recently on l-k could be used. Still would probably have all kinds of tricky issues and races. -Andi --
Yes. A stopped CPU is very visible and hence can change the behaviour of I don't think you'd want to be poking around in kernel internals while some of the CPUs are continuing to run. It sounds rather creepy. You want everything to stop. Including time-related things. Bear in mind that one of the things you do with kgdb is to modify kernel memory - I'd do things like int foo; ... if (foo == 1) special_stuff(); ... to trigger a particular behaviour at a particular time. If you're making multiple changes, you want them "atomic" wrt all CPUs. (Of course, if you happeed to breakpoint one CPU while it was partway through reading multiple locations, you lose. But that's a teeny window). OT: another thing you can do with kgdb is error-path testing: foo = kmalloc(...) BP-> if (!foo) recover(); put a breakpoint on the !foo test and set foo to zero by hand. --
Hi -
Just for completeness, keep in mind that one can already
If "foo" is a global within a particular compliation unit, any old
function in that CU can be probed to set/get the global. (Setting
incoming function parameters works too.)
# cat delayed-set.stp
# set a systemtap script variable based on a /proc control file
probe procfs("activate").write { setit = $value }
global setit
# check the systemtap global in order to set the kernel global
probe kernel.function("foo_checking_fn") {
if (setit=="1") { setit = ""; $foo = 1 }
}
# stap -g -m ds delayed-set.stp &
# cat setit2.stp
probe kernel.statement("*@dir/function.c:222") { /*if (desired)*/ $foo = 0 }
# stap -g setit2.stp &
- FChE
--You don't even need systemtap: it has been always possible with gdb -write vmlinux /proc/kcore The only value add of the kgdb stub is really that you can single step too and do post mortem (although the later works fine using kdump/crash these days too) -Andi --
Note I wrote the one above only as a reply to Linus' proposal, not because I was advocating "live debugging" (or rather I think if you want that just use gdb vmlinux /proc/kcore) I agree with you in principle, but what do you do if one of the CPUs doesn't answer? Ingo seems to think it's ok for the debugger then to just hang too, I think it should eventually time out and debug anyways. Also there are some limits on how much the system should be frozen down. For example if you're strict about this requirement you would need to require full DMA quiescence (like kexec does). But that's obvious not a good idea for the debugger. So it's always shades of gray, not black/white. What I find strange with the current patch is that it goes to extreme measures to stop the CPUs (will hang forever), but not do the whole thing (DMA quiescence too) -Andi --
Andi, please refrain from misrepresenting my position, i can speak for myself. As i stated it before, i'm not against adding some configuration on roundup behavior as long as it's clearly add-on optional system policy. In fact i already implemented that in the kgdb-light tree. But if your goal is to create smoke where there is none, then feel free to continue to beat down on this poor strawman, ok? ;-) Meanwhile we'll continue to clean up kgdb to make it ready for an eventual (and IMO well deserved) .26 merge. All the useful bits of your comments have made kgdb cleaner - thanks for that. But please lets keep this constructive. Thanks, Ingo --
You're thinking about this totally *wrong*. You definitely do not want to hot-unplug or isolate anything at all. That's explicitly against the whole point of kgdb not changing what it is trying to measure. Just let the other CPU's hang naturally if they need to wait for IPI's etc. What's the downside? That's what you were trying to do in the first place by havign the kgdb callback! So you can't have it both ways. Either serializing other cpu's with kgdb is good (the whole "kgdb_nmicallback" thing or whatever it was called), in which case it's also perfectly ok to just let them stop when waiting for IPI's. My point was *not* that kgdb should take control of one CPU, and the other CPU's should continue to work as if nothing happened. That is insane and impossible (since you may be stopping a CPU while it holds central spinlocks etc). No, my point was that I think kgdb should be as light and non-intrusive as possible, and that any "higher level behaviour" (like the decision of whether to try to synchronize other CPU's or not) should be left to the debugger. But only if that makes kgdb patches less intrusive! In other words, I'm not at all trying to push any particular solution here, except for the "keep it simple, and anything even remotely debatable or intrusive to the system should be excised". And I wanted to point out that maybe all these timeout etc decisions can be pushed to the debugger. So I think we can either: - have no timeouts or other fancy crap _at_all_, with very simple locking (ie looks what v10 mostly seems to do) - or you do the fancy dance entirely in the remote debugger. I don't care. The only thing I care about is that kgdb support never _ever_ shows up in any interesting code, and that it remains totally invisible to essentially all of the kernel except the place that would otherwise print out an oops. And I absolutely don't want it to be fancy, I want it to be so simple that even _I_ can look at it and say "I...
I agree that it wouldn't be a good idea -- i was just pointing out There tend to be timeouts (e.g. softlock/nmi watchdog at least). I think some of the IPIs eventually time out too. In general losing a lot of time can lead to weird side effects. -Andi --
I do agree that kgdb and watchdogs aren't like to work well together. And I don't think you can sanely expect to not have a "disable lockup detection" when you start poking around with kgdb. We also just have to expect that time will also stop while sometbody is messing around with kgdb. So I don't dispute that any kernel debugger will *always* be intrusive in those ways. That's pretty inevitable. I just think the code itself can try to avoid hooking into various places all over the map. Linus --
i actually think that the notion of "stopping all system state" is rather intuitive from a debugging POV: when you have a bug trigger somewhere then getting an NMI to all CPUs and stopping them dead in their tracks preserves us the system in its most useful state. also, when using kgdb to "look at system state" it's best to have as little "unrelated" activity as possible. the timeout argument brought up by Andi was IMO really just pulled here by its hair, it never happened to me in practice and i was surprised it even came up. (i never before saw "KGDB roundup" fail - and i tried it on an 8-way system too) I think the memory-access routine cleanups were far more interesting from a failure scenario POV. Maybe Andrew could shed some light on his experience and preference here - he's been using KGDB for many years. Ingo --
on a second thought - i actually think it's rather possible and straightforward to do what you suggest. Stopping of all CPUs is still useful, but should be an optional property. I'll play with this a bit and see how GDB reacts. Ingo --
Stopping with interrupts off. Nothing scheduled anymore.
An easy definition for the condition is anything that requires
touch_{nmi,softlockup}_watchdog [which kgdb definitely does,
While that is a slight risk that problem is already there anyways.
Lots of agents in the system could do that. Do you plan to stop
all DMA too for example if you're so worried about this?
Or how about SMM code changing something?
Anyways the slight risk of the other CPUs eventually recovering
would seem a acceptable trade off versus not being able to use
the debugger to debug the system with hanging CPUs.
A possible compromise between my and your position on this would
be also having an option for this, with default to off
(although I would expect that would be a inconvenient default for
many people)
-Andi (who retires from this thread now, I already spent too much on this)
--Quite frankly, if kgdb starts doing somethign "fancy", there is no way I'll merge it. This includes things like having "breakpoint reservations" (discussed earlier) and just generally trying to add lots of infrastructure to make kgdb "fit in" to the kernel. The fact is, not having kgdb enabled should have _zero_ impact on the kernel, but the implication is also that when you *do* enable kgdb, that shouldn't change anything visible either: the rest of the kernel should simply not _possibly_ be able to even tell or care (which is why breakpoint reservations are out - they are against the whole point of debugging a unmodified kernel). So kgdb in my opinion will *have* to step on some other kernel infrastructure. I also think that things like timeouts are not something the client should do - if a kgdb lock never gets through, then it should be the debugging end that should just react to ^C and tough titties: it's not like there's a whole lot to be done. In other words: I think the kgdb patches have become a lot more palatable over the last week, but I think so exactly because they have gotten smaller and less invasive. Anything that evokes any discussion AT ALL should just be removed. It really should be that simple. [ The exception being that I think hw breakpoint support should be added back in - never mind that it won't work if the "native kernel" also uses them. Tough. If the debugger screws up the hw breakpoint state, that's a debugger error. I'd hope that the remote debugger *defaults* to just re-writing the instruction stream for that reason, but if you want to do a data breakpoint, you simply *have* to use the hw breakpoints for kgdb. And you just need to live with the fact that it simply won't be possible to do if the client wants to use hw breakpoints too. If the client starts using hw breakpoints - tough titties, it takes them. ] So keep the damn thing really simple, and don't try to handle every possible thing. W...
HW breakpoints (particularly data breakpoints) are extremely useful and I am proposing to put them back into kgdb-light. The HW breakpoint patch worked exactly as described. If the user space writes into the HW breakpoint registers via ptrace, those take priority over the kernel HW breakpoints. We have to assume that the person debugging the kernel ultimately has full control of the system, and will do what ever is needed to debug/inspect the kernel how he or she chooses. Of course at some future time if there is a generic API to use for the kernel level HW breakpoints, kgdb can adapt like anything else. Jason. --
yes, that's what i've been doing. When anything became questionable even just a little bit, it was the axe or i changed it to something really obvious and correct. by 2.6.26 kgdb-light will become so neutral to the rest of the kernel yeah. I believe we need to achieve a "known zero impact, 100% trustable" yes - if something locks up, it will be the NMI watchdog starting the debugger eventually - not the other way around. The NMI watchdog is already system policy which can be turned on/off and is well known and expresses the user's wishes with what should happen to the system. The debugger should really be a fundamentally very passive "console" type of thing, with as little direct policy as possible. That minimizes the chance that it accidentally messes up something and also makes it more likely that it will actually work reliably and dependably. Ingo --
I think that part is actually mostly ok now (old kgdb stubs were much worse in this regard) I still think the ultimative proof for this would be working While agreed in theory there are some exceptions: any time an oops happens or a crash dump would happen automatically kgdb should definitely do something very visible. Unfortunately that's not always the case The problem here is that the timeout we were talking about is not integrated into the kgdb event loop, it is rather the loop that protects the event loop. If you hang there all input If you want areally simple kgdb the best option would be re-porting the really simple (tm) 2.4 era x86-64 kgdb stub I did long ago. It had a small fraction of the code size of the current code, but was also missing a lot of features of course. -Andi --
To pick up this idea again I did the experimental patch below. It applies against Jason's latest kgdb-light patch queue: http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=for_... The patch nicely demonstrates what deeper dependencies on kernel services currently exist in kgdb-light. The following symbols were unresolvable: o genapic - for send_IPI_allbutself, ie. CPU roundup o machine_emergency_restart - for implementing "R0" gdb packet o idle_task - kgdb tells the per-cpu idle tasks apart o clocksource_touch_watchdog - obvious For simplicity reasons I just gpl-exported all of them. The result is a fully modular kgdb that may help to reduce concerns regarding its intrusiveness. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> --- arch/x86/kernel/Makefile | 4 +++- arch/x86/kernel/genapic_64.c | 1 + arch/x86/kernel/{kgdb.c => kgdb-x86.c} | 0 arch/x86/kernel/reboot.c | 1 + arch/x86/mach-generic/probe.c | 1 + kernel/Makefile | 1 - kernel/sched.c | 1 + kernel/time/clocksource.c | 1 + lib/Kconfig.kgdb | 2 +- {kernel => lib}/kgdb.c | 8 ++++++-- 10 files changed, 15 insertions(+), 5 deletions(-) rename arch/x86/kernel/{kgdb.c => kgdb-x86.c} (100%) rename {kernel => lib}/kgdb.c (100%) diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4cd39cd..2e733b1 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -58,7 +58,6 @@ obj-$(CONFIG_MODULES) += module_$(BITS).o obj-$(CONFIG_ACPI_SRAT) += srat_32.o obj-$(CONFIG_EFI) += efi.o efi_$(BITS).o efi_stub_$(BITS).o obj-$(CONFIG_DOUBLEFAULT) += doublefault_32.o -obj-$(CONFIG_KGDB) += kgdb.o obj-$(CONFIG_VM86) += vm86_32.o obj-$(CONFIG_EARLY_PRINTK) += early_printk.o @@ -79,6 +78,9 @@ endif obj-$(CONFIG_SCx200) += scx200.o scx200-y...
Very nice! If it's that simple then the kgdb integration is really I would rather export some generic wrapper for that than the full genapic Hmm, might be a bit dangerous to call this directly -- there are various quirks with e.g. not rebooting on CPU #0 and not resetting APIC state. But ok [this is not directly related to the fact that it's exported now, just mentioning this in general] -Andi --
kgdb was never expected to be any better than using a sysrq-b. Unless you put some really ugly code in to deal with lots of edge cases there are certainly times it is just going to force you to press the power button. If there is a better function for "reset now and cross fingers" kgdb can be changed easily enough to do that. Also given that you can have more than one type of "R" packet it would be easy enough to add other reboot types, such as an R1 which might call to the bios or another means to reset. For now the intent is pretty clear and there is not likely a need to expand the scope. Jason. --
see? There's the difference between us. The initial merge of KGDB does not want to include policies for nuances that pose an "acceptable slight risk". Had we done it, you could have (rightfully in that case, i have to say) complained about that "slight but real danger" of a debugger activating yes, i said this the first time you mentioned this issue that while it's all no big deal, it could perhaps be a sensible (but default-off) layered-on option for later. But the initial merge is for obvious stuff and it does not want to do such things. Ingo --
Ok fine. Please implement DMA quiescence on kgdb entry then; otherwise the code won't conform to your standards at all. -Andi --
well, it does not take a mindreader to conclude that nothing we do could possibly result in you "concluding" that kgdb would be ready for a merge ;-) for me it suffices that you've run out of technical arguments and are Ingo --
Let's put it like this: the amount of problems I found by reading only about 20% of the kgdb patch justifies the statement above. Also contrary to your repeated claims many of those are not fixed yet as far as I can see. Also every time I dig into something new problems raise their head -- e.g. see earlier exchange with Jason about the likely bogus recursion check. Overall that's not a sign of a clean merge ready code base. -Andi --
There was work underway on that before (hw_breakpoint). I'm not entirely sure you want to use fancy stuff like that in kgdb. It's nice for kgdb to be as self-contained as possible, so you can debug everything else with it. If you're going to rely on lots of higher-level infrastructure, it could be using kprobes for its software breakpoint handling, too. At any rate, I think it would be good if the hw breakpoint support in kgdb were chopped out into a separate patch. First make kgdb work with no code touching debug registers at all. Then a second patch can add the hw breakpoint support. The latter one can be mulled over or replaced with a hw_breakpoint merge or just put off for a while because it's a big can of worms to multiplex the use of the debug registers. Thanks, Roland --
agreed, and i've done that now. Thanks, Ingo --
1) The content is valid no matter the formatting 2) It is a well known format 3) And it is widely used Using a wellknown format does not require it to be postprocessed. It is by far better than "one-format-for-each-developer". Sam --
They seem to have reverted these patches now, so that problem is gone. Yes there are a couple of other consumers of CFI too, like crash or the out of tree (but not dead) in kernel unwinder. -Andi --
| Martin Michlmayr | Network slowdown due to CFS |
| Linus Torvalds | Linux 2.6.27-rc5 |
| Ingo Molnar | [git pull] x86 arch updates for v2.6.25 |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
git: | |
| Alexander Gladysh | [Q] Encrypted GIT? |
| Andreas Ericsson | Re: About git and the use of SHA-1 |
| Gerrit Pape | [PATCH/rfc] git-svn.perl: workaround assertions in svn library 1.5.0 |
| Matthieu Moy | git push to a non-bare repository |
| Christian Weisgerber | Re: libiconv problem |
| Richard Stallman | Real men don't attack straw men |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Daniel Ouellet | identifying sparse files and get ride of them trick available? |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Jeff Garzik | Re: [bug?] tg3: Failed to load firmware "tigon/tg3_tso.bin" |
| Ben Hutchings | Re: [GIT]: Networking |
