This is a linux port of the kernel debugger I wrote in 2000 for the MANOS/Gadugi Operating System. I created this particular port in June of this year from the MANOS/Gadugi source code I released under the GNU public license in 2000. I wrote the SMP debugger used in SMP Netware in 1994 and 1995, and that was later rolled into the main Netware kernel, though a lot of folks contributed and helped merge it into Netware. This debugger closely resembles the legacy Netware kernel debugger, and I find it easier to use than kdb, with fewer crashes and problems. This version is ia32 only at present, but I am completing x86_64 support and will post it as it is completed. I basically wrote this tool for my own internal use and for my projects since I could not find a debugger in linux that I was used to. I add support to it as I need it for my own internal use. This linux port of my kernel debugger does not require kdb or the kdb hooks, is more minimal than kdb, and has some features kdb does not, such as Intel style disassembly with dereferencing of data during disassembly and very robust mathematical and numeric support with conditional breakpoints. I created a far more robust version of this debugger in 2001 which included source level support, integrated screen and keyboard support, remote networking capability, and loader support, and licensed it to another company. I was placed under a 5 year non-compete not to port this tool to Linux until end of year 2007. The folks who licensed it did absolutely nothing with it of consequence, and 2007 has come and gone, so I am released from the non-compete and decided to port the debugger from my old Open Source operating system. I figured it might be as useful to others as it has been for my projects. I will be posting user space modules which can be loaded with this version at some point, which will enable source level debugging and a bunch of other features. These add-ons may get farmed out to another company for support. KNOWN ...
This patch is formally submitted for consideration for inclusion in the base linux kernel. ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch Jeff --
Haven't actually looked, but you should've probably waited just a bit for people to start using and then getting fed up with kgdb... Rene. --
Formally submitted patches should be sent to the list inline. Reviewing something on an FTP server just becomes that much harder. josh --
Some non-technical comments to the patch series:
- Each patch posting in a patch series should have its own Subject and
changelog which specifically describes the included patch.
- The Developer's Certificate of Origin is written simply as a single
line:
Signed-off-by: Jeffrey Vernon Merkey <email@address>
This line needs to be included in the changelog of each patch, i.e.
precedes the diff. (Tools which harvest patches from mboxes are
trained to pick the changelog up from before the diff.)
- The MUA rewrapped some lines.
- File name and date of last change are redundant information and are
better left out of the source files.
- Understandably for a port from other kernels, there are clashes with
Linux kernel's coding style like CamelCase names, comment style,
indentations.
- Why define LONGLONG, WORD, BYTE and so on? They could be plain
unsigned char etc., or u8 etc. if you like it brief.
- Boolean values should be the standard true and false, not locally
defined TRUE and FALSE.
- Usually the #include's are not collected in an intermediary header
(as in patch 7/25) but put directly into the files which require
a particular #include.
I haven't looked in detail at the patches; it's far out of my area of
experience...
--
Stefan Richter
-=====-==--- =--- --=--
http://arcgraph.de/sr/
--
OK, Sounds like I get a D- on patch format submission. I will rework the patches, switch back to GPL2 (since I guess GPL 3 is still not there yet) and clean up this list of issues. ULONG, etc. is Microsoft syntax for cross platform compatibility. Since this is a LINUX SPECIFIC PATCH, I'll rip out and rework the Gates-isms in the code. All that aside, the damn thing works so at least folks can start using it while I perform code beautification. --
You're aware that the Microsoft assumption
typedef unsigned long ULONG
is not compatible with 64-bit platforms in the rest of the world?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
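Geert's point is that `unsigned long` is 32 bits wide on ia32 but 64 bits wide on 64-bit Linux, so a ULONG typedef that assumes 32 bits silently changes size there. A minimal userspace sketch of the fixed-width alternative follows, assuming C11 and <stdint.h>; inside the kernel the equivalent spellings would be u8, u16, u32 and u64, as Stefan suggested. The mdb_* names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not taken from the mdb patch. */
typedef uint8_t   mdb_byte_t;     /* instead of BYTE */
typedef uint16_t  mdb_word_t;     /* instead of WORD */
typedef uint32_t  mdb_dword_t;    /* instead of a ULONG assumed to be 32 bits */
typedef int64_t   mdb_longlong_t; /* instead of LONGLONG */
typedef uintptr_t mdb_addr_t;     /* always wide enough to hold a pointer */

/* The widths are checked at compile time instead of being assumed. */
static_assert(sizeof(mdb_dword_t) * CHAR_BIT == 32, "dword must be 32 bits");
static_assert(sizeof(mdb_addr_t) >= sizeof(void *), "address type must hold a pointer");

/* Returns true when an address can be stored in a 32-bit field without truncation. */
static bool fits_in_dword(mdb_addr_t addr)
{
    return addr <= UINT32_MAX;
}
```

On ia32 fits_in_dword() is always true; on a 64-bit build it is false for addresses above 4 GiB, which is exactly the case a 32-bit ULONG field would truncate.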
No I was not, but I am now. At any rate, I removed the Microsoft-isms from the code. I can cut yet another patch for git6, but git5 was there -- GPL2 and all. How about putting it into the kernel, guys -- :-) Jeff --
Seriously? Because it doesn't seem to have had enough peer review, it hasn't had widespread testing in somewhere like linux-next or -mm, and we already have kgdb so you have to also explain why you can't improve kgdb in the areas it trails mdb. But the ideal outcome would be if you could contribute patches to kgdb to the point where it is as good as mdb. It is already in the tree and supported by a handful of architectures... any chance of that? (I don't know kernel debugger code, so I ask as an interested user) --
If you go back to LKML from 2000, this debugger has been around for 10 years. I agree, not in the hands of the public, but it's very mature. I plan to work on kdb and yes, there is a version of this that runs as an alternate debugger of kdb - you can even switch back and forth between them - but that misses the point as well. I can wait until it's more widespread -- or not. Jeff --
OK I don't doubt that at all, but I just mean in terms of being reviewed by Linux people and how it merges with the current kernel (eg. we now That would be great if you do work on kgdb... But I guess I do miss the point, then. Is there a technical difference with kgdb that cannot be worked around, a difference of opinion with maintainers, a wish to have mdb features at short notice? --
Nick, it's OK. There have been 27,453 downloads of the patches from my ftp server since yesterday when I posted it -- from what I am seeing people are voting with their feet. People can get it and I even posted it to SourceForge as well. After ten years of working on Linux I thought it would be nice for something I wrote to end up there. It will happen when it's time. As it stands, people are using it and it is going to help a lot --
That's all well and good :). But it didn't exactly answer my question. My question was not what is the point of you writing these patches, but what is the point of merging it into the kernel (over the alternatives). It may seem like a trivial question, but it is one that must be answered in order to be considered to get merged. --
Integrated kernel debugger in linux (minimal one) and given that there are already patches to add tickets and text to locks and other tools, one more can only help. This is by no means the full MDB debugger you have seen, just a pared down core I submitted. The entire MDB debugger is much larger. I have been working on it for ten years, and you may or may not have noticed, I typically do not ask many questions these days from the community for my appliance and router development, nor ask for help for any of the companies I have created and sold based on Linux over the past ten years since I have tools to fix my stuff without needing a hardware based inverse assembler like most folks need to debug hardware and file Jeff --
Nick, please note there is/was some miscommunication between the two of you with respect to kgdb, the currently merged GNU debugger interface, and kdb, the SGI kernel debugger. Merkey responded to you as if you asked about differences with KDB while what you did was ask about differences with KGDB. Both KDB and MDB are significantly different from KGDB, at least in so far as the latter is a remote debugger; it requires two machines. KDB and MDB are local. This makes KDB and MDB more accessible for small time use at least. The other most profound advantage is of course that it's not GDB. Rene. --
Without public use, it's difficult to determine that there aren't any nasty interactions. If you want to maximize your chances of getting this code into the kernel, you might want to read Jonathan Corbet's post, "[PATCH] A development process document, V2". It discusses the normal process, how to prepare patches for submission, etc. Chris --
Read it already. Quite a few large companies are using it at present and have been since 2000, BTW. Jeff --
The criterion for kernel inclusion isn't really whether it works, however. It's whether other people would be able to understand it well enough to support it if you disappear (or if somebody else has changes that require changes to it). If it works well but isn't nice code, nobody really benefits from having it in the kernel distribution rather than external (like it's been for the past 8 years). If it is nice code (somewhat regardless of whether it happens to work right now), people can work on it and keep it in sync with the kernel as they change things. -Daniel *This .sig left intentionally blank* --
It both works and is nice code. But I may not be impartial. Jeff --
You activate it from the keyboard the same way as kdb -- pause/break or int X or exceptions. Jeff --
That's great, except kgdb has existed in the kernel for various architectures well before that as well. ppc32's stub dates back to 1998, sh had it since 2001, mips around the same time, etc, etc. While the current rework and tidying of the stubs is something new, kgdb itself is kgdb and kdb are totally different things, kgdb is what is generally available and worth improving in-kernel. While it's certainly good to have options, having multiple in-kernel debuggers is not going to help matters for the vast majority of users. I agree with Nick, it would be nice to see what we have in-kernel being extended and worked on by more people, especially those with a background in these things. On the other hand, it seems like there's sufficient interest in your project out-of-tree, so there's not really much point in merging it if you're content with the interface as it exists today and it continues to work for your users. One of the things we can do however is try to provide cleaner abstractions for the various debuggers to tie in to, so we don't end up with each debugger piling on its own set of ifdefs in all of the same places (int3 handling comes to mind, which you could already do more cleanly through the die chain today). Perhaps it would be more useful to see what sort of hooks mdb wants in the architecture and core code, how those overlap with kgdb, and how we might extend kgdb in areas where mdb is more feature complete. --
Not your call to make. Kernel Debuggers are very personal choices and it's pure arrogance to assume any of us can make a choice for someone else with tools. My taste in debuggers is like my taste in food, or women, This is a great suggestion. mdb already uses an alternate debugger interface with the hooks into traps_XX.c and reboot_XX.c. I still would like to see it in kernel, but an alternate debugger interface as you point out is almost a necessity at this point. There's a good example in mdb.c and mdb-list.c. Jeff --
I don't think kgdb and a simple assembler debugger are directly comparable. kgdb always requires a remote machine, which has many advantages, but is also often very inconvenient or impossible to arrange. A low overhead assembler debugger can always be compiled in just in case. Also at least for the x86 port the debugger interfaces should be general enough now (see die hooks as a "debug vfs") that it would be quite possible to have a multitude of debuggers just using them. In fact that's already the case: kprobes and kgdb and kdump are all kinds of debuggers using such hooks. As long as it doesn't impact the core code and the mdb code itself is considered merge worthy and has clean interfaces that would seem fine to me. It essentially would just live somewhere in its own directory using the existing interfaces. My standard test for seeing if a debugger has clean interfaces is to see if it can be loaded as a module. There are enough different debugging styles around that offering developers different tools of which they can pick whatever suits them is not a bad idea. Also as everyone knows debugging is often a major time eater and if more tools are available that can only help the kernel. That said, I haven't read the mdb code, am not judging its general merge-worthiness, and am not really completely sure what all the details of a "netware style debugger" are; this is just a general high level comment on debuggers. At least judging based on the patch sizes it doesn't seem particularly bloated. But of course it would need full proper review first. -Andi --
OK thanks for the info. I don't actually know debugger code as I said, so I wasn't against merging mdb if it offers things that kgdb fundamentally cannot. If so, then ensuring clean interfaces indeed would seem like a good first step to getting it merged. --
The competing implementation is kdb not kgdb. kgdb is just a stub for remote debugging using gdb. kdb is an in-kernel debugger like the one proposed here. --
Is there work underway to get kdb merged? (I'm just asking because I don't know; I personally don't need kdb nor mdb.) -- Stefan Richter -=====-==--- =--- --=== http://arcgraph.de/sr/ --
KDB still exists in patches but the merge effort was given up when Linus stated that he did not want a kernel debugger. No problem to start merge attempts again AFAICT. Jay? --
To merge KDB or any other RAS tools, you need to deal with kdump. Kdump hijacks panic() before the die calling chain. For KDB or a RAS tool to work, an infrastructure such as the "add new notifier function" by Takenori Nagano should be in place. His last attempt fell short, in my opinion, partly because his "[PATCH 3/3] Move crash_kexec() into panic_notifier" did not do what it was meant to do: fit kexec/kdump into the new infrastructure. That is not fatal; it can be fixed to make it right. If the community is interested in getting a kernel debugger into the kernel, we can continue Takenori's work. Once the infrastructure is accepted, then merging KDB or any other kernel debugger will make sense. Regards, --
As I look through entry_32.S and traps_32.c I do not see where kdump is hooking the notify_die handler which would intercept calls to a debugger. Where does kdump hook this path? --
kdump uses crash_kexec() call for hooking. It hooks in panic(), die_nmi() and die(). Thanks Vivek --
Imho kdump should just be fixed to use die chains. -Andi --
Well, we had that discussion several times. I'm not against it (instead, I would like it), but I don't think that repeating the discussion over and over does help ... Bernhard -- Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development
No, it was rejected with the argument that in the panic case, as little code as possible should be executed before kexec'ing the panic kernel. See also: http://kerneltrap.org/node/14050 (for example) Bernhard -- Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development --
Violently agree, especially since the IA64 handling of NMI type events is significantly different from x86 and requires at least two callbacks via the die chain. Alas the kdump authors are adamant that they will not use die chains, which makes it almost impossible for any other RAS code to coexist with kdump. This intransigence on the part of kdump is one of the reasons that I gave up on getting _any_ RAS code (not just KDB) into the Linux kernel. See http://kerneltrap.org/node/14050 and http://marc.info/?l=linux-arch&m=116304508731232&w=2, the latter explains why you need die chains to handle IA64 correctly. x86 debugging is relatively easy, ia64 is hard due to interactions between the firmware and the OS, either can stop the other cpus. If your debugging framework does not handle ia64 INIT and MCA events, then you cannot debug most of the interesting ia64 events. In any case, we have gone round this loop too many times for me to care about it any more. I have given up on Linux RAS code. --
I am doing a quick source code grep and in all the cases except panic, kdump gets a chance to run in the end. We are running die notifications first. For example, in the case of nmi, in the case of traps, in the case of mce, the notifier list is being executed first. So a debugger or any other RAS tool on the notifier chain will get a chance to run first. panic() is the only place where kdump gets a chance to run first and panic notifiers are not executed. To me, so far only an in-kernel debugger seems to be a reasonable candidate that needs to run before kdump after a panic event. If a debugger is really getting merged into the kernel, then I think the debugger can put a hook in panic() before kdump. Wouldn't this solve the problem? Thanks Vivek --
To be fully clear panic() that is called outside oops/exception context Yes a kernel debugger should be able to hook into panic() In fact it can do that already by just setting a break point, but clearly having a real notifier is preferable. The use case would be then that the kernel debugger would kgdb is already merged. Also the x86 notifiers are general enough that there are a couple of debuggers floating around that are just using existing interfaces (as in need very little in terms Yes it would, but right now there is no such hook. Also if there was such a hook kdump could use it like everyone else. There's a priority scheme in notifiers so you can still run usually last. -Andi --
Agree. And here is another example of the need for such a hook: In a partitioned system [I work for SGI, so I'm talking about an Altix], there is memory sharing among multiple single-system images. And if one of those partitions were to panic the other partitions need to be informed that they cannot address the panic'd partition's memory. (Once that partition is rebooted any such access will cause an MCA in the accessor.) So the cross-partition driver (xpc) needs to run a callback there, too. It seems to me, as Keith has voiced, that it should be the user's choice -- Cliff Wickman Silicon Graphics, Inc. cpw@sgi.com (651) 683-3824 --
There are already existing shutdown hooks. Aren't they good enough for that? I would feel uneasy about having arbitrary drivers hook into panic(). While I'm sure your code is great there is unfortunately a lot of crappy driver code around. -Andi --
I hooked panic last night and inserted a notify_die hook -- there is even a state defined for it already -- DIE_PANIC. The rest of the code should be ok. My only question was where to harvest the regs variable since panic is not a real exception. Here's a first stab. You also must add #include <linux/kdebug.h> to the top of panic as well.

diff -Naur linux-2.6.27/kernel/panic.c linux-2.6.27-mdb/kernel/panic.c
--- linux-2.6.27/kernel/panic.c 2008-08-07 15:32:29.000000000 -0600
+++ linux-2.6.27-mdb/kernel/panic.c 2008-08-07 15:29:09.000000000 -0600
@@ -82,6 +82,12 @@
 	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
 	bust_spinlocks(0);
 
+	// call the notify_die handler for any resident debuggers which
+	// may be active and pass the message string. On a software
+	// fault return at least some sort of regs for a remote debugger
+	// to look at.
+	notify_die(DIE_PANIC, buf, get_irq_regs(), 0, 0, 0);
+
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.

Jeff --
For shutdown, yes. But on a panic crash_kexec() gets called That is Eric Biederman's concern as well. But it seems we should have a way for a user/customer to customize those events and their order, as I noted in a previous post. -- Cliff Wickman Silicon Graphics, Inc. cpw@sgi.com (651) 683-3824 --
Hi Andi, IIUC, there are two lists for exception and panic notifications. All the exceptions and NMI related notifications go through "die_chain" and all the panic notifications are done through "panic_notifier_list". Are you suggesting that kdump should be put onto panic_notifier_list, in such a way that it runs last? Just a few points to ponder.
- panic_notifier_list is exported and any module can register and make use of it. As you mentioned in your other mail, there are a lot of drivers out there with crappy code and if we do it, all the drivers get a chance to do stuff after panic() and there is no guarantee that kdump code will ever get a chance to run.
- Kdump is built on the philosophy that after a panic(), one should do as little as possible in the kernel and all the actions should be deferred to the new kernel. That's why we recommend that all the panic notifier actions (except the debugger) should be done in the second kernel. It does introduce a little delay in notification but it also makes it more reliable.
- Neil Horman has already provided infrastructure so that one can put user space code in the second kernel's initrd and it will be executed. This can easily be done for modules also.
But somehow nobody seems to be interested in doing things in the second kernel and everybody wants to run their post panic code in the first kernel. So far, except for the debugger, we have not run into any strong case which needs to run post panic code in the first kernel and where things will not work out if post panic actions are taken in the second kernel. That's why there is always resistance from our side to moving kdump to the panic notifier list, so that we can make modules do the right thing, and that is to run in the second kernel. The moment kdump is put onto panic_notifier_list, nobody will think of doing anything in the second kernel (because it takes extra effort). Everybody will register a panic notifier handler in the first kernel and be happy.. If everybody thinks that they can do ...
In the case of the cross-partition driver, running panic notification in the second kernel is an interesting idea. I discussed it with Robin Holt, who is more knowledgeable than I on the details of that driver, and he told me that there is a great deal of state information needed for the notification. It's easy to do in the first kernel, but extremely difficult in a second kernel. Couldn't we have some tunable flexibility in that area, to determine what should run on a panic, and in what order? --
KDB registers on the panic_notifier_list, but since crash_kexec() takes control early in panic(), the panic_notifier_list is essentially dead if kdump is chkconfig'ed on. I think a kernel debugger is not complete if it does not have an option to create a kernel dump. Unfortunately we have to tell KDB users not to chkconfig on kdump. I am working on KDB to allow KDB to co-exist with kdump. But it is done through a hack to place KDB ahead of crash_kexec(). It would be preferable to have a formal notifier_list. Regards, --
Maybe that's the way forward. Export the list of registered handlers on panic_notifier_list through sysfs or debugfs and also provide the flexibility for the user to change the priorities from userspace. That should work for all. Thanks Vivek --
The point was that kernel debuggers have at least as legitimate a need as kdump to run early on panic. In particular they should run before kdump because kdump can be triggered from the debugger. But for modular kernel debuggers the hook would need to be exported, so in theory everyone could use it. In theory code review should catch that. Another alternative would be to re-add the old namespaces patches I posted some time ago, which allowed exporting symbols only to specific modules (but that would also be unfortunate for out of tree debuggers). Since we have nearly all other needed hooks for kernel debuggers anyway it doesn't really make sense to stop at panic. So this earlier requirement should be relaxed. Perhaps code review can solve the problem? -Andi --
I think given that so many people want kdump on panic_notifier_list, it would be worthwhile to experiment with the different approach.
- Move kdump to panic_notifier_list.
- Export panic_notifier_list to user space and provide flexibility so that a user can change the priorities of registered handlers dynamically.
This will allow an admin to see explicitly who is going to run, and in what order, in case of panic, and also give him the capability to change that order. This kind of list should keep all kinds of users happy. Those who want to run all the other modules before kdump will be able to do so, and those who don't can boost the priority of kdump to put it ahead in the list. I think Takenori had some working patches in the past for this. Probably time to revisit the patches. (Somebody willing to look into it?). Thanks Vivek --
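The proposal above (handlers on one list, run in priority order, with an admin able to reorder them) can be sketched in userspace C. This is only a model under stated assumptions: struct panic_handler, chain_register(), chain_set_priority() and the CHAIN_* return codes are hypothetical names, not the kernel's notifier API, and the three example handlers merely record the order in which they ran.

```c
#include <assert.h>
#include <stddef.h>

enum { CHAIN_CONTINUE, CHAIN_STOP };          /* hypothetical return codes */
enum { ID_DEBUGGER = 1, ID_DRIVER, ID_KDUMP };

struct panic_handler {
    int priority;                             /* higher priority runs earlier */
    int (*fn)(const char *msg);
    struct panic_handler *next;
};

struct panic_chain { struct panic_handler *head; };

static int trace[8];                          /* order in which handlers ran */
static int trace_len;
static void trace_add(int id) { if (trace_len < 8) trace[trace_len++] = id; }

/* Insert in descending priority order; equal priorities keep registration order. */
static void chain_register(struct panic_chain *c, struct panic_handler *h)
{
    struct panic_handler **pp = &c->head;
    while (*pp && (*pp)->priority >= h->priority)
        pp = &(*pp)->next;
    h->next = *pp;
    *pp = h;
}

static int chain_unregister(struct panic_chain *c, struct panic_handler *h)
{
    for (struct panic_handler **pp = &c->head; *pp; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            h->next = NULL;
            return 0;
        }
    }
    return -1;                                /* handler was not on the chain */
}

/* What a write to a (hypothetical) sysfs or debugfs priority file might do. */
static int chain_set_priority(struct panic_chain *c, struct panic_handler *h, int prio)
{
    if (chain_unregister(c, h) != 0)
        return -1;
    h->priority = prio;
    chain_register(c, h);
    return 0;
}

/* Walk the chain until a handler asks to stop; returns how many handlers ran. */
static int chain_call(struct panic_chain *c, const char *msg)
{
    int called = 0;
    for (struct panic_handler *h = c->head; h; h = h->next) {
        called++;
        if (h->fn(msg) == CHAIN_STOP)
            break;
    }
    return called;
}

/* Example handlers: kdump stops the walk because kexec would not return. */
static int debugger_fn(const char *msg) { (void)msg; trace_add(ID_DEBUGGER); return CHAIN_CONTINUE; }
static int driver_fn(const char *msg)   { (void)msg; trace_add(ID_DRIVER);   return CHAIN_CONTINUE; }
static int kdump_fn(const char *msg)    { (void)msg; trace_add(ID_KDUMP);    return CHAIN_STOP; }
```

With kdump at the lowest priority every handler runs before it; raising its priority above the driver's makes the walk stop before the driver is called, which is the "boost the priority of kdump" case described above.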
I found a problem with APIC NMI support which seems to affect all the debuggers, but appears machine specific -- at least I can reproduce it with all of the MDB, KDB, and KGDB modules on my ACER 2410 dual core laptop. It explains the mysterious hangs I would see in KDB all the time on SMP systems. The call: send_IPI_allbutself(vector) will hard hang on an ACER laptop with dual core processors if issued while any one of the processors is actively inside an INT 1 handler and then takes a SECOND NMI inside of this path, and nests. It hangs the requesting (focus) processor during nested interrupts if a target processor A) is inside an INT 1 exception B) takes an NMI interrupt C) returns from the NMI back into the INT1 D) receives a second NMI. I am aware that a second NMI will not propagate to a processor currently servicing an NMI until the processor sees an IRET instruction (at least this is how intel worked years back). I have not been able to reproduce it on the Xeon based motherboards. I have seen the APIC bus hang this way on my other OS project -- when the APIC was programmed incorrectly, and assume it must be a bug in the APIC, how the APIC is programmed by Linux, etc. I am coding around the problem to prevent such convoluted nesting levels in MDB (this was from testing) but this was the final test for enabling SSB and all the fixes before I post an rc3 patch series which really cleans up the code, and there's a mystery with send_IPI_allbutself(). Jeff --
A couple of laptop BIOSes (e.g. some thinkpads) are unfortunately not NMI safe. There is no known workaround other than not using NMIs on these systems. There's unfortunately no global blacklist for these systems, although having one would be useful for a couple of subsystems. -Andi --
I seem to have nailed down the "voodoo" sequence for reproducing it and the sequence of failure on the Acer 9410.

Processors 0,1 first set a global breakpoint (schedule) and load registers DR6/DR7
0 -> trigger int1 breakpoint
1 -> trigger int1 breakpoint
0 -> get debugger lock
1 -> spin at debugger lock
0 -> NMI all processors but self
1 -> gets NMI while spinning at debugger lock
1 -> enters NMI code loop and spins
0 -> enter debugger console
0 -> leave debugger console
0 -> release spinning processors
1 -> leave NMI code issues IRETD (returns to debugger spinlock and spins)
0 -> release debugger lock
1 -> get debugger lock
1 -> NMI all processors but self
...hard hang in send_IPI_allbutself(APIC_DM_NMI)....

If a delay is placed in the code that calls send_IPI_allbutself() that waits until processor 0 has left the int1 exception handler and issued an IRETD, then the hang does not occur. Seems to be the workaround for this problem. This problem seems specific to my Acer 9410 laptop, and as you described seems hardware related, though I am going to attempt to instrument a workaround for it anyway. Jeff --
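The workaround Jeff describes (hold off the NMI broadcast until the other processor has left its int1 handler) could look roughly like the following userspace model. It is a sketch under assumptions, not code from mdb: the per-CPU in_int1 flags, int1_enter()/int1_exit() and nmi_others_when_safe() are hypothetical, C11 atomics stand in for kernel primitives, and a counter stands in for send_IPI_allbutself(APIC_DM_NMI).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2

/* Hypothetical per-CPU flag: nonzero while that CPU is inside its int1 handler. */
static atomic_int in_int1[MODEL_NR_CPUS];
static int nmi_broadcasts;        /* counts simulated NMI broadcasts */

static void int1_enter(int cpu) { atomic_store(&in_int1[cpu], 1); }
static void int1_exit(int cpu)  { atomic_store(&in_int1[cpu], 0); } /* just before IRETD */

/* Bounded wait until every other CPU has left its int1 handler. */
static bool others_left_int1(int self, long max_spins)
{
    for (long spin = 0; spin <= max_spins; spin++) {
        bool busy = false;
        for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
            if (cpu != self && atomic_load(&in_int1[cpu]))
                busy = true;
        if (!busy)
            return true;
    }
    return false;
}

/* Broadcast only when no other CPU is still inside int1; false means timeout. */
static bool nmi_others_when_safe(int self, long max_spins)
{
    assert(self >= 0 && self < MODEL_NR_CPUS);
    if (!others_left_int1(self, max_spins))
        return false;             /* the caller decides how to handle a timeout */
    nmi_broadcasts++;             /* stand-in for send_IPI_allbutself(APIC_DM_NMI) */
    return true;
}
```

The wait is bounded so that a CPU stuck in its handler cannot wedge the requesting processor; what to do on timeout is left to the caller.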
Hi all, These are my latest patches. Any comments are welcome. http://www.gossamer-threads.com/lists/linux/kernel/909582 http://www.gossamer-threads.com/lists/linux/kernel/909581 http://www.gossamer-threads.com/lists/linux/kernel/909584 http://www.gossamer-threads.com/lists/linux/kernel/909583 Thanks, Takenori Nagano --
I don't consider them competing, just different tools for people from different development backgrounds. GNU and DOS/Windows. Jeff --
Yes, so Andi said a couple of days ago ;) --
That idea sounds familiar, the "suspend2" response, when something new and significantly different is offered, instead of putting it in and letting people choose in configuration, take the position that what is there is good enough, and if the author of the new solution will just drop all their ideas and slap some band-aids on the existing code it will be "gooder enough" without actually offering people a choice of something different. I totally agree with this, the whole idea of a remote machine implies In addition to "Bravo!" I will add that tools which work somewhat differently will increase the chances of having a tool which will work I would suggest that if it meets coding standards and doesn't break anything else it could be included in -mm (assume there's no objection there) and let people beat on it there, with the assumption that unless problems are found it will be promoted. The need for a special setup makes spur-of-the-moment investigation of unusual behavior difficult for anyone but a hard-core developer who does daily work on a setup with the remote machine available at hand. I think this new approach would encourage people to do quick checks when the behavior is observed. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot --
To be fair, choice in "leaf" features like a debugger is not entirely comparable to choice in central features. If the infrastructure does not support all use cases reasonably well, it's better to fix the infrastructure or replace it by a working one, rather than adding a second infrastructure which is also not general enough. In this case: Make a side-by-side comparison of features and shortcomings of the available debuggers (as in Andi's response), then decide how the best of both worlds can be achieved + used + maintained most easily --- by having both side-by-side, or by taking over some or all of one's features into the other. Either way requires contributors to be interested. -- Stefan Richter -=====-==--- =--- --==- http://arcgraph.de/sr/ --
It's a little too early for that. Right now it's at the phase "how to make it better integrate with the kernel", with the use of existing hooks, adding the needed hooks to be more complete, working as a module, etc. When that is done then the philosophical aspects can come into play, but it's not there yet. OG. --
I have removed the hooks into the /arch/x86 sections and converted the debugger to use kprobes and notify_die as Andi suggested. It also builds and loads as a module. One serious point has to do with NMI handling on SMP since the notify_die handlers use this priority calling mechanism. I am still testing on SMP but it seems to work -- I just am a little uncomfortable with trusting an interface (notify_die) that can let someone come in and hook the NMI handlers when I MUST BE ABLE TO NMI AND HALT non-focus processors first. I am adding a special NMI state to the chain notifier to handle this case where IT MUST BE CALLED FIRST and IT MUST BE THE ONLY EVENT CALLED. I used the DIE_KERNELDEBUG to hook the keyboard handler in drivers/char/keyboard.c so we have the general hook into kprobes to handle enter debugger events. Jeff --
No. First try to integrate them together so you have the best of both from one code base is what I was saying. I specifically said if they are significantly different and can't be reconciled then it could be merged. --
It depends how you look at the problem. I would agree that the use of gdb + kgdb vs an assembly debugger are completely different cases. The kgdb core in the mainline kernel can actually allow such a front end to be written, however. The kgdb core has an API for I/O and it is possible to write an I/O module that implements an in kernel assembly debugger. The kgdb test suite is not a great example, but it is a complete example of using the kgdb core directly without a second machine. If there is truly missing functionality from kgdb in terms of the way the kgdb core is used vs mdb, it would be good to at least consider what is missing. It is entirely possible to add functionality such that mdb could be implemented as a kgdb I/O module. In this case you would be able to make use of zero runtime impact when a kgdb I/O module is not configured or make use of it as an early/late/ondemand I would agree that the possibility exists to use the hooks directly, and clearly the mdb code base as it stands in this patch set does not accomplish this. If one were to consider integrating mdb as a kgdb I/O module, it would have a greater degree of platform independence. The primary arch dependencies should be narrowed down to the back tracing / disassembly interface. The SMP / threading / breakpoints / exception handling would all be shared between the debugger front ends that way. The mdb code base currently relies on re-implementing HW/SW breakpoints for each architecture you desire to support. Unifying some of the debugging technology is a noble goal where it makes sense to do so. Using some of the existing kernel hook points is a first pass requirement before a merge of mdb could be considered for the mainline kernel. Jason. --
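The split Jason describes, a shared debugger core that talks to whichever I/O module is registered, can be illustrated with a small userspace model. Everything here is a hypothetical stand-in: struct dbg_io, dbg_register_io() and the loopback backend are invented for the sketch and are not the kernel's kgdb I/O interface, whose exact fields and registration calls should be taken from the kgdb headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical I/O backend: the front end only sees these two callbacks. */
struct dbg_io {
    const char *name;
    int  (*read_char)(void);              /* next input byte, or -1 if none pending */
    void (*write_char)(unsigned char c);
};

static const struct dbg_io *active_io;    /* at most one backend at a time */

static int dbg_register_io(const struct dbg_io *io)
{
    if (active_io)
        return -1;                        /* another backend is already registered */
    active_io = io;
    return 0;
}

/* Loopback backend for the sketch: input and output are plain buffers. */
static char loop_in[64], loop_out[64];
static size_t loop_in_len, loop_in_pos, loop_out_len;

static void loop_feed(const char *s)
{
    while (*s && loop_in_len < sizeof(loop_in))
        loop_in[loop_in_len++] = *s++;
}

static int loop_read_char(void)
{
    return loop_in_pos < loop_in_len ? (unsigned char)loop_in[loop_in_pos++] : -1;
}

static void loop_write_char(unsigned char c)
{
    if (loop_out_len < sizeof(loop_out) - 1)
        loop_out[loop_out_len++] = (char)c;
}

static const struct dbg_io loopback_io = { "loopback", loop_read_char, loop_write_char };

/* Front-end helpers: they never touch hardware, only the registered backend. */
static void dbg_puts(const char *s)
{
    assert(active_io != NULL);
    while (*s)
        active_io->write_char((unsigned char)*s++);
}

/* Reads one command line (without the newline) into buf; returns its length. */
static size_t dbg_read_line(char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(active_io != NULL && size > 0);
    while (n + 1 < size && (c = active_io->read_char()) >= 0 && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

Because the prompt and command parsing only use read_char and write_char, the same front end could sit on a keyboard and screen backend locally or on a serial backend remotely, which is the platform independence argued for above.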
> It depends how you look at the problem. I would agree that the use of
Yes, I left the possibility of "someone writing an in-kernel kgdb UI" out. Indeed that would be a possibility. On the other hand, I'm not sure it would save all that much code versus just working directly on top of die notifiers. Die notifiers are not just a possibility; they are already used by multiple debugger-like subsystems, e.g. kprobes is certainly a kind of debugger. -Andi --
UPDATE: As per everyone's recommendations, the debugger has been fully module-ized, and I have run checkpatch.pl and am cleaning up the slew of messages checkpatch spits out of its tailpipe. It would be nice if checkpatch could also FIX the areas it complains about. I tested kprobes with NMI cross-processor calls on SMP and I am unable to break it, and the module loads and unloads very well. There is a need for early initialization of the debugger if someone wants to debug kernel startup, and I am including support for this with another .config option, but I am concerned about the reliance of kprobes on rcu and whether this will break early init of the debugger. The code looks ok, but another set of eyes would be helpful when I post the next patch series. I will generate another patch series after I finish cleaning up the checkpatch.pl report. I am still going through it. Also, whoever wrote "/Documentation/volatiles_are_evil" must not have worked with the busted-ass GNU compiler that optimizes away global variables and busts SMP dependent code. I am not going to remove the volatile declarations needed for SMP coordination in the debugger since the code breaks when they are removed. GCC will cause massive breakage of SMP code if you do not declare certain variables as volatile. Whoever wrote that section doesn't understand low level SMP coding for operating systems design and apparently has not spent over a week running down an SMP bug only to discover it was caused by the busted-ass GCC compiler arbitrarily deciding to optimize away a low level flag used to signal between processors -- I have spent the time running down Stallman's bugs. That text should be removed from the kernel or qualified to say that it's advertising for GCC's malfunctioning optimization code. Jeff --
The Linux way to handle this is to use gcc memory barriers: mb()/barrier()/wmb()/rmb()/smp_rmb()/smp_wmb() etc. Normally everything that volatile can do can be expressed by them. On x86 such a memory barrier tells gcc that memory might have been clobbered and needs to be flushed, and also prevents the compiler from reordering memory accesses. On other architectures it also forces ordering on the CPU level, although that's not needed on x86 (except in some special situations like using write-combining). See Documentation/memory-barriers.txt -Andi --
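As a rough illustration of the barrier approach, the userspace sketch below signals through a plain (non-volatile) flag and relies on a compiler barrier to force the flag to be re-read. The barrier() macro is the usual GCC/Clang empty-asm idiom with a "memory" clobber; payload, ready_flag, publish() and consume() are hypothetical names for this example. It models only the compiler side: on weakly ordered CPUs, kernel code would need smp_wmb()/smp_rmb() or similar, which this sketch does not reproduce.

```c
#include <assert.h>

/* Compiler-only barrier: an empty asm with a "memory" clobber (GCC/Clang). */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int payload;      /* data handed to the other processor */
static int ready_flag;   /* plain int, deliberately not volatile */

/* Writer side: store the data, then raise the flag. */
static void publish(int value)
{
    payload = value;
    barrier();           /* compiler may not move the payload store below this */
    ready_flag = 1;
}

/* Reader side: poll up to max_spins times; return payload, or -1 on timeout. */
static int consume(int max_spins)
{
    assert(max_spins > 0);

    for (int i = 0; i < max_spins; i++) {
        barrier();       /* forces ready_flag to be reloaded on every pass */
        if (ready_flag) {
            barrier();   /* compiler may not hoist the payload load above this */
            return payload;
        }
    }
    return -1;
}
```

Without the barrier() inside the loop, the compiler would be free to load ready_flag once and spin on the cached value, which is the "optimized away flag" symptom that volatile is often used to paper over.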
Andi, I'll instrument this as described in the documentation you referenced and remove the volatile declarations. If this passes testing, I will repost with these corrections. Jeff --
Take care though that neither memory barriers nor volatile are what you want if accesses need to be atomic on whatever given data structure. (E.g. bitfield manipulations, counter increments, accesses to virtually anything that is bigger than an integer or a pointer...) -- Stefan Richter -=====-==--- =--- --=== http://arcgraph.de/sr/ --
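A small userspace sketch of that distinction, using C11 <stdatomic.h> as a stand-in (kernel code would use its own atomic_t and bitops helpers instead). The names halted_cpus, cpu_mask, cpu_enter_halt(), cpu_leave_halt() and mark_cpu() are hypothetical and chosen only for illustration. The point is that a counter increment or a bit set is a read-modify-write, so it needs an atomic operation, not just a barrier or volatile.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int halted_cpus;   /* hypothetical count of parked processors */
static atomic_uint cpu_mask;     /* hypothetical bitmask of parked processors */

/* Atomically count this processor as halted; return the new count. */
static int cpu_enter_halt(void)
{
    return atomic_fetch_add(&halted_cpus, 1) + 1;
}

/* Atomically drop this processor from the count; return the new count. */
static int cpu_leave_halt(void)
{
    int prev = atomic_fetch_sub(&halted_cpus, 1);

    assert(prev > 0);            /* must not leave more often than we entered */
    return prev - 1;
}

/* Atomically set this processor's bit; return the mask including that bit. */
static unsigned mark_cpu(unsigned cpu)
{
    assert(cpu < 32);
    return atomic_fetch_or(&cpu_mask, 1u << cpu) | (1u << cpu);
}
```

A plain `count++` on a shared int, even a volatile one, compiles to separate load, add and store steps, so two processors can both read the old value and one increment is lost.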
scripts/Lindent can at least help with some of the whitespace changes. It has been a long time since I used it myself, though, so I have no idea how well it works. -- Stefan Richter -=====-==--- =--- -=--- http://arcgraph.de/sr/ --
Well, I think that given that kdb has not been accepted by Linus, there is little possibility that mdb will be included in the mainline kernel, though I don't know why kgdb is acceptable. Regards Jason Xiao --
