I wonder where this convention of using lower numbers to indicate higher priorities comes from... It seems to be quite prevalent even though it's All right. However this means thread_struct will have to grow in order to hold each task's debug-register allocations and priorities. Would that be acceptable? (This might be a good reason to keep the number of bits down.) Another question: How can a program using the ptrace API ever give up a debug-register allocation? Disabling the register isn't sufficient; a user program should be able to store a watchpoint address in dr1, enable it in dr7, disable it in dr7, and then re-enable it in the expectation that the address stored in dr1 hasn't been overwritten. As far as I can see, ptrace-type allocations have to be permanent (until the task exits or execs, or uses some other to-be-determined API to do the de-allocation.) Alan Stern -
Well, noone seems to mind the wasted debugreg[5..6] words. ;-) I'm inclined to make thread_struct smaller than it is now by making it indirect (debugreg[8] -> one pointer). It feels like this would be pretty safe now that we have TIF_DEBUG anyway. Already nothing should be looking I hadn't really thought about this before, but it's pretty straightforward. Each allocation is for one of %dr[0-3] and for its associated bits in %dr7. %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc. %dr7 & (0x000f0003 << N) for %drN, I guess it is. ((((1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) | ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say. Each allocation owns those 38 bits (70 bits on x86_64). When all those bits are zero, the allocation is not active and might as well not be there (except for whatever semantics you might want to have about an allocation's lifetime as distinct from the event of resetting its contents). gdb already works this way: when it removes a watchpoint, it clears drN to zero when it zeros all the corresponding bits in dr7. (In fact it's in a separate call after changing dr7, because of the ptrace interface.) Your question made me think about the %dr6 issue, too, which I also hadn't thought about before. It looks like your patch handles this correctly, but it's a subtle point that I think warrants some comments in the code. When userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug and eventually looks at dr6 (via ptrace) to see what happened. Kernel watchpoints that come along after the user signal is generated can clobber the CPU %dr6 with new hits, but userland should not perceive this. This works out because what userland sees is thread.debugreg[6], the only place that sets it is do_debug, and a kwatch hit bails out before changing it. Any other new users of the debugreg sharing interface need to be cognizant of the %dr6 issue to avoid stepping on old ptrace uses. Thanks, Roland ...
Yes. In fact, the current existing code does not handle dr6 correctly. It never clears the register, which means you're likely to get into trouble when multiple breakpoints (or watchpoints) are enabled. Here's another complication -- and this is one I can't figure out any easy solutions for. The type of API we've been discussing will work well enough on UP systems, but what about SMP? I don't see any value in having a kernel watchpoint enabled on some CPUs and not others. But then what should happen when a debug register is in use by kwatch and a ptrace request on one CPU usurps it (leaving no other register in which to put it)? Not to mention the difficulties of keeping track of everything when the same kwatch watchpoint is stored in different debug registers on different CPUs. It's really quite a tricky matter. Should a register be allocated to kwatch only when no user process needs it? Should we really go about checking the requirements of every single process whenever a kwatch allocation request comes in? What if the processes which need a particular register aren't running -- should the register then be given to kwatch? What if one of those processes then does start running on one CPU? Alan Stern -
This is a subtle change from the existing ABI, in which userland has to clear %dr6 via ptrace itself. But gdb never does that AFAICT. So it's in fact subject to confusion when two watchpoints are set and the second hits after the first. So gdb ought to be fixed to clear dr6 via ptrace, to work with existing and older kernels. I don't think I really object to the ABI change of clearing %dr6 after an exception so that it does not accumulate multiple results. But first I'll have to convince myself that we never actually do want to accumulate multiple results. Hmm, I think we can, so maybe I do object. If you set two watchpoints inside a user buffer and then do a system call that touches both those addresses (e.g. read), then you will go through do_debug (to send_sigtrap) twice before returning to user mode. When the syscall is done, you'll have a pending SIGTRAP for the debugger to handle. By looking at your %dr6 the debugger can see that both watchpoints hit. (gdb does not handle this case, but it should.) Am I wrong? So this gets to the more complicated view of %dr6 handling that I had first had in mind yesterday. Each allocation "owns" one of the low 4 bits in %dr6 too. Only the dr6 bits owned by the userland "raw" allocation (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6]. So when kwatch swallows a debug exception, it should mask off its bit from %dr6 in the CPU, but not clear %dr6 completely. That way you can have a sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one system call (including interrupt handlers), and when it gets to the To "go about checking the requirements of every single process" is not so hard as it sounds when they're recorded as a single global use count per slot, as your original code does. When you mentioned a "your allocation is available" callback, I was thinking it might come to that being called inside context switch. It's all rather tricky, indeed. The obvious answer is to start simple. ...
Going back to something you mentioned earlier... Yes, you are wrong -- although perhaps you shouldn't be. The current version of do_debug() clears dr7 when a debug interrupt is fielded. As a result, if a system call touches two watchpoint addresses in userspace only the first access will be noticed. This is probably a bug in do_debug(). It would be better to disable each individual userspace watchpoint as it is triggered (or even not to disable it at all). dr7 would be restored when the SIGTRAP is delivered. (But what if the user is blocking or ignoring SIGTRAP?) Moving on... I've worked out a plan for implementing combined user/kernel mode breakpoints and watchpoints (call them "debugpoints" as a catch-all term). It should work transparently with ptrace and should accomodate whatever scheme utrace decides to adopt. There won't need to be a separate kwatch facility on top of it; the new combined facility will handle debugpoints in both userspace and kernelspace. The idea is that callers can register and unregister a struct debugpoint, which contains fields for the type, length, address, and priority as well as three callback pointers (for installed, uninstalled, and triggered). The debug core will keep these structures sorted by priority and will allocate the available debug registers based on the priorities of the userspace and kernelspace requests. Each CPU will have its own array of pointers to these structures, indicating which debugpoints are currently enabled. To work with ptrace, the new scheme will completely virtualize the debug registers. So for example, a ptrace call might request a debugpoint in dr0, but it could end up that the physical register used is really dr2 instead. The various bits in dr6 and dr7 will be mapped in such a way that the entire procedure is transparent to the user. Debugpoints registered in kernelspace or by utrace won't care, of course. There are two things I am uncertain about: vm86 mode and kprobes. I...
The user blocking or ignoring it doesn't come up, because it's a force_sig_info call. However, a debugger will indeed swallow the signal through ptrace/utrace means. In ptrace, the dr7 is always going to get reset because there will always be a context switch out and back in that does it. But with utrace it's now possible to swallow the signal and keep going without a context switch (e.g. a breakpoint that is just doing logging but not stopping). So perhaps we should have a TIF_RESTORE_DR7 that goes into _TIF_WORK_MASK and gets handled in do_notify_resume (or maybe it's TIF_HWBKPT). You should not actually need to disable user watchpoints, because in data watchpoints the exception comes after the instruction completes. Only for instruction watchpoints does the exception come before the instruction executes, and no user watchpoints can be in the address range containing kernel code. SIGTRAP both doesn't queue, and doesn't give %dr6 values in its siginfo_t. All user watchpoints will be handled via the signal; this is the only way ptrace can report them, and is also the utrace way of doing things. do_debug can happen inside kernel code, and tracing of user-level tasks can only safely do anything at the point just before returning to user mode, where signals are handled. So, getting to send_sigtrap in do_debug is enough to say "one or more user debug exceptions happened". The %dr6 value that collects in the thread state to be seen by ptrace, or by utrace-based things using your new facility, needs to collect all the %dr6 bits that were set by the hardware and weren't consumed by kernel-level tracing. An eventual utrace-based thing might in fact have some other way to tie in so that the event details could just be in some call made by do_debug and not recorded in the thread's virtual %dr6 value. But at least for ptrace, they should collect there if it becomes possible for more than one exception to happen while in kernel mode or in a single user instruction. (A single ...
I think the best approach will be not to reset dr7 at all. Then there won't be any need to worry about restoring it. Leaving a userspace watchpoint enabled while running in the kernel ought not to matter; a system call shouldn't touch any address in userspace more than once or My idea was to put 4 hwbkpt structures in thread_struct, so they would always be available for use by ptrace. However it turned out not to be feasible to replace the debugreg array with something more sophisticated, because of conflicting declarations and problems with the ordering of #includes. So instead I have been forced to replace debugreg[] with a pointer to a structure which can be allocated as needed. This raises the possibility that a PTRACE syscall might fail because the allocation fails. Hopefully that won't be an issue? Alan Stern -
Hmm. That sounds reasonable. But I wonder why the old code clears %dr7. I think that's preferable anyway. Most tasks most of the time will never need that storage, so why not make thread_struct a little smaller? It's not a new issue, anyway, after utrace. The utrace-based ptrace can fail for PTRACE_ATTACH because of OOM too, which wasn't possible before. I think it's survivable. Thanks, Roland -
Roland and Prasanna: Here's my first attempt, lightly tested, at an hwbkpt implementation. It includes copious comments, so it shouldn't be too hard to figure out (if you read the files in the right order). The patch below is meant for 2.6.21-rc2; porting it to -mm shouldn't be very hard. There are still several loose ends and unanswered questions. I pretty much copied the existing code for handling vm86 mode and single-step exceptions, without fully understanding it. The code doesn't virtualize the BS (single-step) flag in DR6 for userspace. It could be added, but I wonder whether it is really needed. Unlike the existing code, DR7 is re-enabled upon returning from a debug interrupt. That means it doesn't have to be enabled when delivering a SIGTRAP. Setting user breakpoints on I/O ports should require permissions checking. I haven't tried to figure out how that works or how to implement it yet. It seems likely that some of the new routines should be marked "__kprobes", but I don't know which, or even what that annotation is supposed to mean. When CPUs go on- or off-line, their debug registers need to be initialized or cleared. I did a little bit of that, but more is needed. In particular, CPU hotplugging and kexec have to take this into account. The parts relating to kernel breakpoints could be made conditional on a Kconfig option. The amount of code space saved would be relatively small; I'm not sure that it would be worthwhile. Probably there are some more issues I haven't thought of. Anyway, let me know what you think. Alan Stern Index: 2.6.21-rc2/include/asm-i386/hwbkpt.h =================================================================== --- /dev/null +++ 2.6.21-rc2/include/asm-i386/hwbkpt.h @@ -0,0 +1,185 @@ +#ifndef _I386_HWBKPT_H +#define _I386_HWBKPT_H + +#include <linux/list.h> +#include <linux/types.h> + +/** + * struct hwbkpt - unified kernel/user-space hardware breakpoint + * @node: inte...
There is only one TF flag, it and the DR_STEP bit in dr6 are part of the unitary thread state. You should not be doing anything at all on I would just leave the I/O breakpoint feature out for the first cut. See if there is demand for it. It requires setting CR4.DE, which we don't do. The validation is relatively simple, comparing against the io_bitmap_ptr data (if any). You can check at insertion time (requiring doing ioperm before inserting an I/O breakpoint), and then check again at exception time if the hit was in kernel mode and ignore it for a user-only breakpoint, or perhaps check again and eject the breakpoint if it's been cleared from the That annotation is for code that is in the kprobes maintenance code path (where you cannot insert kprobes). do_debug has it for the notify_die path that might lead to kprobes. The only thing in your code that should need it is your notifier function, which might get called for an exception that turns out to be a kprobes single-step. There shouldn't In a utrace merge, the user parts can be made conditional on CONFIG_UTRACE. Then with both turned off, the code goes away completely. It's unlikely it will ever be turned off, but it is a clean way to go about things in case Save space in the struct by having just one function for both installed and uninstalled, taking an argument. Probably a caller should be able to pass a null function here to say that the registration call should fail if it can't be installed due to higher-priority or no-callback registrations existing, and that its registration cannot be ejected by another (i.e., an Leave this out. Let the caller embed struct hwbkpt in his larger struct These are the kinds of constraints utrace is there to make doable. (I'm assuming that guarantee is really "will not start running in user mode", since SIGKILL is always possible.) In a utrace merge, I think we want the only entry points for this to be a utrace-based interface that explicitly requires utrace's notion o...
Presumably you mean that hw-breakpoint.c shouldn't do anything at all on single-step exceptions. do_debug does have to know about them, because it has to know whether or not to send a SIGTRAP. Separation of duties; I It's easier simply to ignore the whole issue. :-) Fortunately many other So far I've been developing under 2.6.21-rc, which doesn't have utrace. But eventually this will be submitted by way of -mm, which does. The easiest approach would be to make the whole thing conditional on The actual guarantee I need is that nobody will switch_to() the task while Yes; I had in mind that the user-breakpoint part would be called only by utrace and/or ptrace. That's why the routines aren't EXPORTed. But I wrote it in more general form, to make it easier to understand. I'll add If someone really needs to do that, they can always put their own call to (un)register_kernel_hwbkpt() at the entry(exit) to the complex subsystems. Or perhaps it should be a job for systemtap, which would use hwbkpt to do That may be so, but the only way to access that part of the state is via ptrace. Think of it this way: The debug register settings really should not be part of the thread's virtual state. If we had some other, more logical API for managing breakpoints in a task then ptrace_bps[] wouldn't That would make things much more complicated. It's a lot easier to treat all breakpoints as though they are the same under the hood. Besides, the extra memory usage will show up only on threads that are Another thing to watch out for is that we would have to allow length-8 That doesn't seem quite right. What if you encounter _both_ a single-step exception and a breakpoint at the same time? Probably kprobes should always return NOTIFY_DONE. Which implies that do_debug needs to decide whether or not to issue SIGTRAP. Presumably the condition will be that any of the DR_STEP or DR_TRAPn bits remain set after the notifier chain has run. This means ...
You can't get that. It can always be woken for SIGKILL (which is a good thing). What you are guaranteed is that if it does, it will never return to user mode. So it has to be ok for switching in to use the bits in any intermediate state you might get them, meaning any possible garbage state is harmful only to user mode or is otherwise recoverable (worst case perhaps the exception handler has to know to ignore some traps). This is already true with ptrace and ->thread.debugreg, as well as the normal user registers. In your case, if you wanted to be paranoid you could clear TIF_DEBUG before you touch anything, and set it again only after you're But you don't have an option to avoid interrupting other CPUs to update, which is not necessary or desireable for this usage. That's what I was As things are in utrace, there will continue to be a utrace method of setting the (virtual) "raw" debugregs, even if ptrace per se is not involved. (So all I'm saying really is I'm on a personal campaign against the letter P.) OTOH, your point is well taken. Once your stuff is integrated, there is no real reason that thread-virtualized "raw" debug registers need to be accessible via utrace_regset. Perhaps I should drop it. Then those calls Yeah, I guess that's right. It should still return NOTIFY_STOP when args->err has no other bits set, so notifiers aren't called with zero. Thanks, Roland -
Guess I'll have to take that approach. The new additions to __switch_to() follow a linked list, and obviously it's not safe to do that while the I see your point. If you want to monitor a certain location in kernel space but only while in the context of a specific task, the new framework doesn't provide any way of doing it. One approach could be to use a regular kernel breakpoint and return immediately if the task of interest In practice that might not work. On my machine, at least, reads of DR6 return ones in all the reserved bit positions. Alan Stern -
Does that mean asm("mov %1,%%dr6; mov %%dr6,%0" : "=r" (mask) : "r" (0));
puts in mask the set of reserved bits? We could collect that value at CPU
startup and mask it off args->err, then OR it back into vdr6.
Thanks,
Roland
-That sounds like a rather fragile approach to avoiding a minimal amount of work. Debug exceptions don't occur very often, and when they do it won't matter too much if we go through some extra notifier-chain callouts. It turns out that this won't work correctly unless I use something stronger, like a spinlock or RCU. Either one seems like overkill. Is there any way to find out from within the switch_to_thread_hw_breakpoint routine whether the task is in this unusual state? (By which I mean the task is being debugged and the debugger hasn't told it to start running.) Would (tsk->exit_code == SIGKILL) work? If not, can we add a TIF_DEBUG_STOPPED flag? Or should I just go with a spinlock? Is SIGKILL the only way this can happen? In a similar vein, I need a reliable way to know whether a task has gone through exit_thread(). If it has, then its hw_breakpoint area has been deallocated and a new one must not be allocated. Will (tsk->flags & PF_EXITING) always be true once that happens? Alan Stern -
When single-stepping occurs it happens repeatedly many times, and that What is the problem with just clearing TIF_DEBUG? It just means that in the SIGKILL case, the dying thread won't switch in its local debugregs. The global kernel allocations will already be set in the processor from the previous context, and old user-address allocations do no harm since we That won't necessarily work. There isn't any cheap check that won't also If it's really necessary, but it hasn't proved so for any other switched per-thread state. As long as you aren't doing per-thread kernel-mode It should be, but there might be some stray wake_up_process calls in the kernel that can violate [up]trace's supposed monopoly on TASK_TRACED (or duopoly with SIGKILL, I suppose I should say). If there is no SIGKILL, then the task will just call schedule again nearly immediately to go back to blocking, which will switch out unless there is a second wakeup right PF_EXITING it set after there is no possibility of returning to user mode, but a while before exit_thread, when you might still want kernel-mode breakpoints. If the only per-thread allocations you support are for user mode, then you can certainly refuse to do any when PF_EXITING is set. Thanks, Roland -
Well, I can add in the test for 0, but finding the set of always-on bits in DR6 will have to be done separately. Isn't it possible that different Here's what might happen. Let's say T is being debugged by D, when somebody sends T a SIGKILL. CPU 0 does a __switch_to() to take care of terminating T, and since TIF_DEBUG is still set, it runs the switch_to_thread_hw_breakpoint() routine. The routine begins to install the debug registers for T. At that moment D, running on CPU 1, decides to unregister a breakpoint in T. Clearing TIF_DEBUG now doesn't do any good -- it's too late; CPU 0 has already tested it. CPU 1 goes in and alters the user breakpoint data, maybe even deallocating a breakpoint structure that CPU 0 is about to read. Not a good situation. What I need is some way for CPU 0 to know from the start that the current task is about to die, so it can avoid installing the user breakpoints altogether. Or else there has to be some sort of mutual exclusion so that No way to tell when a task being debugged is started up by anything other than its debugger? Hmmm, in that case maybe it would be better to use RCU. It won't add much overhead to anything but the code for registering Okay, that's easy enough. Below is the current version of the patch, against 2.6.21-rc3. The source file include/asm-i386/hw_breakpoint.h includes an example showing how to set up a kernel breakpoint. Alan Stern Index: 2.6.21-rc2/include/asm-i386/hw_breakpoint.h =================================================================== --- /dev/null +++ 2.6.21-rc2/include/asm-i386/hw_breakpoint.h @@ -0,0 +1,193 @@ +#ifndef _I386_HW_BREAKPOINT_H +#define _I386_HW_BREAKPOINT_H + +#include <linux/list.h> +#include <linux/types.h> + +/** + * + * struct hw_breakpoint - unified kernel/user-space hardware breakpoint + * @node: internal linked-list management + * @triggered: callback invoked when the breakpoint is hit + * @installed: callback invoked when the b...
I don't know, but it seems unlikely. AFAIK all CPUs are presumed to have
Indeed, it is for this sort of thing. Still, it feels like a bit too much
is going on in switch_to_thread_hw_breakpoint for the common case.
It seems to me it ought to be something as simple as this:
if (unlikely((thbi->want_dr7 &~ chbi->kdr7) != thbi->active_tdr7) {
/* Need to make some installed or uninstalled callbacks. */
if (thbi->active_tdr7 & chbi->kdr7)
uninstalled callbacks;
else
installed callbacks;
recompute active_dr7, want_dr7;
}
switch (thbi->active_bkpts) {
case 4:
set_debugreg(0, thbi->tdr[0]);
case 3:
set_debugreg(1, thbi->tdr[1]);
case 2:
set_debugreg(2, thbi->tdr[2]);
case 1:
set_debugreg(3, thbi->tdr[3]);
}
set_debugreg(7, chbi->kdr7 | thbi->active_tdr7);
Only in the unlikely case do you need to worry about synchronization,
whether it's RCU or spin locks or whatever. The idea is that breakpoint
installation when doing the fancy stuff would reduce it to "the four
breakpoints I would like, in order" (tdr[3] containing the highest-priority
one), and dr7 masks describing what dr7 you were using last time you were
running (active_tdr7), and that plus the enable bits you would like to have
set (want_dr7). The unlikely case is when the number of kernel debugregs
consumed changed since the last time you switched in, so you go recompute
active_tdr7. (Put the body of that if in another function.)
For the masks to work as I described, you need to use the same enable bit
(or both) for kernel and user allocations. It really doesn't matter which
one you use, since all of Linux is "local" for the sense of the dr7 enable
bits (i.e. you should just use DR_GLOBAL_ENABLE).
It's perfectly safe to access all this stuff while it might be getting
overwritten, and worst case you switch in some user breakpoints you didn't
want. That only happens in the SIGKILL case, when you never hit user mode
I don't unders...Yes, the code could be reworked by moving some of the data from the CPU hw-breakpoint info into the thread's info. I'll see how much simpler it It isn't quite that easy. Even though the number of user breakpoints may not have changed, their identities may have. So the unlikely case has to encompass two possibilities: the number of installable user breakpoints This shouldn't be necessary. So long as DR_GLOBAL_ENABLE always belongs to the kernel's part of DR7 and DR_LOCAL_ENABLE always belongs to the Maybe. I always had in the back of my mind the possibility that there might be a user I/O breakpoint set. It could be triggered by an interrupt handler even in the SIGKILL case. But since we're not supporting I/O Actually the code _doesn't_ already know what's there; the chbi area doesn't include any storage for the kernel DR7 value. I figured it was at least as easy to read it from the CPU register as to read it from memory. But maybe that's not true; according to my ancient processor manual, moves to/from debug registers take many more clock cycles than moves to/from Okay. That test was for situations when it's necessary to install only the kernel's debug registers, with no thread registers. It can easily I was being overly cautious. If interrupts were enabled then it might be possible to trip a user I/O breakpoint. But since they aren't, that code No. If a debugger has removed some user breakpoints since the last time the thread ran, the chbi->bps[] entries could still be present. Likewise if the previously-running task had more breakpoints than the current one. Of course it doesn't actually hurt to have stale pointers lying around; they refer to breakpoints which aren't enabled. So clearing them isn't This proposed change is wrong. Remember that ptrace breakpoints are virtualized; they are assigned to different debug registers from what the I don't like using DR_LEN_1, because it would force asm/debugreg.h to be #included by any user o...
I don't quite understand that characterization of the kind of change I'm advocating. If the common case path in context switch has really anything Why does it matter? When a new user breakpoint was made the highest-priority one, it ought to update tdr[0..3] right then before the registration call returns. It seems fine to me for it to make an uninstalled callback right away rather than at the thread's next switch-in. But even if you wanted to delay it, you could just set active_dr7 to zero The plan I suggested relies on setting want_dr7 with the enable bits that do include the ones the kernel uses (for contested slots). Of course it works as well to use either bit for this, as long as you're consistent. But as I've said at least twice already, there is no actual meaning whatsoever to choosing one enable bit over the other. It's just confusing and misleading to have the code make special efforts to set one rather than the other for different cases. You talk about them as if they meant something, which keeps making me wonder if you're confused. Since the hardware doesn't care which bit you set, you could overload them to record a bit and a half of information there if really wanted to, but you're not How would that happen? This would mean that some user process has been allowed to enable ioperm for some io port that kernel drivers also send to The purpose of the chbi area is to optimize this path. Make it store whatever precomputed values are most convenient for the hot paths. This path doesn't need num_kbps, it needs kdr7. So precompute that and do that one load, instead of a load of chbi->num_bkps we don't otherwise need plus a load from kdr7_masks that can be avoided altogether on hot paths. I don't really know about the slowness of reading debug registers, though I would guess it is slower than most common operations. But regardless, you can avoid it because kdr7 is something you need anyway, so you're not I don't really get why user breakpoints would be i...
Roland:
Here's the most recent version of the hw-breakpoint patch. Unlike the
version I posted last week, this one actually works with 2.6.21-rc4.
Alan Stern
Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,17 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* Available HW breakpoint lengths */
+#define HW_BREAKPOINT_LEN_1 1
+#define HW_BREAKPOINT_LEN_2 2
+#define HW_BREAKPOINT_LEN_4 4
+#define HW_BREAKPOINT_LEN_EXECUTE 1
+
+/* Available HW breakpoint types */
+#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */
+#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read or write */
+
+#endif /* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -58,6 +58,7 @@
#include <asm/tlbflush.h>
#include <asm/cpu.h>
#include <asm/pda.h>
+#include <asm/debugreg.h>
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
@@ -359,9 +360,10 @@ EXPORT_SYMBOL(kernel_thread);
*/
void exit_thread(void)
{
+ struct task_struct *tsk = current;
+
/* The process may have allocated an io port bitmap... nuke it. */
if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
- struct task_struct *tsk = current;
struct thread_struct *t = &tsk->thread;
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -379,15 +381,17 @@ void exit_thread(void)
tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
put_cpu();
}
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
}
void flush_thread(void)
{
struct task_st...Roland: Here's the next update. I haven't tried testing it yet; this is just to get your opinion. I implemented most of the changes we discussed. Ignoring the length for execute breakpoints turned out not to be a good idea because it would affect the way ptrace works, so the code verifies it just like any other kind of breakpoint. I also decided against adding a .bits member. It doesn't really gain very much; the savings in encoding the breakpoint values is trivial -- one line of code on i386. And it helps to have the original length and type values available for use by the ptrace routines. In fact, I decided to add a superfluous bit to the type code. That's to help disambiguate between length and type values; it's easy to mix the two of them up. Likewise, the length macros don't give the encoded values; that's so people can just specify the length directly instead of using the macro. There's a small question about the value of the error_code argument for send_sigtrap(). The value passed into do_debug() isn't available in ptrace_triggered() -- but since it is always 0, that's what I'm using. I'm not sure what it's supposed to mean anyway. Anyway, this version seems to be a fair amount cleaner than the previous. See what you think. Alan Stern Index: usb-2.6/include/asm-i386/hw_breakpoint.h =================================================================== --- /dev/null +++ usb-2.6/include/asm-i386/hw_breakpoint.h @@ -0,0 +1,17 @@ +#ifndef _I386_HW_BREAKPOINT_H +#define _I386_HW_BREAKPOINT_H + +#include <asm-generic/hw_breakpoint.h> + +/* Available HW breakpoint lengths */ +#define HW_BREAKPOINT_LEN_1 1 +#define HW_BREAKPOINT_LEN_2 2 +#define HW_BREAKPOINT_LEN_4 4 +#define HW_BREAKPOINT_LEN_EXECUTE 1 + +/* Available HW breakpoint types */ +#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */ +#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */ +#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read ...
It's easy to explain: Your sample code had a tdr[] array in the thread's hw_breakpoint area and my patch did not; adding it amounts to moving some That's basically what I intend to do. Although instead of keeping track of an active_dr7, I'll keep track of num_kbps as of the last time the thread ran. The unlikely case can be triggered by setting the value to No, I'm not confused and neither are you. I realize there's no functional difference between the two sets of enable bits, since Linux doesn't use hardware task-switching. I just like to keep things neatly separated, I haven't checked the ioperm code to be certain, but it seems like the sort of thing somebody might want to do on occasion. Another aspect perhaps worth mentioning is that user breakpoints are active only when the task is running. Hence breakpoints for user data affected by AIO operations won't necessarily be triggered; with async I/O the transfers can occur while a different task is running. Of course there's nothing new about this; the same has been true for ptrace all I will endeavor to optimize switch_to_thread_hw_breakpoint. However it will turn out that the hot patch really does to use chbi->num_kbps and Not if you have interrupts disabled. Debug register settings are disseminated from one CPU to the others by means of an IPI. Alan Stern -
Sorry I've been slow in responding to your most recent version. I don't really know what more to say. I like things neatly separated too, most especially between clearly meaningful and misleading yet apparently meaningful. Overloading unrelated hardware bits for no material purpose I checked the ioperm code. As far as I can tell, if you have CAP_SYS_RAWIO then you can use ioperm to enable any port you want, so this will always be But the callback is not per-CPU, it is per-registration. When a new registration displaces an old one, or a deregistration stops displacing one, isn't the status change in the data structure and the callback made right away, by the thread doing the registration/deregistration call? Another CPU with interrupts disabled won't have its breakpoints actually I think this is a mistake. On other machines, there is no need for more than one field and .len will go unused. There is no reason not to encode ahead of time. If it's useful for callers to be able to extract the type and length from a struct hw_breakpoint, it's easy to define macros or There's no reason __modify_user_hw_breakpoint can't use an encoded value. It doesn't hurt to add some constant bits to the API values that you just mask out (after checking) as part of the encoding convsersion. In fact, I object to this. The purpose of having macros is to give a constrained set of values that can be used at compile time. Letting people use literal numbers is just asking for people to write their calls unportably. For machines that support only LEN8, I'd planned to define it to some magic bit pattern just so it could be checked and rule out cavalier users who task->thread.trap_no and task->thread.error_code usually store the values from the hardware trap frame when a trap is turned into a signal (e.g. see do_trap). This is the hardware trap number, which is 1 for do_debug. The error code is a word of the hardware trap frame whose meaning is specified differently for each trap n...
Has the same thing happened again? There hasn't been any feedback on the most recent version of hw_breakpoint emailed on April 13: http://marc.info/?l=linux-kernel&m=117661223820357&w=2 I think there are probably still a few small things wrong with it. For instance, the RF setting isn't right; I misunderstood the Intel manual. It should get set only when the latest debug interrupt was for an instruction breakpoint. Alan Stern -
This makes me think about RF a little more. If you ever set it, there are some places we need to clear it too. That is, when the PC is being changed before returning to user mode, which is in signals and in ptrace. If the PC is changing to other than the breakpoint location hit by the handler that set RF, we need clear RF so that the first instruction at the changed PC can be a breakpoint hit of its own and not get masked. In fact, it may also be necessary to clear RF when freshly setting a new instruction breakpoint (when RF is set because the stop was not a debug exception at I sort of wondered from the beginning why it was there. The rationale I can see is to avoid flutter. That is, when unregistering frees up a slot for a lower-priority allocation waiting in the wings, and then the new registration will just displace it again. The priority list diddling is wasted work to get back to just how it was before, but more importantly you don't want to have those callbacks for a momentarily-available slot coming The documentation I have says that RF is set in the trap frame on the stack (i.e. pt_regs.eflags) by every other kind of exception. However, for a debug exception that is due to an instruction breakpoint, RF=0 in the trap frame and the manual explicitly says that the handler must set the bit so that iret will resume and execute it rather than hit the breakpoint again. Is this really "some CPUs"? Or is it actually always as I described above The important thing is that there aren't any difficult races (i.e. what you get with callbacks). If register with no callback followed by unregister on seeing "registered but not installed" return value is simple and cheap, It's no less meaningful than for a kernel allocation. In neither case is there a guarantee you'll keep it forever. What callers I had in mind want is a quick answer when the answer is negative at the time of the call, so It looks right to me. That is, it preserves the existing behavior for ...
I took a look at seqlock.h. It turns out not to be a good match for my requirements; the header file specifically says that it won't work with data that contains pointers. But changing over to regular 32-bit sequence numbers was straightforward. The "switching"/"restart" stuff doesn't need memory barriers because all the communication is between two routines on the same CPU. Nor are memory barriers needed in the rest of the code for the kernel breakpoint updates; the IPI mechanism already provides its own. However there is one oddball case which does seem to require a memory barrier: when a new CPU comes online (either for the first time or during return from hibernation). There's a hook to load the initial debug register values, and it runs in an atomic context so I can't grab the mutex. The hook is called in two places: arch/i386/power/cpu.c: fix_processor_context(), and arch/i386/kernel/smpboot.c: start_secondary(). A memory barrier is necessary to avoid chaos if another CPU should happen to update the kernel breakpoint settings at the same time. If Okay, if I don't worry about machines with two sets of code & data debug registers (HB_TYPES_NUM = 2) then yes, quite a lot of the code is sharable. There will be a few arch-specific hooks to: Store the values into the debug registers; Take care of the DR7 calculations; Do address limit verification (see whether a pointer lies in user space or kernel space). Nothing more seems to be needed. Then there will be unsharable code, including: Dumping the debug registers while creating an aout-type core image; All the legacy ptrace stuff; The notify-handler itself. How should this be arranged so that it can build okay on all platforms, even ones where the low-level support code hasn't been written? Maybe an arch-dependent CONFIG_HW_BREAKPOINT option? Alan Stern -
It seems to me that signal handlers must run with a copy of the original EFLAGS stored on the stack. Otherwise, when the handler returned the former context wouldn't be fully restored. But I don't know enough about the signal handling code to see how to turn off RF in the stored EFLAGS image. Also, what if the signal handler was entered as a result of encountering an instruction breakpoint? In that case you would want to keep RF on to prevent an infinite loop. You're right about wanting to clear RF when changing the PC via ptrace or when setting a new execution breakpoint (provided the new breakpoint's address is equal to the current PC value). Do you know how gdb handles instruction breakpoints, and in particular, That may be what I originally had in mind; I no longer remember. But it doesn't matter. We're up against an API incompatibility here. gdb doesn't allow you to modify breakpoints; it forces you to delete the old one and add a new one. It's only an artifact of the x86 architecture that gdb implements this by reusing debug registers. So even if the modify_user_hw_breakpoint() routine were kept, gdb wouldn't really want to make use of it. On the 386, either GE or LE had to be set for DR breakpoints to work properly. Later on (I don't remember if it was in the 486 or the Pentium) this restriction was removed. I don't know whether those bits do anything My 80386 Programmer's Reference Manual says: ... an instruction-address breakpoint exception is a fault. And: When it detects a fault, the processor automatically sets RF in the flags image that it pushes onto the stack. And: The processor automatically sets RF in the EFLAGS image on the stack before entry into any fault handler. Upon entry into the fault handler for instruction address breakpoints, for example, RF is set in the EFLAGS image on the stack... That seems to be pretty clear. So the behavior can vary according to the I suppose you might register a breakp...
Of course. I'm talking about how the registers get changed to set up the signal handler to start running, not how the interrupted registers are saved on the user stack. There is no issue with the stored eflags image; This does not happen in reality. Breakpoints can only be set by the Starting a signal handler is "warping the PC" equivalent to changing it via ptrace for purposes of this discussion. In case the new PC is the site of AFAICT it never actually uses hardware instruction breakpoints, only data watchpoints. I wouldn't be surprised if noone has ever really used instruction breakpoint settings in x86 hardware debug registers on Linux. (Frankly, I don't much expect them to start either. This level of detail about instruction breakpoints is largely academic. I am a stickler for getting the details right if we're going to allow using them at all. I'm moderately sure they do nothing on modern CPUs. Intel says they're ignored as of Pentium, but recommends setting both bits if you care at all. In practice, I don't think we'll ever hear about the inexactness on a pre-Pentium processor from not setting the bits. But I'd follow the Intel The earlier quote I gave was from an AMD64 manual. A 1995 Intel manual I have says, "All Intel Architecture processors manage the RF flag as follows," and proceeds to give the "all faults except instruction breakpoint" behavior I quoted from the AMD manual earlier. Hence I sincerely doubt that this varies among Intel and AMD processors. Someone else will have to help us know about other makers' processors. So far I have no reason to suspect that Yes, that is really the kind of thing I had in mind. For user breakpoints it I think that is what I suggested an iteration or two ago. Installing new state means making a fresh data structure and installing a pointer to it, I guess my main objection to having .type and .len is the false implied documentation of their presence and names, leading to people thinking they can l...
Ah, okay. Yes, clearly the new EFLAGS for the signal handler should have RF turned off. This should always be true, regardless of Hmmm. I put in a little extra code to account for the possibility that a program might want to set hardware breakpoints in itself. Should The fact that they are machine-specific and implementation-specific No; I have tested only a couple of systems and I don't have a wide Allow me to rephrase: When a debug exception occurs, the real DR6 value should be copied to vdr6, except that kprobes should adjust DR_STEP and hw_breakpoint should adjust the DR_TRAPn bits appropriately. There's some question about what value the debug exception handler should write back to DR6, if anything. When switching to a new task, the DR_TRAPn bits in vdr6 could be de-virtualized somehow and the result loaded into DR6, but again, it might be safest to leave DR6 alone. As for what users expect of the low four bits, you are definitely wrong. My tests with gdb show that it relies on the CPU to clear those bits whenever a data breakpoint is hit; it doesn't clear them itself and it doesn't work properly if the kernel keeps virtualized versions of them set. That's on a Pentium 4 and on an AMD Duron. I did some testing to see how the CPU behaves when the debug handler writes different values back to DR6. The results were: Values written back to DR6 were retained in the register until the next debug exception occurred. When the exception handler read DR6, the 0xffff0ff0 bits were set every time. The 0x00001000 bit was never set, even if it had been turned on before the exception occurred. No matter what values were stored in the low four bits beforehand, when the exception occurred DR6 had only the bit for the debug register which was triggered. If the handler wrote back any of BS, BT, or BD to DR6, then the system misbehaved. I don't know exactly what happened, but my shell process ended and the debug handler got called over and ove...
Do you just mean a register_hw_breakpoint call made on current? That certainly ought to work. That's still "the debugger", i.e. in utracespeak the tracing engine. My point was that there will never be a facility intended for a program to use hw_breakpoint to generate a signal that gets delivered to a handler in the vanilla way. There's always some "outside" agent who asked for the breakpoint and who is responsible for responding to the traps it causes, never the program itself so as it would make sense for The code in bp_show is exactly the kind of wrong I want to prevent. When I say they are machine-specific and implementation-specific, I mean there is no specified part of the interface to which you can presume they correspond directly. The powerpc implementation will not have any field that is set to HW_BREAKPOINT_LEN_8 and may well have none set to the type macros either. If you want to have some machine-specific macros or inlines to Ok. We were both going on what the manual said and I was assuming that Ok. This makes the users' expectations make sense. Maybe we can get the Intel and AMD people to change the manual not to be misleading about this (it says something terse about "never clears" and without more details I read it as "never clears any bit, ever"). What about DR_STEP? i.e., if DR_STEP was set from a single-step and then there was a DR_TRAPn debug exception, is DR_STEP still set? If DR_TRAPn Agreed. I suspect clearing it to zero is the right thing (given what the hardware manuals say), even if it appears that DR_STEP and DR_TRAPn do I think you should post that little patch (and equivalent for x86_64) by There is no black magic about that, it's just saying that seqlock/seqcount does not do any implicit synchronization with your data structure management. If the pointers in question are protected by RCU, there is no problem (if your read_seqcount_retry loop is inside rcu_read_lock). Since the caller supplies the pointers, not requir...
I really don't understand your point here. What's wrong with bp_show? Is it all the preprocessor conditionals? I thought that was how we had agreed portable code should determine which types and lengths were supported on a particular architecture. Consider that the definition of struct hw_breakpoint is in include/asm-generic/. Hence .type and .len are guaranteed to be present on all architectures; we can't just leave them out on some while including them on others. In particular, .len _will_ always be equal to HW_BREAKPOINT_LEN_8 on PPC. (Of course, you're always free to define HW_BREAKPOINT_LEN_8 as 0 in the arch-specific header file if you want, so this doesn't mean as much as it might seem.) Consider also that .type and .len impose no overhead on architectures that don't care about them. The space they use up would be wasted otherwise. It seems that what you want would complicate the x86 implementations significantly without offering any real benefit to others. The one thing which makes sense to me is that some architectures might want to store type and/or length bits in along with the address field. So I added documentation explaining that there may be arch-specific changes to .address while a breakpoint is registered, and I added arch-specific accessors to fetch the true address value. There are I didn't experiment with using DR_STEP. There wasn't any simple way to cause a single-step exception. Perhaps if I were more familiar with Even more surprising was that it stopped and settled back down to normal after a little while! I'm not accustomed to seeing infinite In fact, I don't need the seqcount stuff at all. Just about everything it provides is already covered by RCU. One of the secrets is to move The other secret is to have shared access only to the global data in the RCU-protected structure, which means storing an array of pointers to the highest-priority kernel breakpoints there, as you suggest. The data which gets updated in ...
That part is fine. The problem is fetching the hw_breakpoint.len field
directly and expecting it to contain the API values. In an implementation
done as I've been referring to, there is no need for any field to contain
the HW_BREAKPOINT_LEN_8 value, and it's a waste to store one. If it were
Indeed, that is the natural thing (and all the bits needed) on several.
I hadn't raised this before since I was having so much trouble already
convincing you about storing things in machine-dependent fashion so that
users cannot just use the struct fields directly.
I really think it would be cleanest all around to use just:
struct arch_hw_breakpoint info;
in place of address union, len, type in struct hw_breakpoint. Then each
arch provides hw_breakpoint_get_{kaddr,uaddr,len,type} inlines. For
storing, each arch can define hw_breakpoint_init(addr, len, type) (or
maybe k/u variants). This can be used by callers directly if you want to
keep register_hw_breakpoint to one argument, or could just be internal if
register_hw_breakpoint takes the three more args. If callers use it
directly, there can also be an INIT_ARCH_HW_BREAKPOINT(addr, len, type)
for use in struct hw_breakpoint_init initializers.
On x86 use:
struct arch_hw_breakpoint_info {
union {
const void *kernel;
const void __user *user;
unsigned long va;
} address;
u8 len;
u8 type;
} __attribute__((packed));
It's easy for user mode with gdb. kprobes is simple to use, and it
always does a single-step to execute (a copy of) the instruction that
was overwritten with the breakpoint. So, write a module that does:
int testvar=0;
asm(".globl testme; testme: movl $17,testvar; ret");
void testme();
testinit() {
... register kprobe at &testme ...
... register hw_breakpoint at &testvar ...
testme()
}
Your kprobe handlers don't have to actually do anything at all, if you
are just hacking the low-level code so see what %dr6 values you get at
I still think it's the proper..."A waste to store one"? Waste of what? It isn't a waste of space; the It is now possible for an implementation to store things in a machine-dependent fashion; I have added accessor routines as you suggested. But I also left the fields as they were; the documentation mentions that they won't necessarily contain any particular values. You might want to examine the check in validate_settings() for address alignment; it might not be valid if other values get stored in the low-order bits of the address. This is a tricky point; it's not safe to mix bits around unless you know that the data values are correct, Maybe. I don't see any reason for the unnecessary encapsulation, Yes, of course. I feel foolish for having forgotten. Tests show that my CPU does not clear DR_STEP when a data breakpoint is hit. Conversely, the DR_TRAPn bits are cleared even when a single-step exception occurs. The bizarre behavior from before is still present; the system gets in a long loop when the exception handler leaves any of the 0xe000 bits set in DR6. And it kills my shell process, probably by sending it a SIGTRAP. Oddly enough, this only happens when there's a kernel-space debug exception -- faults in user-space continue to work normally. It's not clear what this means; the behavior indicates a software problem but the dependency on the DR6 value indicates a hardware contribution as well... If you're interested, I can send you the code I used to do this testing We have three things to consider: ptrace, utrace, and hw-breakpoint. Ultimately hw-breakpoint should become part of utrace; we might not want to bother with a standalone version. Furthermore, hw-breakpoint takes over the ptrace's mechanism for breakpoint handling. If we want to allow a configuration where ptrace is present and hw-breakpoint isn't, then I would have to add an alternate implementation containing only support for the legacy interface. It doesn't have to be done now, but it is something to bea...
I added this on top of your patch to make it compile (and look a little nicer).
With that, bptest worked nicely.
---
arch/i386/kernel/kprobes.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
Index: b/arch/i386/kernel/kprobes.c
===================================================================
--- a/arch/i386/kernel/kprobes.c
+++ b/arch/i386/kernel/kprobes.c
@@ -35,6 +35,7 @@
#include <asm/cacheflush.h>
#include <asm/desc.h>
#include <asm/uaccess.h>
+#include <asm/debugreg.h>
void jprobe_return_end(void);
@@ -660,16 +661,16 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
-
- /* The DR6 value is stored in args->err */
-#define DR6 (args->err)
-
- if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
- if ((DR6 & ~DR_STEP) == 0)
- ret = NOTIFY_STOP;
- }
+ /*
+ * The %db6 value is stored in args->err.
+ * If DR_STEP is the only bit set and it's ours,
+ * we should eat this exception.
+ */
+ if ((args->err & DR_STEP) &&
+ post_kprobe_handler(args->regs) &&
+ (args->err & ~DR_STEP) == 0)
+ ret = NOTIFY_STOP;
break;
-#undef DR6
case DIE_GPF:
case DIE_PAGE_FAULT:
-Roland:
Here's the next iteration. The arch-specific parts are now completely
encapsulated. validate_settings is in a form which should be workable
on all architectures. And the address, length, and type are passed as
arguments to register_{kernel,user}_hw_breakpoint().
I changed the Kprobes single-step routine along the lines you
suggested, but added a little extra. See what you think.
I haven't tried to modify Kconfig at all. To do it properly would
require making ptrace configurable, which is not something I want to
tackle at the moment.
The test for early termination of the exception handler is now back the
way it was. However I didn't change the test for deciding whether to
send a SIGTRAP. Under the current circumstances I don't see how it
could ever be wrong. (On the other hand, the code will end up calling
send_sigtrap() twice when a ptrace exception occurs: once in the ptrace
trigger routine and once in do_debug. That won't matter will it? I
would expect send_sigtrap() to be idempotent.)
Are you going to the Ottawa Linux Symposium?
Alan Stern
Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,49 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#define __ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+ unsigned long address;
+ u8 len;
+ u8 type;
+} __attribute__((packed));
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+ return (const void *) bp->info.address;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+ struct hw_breakpoint *bp)
+{
+ return (const void __user *) bp->info.address;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+ return bp->info.len;
+}
...I needed the attached patch on top of the bptest patch for the current code. Btw, that is a very nice little tester! Below that is a patch to go on top of your current patch, with x86-64 support. I've only tried a few trivial tests with bptest (including an 8-byte bp), which worked great. It is a pretty faithful copy of your i386 changes. I'm still not sure we have all that right, but you might as well incorporate this into your patch. You should change the x86_64 code in parallel with any i386 changes we decide on later, and I can test it and send you any typo fixups or whatnot. Thanks, Roland
I had already made some of those changes (the ones needed to make Right. I may update a few comments... Alan Stern -
I'll merge this with the rest of the patch. Alan Stern -
People usually read the documentation after the fields named like they can This is why I didn't bring up encoded addresses earlier on. :-) These kinds of issues are why I prefer unambiguously opaque arch-specific encodings. validate_settings is indeed wrong for the natural ppc encoding. The values must be set by a call that can return an error. That means you can't really have a static initializer macro, unless it's intended to mean "unspecified garbage if not used exactly right". I favor just going back I was not suggesting that. CONFIG_PTRACE would require HW_BREAKPOINT on This is incorrect. The usage of notify_die in all other cases, at least of machine exceptions on x86, is to test for == NOTIFY_STOP and when true short-circuit the normal effect of the exception (signal, oops). The notifiers should return NOTIFY_STOP if they consumed the exception wholly. So if a non-ptrace hw breakpoint consumed the exception and left no DR_TRAPn bits set, the thread would generate a second SIGTRAP from the prior single-step. Currently userland expects to have to clear DR_STEP in dr6 via ptrace itself, but does not expect it can get a duplicate SIGTRAP if it doesn't. Thanks, Roland -
Of course, calling register_kernel_hw_breakpoint() with three extra arguments is a waste of an instruction also, if one of those arguments isn't used. And yet it's not clear that either of these really is a waste. Suppose somebody ports code from x86 to PPC64 and leaves a breakpoint length set to HW_BREAKPOINT_LEN_4. Clearly we would want to return an error. This means that the length value _has_ to be tested, even if it won't be used for anything. And this means the length _has_ to be passed All right, I'll change it. And I'll encapsulate those fields. I still think it will accomplish nothing more than hiding some implementation It's below. The patch logs the value of DR6 when each debug interrupt occurs, and it adds another sysfs attribute to the bptest driver. The attribute is named "test", and it contains the value that the IRQ handler will write back to DR6. Combine this with the Alt-SysRq-P change already submitted, and you can get a clear view of what's going I see. So I could add a CONFIG_HW_BREAKPOINT option and make CONFIG_PTRACE depend on it. That will be simple enough. No, because do_debug always writes a 0 to DR6 after reading it; consequently DR_STEP does not remain set on later exceptions. Unless we do something like this we would never know whether we entered the handler because of a single-step exception or not. But the same effect could occur because of a bogus debug exception caused by lazy DR7 switching. I'll have to add back in code to detect that case. Alan Stern Index: usb-2.6/arch/i386/kernel/traps.c =================================================================== --- usb-2.6.orig/arch/i386/kernel/traps.c +++ usb-2.6/arch/i386/kernel/traps.c @@ -802,13 +802,17 @@ fastcall void __kprobes do_int3(struct p * find every occurrence of the TF bit that could be saved away even * by user code) */ +unsigned long dr6test; +EXPORT_SYMBOL(dr6test); + fastcall void __kprobes do_debug(struct pt_regs * regs...
It makes me a little happier, and I at least consider that a substantial Sure. There's no special reason to want to turn hw-breakpoint off, but You don't need to worry about that. Under utrace, CONFIG_PTRACE is already separate and can be turned off. I don't think we need really to Calling send_sigtrap twice during the same exception does happen to be harmless, but I don't think it should be presumed to be. It is just not the right way to go about things that you send a signal twice when there is one signal you want to generate. Also, send_sigtrap is an i386-only function (not even x86_64 has the same). Only x86_64 will share this actual code, but all others will be modelled on it. I think it makes things simplest across the board if the standard form is that when there is a ptrace exception, the notifier does not return NOTIFY_STOP, so it falls through to the existing SIGTRAP arch code. So, hmm. In the old do_debug code, if a notifier returns NOTIFY_STOP, it bails immediately, before the db6 value is saved in current->thread. This is the normal theory of notify_die use, where NOTIFY_STOP means to completely swallow the event as if it never happened. In the event there were some third party notifier involved, it ought to be able to swallow its magic exceptions as before and have no user-visible db6 change happen at the time of that exception. So how about this: get_debugreg(condition, 6); set_debugreg(0UL, 6); /* The CPU does not clear it. */ if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code, SIGTRAP) == NOTIFY_STOP) return; The kprobes notifier uses max priority, so it will run first. Its notifier code uses my version. For a single-step that belongs to it, it will return NOTIFY_STOP and nothing else happens (noone touches vdr6). (I think I'm dredging up old territory by asking what happens when kprobes steps over an insn that hits a data breakpoint, but I don't recall atm.) vdr6 belongs wholly to hw_breakpoint, no o...
Good. My earlier stubbornness was caused by a desire to allow static initializers, but now I see that specifying the values in the So far this work has all been based on the vanilla kernel. Should I What happens when there are two ptrace exceptions at different points during the same system call? Won't we end up sending the signal twice In theory we should get an exception with both DR_STEP and DR_TRAPn set, meaning that neither notifier will return NOTIFY_STOP. But if the kprobes handler clears DR_STEP in the DR6 image passed to the That sounds not quite right. To a user-space debugger, a system call should appear as an atomic operation. If multiple ptrace exceptions occur during a system call, all the relevant DR_TRAPn bits should be set in vdr6 together and all the other ones reset. How can we arrange that? There's also the question of whether to send the SIGTRAP. If extraneous bits are set in DR6 (e.g., because the CPU always sets some extra bits) then we will never get NOTIFY_STOP. Nevertheless, the I disagree. kfree() is documented to return harmlessly when passed a NULL pointer, and lots of places in the kernel have been changed to remove useless tests for NULL before calls to kfree(). This is just I figured there was some reason like that. Alan Stern -
I did the first crack at a powerpc port. I'd appreciate your comments on this patch. It should not be incorporated, isn't finished, probably breaks ptrace, etc. I'm posting it now just to get any thoughts you have raised by seeing the second machine share the intended arch-independent code. I just translated your implementation to powerpc terms without thinking about it much. If you see anything that you aren't sure is right, please tell me and don't presume there is some powerpc-specific reason it's different. More likely I just didn't think it through. In the first battle just to make it compile, the only issue was that you assume every machine has TIF_DEBUG, which is in fact an implementation detail chosen lately by i386 and x86_64. AFAIK the only reason for it there is just to make a cheap test of multiple bits in the hot path deciding to call __switch_to_xtra. Do you rely on it meaning something more precise than just being a shorthand for hw_breakpoint_info!=NULL? Incidentally, I think it would be nice if kernel/hw_breakpoint.c itself had all the #include's for everything it uses directly. arch hw_breakpoint.c files probably only need <asm/hw_breakpoint.h> and one or two others to define what they need before #include "../../../kernel/hw_breakpoint.c". The num_installed/num_kbps stuff feels a little hokey when it's really a flag because the maximum number is one. It seems like I could make it tighter with some more finesse in the arch-specific hook options, so that chbi and thbi each just store dabr, dabr!=0 means "mine gets installed", and the switch in is just chbi->dabr?:thbi->dabr or something like that. As we get more machines, more cleanups along these lines will probably make sense. (Also, before the next person not me or you tries a port, we could use for the generic hw_breakpoint.c to get some comments at the top making explicit what the arch file is expected to define in its types, etc.) With just the included change to the generic code fo...
