Hi all, I have finally got kmmio.c into shape so that it is built in when mmiotrace is built. I'd like to hear your comments on kmmio.c. kmmio.c handles the list of mmio probes with callbacks, list of traced pages, and attaching into the page fault handler and die notifier. It arms, traps and disarms the given pages, this is the core of mmiotrace. mmio-mod.c is a user interface, hooking into ioremap functions and registering the mmio probes. It also decodes the required information from trapped mmio accesses via the pre and post callbacks in each probe. Currently, hooking into ioremap functions works by redefining the symbols of the target (binary) kernel module, so that it calls the traced versions of the functions. The most notable changes done since the last discussion are: - kmmio.c is a built-in, not part of the module - direct call from fault.c to kmmio.c, removing all dynamic hooks - prepare for unregistering probes at any time - make kmmio re-initializable and accessible to more than one user - rewrite kmmio locking to remove all spinlocks from page fault path Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe() or is there a better way? The function called via call_rcu() itself calls call_rcu() again, will this work or break? There I need a second grace period for RCU after the first grace period for page faults. Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack that next. At some point I will start looking into how to make mmiotrace a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should make the user space part of mmiotracing as simple as 'cat /debug/trace/mmio > dump.txt'. Ftrace lives in sched-devel http://people.redhat.com/mingo/sched-devel.git/README and I have nothing for that yet. This patch is against torvalds/linux-2.6.git master. Home page of the out-of-tree version: http://nouveau.freedesktop.org/wiki/MmioTrace Thanks, Pekka. arch/x86/Kconfig.debug | 33 +++ ...
Hi Pekka. n is default so no need to be explicit. If you prefer being explcit then use: Sam --
Please feed the diff through scritps/checkpatch.pl and consider addressing It's a semaphore. Please do convert it to a mutex. Andy, I'd say that addition of new semaphores is worth a warning - they're rarely legitimate. --
I'm not sure that any semaphore should be a warning, but the initializer for semaphore used as binary mutex (DECLARE_MUTEX and init_MUTEX) are worth it. --
It looks like a mutex, it acts like a mutex, but it isn't a mutex, it's a trap for the unwary. Weird. I was annoyed by it before; now I see a fellow developer actually getting into that trap. I'd say, rename DECLARE_MUTEX to DECLARE_SEMAPHORE and let external code be fixed one way or another (i.e. stick with the "mutex" name or stick with the semaphore functionality if it's really needed). -- Regards, Pavel Roskin --
I like the fact that in evey architecture its defined as: #define DECLARE_MUTEX(name) __DECLARE_SEMAPHORE_GENERIC(name,1) -apw --
It's a semaphore used as mutex. Until we got struct mutex this was perfectly fine and now we're phasing it out and will hopefully get rid of it soon. It just takes some time to convert all users. --
Ok, so that would be the following, work for everyone? WARNING: mutexes are preferred for single holder semaphores #1: FILE: Z95.c:1: + DECLARE_MUTEX(&foo); WARNING: mutexes are preferred for single holder semaphores #3: FILE: Z95.c:3: + init_MUTEX(&foo); -apw --
yeah. Acked-by: Ingo Molnar <mingo@elte.hu> also i guess init_MUTEX_LOCKED() should emit a "this should be a completion" warning. i guess non-DEFINE_SPINLOCK old-style spinlock definition: spinlock_t lock = SPIN_LOCK_UNLOCKED; should emit a 'use DEFINE_SPINLOCK' warning as well? Ingo --
Thats easy enough. Though your tone here implies its less definatly wrong than the other use forms. Do we want gentle language here? Those (SPIN_LOCK_UNLOCKED & RW_LOCK_UNLOCKED) we already pick up and indicate are deprecated. -apw --
On Mon, 25 Feb 2008 14:49:22 -0800 I checked that, but I didn't think any of them were worth fixing. And since this is a work in progress and a in a state which I do not want I believe the 'volatile' belongs here, it is also in the declaration of These are all in testmmiotrace.c which calls the traced ioremap functions directly. These direct calls will go away and it will Since testmmiotrace does not use v for anything other than return value of ioread8/16/32(), I wanted to prevent the compiler from optimizing it away. Now thinking it again, ioread*() must guarantee that the read is always executed. I'll remove v altogether. Patch for the other issues you mentioned is brewing. Thanks. -- Pekka Paalanen http://www.iki.fi/pq/ --
Hey, Pekka, Should that text read something like: if (condition != DIE_TRAP || !ctx->active) Presumably you won't be active if something else is going wrong, but one That only checks the first page; if the probed region partially overlaps another one found later in memory, the registration will succeed. Maybe you want to decrement kmmio_count if you decide to return -EEXIST (or hold off on the increment until after the test)? In general, I worry about what happens if an interrupt handler generates traced MMIO traffic while a fault handler is active. It looks a lot like the "all hell breaks loose" scenario mentioned in the comments. Is there some way of preventing that which I missed? Otherwise, maybe, should the ioremap() wrappers take an additional argument, being an IRQ to disable while trace handlers are active? jon --
On Tue, 26 Feb 2008 10:20:08 -0700
The last thing in kmmio.c is:
static int kmmio_die_notifier(struct notifier_block *nb, unsigned long val,
void *args)
{
struct die_args *arg = args;
if (val == DIE_DEBUG)
if (post_kmmio_handler(arg->err, arg->regs) == 1)
return NOTIFY_STOP;
return NOTIFY_DONE;
}
so post_kmmio_handler() is not the die_notifier callback, and the handler
is limited to debug calls, which I assume can only come from
single-stepping. The variable 'condition' is something that on x86_64 comes
from get_debugreg(condition, 6); in arch/x86/kernel/traps_64.c do_debug().
True. I am assuming ioremap*() cannot return regions that overlap in
their kernel virtual addresses. Clearly this does not hold for the ISA
region, but tracing the ISA region has triggered hard freezes, so we
specifically exclude that.
Even if ioremap*() would return overlapping areas, I don't think it would
break things. It just means there are two probes on the same area. If an
MMIO access hits that area, the "first" probe gets selected and executed.
The only thing lost is which probe was actually hit, but then there's no
way to know that, anyway.
kmmio_fault_page objects are shared and reference-counted, so they should
Yes, good catch. Then again, maybe I would not even need the failure case
at all. I think I'll keep it there, though, maybe it catches something
I have thought of something like this, how could it happen. My current
understanding is the following.
kmmio_handler(), where-ever it was triggered, executes with interrupts
disabled. It modifies the saved CPU state: disable interrupts and
enable single-stepping (what's the correct term?). After disarming
the page, it returns. The faulted instruction is single-stepped,
all the time the interrupts are disabled. It hits the debug trap and
we end up in post_kmmio_handler(), still interrupts disabled. The
post handler does its thing, restores the CPU state changed in
kmmio_handler(), and ...