Hi all,
I've made some progress on mmiotrace and now it can actually work as a
built-in. Previously mmio-mod.c was full of locking bugs, which never
triggered, because a traced (binary) kernel module depended on the
mmiotrace module.Changes between preview 1 and preview 2 include:
- Bug and style fixes in Kconfig.debug, Makefile, testmmiotrace.c, mutex
usage, and failure path in register probe function.
- RCU read-lock held over single stepping in kmmio.c
- mmio-mod.c is now a built-in with rewritten locking
- Added debugfs file to enable/disable mmiotracing, since this is not a
module anymore. (Temporary solution)
- Marker file moved from /proc into debugfs.
- mmiotrace entrypoints called directly from ioremap.c, which is why
this cannot be a module anymore.It still uses relay for pushing data into user space, this would be
replaced by ftrace, along with the "enabled" debugfs file.The kernel argument mmiotrace.enable_now=1 can be used to enable
mmiotracing at boot. I have tried this and it seems to work well,
I get about 80k events from boot until starting the user space
logger program by hand. Buffers are big enough that no event is
missed due to out-of-memory.I have also tried tracing Nouveau, started X, and disabled
mmiotracing while in X. Works fine, the logger program just starts
to hog cpu. The logger program will become 'cat' when we get to ftrace.Hooking or modifying kernel module files is not needed anymore, mmiotrace
will trace everything that is ioremapped after it has been enabled.
Well, everything that __ioremap() actually maps, not including e.g.
ISA area.The only thing checkpatch.pl complains about is the use of "volatile",
but that is part of the original declaration of iounmap().The next step would be going for ftrace framework.
This patch is against torvalds/linux-2.6.git master. I will reply to
this email with a patch for Ingo's x86/testing branch.Signed-off-by: Pekka Paalanen <pq@iki.fi>
---
arch/x86/Kconfig.debug ...
Hi all,
jb17some reported a recursive probe hit when mmiotracing the nvidia
proprietary driver (the blob) on SMP. I'll introduce the basic mechanisms
in mmiotrace, provide some test results and then ask the question:
Why does disarm_kmmio_fault_page() occasionally not do its job on SMP?Mmiotrace works by hooking into __ioremap() and marks the returned pages
as not present. Any access to these pages, specifically memory mapped IO
access, triggers a page fault. The fault handler, kmmio_handler(), marks
the hit page as present by calling disarm_kmmio_fault_page(), and sets
the TR flag. It also clears IF flag to prevent interrupts being enabled
when returning from the fault. The faulting instruction is re-executed,
this time without a fault, and the debug trap is hit. The debug trap
handler, post_kmmio_handler() re-arms the page and restores cpu flags.kmmio_handler() and post_kmmio_handler() need to share some state, and
they must be called strictly in pairs, hence clearing IF and holding locks
from the first handler to the second. This is a per-cpu requirement,
different cpus are allowed to run concurrently here.A recursive probe hit means that kmmio_handler() is called twice without a
a call to post_kmmio_handler() in between. This situation is explicitly
checked for (if (ctx->active)), and the current solution is to ignore the
fault and fall through to do_page_fault() triggering an error there.
According to experience, this does not happen on a uniprocessor machine.However, on an SMP machine this can occasionally occur. I have reproduced
it on my Core 2 Duo laptop while tracing the blob. Recursive probe hit
is very rare compared to the events logged, I can run two glxgears at the
same time for half an hour generating at least millions of events and never
hit it. Repeatedly start and stop a single glxgears, and I have a fairly
good chance of hitting it. It is random, but reproducible.Here is a kernel oops log via netconsole:
kmmio: recursive probe hit on CPU 0, f...
On Fri, 28 Mar 2008 22:25:00 +0200
It appears this happens:
CPU 0 CPU 1
,---> fault fault
| disarm disarm
| single step
| arm
| single step
'--------'
armand the both cpus are faulting on the same page. I guess one cpu is running
an nvidia interrupt service.
I see three possible solutions:A) Like in this patch, just disarm again and hope for the best.
Seems to work ok. I also compare the fault address to the saved address
ctx->addr. If they are equal, it is a "double probe hit" and harmless.
If they are not equal, it is a real "recursive probe hit" and something
more is wrong. With these definitions, recursive probe hits are gone inB) Acquire a spinlock in kmmio_handler() and release it in
post_kmmio_handler(). I don't like this one since I spent some effort
making the fault path spinlockless, but at least this would be a
completely separate spinlock. Or we could use per-page spinlocks.C) Vegard mentioned something about per-cpu page tables for kmemcheck.
This would be the ultimate solution, because it would solve two problems:
- recursive probe hits
- missed events due to another cpu disarming the page for single stepping
Would it be possible to have a single temporary per-cpu pte?I understood kmemcheck has similar issues. Of course, one could force the
system down to a single running CPU, but that feels nasty.Which way to go?
I choose A) as the current workaround, keeping in mind that I will loose
events on SMP. C) would be the only reliable SMP solution on tracing point
of view.Thanks.
--
Pekka Paalanen
http://www.iki.fi/pq/
--
On Sun, 30 Mar 2008 20:26:08 +0300
One more idea:
D) Emulate the faulting instruction.
In __ioremap(), do the mapping, but steal it for mmiotrace's personal use,
and return a bogus mapping that is identifiable in #pf handler. When
something accesses the bogus mapping, emulate and step over the faulting
instruction using the stolen IO memory mapping. This would get rid of
the debug trap and single stepping, and also remove the need to disarm
the mmio page, which means tracing would work reliably on SMP without
any page table kludges. This would also remove the yet another instruction
decoding code that mmiotrace has.The catch is the instruction emulation. I see KVM has some emulation code,
but I cannot understand it without a deep study that would take me weeks.
Is that general enough to be used, or could it be generalized?
Mmiotrace, apart from executing the instruction with a modified address,
would need to extract the type of IO memory access, width and the data
read/written. And since it is dealing with IO memory, the emulation
should be very careful to access the hardware exactly like the original
instruction would have.Maybe also kmemcheck could use this approach, since the current approach
is very much like in mmiotrace: #pf, show page, single step, #db trap,
hide page.Are there other x86(_64) instruction emulation facilities in the kernel
I might use?Or, if the emulation cannot be used, what would it take to make at least
instruction decoding general enough so that mmiotrace could use that instead
of its own decoding?I fear modifying KVM emulation code is a too heavy job for me personally.
--
Pekka Paalanen
http://www.iki.fi/pq/
--
It should not be too difficult to modify x86_emulate.c to do everything
through a function vector. However there is a simpler (for you)
solution: run the driver-to-be-reverse-engineered in a kvm guest, and
modify kvm userspace to log accesses to mmio regions. This requires the
not-yet-merged pci passthrough support. You can reverse engineer
Windows drivers with this as well.This won't work for kmemcheck smp though.
--
Any sufficiently difficult bug is indistinguishable from a feature.--
On Sat, 05 Apr 2008 10:36:50 +0300
This is a very interesting idea, I didn't know it would be possible.
I think this also is a new project and I'd be happy to let someone else
take it over. We could still use the log format and tools from
mmiotrace, at least for starters.This puts mmiotrace in a new ligth: there's something better(?)
coming in the future. But mmiotrace is here now, and I'd still like to
see it in mainline. Within ftrace framework, mmiotrace will be very
easy to use even for a Linux newbie. The KVM approach would be targeted
to developers, as I suspect setting it up (even when everything has been
merged into mainline) is more work than running mmiotrace.Ingo, what do you think? In my opinion let's make mmiotrace force the
system down to UP via CPU hotplugging while MMIO tracing is active.
This should guarantee reliable traces in the easiest way possible. The
current workaround is not reliable on SMP. "Works enough" is what I'm
thinking of.Thanks.
--
Pekka Paalanen
http://www.iki.fi/pq/
--
For kmemcheck, I'd prefer the per-CPU page tables suggested by Ingo.
I'm having hard time understanding why that's a "ugly hack" compared
to using kvm for this...
--
It's not an ugly hack, but will be very very difficult. With mmu
notifiers it's probably doable though:- the linux page tables are never loaded into cr3, but rather kept as a
reference
- page faults are by instantiating ptes into shadow page tables (which
track the linux page tables)
- mmu notifiers are used to drop shadow ptes when the linux ptes change--
Any sufficiently difficult bug is indistinguishable from a feature.--
Actually, paravirt_ops is a much better match, as it also provides hooks
for setting cr3. I think you can implement per-cpu pagetables using
paravirt_ops without modifying core mm code at all.--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.--
Yes, Ingo Molnar has suggested per-cpu page tables, but that's so far
away from what I am capable of, so unless Ingo wants to do it himself,
I fear it will never be done ;-) [I also believe the resulting code
would be too ugly and too un-useful for the rest of the kernel that it
would probably not ever be merged. But that's a different story.] But
I do think this is the best solution in terms of reliability.We do indeed limit maxcpus to 1 at run-time if the kernel is compiled
with CONFIG_SMP. kmemcheck is a debugging facility, and as such,
actual multiprocessor support is not critical for the purpose ofI think that would be extremely difficult to do. I am personally
trying to stay as far away from opcode decoding (and recoding!
*shudder*) as possible. I do the minimal decoding for operand sizes,They are indeed very much the same. I wish somebody had told me about
mmiotrace when I first started working on kmemcheck! :-)I don't think I can be of much more help than that. Just my opinion on things.
Kind regards,
Vegard Nossum
--
On Thu, 3 Apr 2008 23:40:28 +0200
Actually, I think kmemcheck should support SMP even more than mmiotrace.
Mmiotrace is just a reverse engineering tool for hardware drivers, but
kmemcheck as a debugging tool could be valuable specifically on SMP,
think about racing CPUs using and initializing memory. Dropping to UPTo be honest, I don't know how easy or difficult it is. If there was a
general decoding facility, it shouldn't be that hard to use, if someone
else makes the facility for us ;-)
Could all relevant instructions be represented as three-address-code or
in some other simple form?
This seems to require some experimenting with code.--
Pekka Paalanen
http://www.iki.fi/pq/
--
Hi,
I may of course be wrong, but... Shouldn't the post_kmmio_handler(),
called from the die notifier chain, check for the DR_STEP condition?
This makes sure that the function is not called in the cases where the
source of the debug exception was not a single-stepping event. Though
I guess you'll also have other checks in place to notice that the
interrupt was not the one you were expecting. I guess a little extra
safety won't hurt though?if (!(condition & DR_STEP))
return;Vegard
--
On Fri, 28 Mar 2008 00:13:48 +0100
I guess that would be appropriate. I think it should go into this
function:+static int kmmio_die_notifier(struct notifier_block *nb, unsigned long val,
+ void *args)
+{
+ struct die_args *arg = args;
+
+ if (val == DIE_DEBUG)
+ if (post_kmmio_handler(arg->err, arg->regs) == 1)
+ return NOTIFY_STOP;
+
+ return NOTIFY_DONE;
+}On the other hand I am thinking of not using the die notifier chain
at all and adding a direct call from do_debug() or something.
This is the last dynamic hook remaining from the out-of-tree module
era of mmiotrace.I guess with the notifier list there is a possibility that another
module intercepts my single step trap, so that this is never called,
which would leave mmiotrace half blind, and also trigger a recursive
probe hit.btw. what if someone uses kmemcheck and mmiotrace at the same time?
Mmiotrace will not fiddle with any other pages than returned via
__ioremap(), but can kmemcheck "hide" the same mmio pages?
Also keeping in mind, that some day I'd like to make mmiotrace able
to catch mmio accesses originating in user space.Thanks.
--
Pekka Paalanen
http://www.iki.fi/pq/
--
From bc32dcfb4048bb74c774fd3de724bd38e8b98c2f Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sun, 9 Mar 2008 13:52:25 +0200
Subject: [PATCH] x86 mmiotrace: update preview 1 to preview 2Kconfig.debug, Makefile and testmmiotrace.c style fixes.
Use real mutex instead of mutex.
Fix failure path in register probe func.
kmmio: RCU read-locked over single stepping.
Generate mapping id's.
Make mmio-mod.c built-in and rewrite its locking.
Add debugfs file to enable/disable mmiotracing.
kmmio: use irqsave spinlocks.
Lots of cleanups in mmio-mod.c
Marker file moved from /proc into debugfs.
Call mmiotrace entrypoints directly from ioremap.c.Signed-off-by: Pekka Paalanen <pq@iki.fi>
---
arch/x86/Kconfig.debug | 20 +--
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/ioremap.c | 9 +-
arch/x86/mm/kmmio.c | 72 ++++-----
arch/x86/mm/mmio-mod.c | 397 ++++++++++++++++++++++++++++---------------
arch/x86/mm/testmmiotrace.c | 15 +-
include/linux/mmiotrace.h | 18 ++-
7 files changed, 332 insertions(+), 201 deletions(-)diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 177fd06..4c4fea0 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -183,22 +183,19 @@ config SYSPROFconfig MMIOTRACE_HOOKS
bool
- default nconfig MMIOTRACE
- tristate "Memory mapped IO tracing"
+ bool "Memory mapped IO tracing"
depends on DEBUG_KERNEL && RELAY && DEBUG_FS
select MMIOTRACE_HOOKS
- default n
+ default y
help
- This will build a kernel module called mmiotrace.
- Making this a built-in is heavily discouraged.
-
- Mmiotrace traces Memory Mapped I/O access and is meant for debugging
- and reverse engineering. The kernel module offers wrapped
- versions of the ioremap family of functions. The driver to be traced
- must be modified to call these wrappers. A user space program is
- required to collect the MMIO data.
+ Mmiotrace traces Memory Mapped ...
