mmiotrace bug: recursive probe hit

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: <linux-kernel@...>, Ingo Molnar <mingo@...>
Cc: Pekka Paalanen <pq@...>, Christoph Hellwig <hch@...>, Arjan van de Ven <arjan@...>, Pavel Roskin <proski@...>, Steven Rostedt <rostedt@...>, Peter Zijlstra <a.p.zijlstra@...>, <penberg@...>, <vegard.nossum@...>
Date: Friday, March 28, 2008 - 4:25 pm

Hi all,

jb17some reported a recursive probe hit when mmiotracing the nvidia
proprietary driver (the blob) on SMP. I'll introduce the basic mechanisms
in mmiotrace, provide some test results and then ask the question:
Why does disarm_kmmio_fault_page() occasionally not do its job on SMP?

Mmiotrace works by hooking into __ioremap() and marks the returned pages
as not present. Any access to these pages, specifically memory mapped IO
access, triggers a page fault. The fault handler, kmmio_handler(), marks
the hit page as present by calling disarm_kmmio_fault_page(), and sets
the TR flag. It also clears IF flag to prevent interrupts being enabled
when returning from the fault. The faulting instruction is re-executed,
this time without a fault, and the debug trap is hit. The debug trap
handler, post_kmmio_handler() re-arms the page and restores cpu flags.

kmmio_handler() and post_kmmio_handler() need to share some state, and
they must be called strictly in pairs, hence clearing IF and holding locks
from the first handler to the second. This is a per-cpu requirement,
different cpus are allowed to run concurrently here.

Here is the relevant code, quoted from my preview 2 email.


A recursive probe hit means that kmmio_handler() is called twice without a
a call to post_kmmio_handler() in between. This situation is explicitly
checked for (if (ctx->active)), and the current solution is to ignore the
fault and fall through to do_page_fault() triggering an error there.
According to experience, this does not happen on a uniprocessor machine.

However, on an SMP machine this can occasionally occur. I have reproduced
it on my Core 2 Duo laptop while tracing the blob. Recursive probe hit
is very rare compared to the events logged, I can run two glxgears at the
same time for half an hour generating at least millions of events and never
hit it. Repeatedly start and stop a single glxgears, and I have a fairly
good chance of hitting it. It is random, but reproducible.

Here is a kernel oops log via netconsole:

kmmio: recursive probe hit on CPU 0, for address 0xffffc20004500140. Ignoring.
BUG: unable to handle kernel paging request at ffffc20004500140
IP: [<ffffffffa04e7b38>] :nvidia:_nv003832rm+0x1d/0x22
PGD 3e8d9067 PUD 3e8da067 PMD 3cd58067 PTE 80000000d600017a
Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0 
Modules linked in: nvidia(P) netconsole snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss sha256_generic cpufreq_stats
acpi_cpufreq video output fan button arc4 ecb snd_hda_intel snd_pcm snd_timer
snd snd_page_alloc iwl4965 mac80211 ehci_hcd uhci_hcd usbcore ohci1394
ieee1394 psmouse e1000 pcspkr sg i2c_i801 i2c_core yenta_socket
rsrc_nonstatic [last unloaded: nvidia]
Pid: 0, comm: swapper Tainted: P	 2.6.25-rc6-sched-devel.git-x86-latest.git #2
RIP: 0010:[<ffffffffa04e7b38>]  [<ffffffffa04e7b38>] :nvidia:_nv003832rm+0x1d/0x22
RSP: 0018:ffffffff806cfe38  EFLAGS: 00010106
RAX: 0000000000000050 RBX: ffff810019a49000 RCX: 0000000000000140
RDX: ffffc20004500000 RSI: ffff810019a49000 RDI: ffff81001ae5a7f0
RBP: ffff810019b1f148 R08: ffff81001ad6a000 R09: ffffffff806cfe18
R10: ffffffff80428f60 R11: ffff81003cc63140 R12: ffff81001ae5a7f0
R13: 0000000000000140 R14: 0000000000000001 R15: ffff8100194b3000
FS:  0000000000000000(0000) GS:ffffffff80662000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffc20004500140 CR3: 000000001af2a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff80684000, task ffffffff8061f4e0)
Stack:  ffffffffa03b978e ffff810019a49000 ffff81001ad6ebf0 ffffc200040b0000
 0000000000000001 ffff8100194b3000 ffffffffa04e8655 ffffc200040b0000
 ffff810019a49000 ffffc200040b0000 ffff810019b1c180 ffffffff806cff18
Call Trace:
 <IRQ>  [<ffffffffa03b978e>] ? :nvidia:_nv008123rm+0x2a/0xc7
 [<ffffffffa04e8655>] ? :nvidia:_nv003822rm+0x163/0x256
 [<ffffffffa04e6b82>] ? :nvidia:_nv003815rm+0x4f/0x76
 [<ffffffffa04edc13>] ? :nvidia:rm_isr+0xaa/0x10a
 [<ffffffffa059ac61>] ? :nvidia:nv_kern_isr+0x7d/0xb1
 [<ffffffff8025f567>] ? handle_IRQ_event+0x20/0x55
 [<ffffffff802609c5>] ? handle_fasteoi_irq+0x9c/0xdc
 [<ffffffff8020e95a>] ? do_IRQ+0x105/0x17f
 [<ffffffff8020b7e6>] ? ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80421c12>] ? poll_idle+0x16/0x63
 [<ffffffff80421c17>] ? poll_idle+0x1b/0x63
 [<ffffffff80421c12>] ? poll_idle+0x16/0x63
 [<ffffffff80422039>] ? cpuidle_idle_call+0x7e/0xad
 [<ffffffff80421fbb>] ? cpuidle_idle_call+0x0/0xad
 [<ffffffff8020a1f5>] ? cpu_idle+0x92/0xd0
 [<ffffffff804c6025>] ? rest_init+0x69/0x6b

Code: 89 c8 d1 e8 89 c0 0f b7 04 42 0f b7 c0 c3 89 d1 b8 00 00 00 00 39 96
0c 02 00 00 76 11 48 8b 96 40 02 00 00 89 c8 c1 e8 02 89 c0 <8b> 04 82 f3
c3 39 96 14 02 00 00 76 0c 89 d2 48 8b 86 58 02 00 
RIP  [<ffffffffa04e7b38>] :nvidia:_nv003832rm+0x1d/0x22
 RSP <ffffffff806cfe38>
CR2: ffffc20004500140
---[ end trace 0a8c36f3081eaf7e ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!


Next, after discussion with Enberg and Nossum, I tried the following patch:

@@ -272,6 +272,9 @@ int kmmio_handler(struct pt_regs *regs, unsigned long addr)
                pr_emerg("kmmio: recursive probe hit on CPU %d, "
                                        "for address 0x%08lx. Ignoring.\n",
                                        smp_processor_id(), addr);
+               pr_emerg("kmmio: previous hit was at 0x%08lx.\n",
+                                       ctx->addr);
+               disarm_kmmio_fault_page(faultpage->page, NULL);
                goto no_kmmio_ctx;
        }
        ctx->active++;
@@ -322,8 +325,11 @@ static int post_kmmio_handler(unsigned long condition, struct pt_regs *regs)
        int ret = 0;
        struct kmmio_context *ctx = &get_cpu_var(kmmio_ctx);
 
-       if (!ctx->active)
+       if (!ctx->active) {
+               pr_notice("kmmio: spurious debug trap on CPU %d.\n",
+                                       smp_processor_id());
                goto out;
+       }

The result is occasional two lines:

kmmio: recursive probe hit on CPU 1, for address 0xffffc20004400140. Ignoring.
kmmio: previous hit was at 0xffffc20004400140.

The addresses vary, but are always the same for a pair of lines.
The "spurious debug trap" is never seen, and no kernel oopses. Note, that I
(unintentionally) still fall through to do_page_fault() after disarming.

Hypothesis 1:
Something gets to run on this cpu between exiting kmmio_handler() and
entering post_kmmio_handler(), and it triggers a new kmmio probe hit.
This is unlikely, as there should be a spurious debug trap afterwards.

Hypothesis 2:
disarm_kmmio_fault_page() does not work.
But it works most of the time, and on UP it seems to work always.

If hypothesis 2 is true, when the faulted instruction is re-executed,
it faults again. This is supported by the fault address being the same
at both times.

Could something else (apart from NMI) get to run in between fault and
debug handlers?

Is there something wrong in arm/disarm_kmmio_fault_page() functions?
Could it be a race between cpus?
Does the fall through to do_page_fault() trigger a proper fix to page
tables in case my disarming is bad?
Is the debugging hack an acceptable workaround?

I hope using a binary blob as a test case does not offend anyone, but
the main purpose of mmiotrace is to reverse engineer binary blobs.

I would appreciate any insight to the problem.


Thanks,
Pekka.

-- 
Pekka Paalanen
http://www.iki.fi/pq/
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[RFC] mmiotrace full patch, preview 2, Pekka Paalanen, (Sun Mar 9, 10:40 am)
mmiotrace bug: recursive probe hit, Pekka Paalanen, (Fri Mar 28, 4:25 pm)
Re: mmiotrace bug: recursive probe hit, Pekka Paalanen, (Sun Mar 30, 1:26 pm)
Re: mmiotrace bug: recursive probe hit, Pekka Paalanen, (Thu Apr 3, 5:07 pm)
Re: mmiotrace bug: recursive probe hit, Avi Kivity, (Sat Apr 5, 3:36 am)
Re: mmiotrace bug: recursive probe hit, Pekka Paalanen, (Sun Apr 6, 1:32 pm)
Re: mmiotrace bug: recursive probe hit, Pekka Enberg, (Sat Apr 5, 3:40 am)
Re: mmiotrace bug: recursive probe hit, Avi Kivity, (Sat Apr 5, 8:39 am)
Re: mmiotrace bug: recursive probe hit, Avi Kivity, (Sat Apr 5, 11:58 am)
Re: mmiotrace bug: recursive probe hit, Vegard Nossum, (Thu Apr 3, 5:40 pm)
Re: mmiotrace bug: recursive probe hit, Pekka Paalanen, (Fri Apr 4, 9:18 am)
Re: [RFC] mmiotrace full patch, preview 2, Vegard Nossum, (Thu Mar 27, 7:13 pm)
Re: [RFC] mmiotrace full patch, preview 2, Pekka Paalanen, (Fri Mar 28, 2:24 pm)
Re: [RFC] mmiotrace full patch, preview 2, Pekka Paalanen, (Sun Mar 9, 10:46 am)