Hi all, jb17some reported a recursive probe hit when mmiotracing the nvidia proprietary driver (the blob) on SMP. I'll introduce the basic mechanisms in mmiotrace, provide some test results and then ask the question: Why does disarm_kmmio_fault_page() occasionally not do its job on SMP? Mmiotrace works by hooking into __ioremap() and marks the returned pages as not present. Any access to these pages, specifically memory mapped IO access, triggers a page fault. The fault handler, kmmio_handler(), marks the hit page as present by calling disarm_kmmio_fault_page(), and sets the TR flag. It also clears IF flag to prevent interrupts being enabled when returning from the fault. The faulting instruction is re-executed, this time without a fault, and the debug trap is hit. The debug trap handler, post_kmmio_handler() re-arms the page and restores cpu flags. kmmio_handler() and post_kmmio_handler() need to share some state, and they must be called strictly in pairs, hence clearing IF and holding locks from the first handler to the second. This is a per-cpu requirement, different cpus are allowed to run concurrently here. Here is the relevant code, quoted from my preview 2 email.A recursive probe hit means that kmmio_handler() is called twice without a a call to post_kmmio_handler() in between. This situation is explicitly checked for (if (ctx->active)), and the current solution is to ignore the fault and fall through to do_page_fault() triggering an error there. According to experience, this does not happen on a uniprocessor machine. However, on an SMP machine this can occasionally occur. I have reproduced it on my Core 2 Duo laptop while tracing the blob. Recursive probe hit is very rare compared to the events logged, I can run two glxgears at the same time for half an hour generating at least millions of events and never hit it. Repeatedly start and stop a single glxgears, and I have a fairly good chance of hitting it. It is random, but reproducible. Here is a kernel oops log via netconsole: kmmio: recursive probe hit on CPU 0, for address 0xffffc20004500140. Ignoring. BUG: unable to handle kernel paging request at ffffc20004500140 IP: [<ffffffffa04e7b38>] :nvidia:_nv003832rm+0x1d/0x22 PGD 3e8d9067 PUD 3e8da067 PMD 3cd58067 PTE 80000000d600017a Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: nvidia(P) netconsole snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sha256_generic cpufreq_stats acpi_cpufreq video output fan button arc4 ecb snd_hda_intel snd_pcm snd_timer snd snd_page_alloc iwl4965 mac80211 ehci_hcd uhci_hcd usbcore ohci1394 ieee1394 psmouse e1000 pcspkr sg i2c_i801 i2c_core yenta_socket rsrc_nonstatic [last unloaded: nvidia] Pid: 0, comm: swapper Tainted: P 2.6.25-rc6-sched-devel.git-x86-latest.git #2 RIP: 0010:[<ffffffffa04e7b38>] [<ffffffffa04e7b38>] :nvidia:_nv003832rm+0x1d/0x22 RSP: 0018:ffffffff806cfe38 EFLAGS: 00010106 RAX: 0000000000000050 RBX: ffff810019a49000 RCX: 0000000000000140 RDX: ffffc20004500000 RSI: ffff810019a49000 RDI: ffff81001ae5a7f0 RBP: ffff810019b1f148 R08: ffff81001ad6a000 R09: ffffffff806cfe18 R10: ffffffff80428f60 R11: ffff81003cc63140 R12: ffff81001ae5a7f0 R13: 0000000000000140 R14: 0000000000000001 R15: ffff8100194b3000 FS: 0000000000000000(0000) GS:ffffffff80662000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: ffffc20004500140 CR3: 000000001af2a000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff80684000, task ffffffff8061f4e0) Stack: ffffffffa03b978e ffff810019a49000 ffff81001ad6ebf0 ffffc200040b0000 0000000000000001 ffff8100194b3000 ffffffffa04e8655 ffffc200040b0000 ffff810019a49000 ffffc200040b0000 ffff810019b1c180 ffffffff806cff18 Call Trace: <IRQ> [<ffffffffa03b978e>] ? :nvidia:_nv008123rm+0x2a/0xc7 [<ffffffffa04e8655>] ? :nvidia:_nv003822rm+0x163/0x256 [<ffffffffa04e6b82>] ? :nvidia:_nv003815rm+0x4f/0x76 [<ffffffffa04edc13>] ? :nvidia:rm_isr+0xaa/0x10a [<ffffffffa059ac61>] ? :nvidia:nv_kern_isr+0x7d/0xb1 [<ffffffff8025f567>] ? handle_IRQ_event+0x20/0x55 [<ffffffff802609c5>] ? handle_fasteoi_irq+0x9c/0xdc [<ffffffff8020e95a>] ? do_IRQ+0x105/0x17f [<ffffffff8020b7e6>] ? ret_from_intr+0x0/0xf <EOI> [<ffffffff80421c12>] ? poll_idle+0x16/0x63 [<ffffffff80421c17>] ? poll_idle+0x1b/0x63 [<ffffffff80421c12>] ? poll_idle+0x16/0x63 [<ffffffff80422039>] ? cpuidle_idle_call+0x7e/0xad [<ffffffff80421fbb>] ? cpuidle_idle_call+0x0/0xad [<ffffffff8020a1f5>] ? cpu_idle+0x92/0xd0 [<ffffffff804c6025>] ? rest_init+0x69/0x6b Code: 89 c8 d1 e8 89 c0 0f b7 04 42 0f b7 c0 c3 89 d1 b8 00 00 00 00 39 96 0c 02 00 00 76 11 48 8b 96 40 02 00 00 89 c8 c1 e8 02 89 c0 <8b> 04 82 f3 c3 39 96 14 02 00 00 76 0c 89 d2 48 8b 86 58 02 00 RIP [<ffffffffa04e7b38>] :nvidia:_nv003832rm+0x1d/0x22 RSP <ffffffff806cfe38> CR2: ffffc20004500140 ---[ end trace 0a8c36f3081eaf7e ]--- Kernel panic - not syncing: Aiee, killing interrupt handler! Next, after discussion with Enberg and Nossum, I tried the following patch: @@ -272,6 +272,9 @@ int kmmio_handler(struct pt_regs *regs, unsigned long addr) pr_emerg("kmmio: recursive probe hit on CPU %d, " "for address 0x%08lx. Ignoring.\n", smp_processor_id(), addr); + pr_emerg("kmmio: previous hit was at 0x%08lx.\n", + ctx->addr); + disarm_kmmio_fault_page(faultpage->page, NULL); goto no_kmmio_ctx; } ctx->active++; @@ -322,8 +325,11 @@ static int post_kmmio_handler(unsigned long condition, struct pt_regs *regs) int ret = 0; struct kmmio_context *ctx = &get_cpu_var(kmmio_ctx); - if (!ctx->active) + if (!ctx->active) { + pr_notice("kmmio: spurious debug trap on CPU %d.\n", + smp_processor_id()); goto out; + } The result is occasional two lines: kmmio: recursive probe hit on CPU 1, for address 0xffffc20004400140. Ignoring. kmmio: previous hit was at 0xffffc20004400140. The addresses vary, but are always the same for a pair of lines. The "spurious debug trap" is never seen, and no kernel oopses. Note, that I (unintentionally) still fall through to do_page_fault() after disarming. Hypothesis 1: Something gets to run on this cpu between exiting kmmio_handler() and entering post_kmmio_handler(), and it triggers a new kmmio probe hit. This is unlikely, as there should be a spurious debug trap afterwards. Hypothesis 2: disarm_kmmio_fault_page() does not work. But it works most of the time, and on UP it seems to work always. If hypothesis 2 is true, when the faulted instruction is re-executed, it faults again. This is supported by the fault address being the same at both times. Could something else (apart from NMI) get to run in between fault and debug handlers? Is there something wrong in arm/disarm_kmmio_fault_page() functions? Could it be a race between cpus? Does the fall through to do_page_fault() trigger a proper fix to page tables in case my disarming is bad? Is the debugging hack an acceptable workaround? I hope using a binary blob as a test case does not offend anyone, but the main purpose of mmiotrace is to reverse engineer binary blobs. I would appreciate any insight to the problem. Thanks, Pekka. -- Pekka Paalanen http://www.iki.fi/pq/ --
| Tomasz Kłoczko | Is it time for remove (crap) ALSA from kernel tree ? |
| Aubrey | O_DIRECT question |
| David Miller | Slow DOWN, please!!! |
| Linus Torvalds | Linux 2.6.27-rc8 |
git: | |
| Francis Moreau | emacs and git... |
| Linus Torvalds | I'm a total push-over.. |
| Keith Packard | Re: parsecvs tool now creates git repositories |
| Andreas Hildebrandt | CVS-$Id:$ replacement in git? |
| Jason Dixon | Wasting our Freedom |
| Richard Stallman | Real men don't attack straw men |
| Edwin Eyan Moragas | poll(2) vs kqueue(2) performance |
| James Hartley | scp batch mode? |
| Chris Peterson | [PATCH] drivers/net: remove network drivers' last few uses of IRQF_SAMPLE_RANDOM |
| Karen Xie | [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator |
| Lennert Buytenhek | [PATCH 14/39] mv643xx_eth: remove port serial status register bit defines |
| Andrew Morton | Re: [Bugme-new] [Bug 11036] New: atl1 tx busy and hw csum wrong |
| high memory | 2 hours ago | Linux kernel |
| semaphore access speed | 5 hours ago | Applications and Utilities |
| the kernel how to power off the machine | 6 hours ago | Linux kernel |
| Easter Eggs in windows XP | 9 hours ago | Windows |
| Shared swap partition | 10 hours ago | Linux general |
| Root password | 10 hours ago | Linux general |
| Where/when DNOTIFY is used? | 12 hours ago | Linux kernel |
| How to convert Linux Kernel built-in module into a loadable module | 14 hours ago | Linux kernel |
| Linux 2.6.24 and I/O schedulers | 15 hours ago | Linux kernel |
| USB Driver -- Interrupt Polling -- A Little Help Please | 20 hours ago | Linux general |
