Re: Replacement for page fault notifiers?

Previous thread: [PATCH 2/6] SUNRPC: export svc_sock_update_bufs by Jeff Layton on Tuesday, January 8, 2008 - 12:33 pm. (17 messages)

Next thread: [GIT PULL] please pull infiniband.git by Roland Dreier on Tuesday, January 8, 2008 - 1:25 pm. (1 message)
From: Pekka Paalanen
Date: Tuesday, January 8, 2008 - 12:06 pm

Hello,

the commit "x86: optimize page faults like all other achitectures and
kill notifier cruft" by Christoph Hellwig removed the page fault
notifiers (e.g. register_page_fault_notifier()) completely in 2.6.24.
Mmio-trace [1] is using them for tracking memory mapped IO, and also
other people have mentioned they need it [2,3].

This is roughly how I understand mmio-trace is working. In a loadable
kernel module calls to __ioremap() are diverted to an mmio-trace
specific function. The function calls the real __ioremap() and marks
the returned pages as not present. Access to the mapped IO memory
causes a page fault. Mmio-trace's page fault handler marks the page
present, records the event, single-steps the faulting instruction, and
finally marks the page as not present again.

- Is there anything coming to replace register_page_fault_notifier()?
- Has someone found another way to accomplish the same without patching
the kernel?
- Should I start writing a patch to bring back the notifiers, based on
the commit that removed them?

People started to complain to me that mmio-trace does not compile for
2.6.24. At least the Nouveau project is a regular user of mmio-trace
for reverse-engineering, and the tool should be easy to use. Therefore
I would prefer not to force casual contributors to patch their kernels
as it might be too much for some. I am hoping that 2.6.25, if not
2.6.24, would contain a mechanism to accomplish what I need.

I am not a subscriber, so please keep me in CC. Also I CC'd the people
who wrote about the same issue in the past.

Thanks.


[1] http://nouveau.freedesktop.org/wiki/MmioTrace
[2] http://marc.info/?l=linux-kernel&m=119540769014745&w=2
[3] http://article.gmane.org/gmane.linux.kernel/609521

-- 
Pekka Paalanen
http://www.iki.fi/pq/
--

From: Christoph Hellwig
Date: Tuesday, January 8, 2008 - 1:32 pm

Please make mmio tracing generic enough so it works for every device and
submit it for inclusion as a config option.
--

From: Dave Airlie
Date: Tuesday, January 8, 2008 - 3:13 pm

Christoph,

it is generic code, it works by patching the module you load and is in
no way nvidia specific, I think we should revert your chance
as the notifiers have a valid user and one it has been planned to upstream..

Dave.
--

From: Benjamin Herrenschmidt
Date: Tuesday, January 8, 2008 - 5:26 pm

Agreed, it looks like a sane enough user of page fault notifiers, it's
going to be upstream, and it's useful to fight evil binary drivers so
that sounds like a good justification to bring those back in :-)

Cheers,
Ben.


--

From: Pavel Roskin
Date: Wednesday, January 9, 2008 - 11:18 am

The fault notifiers are used in a special version of MadWifi (Atheros
wireless driver with some non-free parts) to figure out which registers
are being accessed.  The tracing capability has been essential for the
development of the free replacement, ath5k.

Considering that ath5k is being developed in the kernel tree, whereas
madwifi-trace needs 2.6.23, one needs different kernels for tracing and
development.  Lifting this requirement would be a time saver for the the
developers of ath5k.  And once 2.6.24 is released, ath5k developers will
be in the position to ask users to downgrade their kernels to get traces
for their cards.  That's something I'd like to avoid.

I looked at ways to implement the page fault handler using kprobes, but
I could not find a simple solution.  The problem is that kprobes uses
the fault notifier for its own needs.  I'm still hopeful that somebody
with a better understanding of kprobes will be able to implement fault
notification without patching the kernel.  But if no such solution is
found, I would also support reverting the patch that removed fault
notifiers on i386.

-- 
Regards,
Pavel Roskin
--

From: Christoph Hellwig
Date: Wednesday, January 9, 2008 - 11:21 am

With your fixation on not patching the kernel you sound like a windows
developer.  There is no problem with patching the kernel, but the best
patching of that kernel is that which happens upstream so please folks
start submitting an mmiotrace variant for kernel inclusion.

Then again I don't quite get why you absolutely want to track pagefaults
anyway.  Just hooking into ioremap and read*/write* and iomap +
ioread*/iowrite* would be much easier and also a lot faster.  It would
also allow adding a nice sysfs attribute to enable this per-device.

--

From: Benjamin Herrenschmidt
Date: Wednesday, January 9, 2008 - 12:58 pm

Because binary drivers don't use ioread/iowrite/readX/writeX (or if they
do, they've long been inlined in the blob). The only solution is to map
things as non-accessible and trap the page faults.

It's a sane thing to do, Christoph, I don't think it's a unreasonable
request to put the hooks back in.

Cheers,
Ben.


--

From: Christoph Hellwig
Date: Wednesday, January 9, 2008 - 1:22 pm

As said a few times before there's simply no way we're going to put
exactly that crap back.  For one the patch removed a whole lot of
crud from the kprobes code that simply isn't going to come back just
because there are some pagefault notifiers.  Second the page fault
notifiers were horribly implemented and quite inefficient.  And third
we're not going to put something in just for out of tree code.

Please submit the mmiotrace for 2.6.25 and we'll find some way to make
it work.

--

From: Arjan van de Ven
Date: Wednesday, January 9, 2008 - 5:42 pm

On Wed, 9 Jan 2008 20:22:54 +0000

I'm btw all in favor of making mmio tracing full fledged kernel infrastructure.
This doesn't mean "notifier" imo; this means a real flag in the struct page,
and then the page fault code can do

if (page->flags & FLAG_MMIO_TRACED)
	mmio_trace(page, regs, whatever..);

(probably surrounded by a CONFIG_ ifdef)
THis is a TON lighter than a notifier chain, and actually what you want,
you don't really want a notifier, you want a call back when a special kind of
page is touched.


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Andi Kleen
Date: Wednesday, January 9, 2008 - 5:47 pm

That would assume that your mmio area has a struct page. In most PCs 
the ones in the PCI hole don't

-Andi
--

From: Arjan van de Ven
Date: Wednesday, January 9, 2008 - 5:55 pm

On Thu, 10 Jan 2008 01:47:16 +0100

so you also call the function for all traps on pages without struct page;
that should be extremely rare anyway, and the mmio_trace code can then 
look the page up.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Andi Kleen
Date: Wednesday, January 9, 2008 - 6:01 pm

But there's really no page to look up. These are typically !pfn_valid()

You would need to create an own backing infrastructure. Perhaps you can reuse
some of that from the future PAT code, but integrating that into the page fault
will be probably ugly.

-Andi
--

From: Benjamin Herrenschmidt
Date: Wednesday, January 9, 2008 - 5:58 pm

Not that rare if you use a modern DRI :-)

In fact, the thing here is that it's mostly kernel mappings though, not
user mappings. So we never get there, we die before we even reach
generic code most of the time iirc.

The whole ioremap stuff is very platform specific, not everybody even
uses the code in mm/vmalloc.c for it, so at this stage, I see no other
option but a hook in do_page_fault().

Cheers,
Ben.

--

From: Benjamin Herrenschmidt
Date: Wednesday, January 9, 2008 - 5:56 pm

Except that MMIO has no struct page's...

Cheers,
Ben.


--

From: Matt Mackall
Date: Wednesday, January 9, 2008 - 7:03 pm

That makes it way too easy for drivers of questionable legality to just
clear that bit. Also, we've got a shortage of page bits, etc.

-- 
Mathematics is the supreme nostalgia of our time.

--

From: Pavel Roskin
Date: Wednesday, January 9, 2008 - 7:21 pm

If we ever have this problem, the bit can be changed in the kernel to
fool those drivers (I hope the shortage is not so dire that there will
be no more bits left).

I don't think evil drivers should be a problem per se.  There are
existing non-free drivers that still need to be traced over and over
again.  I guess ndiswrapper could use tracing for Windows drivers that
don't know anything about Linux page flags.

Last but not least, mmiotrace should be useful for free drivers in the
first place to have a legitimate reason to be in the kernel.

-- 
Regards,
Pavel Roskin
--

From: Matt Mackall
Date: Wednesday, January 9, 2008 - 7:30 pm

That's what we call an arms race. And yes, page bits are close to

I definitely agree that tracing MMIO is an extremely useful feature.
I've implemented it myself once already for 2.4.

-- 
Mathematics is the supreme nostalgia of our time.

--

From: Pekka Paalanen
Date: Thursday, January 10, 2008 - 11:44 am

On Wed, 9 Jan 2008 20:22:54 +0000

Fair enough. Thanks to all for the encouraging words. I will start to
work towards sending mmio-trace to kernel.

We'll see how far I get before 2.6.25 closes. I'm new to all this and
will need help, especially in making mmio-trace SMP-safe, for instance.

-- 
Pekka Paalanen
http://www.iki.fi/pq/
--

From: Pavel Roskin
Date: Wednesday, January 9, 2008 - 1:24 pm

I just assumed (wrongly, as it seems) that the API change didn't remove
any useful functionality.

Also, I don't want to ask users with unusual hardware to patch their

The problem is that mmiotrace only makes sense in combination with
non-free drivers, and I'm not sure it would be welcome in the kernel
even if you say so.

The hooks for page faults could be useful for something (debugging, some
non-traditional swap implementations), but I don't want to invent

Exactly.  In MadWifi, they are inlined on i386 and x86_64, and I cannot

Seconded.

-- 
Regards,
Pavel Roskin
--

From: Christoph Hellwig
Date: Wednesday, January 9, 2008 - 1:26 pm

No, this was exactly the correct assumption.  Out of tree modules simply

It can trace every driver in theory although it's of course not
really interesting for free drivers.  But it's not actually
functionality used by the drivers but functionlity to trace the drivers

Maybe it's time to do some simple xoring in the ioremap return value
to force them not to do such stupid things..

--

From: Pavel Roskin
Date: Wednesday, January 9, 2008 - 1:41 pm

OK, that's reassuring.  Maybe it could be used for profiling or to find
bugs that are not obvious otherwise.

-- 
Regards,
Pavel Roskin
--

From: Valdis.Kletnieks
Date: Wednesday, January 9, 2008 - 1:44 pm

If an open-source driver did exactly the same thing, would it still be
stupid?  If so, what should it be doing instead?
Previous thread: [PATCH 2/6] SUNRPC: export svc_sock_update_bufs by Jeff Layton on Tuesday, January 8, 2008 - 12:33 pm. (17 messages)

Next thread: [GIT PULL] please pull infiniband.git by Roland Dreier on Tuesday, January 8, 2008 - 1:25 pm. (1 message)