Re: [PATCH] ummunotify: Userspace support for MMU notifications

From: Jason Gunthorpe
Date: Monday, April 12, 2010 - 4:59 pm

On Mon, Apr 12, 2010 at 04:03:59PM -0700, Andrew Morton wrote:


Just to summarize some of the key points of this thingy, as related to
your comments:
 1) It is really very narrowly focused on a particular problem MPI and
    RDMA have due to the way their APIs don't really match. Roland
    tried to make the interface general. Maybe that is a mistake.
 2) A 'self-tracing' scheme is used, again, because of an API
    mismatch between an MPI library and its own
    applications. Attempting to hook the appropriate calls has
    proven unsatisfactory (missing cases, and slow).
 3) Being intended for MPI applications, performance is a huge
    concern. Synchronous operation is very undesirable. Tracing APIs
    are lossy - and there is no recovery option if an event is lost.
 4) Realistically the only thing MPI cares about is if a virtual page
    is unmapped/remapped. Losing events is unacceptable.
 5) This isn't really tracing. There is no queue. There aren't really
    events. This works more like the dirty/accessed bit in a page table:
    it doesn't matter how many times something has been modified, only
    that it has been at least once since the last time you looked.
    
    This means the memory used is proportional to the number of
    page-ranges you watch, and the number of events against those
    page-ranges doesn't matter. No other API has this property.
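
To make point 5 concrete, here is a minimal user-space sketch of the
dirty-bit style of bookkeeping. It is not the ummunotify ABI: the struct
and both function names are hypothetical, and the "kernel side" is only
simulated by an ordinary function. The point is that each watched range
carries one sticky flag, so any number of hits costs no extra memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the ummunotify ABI. */
struct watched_range {
    uintptr_t start;   /* first byte of the watched range */
    uintptr_t end;     /* one past the last byte */
    bool invalidated;  /* sticky flag, like a page-table dirty bit */
};

/* Simulates what the kernel side would do when [start, end) is
 * unmapped or remapped: flag every overlapping watched range.
 * Repeated hits on the same range just set the same flag again. */
static void note_invalidation(struct watched_range *r, size_t n,
                              uintptr_t start, uintptr_t end)
{
    assert(start < end);
    for (size_t i = 0; i < n; i++)
        if (start < r[i].end && r[i].start < end)
            r[i].invalidated = true;
}

/* Consumer side: report whether the range changed since the last
 * look, and reset the flag. */
static bool test_and_clear(struct watched_range *r)
{
    bool was = r->invalidated;
    r->invalidated = false;
    return was;
}
```

Storage here is one struct per watched range, regardless of how many
times note_invalidation() fires between two looks.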

Basically, this entire scheme is designed to detect that, when b == a,
the internal state some_mpi_call holds for a is no longer valid, in
this kind of situation:
 a = mmap(ONE_PAGE);
 some_mpi_call(a);
 munmap(a);
 b = mmap(ONE_PAGE);   // Kernel picks b == a
 some_mpi_call(b);

All the races you point out just don't matter for the MPI use
case. Essentially, if the app hits those races, then it is using the
MPI library in a buggy way.
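
For readers unfamiliar with why b == a is a problem, a rough sketch of
an address-keyed registration cache follows. This is an assumption about
how such a cache might look, not any real MPI library's code: reg_cache,
cache_lookup and cache_invalidate are hypothetical names, and the integer
handle stands in for an RDMA memory registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* Hypothetical cache entry; not any real MPI library's layout. */
struct reg_entry {
    uintptr_t addr;   /* virtual address the registration was made for */
    int handle;       /* stand-in for an RDMA memory registration */
    bool valid;
};

struct reg_cache {
    struct reg_entry slot[CACHE_SLOTS];
    int next_handle;
};

/* Return the handle for addr, registering it on a miss.
 * *hit tells the caller whether a cached entry was reused. */
static int cache_lookup(struct reg_cache *c, uintptr_t addr, bool *hit)
{
    struct reg_entry *free_slot = NULL;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &c->slot[i];
        if (e->valid && e->addr == addr) {
            *hit = true;
            return e->handle;
        }
        if (!e->valid && free_slot == NULL)
            free_slot = e;
    }
    assert(free_slot != NULL);  /* sketch only: no eviction policy */
    free_slot->addr = addr;
    free_slot->handle = ++c->next_handle;
    free_slot->valid = true;
    *hit = false;
    return free_slot->handle;
}

/* What an unmap notification lets the library do: drop the entry
 * so the next lookup at the same address registers afresh. */
static void cache_invalidate(struct reg_cache *c, uintptr_t addr)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i].valid && c->slot[i].addr == addr)
            c->slot[i].valid = false;
}
```

Without the cache_invalidate() call between the munmap and the second
mmap, the second lookup at the same address is a hit and hands back a
handle that describes pages which no longer back that address.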

That said, this could be explained better in the documentation file. :)

I'm sure Eric can go through the rest of your questions in greater
detail.


The only case that matters for the generation counter optimization is
a false negative. As long as user space does:

u64 val = *counter;
if (val != last_counter)
   last_counter = val;

then you can get false positives as you point out, but never a false
negative. A false positive results in an extra syscall and the kernel
just returns no data.
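
A slightly fuller sketch of that fast path, with the kernel side faked
in user space so the behaviour can be checked. Everything here is an
assumption for illustration: fake_kernel, kernel_invalidate,
kernel_fetch and poll_invalidations are made-up names, and kernel_fetch
stands in for whatever syscall actually retrieves the flagged ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel side of the interface. */
struct fake_kernel {
    volatile uint64_t counter;  /* bumped on every invalidation */
    unsigned pending;           /* ranges flagged and not yet fetched */
    unsigned fetch_calls;       /* counts simulated syscalls */
};

/* Simulated invalidation: record it first, then bump the counter. */
static void kernel_invalidate(struct fake_kernel *k)
{
    k->pending++;
    k->counter++;
}

/* Simulated syscall: returns how many ranges were flagged and
 * clears them. Returning 0 is the "no data" case. */
static unsigned kernel_fetch(struct fake_kernel *k)
{
    unsigned n = k->pending;
    k->pending = 0;
    k->fetch_calls++;
    return n;
}

/* User-space fast path: skip the syscall while the counter is
 * unchanged since the last look. */
static unsigned poll_invalidations(struct fake_kernel *k,
                                   uint64_t *last_counter)
{
    uint64_t val = k->counter;
    if (val == *last_counter)
        return 0;               /* nothing new, no syscall */
    *last_counter = val;
    return kernel_fetch(k);     /* may return 0: a false positive */
}
```

If something else has already drained the pending ranges, the counter
still differs from last_counter, so the next poll makes one extra call
and gets nothing back. That is the harmless false positive; an unchanged
counter with pending ranges cannot occur in this sketch.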

Regards,
Jason
