Re: [RFC] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings

To: Mel Gorman <mel@...>
Cc: <wli@...>, <agl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Friday, April 25, 2008 - 10:28 am

On Mon, Apr 21, 2008 at 07:36:22PM +0100, Mel Gorman wrote:

[This is one of those patches which is best read applied; diff has not
been friendly to the reviewer.]

Overall I think we should be sanitising these semantics.  So I would
like to see this stack progressed.


Ok, you zap out the reservation when the VMA is opened.  How does that
tie in with the VMA modifications which occur when we mprotect a page in
the middle of a map?

From my reading of vma_adjust and vma_split, I am not convinced you
would maintain the reservation correctly.  I suspect that the original
VMA will retain the whole reservation which it will then not be able to
use.  The new VMAs would not have any reservation and might then fail on
fault despite the total reservation being sufficient.
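
To make the concern concrete, here is a small user-space sketch.  It is
a toy model only: struct toy_vma and the toy_* helpers are hypothetical
names invented for illustration and are not the kernel's vma_split or
vma_adjust.  It also assumes no page in the range has been faulted yet,
so the reservation equals the length of the VMA.

```c
#include <assert.h>

/* Toy stand-in for a private hugetlb VMA; NOT the kernel's vm_area_struct. */
struct toy_vma {
	unsigned long start;	/* first huge page index covered */
	unsigned long end;	/* one past the last huge page index */
	unsigned long resv;	/* huge pages still reserved for this range */
};

/* Split at 'at' and leave the whole reservation on the original VMA. */
static struct toy_vma toy_split_naive(struct toy_vma *orig, unsigned long at)
{
	struct toy_vma tail = { at, orig->end, 0 };

	assert(at > orig->start && at < orig->end);
	orig->end = at;
	return tail;
}

/* Split at 'at' and move the tail's share of the reservation with it. */
static struct toy_vma toy_split_shared(struct toy_vma *orig, unsigned long at)
{
	struct toy_vma tail = { at, orig->end, 0 };
	unsigned long want;

	assert(at > orig->start && at < orig->end);
	want = tail.end - tail.start;
	tail.resv = want < orig->resv ? want : orig->resv;
	orig->resv -= tail.resv;
	orig->end = at;
	return tail;
}

/* A fault consumes one reserved page; returns -1 if none is left. */
static int toy_fault(struct toy_vma *vma)
{
	if (!vma->resv)
		return -1;
	vma->resv--;
	return 0;
}
```

With an 8 page VMA split at page 4, the naive split leaves the original
holding 8 reserved pages for a 4 page range while the tail holds none,
so a fault in the tail fails in this model.  The second variant leaves
4 pages on each side.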


In the read-only case you only create a reservation for the first mmap
of a particular offset in the file.  I do not think this will work as
intended.  Consider a process which forks, with each process then
mmapping the same offset.  The first will get a reservation for its mmap,
the second will not.  This seems to violate the "mapper is guaranteed
to get sufficient pages" guarantee for the second mapper.  As the
pages are missing and read-only we know that we actually could share the
pages so in some sense this might make sense _if_ we could find and
share the pages at fault time.  Currently we do not have the information
required to find these pages so we would have to allocate pages for each
mmap.

As things stand I think that we should be using 'chg = to - from' for
all private mappings, as each mapping is effectively independent.
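
As a rough sketch of that rule (user-space toy code, not the patch
itself): toy_charge() and its already_reserved parameter are
hypothetical, the latter standing in for whatever the file's region
map reports as already covered in [from, to).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Number of huge pages to charge for mapping [from, to).
 * Shared mappings only pay for pages nobody has reserved yet;
 * private mappings always pay for the full range.
 */
static long toy_charge(bool shared, long from, long to, long already_reserved)
{
	assert(from <= to);
	assert(already_reserved >= 0 && already_reserved <= to - from);

	if (shared)
		return (to - from) - already_reserved;

	return to - from;	/* chg = to - from */
}
```

In the fork example above, both the parent and the child mapping the
same 4 page range privately would each be charged 4 pages in this
model, regardless of what the other has reserved.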


What is not clear from the diff is that this change leaves us with two
cases where we apply region_chg() and one where we do not, but we then
always apply region_add().  Now when writing that region code I intended
the region_chg/region_add as a prepare/commit pair, with the former
performing any memory allocation we might require.  It is not safe to
call region_add without first calling region_chg.  Yes, the names are not
helpful.  That region_add probably should be:

	if (vma->vm_flags & VM_SHARED || !(vma->vm_flags & VM_MAYWRITE))
		region_add(&inode->i_mapping->private_list, from, to);
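
For anyone unfamiliar with the prepare/commit idea, here is a
simplified user-space sketch.  It is not the kernel's region code: the
toy_* names and the single preallocated spare node are assumptions made
for illustration.  The point is only that the prepare step does the one
allocation that can fail, so the commit step cannot.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_region {
	long from, to;			/* half-open range [from, to) */
	struct toy_region *next;
};

struct toy_map {
	struct toy_region *head;	/* disjoint, non-adjacent regions */
	struct toy_region *spare;	/* node preallocated by prepare */
};

/* Prepare: do the only allocation, report pages not yet covered. */
static long toy_region_chg(struct toy_map *map, long from, long to)
{
	long chg = to - from;
	struct toy_region *rg;

	if (!map->spare) {
		map->spare = malloc(sizeof(*map->spare));
		if (!map->spare)
			return -1;	/* caller can still back out */
	}

	for (rg = map->head; rg; rg = rg->next) {
		long lo = rg->from > from ? rg->from : from;
		long hi = rg->to < to ? rg->to : to;

		if (lo < hi)
			chg -= hi - lo;
	}
	return chg;
}

/* Commit: cannot fail, it only consumes the preallocated node. */
static void toy_region_add(struct toy_map *map, long from, long to)
{
	struct toy_region *nrg = map->spare;
	struct toy_region **link = &map->head;

	assert(nrg);	/* commit without prepare is a bug */
	map->spare = NULL;

	/* Absorb every region that overlaps or touches [from, to). */
	while (*link) {
		struct toy_region *rg = *link;

		if (rg->from <= to && rg->to >= from) {
			if (rg->from < from)
				from = rg->from;
			if (rg->to > to)
				to = rg->to;
			*link = rg->next;
			free(rg);
		} else {
			link = &rg->next;
		}
	}

	nrg->from = from;
	nrg->to = to;
	nrg->next = map->head;
	map->head = nrg;
}
```

In this model, calling toy_region_add() without a preceding
toy_region_chg() trips the assert, which mirrors why the unconditional
region_add() worries me.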



-apw
--

Messages in current thread:
Re: [RFC] Reserve huge pages for reliable MAP_PRIVATE hugetl..., Andy Whitcroft, (Fri Apr 25, 10:28 am)