The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.

invalidate_range_begin/end() is frequently called with only mmap_sem
held. If invalidate_range_begin() is called with locks held then we
pass a flag into invalidate_range() to indicate that no sleeping is
possible.

In two cases we use invalidate_range_begin/end to invalidate
single pages because the pair allows holding off new references
(idea by Robin Holt).

do_wp_page(): We hold off new references while updating the pte.

xip_unmap: We are not taking the PageLock so we cannot
use the invalidate_page mmu_rmap_notifier. invalidate_range_begin/end
stands in.

Comments state that mmap_sem must be held for
remap_pfn_range() but various drivers do not seem to do this.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
mm/filemap_xip.c | 5 +++++
mm/fremap.c | 3 +++
mm/hugetlb.c | 3 +++
mm/memory.c | 24 ++++++++++++++++++++++--
mm/mmap.c | 2 ++
mm/mremap.c | 7 ++++++-
6 files changed, 41 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c 2008-01-31 20:56:03.000000000 -0800
+++ linux-2.6/mm/fremap.c 2008-01-31 20:59:14.000000000 -0800
@@ -15,6 +15,7 @@
#include <linux/rmap.h>
#include <linux/module.h>
#include <linux/syscalls.h>
+#include <linux/mmu_notifier.h>

#include <asm/mmu_context.h>
#include <asm/cacheflush.h>
@@ -211,7 +212,9 @@ asmlinkage long sys_remap_file_pages(uns
spin_unlock(&mapping->i_mmap_lock);
}

+ mmu_notifier(invalidate_range_begin, mm, start, start + size, 0);
err = populate_range(mm, vma, start, size, pgoff);
+ mmu_notifier(invalidate_range_end, mm, start, start + size, 0);
if (!err &&...
Christoph,
The following code in do_wp_page is a problem.
We are getting this callout when we transition the pte from a read-only
to read-write. Jack and I can not see a reason we would need that
callout. It is causing problems for xpmem in that a write fault goes
to get_user_pages which gets back to do_wp_page that does the callout.

XPMEM only allows either faulting or invalidating to occur for an mm.
As you can see, the case above needs it to be in both states.

Thanks,
--
Right. You placed it there in the first place. So we can drop the code
from do_wp_page?
--
No, we need a callout when we are becoming more restrictive, but not
when becoming more permissive. I would have to guess that is the case
for any of these callouts. It is for both GRU and XPMEM. I would
expect the same is true for KVM, but would like a ruling from Andrea on
that.

Thanks,
Robin
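
The rule stated above (a callout is needed when a mapping becomes more restrictive, but not when it becomes more permissive) could be sketched as a small predicate. This is an illustrative userspace sketch only: the PERM_* bits, struct mapping_state and needs_invalidate() are hypothetical names, not kernel, GRU, XPMEM or KVM interfaces. The sketch also assumes that a change of the backing page always requires an invalidate.

```c
#include <stdbool.h>

/* Hypothetical permission bits, for illustration only (not kernel definitions). */
enum { PERM_READ = 1u << 0, PERM_WRITE = 1u << 1 };

/* Hypothetical summary of what one pte grants. */
struct mapping_state {
	unsigned long pfn;	/* which physical page is mapped */
	unsigned int perms;	/* PERM_* bits currently granted */
};

/*
 * Return true if an external user of the mapping has to drop its reference.
 * Assumption: that is the case when the backing page changes or when any
 * previously granted permission bit is taken away. Only adding permissions
 * on the same page needs no callout.
 */
static bool needs_invalidate(struct mapping_state prev, struct mapping_state next)
{
	if (prev.pfn != next.pfn)
		return true;
	return (prev.perms & ~next.perms) != 0;
}
```

With this predicate, a read-only to read-write transition on the same page returns false, while a write-protect or a switch to a different page returns true.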
--
I still hope I don't need to take any lock in _range_start and that
losing coherency (w/o risking global memory corruption but only
risking temporary userland data corruption thanks to the page pin) is
ok for KVM.

If I would have to take a lock in _range_start like XPMEM is forced to
do (GRU is by far not forced to it, if it would switch to my #v5) then
it would be a problem.
--
do_wp_page is entered when the pte shows that the page is not writeable
and it makes the page writable in some situations. Then we do not
invalidate the remote reference.

However, when we do COW then a *new* page is put in place of the existing
readonly page. At that point we need to remove the remote pte that is
readonly. Then we install a new pte pointing to a *different* page that is
writable.

Are you saying that you get the callback when transitioning from a read
only to a read write pte on the *same* page?
--
I believe that is what we saw. We have not put in any more debug
information yet. I will try to squeeze it in this weekend. Otherwise,
I will probably have to wait until early Monday.

Thanks
Robin
--
I hate it when I am confused. I misunderstood what Dean had been saying.
After I looked at his test case and remembering his screen at the time
we were discussing, I am nearly positive that both the parent and child
were still running (no exec, no exit). We would therefore have two refs
on the page and, yes, be changing the pte which would warrant the callout.
Now I really need to think this through more. Sounds like a good thing
for Monday.

Thanks,
Robin
--
do_wp_page can reach the _end callout without passing the _begin
callout. This prevents making the _end unless the _begin has also
been made.

Index: mmu_notifiers-cl-v5/mm/memory.c
===================================================================
--- mmu_notifiers-cl-v5.orig/mm/memory.c 2008-02-01 04:44:03.000000000 -0600
+++ mmu_notifiers-cl-v5/mm/memory.c 2008-02-01 04:46:18.000000000 -0600
@@ -1564,7 +1564,7 @@ static int do_wp_page(struct mm_struct *
{
struct page *old_page, *new_page;
pte_t entry;
- int reuse = 0, ret = 0;
+ int reuse = 0, ret = 0, invalidate_started = 0;
int page_mkwrite = 0;
struct page *dirty_page = NULL;

@@ -1649,6 +1649,8 @@ gotten:
mmu_notifier(invalidate_range_begin, mm, address,
address + PAGE_SIZE, 0);
+ invalidate_started = 1;
+
/*
* Re-check the pte - we dropped the lock
*/
@@ -1687,7 +1689,8 @@ gotten:
page_cache_release(old_page);
unlock:
pte_unmap_unlock(page_table, ptl);
- mmu_notifier(invalidate_range_end, mm,
+ if (invalidate_started)
+ mmu_notifier(invalidate_range_end, mm,
address, address + PAGE_SIZE, 0);
if (dirty_page) {
if (vma->vm_file)
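
The pairing guard in the patch above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the do_wp_page() code itself: struct notifier_counts, range_begin(), range_end(), fault_path() and the early_exit parameter are hypothetical stand-ins that only model a path reaching the unlock label before the begin callout was issued.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for the real notifier callouts. */
struct notifier_counts {
	int begin_calls;
	int end_calls;
};

static void range_begin(struct notifier_counts *c) { c->begin_calls++; }
static void range_end(struct notifier_counts *c)   { c->end_calls++; }

/*
 * Model of a fault path with a shared exit label. early_exit stands for
 * any branch that jumps to unlock before the begin callout was made.
 */
static void fault_path(struct notifier_counts *c, bool early_exit)
{
	bool invalidate_started = false;

	if (early_exit)
		goto unlock;

	range_begin(c);
	invalidate_started = true;

unlock:
	/* Only issue the end callout if the begin callout was issued. */
	if (invalidate_started)
		range_end(c);
}
```

In this sketch every call leaves begin_calls equal to end_calls, whichever way the unlock label is reached.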
--
Argh. Did not see this soon enough. Maybe this one is better since it
avoids the additional unlocks?
--
