Re: [PATCH 01 of 12] Core of mmu notifiers

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Robin Holt <holt@...>
Cc: Christoph Lameter <clameter@...>, Nick Piggin <npiggin@...>, Jack Steiner <steiner@...>, Peter Zijlstra <a.p.zijlstra@...>, <kvm-devel@...>, Kanoj Sarcar <kanojsarcar@...>, Roland Dreier <rdreier@...>, Steve Wise <swise@...>, <linux-kernel@...>, Avi Kivity <avi@...>, <linux-mm@...>, <general@...>, Hugh Dickins <hugh@...>, <akpm@...>, Rusty Russell <rusty@...>
Date: Wednesday, April 23, 2008 - 11:59 am

On Wed, Apr 23, 2008 at 09:47:47AM -0500, Robin Holt wrote:

That's not entirely true, you can close the file just fine it by
killing the tasks leading to an mmput. From an user prospective in KVM
terms, it won't make a difference as /dev/kvm will remain open and
it'll pin the module count until the kvm task is killed anyway, I
assume for GRU it's similar.

Until I had the idea of how to implement an mm_lock to ensure the
mmu_notifier_register could miss a running invalidate_range_begin, it
wasn't even possible to implement a mmu_notifier_unregister (see EMM
patches) and it looked like you were ok with that API that missed
_unregister...


Correct, but it will keep the vmalloc ram pinned during the
runtime. There's no reason to keep that ram allocated per-VM while the
VM runs. We only need it during the startup and teardown.


Correct, I already thought about that. This is exactly why I'm
deferring this for later! Or those perfectionism not needed for
KVM/GRU will keep delaying indefinitely the part that is already
converged and that's enough for KVM and GRU (and for this specific
bit, actually enough for XPMEM as well).

We can make a second version of mm_lock_slow to use if mm_lock fails,
in mmu_notifier_unregister, with N^2 complexity later, after the
mmu-notifier-core is merged into mainline.


As said 1/N isn't enough for XPMEM anyway. 1/N has to only include the
absolute minimum and zero risk stuff, that is enough for both KVM and
GRU.


Yes exactly, but you've to do that anyway, if mmu_notifier_unregister
fails because some driver already allocated all vmalloc space (even
x86-64 hasn't indefinite amount of vmalloc because of the vmalloc
being in the end of the address space) unless we've a N^2 fallback,
but the N^2 fallback will make the code more easily dosable and
unkillable, so if I would be an admin I'd prefer having to quickly
kill -9 a task in O(N) than having to wait some syscall that runs in
O(N^2) to complete before the task quits. So the fallback to a slower
algorithm isn't necessarily what will really happen after 2.6.26 is
released, we'll see. Relaying on ->release for the module unpin sounds
preferable, and it's certainly the only reliable way to unregister
that we'll provide in 2.6.26.


But you don't need any browse the list for this, keep a flag in your
structure after the mmu_notifier struct, set the bitflag after
mmu_notifier_register returns, and clear the bitflag after ->release
runs or after mmu_notifier_unregister returns success. What's the big
deal to track if you've to call mmu_notifier_register a second time or
not? Or you can create a new structure every time somebody asks to
reattach.


We can add it later, and we can keep discussing on what's the best
model to implement it as long as you want after 2.6.26 is released
with mmu-notifier-core so GRU/KVM are done. It's unlikely KVM will use
mmu_notifier_unregister anyway as we need it attached for the whole
lifetime of the task, and only for the lifetime of the task.

This is the patch to add it, as you can see it's entirely orthogonal,
backwards compatible with previous API and it doesn't duplicate or
rewrite any code.

Don't worry, any kernel after 2.6.26 will have unregister, but we can't
focus on this for 2.6.26. We can also consider making
mmu_notifier_register safe against double calls on the same structure
but again that's not something we should be doing in 1/N and it can be
done later in a backwards compatible way (plus we're perfectly fine
with the API having not backwards compatible changes as long as 2.6.26
can work for us).

---------------------------------
Implement unregister but it's not reliable, only ->release is reliable.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -119,6 +119,8 @@
 
 extern int mmu_notifier_register(struct mmu_notifier *mn,
 				 struct mm_struct *mm);
+extern int mmu_notifier_unregister(struct mmu_notifier *mn,
+				   struct mm_struct *mm);
 extern void __mmu_notifier_release(struct mm_struct *mm);
 extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 					  unsigned long address);
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -106,3 +106,29 @@
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_register);
+
+/*
+ * mm_users can't go down to zero while mmu_notifier_unregister()
+ * runs or it can race with ->release. So a mm_users pin must
+ * be taken by the caller (if mm can be different from current->mm).
+ *
+ * This function can fail (for example during out of memory conditions
+ * or after vmalloc virtual range shortage), so the only reliable way
+ * to unregister is to wait release() to be called.
+ */
+int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct mm_lock_data data;
+	int ret;
+
+	BUG_ON(!atomic_read(&mm->mm_users));
+
+	ret = mm_lock(mm, &data);
+	if (unlikely(ret))
+		goto out;
+	hlist_del(&mn->hlist);
+	mm_unlock(mm, &data);
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mmu_notifier_unregister);

--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[PATCH 00 of 12] mmu notifier #v13, Andrea Arcangeli, (Tue Apr 22, 9:51 am)
Re: [PATCH 00 of 12] mmu notifier #v13, Jack Steiner, (Tue Apr 22, 8:31 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Robin Holt, (Tue Apr 22, 2:22 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Andrea Arcangeli, (Tue Apr 22, 2:43 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Christoph Lameter, (Tue Apr 22, 4:28 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Robin Holt, (Tue Apr 22, 3:42 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Christoph Lameter, (Tue Apr 22, 4:30 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Andrea Arcangeli, (Wed Apr 23, 9:33 am)
Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after..., Christoph Lameter, (Tue Apr 22, 4:26 pm)
Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after..., Christoph Lameter, (Tue Apr 22, 7:19 pm)
Re: [PATCH 05 of 12] Move the tlb flushing into free_pgtable..., Christoph Lameter, (Tue Apr 22, 4:25 pm)
[PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 9:51 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Jack Steiner, (Wed Apr 23, 1:09 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 1:45 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Tue Apr 22, 4:19 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 6:35 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Tue Apr 22, 7:20 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 12:26 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 2:15 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 1:24 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 2:21 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 2:34 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Tue Apr 22, 7:07 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 9:36 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Wed Apr 23, 10:47 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 11:59 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 2:09 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 2:19 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 2:27 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 2:37 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 2:46 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Jack Steiner, (Tue Apr 22, 8:28 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 12:37 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 6:19 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Thu Apr 24, 2:49 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Thu Apr 24, 5:51 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Thu Apr 24, 11:39 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Thu Apr 24, 1:41 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Sat Apr 26, 9:17 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Sun Apr 27, 8:27 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Mon Apr 28, 4:34 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Mon Apr 28, 8:10 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Hugh Dickins, (Tue Apr 29, 6:49 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 29, 9:32 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Mon Apr 28, 9:28 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 29, 11:30 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Tue Apr 29, 11:50 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 29, 12:03 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed May 7, 11:00 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Sat Apr 26, 10:04 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 2:19 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 2:25 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Tue Apr 22, 4:31 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Eric Dumazet, (Tue Apr 22, 10:56 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 11:15 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Eric Dumazet, (Tue Apr 22, 11:37 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 12:46 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Avi Kivity, (Tue Apr 22, 11:24 am)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outside ..., Christoph Lameter, (Tue Apr 22, 4:24 pm)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outside ..., Christoph Lameter, (Tue Apr 22, 7:14 pm)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outside ..., Christoph Lameter, (Wed Apr 23, 2:02 pm)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outside ..., Andrea Arcangeli, (Wed Apr 23, 12:15 pm)
Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput..., Christoph Lameter, (Tue Apr 22, 4:23 pm)
Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput..., Christoph Lameter, (Tue Apr 22, 7:13 pm)
Re: [PATCH 02 of 12] Fix ia64 compilation failure because of..., Christoph Lameter, (Tue Apr 22, 4:22 pm)