Re: [PATCH 08 of 11] anon-vma-rwsem

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Christoph Lameter <clameter@...>, Jack Steiner <steiner@...>, Robin Holt <holt@...>, Nick Piggin <npiggin@...>, Peter Zijlstra <a.p.zijlstra@...>, <kvm-devel@...>, Kanoj Sarcar <kanojsarcar@...>, Roland Dreier <rdreier@...>, Steve Wise <swise@...>, <linux-kernel@...>, Avi Kivity <avi@...>, <linux-mm@...>, <general@...>, Hugh Dickins <hugh@...>, Rusty Russell <rusty@...>, Anthony Liguori <aliguori@...>, Chris Wright <chrisw@...>, Marcelo Tosatti <marcelo@...>, Eric Dumazet <dada1@...>, Paul E. McKenney <paulmck@...>
Date: Wednesday, May 7, 2008 - 6:22 pm

On Wed, May 07, 2008 at 02:36:57PM -0700, Linus Torvalds wrote:

I'll let you discuss it with Christoph and Robin. The moment I heard
about the requirement to schedule inside ->invalidate_page() I reacted
the same way you did. But I don't see any other real solution for XPMEM
other than spin-looping, halting the scheduler for ages, until the ack
is received from the network device.

But mm_lock is required even without XPMEM. And srcu is also required
without XPMEM to allow ->release to schedule (however, downgrading srcu
to rcu will result in a very small patch; srcu and rcu are about the
same with a kernel supporting preempt=y like 2.6.26).


I think it's a great smp scalability optimization over the global lock
you're proposing below.


Unfortunately the lock you're talking about would be:

static spinlock_t global_lock = ...

There's no way to make it more granular.

So every time before taking any ->i_mmap_lock _and_ any anon_vma->lock
we'd need to take that extremely wide spinlock first (and even worse,
later it would become a rwsem when XPMEM is selected, making the VM
even slower than it already becomes when XPMEM support is selected at
compile time).


mmu_notifier_register can take ages. No problem.


It's fine for mmu_notifier_register to be a hundred times slower
(preempt-rt will turn all spinlocks into sleeping locks, so no problem).


Sure, I'll split it from the rest if the mmu-notifier-core isn't merged.

My objective has been:

1) add zero overhead to the VM before anybody starts a VM with kvm and
   still zero overhead for all other tasks except the task where the
   VM runs.  The only exception is the unlikely(!mm->mmu_notifier_mm)
   check that is optimized away too when CONFIG_KVM=n. And even for
   that check my invalidate_page reduces the number of branches to the
   absolute minimum possible.

2) avoid any new cacheline collision in the fast paths to allow numa
   systems not to nearly-crash (mm->mmu_notifier_mm will be shared and
   never written, except during the first mmu_notifier_register)

3) avoid any risk of introducing regressions in 2.6.26 (the patch must
   be obviously safe). Even if mm_lock were a bad idea like you
   say, it's an order of magnitude safer, even if entirely broken, than
   messing with the VM core locking in 2.6.26.
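
To make point 1 concrete, here is a minimal userspace sketch of that kind
of check. It is not the code from the patch: struct mm_sketch,
struct mmu_notifier_mm_sketch, SKETCH_NO_NOTIFIER (standing in for
CONFIG_KVM=n) and all the function names are hypothetical, and unlikely()
is redefined locally on top of the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel macro (GCC/Clang builtin assumed). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-ins for the kernel structures. */
struct mmu_notifier_mm_sketch {
	int invalidate_calls;	/* how many times the slow path ran */
};

struct mm_sketch {
	/* NULL until the first registration, only read in the fast path */
	struct mmu_notifier_mm_sketch *mmu_notifier_mm;
};

#ifndef SKETCH_NO_NOTIFIER	/* plays the role of CONFIG_KVM=y */

/* Slow path: only reached for an mm that has a notifier registered. */
static void slow_invalidate_page(struct mm_sketch *mm, unsigned long address)
{
	(void)address;
	assert(mm->mmu_notifier_mm != NULL);
	mm->mmu_notifier_mm->invalidate_calls++;
}

/* Fast path: one predicted-not-taken branch for every other task. */
static inline void notifier_invalidate_page(struct mm_sketch *mm,
					    unsigned long address)
{
	if (unlikely(mm->mmu_notifier_mm != NULL))
		slow_invalidate_page(mm, address);
}

#else	/* the hook compiles to nothing, like with CONFIG_KVM=n */

static inline void notifier_invalidate_page(struct mm_sketch *mm,
					    unsigned long address)
{
	(void)mm;
	(void)address;
}

#endif
```

A task that never registered a notifier only pays for the pointer test;
building with -DSKETCH_NO_NOTIFIER removes even that.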

mm_lock (or whatever name you'd like to give it; I admit mm_lock may not
sound worrisome enough to keep people from calling it in a fast
path) is going to be the real deal for the long term, to allow
mmu_notifier_register to serialize against
invalidate_range_start/end. If I fail in 2.6.26 I'll offer
maintainership to Christoph as promised, and you'll find him pushing
for mm_lock to be merged: XPMEM/GRU aren't technologies running on
cellphones, where your global spinlock is optimized away at
compile time, and he also has to deal with XPMEM, where such a spinlock
would need to become a rwsem because the anon_vma->sem has to be taken
after it. But let's assume you're entirely right here, that
mm_lock is going to be dropped and there's a better way: it's still a
fine solution for 2.6.26.

And if you prefer, I can move the whole mm_lock() from mmap.c/mm.h to
mmu_notifier.[ch] so you don't get any pollution in the core VM. Then
mm_lock will be invisible to everything except callers of
mmu_notifier_register(), and it will be trivial to remove later if
you really want to add a global spinlock. Without my mm_lock there's no
way to be more granular than a _global_ numa-wide spinlock taken before
any i_mmap_lock/anon_vma->lock.
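
For readers who haven't looked at the patch, the general idea behind a
"take every lock belonging to one mm" primitive can be sketched in
userspace as below. This is only an illustration under assumptions, not
the mm_lock() code: pthread mutexes stand in for the
i_mmap_lock/anon_vma->lock instances, and lock_all_sketch(),
unlock_all_sketch() and cmp_lock_ptr() are hypothetical names. The locks
are sorted by address so that every caller takes them in the same order,
and duplicates are skipped because several vmas can share one lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Order lock pointers by address so all callers agree on the order. */
static int cmp_lock_ptr(const void *a, const void *b)
{
	uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
	uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;

	return (x > y) - (x < y);
}

/*
 * Sort locks[] in place, drop duplicates and lock each remaining mutex
 * exactly once. Returns the number of distinct mutexes now held; they
 * end up in locks[0..held-1].
 */
static size_t lock_all_sketch(pthread_mutex_t **locks, size_t n)
{
	size_t i, held = 0;

	qsort(locks, n, sizeof(*locks), cmp_lock_ptr);
	for (i = 0; i < n; i++) {
		int rc;

		if (held > 0 && locks[held - 1] == locks[i])
			continue;	/* same lock shared by two entries */
		locks[held] = locks[i];
		rc = pthread_mutex_lock(locks[held]);
		assert(rc == 0);
		(void)rc;
		held++;
	}
	return held;
}

/* Release in reverse order of acquisition. */
static void unlock_all_sketch(pthread_mutex_t **locks, size_t held)
{
	while (held > 0)
		pthread_mutex_unlock(locks[--held]);
}
```

The slow, one-time cost lands on the caller that wants to serialize
against everybody else, while the regular paths keep taking only their
own lock.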
--