login
Header Space

 
 

Re: [PATCH 08 of 11] anon-vma-rwsem

Score:
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Nick Piggin <npiggin@...>
Cc: Robin Holt <holt@...>, Nick Piggin <nickpiggin@...>, Linus Torvalds <torvalds@...>, Andrea Arcangeli <andrea@...>, Andrew Morton <akpm@...>, Christoph Lameter <clameter@...>, Jack Steiner <steiner@...>, Peter Zijlstra <a.p.zijlstra@...>, <kvm-devel@...>, Kanoj Sarcar <kanojsarcar@...>, Roland Dreier <rdreier@...>, Steve Wise <swise@...>, <linux-kernel@...>, Avi Kivity <avi@...>, <linux-mm@...>, <general@...>, Hugh Dickins <hugh@...>, Rusty Russell <rusty@...>, Anthony Liguori <aliguori@...>, Chris Wright <chrisw@...>, Marcelo Tosatti <marcelo@...>, Eric Dumazet <dada1@...>, Paul E. McKenney <paulmck@...>
Date: Thursday, May 15, 2008 - 7:01 am

We are pursuing Linus' suggestion currently.  This discussion is
completely unrelated to that work.

On Thu, May 15, 2008 at 09:57:47AM +0200, Nick Piggin wrote:

We would need to deposit the payload into a central location to do the
invalidate, correct?  That central location would either need to be
indexed by physical cpuid (65536 possible currently, UV will push that
up much higher) or some sort of global id which is difficult because
remote partitions can reboot giving you a different view of the machine
and running partitions would need to be updated.  Alternatively, that
central location would need to be protected by a global lock or atomic
type operation, but a majority of the machine does not have coherent
access to other partitions so they would need to use uncached operations.
Essentially, take away from this paragraph that it is going to be really
slow or really large.

Then we need to deposit the information needed to do the invalidate.

Lastly, we would need to interrupt.  Unfortunately, here we have a
thundering herd.  There could be up to 16256 processors interrupting the
same processor.  That will be a lot of work.  It will need to look up the
mm (without grabbing any sleeping locks in either xpmem or the kernel)
and do the tlb invalidates.

Unfortunately, the sending side is not free to continue (in most cases)
until it knows that the invalidate is completed.  So it will need to spin
waiting for a completion signal will could be as simple as an uncached
word.  But how will it handle the possible failure of the other partition?
How will it detect that failure and recover?  A timeout value could be
difficult to gauge because the other side may be off doing a considerable
amount of work and may just be backed up.


It is an assumption based upon some of the kernel functions we call
doing things like grabbing mutexes or rw_sems.  That pushes back to us.
I think the kernel's locking is perfectly reasonable.  The problem we run
into is we are trying to get from one context in one kernel to a different
context in another and the in-between piece needs to be sleepable.


XPMEM allows one process to make a portion of its virtual address range
directly addressable by another process with the appropriate access.
The other process can be on other partitions.  As long as Numa-link
allows access to the memory, we can make it available.  Userland has an
advantage in that the kernel entrance/exit code contains memory errors
so we can contain hardware failures (in most cases) to only needing to
terminate a user program and not lose the partition.  The kernel enjoys
no such fault containment so it can not safely directly reference memory.


Thanks,
Robin
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[PATCH 00 of 11] mmu notifier #v16, Andrea Arcangeli, (Wed May 7, 10:35 am)
[PATCH 10 of 11] export zap_page_range for XPMEM, Andrea Arcangeli, (Wed May 7, 10:36 am)
[PATCH 11 of 11] mmap sems, Andrea Arcangeli, (Wed May 7, 10:36 am)
[PATCH 09 of 11] mm_lock-rwsem, Andrea Arcangeli, (Wed May 7, 10:35 am)
[PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 10:35 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 4:56 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 5:26 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Jack Steiner, (Wed May 7, 6:42 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 5:36 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Wed May 7, 8:38 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Tue May 13, 8:06 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Tue May 13, 11:32 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Wed May 14, 12:11 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Wed May 14, 7:26 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Thu May 15, 3:57 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Christoph Lameter, (Thu May 15, 1:33 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Thu May 15, 7:52 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Fri May 16, 7:23 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Fri May 16, 7:50 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Tue May 20, 1:31 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Tue May 20, 6:01 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Tue May 20, 6:50 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Tue May 20, 7:05 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Tue May 20, 7:14 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Tue May 20, 7:26 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Thu May 15, 7:01 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Avi Kivity, (Thu May 15, 7:12 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 14, 11:18 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Christoph Lameter, (Wed May 14, 1:57 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 14, 2:27 pm)
mm notifier: Notifications when pages are unmapped., Christoph Lameter, (Fri May 16, 9:38 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Wed May 14, 12:22 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 14, 12:56 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 8:55 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 6:22 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 6:44 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 6:58 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 7:09 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 7:02 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrew Morton, (Wed May 7, 6:31 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 6:44 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Benjamin Herrenschmidt, (Wed May 7, 7:28 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 7:45 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 9:34 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Tue May 13, 8:14 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Benjamin Herrenschmidt, (Wed May 14, 1:43 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Jack Steiner, (Wed May 14, 9:15 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Nick Piggin, (Wed May 14, 2:06 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrew Morton, (Wed May 7, 6:59 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 7:39 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 9:02 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 9:26 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Christoph Lameter, (Wed May 7, 9:12 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 10:56 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Christoph Lameter, (Wed May 7, 11:10 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 11:41 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Thu May 8, 12:14 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Thu May 8, 1:20 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Thu May 8, 11:03 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Thu May 8, 12:11 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Peter Zijlstra, (Fri May 9, 2:37 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Fri May 9, 2:55 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Peter Zijlstra, (Fri May 9, 3:04 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Thu May 8, 6:01 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Pekka Enberg, (Thu May 8, 1:27 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Pekka Enberg, (Thu May 8, 1:30 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Thu May 8, 1:49 am)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 9:32 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 7:19 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Christoph Lameter, (Wed May 7, 7:39 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 8:03 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Christoph Lameter, (Wed May 7, 8:56 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 9:39 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 9:52 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 9:57 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Andrea Arcangeli, (Wed May 7, 10:24 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 10:32 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Linus Torvalds, (Wed May 7, 9:07 pm)
Re: [PATCH 08 of 11] anon-vma-rwsem, Robin Holt, (Wed May 7, 8:52 pm)
[PATCH 07 of 11] i_mmap_rwsem, Andrea Arcangeli, (Wed May 7, 10:35 am)
[PATCH 06 of 11] rwsem contended, Andrea Arcangeli, (Wed May 7, 10:35 am)
[PATCH 05 of 11] unmap vmas tlb flushing, Andrea Arcangeli, (Wed May 7, 10:35 am)
Re: [PATCH 05 of 11] unmap vmas tlb flushing, Rik van Riel, (Wed May 7, 1:46 pm)
[PATCH 04 of 11] free-pgtables, Andrea Arcangeli, (Wed May 7, 10:35 am)
Re: [PATCH 04 of 11] free-pgtables, Rik van Riel, (Wed May 7, 1:41 pm)
[PATCH 03 of 11] invalidate_page outside PT lock, Andrea Arcangeli, (Wed May 7, 10:35 am)
Re: [PATCH 03 of 11] invalidate_page outside PT lock, Rik van Riel, (Wed May 7, 1:39 pm)
Re: [PATCH 03 of 11] invalidate_page outside PT lock, Andrea Arcangeli, (Wed May 7, 1:57 pm)
[PATCH 01 of 11] mmu-notifier-core, Andrea Arcangeli, (Wed May 7, 10:35 am)
Re: [PATCH 01 of 11] mmu-notifier-core, Andrew Morton, (Wed May 7, 4:05 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Linus Torvalds, (Wed May 7, 4:30 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Andrea Arcangeli, (Wed May 7, 5:58 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Linus Torvalds, (Wed May 7, 6:11 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Andrea Arcangeli, (Wed May 7, 6:27 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Linus Torvalds, (Wed May 7, 7:00 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Andrea Arcangeli, (Wed May 7, 6:37 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Linus Torvalds, (Wed May 7, 7:38 pm)
Re: [ofa-general] Re: [PATCH 01 of 11] mmu-notifier-core, Andrea Arcangeli, (Wed May 7, 6:39 pm)
Re: [ofa-general] Re: [PATCH 01 of 11] mmu-notifier-core, Linus Torvalds, (Wed May 7, 7:03 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Andrew Morton, (Wed May 7, 4:02 pm)
Re: [PATCH 01 of 11] mmu-notifier-core, Rik van Riel, (Wed May 7, 1:35 pm)
[PATCH 02 of 11] get_task_mm, Andrea Arcangeli, (Wed May 7, 10:35 am)
Re: [PATCH 02 of 11] get_task_mm, Robin Holt, (Wed May 7, 11:59 am)
Re: [PATCH 02 of 11] get_task_mm, Andrea Arcangeli, (Wed May 7, 12:20 pm)
speck-geostationary