Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

To: Christoph Lameter <clameter@...>
Cc: Robin Holt <holt@...>, Avi Kivity <avi@...>, Izik Eidus <izike@...>, Nick Piggin <npiggin@...>, <kvm-devel@...>, Benjamin Herrenschmidt <benh@...>, Peter Zijlstra <a.p.zijlstra@...>, <steiner@...>, <linux-kernel@...>, <linux-mm@...>, <daniel.blueman@...>, Hugh Dickins <hugh@...>
Date: Tuesday, January 29, 2008 - 6:35 pm

On Tue, Jan 29, 2008 at 01:53:05PM -0800, Christoph Lameter wrote:

It's taken in write mode only because the code is inefficient the
first time; on all later calls remap_populate_range overwrites ptes
with the mmap_sem held in read mode (finally, rightfully so). I guess
the first remap_file_pages is irrelevant to optimize: the whole point
of nonlinear is to call remap_file_pages zillions of times on the same
vma, overwriting present ptes the whole time, so if the semaphore is
not taken in read mode the first time it probably doesn't make a
difference.

get_user_pages, invoked by the kvm spte fault, can happen between
invalidate_range and populate_range. If it can't happen, nobody has
pointed out a good reason why it can't. The kvm page fault also
rightfully takes the mmap_sem only in read mode, so get_user_pages is
only called internally to gfn_to_page with the semaphore held in read
mode.

With my approach, ptep_clear_flush was not only invalidating the sptes
right after clearing and flushing the pte, it was also doing so inside
the PT lock, so it was totally obvious there could be no race vs
get_user_pages.
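
To make the window concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: the gup_* names, the two-field state and the "page numbers" are all hypothetical, and the PT lock is modelled only by the assumption that a locked section runs without a fault in the middle of it.

```c
#include <stdbool.h>

#define GUP_NO_PAGE (-1)

/* Toy state: which "physical page" each mapping currently points at. */
struct gup_toy_mm {
    int linux_pte; /* primary mapping, as seen by get_user_pages */
    int spte;      /* secondary (shadow) mapping used by the guest */
};

/* Hypothetical stand-in for the range invalidate callback: drop the spte. */
static void gup_toy_invalidate_range(struct gup_toy_mm *mm)
{
    mm->spte = GUP_NO_PAGE;
}

/* Hypothetical stand-in for a kvm spte fault: refill the spte from the
 * linux pte, which is what a get_user_pages lookup would resolve to. */
static void gup_toy_spte_fault(struct gup_toy_mm *mm)
{
    if (mm->spte == GUP_NO_PAGE)
        mm->spte = mm->linux_pte;
}

/* Hypothetical stand-in for populate_range: point the pte at a new page. */
static void gup_toy_populate_range(struct gup_toy_mm *mm, int new_page)
{
    mm->linux_pte = new_page;
}

static bool gup_toy_coherent(const struct gup_toy_mm *mm)
{
    return mm->spte == GUP_NO_PAGE || mm->spte == mm->linux_pte;
}

/* Range ordering: the spte fault lands between invalidate and populate. */
static bool gup_toy_range_ordering_is_coherent(void)
{
    struct gup_toy_mm mm = { .linux_pte = 1, .spte = 1 };

    gup_toy_invalidate_range(&mm);
    gup_toy_spte_fault(&mm);         /* read-mode mmap_sem does not stop it */
    gup_toy_populate_range(&mm, 2);  /* spte still points at page 1 */
    return gup_toy_coherent(&mm);
}

/* Locked ordering: pte clear, spte invalidate and new pte form one section
 * under the (modelled) PT lock, so the fault runs before or after it. */
static bool gup_toy_locked_ordering_is_coherent(bool fault_first)
{
    struct gup_toy_mm mm = { .linux_pte = 1, .spte = 1 };

    if (fault_first)
        gup_toy_spte_fault(&mm);
    mm.linux_pte = GUP_NO_PAGE;      /* ptep_clear_flush ...            */
    mm.spte = GUP_NO_PAGE;           /* ... invalidates inside the lock */
    mm.linux_pte = 2;                /* new mapping installed           */
    if (!fault_first)
        gup_toy_spte_fault(&mm);
    return gup_toy_coherent(&mm);
}
```

In this model gup_toy_range_ordering_is_coherent() returns false (the spte is left on the old page), while gup_toy_locked_ordering_is_coherent() returns true for both positions of the fault.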


Yes, but it would have been micro-optimized later if you really cared,
by simply changing ptep_clear_flush to __ptep_clear_flush, no big
deal. All methods must definitely be robust against being called
multiple times, even if the rmap finds no spte mapping that host
virtual address.
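
As a rough illustration of what "robust against being called multiple times" means, here is a userspace C sketch of an idempotent invalidate. The rmap_toy_* names and the fixed-size table are hypothetical and bear no relation to the real kvm rmap layout; the only point is that a repeated call, or a call for an address with no spte, is a harmless no-op.

```c
#include <stddef.h>

#define RMAP_TOY_SLOTS 4

/* Toy reverse map: each slot records one spte keyed by host virtual address. */
struct rmap_toy_spte {
    unsigned long hva;
    int present;
};

struct rmap_toy {
    struct rmap_toy_spte slot[RMAP_TOY_SLOTS];
};

/* Hypothetical invalidate_page method: drop every spte that maps hva.
 * Returns how many sptes were dropped; 0 means there was nothing to do,
 * which is a valid outcome and not an error. */
static int rmap_toy_invalidate_page(struct rmap_toy *rmap, unsigned long hva)
{
    int dropped = 0;

    for (size_t i = 0; i < RMAP_TOY_SLOTS; i++) {
        if (rmap->slot[i].present && rmap->slot[i].hva == hva) {
            rmap->slot[i].present = 0;
            dropped++;
        }
    }
    return dropped;
}
```

With a table holding a single spte for hva 0x1000, the first rmap_toy_invalidate_page(&rmap, 0x1000) drops one entry, a second identical call drops none, and a call for an unmapped address such as 0x2000 also drops none.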


That's a question you should answer.


No, that's a different angle.

But now I think there may be an issue with a third thread that may
show that removing invalidate_page from ptep_clear_flush is unsafe.

A third thread writing to a page through the linux pte and the guest
VM writing to the same page through the sptes will be writing to the
same physical page concurrently, using a userspace spinlock w/o ever
entering the kernel. With your patch, which calls invalidate_range
after dropping the PT lock, the third thread may start writing to the
new page while the guest is still writing to the old page through the
sptes. This couldn't happen with my patch.

So really, in light of the third thread, it seems your approach is
SMP racy and ptep_clear_flush should call invalidate_page as the last
thing before returning. My patch was enforcing that ptep_clear_flush
would stop the third thread in a linux page fault, and drop the spte,
before the new mapping could be instantiated in both the linux pte and
in the sptes. The PT lock provided the needed serialization. This
ensured the third thread and the guest VM would always write to the
same physical page even if the first thread runs a flood of
remap_file_pages on that same page, moving it around the pagecache. So
it seems I found an unfixable SMP race in pretending to invalidate in
a sleeping place.
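
The third-thread scenario above can be sketched as a deterministic toy model in userspace C. Everything here is an assumption made for illustration: the wr_toy_* names are hypothetical, pages are plain integers, and the PT lock is represented only by running the locked steps back to back with no writer in between.

```c
#include <stdbool.h>

#define WR_NO_PAGE (-1)

/* Toy state: the page each mapping points at. */
struct wr_toy_mm {
    int linux_pte; /* used by the third (host) thread */
    int spte;      /* used by the guest VM */
};

/* Third thread stores through the linux pte; returns the page written. */
static int wr_toy_host_write(const struct wr_toy_mm *mm)
{
    return mm->linux_pte;
}

/* Guest stores through the spte, refaulting it from the linux pte if it
 * was dropped; returns the page written. */
static int wr_toy_guest_write(struct wr_toy_mm *mm)
{
    if (mm->spte == WR_NO_PAGE)
        mm->spte = mm->linux_pte;
    return mm->spte;
}

/* Invalidate only after the PT lock is dropped: both writers run in the
 * window between the pte update and the late invalidate. */
static bool wr_toy_late_invalidate_same_page(void)
{
    struct wr_toy_mm mm = { .linux_pte = 1, .spte = 1 };

    mm.linux_pte = 2;                     /* remap under PT lock, then unlock */
    int host = wr_toy_host_write(&mm);    /* lands on the new page */
    int guest = wr_toy_guest_write(&mm);  /* stale spte: the old page */
    mm.spte = WR_NO_PAGE;                 /* invalidate_range arrives too late */
    return host == guest;
}

/* Invalidate inside the PT lock: the spte is gone before the new pte is
 * visible, so the guest has to refault and sees the new page. */
static bool wr_toy_locked_invalidate_same_page(void)
{
    struct wr_toy_mm mm = { .linux_pte = 1, .spte = 1 };

    mm.linux_pte = WR_NO_PAGE;            /* ptep_clear_flush ...        */
    mm.spte = WR_NO_PAGE;                 /* ... drops the spte as well  */
    mm.linux_pte = 2;                     /* new mapping, then unlock    */
    int host = wr_toy_host_write(&mm);
    int guest = wr_toy_guest_write(&mm);
    return host == guest;
}
```

In this model wr_toy_late_invalidate_same_page() returns false, since the two writers end up on different pages, and wr_toy_locked_invalidate_same_page() returns true.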

Perhaps you want to change the PT lock to a mutex instead of a
spinlock, that may be your only chance to sleep while maintaining 100%
memory coherency with threads.
--

Messages in current thread:
[patch 2/6] mmu_notifier: Callbacks to invalidate address ra..., Christoph Lameter, (Mon Jan 28, 4:28 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Andrea Arcangeli, (Tue Jan 29, 12:20 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 3:55 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 5:35 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 6:39 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Wed Jan 30, 3:35 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Wed Jan 30, 3:50 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Wed Jan 30, 8:01 pm)
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to inval..., Christoph Lameter, (Wed Jan 30, 10:08 pm)
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to inval..., Andrea Arcangeli, (Wed Jan 30, 10:42 pm)
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to inval..., Christoph Lameter, (Wed Jan 30, 10:51 pm)
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to inval..., Christoph Lameter, (Wed Jan 30, 9:46 pm)
Re: [kvm-devel] mmu_notifier: invalidate_range_start with lo..., Christoph Lameter, (Wed Jan 30, 10:56 pm)
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to inval..., Christoph Lameter, (Wed Jan 30, 10:37 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 8:20 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Wed Jan 30, 3:41 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Wed Jan 30, 4:55 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 8:35 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 8:22 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 4:30 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 5:53 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Andrea Arcangeli, (Tue Jan 29, 6:35 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 6:55 pm)
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addres..., Christoph Lameter, (Tue Jan 29, 8:34 pm)