Re: [RFC/PATH 1/2] MM: Make Page Tables Relocatable -- conditional flush

From: Jeremy Fitzhardinge
Date: Wednesday, April 30, 2008 - 12:56 pm

Ross Biro wrote:

I wonder if rearranging mm_struct could put it on an already-hot cacheline?


Yeah, it's easy for that to happen.  Those comments are helpful; I was
thinking very high-level things like:

    * what initiates migration?
    * how can an mm be under multiple levels of migration?
    * ...?

Maybe the existing comments are sufficient.


Just a comment saying something like "a pagetable page is considered to 
be in limbo if it has been copied, but may still be in use.  It may be 
either in a cpu's stale tlb entry, or in use by the kernel on another 
cpu with a transient reference." would clarify what the delimbo is 
trying to achieve.


But the issue I'm concerned about is what happens if a process writes 
the page, causing its cpu to mark the (old, in-limbo) pte dirty.  
Meanwhile someone else is scanning the pagetables looking for things to 
evict.  It checks the (shiny new) pte, finds it not dirty, and decides to
evict the apparently clean page.

What, for that matter, stops a page from being evicted from under a 
limboed mapping?  Does it get accounted for?  (I guess the existing tlb
flushing should be sufficient to keep it under control.)

Also, what happens if a page happens to get migrated twice in quick 
succession (ie, while there's still an in-limbo page from the first 
time)?  Is there something to prevent that, or would it just all work out?


PTRS_PER_PGD * sizeof(pgd_t) == PAGE_SIZE


Erm, if you're lucky.  It can get pretty hairy.


Also the notion of a pgd_list, which is an x86-special at the moment.


Well, x86-64 is 4 level, x86-32 PAE is 3 level, and x86-32 non-PAE is 2 
level.  I don't think there'd be too much crying if you didn't support 
32-bit non-PAE, but 32-bit PAE is useful.

I would say that x86 unification is still a fair way from "done", but 
the areas you're dealing with should be getting more settled now.


OK.  I was planning on making the change anyway, and this is just 
another reason to do it.


I wouldn't eliminate it for speed, but it could be misleading if someone 
thought that it would actually do something.  Don't know; no clear 
answer.  Given that delimbo_X are inlines, you could easily put an "if
(mm == &init_mm)" to skip everything, which would make these cases 
compile to nothing (and "if (__builtin_constant_p(mm) && mm == 
&init_mm)" if you really want to make sure there's no additional 
generated code).
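
To make the suggestion above concrete, here is a rough userspace sketch of an
inline delimbo helper that bails out early for init_mm.  Everything except
the name init_mm is a made-up stand-in: struct mm_struct here is a toy, and
relocation_count, delimbo_slow() and delimbo_pte() are hypothetical names,
not the ones in the patch.  The __builtin_constant_p variant would only
change the first test, so that the comparison is emitted solely when the
compiler already knows the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel type; not the real definition. */
struct mm_struct {
    int relocation_count;   /* assumed: nonzero while tables are migrating */
};

static struct mm_struct init_mm;   /* stand-in for the kernel's init_mm */

static int slow_path_calls;        /* counts slow-path invocations */

/* Hypothetical slow path: would chase an in-limbo entry to its new copy. */
static void delimbo_slow(struct mm_struct *mm)
{
    (void)mm;
    slow_path_calls++;
}

/* Inline wrapper: kernel page tables are assumed never to migrate. */
static inline void delimbo_pte(struct mm_struct *mm)
{
    assert(mm != NULL);
    if (mm == &init_mm)
        return;                    /* nothing to do for init_mm callers */
    if (mm->relocation_count)
        delimbo_slow(mm);
}
```

When a caller passes &init_mm directly, the compiler can see the first test
is always true after inlining and drop the rest of the body.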


It's OK for them to be using the old mm while you're migrating because 
they'll be using the limbo pages.  When you've completed the migration 
(when the count gets to 0?) then you can do a cross-cpu tlb flush (or 
function call to do the flush) to sync everything up.

Or, I guess looking at it the other way, the MMF_NEED_FLUSH means "I 
changed something, so sync up".  If you're migrating, the only reason 
that something didn't change was because you failed to allocate new 
pages to migrate into.  Given that that's unlikely, why not just 
(globally) flush unconditionally when migration is complete?  Similarly, 
why do you need MMF_NEED_RELOAD?  Couldn't you just compare mm->pgd with 
the new pgd and globally reload it if it changed?
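
Roughly what I have in mind, as a userspace sketch rather than real kernel
code.  The types are toys, and finish_migration(), flush_tlb_all_cpus() and
reload_pgd_all_cpus() are hypothetical names standing in for whatever
cross-cpu primitives the patch ends up using; the counters only exist so the
behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned long val; } pgd_t;   /* toy stand-in */

struct mm_struct {
    pgd_t *pgd;
};

static int flush_calls, reload_calls;

/* Hypothetical cross-cpu operations; here they only count calls. */
static void flush_tlb_all_cpus(struct mm_struct *mm)  { (void)mm; flush_calls++; }
static void reload_pgd_all_cpus(struct mm_struct *mm) { (void)mm; reload_calls++; }

/* Called once migration is complete: no NEED_FLUSH/NEED_RELOAD flags. */
static void finish_migration(struct mm_struct *mm, pgd_t *new_pgd)
{
    assert(mm != NULL && new_pgd != NULL);

    bool pgd_moved = (mm->pgd != new_pgd);   /* compare instead of a flag */
    if (pgd_moved) {
        mm->pgd = new_pgd;
        reload_pgd_all_cpus(mm);
    }
    flush_tlb_all_cpus(mm);                  /* unconditional global flush */
}
```

The cost is an unnecessary flush in the rare case where no new pages could
be allocated and nothing actually moved.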

Also, do you need to do the syncing and page freeing each time you're 
leaving a relocation_mode, or just the last time?  I guess if you defer 
it to the last leaving there's a possibility of livelock where you're 
always relocating and never free anything.


Well, see my comment above.


I think "migrate_X_entry()" is a better description of that.  The "pgd" 
is a whole array of pgd entries, so when I first saw migrate_pgd I was 
expecting you to be traversing the whole array and doing stuff.  
"pgd_entry" makes it clear you're only doing something to one of those.


I think if it's page-sized you can be reasonably sure that it's also
page-aligned.  Or just have the arch set some Kconfig variables: 
MIGRATE_PAGETABLE_PGD, etc.
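
For illustration, the size test from earlier in the thread
(PTRS_PER_PGD * sizeof(pgd_t) == PAGE_SIZE) can be written as a small
standalone check.  This is a sketch only: struct pgd_layout and
pgd_is_page_sized() are made-up names, and the 4096-byte PAGE_SIZE is an
assumption for 4K-page x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed: 4K pages */

/* Hypothetical description of one arch's top-level table. */
struct pgd_layout {
    unsigned entries;      /* plays the role of PTRS_PER_PGD */
    unsigned entry_size;   /* plays the role of sizeof(pgd_t) */
};

/* True when the pgd fills exactly one page, so it can move like any page. */
static bool pgd_is_page_sized(struct pgd_layout l)
{
    assert(l.entry_size != 0);
    return (size_t)l.entries * l.entry_size == PAGE_SIZE;
}
```

With the usual x86 numbers, 512 eight-byte entries (64-bit) and 1024
four-byte entries (32-bit non-PAE) pass, while the 4 eight-byte entries of
32-bit PAE do not, which is one place where it gets hairy.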


I think pagetable pages have quite a few struct page entries which can 
be overloaded, because they don't participate in most of the other 
activities a normal vm page does.  You could probably add something to 
the "_mapcount/inuse,objects" union, or steal some page flags.  (Not 
"private", because I'm planning on using that for some Xen-specific 
pagetable information ;)

    J