I wonder if rearranging mm_struct could put it on an already-hot cacheline?
Yeah, it's easy for that to happen. Those comments are helpful; I was
thinking of very high-level things like:
* what initiates migration?
* how can an mm be under multiple levels of migration?
* ...?
Maybe the existing comments are sufficient.
Just a comment saying something like "a pagetable page is considered to
be in limbo if it has been copied, but may still be in use. It may be
either in a cpu's stale tlb entry, or in use by the kernel on another
cpu with a transient reference." would clarify what the delimbo is
trying to achieve.
But the issue I'm concerned about is what happens if a process writes
the page, causing its cpu to mark the (old, in-limbo) pte dirty.
Meanwhile someone else is scanning the pagetables looking for things to
evict. It checks the (shiny new) pte, finds it not dirty, and decides to
evict the apparently clean page.
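To make the concern concrete, here is a toy user-space model of that race. It is only a sketch: the struct, the bit value, and every toy_* name are made up for illustration and are not kernel APIs. It shows how a scanner that looks only at the new pte would miss a dirty bit the hardware set in the in-limbo copy, and how consulting both copies avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names or values come from the kernel. */
#define TOY_PTE_DIRTY 0x40u	/* bit chosen arbitrarily for the model */

typedef uint64_t toy_pte_t;

struct toy_mapping {
	toy_pte_t new_pte;	/* entry in the freshly copied pagetable page */
	toy_pte_t limbo_pte;	/* entry in the old page, maybe still in a stale TLB */
	bool in_limbo;		/* true until the old page has been retired */
};

/* A cpu with a stale TLB entry writes the page: the dirty bit lands in the old pte. */
static void toy_cpu_write_via_stale_tlb(struct toy_mapping *m)
{
	assert(m->in_limbo);
	m->limbo_pte |= TOY_PTE_DIRTY;
}

/* Naive scanner: looks only at the new pte, so it can miss the write. */
static bool toy_page_is_dirty_naive(const struct toy_mapping *m)
{
	return (m->new_pte & TOY_PTE_DIRTY) != 0;
}

/* Careful scanner: also consults the limbo copy while one exists. */
static bool toy_page_is_dirty_careful(const struct toy_mapping *m)
{
	toy_pte_t bits = m->new_pte;

	if (m->in_limbo)
		bits |= m->limbo_pte;
	return (bits & TOY_PTE_DIRTY) != 0;
}
```

Whether the real code needs a merge like this, or whether a flush ordering already rules the race out, is exactly the question.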
What, for that matter, stops a page from being evicted from under a
limboed mapping? Does it get accounted for? (I guess the existing tlb
flushing should be sufficient to keep it under control.)
Also, what happens if a page happens to get migrated twice in quick
succession (i.e., while there's still an in-limbo page from the first
time)? Is there something to prevent that, or would it just all work out?
PTRS_PER_PGD * sizeof(pgd_t) == PAGE_SIZE
Erm, if you're lucky. It can get pretty hairy.
There's also the notion of a pgd_list, which is x86-specific at the moment.
Well, x86-64 is 4 level, x86-32 PAE is 3 level, and x86-32 non-PAE is 2
level. I don't think there'd be too much crying if you didn't support
32-bit non-PAE, but 32-bit PAE is useful.
I would say that x86 unification is still a fair way from "done", but
the areas you're dealing with should be getting more settled now.
OK. I was planning on making the change anyway, and this is just
another reason to do it.
I wouldn't eliminate it for speed, but it could be misleading if someone
thought that it would actually do something. Don't know; no clear
answer. Given that delimbo_X are inlines, you could easily put an "if
(mm == &init_mm)" to skip everything, which would make these cases
compile to nothing (and "if (__builtin_constant_p(mm) && mm ==
&init_mm)" if you really want to make sure there's no additional
generated code).
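Roughly what I have in mind, as a user-space sketch rather than a patch. The mm_struct and init_mm here are stand-ins, and delimbo_work(), delimbo_sketch() and delimbo_sketch_constp() are hypothetical names; only __builtin_constant_p is a real GCC/Clang builtin.

```c
#include <assert.h>

/* Stand-ins for illustration only; not the kernel's definitions. */
struct mm_struct { int limbo_pages; };
static struct mm_struct init_mm;

static int delimbo_work_calls;	/* counts trips into the real work */

/* Hypothetical out-of-line part that actually retires limbo pages. */
static void delimbo_work(struct mm_struct *mm)
{
	assert(mm);
	delimbo_work_calls++;
	mm->limbo_pages = 0;
}

/* Variant 1: plain check, always skips init_mm (costs one compare at runtime). */
static inline void delimbo_sketch(struct mm_struct *mm)
{
	if (mm == &init_mm)
		return;
	delimbo_work(mm);
}

/*
 * Variant 2: skip only when the compiler can prove the argument is
 * constant. Whether that happens for &init_mm depends on inlining and
 * optimization level; when it does not, the call falls through to the
 * work function with no extra compare.
 */
static inline void delimbo_sketch_constp(struct mm_struct *mm)
{
	if (__builtin_constant_p(mm) && mm == &init_mm)
		return;
	delimbo_work(mm);
}
```

With variant 1 a call like delimbo_sketch(&init_mm) never reaches delimbo_work(); with variant 2 that is only true when the compiler folds the test, so the work function would still have to cope with being handed init_mm.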
It's OK for them to be using the old mm while you're migrating because
they'll be using the limbo pages. When you've completed the migration
(when the count gets to 0?) then you can do a cross-cpu tlb flush (or
function call to do the flush) to sync everything up.
Or, I guess looking at it the other way, the MMF_NEED_FLUSH means "I
changed something, so sync up". If you're migrating, the only reason
nothing would have changed is that you failed to allocate new
pages to migrate into. Given that that's unlikely, why not just
(globally) flush unconditionally when migration is complete? Similarly,
why do you need MMF_NEED_RELOAD? Couldn't you just compare mm->pgd with
the new pgd and globally reload it if it changed?
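In other words, something shaped like the following, with no flag bits at all. This is a toy model under my own assumptions: toy_mm, the counters, and the toy_* helpers are invented for the sketch and stand in for whatever the real cross-cpu flush and pgd reload would be.

```c
#include <assert.h>

/* Toy stand-in for the part of the mm we care about here. */
struct toy_mm { void *pgd; };

static int toy_flushes;	/* how many global flushes were issued */
static int toy_reloads;	/* how many global pgd reloads were issued */

/* Hypothetical placeholder for a cross-cpu tlb flush. */
static void toy_flush_all_cpus(void)
{
	toy_flushes++;
}

/* Hypothetical placeholder for making every cpu reload the pgd. */
static void toy_reload_pgd_all_cpus(struct toy_mm *mm)
{
	assert(mm->pgd);
	toy_reloads++;
}

/*
 * Called once migration is complete. pgd_before is the value of
 * mm->pgd sampled when migration started.
 */
static void toy_migration_done(struct toy_mm *mm, void *pgd_before)
{
	/* Flush unconditionally: "nothing changed" is the unlikely case. */
	toy_flush_all_cpus();

	/* Reload only if the top-level table itself was replaced. */
	if (mm->pgd != pgd_before)
		toy_reload_pgd_all_cpus(mm);
}
```

The cost is an occasional unnecessary flush when allocation failed and nothing moved, in exchange for dropping both flags.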
Also, do you need to do the syncing and page freeing each time you're
leaving a relocation_mode, or just the last time? I guess if you defer
it to the last leaving there's a possibility of livelock where you're
always relocating and never free anything.
Well, see my comment above.
I think "migrate_X_entry()" is a better description of that. The "pgd"
is a whole array of pgd entries, so when I first saw migrate_pgd I was
expecting you to be traversing the whole array and doing stuff.
"pgd_entry" makes it clear you're only doing something to one of those.
I think if it's page-sized you can be reasonably sure that it's also
page-aligned. Or just have the arch set some Kconfig variables:
MIGRATE_PAGETABLE_PGD, etc.
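For reference, the size test itself is trivial; the sketch below is a user-space helper with a made-up name (toy_pgd_is_page_sized), fed the usual x86 numbers for a 4096-byte page. In the kernel proper this would presumably be a compile-time check (BUILD_BUG_ON or similar) or the Kconfig symbol, not a runtime function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when a pgd with the given geometry occupies exactly one
 * page. Hypothetical helper for illustration; not a kernel interface.
 */
static bool toy_pgd_is_page_sized(size_t ptrs_per_pgd, size_t pgd_entry_size,
				  size_t page_size)
{
	return ptrs_per_pgd * pgd_entry_size == page_size;
}
```

With 4096-byte pages, x86-64 (512 entries of 8 bytes) and 32-bit non-PAE (1024 entries of 4 bytes) pass, while 32-bit PAE (4 entries of 8 bytes, 32 bytes total) does not, which is the hairy case.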
I think pagetable pages have quite a few struct page entries which can
be overloaded, because they don't participate in most of the other
activities a normal vm page does. You could probably add something to
the "_mapcount/inuse,objects" union, or steal some page flags. (Not
"private", because I'm planning on using that for some Xen-specific
pagetable information ;)
J