Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.

Previous thread: Build regressions/improvements in v2.6.37-rc8 by Geert Uytterhoeven on Thursday, December 30, 2010 - 12:42 pm. (1 message)

Next thread: [PATCH 5/8] xen/debug: Print out all pages in the P2M. by Konrad Rzeszutek Wilk on Thursday, December 30, 2010 - 12:48 pm. (1 message)
From: Konrad Rzeszutek Wilk
Date: Thursday, December 30, 2010 - 12:48 pm

Our P2M tree structure is a three-level. On the leaf nodes
we set the Machine Frame Number (MFN) of the PFN. What this means
is that when one does: pfn_to_mfn(pfn), which is used when creating
PTE entries, you get the real MFN of the hardware. When Xen sets
up a guest it initially populates a array which has descending MFN
values, as so:

 idx: 0,  1,       2
 [0x290F, 0x290E, 0x290D, ..]

so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list
starts looking quite random.

We graft this structure on our P2M tree structure and stick in
those MFN in the leafs. But for all other leaf entries, or for the top
root, or middle one, for which there is a void entry, we assume it is
"missing". So
 pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.

We add the possibility of setting 1-1 mappings on certain regions, so
that:
 pfn_to_mfn(0xc0000)=0xc0000

The benefit of this is, that we can assume for non-RAM regions (think
PCI BARs, or ACPI spaces), we can create mappings easily b/c we
get the PFN value to match the MFN.

For this to work efficiently we introduce are two new pages: p2m_identity
and p2m_mid_identity.  All entries in p2m_identity are set to INVALID_P2M_ENTRY
type, and all entries in p2m_mid_identity point to p2m_identity. Whenever we
are told to set the MFN which is equal to PFN for void regions, we swap over
from p2m_missing to p2m_identity or p2m_mid_missing to p2m_mid_identity.

See this diagram for details:
https://docs.google.com/drawings/edit?id=1smqIRPYq2mSxmvpabuk_Ap6bQAW_aaZnOnjzqlcmxKc&...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   67 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 8e9c669..667901f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -217,6 +217,9 @@ static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
 static ...
From: Ian Campbell
Date: Tuesday, January 4, 2011 - 9:53 am

Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this
patch?


This statement confused me at first. I think the piece of information
which is missing in the rest of the paragraph is that on lookup we spot
that the entry points to p2m_identity and return the identity value
instead of dereferencing and returning INVALID_P2M_ENTRY.

If the value is never actually (deliberately) completely dereferecned
then perhaps for sanity/debugging the page should contain some other
invalid/poison value so we can spot in stack traces etc cases where it
has been erroneously dereferenced? (e.g. 0xFFFFFFFE?) Or does that

If p2m_top[topidx] == p2m_mid_identity the also p2m_top[topidx][mididx]
== p2m_identity. Therefore I'm not sure if the first check is worthwhile

Should be "return true" throughout for a function returning bool. I

I don't think I quite understand this bit.

If I do "__set_phys_to_machine(X, X)" where X happens to currently
correspond to p2m_mid_missing won't that cause all pfn entries in the
range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or
something) to switch from missing to identity? Similarly for
p2m_top[topidx][mididx]?

Perhaps ranges of identity bits are often well aligned with the
boundaries in the p2m 3-level tree but wouldn't that just be
coincidence?

Don't we need to completely populate the tree at this point and setup
the leaf nodes as appropriate? Which we can't do since this is
__set_phys_to_machine so we need to fail and let set_phys_to_machine to
its thing? Or maybe this hole bit needs to be hoisted up to
set_phys_to_machine?



--

From: Ian Campbell
Date: Tuesday, January 4, 2011 - 9:59 am

I wonder if it would make sense to have a debugfs file which exports the
p2m, for the purposes of eye-balling it for this sort of issue?
comparing to the e820 etc...

Maybe the whole thing in raw form would be overkill but spitting out a
list of ranges of identity, invalid, normal pages, plus annotation
regarding whether those come from a p2m_mid_identity style page or a
leaf node, might be occasionally useful.

Ian.
-- 
Ian Campbell
Current Noise: Place of Skulls - The Watchers

Call toll free number before digging.

--

From: Ian Campbell
Date: Tuesday, January 4, 2011 - 10:20 am

I should have read patch 5/8 first...

--

From: Konrad Rzeszutek Wilk
Date: Tuesday, January 4, 2011 - 12:24 pm

Does not have too. This can work independently (it does not introduce

Yup. The tools have checks for either valid PFNs or 0xFFFFFFF, but nothing
else. They flip out if you stick other values in .. oh and sticking in
the same MFN (in case you were thinking of them pointing to a dummy page)



Yup. Keep in mind that this patchset explores the more "conservative" approach.

All 'void' entries are pointing to the p2m_*MISSING instead of p2m_*IDENTITY
pages.


I thought so, but for all the machines I've run this on, those boundaries

Not exactly. At this point (xen/setup, parsing the E820), the P2M has
been populated up to nr_pages. So, say we have 2G to the guest, that is
p2m[0, 1] point to two pages "created" by extend_brk. The contents of the
pages points to mfn_list[..]. The rest of the p2m (so, p2m[2]..p2m[511])
point to p2m_mid_missing.

When we start the identity mapping, we end up right (for example) from 3GB
to 4GB with identity mapping, so p2m[3] is now p2m_mid_identity. The p2m[4] and
later are still in p2m_mid_missing.

Thought I've just thought of nasty condition. Say those PCI regions start at 2.5GB,
and the user provided dom0_mem=max:3GB. Oh, and actually that still works. The
p2m[1] will be allocated by reserv_brk, and we just end up sticking from
p2m[1][473]..p2m[1][511] with the PFN | IDENTITY_FRAME_BIT, which is OK as
that contents used to have INVALID_P2M_IDENTITY (it would have been
created by extend_brk in xen_build_dynamic_phys_to_machine and filled

It can, but set_phys_to_machine would have to use reserved_brk.

The danger I am seeing is if the non-RAM regions (and gaps) in the E820 are
not 1GB (or 512MB under 32-bit) boundary aligned (2.36GB?)_and_ the dom0_mem is
employed as to make the top-level P2M right up to the non-boundary aligned E820
void (so, dom0_mem=max:2GB for example). That would create a proper p2m[0] and p2m[1],
but p2m[3] would point to p2m_mid_missing, and when we try to make
p2m[3][231] identity (so ~2.36GB area) we would ...
Previous thread: Build regressions/improvements in v2.6.37-rc8 by Geert Uytterhoeven on Thursday, December 30, 2010 - 12:42 pm. (1 message)

Next thread: [PATCH 5/8] xen/debug: Print out all pages in the P2M. by Konrad Rzeszutek Wilk on Thursday, December 30, 2010 - 12:48 pm. (1 message)