Re: [patch 6/6] Guest page hinting: s390 support.

Previous thread: [patch 5/6] Guest page hinting: minor fault optimization. by Martin Schwidefsky on Wednesday, March 12, 2008 - 6:21 am. (1 message)

Next thread: [PATCH] Arch: x86_32, fix fault_msg nul termination by Jiri Slaby on Wednesday, March 12, 2008 - 6:53 am. (9 messages)
From: Martin Schwidefsky
Date: Wednesday, March 12, 2008 - 6:21 am

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Hubertus Franke <frankeh@watson.ibm.com>
From: Himanshu Raj

s390 uses the milli-coded ESSA instruction to set the page state. The
page state is formed by four guest page states called block usage states
and three host page states called block content states.

The guest states are:
 - stable (S): there is essential content in the page
 - unused (U): there is no useful content and any access to the page will
   cause an addressing exception
 - volatile (V): there is useful content in the page. The host system is
   allowed to discard the content anytime, but has to deliver a discard
   fault with the absolute address of the page if the guest tries to
   access it.
 - potential volatile (P): the page has useful content. The host system
   is allowed to discard the content after it has checked the dirty bit
   of the page. It has to deliver a discard fault with the absolute
   address of the page if the guest tries to access it.

The host states are:
 - resident: the page is present in real memory.
 - preserved: the page is not present in real memory but the content is
   preserved elsewhere by the machine, e.g. on the paging device.
 - zero: the page is not present in real memory. The content of the page
   is logically-zero.
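To make the two halves of the state easier to follow, here is a minimal C model of a page state as one guest (usage) part plus one host (content) part, printed with the two-letter shorthand used below. The enum and function names are hypothetical and are not the patch's identifiers or the real ESSA encoding.

```c
#include <assert.h>

/* Guest-controlled block usage states (hypothetical names, not the s390 encoding). */
enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };

/* Host-controlled block content states (hypothetical names). */
enum content_state { CONTENT_RESIDENT, CONTENT_PRESERVED, CONTENT_ZERO };

/* A combined page state: one guest half and one host half. */
struct page_state {
	enum usage_state usage;
	enum content_state content;
};

/* Write the two-letter shorthand used in this thread, e.g. "Sr" or "Vz". */
static void page_state_name(struct page_state s, char out[3])
{
	static const char usage_ch[] = { 'S', 'U', 'V', 'P' };
	static const char content_ch[] = { 'r', 'p', 'z' };

	assert(s.usage <= USAGE_POT_VOLATILE && s.content <= CONTENT_ZERO);
	out[0] = usage_ch[s.usage];
	out[1] = content_ch[s.content];
	out[2] = '\0';
}
```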

There are 12 combinations of guest and host state; currently only 8 are
valid page states:
 Sr: a stable, resident page.
 Sp: a stable, preserved page.
 Sz: a stable, logically zero page. A page filled with zeroes will be
     allocated on first access.
 Ur: an unused but resident page. The host could make it Uz anytime but
     it doesn't have to.
 Uz: an unused, logically zero page.
 Vr: a volatile, resident page. The guest can access it normally.
 Vz: a volatile, logically zero page. This is a discarded page. The host
     will deliver a discard fault for any access to the page.
 Pr: a potential volatile, resident page. The guest can access it normally.
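The list of valid combinations above can be written down as a small lookup table. This is a hypothetical sketch that reuses illustrative enum names (not the patch's), mainly to show that 8 of the 12 usage/content pairs are valid.

```c
#include <assert.h>
#include <stdbool.h>

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };
enum content_state { CONTENT_RESIDENT, CONTENT_PRESERVED, CONTENT_ZERO };

/* Rows: usage state; columns: content state (r, p, z). */
static const bool valid_state[4][3] = {
	[USAGE_STABLE]       = { true, true,  true  },  /* Sr Sp Sz */
	[USAGE_UNUSED]       = { true, false, true  },  /* Ur    Uz */
	[USAGE_VOLATILE]     = { true, false, true  },  /* Vr    Vz */
	[USAGE_POT_VOLATILE] = { true, false, false },  /* Pr       */
};

static bool is_valid_page_state(enum usage_state u, enum content_state c)
{
	assert(u <= USAGE_POT_VOLATILE && c <= CONTENT_ZERO);
	return valid_state[u][c];
}

/* Count the valid combinations out of the 4 * 3 possible pairs. */
static int count_valid_states(void)
{
	int n = 0;

	for (int u = USAGE_STABLE; u <= USAGE_POT_VOLATILE; u++)
		for (int c = CONTENT_RESIDENT; c <= CONTENT_ZERO; c++)
			n += is_valid_page_state((enum usage_state)u, (enum content_state)c);
	return n;
}
```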

The remaining 4 combinations can't ...
From: Jeremy Fitzhardinge
Date: Wednesday, March 12, 2008 - 9:19 am

I created the attached .dot graph based purely on this description.  It 
looks reasonable, but I didn't see how a page enters a Pr state.

    J
From: Martin Schwidefsky
Date: Wednesday, March 12, 2008 - 9:28 am

That is the first block of state transitions: {Ur,Sr,Vr,Pr}
You can go from any of the four states to any of the remaining three.
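A minimal sketch of that first block of transitions, assuming hypothetical enum names: any resident state {Ur, Sr, Vr, Pr} may move to any of the other three, giving 4 * 3 = 12 edges. Transitions that involve the p or z host states are deliberately not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };
enum content_state { CONTENT_RESIDENT, CONTENT_PRESERVED, CONTENT_ZERO };

struct page_state { enum usage_state usage; enum content_state content; };

/* Guest-initiated change within the resident block {Ur, Sr, Vr, Pr}. */
static bool resident_block_transition_ok(struct page_state from, struct page_state to)
{
	return from.content == CONTENT_RESIDENT &&
	       to.content == CONTENT_RESIDENT &&
	       from.usage != to.usage;
}

/* Count the edges in the resident block: 4 states * 3 targets. */
static int count_resident_block_transitions(void)
{
	int n = 0;

	for (int u = USAGE_STABLE; u <= USAGE_POT_VOLATILE; u++)
		for (int v = USAGE_STABLE; v <= USAGE_POT_VOLATILE; v++) {
			struct page_state from = { (enum usage_state)u, CONTENT_RESIDENT };
			struct page_state to = { (enum usage_state)v, CONTENT_RESIDENT };
			n += resident_block_transition_ok(from, to);
		}
	return n;
}
```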

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Jeremy Fitzhardinge
Date: Wednesday, March 12, 2008 - 9:44 am

You only mention page_set_{unused,stable,volatile}.  Is 
page_set_stable_if_present() the fourth?  And shouldn't that be 
"stable_if_clean":

     - potential volatile (P): the page has useful content. The host system
       is allowed to discard the content after it has checked the dirty bit
       of the page. It has to deliver a discard fault with the absolute
       address of the page if the guest tries to access it.
      

The use of "stable" in the function call and "volatile" in this 
description is a bit confusing.  My understanding is that a page in this 
state is either stable or volatile depending on whether it's dirty, which 
makes sense, but it would be good to consistently refer to it in the 
same way.

Updated .dot attached.

    J
From: Martin Schwidefsky
Date: Wednesday, March 12, 2008 - 9:59 am

page_set_volatile has a "writable" argument. For writable==0 you get a
Vx page, for writable==1 you get a Px page.

With stable_if_clean, are you referring to stable_if_present? If yes, the
answer is that this operation is used to get a page from Vx/Px back to
Sx, but only if the page has not been discarded. The operation will fail
if the page state is Vz/Pz. The dirty bit only matters for the host's
decision to discard the page; these are the state transitions from Vr/Pr.

Your understanding is good, but how can I make this less confusing? A Px
page that is dirty may not be discarded, which makes it basically stable.
The guest state is still potential volatile, though, as it does not have
a state of Sx.
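The two operations Martin describes can be modelled on a bare state value. This is a hypothetical sketch, not the patch's code: model_set_volatile() and model_set_stable_if_present() stand in for page_set_volatile() and page_set_stable_if_present(), whose real versions take a struct page and issue ESSA. The model assumes the page is Sr when it is hinted volatile.

```c
#include <assert.h>
#include <stdbool.h>

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };
enum content_state { CONTENT_RESIDENT, CONTENT_PRESERVED, CONTENT_ZERO };

struct page_state { enum usage_state usage; enum content_state content; };

/* writable == false gives a V page, writable == true gives a P page.
 * Only the guest (usage) half changes; assumes the page starts as Sr. */
static void model_set_volatile(struct page_state *s, bool writable)
{
	assert(s->usage == USAGE_STABLE && s->content == CONTENT_RESIDENT);
	s->usage = writable ? USAGE_POT_VOLATILE : USAGE_VOLATILE;
}

/* Move a V/P page back to S, but fail and leave the state untouched if
 * the host has already discarded it (content state zero). */
static bool model_set_stable_if_present(struct page_state *s)
{
	assert(s->usage == USAGE_VOLATILE || s->usage == USAGE_POT_VOLATILE);
	if (s->content == CONTENT_ZERO)
		return false;
	s->usage = USAGE_STABLE;
	return true;
}
```

In this model the dirty bit never appears: it only influences whether the host moves a Pr page's content away, not what the guest-side calls do.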

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Jeremy Fitzhardinge
Date: Wednesday, March 12, 2008 - 10:48 am

Hm.  But a Vx page is writable, isn't it?  It's just that its contents 
can go away at any time.  Or does the kernel treat Vx pages as strictly 
RO cached copies of other things?

It also seems to me that, given you're talking about "potentially volatile" 
as a distinct state, it would be best to have a distinct 
state-setting function associated with it, so there's a 1:1 

No.  I misunderstood and thought that stable_if_present sets the Px 

So you mean it will change Vr/Pr to Sr but everything else will fail?  


Mainly, use identical terminology in code and description so they can be 
easily compared.  I found the diagram was quite helpful in understanding 
what's going on; feel free to include it in your documentation.

Updated .dot attached; I've updated it to include the page_set_volatile 
writable argument and the stable_if_present transitions; commented it, 
removed the self-edges which were cluttering things up.

Also, does a page go from Vz->Vr on guest memory write?  If so, does a 
clean page which goes from Pr->Vz->Vr lose its Px state in the process?

    J
From: Anthony Liguori
Date: Wednesday, March 12, 2008 - 1:04 pm

Well presumably Vp/Pr => Sp?  Is it true that from the guest's 
perspective, all of the 'p' states are identical to the 'r' states?

Do the host states even really need visibility to the guest at all?  It 
may be useful for the guest to be able to distinguish between Ur and Uz 
but it doesn't seem necessary.

BTW Jeremy, the .dot was very useful!

Regards,

Anthony Liguori
--

From: Jeremy Fitzhardinge
Date: Wednesday, March 12, 2008 - 1:45 pm

Vp should never happen, since you'd never preserve a V page.  And surely 
it would be Pr -> Sr, since the hypervisor wouldn't push the page to 

Well, you implicitly see the hypervisor state.  If you touch a [UV]z 
page then you get a fault telling you that the page has been taken away 
from you (I think).  And it would definitely help with debugging (seems 
likely there's lots of scope for race conditions if you prematurely tell 
Yes, there's no way I'd be able to get my head around this otherwise.  
BTW, here's an updated one with the host-driven events as dashed lines, 
and a couple of extra transitions I think should be in there (but 
waiting for Martin's confirmation).

    J
From: Anthony Liguori
Date: Wednesday, March 12, 2008 - 1:56 pm

You're right, I meant Vp/Pp but they are invalid states.  I think one of 
the things that keeps tripping me up is that the host can change both 
the host and guest page states.  My initial impression was that the host 

I was thinking that it may be useful to know a Ur versus a Uz when 
allocating memory.  In this case, you'd rather allocate Ur pages than 
Uz to avoid the fault.  I don't read s390 arch code well, is the host 

Excellent!

Regards,


--

From: Jeremy Fitzhardinge
Date: Wednesday, March 12, 2008 - 2:36 pm

Yes.  And it seems to me that you get unfortunate outcomes if you have a 

Yes, reusing Ur pages might well be better, but who knows - they've 
probably got an instruction which makes Uz cheap...

Stuff like this suggests that both parts of the state are packed 
together, and are guest-visible:

+	return (state & ESSA_USTATE_MASK) == ESSA_USTATE_VOLATILE &&
+		(state & ESSA_CSTATE_MASK) == ESSA_CSTATE_ZERO;


      J
--

From: Martin Schwidefsky
Date: Thursday, March 13, 2008 - 2:45 am

Yes, faulting in a Uz page is cheap on s390. Isn't it a lovely

Yes, the return value of the ESSA instruction has both the guest state
and the host state.
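Since the ESSA result carries both halves, the quoted check is just two masked comparisons. The sketch below mirrors its shape with stand-in constants: the HYP_* names and the bit layout (usage state in bits 2-3, content state in bits 0-1) are assumptions for illustration, because the real ESSA_* values are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout, for illustration only; not the real ESSA_* values. */
#define HYP_USTATE_MASK      0x0cu  /* guest (usage) state bits */
#define HYP_USTATE_STABLE    0x00u
#define HYP_USTATE_UNUSED    0x04u
#define HYP_USTATE_POT_VOL   0x08u
#define HYP_USTATE_VOLATILE  0x0cu

#define HYP_CSTATE_MASK      0x03u  /* host (content) state bits */
#define HYP_CSTATE_RESIDENT  0x00u
#define HYP_CSTATE_PRESERVED 0x02u
#define HYP_CSTATE_ZERO      0x03u

/* Same shape as the quoted check: a volatile page whose content is gone (Vz). */
static bool state_is_discarded(unsigned int state)
{
	return (state & HYP_USTATE_MASK) == HYP_USTATE_VOLATILE &&
	       (state & HYP_CSTATE_MASK) == HYP_CSTATE_ZERO;
}

/* Pack a usage/content pair into the assumed layout. */
static unsigned int make_state(unsigned int ustate, unsigned int cstate)
{
	return (ustate & HYP_USTATE_MASK) | (cstate & HYP_CSTATE_MASK);
}
```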

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Jeremy Fitzhardinge
Date: Thursday, March 13, 2008 - 9:07 am

Does that mean that Vz is effectively identical to Uz?

    J
--

From: Jeremy Fitzhardinge
Date: Thursday, March 13, 2008 - 9:17 am

Hm, on further thought:

If guest writes to Vz pages are disallowed, then the only way out of Vz 
is if the guest sets it to something else (Uz,Sz).  If so, what's the 
point of using that state?  Why not make:

    Vr -> Uz      host discard
    Pr -> Uz      host discard clean
    Sp -> Uz      set volatile
    Uz -> Uz      set volatile


But given how you've described V-state pages, I really would expect 
writes to a Vz to work, or alternatively, all writes to V-state pages to 
be disallowed.  Are there any real uses for a writable Vr page?

On the other hand, removing Vz->Vr does clean up the dot graph a lot...

    J
From: Martin Schwidefsky
Date: Thursday, March 13, 2008 - 9:55 am

Vz is the page discarded state. The difference from Uz is slim; both
states will cause a program check on access. Vz generates a discard
fault, Uz generates an addressing exception, which is nice for debugging.
But I don't see a reason why an implementation that uses Uz instead of

You mean in the section that speaks about the guest states S/U/V/P?
Always keep in mind that you can access a V/P page only until it gets
discarded. Then the useful content of the page frame is lost and any
read or write to the now-Vz page will be answered with a discard fault.

A Vr page is read-only. If a page gets mapped for writing it needs to
get into the Pr state. This is the hint for the host to look at the
dirty bit before it discards a page.
So yes, there is no use for a writable Vr page.
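The host-side rule described here (and in the original state list) can be sketched as a single decision: may the host drop a resident page's content instead of preserving it? The function and enum names are hypothetical; the rule follows the thread: S must be preserved, U and V may be dropped at any time (even a dirty Vr), and P only if the host's dirty bit is clear.

```c
#include <assert.h>
#include <stdbool.h>

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };

/* host_dirty: the host's dirty bit for the page frame. */
static bool host_may_drop_content(enum usage_state u, bool host_dirty)
{
	switch (u) {
	case USAGE_STABLE:
		return false;        /* essential content: must be paged out (Sr -> Sp) */
	case USAGE_UNUSED:
		return true;         /* nothing useful in it (Ur -> Uz) */
	case USAGE_VOLATILE:
		return true;         /* droppable even if written via the kernel mapping */
	case USAGE_POT_VOLATILE:
		return !host_dirty;  /* writable mapping: only drop clean pages */
	}
	return false;
}
```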

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Jeremy Fitzhardinge
Date: Thursday, March 13, 2008 - 10:05 am

How do you handle these different cases in Linux?  Do you use Vr pages 
in the pagecache, and then shoot down the pagecache entry if the host 
steals the page?

The Uz access exception presumably just generates a normal oops.



OK, thanks, that clears things up.  I was assuming that Vr was 
technically writable but that writes could be discarded at any time (i.e., 
allowing guests to merrily shoot themselves in the foot ;).  Making it 
forced RO is much more sensible.

    J
--

From: Martin Schwidefsky
Date: Thursday, March 13, 2008 - 10:23 am

The environment where we currently run all this is z/VM as the host and
Linux as the guest. We have two page tables on s390, a host page table
and a guest page table. If the host discards a page it simply removes
the entry for the page in the host page table. If the guest comes along
and accesses the page the host gets the fault and generates the

Yes, the handler for an addressing exception will call die() for a


Well, technically you could write to a Vr page via the kernel address
space. The thing is that the host can just discard the page although it
is dirty. The Vr state is used for page cache pages which do not have
any writable mapping.
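A minimal sketch of the guest-side choice this implies, under stated simplifications: struct guest_page and choose_hint() are hypothetical, and any further conditions the real code may apply before hinting a page are not modelled. Page-cache pages without a writable mapping become V, those with one become P, and everything else stays S.

```c
#include <assert.h>
#include <stdbool.h>

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };

/* Hypothetical summary of a guest page for hinting purposes. */
struct guest_page {
	bool in_page_cache;     /* content can be re-read from backing store */
	bool writable_mapping;  /* mapped writable somewhere */
};

/* Pick the usage state to hint for this page. */
static enum usage_state choose_hint(const struct guest_page *pg)
{
	if (!pg->in_page_cache)
		return USAGE_STABLE;
	return pg->writable_mapping ? USAGE_POT_VOLATILE : USAGE_VOLATILE;
}
```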

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Martin Schwidefsky
Date: Thursday, March 13, 2008 - 2:42 am

In principle only the guest changes the guest state and only the host
changes the host state. The simplified state diagram shows exceptions

This is the second optimization you might want to think about. The other
is to avoid the page clearing for Uz.
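The Uz optimization (described as not yet implemented) could look roughly like this when handing out a zeroed page from the free list. Everything here is hypothetical: struct model_page, model_alloc_zeroed() and the 4 KiB MODEL_PAGE_SIZE are assumptions. A Uz page is simply marked Sz because the host will supply an empty frame on first access; a Ur page may still hold old bytes, so the guest clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096  /* assumption: 4 KiB pages */

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };
enum content_state { CONTENT_RESIDENT, CONTENT_PRESERVED, CONTENT_ZERO };

struct model_page {
	enum usage_state usage;
	enum content_state content;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Turn a free (U) page into a zeroed stable page.
 * Returns true if the clearing could be skipped. */
static bool model_alloc_zeroed(struct model_page *pg)
{
	assert(pg->usage == USAGE_UNUSED);
	pg->usage = USAGE_STABLE;
	if (pg->content == CONTENT_ZERO)
		return true;   /* Uz -> Sz: content is logically zero already */
	memset(pg->data, 0, sizeof(pg->data));
	return false;      /* Ur -> Sr: cleared by the guest */
}
```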

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Martin Schwidefsky
Date: Thursday, March 13, 2008 - 2:36 am

Vp does not happen in the current implementation. But it actually may be
useful. z/VM has multiple layers of paging, the first goes to expanded
storage which is very fast. If you make the page Vz and the guest needs
it, you have to do a standard Linux I/O to retrieve the page. This

You get an addressing exception if you touch a Uz page. This indicates a
BUG in the Linux code because this is a use after free. If the guest
touches a Vz page you get a discard fault.
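The access outcomes described in this thread fit in one small function. This is a hypothetical model, not kernel code: any access to a U page raises an addressing exception, an access to a discarded V/P page (content zero) raises a discard fault, and everything else (including Sp and Sz, which the host backs transparently) succeeds.

```c
#include <assert.h>

enum usage_state { USAGE_STABLE, USAGE_UNUSED, USAGE_VOLATILE, USAGE_POT_VOLATILE };
enum content_state { CONTENT_RESIDENT, CONTENT_PRESERVED, CONTENT_ZERO };

enum access_result {
	ACCESS_OK,             /* content available, possibly after host paging */
	ACCESS_ADDRESSING,     /* addressing exception: use after free in the guest */
	ACCESS_DISCARD_FAULT,  /* discard fault: host dropped the content */
};

static enum access_result guest_access(enum usage_state u, enum content_state c)
{
	if (u == USAGE_UNUSED)
		return ACCESS_ADDRESSING;
	if ((u == USAGE_VOLATILE || u == USAGE_POT_VOLATILE) && c == CONTENT_ZERO)
		return ACCESS_DISCARD_FAULT;
	return ACCESS_OK;
}
```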

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Martin Schwidefsky
Date: Thursday, March 13, 2008 - 2:32 am

In the extended version Vp/Pp to Sr as well but the current z/VM code


It is very useful for debugging to have the host state in the guest as
well. There is one possible optimization: if the guest finds a Uz page
in the free list, it can make it Sz and doesn't have to clear it because
the host will provide an already empty page (not yet implemented

I've searched my disk and found the state diagrams we've used for the
OLS paper. You may find these useful as well.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.
