[RFC 0/3] KVM, HWPoison, unpoison address across rebooting

Previous thread: [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally by Huang Ying on Tuesday, December 21, 2010 - 7:51 pm. (4 messages)

Next thread: Re: Linux 2.6.37-rc7 by Sedat Dilek on Tuesday, December 21, 2010 - 8:56 pm. (1 message)
From: Huang Ying
Date: Tuesday, December 21, 2010 - 7:51 pm

Unpoison a poisoned address across a guest reboot, so that a new
memory page can be allocated to back it and the guest system can
reboot successfully.

[RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
[RFC 2/3] KVM, Replace is_hwpoison_address with get_user_pages_hwpoison
[RFC 3/3] KVM, HWPoison, unpoison address across rebooting
--

From: Huang Ying
Date: Tuesday, December 21, 2010 - 7:51 pm

The HWPoison processing code marks as HWPoison not only the struct
page corresponding to the faulty physical memory page, but also the
virtual addresses in the processes that map that page.  As a result,
any further access to such a virtual address kills the corresponding
process with SIGBUS.

If the faulty physical memory page is used by a KVM guest, the SIGBUS
is sent to QEMU, and QEMU simulates an MCE to report the memory error
to the guest OS.  If the guest OS cannot recover from the error (for
example, because the page is accessed by kernel code), the guest OS
reboots the system.  But because the underlying host virtual address
backing the guest physical memory is still poisoned, any access to the
corresponding guest physical memory after the reboot still sends
SIGBUS to QEMU, and an MCE is simulated again.  That is, the guest
system cannot recover by rebooting.
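
As a rough illustration of the host-side half of this flow, the
sketch below shows how a userspace process could tell a memory-error
SIGBUS apart from other SIGBUS causes by looking at si_code.  The
BUS_MCEERR_AR and BUS_MCEERR_AO codes are the Linux values; the enum,
the function names and the handler are hypothetical and are not taken
from QEMU or from this patch series.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks (Linux values) for C libraries that do not expose them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification, for illustration only. */
enum sigbus_kind {
	SIGBUS_OTHER,               /* alignment error, truncated file, ... */
	SIGBUS_MCE_ACTION_REQUIRED, /* poisoned memory was actually accessed */
	SIGBUS_MCE_ACTION_OPTIONAL, /* poison detected, not yet consumed */
};

static enum sigbus_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return SIGBUS_MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return SIGBUS_MCE_ACTION_OPTIONAL;
	default:
		return SIGBUS_OTHER;
	}
}

/* Written by the handler; a VMM would inspect it before injecting an MCE. */
static volatile sig_atomic_t last_sigbus_kind = SIGBUS_OTHER;

static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	/* info->si_addr holds the faulting host virtual address. */
	last_sigbus_kind = classify_sigbus(info->si_code);
}

/* Returns 0 on success, -1 on failure (as sigaction() does). */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A real handler would also have to translate si_addr back to a guest
physical address before reporting the error to the guest; that step
is omitted here.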

In fact, the contents of a guest physical memory page need not be
preserved across a reboot.  We can allocate a new host physical page
to back the corresponding guest physical address.

To do that, this patch provides a mechanism in KVM to "unpoison" a
poisoned virtual address by clearing the corresponding PTE.  When
rebooting, QEMU can unpoison the poisoned virtual address, and when
the unpoisoned memory page is next accessed, a new physical memory
page may be allocated if possible.
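
The idea can be modelled in a few lines of plain userspace C.  This
is only a toy model under simplifying assumptions (a flat array
standing in for the page table, a fixed pool of frames); every name
in it is hypothetical and none of it is the kernel code in this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FRAMES 4
#define NR_VPAGES 2
#define NO_FRAME  (-1)

enum pte_state { PTE_NONE, PTE_PRESENT, PTE_HWPOISON };

struct toy_pte {
	enum pte_state state;
	int frame;                      /* valid only when PTE_PRESENT */
};

struct toy_mm {
	struct toy_pte pte[NR_VPAGES];
	bool frame_used[NR_FRAMES];
	bool frame_bad[NR_FRAMES];      /* frame hit by a memory error */
};

/* Hand out the first frame that is neither in use nor bad. */
static int toy_alloc_frame(struct toy_mm *mm)
{
	for (int f = 0; f < NR_FRAMES; f++) {
		if (!mm->frame_used[f] && !mm->frame_bad[f]) {
			mm->frame_used[f] = true;
			return f;
		}
	}
	return NO_FRAME;
}

/*
 * Access a virtual page.  Returns the backing frame, or NO_FRAME
 * where the real system would deliver SIGBUS or fail the fault.
 */
static int toy_access(struct toy_mm *mm, size_t vpage)
{
	struct toy_pte *pte = &mm->pte[vpage];

	if (pte->state == PTE_HWPOISON)
		return NO_FRAME;
	if (pte->state == PTE_NONE) {   /* fault in a fresh frame */
		int f = toy_alloc_frame(mm);

		if (f == NO_FRAME)
			return NO_FRAME;
		pte->frame = f;
		pte->state = PTE_PRESENT;
	}
	return pte->frame;
}

/* A memory error: mark both the frame and the mapping as poisoned. */
static void toy_memory_failure(struct toy_mm *mm, size_t vpage)
{
	struct toy_pte *pte = &mm->pte[vpage];

	assert(pte->state == PTE_PRESENT);
	mm->frame_bad[pte->frame] = true;
	pte->state = PTE_HWPOISON;
	pte->frame = NO_FRAME;
}

/* Clear a poisoned entry so that the next access can fault in a new frame. */
static bool toy_unpoison_address(struct toy_mm *mm, size_t vpage)
{
	struct toy_pte *pte = &mm->pte[vpage];

	if (pte->state != PTE_HWPOISON)
		return false;
	pte->state = PTE_NONE;
	return true;
}
```

In this model, an access after toy_memory_failure() keeps failing
until toy_unpoison_address() is called; the access after that is
backed by a different frame, because the bad frame is never handed
out again.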

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 include/linux/kvm.h |    1 +
 include/linux/mm.h  |    8 ++++++++
 mm/memory-failure.c |   39 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c |   14 ++++++++++++++
 4 files changed, 62 insertions(+)

--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -676,6 +676,7 @@ struct kvm_clock_data {
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
 /* Available with KVM_CAP_PPC_GET_PVINFO */
 #define KVM_PPC_GET_PVINFO	  _IOW(KVMIO,  0xa1, struct kvm_ppc_pvinfo)
+#define ...
From: Huang Ying
Date: Tuesday, December 21, 2010 - 7:51 pm

is_hwpoison_address only checks whether the page table entry is
hwpoisoned, regardless of the memory page mapped, whereas
get_user_pages_hwpoison checks both.

A following patch introduces unpoison_address, which clears the
poisoned page table entry to make it possible to allocate a new
memory page for the virtual address.  But the underlying memory page
may remain poisoned even after the corresponding page table entry is
cleared, in which case a new memory page cannot be allocated.
get_user_pages_hwpoison can catch these situations.
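
The difference between the two checks can be shown with a small
userspace model.  The struct and both functions below are
hypothetical stand-ins, not the kernel helpers themselves, and the
fallback value for EHWPOISON is the asm-generic one, assumed here
only so that the sketch builds on systems that lack it.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133   /* assumption: asm-generic value */
#endif

/* Toy view of one virtual address. */
struct toy_mapping {
	bool pte_hwpoison;   /* page table entry carries a poison marker */
	bool page_hwpoison;  /* the page a fault would map is itself bad */
};

/* Models a PTE-only check: blind to the state of the page. */
static int toy_pte_only_check(const struct toy_mapping *m)
{
	return m->pte_hwpoison ? 1 : 0;
}

/* Models a check of both the entry and the page behind it. */
static int toy_pte_and_page_check(const struct toy_mapping *m)
{
	if (m->pte_hwpoison || m->page_hwpoison)
		return -EHWPOISON;
	return 0;
}
```

For a mapping whose entry has been cleared but whose page is still
bad (pte_hwpoison false, page_hwpoison true), the PTE-only check
reports nothing while the combined check returns -EHWPOISON.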

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 include/linux/mm.h  |    8 --------
 mm/memory-failure.c |   32 --------------------------------
 virt/kvm/kvm_main.c |    4 +++-
 3 files changed, 3 insertions(+), 41 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1512,14 +1512,6 @@ extern int sysctl_memory_failure_recover
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t mce_bad_pages;
 extern int soft_offline_page(struct page *page, int flags);
-#ifdef CONFIG_MEMORY_FAILURE
-int is_hwpoison_address(unsigned long addr);
-#else
-static inline int is_hwpoison_address(unsigned long addr)
-{
-	return 0;
-}
-#endif
 
 extern void dump_page(struct page *page);
 
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1433,35 +1433,3 @@ done:
 	/* keep elevated page count for bad page */
 	return ret;
 }
-
-/*
- * The caller must hold current->mm->mmap_sem in read mode.
- */
-int is_hwpoison_address(unsigned long addr)
-{
-	pgd_t *pgdp;
-	pud_t pud, *pudp;
-	pmd_t pmd, *pmdp;
-	pte_t pte, *ptep;
-	swp_entry_t entry;
-
-	pgdp = pgd_offset(current->mm, addr);
-	if (!pgd_present(*pgdp))
-		return 0;
-	pudp = pud_offset(pgdp, addr);
-	pud = *pudp;
-	if (!pud_present(pud) || pud_large(pud))
-		return 0;
-	pmdp = pmd_offset(pudp, addr);
-	pmd = *pmdp;
-	if (!pmd_present(pmd) || pmd_large(pmd))
-		return 0;
-	ptep = pte_offset_map(pmdp, ...