I have observed the following bug trigger:
1. userspace calls GET_DIRTY_LOG
2. kvm_mmu_slot_remove_write_access is called and makes a page ro
3. page fault happens and makes the page writeable
fault is logged in the bitmap appropriately
4. kvm_vm_ioctl_get_dirty_log swaps slot pointers
a lot of time passes
5. guest writes into the page
6. userspace calls GET_DIRTY_LOG
At point (5), bitmap is clean and page is writeable,
thus, guest modification of memory is not logged
and GET_DIRTY_LOG returns an empty bitmap.
The rule is that all pages are either dirty in the current bitmap,
or write-protected, which is violated here.
It seems that just moving kvm_mmu_slot_remove_write_access down
to after the slot pointer swap should fix this bug.
Warning: completely untested.
Please comment.
Note: fix will be needed for -stable etc.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/kvm/x86.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a09c62..4ca1d7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2912,10 +2912,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_memslots *slots, *old_slots;
unsigned long *dirty_bitmap;
- spin_lock(&kvm->mmu_lock);
- kvm_mmu_slot_remove_write_access(kvm, log->slot);
- spin_unlock(&kvm->mmu_lock);
-
r = -ENOMEM;
dirty_bitmap = vmalloc(n);
if (!dirty_bitmap)
@@ -2937,6 +2933,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
kfree(old_slots);
+ spin_lock(&kvm->mmu_lock);
+ kvm_mmu_slot_remove_write_access(kvm, log->slot);
+ spin_unlock(&kvm->mmu_lock);
+
r = -EFAULT;
if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
vfree(dirty_bitmap);
--
1.7.3-rc1
--
On Mon, 25 Oct 2010 03:21:24 +0200 This may be the reason why my commit is a corruption magnifier. My patch moved the vmalloc() right after kvm_mmu_slot_remove_write_access() and made this chance bigger: because vmalloc() takes some time. Thanks, --
Cool, seems to be the key to the corruptions I've seen. Applying your Assuming that a page cannot be write-enabled without having a dirty entry in the old bitmap and due to the fact that user space won't get hold of that old bitmap to read out the page before we reset write access again, your patch should actually be safe. If no one else sees some remaining race, let's get this applied upstream ASAP and pushed down to the stable trees. Thanks, -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux --
Yes, works for me as well. -- MST --
I did some tests on my laptop: - kvm.git + mst's patch - qemu.git (upstream qemu) and still got graphics curruption. Corruption: On usual Desktop environment, I opened two terminals on different workspaces. Then as Jan did, I did "find /" loop on both of them. During these heavy updates, I tried to switch between these workspaces some times. Then, at some turn, some part of old workspace's images like terminals and mouse cursor remained in the new workspace. I could refresh these by moving mouse over the problematic parts. But without doing so, the images remained still. Refresh is not working on virtual machines as I expect? Thanks, Takuya --
Excellent catch, I stared at this code for a while and didn't see the bug. Patch applied. -- error compiling committee.c: too many arguments to function --
BTW, while this was an annoying one for graphic emulation, wasn't it potentially lethal for live migration? The issue looks like is was introduced with the switch to SRCU, so every kernel since 2.6.34 should be affected, correct? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux --
Deadly. Yes autofs passed it happily. Yes. -- error compiling committee.c: too many arguments to function --
Er, autotest. I'm just having problems my 2.6.36+ autofs setup. -- error compiling committee.c: too many arguments to function --
This patch was marked KVM-stable on commit, but it did not make into any stable branch thus also none of the recent releases. Please fix (for 2.6.36 now). Thanks, Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux --
We need some bot that watches out for commits with the tag going into upstream and pokes the maintainers. -- error compiling committee.c: too many arguments to function --
