[RFC][PATCH 12/12 sample] qemu-kvm: use new API for getting/switching dirty bitmaps

Previous thread: [2.6.32] task hung by Folkert van Heusden on Tuesday, May 4, 2010 - 5:49 am. (2 messages)

Next thread: [PATCH] KEYS: Fix RCU handling in key_gc_keyring() by David Howells on Tuesday, May 4, 2010 - 6:16 am. (5 messages)
From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 5:56 am

Hi, sorry for sending from my personal account.
The following series are all from me:

  From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

  The 3rd version of "moving dirty bitmaps to user space".

From this version, we add x86 and ppc and asm-generic people to CC lists.


[To KVM people]

Sorry for being late to reply your comments.

Avi,
 - I've wrote an answer to your question in patch 5/12: drivers/vhost/vhost.c .

 - I've considered to change the set_bit_user_non_atomic to an inline function,
   but did not change because the other helpers in the uaccess.h are written as
   macros. Anyway, I hope that x86 people will give us appropriate suggestions
   about this.

 - I thought that documenting about making bitmaps 64-bit aligned will be
   written when we add an API to register user-allocated bitmaps. So probably
   in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the logical
   changes as small as possible.

   I personally tried to build these with cross compilers. For ia64, I could check
   build success with my patch series. But book3s, even without my patch series,
   it failed with the following errors:

  arch/powerpc/kvm/book3s_paired_singles.c: In function 'kvmppc_emulate_paired_single':
  arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 bytes is larger than 2048 bytes
  make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
  make: *** [arch/powerpc/kvm] Error 2


About changelog: there are two main changes from the 2nd version:
  1. I changed the treatment of clean slots (see patch 1/12).
     This was already applied today, thanks!
  2. I changed the switch API. (see patch 11/12).

To show this API's advantage, I also did a test (see the end of this mail).


[To x86 people]

Hi, Thomas, Ingo, Peter,

Please review the patches 4,5/12. Because this is the first experience for
me to send patches to x86, please tell me if this lacks anything.


[To ...
From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 5:58 am

Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
it is only used as a zero-source of copy_to_user() and freed right after
that when memslot is clean. This patch uses clear_user() instead of doing
this unnecessary zero-source allocation.

Performance improvement: as we can expect easily, the time needed to
allocate a bitmap is completely reduced. Furthermore, we can avoid the
tlb flush triggered by vmalloc() and get some good effects. In my test,
the improved ioctl was about 4 to 10 times faster than the original one
for clean slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c |   37 +++++++++++++++++++++++--------------
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
 	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
-
 	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 		is_dirty = memslot->dirty_bitmap[i];
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
+		unsigned long *dirty_bitmap;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots)
-			goto out_free;
+		r = -ENOMEM;
+		dirty_bitmap = vmalloc(n);
+		if (!dirty_bitmap)
+			goto out;
+		memset(dirty_bitmap, 0, n);
 
+		r = -ENOMEM;
+		slots = kzalloc(sizeof(struct kvm_memslots), ...
From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:01 am

We will change the vmalloc() and vfree() to do_mmap() and do_munmap() later.
This patch makes it easy and cleanup the code.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
 virt/kvm/kvm_main.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ab6312..3e3acad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,6 +435,12 @@ out_err_nodisable:
 	return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	vfree(memslot->dirty_bitmap);
+	memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -447,7 +453,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		vfree(free->dirty_bitmap);
+		kvm_destroy_dirty_bitmap(free);
 
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -458,7 +464,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	}
 
 	free->npages = 0;
-	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 }
 
@@ -520,6 +525,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	memslot->dirty_bitmap = vmalloc(dirty_bytes);
+	if (!memslot->dirty_bitmap)
+		return -ENOMEM;
+
+	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -653,12 +670,8 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-		new.dirty_bitmap = ...
From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:02 am

During the work of KVM's dirty page logging optimization, we encountered
the need of copy_in_user() for 32-bit x86 and ppc: these will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with existing generic copy user
helpers.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/uaccess_32.h |    2 ++
 arch/x86/lib/usercopy_32.c        |   26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 088d09f..85d396d 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -21,6 +21,8 @@ unsigned long __must_check __copy_from_user_ll_nocache
 		(void *to, const void __user *from, unsigned long n);
 unsigned long __must_check __copy_from_user_ll_nocache_nozero
 		(void *to, const void __user *from, unsigned long n);
+unsigned long __must_check copy_in_user
+		(void __user *to, const void __user *from, unsigned n);
 
 /**
  * __copy_to_user_inatomic: - Copy a block of data into user space, with less checking.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e90ffc3 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -889,3 +889,29 @@ void copy_from_user_overflow(void)
 	WARN(1, "Buffer overflow detected!\n");
 }
 EXPORT_SYMBOL(copy_from_user_overflow);
+
+/**
+ * copy_in_user: - Copy a block of data from user space to user space.
+ * @to:   Destination address, in user space.
+ * @from: Source address, in user space.
+ * @n:    Number of bytes to copy.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * Copy data from user space to user space.
+ *
+ * ...
From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:02 am

During the work of KVM's dirty page logging optimization, we encountered
the need of manipulating bitmaps in user space efficiantly. To achive this,
we introduce a uaccess function for setting a bit in user space following
Avi's suggestion.

  KVM is now using dirty bitmaps for live-migration and VGA. Although we need
  to update them from kernel side, copying them every time for updating the
  dirty log is a big bottleneck. Especially, we tested that zero-copy bitmap
  manipulation improves responses of GUI manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c in which the author
implemented set_bit_to_user() locally using inefficient functions: see TODO
at the top of that.

Probably, this kind of need would be common for virtualization area.

So we introduce a macro set_bit_user_non_atomic() following the implementation
style of x86's uaccess functions.

Note: there is a one restriction to this macro: bitmaps must be 64-bit
aligned (see the comment in this patch).

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/uaccess.h |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..3138e65 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,45 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user_non_atomic: - set a bit of a bitmap in user space.
+ * @nr:   Bit offset.
+ * @addr: Base address of a bitmap in user space.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * This macro set a bit of a bitmap in user space.
+ *
+ * Restriction: the ...
From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:03 am

During the work of KVM's dirty page logging optimization, we encountered
the need of copy_in_user() for 32-bit ppc and x86: these will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with __copy_tofrom_user().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Alexander Graf <agraf@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/uaccess.h |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index bd0fb84..3a01ce8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -359,6 +359,23 @@ static inline unsigned long copy_to_user(void __user *to,
 	return n;
 }
 
+static inline unsigned long copy_in_user(void __user *to,
+		const void __user *from, unsigned long n)
+{
+	unsigned long over;
+
+	if (likely(access_ok(VERIFY_READ, from, n) &&
+	    access_ok(VERIFY_WRITE, to, n)))
+		return __copy_tofrom_user(to, from, n);
+	if (((unsigned long)from < TASK_SIZE) ||
+	    ((unsigned long)to < TASK_SIZE)) {
+		over = max((unsigned long)from, (unsigned long)to)
+			+ n - TASK_SIZE;
+		return __copy_tofrom_user(to, from, n - over) + over;
+	}
+	return n;
+}
+
 #else /* __powerpc64__ */
 
 #define __copy_in_user(to, from, size) \
-- 
1.7.0.4

--

From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:04 am

During the work of KVM's dirty page logging optimization, we encountered
the need of manipulating bitmaps in user space efficiantly. To achive this,
we introduce a uaccess function for setting a bit in user space following
Avi's suggestion.

  KVM is now using dirty bitmaps for live-migration and VGA. Although we need
  to update them from kernel side, copying them every time for updating the
  dirty log is a big bottleneck. Especially, we tested that zero-copy bitmap
  manipulation improves responses of GUI manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c in which the author
implemented set_bit_to_user() locally using inefficient functions: see TODO
at the top of that.

Probably, this kind of need would be common for virtualization area.

So we introduce a function set_bit_user_non_atomic().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Alexander Graf <agraf@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/uaccess.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 3a01ce8..f878326 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -321,6 +321,25 @@ do {								\
 	__gu_err;						\
 })
 
+static inline int set_bit_user_non_atomic(int nr, void __user *addr)
+{
+	u8 __user *p;
+	u8 val;
+
+	p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
+	if (!access_ok(VERIFY_WRITE, p, 1))
+		return -EFAULT;
+
+	if (__get_user(val, p))
+		return -EFAULT;
+
+	val |= 1U << (nr % BITS_PER_BYTE);
+	if (__put_user(val, p))
+		return -EFAULT;
+
+	return 0;
+}
+
 
 /* more complex routines */
 
-- 
1.7.0.4

--

From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:07 am

We move dirty bitmaps to user space.

 - Allocation and destruction: we use do_mmap() and do_munmap().
   The new bitmap space is twice longer than the original one and we
   use the additional space for double buffering: this makes it
   possible to update the active bitmap while letting the user space
   read the other one safely. For x86, we can also remove the vmalloc()
   in kvm_vm_ioctl_get_dirty_log().

 - Bitmap manipulations: we replace all functions which access dirty
   bitmaps with *_user() functions.

 - For ia64: moving the dirty bitmaps of memory slots does not affect
   ia64 much because it's using a different place to store dirty logs
   rather than the dirty bitmaps of memory slots: all we have to change
   are sync and get of dirty log, so we don't need set_bit_user like
   functions for ia64.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
CC: Alexander Graf <agraf@suse.de>
---
 arch/ia64/kvm/kvm-ia64.c  |   15 +++++++++-
 arch/powerpc/kvm/book3s.c |    5 +++-
 arch/x86/kvm/x86.c        |   25 ++++++++----------
 include/linux/kvm_host.h  |    3 +-
 virt/kvm/kvm_main.c       |   62 +++++++++++++++++++++++++++++++++++++-------
 5 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 17fd65c..03503e6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	n = kvm_dirty_bitmap_bytes(memslot);
 	base = memslot->base_gfn / BITS_PER_LONG;
 
+	r = -EFAULT;
+	if (!access_ok(VERIFY_WRITE, memslot->dirty_bitmap, n))
+		goto out;
+
 	for (i = 0; i < n/sizeof(long); ++i) {
 		if (dirty_bitmap[base + i])
 			memslot->is_dirty = true;
 
-		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
+		if (__put_user(dirty_bitmap[base + i],
+			       &memslot->dirty_bitmap[i])) {
+			r = ...
From: Takuya Yoshikawa
Date: Tuesday, May 11, 2010 - 11:27 pm

Oh, it's a serious problem. I have to consider it.


Thanks,
   Takuya
--

From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:08 am

Now that dirty bitmaps are accessible from user space, we export the
addresses of these to achieve zero-copy dirty log check:

  KVM_GET_USER_DIRTY_LOG_ADDR

We also need an API for triggering dirty bitmap switch to take the full
advantage of double buffered bitmaps.

  KVM_SWITCH_DIRTY_LOG

See the documentation in this patch for precise explanations.

About performance improvement: the most important feature of switch API
is the lightness. In our test, this appeared in the form of improved
responses for GUI manipulations.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
CC: Avi Kivity <avi@redhat.com>
CC: Alexander Graf <agraf@suse.de>
---
 Documentation/kvm/api.txt |   87 +++++++++++++++++++++++++++++++++++++++++++++
 arch/ia64/kvm/kvm-ia64.c  |   27 +++++++++-----
 arch/powerpc/kvm/book3s.c |   18 +++++++--
 arch/x86/kvm/x86.c        |   44 ++++++++++++++++-------
 include/linux/kvm.h       |   11 ++++++
 include/linux/kvm_host.h  |    4 ++-
 virt/kvm/kvm_main.c       |   63 +++++++++++++++++++++++++++++----
 7 files changed, 220 insertions(+), 34 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index a237518..c106c83 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -892,6 +892,93 @@ arguments.
 This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
 irqchip, the multiprocessing state must be maintained by userspace.
 
+4.39 KVM_GET_USER_DIRTY_LOG_ADDR
+
+Capability: KVM_CAP_USER_DIRTY_LOG (>=1 see below)
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_user_dirty_log (in/out)
+Returns: 0 on success, -1 on error
+
+This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead
+of the old dirty log manipulation by KVM_GET_DIRTY_LOG.
+
+A bit about KVM_CAP_USER_DIRTY_LOG
+
+Before this ioctl was introduced, dirty bitmaps for dirty page logging were
+allocated in the kernel's ...
From: Marcelo Tosatti
Date: Tuesday, May 11, 2010 - 7:07 am

If you introduce a switch ioctl that frees the bitmap vmalloc'ed by the
current set_memory_region (if its not freed already), after pointing the
memslot to the user supplied one, it should be fine?

--

From: Takuya Yoshikawa
Date: Tuesday, May 4, 2010 - 6:11 am

We use new API for light dirty log access if KVM supports it.

This conflicts with Marcelo's patches. So please take this as a sample patch.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 kvm/include/linux/kvm.h |   11 ++++++
 qemu-kvm.c              |   81 ++++++++++++++++++++++++++++++++++++++++++-----
 qemu-kvm.h              |    1 +
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index 6485981..efd9538 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -317,6 +317,14 @@ struct kvm_dirty_log {
 	};
 };
 
+/* for KVM_GET_USER_DIRTY_LOG_ADDR */
+struct kvm_user_dirty_log {
+	__u32 slot;
+	__u32 flags;
+	__u64 dirty_bitmap;
+	__u64 dirty_bitmap_old;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
 	__u32 len;
@@ -499,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_PPC_SEGSTATE 43
 
 #define KVM_CAP_PCI_SEGMENT 47
+#define KVM_CAP_USER_DIRTY_LOG 55
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -595,6 +604,8 @@ struct kvm_clock_data {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO,  0x49, struct kvm_user_dirty_log)
+#define KVM_SWITCH_DIRTY_LOG      _IO(KVMIO,   0x4a)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP        _IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE              _IOW(KVMIO,  0x61, struct kvm_irq_level)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..98777f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -143,6 +143,8 @@ struct slot_info {
     unsigned long userspace_addr;
     unsigned flags;
     int logging_count;
+    unsigned long *dirty_bitmap;
+    unsigned long *dirty_bitmap_old;
 };
 
 struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
@@ -232,6 +234,29 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr,
     return 1;
 }
 ...
From: Arnd Bergmann
Date: Tuesday, May 4, 2010 - 8:03 am

Does this work correctly if your user space is 32 bits (i.e. unsigned long
is different size in user space and kernel) in both big- and little-endian
systems?

I'm not sure about all the details, but I think you cannot in general share
bitmaps between user space and kernel because of this.

	Arnd
--

From: Avi Kivity
Date: Tuesday, May 4, 2010 - 9:08 am

That's why the bitmaps are defined as little endian u64 aligned, even on 
big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
and u64 alignment ensures we can use long-sized bitops on mixed size 
systems.

-- 
error compiling committee.c: too many arguments to function

--

From: Arnd Bergmann
Date: Thursday, May 6, 2010 - 6:38 am

I'm not sure I understand how this macro is going to be used though.
If you are just using this in kernel space, that's fine, please go for
it.

However, if the intention is to use the same macro in user space, putting
it into asm-generic/bitops/* is not going to help, because those headers
are not available in user space, and I wouldn't want to change that.

The definition of the macro is not part of the ABI, so just duplicate
it in KVM if you need it there.

	Arnd
--

From: Takuya Yoshikawa
Date: Monday, May 10, 2010 - 4:46 am

Yes, I'm just using in kernel space: qemu has its own endian related helpers.

So if you allow us to place this macro in asm-generic/bitops/* it will help us.


--

From: Avi Kivity
Date: Monday, May 10, 2010 - 5:01 am

I really prefer anything that is generic to be outside kvm, even if kvm 
is the only user.

-- 
error compiling committee.c: too many arguments to function

--

From: Arnd Bergmann
Date: Monday, May 10, 2010 - 5:01 am

No problem at all then. Thanks for the explanation.

Acked-by: Arnd Bergmann <arnd@arndb.de>
--

From: Takuya Yoshikawa
Date: Monday, May 10, 2010 - 5:09 am

Thanks you both. I will add your Acked-by from now on!

   Takuya
--

From: Avi Kivity
Date: Monday, May 10, 2010 - 5:06 am

100 ns... this is a bit on the low side (and if you can measure it 

No doubt get.org -> get.opt is measurable, but get.opt->switch.opt is 
problematic.  Have you tried profiling to see where the time is spent 

Can you post such a test, for an idle large guest?

-- 
error compiling committee.c: too many arguments to function

--

From: Takuya Yoshikawa
Date: Monday, May 10, 2010 - 5:26 am

Sorry but no, and I agree with your guess.
Anyway, I want to do some profiling to confirm this guess.


BTW, If we only think about performance improvement of time, optimized
get(get.opt) may be enough at this stage.

But if we consider the future expansion like using user allocated bitmaps,
new API's introduced for switch.opt won't become waste, I think, because we


--

From: Takuya Yoshikawa
Date: Tuesday, May 11, 2010 - 3:11 am

Result of "low workload test" (running top during migration) first,

4GB guest
picked up slots[1](len=3757047808) only
*****************************************
     get.org     get.opt    switch.opt

     1060875     310292     190335
     1076754     301295     188600
      655504     318284     196029
      529769     301471        325
      694796      70216     221172
      651868     353073     196184
      543339     312865     213236
     1061938      72785     203090
      689527     323901     249519
      621364     323881        473
     1063671      70703     192958
      915903     336318     174008
     1046462     332384        782
     1037942      72783     190655
      680122     318305     243544
      688156     314935     193526
      558658     265934     190550
      652454     372135     196270
      660140      68613        352
     1101947     378642     186575
         ...        ...        ...
*****************************************

As expected we've got the difference more clearly.

In this case, switch.opt reduced 1/3 (.1 msec) compared to get.opt
for each iteration.


--

From: Avi Kivity
Date: Thursday, May 13, 2010 - 4:47 am

User allocated bitmaps have the advantage of reducing pinned memory.  
However we have plenty more pinned memory allocated in memory slots, so 
by itself, user allocated bitmaps don't justify this change.

Perhaps if we optimize memory slot write protection (I have some ideas 
about this) we can make the performance improvement more pronounced.

-- 
error compiling committee.c: too many arguments to function

--

From: Takuya Yoshikawa
Date: Monday, May 17, 2010 - 2:06 am

In that sense, what do you think about the question I sent last week?

=== REPOST 1 ===
 >>
 >> mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
 >> Must find a way to move it outside of the spinlock section.
 >>
 >
 > Oh, it's a serious problem. I have to consider it.

Avi, Marcelo,

Sorry but I have to say that mmu_lock spin_lock problem was completely out of
my mind. Although I looked through the code, it seems not easy to move the
set_bit_user to outside of spinlock section without breaking the semantics of
its protection.

So this may take some time to solve.

But personally, I want to do something for x86's "vmallc() every time" problem
even though moving dirty bitmaps to user space cannot be achieved soon.

In that sense, do you mind if we do double buffering without moving dirty bitmaps to
user space?

I know that the resource for vmalloc() is precious for x86 but even now, at the timing
of get_dirty_log, we use the same amount of memory as double buffering.

It's really nice!

Even now we can measure the performance improvement by introducing switch ioctl
when guest is relatively idle, so the combination will be really effective!

=== REPOST 2 ===
 >>
 >> Can you post such a test, for an idle large guest?
 >
 > OK, I'll do!


Result of "low workload test" (running top during migration) first,

4GB guest
picked up slots[1](len=3757047808) only
*****************************************
     get.org     get.opt    switch.opt

     1060875     310292     190335
     1076754     301295     188600
      655504     318284     196029
      529769     301471        325
      694796      70216     221172
      651868     353073     196184
      543339     312865     213236
     1061938      72785     203090
      689527     323901     249519
      621364     323881        473
     1063671      70703     192958
      915903     336318     174008
     1046462     332384        782
     1037942      72783     190655
      680122  ...
From: Alexander Graf
Date: Tuesday, May 11, 2010 - 8:55 am

This is bad. I haven't encountered that one at all so far, but I guess

Could you please point me to a git tree where everything's readily
applied? That would make testing a lot easier.

Alex

--

Previous thread: [2.6.32] task hung by Folkert van Heusden on Tuesday, May 4, 2010 - 5:49 am. (2 messages)

Next thread: [PATCH] KEYS: Fix RCU handling in key_gc_keyring() by David Howells on Tuesday, May 4, 2010 - 6:16 am. (5 messages)