[PATCH 3/3] KVM: Convert x86 vcpu ioctls to use dispatch_ioctl_extensible()

Previous thread: [PATCH] hwmon: applesmc: Lighter wait mechanism, drastic improvement by Henrik Rydberg on Saturday, September 27, 2008 - 8:28 am. (1 message)

Next thread: Re: [PATCH] x86: TSC resync by Ingo Molnar on Saturday, September 27, 2008 - 9:51 am. (1 message)
From: Avi Kivity
Date: Saturday, September 27, 2008 - 8:43 am

While ioctls are officially ugly, they are the best choice for some use
cases, not to mention compatibility issues.  Currently ioctl writers face
the following hurdles:

- if the ioctl uses a data buffer, the ioctl handler must allocate
  kernel memory for this buffer
  - the memory may be allocated on the heap or on the stack, depending on the
    buffer size
- handle any errors from the operation
- copy the data from userspace, if necessary
- handle any errors from the operation
- actually perform the operation
- copy the data back to userspace, if necessary
- handle any errors from the operation
- free the buffer, if allocated from the heap

The first patch automates these operations, only requiring the caller to
supply the ioctl number and a callback in a table.

The second patch addresses another problem with ioctls: they are brittle.
Once written, an ioctl cannot be extended, since the buffer sizes used for
transferring data are encoded in the ioctl number.  This is addressed by
allowing the user-supplied size and the kernel-visible size of the data
buffer to be different; the kernel will zero fill or truncate appropriately.
With the new mechanism, it is easy to write forward- and backward- compatible
ioctl handlers.

The third patch demonstrates the effectiveness of the first patch; it
converts some of kvm's ioctl handlers to the new mechanism, removing
around 90 lines in the process.

Comments welcome.

Avi Kivity (3):
  ioctl: generic ioctl dispatcher
  ioctl: extensible ioctl dispatch
  KVM: Convert x86 vcpu ioctls to use dispatch_ioctl_extensible()

 arch/x86/kvm/x86.c    |  241 ++++++++++++++++---------------------------------
 fs/ioctl.c            |  139 ++++++++++++++++++++++++++++
 include/linux/ioctl.h |   37 ++++++++
 3 files changed, 253 insertions(+), 164 deletions(-)

--

From: Avi Kivity
Date: Saturday, September 27, 2008 - 8:44 am

ioctls (as traditionally implemented) are very brittle; one cannot add
a field to the argument structure without breaking the ABI.  In many
cases, this is a good thing as the ABI was not designed for extension.

In other cases, however, it makes sense to design an ABI with extension
in mind.  This patch provides a new ioctl_dispatch_extensible(), which
is designed to cope with evolving interfaces:

- the ioctl number is matched only by its number and type, allowing
  size and direction to change
- buffer space is allocated for the size the kernel expects, even if
  the user passed a smaller buffer
- copying is limited by the size the kernel expects, even if the user
  passed a larger buffer
- buffer size mismatches are handled by zeroing

The ABI can account for new fields either by recognizing zero as a "missing"
value, or by implementing bitflags for feature availability.  Both an old
kernel with new userspace and a new kernel with old userspace are accomodated.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 fs/ioctl.c            |   87 +++++++++++++++++++++++++++++++++++++++++-------
 include/linux/ioctl.h |    4 ++
 2 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index f63d5ce..5a354fc 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -225,12 +225,13 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
  * Locates and calls ioctl handler in @handlers; if none exist, calls
  * @fallback; if that does not exist, return -ENOTTY.
  */
-long dispatch_ioctl(struct inode *inode, struct file *filp,
-		    unsigned cmd, unsigned long arg,
-		    const struct ioctl_handler *handlers,
-		    long (*fallback)(const struct ioctl_arg *arg))
+static long __dispatch_ioctl(struct inode *inode, struct file *filp,
+			     unsigned cmd, unsigned long arg,
+			     const struct ioctl_handler *handlers,
+			     long (*fallback)(const struct ioctl_arg *arg),
+			     unsigned cmd_mask)
 {
-	unsigned dir, ...
From: Avi Kivity
Date: Saturday, September 27, 2008 - 8:44 am

ioctl handler writers currently have the following tasks:

- allocate a buffer for the data, either in memory or on the heap, depending
  on the buffer size
- handle errors
- copy the data from userspace
- handle errors
- actually perform the intended operation
- copy data back to userspace
- handle errors
- free the buffer, in case it was allocated on the heap

This patch provides a dispatch_ioctl() helper, which will do all of these
things for the caller, except for performing the intended operation.  The
caller need only provide a list of ioctl commands and handler functions.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 fs/ioctl.c            |   78 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ioctl.h |   33 ++++++++++++++++++++
 2 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 7db32b3..f63d5ce 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -211,3 +211,81 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
  out:
 	return error;
 }
+
+/**
+ * dispatch_ioctl - dispatch to an ioctl handler based on ioctl number
+ * @inode:	inode to invoke ioctl method on
+ * @filp:	open file to invoke ioctl method on
+ * @cmd:	ioctl command to execute
+ * @arg:	command-specific argument for ioctl
+ * @handlers:	list of potential handlers for @cmd; null terminated;
+ *              place frequently used cmds first
+ * @fallback:   optional function to call if @cmd not found in @handlers
+ *
+ * Locates and calls ioctl handler in @handlers; if none exist, calls
+ * @fallback; if that does not exist, return -ENOTTY.
+ */
+long dispatch_ioctl(struct inode *inode, struct file *filp,
+		    unsigned cmd, unsigned long arg,
+		    const struct ioctl_handler *handlers,
+		    long (*fallback)(const struct ioctl_arg *arg))
+{
+	unsigned dir, size;
+	long ret;
+	long stackbuf[IOCTL_STACK_BUF / sizeof(long)];
+	struct ioctl_arg iarg;
+	long (*handler)(const struct ioctl_arg *arg) = ...
From: Andi Kleen
Date: Monday, September 29, 2008 - 10:16 am

The basic idea is good, but i don't like the proliferation of callbacks,
which tends to make complicated code and is ugly for simple code
(which a lot of ioctls are)

How about you make it return an number that can index a switch() instead?
Then everything could be still kept in the same function.

-Andi
--

From: Avi Kivity
Date: Tuesday, September 30, 2008 - 2:08 am

If the simple calls mostly don't use the argument as a pointer, they are 
better off using a plain switch.  For my own code, I usually leave the 
boilerplate within the switch and the app-specific code in a separate 
function anyway, so there's no big change in style.

The main motivation here was the extensibility (patch 2), which becomes 

We need to execute code both before and after the handler, so it would 
look pretty ugly:

long my_ioctl_handler(...)
{
    struct ioctl_arg iarg;
    ...
    long ret;

    ret = dispatch_ioctl_begin(&iarg, ...);
    if (ret < 0)
        return ret;
    switch (ret) {
          case _IOC_KEY(MY_IOCTL):
               // your stuff goes here
               break;
          ...
    }
    dispatch_ioctl_end(&iarg, ret);
    return ret;
}

The only clean way to do this without callbacks is with 
constructors/destructors, but we don't have those in C.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

From: Avi Kivity
Date: Saturday, September 27, 2008 - 8:44 am

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/x86.c |  241 +++++++++++++++++-----------------------------------
 1 files changed, 77 insertions(+), 164 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4cfdd1b..aa9d228 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1084,26 +1084,24 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
  *
  * @return number of msrs set successfully.
  */
-static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
+static int msr_io(const struct ioctl_arg *iarg,
 		  int (*do_msr)(struct kvm_vcpu *vcpu,
-				unsigned index, u64 *data),
-		  int writeback)
+				unsigned index, u64 *data))
 {
-	struct kvm_msrs msrs;
+	struct kvm_vcpu *vcpu = iarg->file->private_data;
+	struct kvm_msrs *msrs = iarg->argp;
+	struct kvm_msrs __user *user_msrs =
+		(void __user *)iarg->argl + _IOC_SIZE(iarg->cmd);
 	struct kvm_msr_entry *entries;
-	int r, n;
+	int r;
 	unsigned size;
 
-	r = -EFAULT;
-	if (copy_from_user(&msrs, user_msrs, sizeof msrs))
-		goto out;
-
 	r = -E2BIG;
-	if (msrs.nmsrs >= MAX_IO_MSRS)
+	if (msrs->nmsrs >= MAX_IO_MSRS)
 		goto out;
 
 	r = -ENOMEM;
-	size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
+	size = sizeof(struct kvm_msr_entry) * msrs->nmsrs;
 	entries = vmalloc(size);
 	if (!entries)
 		goto out;
@@ -1112,22 +1110,26 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
 	if (copy_from_user(entries, user_msrs->entries, size))
 		goto out_free;
 
-	r = n = __msr_io(vcpu, &msrs, entries, do_msr);
+	r = __msr_io(vcpu, msrs, entries, do_msr);
 	if (r < 0)
 		goto out_free;
 
-	r = -EFAULT;
-	if (writeback && copy_to_user(user_msrs->entries, entries, size))
-		goto out_free;
-
-	r = n;
-
 out_free:
 	vfree(entries);
 out:
 	return r;
 }
 
+static long kvm_vcpu_ioctl_get_msrs(const struct ioctl_arg *iarg)
+{
+	return msr_io(iarg, kvm_get_msr);
+}
+
+static long kvm_vcpu_ioctl_set_msrs(const ...
From: Arjan van de Ven
Date: Saturday, September 27, 2008 - 9:13 am

On Sat, 27 Sep 2008 18:43:59 +0300

this doesn't seem to be much different from the way the DRM code deals
with ioctls. Or the V4L code.
Personally I hate that code though.....

There is a fine balance here; between driver writers screwing something
up they shouldn't be doing in the first place and us being able to
clearly see what the code is doing; your patch kinda hides some key
elements of the ioctl path... I'm afraid it gives a false sense of
security though. Not having to deal with one aspect of security just to
have to deal with the rest....
Lets put it this way: if the driver author has to type "copy_from_user"
there's a chance that he'll remember that the data comes from the user
and isn't to be trusted on face value.
--

From: Avi Kivity
Date: Saturday, September 27, 2008 - 10:40 am

Which key elements?

It hides the big switch (ioctl), memory allocation, and error handling, 
and exposes the actual ioctl-specific code, which I thought was the key 
element.



I'll rename the argp variable to argp_user_supplied.

I can't believe you think writing the copy code from scratch (or worse, 
copy/paste) each time helps security.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

Previous thread: [PATCH] hwmon: applesmc: Lighter wait mechanism, drastic improvement by Henrik Rydberg on Saturday, September 27, 2008 - 8:28 am. (1 message)

Next thread: Re: [PATCH] x86: TSC resync by Ingo Molnar on Saturday, September 27, 2008 - 9:51 am. (1 message)