Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3

From: Pankaj Thakkar
Date: Tuesday, May 4, 2010 - 4:02 pm

Device passthrough technology allows a guest to bypass the hypervisor and drive
the underlying physical device. VMware has been exploring various ways to
deliver this technology to users in a manner which is easy to adopt. In this
process we have prepared an architecture along with Intel - NPA (Network Plugin
Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
passthrough to a number of physical NICs which support it. The document below
provides an overview of NPA.

We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
Linux users can exploit the benefits provided by passthrough devices in a
seamless manner while retaining the benefits of virtualization. The document
below tries to answer most of the questions which we anticipated. Please let us
know your comments and queries.

Thank you.

Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>


Network Plugin Architecture
---------------------------

VMware has been working on various device passthrough technologies for the past
few years. Passthrough technology is interesting as it can result in better
performance/cpu utilization for certain demanding applications. In our vSphere
product we support direct assignment of PCI devices like networking adapters to
a guest virtual machine. This allows the guest to drive the device using the
device drivers installed inside the guest. This is similar to the way KVM
allows for passthrough of PCI devices to the guests. The hypervisor is bypassed
for all I/O and control operations and hence it can not provide any value add
features such as live migration, suspend/resume, etc.

Network Plugin Architecture (NPA) is an approach which VMware has developed in
joint partnership with Intel which allows us to retain the best of passthrough
technology and virtualization. NPA allows for passthrough of the fast data
(I/O) path and lets the hypervisor deal with the slow control path using
traditional emulation/paravirtualization techniques. Through this ...
From: Stephen Hemminger
Date: Tuesday, May 4, 2010 - 5:05 pm

On Tue, 4 May 2010 16:02:25 -0700


Code please. Also, it has to work for all architectures not just VMware and
Intel.
--

From: Pankaj Thakkar
Date: Tuesday, May 4, 2010 - 5:18 pm

The purpose of this email is to introduce the architecture and the design principles. The overall project involves more than just changes to the vmxnet3 driver and hence we thought an overview email would be better. Once people agree to the design in general we intend to provide the code changes to the vmxnet3 driver.

The architecture supports more than Intel NICs. We started the project with Intel but plan to support all major IHVs including Broadcom, Qlogic, Emulex and others through a certification program. The architecture works on VMware ESX server only as it requires significant support from the hypervisor. Also, the vmxnet3 driver works on VMware platform only. AFAICT Xen has a different model for supporting SR-IOV devices and allowing live migration and the document briefly talks about it (paragraph 6).

Thanks,

-pankaj


--

From: David Miller
Date: Tuesday, May 4, 2010 - 5:32 pm

From: Pankaj Thakkar <pthakkar@vmware.com>

Stephen's point is that code talks and bullshit walks.

Talk about high level designs rarely gets any traction, and often goes
nowhere.  Give us an example implementation so there is something
concrete for us to sink our teeth into.
--

From: Pankaj Thakkar
Date: Tuesday, May 4, 2010 - 5:38 pm

Sure. We have been working on NPA for a while and have the code internally up
and running. Let me sync up internally on how and when we can provide the
vmxnet3 driver code so that people can look at it.


--

From: Stephen Hemminger
Date: Tuesday, May 4, 2010 - 7:44 pm

On Tue, 4 May 2010 17:18:57 -0700

As Dave said, we care more about what the implementation looks like than the high level
goals of the design. I think we all agree that better management of virtualized devices
is necessary, the problem is that there are so many of them (vmware, xen, HV, Xen),
and vendors seem to lean on their own specific implementation of offloading,
which makes a general solution more difficult. Please, Please solve this cleanly.

The little things like API's and locking semantics and handling of dynamic versus
static control can make a good design in principle fall apart when someone does a bad
job of implementing them.

Lastly, projects that have had multiple people involved for long periods of time
in the dark often end up building a legacy mentality "but we convinced vendor XXX to include it
in their Enterprise version 666" and require lots of "retraining" before the code
becomes acceptable.

-- 
--

From: Chris Wright
Date: Tuesday, May 4, 2010 - 5:58 pm

How does the throughput, latency, and host CPU utilization for normal
data path compare with say NetQueue?


How many cards actually support this NPA interface?  What does it look
like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT

How do you handle hardware which has a more symmetric view of the
SR-IOV world (SR-IOV is only a PCI specification, not a network driver
specification)?  Or hardware which has multiple functions per physical


This can happen without NPA as well.  VF simply needs to request
the change via the PF (in fact, hw does that right now).  Also, we
already have a host side management interface via PF (see, for example,
RTM_SETLINK IFLA_VF_MAC interface).


So we have a plugin per hardware VF implementation?  And the hypervisor

Yes, this is important, esp. instead of the requirement for hw to
implement a specific interface (I suspect you know all about this issue

And it will need to be GPL AFAICT from what you've said thus far.  It
does sound worrisome, although I suppose hw firmware isn't particularly


Please make this shell API interface and the PF/VF requirements available.

thanks,
-chris
--

From: Pankaj Thakkar
Date: Wednesday, May 5, 2010 - 12:00 pm

NetQueue is really for scaling across multiple VMs. NPA allows similar scaling
and also helps in improving the CPU efficiency for a single VM since the
hypervisor is bypassed. Throughput-wise both emulation and passthrough (NPA) can
obtain line rates on 10gig but passthrough saves up to 40% cpu based on the
workload. We did a demo at IDF 2009 where we compared 8 VMs running on NetQueue
v/s 8 VMs running on NPA (using Niantic) and we obtained similar CPU efficiency

NPA and UPT share a lot of code in the hypervisor. UPT was adopted only by very

We have it working internally with Intel Niantic (10G) and Kawela (1G) SR-IOV
NIC. We are also working with upcoming Broadcom 10G card and plan to support
other IHVs. This is unlike UPT so we don't dictate the register sets or rings
like we did in UPT. Rather we have guidelines like that the card should have an
embedded switch for inter VF switching or should support programming (rx

I am not sure what do you mean by symmetric view of SR-IOV world?

NPA allows multi-queue VFs and requires an embedded switch currently. As far as
the PF driver is concerned we require IHVs to support all existing and upcoming
features like NetQueue, FCoE, etc. The PF driver is considered special and is
used to drive the traffic for the emulated/paravirtualized VMs and is also used
to program things on behalf of the VFs through the hypervisor. If the hardware
has multiple physical functions they are treated as separate adapters (with
their own set of VFs) and we require the embedded switch to maintain that

The setup is 2.667Ghz Nehalem server running SLES11 VM talking to a 2.33Ghz
Barcelona client box running RHEL 5.1. We had netperf streams with 16k msg size
over 64k socket size running between server VM and client and they are using
Intel Niantic 10G cards. In both cases (NPA and regular) the VM was CPU
saturated (used one full core).

TX: regular vmxnet3 = 3085.5 Mbps/GHz; NPA vmxnet3 = 4397.2 Mbps/GHz
RX: regular vmxnet3 = 1379.6 Mbps/GHz; NPA vmxnet3 = ...
From: Christoph Hellwig
Date: Wednesday, May 5, 2010 - 10:23 am

We're not going to add any kind of loader for binary blobs into kernel
space, sorry.  Don't even bother wasting your time on this.

--

From: Christoph Hellwig
Date: Wednesday, May 5, 2010 - 10:31 am

The mechanism described in the document is loading a binary blob
coded to an abstract API.

That's something entirely different from having normal modules for
the Virtual Functions, which we already have for various pieces of
hardware anyway.
--

From: Dmitry Torokhov
Date: Wednesday, May 5, 2010 - 10:35 am

Yes, with the exception that the only body of code that will be
accepted by the shell should be GPL-licensed and thus open and available
for examining. This is not different from having a standard kernel
module that is loaded normally and plugs into a certain subsystem.
The difference is that the binary resides not on guest filesystem

-- 
Dmitry
--

From: Pankaj Thakkar
Date: Wednesday, May 5, 2010 - 10:47 am

[PT] Today this is tied to vmxnet3 device and is intended to work on ESX hypervisor only (vmxnet3 works on VMware hypervisor only). All the loading support is inside the ESX hypervisor. I am going to post the interface between the shell and the plugin soon and you can see that there is not a whole lot of dependency or infrastructure requirements from the Linux kernel. Please keep in mind that we don't use Linux as a hypervisor but as a guest VM.

--

From: Arnd Bergmann
Date: Wednesday, May 5, 2010 - 1:09 pm

We have the right number of module loaders in the kernel: one. If you
add another one, you're doubling the amount of code that anyone

Your approach assumes that the plugin is always available, which has

If you have the limited driver for some hardware that does not have
the real thing, we could still ship just that. I would however guess
that most vendors are interested in not just running in vmware but
also other hypervisors that still require the full driver, so that
case would be rare, especially in the long run.

	Arnd
--

From: Dmitry Torokhov
Date: Wednesday, May 5, 2010 - 1:36 pm

Since plugin[s] are carried by the host they are indeed always
available.

-- 
Dmitry
--

From: Arnd Bergmann
Date: Wednesday, May 5, 2010 - 2:53 pm

But what makes you think that you can build code that can be linked
into arbitrary future kernel versions? The kernel does not define any
calling conventions that are stable across multiple versions or
configurations. For example, you'd have to provide different binaries
for each combination of

- 32/64 bit code
- gcc -mregparm=?
- lockdep
- tracepoints
- stackcheck
- NOMMU
- highmem
- whatever new gets merged

If you build the plugins only for specific versions of "enterprise" Linux
kernels, the code becomes really hard to debug and maintain.
If you wrap everything in your own version of the existing interfaces, your
code gets bloated to the point of being unmaintainable.

So I have to correct myself: this is very different from assuming the
driver is available in the guest, it's actually much worse.

	Arnd
--

From: Shreyas Bhatewara
Date: Wednesday, May 5, 2010 - 3:05 pm

The plugin image is not linked against the Linux kernel. It is in fact OS agnostic (e.g. the same plugin works for Linux and Windows VMs).
The plugin is built against the shell API interface. It is loaded by the hypervisor into a set of pages provided by the shell. Guest-OS-specific tasks (like allocating pages for the plugin to load into) are handled by the shell, and the shell is the part that will be upstreamed in the Linux kernel. Maintenance of the shell is the same as for any other driver currently in the Linux kernel.


--

From: Gleb Natapov
Date: Thursday, May 6, 2010 - 1:19 am

Overhead of interpreting bytecode plugin is written in. Or are you
saying plugin is x86 assembly (32bit or 64bit btw?) and other arches
will have to have in kernel x86 emulator to use the plugin (like some
of them had for vgabios)? 

--
			Gleb.
--

From: Pankaj Thakkar
Date: Thursday, May 6, 2010 - 11:04 am

The plugin is x86 or x64 machine code. You write the plugin in C and compile it using gcc/ld to get the object file; we map only the relevant sections into the OS space.

NPA is a way of enabling passthrough of SR-IOV NICs with live migration support on the ESX hypervisor, which runs only on x86/x64 hardware. It only supports x86/x64 guest OSes, so we don't have to worry about other architectures. If the NPA approach needs to be extended and adopted by other hypervisors then we have to take care of that. Today we have two plugin images per VF (one for 32-bit, one for 64-bit).
--

From: Christoph Hellwig
Date: Thursday, May 6, 2010 - 1:19 pm

Which is simply not supportable for a cross-platform operating system
like Linux.

--

From: Christoph Hellwig
Date: Thursday, May 6, 2010 - 1:17 pm

We only support in-kernel drivers, everything else is subject to changes
in the kernel API and ABI.  What you do is basically introducing another
wrapper layer not allowing full access to the normal Linux API.  People
have tried this before and we're not willing to add it.  Do a little


And that's not something we care about at all.  The Linux kernel has
traditionally a very hostile position against cross platform drivers for

Yes, of course it does.  It's a normal driver at the point which it

But we use Linux as the hypervisor, too.  So if you want to target a
major infrastructure you might better make it available for that case.

--

From: Stephen Hemminger
Date: Wednesday, May 5, 2010 - 10:52 am

On Wed, 5 May 2010 13:39:51 -0400

Let me put it bluntly. Any design that allows external code to run
in the kernel is not going to be accepted.  Out of tree kernel modules are enough
of a pain already, why do you expect the developers to add another
interface.
--

From: Christoph Hellwig
Date: Thursday, May 6, 2010 - 1:21 pm

Exactly.  Until our friends at VMware get this basic fact it's useless
to continue arguing.

Pankaj and Dmitry: you're fine to waste your time on this, but it's not
going to go anywhere until you address that fundamental problem.  The
first thing you need to fix in your architecture is to integrate the VF
function code into the kernel tree, and we can work from there.

Please post patches doing this if you want to resume the discussion.

--

From: Shreyas Bhatewara
Date: Monday, July 12, 2010 - 8:06 pm

As discussed, following is the patch to give you an idea
about implementation of NPA for vmxnet3 driver. Although the
patch is big, I have verified it with checkpatch.pl. It gave
0 errors / warnings.

Signed-off-by: Matthieu Bucchaineri <matthieu@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
---

 drivers/net/vmxnet3/Makefile          |    2 
 drivers/net/vmxnet3/npa_defs.h        |   83 +
 drivers/net/vmxnet3/npa_plugin_api.h  |  473 ++++++++
 drivers/net/vmxnet3/npa_shell_api.h   |  234 ++++
 drivers/net/vmxnet3/vmxnet3_defs.h    |    2 
 drivers/net/vmxnet3/vmxnet3_drv.c     | 1845
+++++++++++++++++++--------------
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   66 +
 drivers/net/vmxnet3/vmxnet3_int.h     |  221 ++--
 drivers/net/vmxnet3/vmxnet3_plugin.c  | 1221 ++++++++++++++++++++++
 9 files changed, 3221 insertions(+), 926 deletions(-)
 create mode 100644 drivers/net/vmxnet3/npa_defs.h
 create mode 100644 drivers/net/vmxnet3/npa_plugin_api.h
 create mode 100644 drivers/net/vmxnet3/npa_shell_api.h
 create mode 100644 drivers/net/vmxnet3/vmxnet3_plugin.c

diff --git a/drivers/net/vmxnet3/Makefile b/drivers/net/vmxnet3/Makefile
index 880f509..af501d8 100644
--- a/drivers/net/vmxnet3/Makefile
+++ b/drivers/net/vmxnet3/Makefile
@@ -32,4 +32,4 @@
 
 obj-$(CONFIG_VMXNET3) += vmxnet3.o
 
-vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
+vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o vmxnet3_plugin.o
diff --git a/drivers/net/vmxnet3/npa_defs.h
b/drivers/net/vmxnet3/npa_defs.h
new file mode 100644
index 0000000..74d28b8
--- /dev/null
+++ b/drivers/net/vmxnet3/npa_defs.h
@@ -0,0 +1,83 @@
+/*
+ * Network Plugin Architecture definitions.
+ *
+ * Copyright (C) 2008-2010, VMware, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
it
+ * under the terms of the GNU General Public License as published by
the
+ * Free Software Foundation; version 2 of the License and no later
version.
+ *
+ * ...
From: Stephen Hemminger
Date: Monday, July 12, 2010 - 10:16 pm

On Mon, 12 Jul 2010 20:06:28 -0700

I think the concept won't fly.

But you should really at least try running checkpatch to make sure
the style conforms.


-- 
--

From: Stephen Hemminger
Date: Tuesday, July 13, 2010 - 5:31 pm

On Mon, 12 Jul 2010 20:06:28 -0700

I am surprised, the code seems to use lots of mixed case in places
that don't really follow current kernel practice.

--

From: Greg KH
Date: Wednesday, July 14, 2010 - 2:49 am

Your patch is line-wrapped and can not be applied :(

Care to fix your email client?


Is there some reason that our in-kernel functions that do this type of
logic are not working for you to require you to reimplement this?

thanks,

greg k-h
--

From: Shreyas Bhatewara
Date: Wednesday, July 14, 2010 - 10:19 am

Greg,

Thanks for pointing out. I will fix both these issues and repost the patch.

->Shreyas
--

From: Pankaj Thakkar
Date: Wednesday, July 14, 2010 - 10:18 am

The plugin is guest agnostic and hence we did not want to rely on any kernel provided functions. The plugin uses only the interface provided by the shell. The assumption is that since the plugin is really simple and straightforward (all the control/init complexity lies in the PF driver in the hypervisor) we should be able to get by for most things, and for things like memcpy/memset the plugin can write simple functions like this.


-p


--

From: David Miller
Date: Wednesday, July 14, 2010 - 10:54 am

From: Pankaj Thakkar <pthakkar@vmware.com>

While I disagree entirely with this kind of approach, even that
doesn't justify what you're doing here.

memcpy() and memset() are on a much more fundamental ground than
"kernel provided functions".  They had better be available no matter
where you build this thing.

And doing what you're doing is foolish on so many levels.  One more
duplication of code, one more place for unnecessary bugs to live, one
more place that might need optimizations and thus require duplication
of even more work people have done over the years.
--

From: Jeremy Fitzhardinge
Date: Wednesday, July 14, 2010 - 11:03 am

Not to mention calling a function "MoveMemory" when it doesn't do a
memmove is just cruel.

    J
--

From: Greg KH
Date: Wednesday, July 14, 2010 - 1:20 pm

Really?  vmxnet3_plugin.c is not supposed to use any kernel-provided
functions at all?  Then why have it in the kernel at all?  Seriously,

If it's so simple, then why does it need to be separate?  Why not just
put it in your driver as-is to handle the ring-buffer logic (as that's
all it looks to be doing), and then you don't need any plugin code at
all?

It looks like you are linking this file into your "main" driver module,
so I fail to see any type of separation at all happening with this
patch.

Or am I totally missing something here?

thanks,

greg k-h
--

From: Shreyas Bhatewara
Date: Wednesday, July 14, 2010 - 1:42 pm

Reposting the patch with the fixes.

---

From: Shreyas Bhatewara <sbhatewara@vmware.com>

Patch to enable NPA support in vmxnet3 driver.

Signed-off-by: Matthieu Bucchaineri <matthieu@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

 drivers/net/vmxnet3/Makefile          |    2 
 drivers/net/vmxnet3/npa_defs.h        |   83 +
 drivers/net/vmxnet3/npa_plugin_api.h  |  473 ++++++++
 drivers/net/vmxnet3/npa_shell_api.h   |  234 ++++
 drivers/net/vmxnet3/vmxnet3_defs.h    |    2 
 drivers/net/vmxnet3/vmxnet3_drv.c     | 1841 +++++++++++++++++++--------------
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   66 +
 drivers/net/vmxnet3/vmxnet3_int.h     |  221 ++--
 drivers/net/vmxnet3/vmxnet3_plugin.c  | 1199 +++++++++++++++++++++
 9 files changed, 3195 insertions(+), 926 deletions(-)
 create mode 100644 drivers/net/vmxnet3/npa_defs.h
 create mode 100644 drivers/net/vmxnet3/npa_plugin_api.h
 create mode 100644 drivers/net/vmxnet3/npa_shell_api.h
 create mode 100644 drivers/net/vmxnet3/vmxnet3_plugin.c

diff --git a/drivers/net/vmxnet3/Makefile b/drivers/net/vmxnet3/Makefile
index 880f509..af501d8 100644
--- a/drivers/net/vmxnet3/Makefile
+++ b/drivers/net/vmxnet3/Makefile
@@ -32,4 +32,4 @@
 
 obj-$(CONFIG_VMXNET3) += vmxnet3.o
 
-vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
+vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o vmxnet3_plugin.o
diff --git a/drivers/net/vmxnet3/npa_defs.h b/drivers/net/vmxnet3/npa_defs.h
new file mode 100644
index 0000000..74d28b8
--- /dev/null
+++ b/drivers/net/vmxnet3/npa_defs.h
@@ -0,0 +1,83 @@
+/*
+ * Linux driver for VMware's vmxnet3 ethernet NIC.
+ *
+ * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License and no later version.
+ *
+ * This program is distributed in the hope that it will be ...
From: Greg KH
Date: Wednesday, July 14, 2010 - 2:06 pm

Why would the kernel care about this file path?  And since when do we
hard-code file paths in the kernel in the first place (yeah, in some


This is happily copied around and zeroed out, but never actually used by

This field is never used.




This hiding of functions kind of implies that something odd is going on
here, right?  At the least, make them inline functions so you get the


This will never work, sorry.  Please use the proper functions for doing
this type of access.  I'm amazed that anyone even thought this would

What's wrong with the kernel provided function for this?

Anyway, just randomly poking at the code like this turns up these types
of trivial issues, has this code ever been run?

wierd,

greg k-h
--

From: Avi Kivity
Date: Wednesday, May 5, 2010 - 10:59 am

Is this enforced?  Since you pass the hardware through, you can't rely 

This is essentially a miniature network stack with its own mini 
bonding layer, mini hotplug, and mini API, except s/API/ABI/.  Is this a 
correct view?

If so, the Linuxy approach would be to use the ordinary drivers and the 
Linux networking API, and hide the bond setup using namespaces.  The 
bond driver, or perhaps a new, similar, driver can be enhanced to 
propagate ethtool commands to its (hidden) components, and to have a 
control channel with the hypervisor.

This would make the approach hypervisor agnostic, you're just pairing 

So the Shell would be the reworked or new bond driver, and Plugins would 
be ordinary Linux network drivers.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Pankaj Thakkar
Date: Wednesday, May 5, 2010 - 12:44 pm

We don't pass the whole VF to the guest. Only the BAR which is responsible for
TX/RX/intr is mapped into guest space. The interface between the shell and
plugin only allows operations related to TX and RX, such as sending a packet
to the VF, allocating RX buffers, and indicating a packet up to the shell. All control
operations are handled by the shell and the shell does what the existing
vmxnet3 driver does (touch a specific register and let the device emulation do
the work). When a VF is mapped to the guest the hypervisor knows this and
programs the h/w accordingly on behalf of the shell. So for example if the VM
does a MAC address change inside the guest, the shell would write to
VMXNET3_REG_MAC{L|H} registers which would trigger the device emulation to read
the new mac address and update its internal virtual port information for the
virtual switch and if the VF is mapped it would also program the embedded

To some extent yes, but there is no complicated bonding nor is there anything
like a PCI hotplug. The shell interface is small and the OS always interacts
with the shell as the main driver. Based on the underlying VF the plugin
changes and this plugin as well is really small. Our vmxnet3 s/w plugin is
about 1300 lines with whitespace and comments and the Intel Kawela plugin is
about 1100 lines with whitespace and comments. The design principle is to put
more of the complexity related to initialization/control into the PF driver

In NPA we do not rely on the guest OS to provide any of these services like
bonding or PCI hotplug. We don't rely on the guest OS to unmap a VF and switch
a VM out of passthrough. In a bonding approach that becomes an issue: you can't
just yank a device from underneath, you have to wait for the OS to process the
request and switch from using VF to the emulated device and this makes the
hypervisor dependent on the guest OS. Also we don't rely on the presence of all
the drivers inside the guest OS (be it Linux or Windows), the ESX hypervisor
carries all the ...
From: Avi Kivity
Date: Thursday, May 6, 2010 - 1:58 am

Well the Shell does some sort of bonding (there are two links and the 
shell selects which one to exercise) and some sort of hotplug.  Since 
the Shell is part of the guest OS, you do rely on it.


How can you unmap the VF without guest cooperation?  If you're executing 
Plugin code, you can't yank anything out.


What ISAs do those plugins support?

-- 
error compiling committee.c: too many arguments to function

--

From: Pankaj Thakkar
Date: Monday, May 10, 2010 - 1:46 pm

No. This is a guideline which we provided to IHVs and would have to be enforced

In our Kawela plugin we don't have any reads from the memory space at all.
Hence you can yank the VF anytime (the code loaded in the guest address space
will keep on executing). Even if there were reads we can map the memory
pages to a NULL page and return 0xffffffff so that the plugin can detect this
and return an error to the shell. Remember there are no control operations in
the plugin and the code is really small (about 1k lines compared to 5k lines in

Depends on the model. Today the plugin code for checking the TX/RX rings runs

x86 and x64.

Thanks,

-pankaj

--
