Re: [PATCH 9/9] x86: Detect whether we should use Xen SWIOTLB.

Previous thread: [PATCH 8/9] pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions. by Konrad Rzeszutek Wilk on Tuesday, July 27, 2010 - 10:00 am. (1 message)

Next thread: [PATCH v2] checkpatch: Add warnings for use of mdelay() by Israel Schlesinger on Tuesday, July 27, 2010 - 10:27 am. (2 messages)
From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 9:59 am

Changes since last posting [v0.8.3 (http://lkml.org/lkml/2010/6/22/310)]:
 - Coalesced the Xen-SWIOTLB set of patches in one patch.
 - Moved the swiotlb-xen.c to drivers/xen.
 - Added Ack's, Cc's, as appropriate.

This patchset:

These nine patches lay the groundwork for Xen Paravirtualized (PV)
domains to access PCI pass-through devices (with and without a hardware
IOMMU). These patches utilize the SWIOTLB library modifications
(http://lkml.org/lkml/2010/6/4/272).

The end users of this are:
 - The Xen PCI frontend and Xen PCI [1] which require a DMA API "backend"
   that understands Xen's MMU. This allows the PV domains to use PCI devices.
   The use case is for machines with and without hardware IOMMU. Without a
   hardware IOMMU you have a potential security hole wherein a guest domain
   can use the hardware to map pages outside its memory range and slurp
   pages up. As such, this is more restricted to a Privileged PV domain,
   aka - device driver domain (similar to Qubes but a poor-man mechanism [2]).
 - Xen PV domain 0 support. Without this domain 0 is incapable of using any
   PCI devices.

This patch-set is split in two groups. The first alters the Xen components,
while the second introduces the SWIOTLB-Xen.

The Xen components patches consist of:

      xen: Allow unprivileged Xen domains to create iomap pages
      xen: Rename the balloon lock
      xen: Add xen_create_contiguous_region
      xen: use _PAGE_IOMAP in ioremap to do machine mappings
      vmap: add flag to allow lazy unmap to be disabled at runtime
      xen/mmu: inhibit vmap aliases rather than trying to clear them out

which alter the Xen MMU, which by default utilizes a layer of indirection
wherein the PFN is translated to the Machine Frame Number (MFN) and vice-versa.
This is required to "fool" the guest in thinking its memory starts at PFN 0 and
goes up to the available amount.  While in the background, PFN 0 might as well be
MFN 1048576 (4GB).
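
As an illustration of the PFN/MFN indirection described above, here is a
minimal user-space sketch. It is not the kernel's implementation: the table
contents and the toy_* names are made up for the example, and the reverse
lookup is a linear scan only because the table is tiny.

```c
#include <stddef.h>

#define INVALID_FRAME ((unsigned long)-1)

/* Hypothetical toy P2M table: the index is the guest PFN, the value the MFN. */
static const unsigned long toy_p2m[] = {
	1048576,	/* PFN 0 -> MFN 1048576 (the 4GB mark with 4KB pages) */
	1048577,	/* PFN 1 happens to be adjacent in machine memory */
	524288,		/* PFN 2 lives somewhere else entirely */
	1048580,	/* PFN 3 */
};
#define TOY_NR_PAGES (sizeof(toy_p2m) / sizeof(toy_p2m[0]))

/* Forward lookup: what the guest calls PFN n is really MFN toy_p2m[n]. */
static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
	return pfn < TOY_NR_PAGES ? toy_p2m[pfn] : INVALID_FRAME;
}

/* Reverse lookup by linear scan over the toy table. */
static unsigned long toy_mfn_to_pfn(unsigned long mfn)
{
	size_t i;

	for (i = 0; i < TOY_NR_PAGES; i++)
		if (toy_p2m[i] == mfn)
			return i;
	return INVALID_FRAME;
}
```

The guest sees frames 0..3 as one flat range, while the frames backing them
are scattered; any address handed to a device has to go through the forward
lookup first.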

For PCI/DMA API calls (ioremap, ...
--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 10:00 am

It is paramount that we call pci_xen_swiotlb_detect before
pci_swiotlb_detect as both implementations use the 'swiotlb'
and 'swiotlb_force' flags. The pci_xen_swiotlb_detect inhibits
the swiotlb_force and swiotlb flag so that the native SWIOTLB
implementation is not enabled when running under Xen.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Albert Herranz <albert_herranz@yahoo.es>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 arch/x86/kernel/pci-dma.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4b7e3d8..9f07cfc 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -13,6 +13,7 @@
 #include <asm/calgary.h>
 #include <asm/amd_iommu.h>
 #include <asm/x86_init.h>
+#include <asm/xen/swiotlb-xen.h>
 
 static int forbid_dac __read_mostly;
 
@@ -132,7 +133,7 @@ void __init pci_iommu_alloc(void)
 	/* free the range so iommu could get some range less than 4G */
 	dma32_free_bootmem();
 
-	if (pci_swiotlb_detect())
+	if (pci_xen_swiotlb_detect() || pci_swiotlb_detect())
 		goto out;
 
 	gart_iommu_hole_init();
@@ -144,6 +145,8 @@ void __init pci_iommu_alloc(void)
 	/* needs to be called after gart_iommu_hole_init */
 	amd_iommu_detect();
 out:
+	pci_xen_swiotlb_init();
+
 	pci_swiotlb_init();
 }
 
@@ -296,7 +299,7 @@ static int __init pci_iommu_init(void)
 #endif
 	x86_init.iommu.iommu_init();
 
-	if (swiotlb) {
+	if (swiotlb || xen_swiotlb) {
 		printk(KERN_INFO "PCI-DMA: "
 		       "Using software bounce buffering for IO (SWIOTLB)\n");
 		swiotlb_print_info();
-- 
1.7.0.1

--

From: H. Peter Anvin
Date: Tuesday, July 27, 2010 - 12:03 pm

Is there any way we can abstract this out a bit more instead of crapping
on generic code?

	-hpa
-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 12:41 pm

I was toying with something like this:

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 9f07cfc..e0cd388 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -45,6 +45,25 @@ int iommu_detected __read_mostly = 0;
  */
 int iommu_pass_through __read_mostly;
 
+initcall_t __swiotlb_initcall_detect[] =
+	{pci_xen_swiotlb_detect,
+	 pci_swiotlb_detect,
+	NULL};
+
+initcall_t __swiotlb_initcall_init[] = {
+	pci_xen_swiotlb_init,
+	pci_swiotlb_init,
+	NULL};
+
+
+initcall_t __iommu_initcall_detect[] = {
+	gart_iommu_hole_init,
+	detect_calgary,
+	detect_intel_iommu,
+	/* needs to be called after gart_iommu_hole_init */
+	amd_iommu_detect,
+	NULL};
+
 /* Dummy device used for NULL arguments (normally ISA). */
 struct device x86_dma_fallback_dev = {
 	.init_name = "fallback device",
@@ -130,24 +149,22 @@ static void __init dma32_free_bootmem(void)
 
 void __init pci_iommu_alloc(void)
 {
+	initcall_t *fn;
+
 	/* free the range so iommu could get some range less than 4G */
 	dma32_free_bootmem();
 
-	if (pci_xen_swiotlb_detect() || pci_swiotlb_detect())
-		goto out;
-
-	gart_iommu_hole_init();
-
-	detect_calgary();
+	/* First do the SWIOTLB - if they work, skip the IOMMUs. */
+	for (fn = __swiotlb_initcall_detect; *fn != NULL; fn++)
+		if ((*fn)())
+			goto swiotlb_init;
 
-	detect_intel_iommu();
-
-	/* needs to be called after gart_iommu_hole_init */
-	amd_iommu_detect();
-out:
-	pci_xen_swiotlb_init();
+	for (fn = __iommu_initcall_detect; *fn != NULL; fn++)
+		(*fn)();
 
-	pci_swiotlb_init();
+swiotlb_init:
+	for (fn = __swiotlb_initcall_init; *fn != NULL; fn++)
+		(*fn)();
 }
 
 void *dma_generic_alloc_coherent(struct device *dev, size_t size,


(compiles with warnings and has not yet been completely flushed), but
Fujita mentioned that it might the right choice to use this
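
For readers following the table-driven idea, a minimal user-space sketch of
walking a NULL-terminated array of detect functions. The names (detect_fn,
toy_detect_table, run_until_detected) are hypothetical. Two things the sketch
assumes: the walk has to stop on a NULL entry (*fn), since the cursor fn
itself never becomes NULL, and every entry must share one signature. In the
kernel, initcall_t is int (*)(void), so void-returning functions such as
gart_iommu_hole_init() would not fit such a table without wrappers, which may
be one source of the warnings mentioned.

```c
#include <stddef.h>

typedef int (*detect_fn)(void);

static int calls;	/* counts how many detectors actually ran */

static int detect_none(void)  { calls++; return 0; }
static int detect_found(void) { calls++; return 1; }

/* NULL-terminated table, walked in order. */
static detect_fn toy_detect_table[] = {
	detect_none,
	detect_found,
	detect_none,
	NULL,
};

/*
 * Runs detectors in table order and stops at the first one that reports
 * success. Returns its index, or -1 if none did.
 */
static int run_until_detected(detect_fn *table)
{
	detect_fn *fn;

	for (fn = table; *fn != NULL; fn++)
		if ((*fn)())
			return (int)(fn - table);
	return -1;
}
```

With the table above, the second entry succeeds, so the third is never called.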
--

From: FUJITA Tomonori
Date: Tuesday, July 27, 2010 - 4:36 pm

On Tue, 27 Jul 2010 15:41:05 -0400

I don't like this change much either; however, I think that this is the
most simple and straightforward.

Basically, Xen's swiotlb works like a new IOMMU implementation so we
need to initialize it like other IOMMU implementations (call the

I really don't think that this makes the code better. I prefer the

btw, this comment is wrong. We check if we are forced to use SWIOTLB
by kernel command line here.

Even if SWIOTLB works, we see if hardware IOMMU is available. SWIOTLB
is a last resort. We prefer hardware IOMMU.
--

From: H. Peter Anvin
Date: Tuesday, July 27, 2010 - 5:19 pm

Even mentioning "xen" in generic code should be considered a bug.  I
think we *do* need to driverize the iommu stuff, and yes, Xen's swiotlb

The special handling of swiotlb here really looks wrong, but otherwise I

Any reason to not just handle swiotlb like any of the other iommus, at
the bottom of the list?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: FUJITA Tomonori
Date: Tuesday, July 27, 2010 - 5:52 pm

On Tue, 27 Jul 2010 17:19:41 -0700

we need to check if swiotlb usage is forced by the command line since:

- we skip hardware IOMMU initialization if so.


We also need swiotlb initialization after all hardware IOMMU
initialization since:

- if all hardware IOMMU initialization fails, we might need to
initialize swiotlb.

- even if hardware IOMMU initialization is successful, we might need
to initialize swiotlb (even if a system has hardware IOMMU, some
devices are not connected to hardware IOMMU).

- swiotlb initialization must be after GART initialization. We reserve
some DMA32 memory for broken bios with GART. The order must be freeing
the memory, initializing GART, then initializing swiotlb.
Initializing swiotlb before GART steals the reserved memory. It breaks
GART.
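
The ordering constraints listed above can be sketched as a small user-space
model. This is an illustration only: struct toy_platform and its fields are
invented for the example, and the step names are just labels written to a log
string so the order can be inspected.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of a machine, for illustration only. */
struct toy_platform {
	bool swiotlb_forced;	/* swiotlb usage forced on the command line */
	bool has_hw_iommu;	/* a hardware IOMMU initialized successfully */
	bool needs_bounce;	/* some devices are not behind the hardware IOMMU */
};

/* Writes the sequence of steps taken into 'log' (at most 64 bytes). */
static void toy_iommu_alloc(const struct toy_platform *p, char log[64])
{
	bool use_swiotlb = p->swiotlb_forced;

	log[0] = '\0';
	strcat(log, "free-dma32 ");	/* always first: release the DMA32 reservation */

	if (!p->swiotlb_forced) {
		/* Hardware side runs only when swiotlb is not forced. */
		strcat(log, "gart ");
		strcat(log, "hw-iommu ");
		if (!p->has_hw_iommu || p->needs_bounce)
			use_swiotlb = true;
	}

	/* swiotlb always comes last, so it cannot take memory GART needs. */
	if (use_swiotlb)
		strcat(log, "swiotlb");
}
```

A forced configuration yields "free-dma32 swiotlb"; a machine with a working
hardware IOMMU but some devices outside it still ends with "swiotlb".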
--

From: Konrad Rzeszutek Wilk
Date: Wednesday, July 28, 2010 - 3:38 pm

I think we all don't like the way 'pci_iommu_alloc' does it. But it does
the job right now pretty well, and the code looks well, ok. Adding in
the extra '_detect' and '_init' does not detract from it all that much.


I think the flow a) check if we need SWIOTLB, b) check all IOMMUs, c)
recheck SWIOTLB in case no IOMMUs volunteered MUST be preserved
regardless of whether we driverize the IOMMUs/SWIOTLB or not.

Perhaps we should get together at one of these Linux conferences and
think this one through? Beers on me.

--

From: H. Peter Anvin
Date: Wednesday, July 28, 2010 - 3:52 pm

I don't understand point (a) here.  (c) simply seems like the fallback
case, and in the case we are actively forcing swiotlb we simply skip
step (b).

	-hpa
--

From: FUJITA Tomonori
Date: Thursday, July 29, 2010 - 12:17 am

On Wed, 28 Jul 2010 15:52:50 -0700

Looks like (a) is too simplified. The actual XEN code (a) is:

+int __init pci_xen_swiotlb_detect(void)
+{
+
+	/* If running as PV guest, either iommu=soft, or swiotlb=force will
+	 * activate this IOMMU. If running as PV privileged, activate it
+	 * regardless.
+	 */
+	if ((xen_initial_domain() || swiotlb || swiotlb_force) &&
+	    (xen_pv_domain()))
+		xen_swiotlb = 1;
+
+	/* If we are running under Xen, we MUST disable the native SWIOTLB.
+	 * Don't worry about swiotlb_force flag activating the native, as
+	 * the 'swiotlb' flag is the only one turning it on. */
+	if (xen_pv_domain())
+		swiotlb = 0;
+
+	return xen_swiotlb;

It does things more complicated than checking if swiotlb usage is
forced.

Looks like we need to call Xen specific code twice, (a) and (c), I
dislike it though.


btw, (c) is not the fallback case (i.e. if we can't find hardware
IOMMU, we enable swiotlb). We use both hardware IOMMU and swiotlb on
some systems.
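
To make the call-order argument concrete, here is a user-space sketch of the
flag handling quoted above. toy_xen_detect() mirrors the quoted function on a
plain struct. toy_native_detect() is a hypothetical stand-in that only looks
at the two flags; the real native detector does more. toy_pick() and the
'X'/'N'/'-' codes are invented for the example.

```c
#include <stdbool.h>

struct toy_state {
	bool pv_domain;		/* stands in for xen_pv_domain() */
	bool initial_domain;	/* stands in for xen_initial_domain() */
	int swiotlb;
	int swiotlb_force;
	int xen_swiotlb;
};

/* Mirrors the quoted pci_xen_swiotlb_detect() logic. */
static int toy_xen_detect(struct toy_state *s)
{
	if ((s->initial_domain || s->swiotlb || s->swiotlb_force) &&
	    s->pv_domain)
		s->xen_swiotlb = 1;

	/* Under Xen PV the native flag is cleared. */
	if (s->pv_domain)
		s->swiotlb = 0;

	return s->xen_swiotlb;
}

/* Hypothetical, simplified native detector: flags only. */
static int toy_native_detect(const struct toy_state *s)
{
	return s->swiotlb || s->swiotlb_force;
}

/* 'X' = Xen SWIOTLB chosen, 'N' = native SWIOTLB chosen, '-' = neither. */
static char toy_pick(struct toy_state *s, bool xen_first)
{
	if (xen_first) {
		if (toy_xen_detect(s))
			return 'X';
		if (toy_native_detect(s))
			return 'N';
	} else {
		if (toy_native_detect(s))
			return 'N';
		if (toy_xen_detect(s))
			return 'X';
	}
	return '-';
}
```

For a PV guest booted with swiotlb=force, calling the Xen detector first
selects the Xen implementation; with the order reversed, this model picks the
native one, which is the failure the patch description warns about.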
--

From: H. Peter Anvin
Date: Thursday, July 29, 2010 - 6:44 am

Right, which is why it can't be folded into (b).

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Konrad Rzeszutek Wilk
Date: Thursday, July 29, 2010 - 9:05 am

The way it works right now is that if user specifies swiotlb=force we would

I can eliminate step c) by making a) 'pci_xen_swiotlb_detect' do
what it does now and also utilize the x86_init.iommu.iommu_init.
In essence making it an IOMMU-type-ish.

The patch is on top of the other patches and the only reason I am calling
in 'pci_iommu_alloc' the 'pci_xen_swiotlb_detect' before 'pci_swiotlb_detect'
is because a user could specify 'swiotlb=force' and that would bypass the
Xen SWIOTLB detection code and end up using the wrong dma_ops (under Xen
of course). Oh, and I added a check in gart_iommu_hole_init() to stop it
from setting the iommu_init to its own.

What do you guys think?

Another option would be in 'pci_xen_swiotlb_detect' to coalesce
it with 'pci_xen_swiotlb_init' and right there do the deed.

diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h b/arch/x86/include/asm/xen/swiotlb-xen.h
index 1be1ab7..07ed055 100644
--- a/arch/x86/include/asm/xen/swiotlb-xen.h
+++ b/arch/x86/include/asm/xen/swiotlb-xen.h
@@ -4,11 +4,9 @@
 #ifdef CONFIG_SWIOTLB_XEN
 extern int xen_swiotlb;
 extern int __init pci_xen_swiotlb_detect(void);
-extern void __init pci_xen_swiotlb_init(void);
 #else
 #define xen_swiotlb (0)
 static inline int __init pci_xen_swiotlb_detect(void) { return 0; }
-static inline void __init pci_xen_swiotlb_init(void) { }
 #endif
 
 #endif /* _ASM_X86_SWIOTLB_XEN_H */
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index b5d8b0b..7a3ea9a 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -379,6 +379,9 @@ void __init gart_iommu_hole_init(void)
 	int fix, slot, valid_agp = 0;
 	int i, node;
 
+	if (iommu_detected)
+		return;
+
 	if (gart_iommu_aperture_disabled || !fix_aperture ||
 	    !early_pci_allowed())
 		return;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 9f07cfc..ef1de8e 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -133,7 +133,9 @@ ...
--

From: Konrad Rzeszutek Wilk
Date: Monday, August 2, 2010 - 8:25 am

And silence ensues. Let me back up a bit as I think I am heading the
wrong way.

hpa, are your concerns that a) inserting a sub-system call in the
generic code is not good. Or b) that we have five IOMMUs (counting SWIOTLB in that
category) and that we don't jettison from memory the ones we don't need
(that would be the primary goal of driverization of those IOMMUs,
right?). Or c) we should remove all sub-system detect calls (Calgary, AMD,
Intel, AGP) altogether from pci-dma.c and depend more on
x86_init.iommu structure (perhaps expand it?)
--

From: H. Peter Anvin
Date: Monday, August 2, 2010 - 8:30 am

Sorry, had to deal with other stuff.

Basically, a) and c) are the issues, with a) being the more immediate;
the amount of code left in memory is relatively small and as such I'm
not too concerned with that aspect specifically.

With five IOMMUs we're well past the point where we need to have a clean
and generic interface instead of having everything be ad hoc and
interdependent.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: FUJITA Tomonori
Date: Monday, August 2, 2010 - 8:43 am

On Mon, 02 Aug 2010 08:30:38 -0700

That's the difficult part because IOMMUs are not
independent. Hardware IOMMUs are related to swiotlb. GART and
AMD-IOMMU are too.

We could invent sorta IOMMU register interface and driver-ize IOMMUs
but they can't be independent completely.
--

From: H. Peter Anvin
Date: Monday, August 2, 2010 - 8:47 am

Of course.  However, we need there to be as much structure to it as
there can be.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: FUJITA Tomonori
Date: Monday, August 2, 2010 - 9:01 am

On Mon, 02 Aug 2010 08:47:53 -0700

Ok, let's see if Konrad can invent something clean.

But his attempt to create "swiotlb iommu function array" and "hardware
iommu function array" looks like it makes the code more unreadable.
--

From: Konrad Rzeszutek Wilk
Date: Monday, August 2, 2010 - 9:42 am

Let me go to my favorite coffee shop and think this one through.
Can I get concession for putting the original patch in (the simple, dumb one),
and then:
 - start working on the IOMMU register interface without having to try
   to get it done for 2.6.36, and
 - do the driverization as a separate cleanup.

--

From: H. Peter Anvin
Date: Monday, August 2, 2010 - 9:53 am

That makes sense.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: FUJITA Tomonori
Date: Monday, August 2, 2010 - 10:35 pm

On Mon, 2 Aug 2010 12:42:34 -0400

We could simplify swiotlb initialization after 2.6.36. If we merge my
patches to expand swiotlb memory dynamically, we could initialize
swiotlb before any hw IOMMUs (see the commit
186a25026c44d1bfa97671110ff14dcd0c99678e).

If you can make Xen swiotlb initialized like HW iommu, the IOMMU
initialization could be cleaner.

As I wrote, I also want to simplify swiotlb's init memory
allocation. I'll see what I can do on the whole issue.
--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 9:59 am

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Rather than trying to deal with aliases once they appear, just completely
inhibit them.  Mostly the removal of aliases was manageable, but it comes
unstuck in xen_create_contiguous_region() because it gets executed at
interrupt time (as a result of dma_alloc_coherent()), which causes all
sorts of confusion in the vmap code, as it was never intended to be run
in interrupt context.

This has the unfortunate side effect of removing all the unmap batching
the vmap code so carefully added, but that can't be helped.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index eb51402..ef5728d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -42,6 +42,7 @@
 #include <linux/highmem.h>
 #include <linux/debugfs.h>
 #include <linux/bug.h>
+#include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/gfp.h>
 
@@ -1015,8 +1016,6 @@ static int xen_pin_page(struct mm_struct *mm, struct page *page,
    read-only, and can be pinned. */
 static void __xen_pgd_pin(struct mm_struct *mm, pgd_t *pgd)
 {
-	vm_unmap_aliases();
-
 	xen_mc_batch();
 
 	if (__xen_pgd_walk(mm, pgd, xen_pin_page, USER_LIMIT)) {
@@ -1580,7 +1579,6 @@ static void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn, unsigned l
 	if (PagePinned(virt_to_page(mm->pgd))) {
 		SetPagePinned(page);
 
-		vm_unmap_aliases();
 		if (!PageHighMem(page)) {
 			make_lowmem_page_readonly(__va(PFN_PHYS((unsigned long)pfn)));
 			if (level == PT_PTE && USE_SPLIT_PTLOCKS)
@@ -2026,6 +2024,8 @@ void __init xen_init_mmu_ops(void)
 	x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
+
+	vmap_lazy_unmap = false;
 ...
--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 10:00 am

This patchset:

PV guests under Xen are running in a non-contiguous memory architecture.

When PCI pass-through is utilized, this necessitates an IOMMU for
translating bus (DMA) to virtual and vice-versa and also providing a
mechanism to have contiguous pages for device drivers operations (say DMA
operations).

Specifically, under Xen the Linux idea of pages is an illusion. It
assumes that pages start at zero and go up to the available memory. To
help with that, the Linux Xen MMU provides a lookup mechanism to
translate the page frame numbers (PFN) to machine frame numbers (MFN)
and vice-versa. The MFN are the "real" frame numbers. Furthermore
memory is not contiguous. Xen hypervisor stitches memory for guests
from different pools, which means there is no guarantee that PFN==MFN
and PFN+1==MFN+1. Lastly with Xen 4.0, pages (in debug mode) are
allocated in descending order (high to low), meaning the guest might
never get any MFN's under the 4GB mark.
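
A minimal sketch of the two checks this implies before a buffer can be handed
to a device directly: the backing machine frames must be consecutive, and they
must be reachable under the device's DMA mask. This is an illustration under
assumptions, not the code in swiotlb-xen.c; the toy_* names, the 4KB page
size, and the p2m array passed in are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4KB pages */

/* True if the MFNs backing n consecutive PFNs are themselves consecutive. */
static bool toy_machine_contiguous(const unsigned long *p2m, size_t pfn,
				   size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (p2m[pfn + i] != p2m[pfn] + i)
			return false;
	return true;
}

/* True if the last byte of every backing frame fits under dma_mask. */
static bool toy_dma_capable(const unsigned long *p2m, size_t pfn, size_t n,
			    unsigned long long dma_mask)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long long last =
			(((unsigned long long)p2m[pfn + i] + 1)
			 << TOY_PAGE_SHIFT) - 1;

		if (last > dma_mask)
			return false;
	}
	return true;
}

/* A buffer failing either check would need a bounce buffer. */
static bool toy_needs_bounce(const unsigned long *p2m, size_t pfn, size_t n,
			     unsigned long long dma_mask)
{
	return !toy_machine_contiguous(p2m, pfn, n) ||
	       !toy_dma_capable(p2m, pfn, n, dma_mask);
}
```

Note that a range can be contiguous in machine memory and still need bouncing
for a 32-bit device if its frames sit at or above the 4GB mark.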

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Albert Herranz <albert_herranz@yahoo.es>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 drivers/xen/Kconfig       |    4 +
 drivers/xen/Makefile      |    3 +-
 drivers/xen/swiotlb-xen.c |  515 +++++++++++++++++++++++++++++++++++++++++++++
 include/xen/swiotlb-xen.h |   65 ++++++
 4 files changed, 586 insertions(+), 1 deletions(-)
 create mode 100644 drivers/xen/swiotlb-xen.c
 create mode 100644 include/xen/swiotlb-xen.h

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index fad3df2..97199c2 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -62,4 +62,8 @@ config XEN_SYS_HYPERVISOR
 	 virtual environment, /sys/hypervisor will still be present,
 	 but will have no xen contents.
 
+config SWIOTLB_XEN
+	def_bool y
+	depends on SWIOTLB
+
 endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ...
--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 9:59 am

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

In a Xen domain, ioremap operates on machine addresses, not
pseudo-physical addresses.  We use _PAGE_IOMAP to determine whether a
mapping is intended for machine addresses.

[ Impact: allow Xen domain to map real hardware ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    8 +---
 arch/x86/xen/mmu.c              |   71 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 018a0a4..bf5f7d3 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -112,13 +112,9 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
  */
 static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 {
-	extern unsigned long max_mapnr;
 	unsigned long pfn = mfn_to_pfn(mfn);
-	if ((pfn < max_mapnr)
-	    && !xen_feature(XENFEAT_auto_translated_physmap)
-	    && (get_phys_to_machine(pfn) != mfn))
-		return max_mapnr; /* force !pfn_valid() */
-	/* XXX fixme; not true with sparsemem */
+	if (get_phys_to_machine(pfn) != mfn)
+		return -1; /* force !pfn_valid() */
 	return pfn;
 }
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 914f046..a4dea9d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -56,9 +56,11 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
 
+#include <xen/xen.h>
 #include <xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/version.h>
+#include <xen/interface/memory.h>
 #include <xen/hvc-console.h>
 
 #include "multicalls.h"
@@ -377,6 +379,28 @@ static bool xen_page_pinned(void *ptr)
 	return PagePinned(page);
 }
 
+static bool xen_iomap_pte(pte_t pte)
+{
+	return xen_initial_domain() && (pte_flags(pte) & _PAGE_IOMAP);
+}
+
+static void xen_set_iomap_pte(pte_t *ptep, ...
--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 9:59 am

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Add a flag to force lazy_max_pages() to zero to prevent any outstanding
mapped pages.  We'll need this for Xen.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Nick Piggin <npiggin@suse.de>
---
 include/linux/vmalloc.h |    2 ++
 mm/vmalloc.c            |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 227c2a5..b840fda 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -7,6 +7,8 @@
 
 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */
 
+extern bool vmap_lazy_unmap;
+
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP	0x00000001	/* ioremap() and friends */
 #define VM_ALLOC	0x00000002	/* vmalloc() */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae00746..7f35fe2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,7 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+bool vmap_lazy_unmap __read_mostly = true;
 
 /*** Page table manipulation functions ***/
 
@@ -502,6 +503,9 @@ static unsigned long lazy_max_pages(void)
 {
 	unsigned int log;
 
+	if (!vmap_lazy_unmap)
+		return 0;
+
 	log = fls(num_online_cpus());
 
 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
-- 
1.7.0.1
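
For a sense of the numbers involved, a user-space sketch of the
lazy_max_pages() arithmetic from the hunk above. The toy_* names are
hypothetical, the page size is assumed to be 4KB, and toy_fls() is a portable
stand-in for the kernel's fls().

```c
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size */

/* Stand-in for the vmap_lazy_unmap flag added by the patch. */
static bool toy_vmap_lazy_unmap = true;

/* 1-based index of the highest set bit; 0 when x is 0. */
static unsigned int toy_fls(unsigned int x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Same formula as the hunk: 32MB worth of pages per fls(cpus) step. */
static unsigned long toy_lazy_max_pages(unsigned int online_cpus)
{
	if (!toy_vmap_lazy_unmap)
		return 0;

	return toy_fls(online_cpus) * (32UL * 1024 * 1024 / TOY_PAGE_SIZE);
}
```

With 4 CPUs and the flag left on, up to 3 * 8192 = 24576 pages may stay
lazily mapped; clearing the flag drops that to zero, so nothing is deferred.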

--

From: Konrad Rzeszutek Wilk
Date: Tuesday, July 27, 2010 - 9:59 am

From: Alex Nixon <alex.nixon@citrix.com>

* xen_create_contiguous_region needs access to the balloon lock to
  ensure memory doesn't change under its feet, so expose the balloon
  lock
* Change the name of the lock to xen_reservation_lock, to imply its
  now less-specific usage.

[ Impact: cleanup ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c             |    7 +++++++
 drivers/xen/balloon.c          |   15 ++++-----------
 include/xen/interface/memory.h |    8 ++++++++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index a5577f5..9e0d82f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -70,6 +70,13 @@
 
 #define MMU_UPDATE_HISTO	30
 
+/*
+ * Protects atomic reservation decrease/increase against concurrent increases.
+ * Also protects non-atomic updates of current_pages and driver_pages, and
+ * balloon lists.
+ */
+DEFINE_SPINLOCK(xen_reservation_lock);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct {
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 1a0d8c2..500290b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -85,13 +85,6 @@ static struct sys_device balloon_sysdev;
 
 static int register_balloon(struct sys_device *sysdev);
 
-/*
- * Protects atomic reservation decrease/increase against concurrent increases.
- * Also protects non-atomic updates of current_pages and driver_pages, and
- * balloon lists.
- */
-static DEFINE_SPINLOCK(balloon_lock);
-
 static struct balloon_stats balloon_stats;
 
 /* We increase/decrease in batches which fit in a page */
@@ -210,7 +203,7 @@ static int increase_reservation(unsigned long nr_pages)
 	if (nr_pages > ARRAY_SIZE(frame_list))
 		nr_pages = ARRAY_SIZE(frame_list);
 
-	spin_lock_irqsave(&balloon_lock, ...