This patchset (against tip/master) fixes the problem that swiotlb
exhausts ZONE_DMA:
http://lkml.org/lkml/2008/8/31/16
The root problem is that swiotlb_alloc_coherent always uses ZONE_DMA,
which is fine for IA64 but not for x86_64. This patchset makes the
callers set up the gfp flags so that swiotlb_alloc_coherent can stop
playing with the gfp flags.
In theory, I think it would be better to remove the allocation code
from swiotlb_alloc_coherent (what swiotlb should do is take care of
the swiotlb memory, and swiotlb_alloc_coherent is not useful since we
use it only when we can't allocate memory reachable by the device or
we are out of memory). But that code works for both x86 and IA64 so
it's not so bad, I guess.
#1 is for IA64, #2-4 for x86, and #5 is for swiotlb.
=
arch/ia64/include/asm/dma-mapping.h | 4 ++-
arch/x86/kernel/pci-nommu.c | 21 +------------------
include/asm-x86/dma-mapping.h | 37 +++++++++++++++++++++++++++++++---
lib/swiotlb.c | 7 ------
4 files changed, 37 insertions(+), 32 deletions(-)
--
This patch makes dma_alloc_coherent use GFP_DMA at all times. This is
necessary for swiotlb, which requires the callers to set up the gfp
flags properly.
swiotlb_alloc_coherent tries to allocate pages with the gfp flags. If
the allocated memory does not fit dev->coherent_dma_mask,
swiotlb_alloc_coherent reserves some of the swiotlb memory area, which
is a precious resource. So the callers need to set up the gfp flags
properly.
This patch means that other IA64 IOMMUs' dma_alloc_coherent also use
GFP_DMA. These IOMMUs (e.g. SBA IOMMU) don't need GFP_DMA since they
can map memory to any address. But IA64's GFP_DMA zone is large, and
drivers generally allocate only small amounts of memory with
dma_alloc_coherent, and only at startup. So I chose the simplest way
to set up the gfp flags for swiotlb.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/ia64/include/asm/dma-mapping.h | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
index 9f0df9b..06ff1ba 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -8,7 +8,9 @@
#include <asm/machvec.h>
#include <linux/scatterlist.h>
-#define dma_alloc_coherent platform_dma_alloc_coherent
+#define dma_alloc_coherent(dev, size, handle, gfp) \
+ platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
+
/* coherent mem. is cheap */
static inline void *
dma_alloc_noncoherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
--
1.5.5.GIT
--
The check to see if dev->dma_mask is NULL in pci-nommu is more
appropriate for dma_alloc_coherent().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/x86/kernel/pci-nommu.c | 3 ---
include/asm-x86/dma-mapping.h | 3 +++
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 73853d3..0f51883 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -80,9 +80,6 @@ nommu_alloc_coherent(struct device *hwdev, size_t size,
int node;
struct page *page;
- if (hwdev->dma_mask == NULL)
- return NULL;
-
gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
gfp |= __GFP_ZERO;
diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h
index bc6c8df..39d3641 100644
--- a/include/asm-x86/dma-mapping.h
+++ b/include/asm-x86/dma-mapping.h
@@ -256,6 +256,9 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp |= GFP_DMA;
}
+ if (!dev->dma_mask)
+ return NULL;
+
if (ops->alloc_coherent)
return ops->alloc_coherent(dev, size, dma_handle, gfp);
--
1.5.5.GIT
--
We need to use __GFP_DMA for a NULL device argument (fallback_dev)
with pci-nommu. It's a hack for ISA (and some old code), so we need to
use GFP_DMA.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/x86/kernel/pci-nommu.c | 3 +--
include/asm-x86/dma-mapping.h | 2 ++
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 0f51883..ada1c87 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -80,7 +80,6 @@ nommu_alloc_coherent(struct device *hwdev, size_t size,
int node;
struct page *page;
- gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
gfp |= __GFP_ZERO;
dma_mask = hwdev->coherent_dma_mask;
@@ -93,7 +92,7 @@ nommu_alloc_coherent(struct device *hwdev, size_t size,
node = dev_to_node(hwdev);
#ifdef CONFIG_X86_64
- if (dma_mask <= DMA_32BIT_MASK)
+ if (dma_mask <= DMA_32BIT_MASK && !(gfp & GFP_DMA))
gfp |= GFP_DMA32;
#endif
diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h
index 39d3641..9d6dcf4 100644
--- a/include/asm-x86/dma-mapping.h
+++ b/include/asm-x86/dma-mapping.h
@@ -248,6 +248,8 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
struct dma_mapping_ops *ops = get_dma_ops(dev);
void *memory;
+ gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
return memory;
--
1.5.5.GIT
--
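The flag handling in the patch above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the MODEL_* constants, struct model_device, and the model_* functions are hypothetical stand-ins and do not use the kernel's real values or types. It shows why the `!(gfp & GFP_DMA)` test is needed: once the common code has forced GFP_DMA for a NULL device, pci-nommu should not request GFP_DMA32 on top of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only; the kernel's values differ. */
#define MODEL_GFP_DMA     0x01u
#define MODEL_GFP_HIGHMEM 0x02u
#define MODEL_GFP_DMA32   0x04u

#define MODEL_DMA_32BIT_MASK 0x00000000ffffffffULL

/* Stand-in for struct device; only the field this sketch needs. */
struct model_device {
	uint64_t coherent_dma_mask;
};

/* Common code: discard zone modifiers passed by the caller, then force
 * GFP_DMA when no device was given (the ISA / old-code case). */
unsigned int model_common_gfp(const struct model_device *dev, unsigned int gfp)
{
	gfp &= ~(MODEL_GFP_DMA | MODEL_GFP_HIGHMEM | MODEL_GFP_DMA32);
	if (dev == NULL)
		gfp |= MODEL_GFP_DMA;
	return gfp;
}

/* pci-nommu on 64-bit: ask for the 32-bit zone only when the common code
 * has not already selected GFP_DMA. */
unsigned int model_nommu_gfp(uint64_t dma_mask, unsigned int gfp)
{
	if (dma_mask <= MODEL_DMA_32BIT_MASK && !(gfp & MODEL_GFP_DMA))
		gfp |= MODEL_GFP_DMA32;
	return gfp;
}

/* Walk through the two cases the patch distinguishes. */
void model_gfp_examples(void)
{
	struct model_device dev = { .coherent_dma_mask = MODEL_DMA_32BIT_MASK };
	unsigned int gfp;

	/* NULL device: GFP_DMA is forced and GFP_DMA32 is not added. */
	gfp = model_common_gfp(NULL, MODEL_GFP_DMA32);
	gfp = model_nommu_gfp(MODEL_DMA_32BIT_MASK, gfp);
	assert(gfp == MODEL_GFP_DMA);

	/* Real device with a 32-bit mask: the caller's GFP_DMA is dropped
	 * and GFP_DMA32 is chosen instead. */
	gfp = model_common_gfp(&dev, MODEL_GFP_DMA);
	gfp = model_nommu_gfp(dev.coherent_dma_mask, gfp);
	assert(gfp == MODEL_GFP_DMA32);
}
```

The model leaves out everything else nommu_alloc_coherent does (the 24-bit mask handling, __GFP_ZERO, the allocation itself); calling model_gfp_examples() just exercises the two paths.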
Non-real IOMMU implementations (which don't do virtual mappings,
e.g. swiotlb, pci-nommu, etc.) need to use proper gfp flags and
dma_mask to allocate pages in their own dma_alloc_coherent()
(the allocated pages need to be suitable for the device's
coherent_dma_mask).
This patch makes dma_alloc_coherent do this job so that IOMMUs don't
need to take care of it any more.
Real IOMMU implementations can simply ignore the gfp flags.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/x86/kernel/pci-nommu.c | 19 ++-----------------
include/asm-x86/dma-mapping.h | 32 ++++++++++++++++++++++++++++----
2 files changed, 30 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index ada1c87..8e398b5 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -80,26 +80,11 @@ nommu_alloc_coherent(struct device *hwdev, size_t size,
int node;
struct page *page;
- gfp |= __GFP_ZERO;
-
- dma_mask = hwdev->coherent_dma_mask;
- if (!dma_mask)
- dma_mask = *(hwdev->dma_mask);
+ dma_mask = dma_alloc_coherent_mask(hwdev, gfp);
- if (dma_mask < DMA_24BIT_MASK)
- return NULL;
+ gfp |= __GFP_ZERO;
node = dev_to_node(hwdev);
-
-#ifdef CONFIG_X86_64
- if (dma_mask <= DMA_32BIT_MASK && !(gfp & GFP_DMA))
- gfp |= GFP_DMA32;
-#endif
-
- /* No alloc-free penalty for ISA devices */
- if (dma_mask == DMA_24BIT_MASK)
- gfp |= GFP_DMA;
-
again:
page = alloc_pages_node(node, gfp, get_order(size));
if (!page)
diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h
index 9d6dcf4..a072ae6 100644
--- a/include/asm-x86/dma-mapping.h
+++ b/include/asm-x86/dma-mapping.h
@@ -241,6 +241,29 @@ static inline int dma_get_cache_alignment(void)
return boot_cpu_data.x86_clflush_size;
}
+static inline unsigned long dma_alloc_coherent_mask(struct device *dev,
+ gfp_t gfp)
+{
+ unsigned long dma_mask = 0;
+
+ dma_mask = dev->coherent_dma_mask;
+ if ...
The callers are supposed to set up the gfp flags appropriately.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
lib/swiotlb.c | 7 -------
1 files changed, 0 insertions(+), 7 deletions(-)
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 977edbd..3066ffe 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -467,13 +467,6 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
void *ret;
int order = get_order(size);
- /*
- * XXX fix me: the DMA API should pass us an explicit DMA mask
- * instead, or use ZONE_DMA32 (ia64 overloads ZONE_DMA to be a ~32
- * bit range instead of a 16MB one).
- */
- flags |= GFP_DMA;
-
ret = (void *)__get_free_pages(flags, order);
if (ret && address_needs_mapping(hwdev, virt_to_bus(ret))) {
/*
--
1.5.5.GIT
--
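With the GFP_DMA hack removed, what remains in swiotlb_alloc_coherent is the decision described earlier in the series: allocate pages with the caller's gfp flags, and fall back to the swiotlb memory area only if the device cannot reach the result. A minimal userspace sketch of that decision follows; the model_* names and the enum are hypothetical, and the mask test is a simplification of what the kernel's address_needs_mapping() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a coherent buffer ends up coming from in this model. */
enum model_source {
	MODEL_FROM_PAGE_ALLOCATOR,
	MODEL_FROM_SWIOTLB_POOL
};

/* True when the address has bits set outside the device's mask,
 * i.e. the device cannot address this memory directly. */
bool model_address_needs_mapping(uint64_t dev_mask, uint64_t bus_addr)
{
	return (bus_addr & ~dev_mask) != 0;
}

/* Pages that fit the mask are used as-is; otherwise the scarce
 * swiotlb memory area has to be used instead. */
enum model_source model_alloc_source(uint64_t dev_mask, uint64_t page_addr)
{
	if (model_address_needs_mapping(dev_mask, page_addr))
		return MODEL_FROM_SWIOTLB_POOL;
	return MODEL_FROM_PAGE_ALLOCATOR;
}

/* A 32-bit device: memory below 4GB is fine, memory above is not. */
void model_swiotlb_examples(void)
{
	const uint64_t mask32 = 0xffffffffULL;

	assert(model_alloc_source(mask32, 0x00100000ULL) == MODEL_FROM_PAGE_ALLOCATOR);
	assert(model_alloc_source(mask32, 0x100000000ULL) == MODEL_FROM_SWIOTLB_POOL);
}
```

This is why the callers' gfp flags matter: the better the flags match the device's mask, the less often the second branch is taken and the less swiotlb memory is consumed.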
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
Cool :-)
This is much better than our last two tries to solve this problem. Doing
no gfp handling at all in swiotlb_alloc_coherent is a nice and clean
solution.
Joerg
--
--
I've applied Fujita's patches to tip/x86/iommu:
68e91d6: swiotlb: remove GFP_DMA hack in swiotlb_alloc_coherent
823e7e8: x86: dma_alloc_coherent sets gfp flags properly
8a53ad6: x86: fix nommu_alloc_coherent allocation with NULL device argument
de9f521: x86: move pci-nommu's dma_mask check to common code
3a80b6a: ia64: dma_alloc_coherent always use GFP_DMA
Tony, do you have any problem with us carrying the ia64 commit above
(3a80b6a, also attached below) in the tip/x86/iommu tree? It's really
small and straightforward.
Ingo
----------------->
From 3a80b6aa271eb08a3da1a04b5cbdcdc19d4a5ae0 Mon Sep 17 00:00:00 2001
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Mon, 8 Sep 2008 18:10:10 +0900
Subject: [PATCH] ia64: dma_alloc_coherent always use GFP_DMA
This patch makes dma_alloc_coherent use GFP_DMA at all times. This is
necessary for swiotlb, which requires the callers to set up the gfp
flags properly.
swiotlb_alloc_coherent tries to allocate pages with the gfp flags. If
the allocated memory does not fit dev->coherent_dma_mask,
swiotlb_alloc_coherent reserves some of the swiotlb memory area, which
is a precious resource. So the callers need to set up the gfp flags
properly.
This patch means that other IA64 IOMMUs' dma_alloc_coherent also use
GFP_DMA. These IOMMUs (e.g. SBA IOMMU) don't need GFP_DMA since they
can map memory to any address. But IA64's GFP_DMA zone is large, and
drivers generally allocate only small amounts of memory with
dma_alloc_coherent, and only at startup. So I chose the simplest way
to set up the gfp flags for swiotlb.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/ia64/include/asm/dma-mapping.h | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
index 9f0df9b..06ff1ba 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ ...
On Mon, 8 Sep 2008 18:10:09 +0900
Thanks, works well for me :)
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
