This is for tip/iommu.
btw, Joery, what happens if map_sg (or map_single) gets
not-DMA-capable buffer in the case of !iommu || !domain?
==
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
need it for devices that the IOMMU can do virtual mappings for. This
patch avoids unnecessary low zone allocation.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/x86/kernel/amd_iommu.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 01c68c3..8efd249 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -999,6 +999,11 @@ static void *alloc_coherent(struct device *dev, size_t size,
u16 devid;
phys_addr_t paddr;
+ get_device_resources(dev, &iommu, &domain, &devid);
+
+ if (iommu && domain)
+ flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
virt_addr = (void *)__get_free_pages(flag, get_order(size));
if (!virt_addr)
return 0;
@@ -1006,8 +1011,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
memset(virt_addr, 0, size);
paddr = virt_to_phys(virt_addr);
- get_device_resources(dev, &iommu, &domain, &devid);
-
if (!iommu || !domain) {
*dma_addr = (dma_addr_t)paddr;
return virt_addr;
--
1.5.4.2
--
applied these commits to tip/x86/iommu: edee956: x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent 9821626: x86: convert dma_alloc_coherent to use is_device_dma_capable 8a493d3: x86: remove duplicated extern force_iommu thanks! Ingo --
It blows up :-)
I know it but currently it is no problem because this will never happen
in any real AMD IOMMU system (because the hardware just remaps every
device in the system). It needs a fix anyway and the
right solution here is to fall back to one of the software iommu
implementations. The stackable dma_ops patches I have currently in work
These flags are already removed in the dma_alloc_coherent function which
calls this one. Further I think in the case of a remapping IOMMU like
this one we should avoid implementing any gfp hacks and just trust the
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
or add a WARN_ON_ONCE() instead, to document and enforce that assumption. Ingo --
On Wed, 10 Sep 2008 14:03:10 +0200 I'm not sure you need the stackable dma_ops support. Calgary IOMMU had Not true about x86/tip/iommu. dma_alloc_coherent in dma-mapping.h does that so that swiotlb and pci-nommu don't need the gfp hack. Clearing the gfp flags is much simpler than setting up the flags correctly mainly because of the fallback device, setting up the flags is really --
We need stackable dma_ops anyway for paravirt IOMMU support in KVM. And
Yes, dma_alloc_coherent in dma-mapping.h clears the flags. And this
function also calls ops->alloc_coherent which points to the AMD IOMMUs
alloc_coherent function if the driver is in place.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
On Wed, 10 Sep 2008 14:48:22 +0200
Hmm, I'm not sure what code you look at. Here's dma_alloc_coherent()
in tip/x86/iommu:
dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp)
{
struct dma_mapping_ops *ops = get_dma_ops(dev);
void *memory;
gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
Surely we here clear the flag but...
if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
return memory;
if (!dev) {
dev = &x86_dma_fallback_dev;
gfp |= GFP_DMA;
}
we play with it here though (not happens with pci devices),
if (!dev->dma_mask)
return NULL;
if (!ops->alloc_coherent)
return NULL;
Then dma_alloc_coherent_gfp_flags() sets it again according to
device->coherent_dma_mask and gfp before ops->alloc_coherent hook:
return ops->alloc_coherent(dev, size, dma_handle,
dma_alloc_coherent_gfp_flags(dev, gfp));
This code can set up the exact same gfp flag for swiotbl and nommu as
before.
--
On Wed, Sep 10, 2008 at 10:03:32PM +0900, FUJITA Tomonori wrote:
So its possible that alloc_coherent is called with region specifiers in
the gfp flags. Can't we simply make the gfp hacks depend on
dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
implementations?
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
On Wed, 10 Sep 2008 15:10:32 +0200 Yes, that's how we solved the ZONE_DMA exhaustion problem in swiotlb. And we don't duplicate the gfp hack in both swiotlb and I thought about it but adding a new dma_ops->we_don't_want_gfp_flag hook doesn't make the code simpler much. Currently, we have the gfp setting hack in just one place. It's not bad. Adding such new hook means adding more lines than we can remove. Yeah, I was against your patch to adding the gfp setting hack to swiotlb but it's because gfp is kinda architecture specific stuff and swiotlb should not. It's the bad design IMO. It's ok for me that architecture specific IOMMUs can do the architecture specific stuff (and it's about just clearing the gfp flag). Intel IOMMU already clears up the flag so I'll send a patch to do that for Calgary shortly then I think that we can finish x86 alloc_coherent rewrite. --
The is_phys flas is already in place and its meaning is "the dma_ops
return bus addresses equal to physical addresses". This is exactly the
case when we need the gfp hacks. So I don't see a problem in just
skipping the gfp rewrite if is_phys is zero. I don't see a point in
adding gfp flags in dma_alloc_coherent and remove them again
dma_ops->alloc_coherent code. Specially in this case where we already
The generic swiotlb code already contained a gfp hack for IA64 and I
added another one for x86 which is ok in my opinion. But the #ifdef was
ugly, I agree with that now :-)
Your solution of removing the flag hacks from the generic code completly
was the other possible way.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
On Wed, 10 Sep 2008 15:53:47 +0200 dma_ops->is_phys doesn't work well for GART and Intel IOMMU, that do virtual mappings for some devices and doesn't for some. We need to a hook that can pass a point to a device to IOMMUs like: dma_ops->is_phys(struct device *dev) Because they need to look at a device to know if they will do virtual mappings or not for it. --
Ok, thats a valid point. I queue your patch with the AMD IOMMU updates for 2.6.28. Thanks. Joerg --
On Wed, 10 Sep 2008 16:38:18 +0200 Ingo already has queued it his tree, I think. --
I asked Ingo to remove it from his tree to avoid merge conflicts with
updates I am currently preparing. Here is the patch I queued into my
2.6.28 update chain.
Joerg
===
commit 42a7fa8d8a7f1680cba76e4824dba30bfd1c9d70
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Wed Sep 10 20:19:40 2008 +0900
x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
need it for devices that the IOMMU can do virtual mappings for. This
patch avoids unnecessary low zone allocation.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index ae38b24..09261dc 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -1360,6 +1360,9 @@ static void *alloc_coherent(struct device *dev, size_t size,
if (!check_device(dev))
return NULL;
+ if (!get_device_resources(dev, &iommu, &domain, &devid))
+ flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
virt_addr = (void *)__get_free_pages(flag, get_order(size));
if (!virt_addr)
return 0;
@@ -1367,8 +1370,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
memset(virt_addr, 0, size);
paddr = virt_to_phys(virt_addr);
- get_device_resources(dev, &iommu, &domain, &devid);
-
if (!iommu || !domain) {
*dma_addr = (dma_addr_t)paddr;
return virt_addr;
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. ...On Thu, 11 Sep 2008 11:10:00 +0200 I see, thanks. --
On Wed, 10 Sep 2008 23:24:57 +0900 btw, in tip/x86/iommu, GART's alloc_coherent always does virtual mappings to allocate a size-aligned memory (as DMA-mapping.txt defines). Because someone strongly insisted, I modified GART's alloc_coherent to do so but as I said again and again, it's completely meaningless (only POWER IOMMU does it and drivers don't depend on such requirement). I guess that it would be better to do virtual mappings only when necessary as the current mainline does since GART I/O space is precious in some systems. But I don't care much. What's your opinion (as a AMD developer)? --
Very true. My original rewrite did the mapping only when necessary too.
What were the reasons to do the mapping always?
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
On Wed, 10 Sep 2008 16:52:49 +0200 As I said above, it's for allocating a size-aligned memory. Look at the description of pci_alloc_consistent in DMA-mapping.txt: The cpu return address and the DMA bus master address are both guaranteed to be aligned to the smallest PAGE_SIZE order which is greater than or equal to the requested size. This invariant exists (for example) to guarantee that if you allocate a chunk which is smaller than or equal to 64 kilobytes, the extent of the buffer you receive will not cross a 64K boundary. You can't do this with __get_free_pages easily (you need some hacks to do this). You can do this via iommu_area_alloc() for free. Well, actually you agreed with adding such requirement (though I said again and again that it's totally meaningless...): http://lkml.org/lkml/2008/7/24/162 --
What hacks do you need with __get_free_pages? The memory it returns is
The other possible way is removing this requirement from the
documentation. But if the spec has this requirement the code _has_ to
This message stated nothing about _always_ map GART alloc_coherent
allocations using the aperture.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
On Wed, 10 Sep 2008 17:29:18 +0200 Is it guaranteed (documented somewhere) ? --
I don't know if there is a formal definition for it. It is documented in
some books about the Linux kernel (I read this in some book the first
time). This alignment results from the buddy algorithm the page alloctor
uses. You can definitly rely on that.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
On Wed, 10 Sep 2008 19:05:04 +0200 I meant, if it's not documented as a guaranteed feature (not just the characteristic of the current code), it could change any time. --
It will never change. I am sure about this. A lot of code relys on
this like the HugeTLBfs implementations for example. In fact, it may be
possible that the requirement to the dma_alloc_coherent function derive
from this behavior of the Linux page allocator (to not break anything
when the dma api was introduced).
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
