This is for tip/iommu.
btw, Joery, what happens if map_sg (or map_single) gets
not-DMA-capable buffer in the case of !iommu || !domain?==
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherentx86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
need it for devices that the IOMMU can do virtual mappings for. This
patch avoids unnecessary low zone allocation.Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
arch/x86/kernel/amd_iommu.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 01c68c3..8efd249 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -999,6 +999,11 @@ static void *alloc_coherent(struct device *dev, size_t size,
u16 devid;
phys_addr_t paddr;+ get_device_resources(dev, &iommu, &domain, &devid);
+
+ if (iommu && domain)
+ flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
virt_addr = (void *)__get_free_pages(flag, get_order(size));
if (!virt_addr)
return 0;
@@ -1006,8 +1011,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
memset(virt_addr, 0, size);
paddr = virt_to_phys(virt_addr);- get_device_resources(dev, &iommu, &domain, &devid);
-
if (!iommu || !domain) {
*dma_addr = (dma_addr_t)paddr;
return virt_addr;
--
1.5.4.2--
It blows up :-)
I know it but currently it is no problem because this will never happen
in any real AMD IOMMU system (because the hardware just remaps every
device in the system). It needs a fix anyway and the
right solution here is to fall back to one of the software iommu
implementations. The stackable dma_ops patches I have currently in workThese flags are already removed in the dma_alloc_coherent function which
calls this one. Further I think in the case of a remapping IOMMU like
this one we should avoid implementing any gfp hacks and just trust the--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 14:03:10 +0200
I'm not sure you need the stackable dma_ops support. Calgary IOMMU had
Not true about x86/tip/iommu. dma_alloc_coherent in dma-mapping.h does
that so that swiotlb and pci-nommu don't need the gfp hack. Clearing
the gfp flags is much simpler than setting up the flags correctly
mainly because of the fallback device, setting up the flags is really
--
We need stackable dma_ops anyway for paravirt IOMMU support in KVM. And
Yes, dma_alloc_coherent in dma-mapping.h clears the flags. And this
function also calls ops->alloc_coherent which points to the AMD IOMMUs
alloc_coherent function if the driver is in place.Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 14:48:22 +0200
Hmm, I'm not sure what code you look at. Here's dma_alloc_coherent()
in tip/x86/iommu:dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp)
{
struct dma_mapping_ops *ops = get_dma_ops(dev);
void *memory;gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
Surely we here clear the flag but...
if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
return memory;if (!dev) {
dev = &x86_dma_fallback_dev;
gfp |= GFP_DMA;
}we play with it here though (not happens with pci devices),
if (!dev->dma_mask)
return NULL;if (!ops->alloc_coherent)
return NULL;Then dma_alloc_coherent_gfp_flags() sets it again according to
device->coherent_dma_mask and gfp before ops->alloc_coherent hook:return ops->alloc_coherent(dev, size, dma_handle,
dma_alloc_coherent_gfp_flags(dev, gfp));This code can set up the exact same gfp flag for swiotbl and nommu as
before.
--
On Wed, Sep 10, 2008 at 10:03:32PM +0900, FUJITA Tomonori wrote:
So its possible that alloc_coherent is called with region specifiers in
the gfp flags. Can't we simply make the gfp hacks depend on
dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
implementations?Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 15:10:32 +0200
Yes, that's how we solved the ZONE_DMA exhaustion problem in
swiotlb. And we don't duplicate the gfp hack in both swiotlb andI thought about it but adding a new dma_ops->we_don't_want_gfp_flag
hook doesn't make the code simpler much. Currently, we have the gfp
setting hack in just one place. It's not bad. Adding such new hook
means adding more lines than we can remove.Yeah, I was against your patch to adding the gfp setting hack to
swiotlb but it's because gfp is kinda architecture specific stuff and
swiotlb should not. It's the bad design IMO. It's ok for me that
architecture specific IOMMUs can do the architecture specific stuff
(and it's about just clearing the gfp flag).Intel IOMMU already clears up the flag so I'll send a patch to do that
for Calgary shortly then I think that we can finish x86 alloc_coherent
rewrite.
--
The is_phys flas is already in place and its meaning is "the dma_ops
return bus addresses equal to physical addresses". This is exactly the
case when we need the gfp hacks. So I don't see a problem in just
skipping the gfp rewrite if is_phys is zero. I don't see a point in
adding gfp flags in dma_alloc_coherent and remove them again
dma_ops->alloc_coherent code. Specially in this case where we alreadyThe generic swiotlb code already contained a gfp hack for IA64 and I
added another one for x86 which is ok in my opinion. But the #ifdef was
ugly, I agree with that now :-)
Your solution of removing the flag hacks from the generic code completly
was the other possible way.Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 15:53:47 +0200
dma_ops->is_phys doesn't work well for GART and Intel IOMMU, that do
virtual mappings for some devices and doesn't for some.We need to a hook that can pass a point to a device to IOMMUs like:
dma_ops->is_phys(struct device *dev)
Because they need to look at a device to know if they will do virtual
mappings or not for it.
--
On Wed, 10 Sep 2008 23:24:57 +0900
btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
mappings to allocate a size-aligned memory (as DMA-mapping.txt
defines).Because someone strongly insisted, I modified GART's alloc_coherent to
do so but as I said again and again, it's completely meaningless (only
POWER IOMMU does it and drivers don't depend on such requirement).I guess that it would be better to do virtual mappings only when
necessary as the current mainline does since GART I/O space is
precious in some systems. But I don't care much. What's your opinion
(as a AMD developer)?
--
Very true. My original rewrite did the mapping only when necessary too.
What were the reasons to do the mapping always?Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 16:52:49 +0200
As I said above, it's for allocating a size-aligned memory. Look at
the description of pci_alloc_consistent in DMA-mapping.txt:The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size. This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.You can't do this with __get_free_pages easily (you need some hacks to
do this). You can do this via iommu_area_alloc() for free.Well, actually you agreed with adding such requirement (though I said
again and again that it's totally meaningless...):http://lkml.org/lkml/2008/7/24/162
--
What hacks do you need with __get_free_pages? The memory it returns is
The other possible way is removing this requirement from the
documentation. But if the spec has this requirement the code _has_ toThis message stated nothing about _always_ map GART alloc_coherent
allocations using the aperture.Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 17:29:18 +0200
Is it guaranteed (documented somewhere) ?
--
I don't know if there is a formal definition for it. It is documented in
some books about the Linux kernel (I read this in some book the first
time). This alignment results from the buddy algorithm the page alloctor
uses. You can definitly rely on that.Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
On Wed, 10 Sep 2008 19:05:04 +0200
I meant, if it's not documented as a guaranteed feature (not just the
characteristic of the current code), it could change any time.
--
It will never change. I am sure about this. A lot of code relys on
this like the HugeTLBfs implementations for example. In fact, it may be
possible that the requirement to the dma_alloc_coherent function derive
from this behavior of the Linux page allocator (to not break anything
when the dma api was introduced).Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy--
Ok, thats a valid point. I queue your patch with the AMD IOMMU updates
for 2.6.28. Thanks.Joerg
--
On Wed, 10 Sep 2008 16:38:18 +0200
Ingo already has queued it his tree, I think.
--
I asked Ingo to remove it from his tree to avoid merge conflicts with
updates I am currently preparing. Here is the patch I queued into my
2.6.28 update chain.Joerg
===
commit 42a7fa8d8a7f1680cba76e4824dba30bfd1c9d70
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Wed Sep 10 20:19:40 2008 +0900x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
need it for devices that the IOMMU can do virtual mappings for. This
patch avoids unnecessary low zone allocation.Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index ae38b24..09261dc 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -1360,6 +1360,9 @@ static void *alloc_coherent(struct device *dev, size_t size,
if (!check_device(dev))
return NULL;+ if (!get_device_resources(dev, &iommu, &domain, &devid))
+ flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
virt_addr = (void *)__get_free_pages(flag, get_order(size));
if (!virt_addr)
return 0;
@@ -1367,8 +1370,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
memset(virt_addr, 0, size);
paddr = virt_to_phys(virt_addr);- get_device_resources(dev, &iommu, &domain, &devid);
-
if (!iommu || !domain) {
*dma_addr = (dma_addr_t)paddr;
return virt_addr;--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
...
On Thu, 11 Sep 2008 11:10:00 +0200
I see, thanks.
--
or add a WARN_ON_ONCE() instead, to document and enforce that
assumption.Ingo
--
applied these commits to tip/x86/iommu:
edee956: x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
9821626: x86: convert dma_alloc_coherent to use is_device_dma_capable
8a493d3: x86: remove duplicated extern force_iommuthanks!
Ingo
--
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
| David Miller | Slow DOWN, please!!! |
| Michael Smith | gettimeofday() jumping into the future |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | [GIT]: Networking |
| Frans Pop | svc: failed to register lockdv1 RPC service (errno 97). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
git: | |
| Sander | 'struct task_struct' has no member named 'mems_allowed' (was: Re: 2.6.20-rc4-mm1) |
