Re: ARM: 2.6.3[45] PCI regression (IXP4xx and PXA?)

Previous thread: asm/vga.h (was: Re: drm: Add support for platform devices to register as DRM devices) by Geert Uytterhoeven on Tuesday, August 10, 2010 - 1:35 pm. (3 messages)

Next thread: [BUG] USB no longer works for APCUPS by Heinz Diehl on Tuesday, August 10, 2010 - 2:09 pm. (31 messages)
From: Krzysztof Halasa
Date: Tuesday, August 10, 2010 - 1:36 pm

Hi,

Noticed this problem a bit late :-(

6fee48cd330c68332f9712bc968d934a1a84a32a broke
pci_set_consistent_dma_mask() on IXP4xx and most probably PXA. Affected
devices are e.g. IDE controller (CS5536-based: disk inaccessible) and
e1000 ethernet ("Detected Tx Unit Hang").

The attached patch makes it work again, though I'm not sure it's the
best solution.

Ideas?

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ca32ed7..bd2a7d3 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -129,6 +129,14 @@ static inline u64 dma_get_mask(struct device *dev)
 
 static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
 {
+#ifdef CONFIG_DMABOUNCE
+	if (dev->archdata.dmabounce) {
+		if (mask >= ISA_DMA_THRESHOLD)
+			return 0;
+		else
+			return -EIO;
+	}
+#endif
 	if (!dma_supported(dev, mask))
 		return -EIO;
 	dev->coherent_dma_mask = mask;
--

From: FUJITA Tomonori
Date: Tuesday, August 10, 2010 - 7:06 pm

On Tue, 10 Aug 2010 22:36:21 +0200


I think that we should avoid adding "#ifdef CONFIG_DMABOUNCE" to a
generic place.

--

From: Russell King - ARM Linux
Date: Wednesday, August 11, 2010 - 12:25 am

It doesn't break dmabounce.

What it breaks is the fact that a PCI device which can do 32-bit DMA is
connected to a PCI bus which can only access the first 64MB of memory
through the host bridge, but the system has more than 64MB available.

Allowing a 32-bit DMA mask means that dmabounce can't detect that memory
above 64MB needs to be bounced to memory below the 64MB boundary.
--

From: FUJITA Tomonori
Date: Thursday, August 12, 2010 - 11:23 pm

On Wed, 11 Aug 2010 08:25:32 +0100

But dmabounce doesn't look at dev->coherent_dma_mask.

The change breaks __dma_alloc_buffer()? If we set dev->coherent_dma_mask
to DMA_BIT_MASK(32) for ixp4xx's pci devices, __dma_alloc_buffer()
doesn't use GFP_DMA.
--

From: Russell King - ARM Linux
Date: Friday, August 13, 2010 - 2:54 pm

With an incorrect coherent_dma_mask, dma_alloc_coherent() will return
memory outside of the 64MB window.  This means that when dmabounce comes
to allocate the replacement buffer, it gets a buffer which won't be
accessible to the DMA controller - which is a condition it doesn't check
for.

Ergo, it breaks dmabounce.
--

From: FUJITA Tomonori
Date: Saturday, August 14, 2010 - 2:30 am

On Fri, 13 Aug 2010 22:54:13 +0100


Really? looks like dmabounce does nothing for coherent memory that
dma_alloc_coherent() allocates.

The following very hacky patch works?

Or we could introduce something like ARCH_HAS_DMA_SET_COHERENT_MASK to
let architectures to have dma_set_coherent_mask.

A long solution would be having two dma_mask for a device and a
bus. We also need something to represent a DMA-capable range instead
of the dma mask.

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c704eed..2a3fc2e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -77,6 +77,11 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	if (mask < 0xffffffffULL)
 		gfp |= GFP_DMA;
 
+#ifdef CONFIG_DMABOUNCE
+	if (dev->archdata.dmabounce)
+		gfp |= GFP_DMA;
+#endif
+
 	page = alloc_pages(gfp, order);
 	if (!page)
 		return NULL;
--

From: Russell King - ARM Linux
Date: Saturday, August 14, 2010 - 11:46 am

So what happens if you use a driver which uses dma_alloc_coherent()
directly?  Should the driver really be passed memory which is
inaccessible to the device because its outside the host bridge PCI
window?

No, this is clearly wrong, so this patch doesn't fix anything.  It's
a bodge, nothing more.  The real solution is to have _both_ masks
both reduced down according to the host bridge, as we used to do.

So I suggest 6fee48c is reverted so that these platforms don't regress
for -rc1.

As I said when you sent the originally patch, it _looked_ okay, but I
don't have any way to test it.  It seems from testing (which
unfortunately only seems to only happen after patches hit mainline)
that it is not okay after all.
--

From: FUJITA Tomonori
Date: Saturday, August 14, 2010 - 10:42 pm

On Sat, 14 Aug 2010 19:46:05 +0100

I'm not sure what you mean.

A driver which uses dma_alloc_coherent() directly should
work. dma_alloc_coherent() allocates memory with GFP_DMA with that
patch for dmabounce devices. So the driver gets the access-able
memory.

The memory that dma_alloc_coherent() returns should be always
consistent. We can't bounce it. All we can do is returning a memory
that a device (and its bus) can access to.


Sorry, the original ARM code is wrong. dev->dma_mask and
dev->coherent_dma_mask represent the driver of dma restriction. ARM
tries to use them for both a driver and a bus so ARM needs a
workaround that doesn't set the driver of dma restriction to
dev->dma_mask and dev->coherent_dma_mask.

The proper solution is having a separate dma mask for a bus. I think
that POWERPC already does the similar. It has max_direct_dma_addr in
struct dev_archdata, which represents the dma restriction of a bus.

--

From: Russell King - ARM Linux
Date: Sunday, August 15, 2010 - 1:23 am

Why bother when we both agree that the patch is a dirty hack?

Come up with something cleaner first.
--

From: FUJITA Tomonori
Date: Sunday, August 15, 2010 - 8:55 am

On Sun, 15 Aug 2010 09:23:28 +0100

Because this fix needs to go to stable trees too. A simple patch is
better even if it's hacky.

For example, we can unify dma_needs_bounce functions in arm with a
clean solution, I think. But dma_needs_bounce() was changed after
2.6.35 so it would be difficult to backport a clean solution.

btw, will we have more like this case? If so, I think that it's worth
having a generic solution for this case instead of having the arch
(arm and powerpc) specific solution.
--

From: Krzysztof Halasa
Date: Monday, August 16, 2010 - 4:29 pm

Hi,


This patch fixes the problem on my IXP425.
-- 
Krzysztof Halasa
--

From: FUJITA Tomonori
Date: Thursday, August 19, 2010 - 1:51 am

On Tue, 17 Aug 2010 01:29:49 +0200

Thanks a lot!

I'll re-send the patch in the proper format. Can you send it to
mainline for 2.6.36?

I'll work on the proper solution for this issue for 2.6.37.
--

From: Krzysztof Halasa
Date: Thursday, August 19, 2010 - 9:56 am

Guess that's a question for Russell.
-- 
Krzysztof Halasa
--

From: Benjamin Herrenschmidt
Date: Thursday, August 19, 2010 - 3:31 am

I'd rather have the arch (aka the bus) be able to filter the mask,
better than having to deal with multiple masks in the generic code.
Besides, in embedded-land, you never know how many busses are stacked
before you reach the device, ie, you'd end up having to AND quite a few
masks before getting there in some cases.

Sounds better to establish that once, at set_coherent_dma_mask() time.

Cheers,
Ben.


--

From: FUJITA Tomonori
Date: Thursday, August 19, 2010 - 7:50 am

On Thu, 19 Aug 2010 20:31:22 +1000

You mean that you like to permit architectures to modify
dev->coherent_dma_mask behind a device? If so, I'm against it because
it means dev->coherent_dma_mask has two meanings. That's confusing.

I don't plan to have the generic code to deal with multiple masks. I
thought about simply moving max_direct_dma_addr in POWERPC's
dev_archdata to a generic place (possibly, struct
device_dma_parameters). I think that having the generic place for bus'
dma mask would be better rather than architecture specific

As long as dev->coherent_dma_mask represents the same thing on every
architecture, permitting architectures to have the own
dma_set_coherent_mask() is fine by me. I like to avoid it if possible
though.
--

From: Krzysztof Halasa
Date: Thursday, August 19, 2010 - 9:53 am

Well, I think it may be the only really correct solution, and in fact
it's arch-independent.

The coherent_dma_mask would mean one thing: address space shared between
the CPU(s) and the device.
This usually equals device's address space - only because CPU and
bridges next to it have wide (logical) address busses. It's not always
the case, though, and may be not the case on any arch.

We should make sure we got it right (including drivers), since any
reduction of the dma*mask would be irreversible (new masks would be

Definitely, if possible. BTW the dmabounce (and equivalent code on other
archs, including probably swiotlb on x86-64) could probably be merged as
well. I don't know the internals very well, though. At least it may be

Not sure. Which bus? There could be many :-)
In practice - 64-bit PCIe -> 32-bit PCI -> 24-bit ISA - etc.

That would be ideal. Buses work on all archs the same after all.
-- 
Krzysztof Halasa
--

From: FUJITA Tomonori
Date: Thursday, August 19, 2010 - 10:20 am

On Thu, 19 Aug 2010 18:53:38 +0200

linux/device.h defines that it's the device dma mask.

Except for ARM, coherent_dma_mask represents the device dma mask.

We sometimes want to know the device dma mask. Moving to your

btw, swiotlb is used by x86_64, ia64, and powerpc.

I'm sure that I can convert ARM to use it instead of dmabounce. But
well, even a bug fix patch for dmabounce haven't been merged so I'm
not sure that ARM people would accept such change.


Seems that we are not on the same page. As I said before, have you
seen how POWERPC uses max_direct_dma_addr in dev_archdata struct? I
was talking about moving it to the generic place and the API to set
it.
--

From: Benjamin Herrenschmidt
Date: Thursday, August 19, 2010 - 2:54 pm

That's just verbiage in a header file. Nobody cares. The point is, the
device DMA mask per-se is a completely useless piece of information, and
thus there is absolutely no point keeping it around there.

The only thing that matters is the intersection of all the masks on the
way to memory, which defines where memory can be allocated.

Now the mask thing itself might end up not being enough for ARM in the
long run, I wouldn't be surprised if we end up with busses that can DMA
to areas of memory that are not 0 based, but that's a discussion for

When ?

Cheers,


--

From: Benjamin Herrenschmidt
Date: Thursday, August 19, 2010 - 2:51 pm

No it's not. It has one and only one meaning which is the mask defining
where the coherent memory can come from for that device. Nobody cares if
the device can do more and has been "clipped" at set_coherent_dma_mask()
time by the bus. This is not useful information.

So I beleive the arch should hook the later and modify the mask as it

Well, max_direct_dma_addr used on powerpc is a bit nasty because it
doesn't necessarily represent a power of 2, so a mask is no good here

Ben.


--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 4:55 am

On Fri, 20 Aug 2010 07:51:52 +1000

Ok.

I've seen that someone submitted a patch to print the dma_mask under
sysfs, I supposed, for debugging to check if a driver misuses the DMA
API or the bus can't do 64bit DMA when the device can support 64bit
DMA but only gets a buffer under 32bit.

But yeah, we can live withtout it.

Lots of drivers call dma_set_coherent_mask with 64bit mask and then
call it with 32bit mask if 64bit mask fails. So we don't have driver's

OK. like dma_set_mask(), we could make every architecutre have the own
dma_set_coherent_mask(). But looks like only ARM needs own
dma_set_coherent_mask() (at least now), so adding
ARCH_HAS_DMA_SET_COHERENT_MASK define might be better. I don't like
the asymmetry of dma_set_mask and dma_set_coherent_mask much though.
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 6:54 am

On Thu, 26 Aug 2010 20:55:09 +0900

btw, I'm still not sure, letting architectures to clip the dma mask
(and coherent mask) behind a driver is correct by defintion of the DMA
API (it's not a real problem).

DMA-API.txt defines dma_set_mask is "checks to see if the mask is
possible and update the device parameters if it is". It means that
architectures can't clip the mask behind a driver, I think.

Lots of drivers do something like:

if (dma_set_mask(dev, DMA_BIT_MASK(64)))
        if (dma_set_mask(dev, DMA_BIT_MASK(32)))

What arm does is accepting whatever dma mask and setting the clipped
mask behind a driver. If we use this sementics (archs are free to clip
the mask), drivers don't need the second dma_set_mask call. And the
driver wrongly assumes that it successfully set 64bit dma mask (and
possibly set the hardware to 64bit dma mode needlessly).
--

From: Russell King - ARM Linux
Date: Thursday, August 26, 2010 - 10:57 am

Ok, in that case lets disable all PCI drivers which do this on IXP4xx
then, because they obviously can't cope with the 64MB window that this
platform has.

Clearly they need to be rewritten such that they can cope with this,
irrespective of the fact that they've worked for ages with the current
solution.
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 11:54 pm

On Thu, 26 Aug 2010 18:57:19 +0100

I didn't insist such (I wrote, "it's not a real proble"). 

As I wrote in another mail, we could make every architecutre have the own
dma_set_coherent_mask(). It's fine by me.

I simply wanted to know your opinions:

Looks like that the DMA API doesn't expect architectures to clip the
mask. It might be better to add the new API to set the dma mask in the
long term? (i.e. calling dma_set_mask twice is the best interface?)
It's better to forget it since it's only about one architecture (too
much change provides little benefit).
--

From: Krzysztof Halasa
Date: Thursday, August 26, 2010 - 9:02 am

Which seems strange to me. If the driver asks for 64-bit mask and the
system can only give it 32-bits, why return an error? Every 32-bit
address is also 64-bit, with the most significant bits simply cleared.

It makes sense the other way around, if the device wants e.g. 24-bit
mask (ISA or something) but the OS doesn't have a memory pool smaller
than e.g. 4 GB then returning with error is ok.

Same with IXP4xx (and perhaps PXA) - the device wants 32 bits and it's
fine, even if the mask is set to 28 bits since the CPU can't let PCI
access more.

If the device wants to know if it should issue DACs or something like
that, sure - just check the current masks (effective).
-- 
Krzysztof Halasa
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 5:26 pm

On Thu, 26 Aug 2010 18:02:46 +0200

As I wrote, the DMA API simply wasn't designed in that way (let
architectures to clip the mask), I guess.

If it was, dma_set_coherent_mask might return the actual mask.
--

Previous thread: asm/vga.h (was: Re: drm: Add support for platform devices to register as DRM devices) by Geert Uytterhoeven on Tuesday, August 10, 2010 - 1:35 pm. (3 messages)

Next thread: [BUG] USB no longer works for APCUPS by Heinz Diehl on Tuesday, August 10, 2010 - 2:09 pm. (31 messages)