This is the fifth iteration of this ever-growing patch set for PrimeCell DMA support, reposting the entire series. This now depends on stuff pending in Dan Williams' async_tx (DMA Devices/Engine) tree and Andrew Morton's tree, where the new DMA40 driver for U8500 is queued. I suggest putting these into Andrew's tree now, since:
A) 4 of the patches touch MMCI code, which is handled by Andrew.
B) It extends the DMA40 driver, which is now pending in his tree as well.
C) Since there doesn't seem to be any consensus on whether this is the right way forward, it needs some wider testing, I believe.
OK? Yours, Linus Walleij --
Hi Linus, back online now. On Wed, Apr 7, 2010 at 4:12 PM, Linus Walleij Ok, but it looks like they do not have a build dependency on dma bits No consensus with respect to which pieces, the Primecell driver or something outside of drivers/dma? Forgive me for missing recent I can go ahead and queue up the dma bits unless you would prefer, and Andrew agrees, to take this all through the -mm tree? -- Dan --
Well:
[PATCH 04/11] ARM: define the PrimeCell DMA API v5
Is independent. (One .h-file.)
[PATCH 05/11] ARM: add generic PrimeCell interface to COH 901 318 v5
[PATCH 06/11] ARM: add generic PrimeCell interface to DMA40 v1
Depends on 04 AND has a merge dependency on the recent patches for generic channel control and status, and then the DMA40 driver which is now picked into Andrew's -mm. So the whole thing does depend on async_tx HEAD. Is it possible to move the two DMA40 patches over from -mm to async_tx to at least lower the complexity a little bit? (Should be to just apply them...) If this is done, you could apply the above three patches to the async_tx tree.
[PATCH 07/11] ARM: add PrimeCell generic DMA to MMCI/PL180 v5
This is where it starts to get complicated, because this patch depends on 01, 02, 03, 04. So it has to be applied to a tree which contains all of it.
[PATCH 08/11] ARM: add PrimeCell generic DMA to PL011 v5
Just depends on 04 (that's the idea, a generic PrimeCell interface) so could be applied to the async_tx tree if the others go in there.
[PATCH 09/11] ARM: add PrimeCell generic DMA to PL022 v5
Same thing, plus it is Acked-by: Grant and OK to merge into async_tx if 04 is there.
[PATCH 10/11] ARM: config U300 PL180 PL011 PL022 for DMA v5
[PATCH 11/11] ARM: config Ux500 PL011 PL022 for DMA v1
These should go in through the ARM tree really, it's platform
Well, I'd want Russell to comment on that, I think from the PrimeCell point of view it is important that the file we put in place in <linux/amba/dma.h> is something that will really be likely to be a good path forward for all PrimeCells and derivatives. And I really would like Russell to ACK that first, he historically watches over the PrimeCell stuff. But that said I think we're pretty solid:
- Implementation for three vastly different PrimeCells
- Implementation for two vastly different DMA engines
If it's OK with Russell, putting 04-06 plus 09 through async_tx tree is a good ...
Russell, are you OK with pushing these patches from this series: patch 01 - <linux/amba/dma.h> patch 08 - DMA for drivers/serial/amba-pl011.c through Dan's async_tx / DMAengine tree? I think those are the ones which need your Acked-by to proceed. If you have some other idea of how these patches should be twisted around please let me know! Yours, Linus Walleij --
I do think it would be of value for someone to try to get this working on the Realview boards to ensure that these patches are well proven... unfortunately I don't have the bandwidth to do that at present. --
Hi Linus, On Wed, Apr 7, 2010 at 11:35 PM, Linus WALLEIJ Getting closer... I have pushed out the dma40 driver (v3), 4, and 6. The other patch in -mm I could take as well but that needs an ack from Russell. 5 is pending the review comment and 9 does not apply cleanly (does it depend on something in the spi tree?) -- Dan --
Nah, I'll push that in through Russell's tree hopefully, it needs rebasing on OK I'm sending updated versions soon, along with a DMA40 bug fix all on top of async_tx instead. Number 9 fails since it is based on -next where all the #include <slab.h> business has taken place, I don't know how that is resolved in the end but it now includes that include and applies cleanly on async_tx. I'll keep working on getting the PL011 and PL180 DMA tested on the RealView somehow so those can also be accepted. Yours, Linus Walleij --
I tested them on U300, which has an unmodified PL011 block, both with and without DMA support compiled in. I have tested the PL180 mods on the U300 as well, it has a slightly modified PL180 block. I have no other hardware... I will try to boot it up in the QEMU emulator, it has an emulated PL011 at least, that should account for something? I don't think I understand this. I will have to try to dig out some ARM reference design from somewhere, I cannot afford one sadly. ARM Ltd. people on this list: if you can send me a Versatile machine, mail me in private for my postal address... Yours, Linus Walleij --
So has this (which has now been applied to Dan's tree) been tested as I asked on Versatile platforms, or do we have something that could be incompatible with those platforms? I'm basically not acking or applying these patches until something along those lines has happened. (And unfortunately I don't have the resources to apply to this at present.) --
On Thu, Apr 22, 2010 at 4:00 AM, Russell King - ARM Linux Just to clarify are you nak'ing these patches for upstream inclusion until this testing occurs? Or do we just need a !ARCH_VERSATILE somewhere to allow any incompatibilities to be worked out later in-tree? I am not convinced this is the long-term approach we want to follow for architecture-specific extensions to dmaengine, but it has the nice property of being minimally obtrusive and the best proposal of the moment. -- Dan --
None of the stuff you have applied is included in the objects compiled for Versatile boards. The PL022 driver probably works with Versatile but no one has tested it and it's not included in any defconfigs. What I thought Russell was worried about was the PL011 and PL180 drivers which *are* in use by Versatile. So to be clear: none of the stuff that touches the Versatile platform has been applied so far. Only the U300/U8500 specific stuff has been patched in, and I'm suggesting also the PL022 driver which is currently only used by U300 and U8500 to be patched. That said I hope to bring in help, run QEMU or similar ASAP so that also the PL011 and PL180 can be cleanly applied for 2.6.35... Yours, Linus Walleij --
What I don't want to do is to get into the situation where we throw this patchset into the kernel and then find that we have to invent a whole new implementation in the various primecell drivers to support the Versatile hardware. Versatile has some MUXing on three of the DMA signals, so (eg) we really don't want UARTs claiming DMAs just because they're in existence and not in use - that would prevent DMAs from being used for (eg) AACI or MMC. The alternative is that we could just take the attitude that Versatile/ Realview will never have DMA support implemented, but that seems rather silly, as they've tended to be the first platforms I get new CPU architectures for. (This is why DMA coherency stuff on new architectures tends to be left for others to do...) --
As long as Versatile doesn't specify any filter function or data for the channel allocation function (it currently doesn't and defaults to NULL) it won't even try to call the DMA engine to allocate a channel for say the UART. There is nothing blocking some other peripheral from grabbing a muxed channel in that case. But the implementation of the DMA engine would be better off handling the muxing dynamically I believe, so when the PL011 driver (say) requests a DMA channel, it doesn't mean it requests the *physical* channel and holds it (unless the driver is very naïvely implemented), it nominally means it reserves a placeholder in the DMA engine. When the driver issues a request to perform a DMA transfer, it will pull out a physical channel and use that, then return it. If there is too much contention for the physical channels, you configure out DMA for the least wanted PrimeCells. Yours, Linus Walleij --
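The placeholder idea described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: `struct phys_pool`, `struct virt_chan` and the `virt_chan_*` helpers are hypothetical names invented for this sketch, not dmaengine or PrimeCell API, and the pool size of three simply mirrors the three muxed channels mentioned for Versatile.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PHYS 3 /* assumed: three muxed physical channels */

/* Hypothetical bookkeeping for the physical channels (not a kernel struct). */
struct phys_pool {
    bool busy[NUM_PHYS];
};

/* A virtual channel is only a placeholder until a transfer is started. */
struct virt_chan {
    const char *owner; /* e.g. "uart3-tx", purely informational */
    int phys;          /* borrowed physical channel index, or -1 */
};

static void virt_chan_init(struct virt_chan *vc, const char *owner)
{
    vc->owner = owner;
    vc->phys = -1;
}

/* Borrow a free physical channel for one transfer; false if all are busy. */
static bool virt_chan_start(struct phys_pool *pool, struct virt_chan *vc)
{
    if (vc->phys >= 0)
        return true; /* already holding a physical channel */

    for (int i = 0; i < NUM_PHYS; i++) {
        if (!pool->busy[i]) {
            pool->busy[i] = true;
            vc->phys = i;
            return true;
        }
    }
    return false;
}

/* Return the physical channel to the pool when the transfer completes. */
static void virt_chan_done(struct phys_pool *pool, struct virt_chan *vc)
{
    if (vc->phys >= 0) {
        pool->busy[vc->phys] = false;
        vc->phys = -1;
    }
}
```

With four placeholders and three physical channels, the fourth `virt_chan_start()` fails until one of the others calls `virt_chan_done()`; what a real driver should do on that failure is a separate question.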
So what happens if we try to use DMA with the PL011 but the physical channels are already in use? From what I can see, it assumes that it always has access to the transmit channel, and there's no recovery if it doesn't. Plus if we can't get DMA for the RX path, it _permanently_ disables Three physical channels shared between: AACI Tx, AACI Rx, MMCI 0, MMCI 1, UART3 Tx, UART3 Rx. (USB and smartcard/SIM which we don't implement.) In total there's 10 valid settings for the MUX for each channel, so contention is going to happen. All you need is to load both the AACI and MMCI drivers, and if they want to use the DMA channels, you're already wanting 4 channels with only 3 available. --
OK, now I get it. The crux is that you need the drivers to be coded to switch seamlessly back to interrupt mode and nevertheless retry with DMA on the next transaction if possible. That is definitely possible with the current API, so it's nothing blocking the stuff pending in Dan's tree. However when it comes to the PL011/PL180 drivers you got me there, it surely does assume you either have the channel and can use it or else there is some permanent error on it. I'll twist these patches around a bit, it shouldn't be too hard to come up Yep, that's where it kicks in. (What's the name of this DMA controller BTW? Is that PL080?) (I read it as MMCI is bidirectional also on the Versatile, as it is on the U300.) However: this way of using the DMA dynamically instead of statically leads to the situation where a UART or two MMCs are using up the DMA channels and AACI cannot use it, and needs to fall back to interrupts. Since the audio traffic is likely to be more important, this is perhaps not so optimal, so a static assignment of DMA channels may be desired after all in a practical scenario. But I'll surely try to make all DMA allocation from the PrimeCells dynamic! Yours, Linus Walleij --
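A minimal sketch of the per-transaction fallback discussed here, again as stand-alone C rather than driver code. Everything in it is an assumption made for illustration: `struct dma_pool`, `dma_try_get()`, `dma_put()` and `port_do_transaction()` are hypothetical names, and the actual data movement is left out. The point it tries to show is only that a failed channel grab is not recorded as a permanent error, so the next transaction tries DMA again.

```c
#include <stdbool.h>

enum xfer_mode { XFER_DMA, XFER_IRQ };

/* Hypothetical stand-in for the DMA engine: a count of free channels. */
struct dma_pool {
    int free_channels;
};

/* Per-port counters, only so the behaviour can be observed. */
struct port_stats {
    unsigned int dma_xfers;
    unsigned int irq_xfers;
};

static bool dma_try_get(struct dma_pool *p)
{
    if (p->free_channels > 0) {
        p->free_channels--;
        return true;
    }
    return false;
}

static void dma_put(struct dma_pool *p)
{
    p->free_channels++;
}

/*
 * Run one transaction. DMA is attempted every time; if no physical
 * channel is free, this transaction alone falls back to interrupt
 * mode. No sticky "DMA disabled" flag is kept.
 */
static enum xfer_mode port_do_transaction(struct dma_pool *p,
                                          struct port_stats *st)
{
    if (dma_try_get(p)) {
        /* a real driver would set up and run the DMA transfer here */
        st->dma_xfers++;
        dma_put(p);
        return XFER_DMA;
    }

    /* a real driver would move the data from its interrupt handler here */
    st->irq_xfers++;
    return XFER_IRQ;
}
```

Calling `port_do_transaction()` while the pool is empty returns `XFER_IRQ`; once a channel is freed, the same port gets `XFER_DMA` on its next call without any reinitialisation.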
It's one of the standard ARM primecells, with a FPGA controlling the Such a scenario leads to two of the three channels assigned to AACI (one for playback and the other for record - remember, it's full duplex), leaving one to be shared between the UART Tx and Rx, and two MMCIs. I'd disagree with you and say that MMCI would be more important than AACI. The data rate for MMCI is far higher than AACI - and remember ARM MMCIs overflow if you don't read the data fast enough. The MMCI fmax parameter only exists to put a cap on the rate of the transfer so that the CPU can read the data fast enough in PIO mode. However, you only need DMA for MMCI if there's a card inserted in the slot. If there's no card in the slot, there's no point starving AACI of a DMA channel if that's what is being used. --
The latest patchset is now also tested on the ARM-RealView
PB11MPCore. My best friends over at Ericsson AB helped me
out by lending me their board for a short session.
See bootlog below...
- UART console comes up fine and is interactive
- MMCI card mounts and you can list and copy files
No DMA in use since the PL081 in this machine does not
have a driver yet, but no regressions in sight.
This should be similar to Versatile or Integrator.
Is this OK now Russell?
Yours,
Linus Walleij
Uncompressing Linux... done, booting the kernel.
Linux version 2.6.34-rc6-next-20100503-00033-gc482e92 (linus@fecusia) (gcc vers0
CPU: ARMv6-compatible processor [410fb020] revision 0 (ARMv7), cr=00c5387f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: ARM-RealView PB11MPCore
Ignoring unrecognised tag 0x00000000
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
Kernel command line: root=/dev/nfs nfsroot=192.168.0.3:/export/rootfs/rootfs-ant
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB = 128MB total
Memory: 120360k/120360k available, 10712k reserved, 0K highmem
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
DMA : 0xffc00000 - 0xffe00000 ( 2 MB)
vmalloc : 0xc8800000 - 0xf8000000 ( 760 MB)
lowmem : 0xc0000000 - 0xc8000000 ( 128 MB)
modules : 0xbf000000 - 0xc0000000 ( 16 MB)
.init : 0xc0008000 - 0xc0672000 (6568 kB)
.text : 0xc0672000 - 0xc0909000 (2652 kB)
.data : 0xc0922000 - 0xc093d400 ( 109 kB)
Hierarchical RCU implementation.
NR_IRQS:128
Console: colour dummy device 80x30
Calibrating delay loop... 83.76 BogoMIPS (lpj=418816)
Mount-cache hash table entries: 512
CPU: Testing write buffer ...

On Sat, May 1, 2010 at 4:04 PM, Linus Walleij Could you simulate this by publishing more struct dma_chans than are physically present, and then handle the muxing internal to the driver? Or am I misunderstanding the usage model? --
Yes, exactly that way. That's what I had in mind at least. Yours, Linus Walleij --
On Sat, May 1, 2010 at 3:44 PM, Russell King - ARM Linux Ok, it will be good to have this approach vetted on a challenging arch. We'll see where things stand when the merge window opens. -- Dan --
