RE: [PATCH 00/11] ARM: PrimeCell DMA Interface v5

Previous thread: none

Next thread: [PATCH 01/11] ARM: MMCI: support 8bit mode on the ST Micro version by Linus Walleij on Wednesday, April 7, 2010 - 4:12 pm. (1 message)
From: Linus Walleij
Date: Wednesday, April 7, 2010 - 4:12 pm

This is the fifth iteration of this ever-growing patch set
for PrimeCell DMA support, reposting the entire series.

This now depends on stuff pending in Dan Williams' async_tx
(DMA Devices/Engine) tree and Andrew Morton's tree, where
the new DMA40 driver for U8500 is queued.

I suggest putting these into Andrew's tree now, since:

A) 4 of the patches touch MMCI code, which is handled
   by Andrew

B) It extends the DMA40 driver which is now pending in
   his tree as well.

C) Since there doesn't seem to be any consensus on whether
   this is the right way forward, I believe it needs some
   wider testing.

OK?

Yours,
Linus Walleij
--

From: Dan Williams
Date: Wednesday, April 7, 2010 - 4:45 pm

Hi Linus, back online now.

On Wed, Apr 7, 2010 at 4:12 PM, Linus Walleij

Ok, but it looks like they do not have a build dependency on dma bits

No consensus with respect to which pieces, the Primecell driver or
something outside of drivers/dma?  Forgive me for missing recent

I can go ahead and queue up the dma bits unless you would prefer, and
Andrew agrees, to take this all through the -mm tree?

--
Dan
--

From: Linus WALLEIJ
Date: Wednesday, April 7, 2010 - 11:35 pm

Well:

[PATCH 04/11] ARM: define the PrimeCell DMA API v5
Is independent. (One .h-file.)

[PATCH 05/11] ARM: add generic PrimeCell interface to COH 901 318 v5
[PATCH 06/11] ARM: add generic PrimeCell interface to DMA40 v1
Depend on 04 AND have a merge dependency on the recent patches for generic
channel control and status, and then on the DMA40 driver which is now
picked into Andrew's -mm. So the whole thing does depend on async_tx
HEAD.

Is it possible to move the two DMA40 patches over from -mm to async_tx
to at least lower the complexity a little bit? (It should be enough to
just apply them...)

If this is done, you could apply the above three patches to the
async_tx tree.

[PATCH 07/11] ARM: add PrimeCell generic DMA to MMCI/PL180 v5
This is where it starts to get complicated, because this patch
depends on 01, 02, 03, 04. So it has to be applied to a tree
which contains all of them.

[PATCH 08/11] ARM: add PrimeCell generic DMA to PL011 v5
Just depends on 04 (that's the idea, a generic PrimeCell interface)
so could be applied to the async_tx tree
if the others go in there.

[PATCH 09/11] ARM: add PrimeCell generic DMA to PL022 v5
Same thing, plus it is Acked-by: Grant and OK to merge into
async_tx if 04 is there.

[PATCH 10/11] ARM: config U300 PL180 PL011 PL022 for DMA v5
[PATCH 11/11] ARM: config Ux500 PL011 PL022 for DMA v1
These should go in through the ARM tree really, it's platform

Well, I'd want Russell to comment on that. I think from the
PrimeCell point of view it is important that the file we put
in place in <linux/amba/dma.h> is something that is really
likely to be a good path forward for all PrimeCells and derivatives.
And I really would like Russell to ACK that first, he historically
watches over the PrimeCell stuff.

But that said I think we're pretty solid:
- Implementation for three vastly different PrimeCells
- Implementation for two vastly different DMA engines

If it's OK with Russell, putting 04-06 plus 09 through async_tx
tree is a good ...
--

From: Linus Walleij
Date: Sunday, April 11, 2010 - 7:13 am

Russell,

are you OK with pushing these patches from this series:

patch 01 - <linux/amba/dma.h>
patch 08 - DMA for drivers/serial/amba-pl011.c

through Dan's async_tx / DMAengine tree?

I think those are the ones which need your Acked-by to proceed.

If you have some other idea of how these patches should be twisted
around please let me know!

Yours,
Linus Walleij
--

From: Russell King - ARM Linux
Date: Monday, April 12, 2010 - 12:23 pm

I do think it would be of value for someone to try to get this working
on the Realview boards to ensure that these patches are well proven...
unfortunately I don't have the bandwidth to do that at present.
--

From: Dan Williams
Date: Wednesday, April 14, 2010 - 6:15 pm

Hi Linus,

On Wed, Apr 7, 2010 at 11:35 PM, Linus WALLEIJ

Getting closer... I have pushed out the dma40 driver (v3), 4, and 6.
The other patch in -mm I could take as well but that needs an ack from
Russell.

5 is pending the review comment and 9 does not apply cleanly (does it
depend on something in the spi tree?)

--
Dan
--

From: Linus Walleij
Date: Friday, April 16, 2010 - 9:58 pm

Nah, I'll push that in through Russell's tree hopefully, it needs rebasing on

OK I'm sending updated versions soon, along with a DMA40 bug fix
all on top of async_tx instead.

Number 9 fails since it is based on -next where all the #include <slab.h>
business has taken place, I don't know how that is resolved in the end
but it now includes that include and applies cleanly on async_tx.

I'll keep working on getting the PL011 and PL180 DMA tested on the
RealView somehow so those can also be accepted.

Yours,
Linus Walleij
--

From: Linus Walleij
Date: Friday, April 30, 2010 - 11:30 am

I tested them on U300 which has an unmodified PL011 block, both with
and without DMA support compiled in. I have tested the PL180 mods
on the U300 as well; it has a slightly modified PL180 block.
I have no other hardware...

I will try to boot it up in the QEMU emulator; it has an emulated
PL011 at least, that should account for something? I don't think

I understand this. I will have to try to dig out some ARM reference
design from somewhere, I cannot afford one sadly.

ARM Ltd. people on this list: if you can send me a Versatile
machine, mail me in private for my postal address...

Yours,
Linus Walleij
--

From: Russell King - ARM Linux
Date: Thursday, April 22, 2010 - 4:00 am

So has this (which has now been applied to Dan's tree) been tested
as I asked on Versatile platforms, or do we have something that could
be incompatible with those platforms?

I'm basically not acking or applying these patches until something
along those lines has happened.  (And unfortunately I don't have the
resources to apply to this at present.)
--

From: Dan Williams
Date: Saturday, May 1, 2010 - 3:00 pm

On Thu, Apr 22, 2010 at 4:00 AM, Russell King - ARM Linux

Just to clarify are you nak'ing these patches for upstream inclusion
until this testing occurs?  Or do we just need a !ARCH_VERSATILE
somewhere to allow any incompatibilities to be worked out later
in-tree?

I am not convinced this is the long term approach we want to follow
for architecture specific extensions to dmaengine, but it has the
nice property of being minimally obtrusive and is the best proposal
of the moment.

--
Dan
--

From: Linus Walleij
Date: Saturday, May 1, 2010 - 3:27 pm

None of the stuff you have applied is included in the objects compiled
for Versatile boards. The PL022 driver probably works with Versatile
but no one has tested it and it's not included in any defconfigs.

What I thought Russell was worried about was the PL011 and PL180
drivers which *are* in use by Versatile.

So to be clear: none of the stuff that touches the Versatile platform
has been applied so far. Only the U300/U8500 specific stuff has
been patched in, and I'm suggesting also the PL022 driver which
is currently only used by U300 and U8500 to be patched.

That said I hope to bring in help, run QEMU or similar ASAP
so that also the PL011 and PL180 can be cleanly applied for
2.6.35...

Yours,
Linus Walleij
--

From: Russell King - ARM Linux
Date: Saturday, May 1, 2010 - 3:44 pm

What I don't want to do is to get into the situation where we throw
this patchset into the kernel and then find that we have to invent a
whole new implementation in the various primecell drivers to support
the Versatile hardware.

Versatile has some MUXing on three of the DMA signals, so (eg) we
really don't want UARTs claiming DMAs just because they're in existence
and not in use - that would prevent DMAs from being used for (eg) AACI
or MMC.

The alternative is that we could just take the attitude that Versatile/
Realview will never have DMA support implemented, but that seems rather
silly, as they've tended to be the first platforms I get new CPU
architectures for.  (This is why DMA coherency stuff on new architectures
tends to be left for others to do...)
--

From: Linus Walleij
Date: Saturday, May 1, 2010 - 4:04 pm

As long as Versatile doesn't specify any filter function or
data for the channel allocation function (it currently doesn't and defaults
to NULL) it won't even try to call the DMA engine to allocate a channel
for say the UART.

There is nothing blocking some other peripheral from grabbing a
muxed channel in that case.

But the implementation of the DMA engine would be better off
handling the muxing dynamically, I believe. When the PL011
driver (say) requests a DMA channel, it doesn't mean it requests the
*physical* channel and holds it (unless the driver is very naïvely
implemented); it nominally means it reserves a placeholder in the
DMA engine.

When the driver issues a request to perform a DMA transfer, it will pull
out a physical channel and use that, then return it. If there is too
much contention for the physical channels, you configure out DMA
for the least wanted PrimeCells.

Yours,
Linus Walleij
--
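
The placeholder-versus-physical-channel idea discussed above can be
sketched in plain user-space C. This is only an illustrative model, not
the dmaengine API: struct phys_pool, phys_acquire() and phys_release()
are hypothetical names, and the channel count of three is taken from the
Versatile description in this thread. The point is that a client holds no
hardware between transfers; it borrows a physical channel for one
transfer and hands it back.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PHYS 3   /* physical channels, as on the Versatile in this thread */

/* Hypothetical model: one slot per physical channel; NULL means idle. */
struct phys_pool {
    const void *owner[NUM_PHYS];
};

/*
 * Borrow a physical channel for a single transfer.
 * Returns the channel index, or -1 if every channel is busy.
 */
static int phys_acquire(struct phys_pool *pool, const void *client)
{
    for (int i = 0; i < NUM_PHYS; i++) {
        if (pool->owner[i] == NULL) {
            pool->owner[i] = client;
            return i;
        }
    }
    return -1;
}

/* Hand the channel back once the transfer has completed. */
static void phys_release(struct phys_pool *pool, int idx)
{
    assert(idx >= 0 && idx < NUM_PHYS);
    pool->owner[idx] = NULL;
}
```

With three clients mid-transfer, a fourth phys_acquire() returns -1 until
one of them calls phys_release(). That is the contention case raised in
the following messages.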

From: Russell King - ARM Linux
Date: Saturday, May 1, 2010 - 4:28 pm

So what happens if we try to use DMA with the PL011 but the physical
channels are already in use?  From what I can see, it assumes that it
always has access to the transmit channel, and there's no recovery if
it doesn't.

Plus if we can't get DMA for the RX path, it _permanently_ disables

Three physical channels shared between: AACI Tx, AACI Rx, MMCI 0, MMCI 1,
UART3 Tx, UART3 Rx.  (USB and smartcard/SIM which we don't implement.)
In total there's 10 valid settings for the MUX for each channel, so
contention is going to happen.  All you need is to load both the AACI
and MMCI drivers, and if they want to use the DMA channels, you're
already wanting 4 channels with only 3 available.
--

From: Linus Walleij
Date: Saturday, May 1, 2010 - 5:21 pm

OK, now I get it... the crux is that you need the drivers to be
coded to switch seamlessly back to interrupt mode and then retry with
DMA on the next transaction if possible.

That is definitely possible with the current API, so there is nothing
blocking the stuff pending in Dan's tree.

However when it comes to the PL011/PL180 drivers you got me there,
it surely does assume you either have the channel and can use it
or else there is some permanent error on it.

I'll twist these patches around a bit, it shouldn't be too hard to come up

Yep, that's where it kicks in. (What's the name of this DMA controller
BTW? Is that PL080?)

(I read it as MMCI is bidirectional also on the Versatile, as it is
on the U300.)

However: using DMA dynamically instead of statically
leads to the situation where a UART or two MMCs use up the
DMA channels and AACI cannot get one and needs to fall back to
interrupts. Since the audio traffic is likely to be more important, this
is perhaps not optimal, so a static assignment of DMA channels
may be desired after all in a practical scenario.

But I'll surely try to make all DMA allocation from the
PrimeCells dynamic!

Yours,
Linus Walleij
--
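
The behaviour asked for here (fall back to interrupt mode when no channel
is free, then try DMA again on the next transaction) can be sketched as
follows. This is a user-space model under stated assumptions, not code
from the patch set: struct chan_pool, pool_try_get(), pool_put() and
do_transfer() are hypothetical names. The decision is taken per
transaction and nothing is latched, so a busy channel never disables DMA
permanently.

```c
#include <assert.h>
#include <stdbool.h>

enum xfer_mode { XFER_DMA, XFER_IRQ };

/* Hypothetical pool of physical channels, tracked only as counters. */
struct chan_pool {
    int total;  /* physical channels that exist */
    int free;   /* physical channels currently idle */
};

/* Take a channel if one is idle; never blocks. */
static bool pool_try_get(struct chan_pool *p)
{
    if (p->free > 0) {
        p->free--;
        return true;
    }
    return false;
}

/* Return a channel taken with pool_try_get(). */
static void pool_put(struct chan_pool *p)
{
    assert(p->free < p->total);
    p->free++;
}

/*
 * Run one transaction. DMA is used when a channel is available right
 * now; otherwise this transaction alone falls back to interrupt mode.
 */
static enum xfer_mode do_transfer(struct chan_pool *p)
{
    if (pool_try_get(p)) {
        /* ... the DMA transfer would run here ... */
        pool_put(p);
        return XFER_DMA;
    }
    /* ... the interrupt-driven transfer would run here ... */
    return XFER_IRQ;
}
```

A call made while the pool is empty returns XFER_IRQ; the next call after
a channel has been freed returns XFER_DMA without any re-initialisation
of the driver.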

From: Russell King - ARM Linux
Date: Sunday, May 2, 2010 - 1:18 am

It's one of the standard ARM PrimeCells, with an FPGA controlling the


Such a scenario leads to two of the three channels assigned to AACI
(one for playback and the other for record - remember, it's full duplex),
leaving one to be shared between the UART Tx and Rx, and two MMCIs.

I'd disagree with you and say that MMCI would be more important than
AACI.  The data rate for MMCI is far higher than AACI - and remember
ARM MMCIs overflow if you don't read the data fast enough.  The MMCI
fmax parameter only exists to put a cap on the rate of the transfer
so that the CPU can read the data fast enough in PIO mode.

However, you only need DMA for MMCI if there's a card inserted in the
slot.  If there's no card in the slot, there's no point starving AACI
of a DMA channel if that's what is being used.
--
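
One way to picture the trade-off in this message is a small arbiter that
hands the three physical channels to the requesters that actually need
one, highest priority first. Everything below is a hypothetical sketch:
struct requester, arbitrate() and the numeric priorities are assumptions
made for illustration, and which peripheral should rank higher is exactly
the point still under discussion in the thread. The needs_dma flag models
the observation that an MMCI slot with no card inserted has no use for a
channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PHYS_CHANNELS 3

struct requester {
    const char *name;
    int priority;    /* higher wins; the ordering is a policy choice */
    bool needs_dma;  /* e.g. false for an MMCI slot without a card */
    bool granted;    /* filled in by arbitrate() */
};

/*
 * Grant up to NUM_PHYS_CHANNELS channels to the highest-priority
 * requesters that need one. Ties go to the earlier array entry.
 * Returns the number of channels granted.
 */
static int arbitrate(struct requester *reqs, size_t n)
{
    int granted = 0;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        reqs[i].granted = false;

    while (granted < NUM_PHYS_CHANNELS) {
        struct requester *best = NULL;

        for (size_t i = 0; i < n; i++) {
            struct requester *r = &reqs[i];

            if (!r->needs_dma || r->granted)
                continue;
            if (best == NULL || r->priority > best->priority)
                best = r;
        }
        if (best == NULL)
            break;  /* nobody else wants a channel */
        best->granted = true;
        granted++;
    }
    return granted;
}
```

With both MMCI slots empty, AACI Tx, AACI Rx and one UART direction can
each get a channel; as soon as a card appears and MMCI is ranked above
the UART, the UART is the one that loses its channel.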

From: Linus WALLEIJ
Date: Tuesday, May 4, 2010 - 6:05 am

The latest patchset is now also tested on the ARM-RealView
PB11MPCore. My best friends over at Ericsson AB helped me
out by lending me their board for a short session.

See bootlog below...

- UART console comes up fine and is interactive
- MMCI card mounts and you can list and copy files

No DMA in use since the PL081 in this machine does not
have a driver yet, but no regressions in sight.

This should be similar to Versatile or Integrator.

Is this OK now Russell?

Yours,
Linus Walleij


Uncompressing Linux... done, booting the kernel.
Linux version 2.6.34-rc6-next-20100503-00033-gc482e92 (linus@fecusia) (gcc vers0
CPU: ARMv6-compatible processor [410fb020] revision 0 (ARMv7), cr=00c5387f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: ARM-RealView PB11MPCore
Ignoring unrecognised tag 0x00000000
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
Kernel command line: root=/dev/nfs nfsroot=192.168.0.3:/export/rootfs/rootfs-ant
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB = 128MB total
Memory: 120360k/120360k available, 10712k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xffc00000 - 0xffe00000   (   2 MB)
    vmalloc : 0xc8800000 - 0xf8000000   ( 760 MB)
    lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .init : 0xc0008000 - 0xc0672000   (6568 kB)
      .text : 0xc0672000 - 0xc0909000   (2652 kB)
      .data : 0xc0922000 - 0xc093d400   ( 109 kB)
Hierarchical RCU implementation.
NR_IRQS:128
Console: colour dummy device 80x30
Calibrating delay loop... 83.76 BogoMIPS (lpj=418816)
Mount-cache hash table entries: 512
CPU: Testing write buffer ...
--

From: Dan Williams
Date: Saturday, May 1, 2010 - 4:28 pm

On Sat, May 1, 2010 at 4:04 PM, Linus Walleij

Could you simulate this by publishing more struct dma_chans than are
physically present, and then handle the muxing internal to the driver?
 Or am I misunderstanding the usage model?
--

From: Linus Walleij
Date: Saturday, May 1, 2010 - 4:48 pm

Yes, exactly that way. What I had in mind, at least.

Yours,
Linus Walleij
--

From: Dan Williams
Date: Saturday, May 1, 2010 - 4:25 pm

On Sat, May 1, 2010 at 3:44 PM, Russell King - ARM Linux

Ok, it will be good to have this approach vetted on a challenging
arch.  We'll see where things stand when the merge window opens.

--
Dan
--
