Re: [PATCH v2] PL330: Add PL330 DMA controller driver

Previous thread: x86_64 virtual memory map by hayfeng Lee on Wednesday, March 24, 2010 - 7:20 pm. (2 messages)

Next thread: [git pull] drm fixes by Dave Airlie on Wednesday, March 24, 2010 - 8:35 pm. (3 messages)
From: Joonyoung Shim
Date: Wednesday, March 24, 2010 - 8:17 pm

The PL330 is currently the dma controller using at the S5PC1XX arm SoC.
This supports DMA_MEMCPY and DMA_SLAVE.

The datasheet for the PL330 can find below url:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0424a/DDI0424A_dmac_pl330_r0p0_trm...

Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
---
Change log:

v2: Convert into an amba_device driver.
    Code clean and update from v1 patch review.

 drivers/dma/Kconfig        |    7 +
 drivers/dma/Makefile       |    1 +
 drivers/dma/pl330_dmac.c   |  900 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/pl330_dmac.h   |  175 +++++++++
 include/linux/amba/pl330.h |   64 ++++
 5 files changed, 1147 insertions(+), 0 deletions(-)
 create mode 100644 drivers/dma/pl330_dmac.c
 create mode 100644 drivers/dma/pl330_dmac.h
 create mode 100644 include/linux/amba/pl330.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index c27f80e..5989b6e 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -149,6 +149,13 @@ config AMCC_PPC440SPE_ADMA
 	help
 	  Enable support for the AMCC PPC440SPe RAID engines.
 
+config PL330_DMAC
+	bool "PrimeCell DMA Controller(PL330) support"
+	depends on ARCH_S5PC1XX
+	select DMA_ENGINE
+	help
+	  Enable support for the PrimeCell DMA Controller(PL330) support.
+
 config ARCH_HAS_ASYNC_TX_FIND_CHANNEL
 	bool
 
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 22bba3d..d98be12 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_TXX9_DMAC) += txx9dmac.o
 obj-$(CONFIG_SH_DMAE) += shdma.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
+obj-$(CONFIG_PL330_DMAC) += pl330_dmac.o
diff --git a/drivers/dma/pl330_dmac.c b/drivers/dma/pl330_dmac.c
new file mode 100644
index 0000000..4427110
--- /dev/null
+++ b/drivers/dma/pl330_dmac.c
@@ -0,0 +1,900 @@
+/*
+ * pl330_dmac.c  --  Driver for PL330 DMA Controller
+ *
+ * Copyright (C) 2009 Samsung Electronics ...
From: jassi brar
Date: Wednesday, March 24, 2010 - 10:34 pm

On Thu, Mar 25, 2010 at 12:17 PM, Joonyoung Shim

[CC'ing Russell and Ben as stakeholders]

Dear Maintainers

I too have been writing a driver for PL330 after taking into account the
suggestions of Russell, Ben and other participants of the thread
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-February/009856.html

If you don't think this driver conflicts with the theme of the thread,
may I ask you
to please put this driver on hold until you checkout my implementation
of solution
to the issue... which should be soon.

Regards,
Jaswinder Singh
Solution-1 Group
Linux Kernel Team
System LSI Division
Samsung Electronics Co., Ltd.
--

From: Linus Walleij
Date: Thursday, March 25, 2010 - 1:30 am

Please post the code as it looks today even if it's not compiling
instead of asking others
to hold their patches back. It will be obvious from what you have if
there is some special
use you're covering. Perhaps Joonyoung can simply port over the stuff
you need to
this driver if you show your code.

Yours,
Linus Walleij
--

From: jassi brar
Date: Thursday, March 25, 2010 - 5:17 am

On Thu, Mar 25, 2010 at 5:30 PM, Linus Walleij
My approach is to write a separate PL330 core driver as the backend which
can be reused by any DMA API implementer driver. That will avoid
having two copies
of the PL330 driver, among other benefits. And if this patch is accepted, there
_will_ exist two copies of the PL330 driver -- one in drivers/dma/pl330_dmac.c
and another in arch/arm/plat-samsung/. Only the former will be lying unused
until some other SoC vendor decided to use PL330, because S3C has come too
long a way to change its drivers to driver/dma/ API and modify DMA
drivers for every SoC.

I plan something like, arch/arm/common/pl330-core.c implementing the specs in
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0424a/DDI0424A_dmac_pl330_r0p0_trm...
and drivers/dma/pl330.c implement DMA API for SoCs that chose to use it...
and arch/arm/plat-samsung/dma-pl330.c implementing regular S3C DMA API.

I don't claim to have a silver bullet, nobody has atm, but my approach
is at least
more aligned with what maintainers want.

I have the pl330-core part almost ready, but i need time to implement
some _testable_
implementation of the scheme. If maintainers want to see structure of
my code, I can
Having worked on Samsung SoCs(with PL330 DMAC) based products, I would be
_very_ surprised if any user found this implementation useful.
Let alone testing, this implementation can't even explain usability
for fast peripherals
with shallow FIFOs. I didn't give feedback for this patch because I am
not sure if this
is the right way to go at all.

regards.
--

From: Dan Williams
Date: Thursday, March 25, 2010 - 8:13 am

This is the wrong attitude.  If it were not for a simple oversight 
Joonyoung's driver would already be upstream for the past two kernel 
releases.  So you need to work together to improve that driver to 
incorporate what you need.

It sounds like you just need to add an extension for the arch specific 
dma api.  At first glance this can mimic the approach taken by 
Nobuhiro-san with the shdma driver.

--
Dan

--

From: jassi brar
Date: Thursday, March 25, 2010 - 3:27 pm

Nothing wrong in attitude here.
Giving feedback on the code only comes after one is convinced with the
overall approach taken. The last time I raised the PL330 driver issue,
most people were not enthusiastic with this drivers/dma/ approach.
I wasn't active mainline discussions when the driver was originally
submitted a few months ago.
And now my replies are not very 'polite' because theres a lot going on
I actually plan more than that.
Apart from inefficient design, JoonYoung's driver has made some fatal
assumptions
about PL330, which will result in DMA aborts if used with SoCs that implement
configuration of PL330 that is very different from Samsung SoCs'
Of course, I address all such issues that I can think of, in my implementation.

regards.
--

From: Dan Williams
Date: Thursday, March 25, 2010 - 4:12 pm

Ok, I'll rely on acked-by's from the interested parties once your
driver is out as I do not have a vested interest in this hardware,
just the dmaengine framework issues.

--
Dan
--

From: jassi brar
Date: Thursday, March 25, 2010 - 4:59 pm

Fair enough. Thank you.
--

From: Kyungmin Park
Date: Thursday, March 25, 2010 - 5:29 pm

Exactly what? We are already tested it our board and play the music well.

As your previous mail, we can wait until this weekend. but If you
don't post the your codes until this weekend.
We assume that your works are not yet done so merge this patch first
and then fix it if your words(DMA aborts issue) are right.

Thank you,
Kyungmin Park
--

From: jassi brar
Date: Thursday, March 25, 2010 - 5:48 pm

On Fri, Mar 26, 2010 at 9:29 AM, Kyungmin Park
I didn't want to get into gory details, but it seems I need to explain .....

Please see my quick review to the original patch in this thread.
It's not upto you to give me any deadline. Not the "Until this weekend" one !!!
I think only the word of the maintainers count.
--

From: Joonyoung Shim
Date: Thursday, March 25, 2010 - 5:54 pm

I can wait your implementation and wonder what is the issue also.

I welcome you try other design and want better driver is committed at 
mainline kernel too.
--

From: jassi brar
Date: Thursday, March 25, 2010 - 6:01 pm

Good to have you at last reply to my posts
--

From: Linus Walleij
Date: Thursday, March 25, 2010 - 8:20 am

Why not just post it on the list? I'm curious! Since I'm working on a PrimeCell
DMA API I would love to look at PrimeCell DMA engine drivers.

Yours,
Linus Walleij
--

From: jassi brar
Date: Thursday, March 25, 2010 - 3:36 pm

On Fri, Mar 26, 2010 at 12:20 AM, Linus Walleij
The amount of code that will be modified or taken out of drivers/dma/pl330_dmac
will so much that I will be left only with constrained data structures in
the file to do tricks to make it work with the PL330 engine driver.
I am not very keen on authoring the driver/dma/ driver but neither am I
I'll post in a day or two when the PL330 core driver takes come shape
closer to what it is supposed to look.
That will help me getting suggestions for improvement, i hope.

regards.
--

From: jassi brar
Date: Wednesday, March 31, 2010 - 10:34 pm

On Fri, Mar 26, 2010 at 12:20 AM, Linus Walleij

Here is untested but only compilable PL330 engine driver, with the following
features/bugs/limitations ...

o  The DMA API driver submits 'request' to PL330 engine.
     A request is a sequence of DMA 'xfers' to be done before the DMA
API driver
     wants to be notified.
     A 'xfer' is the finest grain of DMA transaction that specifies
how many _bytes_
     are to be move from 'source address' to 'destination address'.
     A req can be a scatter-Gather-List.

o  PL330 engine accepts requests from DMA API drivers in ping-pong manner,
    i.e, at any time maximum two reqs can be queued. Other reqs have
    to be buffered by DMA API drivers and enqueued whenever a req-finish
    callback is made.

o  Only necessary resources for a DMAC are allocated according to the
configuration
    read from DMAC during startup.
    Since MicroCode buffers are a channel's resource, they are
allocated just once
    during startup and reused for every other xfer. A channel has its
MC buff divided into
    two equal parts - one for each ping-pong request. The DMA API driver can
    specify the size of MC buffer that each channel(hence req) has,
which decides the
    maximum possible total xfer size that the channel can do in one request.

o  It is possible to do DMA request at different channel configuration
every time; this feature can
    be used by DMA API drivers for implementing 'server' channels that
are not dedicated
    to a particular client but accept requests from any client. Of
course, such channels won't
    be able to guarantee any QOS.

o  Secure, Privilege and InsnAccess modes can be specified for each request.

o  TODO: Desirable is to implement true LLI using MicroCode
modification during each
    request enqueue, so that the xfer continues even while IRQ is
handled and callbacks made.
    To me, there doesn't seem to be a way to flush ICACHE of a channel
without halting it, so we
    can't modify MicroCode in ...
From: Linus Walleij
Date: Thursday, April 1, 2010 - 4:23 pm

Hi Jassi,

this is looking good.

The only advantage of the other driver by Joonyoung is that it is finished and
ready for integration. If you finalize the DMA devices/engine API and post
this in time for the next merge window I would easily vote for including this
one rather than the other one. (Whatever that means for the world.)
Simply for technical merits.

It's sad that you two have done duplicate work but such is life..

I understand it that as this is the core engine so you intend to keep the core
in arch/arm/common/* and then a separate interface to the DMAdevices
implementing <linux/dmaengine.h> in drivers/dma/ and this is what the
"DMA API" referenced below refers to?

In that case I really like this clear separation between hardware driver
and DMA devices/engine API. And I see that the DMA API is not far
away. If you implement it you will be able to excersise this with the
DMA engine memcpy test to assure it's working.

There is nothing wrong with moving this entire thing except the header
file into drivers/dma it will be more comfortable there, with the other
DMA drivers. Whether the header should be in include/linux/amba
or include/linux/dma is however a good question for the philosophers,
but I would stick it into linux/amba with the rest of the PrimeCells.
But perhaps you have better ideas.


This hints that there is some other patch to provide that API

This is great, do you also plan to support that for M<->M xfers like we
added for the DMA40? Then we might want to lift that into the generic


True, not as elegant as being able to do it with microcode but

From the DMA API level in the PrimeCell drivers the crucial driver that
need something like this is the AMBA PL011 UART driver, RX part,
where data comes in from the outside and we have no control over
the data flow. I trigger one transfer to a buffer here, then wait for it
to complete or be interrupted. If it completes, I immediately trigger
another transfer to the second buffer before I start ...
From: jassi brar
Date: Thursday, April 1, 2010 - 6:38 pm

On Fri, Apr 2, 2010 at 8:23 AM, Linus Walleij
Thank you. I am working on it full time. This submission was solely as
a preview, the
code needs to be optimized, polished and tested, though I don't anticipate much
changes in the API. By when should I submit the final version?

Yes, drivers/dma/pl330.c could be one DMA API driver.
Another may implement S3C DMA API and reside in arch/arm/plat-samsung/ or
wherever the S3C PL080 driver is atm.
So, DMA API driver is the frontend that makes use of the
arch/arm/common/pl330.c
Yes, implementing PL330 core was supposed to be the major challenge.
DMA API drivers should be easy, though I plan to first test the pl330.c with
S3C DMA API because that way I'll have a benchmark to compare the stability
and performance of this new driver.
Of course, I'll like to see driver/dma driver as well, but maybe JoonYoung wants
For platforms that choose to implement their own DMA API (how good or bad is a
different subject), arch/arm/common/ seems be more appropriate for this pl330.c
Right, and that is what I call Client/DMA-API driver. Though the patch is not
I hope the driver is efficient/fast enough for some tough test cases I have,
otherwise I might have to modify or add to the API to implement this
If I get it right, that is common issue with any 'slave-receiver' type device
and it might do good to have timeout option support in DMA API for such receive
requests. That is provide whatever data is collected within some
amount of time,
Even with this implementation, for PAUSE, the DMA API driver can call
pl330_chan_status, saving remaining requests locally and calling
That is mostly to self. It just means that every variable in the block
must be analyzed before making any change there.

I think it is considered good practice to export every symbol of an API.

As I said, this driver is designed to be usable with any DMA API, and
some, like S3C,
The check is there just for some robustness.

yes I was already thinking on similar lines. I'll merge them ...
From: Kyungmin Park
Date: Saturday, April 17, 2010 - 12:06 am

Hi,

If there's no comments. it's time to select to merge it.
I think it can't which is better. so I hope community select any one
Only consideration which is proper for linux and future usage

To Linux Walleij.
I hope it will merge your patch series together.

Thank you,
Kyungmin Park

--

From: jassi brar
Date: Sunday, April 18, 2010 - 6:14 pm

As i said, I am working full time on it.
PL330 core and S3C DMA API driver is done and it'll be a day or two
before I am done testing them.
So, around this wednesday I plan to submit patches for PL330 core driver
and the S3C-DMA-API driver for S5P-6442, C1XX and V210
Once the PL330 core is accepted, we can freeze it's API and start work on
DMA engine API driver for it.
--

From: Marc Zyngier
Date: Wednesday, March 24, 2010 - 10:44 pm

On Thu, 25 Mar 2010 12:17:15 +0900

desc_pool_virt is a virtual address (from dma_alloc_coherent). In such
case, write[bwl] seems to be the wrong interface. I suggest the
following code:



	M.
-- 
I'm the slime oozin' out from your TV set...
--

From: Joonyoung Shim
Date: Thursday, March 25, 2010 - 2:01 am

Right. The write[bwl] is api for address ioremapped of io device. I will

PL330 DMA controller fetches variable length instructions that consist of

--

From: Marc Zyngier
Date: Thursday, March 25, 2010 - 2:32 am

On Thu, 25 Mar 2010 18:01:00 +0900, Joonyoung Shim

I'm not too concerned about the device side of things. I'm more worried
about the CPU access when writing the 'imm' value to memory.

Consider desc_pool_virt 16bit aligned when entering the function. Writing
the opcode makes it unaligned and then writing the 'imm' value will result
as an unaligned access.

        M.
-- 
Who you jivin' with that Cosmik Debris?
--

From: Joonyoung Shim
Date: Thursday, March 25, 2010 - 3:05 am

Why desc_pool_virt should be aligned more than 16bit?
--

From: Marc Zyngier
Date: Thursday, March 25, 2010 - 3:32 am

On Thu, 25 Mar 2010 19:05:47 +0900, Joonyoung Shim

There is reason for desc_pool_virt to be 16bit aligned. It's just that you
have 50% chance that it will.
In such case, you will write 'imm' to a non 16bit-aligned address. In my
book, that's bad.

Same for pl330_dmamov(), which tries to write a 32bit value without
checking the proper alignment.
In such case, please use the put_unaligned macro to handle the possible
unaligned access.

        M.
-- 
Who you jivin' with that Cosmik Debris?
--

From: Joonyoung Shim
Date: Thursday, March 25, 2010 - 4:48 am

OK. i will use put_unaligned.
Thanks.
--

From: Linus Walleij
Date: Thursday, March 25, 2010 - 1:26 am

Looks good to me now so:
Acked-by: Linus Walleij <linus.walleij@stericsson.com>

Linus Walleij
--

From: jassi brar
Date: Thursday, March 25, 2010 - 7:08 pm

On Thu, Mar 25, 2010 at 12:17 PM, Joonyoung Shim

Here goes some quick technical feedback of the code.
Please remember that these issues are only secondary.
The primary drawback is the approach that this patch adopts,
as already explained in other posts.

As already mentioned by Marc, it doesn't have to be read/write.
PL330 specifies the microcode buffers to be on system memory and that
need not be treated like ioports.

These allocations are inefficient and don't need to be done so often.
My implementation allocates a pool of such buffers(size specified by
DMA API driver)
and manage them by simple pointer manipulation.

This limit, though not serious, is unconditionally imposed by your design.
There are ways to get around this situation by smarter generation of
Again, the DMA API drivers will suffer just because someone didn't care
to generate microcode efficiently.
The microcode template for xfer takes only about 50 bytes and despite
having PL330_POOL_SIZE buffer, you have to drop xfer requests just because
the template is not properly designed.
My implementation is limited only by the microcode buffer size, which in turn
This instruction generation leaves no scope for Security permissions for xfers,
that is a feature of PL330.

I think the DMA API already prohibits doing non-irq-context things(like enqueue)
in the callbacks, so why implement tasklets here?
This may still get you "audio working fine" with Samsung I2S controller,
but is likely to cause problems with more demanding peripherals like SPDIF
if they operate at best QOS(even 24bits/sample Stereo at 96000Hz) and has
shallow FIFO(8 samples deep and hence 84 usecs acceptable latency).
Remember that SPDIF usually goes with other system load like HDMI HD
playaback which only increases the interrupt latency.

Not to forget, the overall throughput hit taken by other dma clients,
like MMC over SPI that use 256/512 bytes DMA xfers, due to delayed
DMA-DONE notifications.

Also, using tasklet here may break any ...
From: Ben Dooks
Date: Tuesday, March 30, 2010 - 6:07 pm

-- 
-- 
Ben

Q:      What's a light-year?
A:      One-third less calories than a regular year.

--

From: jassi brar
Date: Tuesday, March 30, 2010 - 6:40 pm

I meant during 'probe' of the DMAC a chunk of dma consistent memory is allocated
for MicroCode for each channel.
We use the same chunk during xfers, since we can generate MC in a way that 256
bytes are enough to do xfer of 2.5MB at burst size of 1 byte and for
bulkier requests
the DMA API driver can either break the bigger request or allocate
bigger chunk for
the channels.
--

Previous thread: x86_64 virtual memory map by hayfeng Lee on Wednesday, March 24, 2010 - 7:20 pm. (2 messages)

Next thread: [git pull] drm fixes by Dave Airlie on Wednesday, March 24, 2010 - 8:35 pm. (3 messages)