Re: [RFC][PATCH] add dma_reserve_coherent_memory()/dma_free_reserved_memory() API

From: Marin Mitov
Date: Thursday, August 19, 2010 - 8:18 am

Hi all,

struct device contains a member: struct dma_coherent_mem *dma_mem;
to hold information for a piece of memory declared dma-coherent.
Alternatively the same member could also be used to hold preallocated
dma-coherent memory for later per-device use.

This trick is already used in drivers/staging/dt3155v4l.c
dt3155_alloc_coherent()/dt3155_free_coherent()

It is proposed here for general use by popular demand from video4linux folks.
It helps the videobuf-dma-contig framework.

Signed-off-by: Marin Mitov <mitov@issp.bas.bg>

======================================================================
--- a/drivers/base/dma-coherent.c	2010-08-19 15:50:42.000000000 +0300
+++ b/drivers/base/dma-coherent.c	2010-08-19 17:27:56.000000000 +0300
@@ -93,6 +93,83 @@ void *dma_mark_declared_memory_occupied(
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 
 /**
+ * dma_reserve_coherent_memory() - reserve coherent memory for per-device use
+ *
+ * @dev:	device from which we allocate memory
+ * @size:	size of requested memory area in bytes
+ * @flags:	same as in dma_declare_coherent_memory()
+ *
+ * This function reserves coherent memory allocating it early (during probe())
+ * to support later allocations from per-device coherent memory pools.
+ * For a given device one could use either dma_declare_coherent_memory() or
+ * dma_reserve_coherent_memory(), but not both, because the result of these
+ * functions is stored in a single struct device member - dma_mem.
+ *
+ * Returns DMA_MEMORY_MAP on success, or 0 if failed.
+ * (same as dma_declare_coherent_memory())
+ */
+int dma_reserve_coherent_memory(struct device *dev, size_t size, int flags)
+{
+	struct dma_coherent_mem *mem;
+	dma_addr_t dev_base;
+	int pages = size >> PAGE_SHIFT;
+	int bitmap_size = BITS_TO_LONGS(pages) * sizeof(long);
+
+	if ((flags & DMA_MEMORY_MAP) == 0)
+		goto out;
+	if (!size)
+		goto out;
+	if (dev->dma_mem)
+		goto out;
+
+	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	if (!mem)
+		goto ...
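
The patch body is cut off above. As a rough illustration of the idea in the description only (not the patch itself), here is a user-space sketch of a per-device pool that is reserved once and then handed out page by page through a bitmap. Every name in it (fake_device, fake_pool, pool_reserve(), pool_alloc(), pool_free(), pool_release()) is hypothetical, malloc() stands in for the early coherent allocation, 4 KiB pages are assumed, and a simple first-fit search is used for brevity where the kernel allocates power-of-two regions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SK_PAGE_SIZE ((size_t)1 << SK_PAGE_SHIFT)
#define SK_BPL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for struct dma_coherent_mem. */
struct fake_pool {
	unsigned char *base;	/* start of the reserved area */
	size_t pages;		/* whole pages in the area */
	unsigned long *bitmap;	/* one bit per page, set = in use */
};

/* Hypothetical stand-in for struct device; only dma_mem matters here. */
struct fake_device {
	struct fake_pool *dma_mem;
};

static int sk_test(const unsigned long *map, size_t i)
{
	return (map[i / SK_BPL] >> (i % SK_BPL)) & 1UL;
}

static void sk_assign(unsigned long *map, size_t i, int used)
{
	if (used)
		map[i / SK_BPL] |= 1UL << (i % SK_BPL);
	else
		map[i / SK_BPL] &= ~(1UL << (i % SK_BPL));
}

/* Reserve the pool once (think probe()). Returns 0 on success, -1 on failure. */
int pool_reserve(struct fake_device *dev, size_t size)
{
	size_t pages = size >> SK_PAGE_SHIFT;
	struct fake_pool *mem;

	if (!pages || dev->dma_mem)	/* nothing to reserve, or slot taken */
		return -1;
	mem = calloc(1, sizeof(*mem));
	if (!mem)
		return -1;
	mem->base = malloc(pages << SK_PAGE_SHIFT);
	mem->bitmap = calloc((pages + SK_BPL - 1) / SK_BPL, sizeof(unsigned long));
	if (!mem->base || !mem->bitmap) {
		free(mem->base);
		free(mem->bitmap);
		free(mem);
		return -1;
	}
	mem->pages = pages;
	dev->dma_mem = mem;
	return 0;
}

/* First-fit allocation, rounded up to whole pages. NULL if no pool or no room. */
void *pool_alloc(struct fake_device *dev, size_t size)
{
	struct fake_pool *mem = dev->dma_mem;
	size_t need = (size + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;
	size_t start, i;

	if (!mem || !need || need > mem->pages)
		return NULL;
	for (start = 0; start + need <= mem->pages; start++) {
		for (i = 0; i < need && !sk_test(mem->bitmap, start + i); i++)
			;
		if (i < need)
			continue;	/* run interrupted by a used page */
		for (i = 0; i < need; i++)
			sk_assign(mem->bitmap, start + i, 1);
		return mem->base + (start << SK_PAGE_SHIFT);
	}
	return NULL;
}

/* Give pages back; ptr and size must match an earlier pool_alloc(). */
void pool_free(struct fake_device *dev, void *ptr, size_t size)
{
	struct fake_pool *mem = dev->dma_mem;
	size_t first = (size_t)((unsigned char *)ptr - mem->base) >> SK_PAGE_SHIFT;
	size_t need = (size + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;
	size_t i;

	for (i = 0; i < need; i++)
		sk_assign(mem->bitmap, first + i, 0);
}

/* Drop the whole pool (think remove()). */
void pool_release(struct fake_device *dev)
{
	if (!dev->dma_mem)
		return;
	free(dev->dma_mem->base);
	free(dev->dma_mem->bitmap);
	free(dev->dma_mem);
	dev->dma_mem = NULL;
}
```

In this model a device whose dma_mem is NULL simply gets NULL back from pool_alloc(), which mirrors the point that only the device owning the pool is affected, and a second pool_reserve() on the same device is refused because there is only one dma_mem slot.
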
From: FUJITA Tomonori
Date: Friday, August 20, 2010 - 12:17 am

On Thu, 19 Aug 2010 18:18:35 +0300

I think that drivers/base/dma-coherent.c is for architectures to
implement dma_alloc_coherent(). So using it for drivers doesn't look

What exactly do you guys want to do? If you just want to pre-allocate
coherent memory for later use, why doesn't the dma_pool API
(mm/dmapool.c) work?
--

From: Marin Mitov
Date: Friday, August 20, 2010 - 1:13 am

It depends. Imagine your frame grabber has a built-in RAM buffer on board,
just like the frame buffer RAM on graphics cards, defined in a BAR. You can use
dma_declare_coherent_memory()/dma_release_declared_memory() in
your driver and then use dma_alloc_coherent()/dma_free_coherent()
to allocate dma buffers from it and falling back transparently to system RAM

Yes, just to preallocate contiguous (rather than coherent) memory for later use.

I do not know why the dma_pool API doesn't work for frame grabber buffers.
Maybe they are too big (~400KB). I tried the dma_pool APIs without success
some time ago, so I had to find some other way to solve my problem, which led to
the proposed dma_reserve_coherent_memory()/dma_free_reserved_memory().

Thanks.

Marin Mitov


--

From: FUJITA Tomonori
Date: Friday, August 20, 2010 - 1:35 am

On Fri, 20 Aug 2010 11:13:45 +0300

Hmm, you don't care about coherency? You just need contiguous memory?

Then, I prefer to invent the API to allocate contiguous

I think that dma_pool API is for small coherent memory (smaller than
PAGE_SIZE) so it might not work for you. However, the purpose of
dma_pool API is exactly for what you want to do, creating a pool for
coherent memory per device for drivers.

I don't see any reason why we can't extend the dma_pool API for your
case. And it looks better to me rather than inventing the new API.
--

From: Marin Mitov
Date: Friday, August 20, 2010 - 4:50 am

Yes. We just need contiguous memory. Coherency matters only insofar as user land
must be able to see the new data once the dma transfer finishes. Could be done by something like

Sure, but in any case the videobuf-dma-contig framework in drivers/media/video
is already built around dma-coherent memory (even though it is precious), so the two new
functions are just a helpful extension to the existing use of dma-coherent memory.

In any case, since these two functions will be used mainly by media/video folks,
they could be added not to drivers/base/dma-coherent.c (where I see their place),
but to drivers/media/video/videobuf-dma-contig.c. In that case the disadvantage is
that if someone outside the media tree needs this functionality, he (she) will need to


That will help. I will be happy if someone can do it. I am impatiently waiting for
an alloc_hugepages()/free_hugepages() API (the transparent hugepages patches, maybe).
That could also be a solution for media/video folks with hardware that cannot do
scatter/gather. Another solution would be an IOMMU that could present a scattered
user land buffer as a contiguous dma address range (I have played in the past with
AGP-GART without great success).

Thanks.

Marin Mitov

 
--

From: FUJITA Tomonori
Date: Wednesday, August 25, 2010 - 10:40 pm

On Fri, 20 Aug 2010 14:50:12 +0300

Then, we should avoid using coherent memory as I explained before. In
addition, dma_alloc_coherent can't provide large enough contiguous
memory for some drivers so this patch doesn't help much.

We need the proper API for contiguous memory. It seems that we could
have something:

http://lkml.org/lkml/2010/8/20/167
--

From: Marin Mitov
Date: Wednesday, August 25, 2010 - 11:04 pm

Please, look at drivers/media/video/videobuf-dma-contig.c. Using coherent memory
is unavoidable for now; there is no alternative to it. The two new functions
which I propose are just helpers for those of us who already use coherent memory
(via the videobuf-dma-contig API). Maybe adding these two functions to
drivers/media/video/videobuf-dma-contig.c would be a better solution?

Thanks.

--

From: FUJITA Tomonori
Date: Wednesday, August 25, 2010 - 11:24 pm

On Thu, 26 Aug 2010 09:04:14 +0300

If you add something to the videobuf-dma-contig API, that's fine by me
because drivers/media/video/videobuf-dma-contig.c uses its own
structure and plays with dma_alloc_coherent. As long as a driver
doesn't touch device->dma_mem directly, it's fine, I think (that is,
the dt3155v4l driver is broken). There are already some workarounds for
contiguous memory in several drivers anyway.

We will have the proper API for contiguous memory. I don't think that
adding such workaround to the DMA API is a good idea.
--

From: Marin Mitov
Date: Thursday, August 26, 2010 - 12:01 am

Why, my understanding is that device->dma_mem is designed exactly for keeping 
some chunk of coherent memory for device's private use via dma_alloc_from_coherent()

If you mean the allocation of some coherent memory (4MB in the case of dt3155v4l) during
pci probe() (during system boot) for the device's later use (memory that is dead for the rest
of the system), you are right. But this gives me at least 8 full size buffers guaranteed for
later use. Without this hack the hardware will not work on a strongly fragmented system.
With this hack, even if the system is strongly fragmented, this chunk of 4MB is available
for use (through the videobuf-dma-contig APIs and dma_alloc_from_coherent()) __transparently__
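
The "8 full size buffers" figure can be reproduced with a little arithmetic, under assumptions that are not stated in this message: 4 KiB pages, a buffer of roughly 400 KB (the size mentioned earlier in the thread), and a per-device allocator that rounds each request up to a power-of-two number of pages, which is, as far as I can tell, how dma_alloc_from_coherent() sized its regions at the time. The helper names order_for() and buffers_in_pool() are hypothetical.

```c
#include <stddef.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages cover `size` bytes. */
unsigned int order_for(size_t size)
{
	size_t pages = (size + ((size_t)1 << SK_PAGE_SHIFT) - 1) >> SK_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/*
 * Number of buffers of `buf_size` bytes that fit in a pool of `pool_size`
 * bytes when every buffer occupies a power-of-two number of pages.
 */
size_t buffers_in_pool(size_t pool_size, size_t buf_size)
{
	size_t pool_pages = pool_size >> SK_PAGE_SHIFT;

	return pool_pages >> order_for(buf_size);
}
```

With these assumptions a 400 KB buffer needs 100 pages and is rounded up to order 7 (128 pages, 512 KiB), so a 4 MiB pool of 1024 pages holds 1024 / 128 = 8 such buffers.
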

Sure, can these workarounds be exposed as an API for general use?

Thanks,

--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 2:43 am

On Thu, 26 Aug 2010 10:01:52 +0300

I don't think so. device->dma_mem can be accessed only via the
DMA-API. I think that the DMA-API says that
dma_declare_coherent_memory declares coherent memory that can be
accessed exclusively by a certain device. It's not for reserving
coherent memory that can be used for any device for a device.

Anyway, you don't need coherent memory. So using the API for coherent

I don't think that's a good idea. Adding a temporary workaround to the
generic API and removing it soon after that doesn't sound like a good
development practice.
--

From: Marin Mitov
Date: Thursday, August 26, 2010 - 3:14 am

Here I disagree with you: "that can be used for any device for a device".
Reserved coherent memory can be only and exclusively used by 
the __same__ device whose device->dma_mem is touched. No other devices 
are influenced because their device->dma_mem is NULL, and
dma_alloc_from_coherent() is not invoked for them. That is why I think
this hack is not dangerous. If some device driver decides to reserve some
chunk of memory it is for its private use and no other device in the system

Here I agree with you, but for now we have no alternative in media/video

Yes, it is just a temporary solution. Just enhancing an existing temporary solution.

Thanks,

--

From: Guennadi Liakhovetski
Date: Thursday, August 26, 2010 - 2:06 am

No, this will not work - this API has to be used from board code and 

We have currently a number of boards broken in the mainline. They must be 
fixed for 2.6.36. I don't think the mentioned API will do this for us. So, 
as I suggested earlier, we need either this or my patch series

http://thread.gmane.org/gmane.linux.ports.sh.devel/8595

for 2.6.36.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--

From: Uwe Kleine-König
Date: Thursday, August 26, 2010 - 2:17 am

Hello,

this seems to be more mature to me.  The original patch in this thread
uses a symbol DT3155_COH_FLAGS which seems misplaced in generic code and
doesn't put the new functions in a header.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--

From: Marin Mitov
Date: Thursday, August 26, 2010 - 3:18 am

You are right. DT3155_COH_FLAGS should be defined, and a declaration should be 
put in the headers.

But it is just RFC :-)

--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 2:30 am

On Thu, 26 Aug 2010 11:06:20 +0200 (CEST)

Why can't you revert a commit that causes the regression?

The related DMA API wasn't changed in 2.6.36-rc1. The DMA API is not
responsible for the regression. And the patchset even extends the
definition of the DMA API (dma_declare_coherent_memory). Such a change
shouldn't be applied after rc1. I think that DMA-API.txt says that
dma_declare_coherent_memory() handles coherent memory for a particular
device. It's not for the API that reserves coherent memory that can be
used for any device for a single device.
--

From: Guennadi Liakhovetski
Date: Thursday, August 26, 2010 - 2:45 am

See this reply, and the complete thread too.


Anyway, we need a way to fix the regression.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 2:51 am

On Thu, 26 Aug 2010 11:45:58 +0200 (CEST)

Needs to find a different way.
--

From: Russell King - ARM Linux
Date: Thursday, August 26, 2010 - 10:49 am

No.  ioremap on memory mapped by the kernel is just plain not permitted
with ARMv6 and ARMv7 architectures.

It's not something you can say "oh, need to find another way" because there
is _no_ software solution to having physical regions mapped multiple times
with different attributes.  It's an architectural restriction.

We can't unmap the kernel's memory mapping either, as I've already explained
several times this month - and I'm getting frustrated at having to keep
on explaining that point.

Just accept the plain fact that multiple mappings of the same physical
regions have become illegal.

What we need is another alternative other than using ioremap on memory
already mapped by the kernel - eg, by reserving a certain chunk of
memory for this purpose at boot time which is _never_ mapped by the
kernel, except via ioremap.
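
The rule described here can be pictured as a simple range check: a remap request is refused if it overlaps RAM that the kernel already maps, and allowed for a carved-out region that the kernel never mapped. The following is only a user-space model with hypothetical names (mem_region, remap_allowed()) and made-up addresses; it is not the ARM implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one physical memory range. */
struct mem_region {
	uint64_t start;		/* first byte of the range */
	uint64_t size;		/* length in bytes */
	bool kernel_mapped;	/* true if the kernel's own mapping covers it */
};

/* True if [a, a + alen) and [b, b + blen) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return alen && blen && a < b + blen && b < a + alen;
}

/* A remap is allowed only if it touches no kernel-mapped range. */
bool remap_allowed(const struct mem_region *map, size_t n,
		   uint64_t phys, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].kernel_mapped &&
		    ranges_overlap(phys, size, map[i].start, map[i].size))
			return false;
	return true;
}
```

In this model, a chunk set aside at boot would appear as a region with kernel_mapped set to false, so remapping it with different attributes never creates a second mapping of the same physical pages.
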
--

From: Marin Mitov
Date: Thursday, August 26, 2010 - 11:32 am

Hi Russell,

Precisely because ioremap on memory mapped by the kernel is not permitted,
I have proposed a new pair of functions: dma_reserve_coherent_memory()/dma_free_reserved_memory()

http://lkml.org/lkml/2010/8/19/200

but it has not been well received by the community.
What is your opinion?

Thanks,

--

From: Uwe Kleine-König
Date: Thursday, August 26, 2010 - 2:53 am

The patch that made the problem obvious for ARM is
309caa9cc6ff39d261264ec4ff10e29489afc8f8 aka v2.6.36-rc1~591^2~2^4~12.
So this went in before v2.6.36-rc1.  One of the "architectures with
similar restrictions" is x86 BTW.

And no, we won't revert 309caa9cc6ff39d261264ec4ff10e29489afc8f8 as it
addresses a hardware restriction.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 3:00 am

On Thu, 26 Aug 2010 11:53:11 +0200

How were these drivers able to work without hitting the hardware restriction?
--

From: Russell King - ARM Linux
Date: Thursday, August 26, 2010 - 10:54 am

Well, OMAP processors have experienced lock-ups due to multiple mappings of
memory, so the restriction in the architecture manual is for real.

But more the issue is that the behaviour you get from a region is _totally_
unpredictable (as the arch manual says).  With the VIPT caches, they can
be searched irrespective of whether the page tables indicate that it's
supposed to be cached or not - which means you can still hit cache lines
for an ioremap'd region.

And if you do, how do you know that the cached data is still valid - what
if it's some critical data that results in corruption - how do you know
whether that's happened or not?  It might not even cause a kernel
exception.

We have to adhere to the restrictions placed upon us by the architecture
at hand, and if that means device drivers break, so be it - at least we
get to know what needs to be fixed for these restrictions.
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 5:26 pm

On Thu, 26 Aug 2010 18:54:40 +0100

I didn't say the commit is technically wrong. I simply meant that the
commit broke some working systems (hence the complaints, I guess).

As I wrote, the related DMA API wasn't changed in 2.6.36-rc1. It's not
related to the regression at all. As long as nobody tries to extend
the API wrongly after rc2, I have no complaint.

btw, Marin Mitov said that these drivers don't need coherent memory,
they just want contiguous memory. Telling the page allocator to
reserve some memory at boot time is enough, I guess.
--

From: Uwe Kleine-König
Date: Thursday, August 26, 2010 - 9:41 pm

Hello,

In my case the machine in question is an ARMv5; the hardware restriction
applies to ARMv6+ only.  You could argue that the breaking patch for ARM
should therefore only break ARMv6+, but I don't think this is sensible from a
maintainer's POV.  We need an API that works independent of the machine
that runs the code.  And it's good to let developers who don't have the full
range of machines supported by the kernel at hand notice when they
introduce an incompatibility.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 10:00 pm

On Fri, 27 Aug 2010 06:41:42 +0200

Agreed. But insisting that the DMA API needs to be extended wrongly
after rc2 to fix the regression is not sensible either. The related DMA
API wasn't changed in 2.6.36-rc1. The API isn't responsible for the
regression at all.
--

From: Uwe Kleine-König
Date: Thursday, August 26, 2010 - 10:19 pm

Hey,

I think this isn't about "responsibility".  Someone in arm-land found
that the way dma memory allocation worked for some time doesn't work
anymore on new generation chips.  As pointing out this problem was
expected to find some matches it was merged in the merge window.  One
such match is the current usage of the DMA API that doesn't currently
offer a way to do it right, so it needs a patch, no?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 10:57 pm

On Fri, 27 Aug 2010 07:19:07 +0200

No, I don't think so. We are talking about a regression, right?

On new generation chips, something that has worked on old chips for
some time often doesn't work. It's not a regression. I don't
think that it's sensible to make a large change (especially after rc1)
to fix such an issue. If you say that the DMA API doesn't work on new
chips and propose a patch for the next merge window, that's sensible, I
suppose.

Btw, the patch isn't a fix for the DMA API. It tries to extend the DMA
API (and IMO in the wrong way). In addition, the patch might break the
current code. I really don't think that applying such a patch after rc1
is sensible.
--

From: Uwe Kleine-König
Date: Thursday, August 26, 2010 - 11:13 pm

Hello,

So you suggest to revert 309caa9cc6ff39d261264ec4ff10e29489afc8f8 or at
least restrict it to ARMv6+ and fix the problem during the next merge
window?  Russell?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--

From: Marin Mitov
Date: Thursday, August 26, 2010 - 11:23 pm

To "break the current code" is simply not possible. Sorry to disagree. As you have written,
the patch would "extend the DMA API", so if you do not use the new API (and no current code
is using it) you cannot "break the current code".

Thanks,

--

From: FUJITA Tomonori
Date: Thursday, August 26, 2010 - 11:32 pm

On Fri, 27 Aug 2010 09:23:21 +0300

It looks like the patch adds a new API that touches the existing
code. It means the existing code could break. So the existing API
could break too.

http://thread.gmane.org/gmane.linux.ports.sh.devel/8595
--

From: Uwe Kleine-König
Date: Thursday, August 26, 2010 - 11:38 pm

Hello,

I'm still trying to find out what you actually suggest we should do now.
Maybe this is a request for a minimal "fix" without the cleanups
Guennadi did?  That is only patches 2(?), 4 and 5 of the series?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--

From: Marin Mitov
Date: Friday, August 27, 2010 - 11:14 pm

The above reference is not my patch. I am speaking about my patch:

http://lkml.org/lkml/2010/8/19/200

The only point where my patch touches the existing code is struct device's member dma_mem,
and only on condition that you __use__ the new API, so you can decide for yourself whether it
could break the current code. As long as one does not use the new API, nothing is touched and
nothing can break. If one uses the new API, only that user can suffer if the new API has
bugs.

Thanks,

Marin Mitov
--

From: FUJITA Tomonori
Date: Saturday, August 28, 2010 - 12:10 am

On Sat, 28 Aug 2010 09:14:25 +0300

I think that I already NACK'ed the patch.

1) drivers/media/videobuf-dma-contig.c should not use
dma_alloc_coherent. We shouldn't support the proposed API.

2) I don't think that the DMA API (drivers/base/dma-mapping.c) is
for creating "cache". Generally, the kernel uses the "pool" concept for
something like that.

IMHO, reverting the commit 309caa9cc6ff39d261264ec4ff10e29489afc8f8
temporarily (or temporarily disabling it for systems that had worked) is
the most reasonable approach. I don't think that breaking systems that
had worked is a good idea even if the patch does the right thing. I
believe that we need to fix the broken solution
(videobuf-dma-contig.c) before the commit.
--

From: Marin Mitov
Date: Saturday, August 28, 2010 - 12:19 am

OK.

Thanks,

--

From: FUJITA Tomonori
Date: Sunday, October 10, 2010 - 7:08 am

On Fri, 20 Aug 2010 14:50:12 +0300

Is anyone working on this?

KAMEZAWA posted a patch to improve the generic page allocator to
allocate physically contiguous memory. He said that he can push it
into mainline.

The approach enables us to solve this issue without adding any new
API.
--

From: Marin Mitov
Date: Sunday, October 10, 2010 - 7:36 am

I am waiting for the new videobuf2 framework to become part of the kernel.
Then KAMEZAWA's improvements can help.

--

From: Guennadi Liakhovetski
Date: Sunday, October 10, 2010 - 11:21 am

You probably have seen this related thread: 
http://marc.info/?t=128644473600004&r=1&w=2

Thanks

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--

From: Marin Mitov
Date: Sunday, October 10, 2010 - 11:48 am

Thanks.

--

From: KAMEZAWA Hiroyuki
Date: Wednesday, October 13, 2010 - 1:04 am

On Sun, 10 Oct 2010 23:08:22 +0900
I said I do make an effort ;)
New one here.

http://lkml.org/lkml/2010/10/12/421

Thanks,
-Kame

--

From: Marin Mitov
Date: Wednesday, October 13, 2010 - 9:42 am

I like the patch. The possibility to allocate a contiguous chunk of memory
(or few of them) is what I need. The next step will be to get a dma handle 
(for dma transfers to/from) and then mmap them to user space.

Thanks.

--

From: FUJITA Tomonori
Date: Thursday, October 14, 2010 - 12:16 am

On Wed, 13 Oct 2010 19:42:56 +0300


Let's help him to push this patch to upstream first. The next step is
a different issue (and the dma stuff isn't even a problem; we can
handle it with the current API).
--

From: Guennadi Liakhovetski
Date: Friday, August 20, 2010 - 1:05 pm

Ok, so, we've got two solutions to this problem submitted on the same 
day;) Following this thread:

http://marc.info/?t=128128236400002&r=1&w=2

on the ARM Linux kernel ML, I submitted a patch series

http://thread.gmane.org/gmane.linux.ports.sh.devel/8595

with a couple of fixes and improvements, the actual new API and a use 
example. My approach is slightly different, in that instead of requiring 
drivers to issue two calls - one to reserve RAM (usually 
dma_alloc_coherent()) and one to assign it to a device, my patch follows 
the suggestion from Russell King from the first thread and unites these 
two operations. So, now we have a choice;) Unfortunately, these two patch 
series went to orthogonal sets of recipients, I'm trying to fix this by 
adding a couple of CC entries.

Thanks

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--
