When a DMA device is unregistered, its reference count is decremented
twice for each channel: once in dma_class_dev_release() and once in
dma_chan_cleanup(). This may result in the DMA device driver's
remove() function completing before all channels have been cleaned
up, causing lots of use-after-free fun.

Fix it by incrementing the device's reference count twice for each
channel during registration.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
---
I'm not sure if this is the correct way to solve it, but it seems to
work. The remove() function does not hang, which indicates that the
device's reference count does drop all the way to zero on
unregistration, which in turn indicates that it did actually drop
_below_ zero before.

 drivers/dma/dmaengine.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8248992..302eded 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -397,6 +397,8 @@ int dma_async_device_register(struct dma_device *device)
 			goto err_out;
 		}

+		/* One for the channel, one of the class device */
+		kref_get(&device->refcount);
 		kref_get(&device->refcount);
 		kref_init(&chan->refcount);
 		chan->slow_ref = 0;
--
1.5.2.5
-
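The imbalance described in the patch above can be modelled in user space. What follows is a minimal sketch, not kernel code: toy_run() and struct toy_result are hypothetical names, the counter is a plain int rather than a struct kref, and the extra reference held by the registration itself (dropped once at unregistration) is an assumption of the model. It replays two puts per channel against either one or two gets per channel and reports when the count first reaches zero.

```c
#include <assert.h>

/* Hypothetical result record for the toy model; not a kernel type. */
struct toy_result {
	int final_count;   /* count after all puts; negative means over-released */
	int puts_at_zero;  /* number of puts done when the count first hit 0, or -1 */
	int total_puts;    /* puts performed during "unregistration" */
};

/*
 * Model one register/unregister cycle for a device with nchan channels.
 * gets_per_chan is how many device references registration takes per
 * channel (1 = before the patch, 2 = with the patch applied).
 */
static struct toy_result toy_run(int nchan, int gets_per_chan)
{
	struct toy_result res = { 0, -1, 0 };
	int count = 1;  /* assumed: one reference held by the registration itself */
	int i;

	assert(nchan >= 0 && gets_per_chan >= 0);

	/* Registration: take gets_per_chan references for every channel. */
	for (i = 0; i < nchan; i++)
		count += gets_per_chan;

	/*
	 * Unregistration: two puts per channel (class device release and
	 * channel cleanup) plus one put for the registration reference.
	 */
	for (i = 0; i < 2 * nchan + 1; i++) {
		count--;
		res.total_puts++;
		if (count == 0 && res.puts_at_zero < 0)
			res.puts_at_zero = res.total_puts;
	}

	res.final_count = count;
	return res;
}
```

With four channels and one get per channel, the model reaches zero after five of the nine puts, so the remaining four run against a device that has already been released. With two gets per channel, zero is reached only on the last put.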
As Dan said, we've been discussing this offline, and hadn't come to an
agreement yet. My version of the patch is the opposite of yours -
instead of adding a kref_get(), I remove one of the kref_put() calls.
--

When a channel is removed from dmaengine, too many kref_put() calls
are made and the device removal happens too soon, usually causing
a panic.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
---

 drivers/dma/dmaengine.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8248992..144a1b7 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -131,7 +131,6 @@ static void dma_async_device_cleanup(struct kref *kref);
 static void dma_class_dev_release(struct class_device *cd)
 {
 	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
-	kref_put(&chan->device->refcount, dma_async_device_cleanup);
 }

 static struct class dma_devclass = {
-
Yeah, Shannon ran into this too... I'd like to be able to clean this up by
reducing the number of times we take the device reference, but the
following patch is still showing problems in Shannon's environment, so I
missed one...

---
dmaengine: fix up dma_device refcounting

From: Dan Williams <dan.j.williams@intel.com>

Currently the code drops too many references on the parent device. Change
the scheme to:

+ take a reference at registration:
	dma_async_device_register()
+ take a reference for each channel device registered:
	device_register(&chan->dev)
- drop a reference for each channel device unregistered:
	device_unregister(&chan->dev)
- drop a reference at unregistration:
	dma_async_device_unregister()

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/dma/dmaengine.c |   16 ++++------------
 1 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 84257f7..d2b600b 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -186,10 +186,9 @@ static void dma_client_chan_alloc(struct dma_client *client)
 				/* we are done once this client rejects
 				 * an available resource
 				 */
-				if (ack == DMA_ACK) {
+				if (ack == DMA_ACK)
 					dma_chan_get(chan);
-					kref_get(&device->refcount);
-				} else if (ack == DMA_NAK)
+				else if (ack == DMA_NAK)
 					return;
 			}
 		}
@@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
 {
 	struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
 	chan->device->device_free_chan_resources(chan);
-	kref_put(&chan->device->refcount, dma_async_device_cleanup);
 }
 EXPORT_SYMBOL(dma_chan_cleanup);

@@ -276,11 +274,8 @@ static void dma_clients_notify_removed(struct dma_chan *chan)
 		/* client was holding resources for this channel so
 		 * free it
 		 */
-		if (ack == DMA_ACK) {
+		if (ack == DMA_ACK)
 			dma_chan_put(chan);
-			kref_pu...
On Fri, 26 Oct 2007 09:36:17 -0700
While I can't see any problems with the rest of the patch, I think this
part is wrong for the same reasons removing the kref_put() from the
class device cleanup function is. I don't see any constraint that
guarantees that dma_chan_cleanup() will always be called before
dma_dev_release(), which means that "chan" may have been freed before
this function gets a chance to run. Please correct me if I'm wrong.

Håvard
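The ordering concern raised here can be illustrated with a toy model. This is only a sketch under stated assumptions: the enum values and toy_first_late_event() are hypothetical names, the counter is a plain int rather than a struct kref, and the model treats a single channel with three code paths that each drop one device reference. Given an order in which those paths run and the initial count, it reports which path, if any, would run after the count has already reached zero, that is, after the driver may have freed the channel.

```c
#include <assert.h>

/* Hypothetical labels for the three paths that each drop a device reference. */
enum toy_event {
	TOY_CLASS_RELEASE,  /* stands in for the class device release callback */
	TOY_CHAN_CLEANUP,   /* stands in for dma_chan_cleanup() */
	TOY_UNREGISTER      /* stands in for the put at device unregistration */
};

/*
 * Replay three puts in the given order against a counter that starts at
 * `initial`. Returns the first event that would run after the counter
 * has already reached zero, or -1 if every event runs while the device
 * is still referenced.
 */
static int toy_first_late_event(const enum toy_event order[3], int initial)
{
	int count = initial;
	int i;

	assert(initial > 0);

	for (i = 0; i < 3; i++) {
		if (count == 0)
			return (int)order[i];  /* runs after the device was released */
		count--;
	}
	return -1;
}
```

When each of the three paths holds its own reference (initial count of 3), no ordering leaves a path running late. With only two references for three puts, whichever path happens to run last does so after release, and nothing in the model constrains which path that is.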
-
Absolutely right, the driver, not dmaengine, frees the memory so there
must be a per channel reference on the device to hold off the driver's

So how about this...
---snip---
dmaengine: Fix broken device refcounting

From: Haavard Skinnemoen <hskinnemoen@atmel.com>

When a DMA device is unregistered, its reference count is decremented
twice for each channel: once in dma_class_dev_release() and once in
dma_chan_cleanup(). This may result in the DMA device driver's
remove() function completing before all channels have been cleaned
up, causing lots of use-after-free fun.

Fix it by incrementing the device's reference count twice for each
channel during registration.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
[dan.j.williams@intel.com: kill unnecessary client refcounting]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/dma/dmaengine.c |   17 ++++++-----------
 1 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 84257f7..ec7e871 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -186,10 +186,9 @@ static void dma_client_chan_alloc(struct dma_client *client)
 				/* we are done once this client rejects
 				 * an available resource
 				 */
-				if (ack == DMA_ACK) {
+				if (ack == DMA_ACK)
 					dma_chan_get(chan);
-					kref_get(&device->refcount);
-				} else if (ack == DMA_NAK)
+				else if (ack == DMA_NAK)
 					return;
 			}
 		}
@@ -276,11 +275,8 @@ static void dma_clients_notify_removed(struct dma_chan *chan)
 		/* client was holding resources for this channel so
 		 * free it
 		 */
-		if (ack == DMA_ACK) {
+		if (ack == DMA_ACK)
 			dma_chan_put(chan);
-			kref_put(&chan->device->refcount,
-				dma_async_device_cleanup);
-		}
 	}

 	mutex_unlock(&dma_list_mutex);
@@ -320,11 +316,8 @@ void dma_async_client_unregister(struct dma_client *client)
 			ack = client->event_callback(client, chan,...
I tested this in my ioatdma setup and no longer get the panic. I'm good with this if you two are happy with it.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
sln
-
On Mon, 29 Oct 2007 09:02:34 -0700
Looks good to me too, although I haven't had a chance to test it yet.
Thanks,
Håvard
-
Thanks - when I get back in tomorrow morning I'll test this to see
that it gets rid of the panic that I've been getting.

sln
--
==============================================
Mr. Shannon Nelson Parents can't afford to be squeamish.
-
