[adding back linux-usb-devel. please don't drop cc] [also adding lkml and Greg K-H] Alright, it took a lot of work but the winner is the "dev" node. Gee... We could have caught this earlier unless I forgot to print the name of the last component in the initial debug patch. Well, better late than never. :-( Anyways, the problem is that the attribute "dev" is dynamically allocated using kmalloc() in device_add() and freed immediately after the file is removed in device_del(). It's basically assuming immediate-disconnect but that apparently isn't true yet. The dynamic allocation is to set the owner of the attribute to the owner of the device being registered. This is twisted in that the field is added to work around sysfs/module lifetime problem but the workaround itself is broken with regard to lifetime rules. Aieeeeeeee... This sort of things are why I implemented immediate disconnect on sysfs in the first place. So, we can fix the problem Chris is seeing by breaking module unload (by allowing it to unload too early). It doesn't sound too hot but module unloading race is much less likely than sysfs node deletion/open race. Thanks. -- tejun -
Yikes! Just temporary breakage, I hope :-)!! The only modules I unload on a regular basis these
days are things like "microcode", which the init scripts take care of as part of the boot-up
process.
Thanks for all that hard work,
Cheers,
Chris
___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
-
Okay, here's a half-assed fix. With this patch applied, if you try to unload a module while you're opening it's dev attribute, kernel will oops later when the file is accessed or closed later but it should fix the bug winecfg triggers. I really dunno how to fix this the right way in the stable kernel. Better ideas? Anyone? -- tejun
How about a WARN() and a small(?) memory leak? Better than an oops, surely?
Cheers,
Chris
___________________________________________________________
Yahoo! Mail is the world's favourite email. Don't settle for less, sign up for
your free account today http://uk.rd.yahoo.com/evt=44106/*http://uk.docs.yahoo.com/mail/winter07.html
-
Device node creation/deletion can be quite often depending on configuration, so I don't think we can afford memory leak here. It can develop into a big problem for long running hosts. IMHO, just introducing module unload/deletion race is much better. It's the lesser evil, difficult to trigger and already broken in other places anyway. I think we need to hear what other people are thinking about it. Cc'ing Maneesh, Dmitry and Cornelia. The whole thread can be read at... http://thread.gmane.org/gmane.linux.usb.devel/53559 http://thread.gmane.org/gmane.linux.usb.devel/53846 The thread is rather long but just reading the message from the second URL should be enough. The problem is that dev->devt_attr (class dev has the same problem) is deallocated when the device is deleted. If the dev sysfs attribute has users at that point, the dev sysfs node is left with garbled struct attribute causing oops later. IMHO, the proper fix for this is immediate-disconnect which is no in -mm as the problem is caused by expecting immediate-disconnect behavior when it isn't implemented. As written above, I think it's better to risk module unload / sysfs race than keeping the current sysfs deletion / open race. What do you guys think? Thanks. -- tejun -
Hi Tejun, How about embedding struct attribute fro devt into struct [class_]device for now? It is not too big and device is still going to be pinned into memory while there are sysfs users... I don't like fattening of device structures but leaks and/or oopses are worse in my book. -- Dmitry -
Right, your book is apparently much better than mine. Actually, we can just free devt_attr in device_release(). Looking at the code, class device is already doing it that way, so here's the full-assed fix. Chris, can you please test the attached patch? Thanks. -- tejun
Tejun,
So far so good; my 2.6.20.11+patch kernel hasn't oopsed yet. I'm going to start using this kernel
as my default boot and see how it goes.
Cheers,
Chris
___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
-
This looks like the correct fix, the only reason I made this dynamic was to properly set up the module owner. And now that you have removed that link in your sysfs rework, it can go and become static again. Oh, and this isn't a "device node", it's just a text file that contains the major:minor number for a device node to be created from. So, care to fix up the class code too and send a patch with a signed-off-by:? thanks, greg k-h -
I think the proper fix for -stable is to free the structure in device_release(). I asked Chris to test it. After he confirms, I'll send a proper patch with S-O-B. For -mm, yeap, we can simply make the whole thing static. I'll send patch for that soon. Thanks. -- tejun -
Ah, yes, that's the correct fix for this. thanks, greg k-h -
Currently, devt_attr for the "dev" file is freed immediately on device
removal, but if the "dev" sysfs file is open when a device is removed,
sysfs will access its attribute structure for further access including
close resulting in jumping to garbled address. Fix it by postponing
freeing devt_attr to device release time.
Note that devt_attr for class_device is already freed on release.
This bug is reported by Chris Rankin as bugzilla bug#8198.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Chris Rankin <rankincj@yahoo.com>
---
Applies well to 2.6.20 and 21. As sysfs-immediate-disconnect doesn't
seem to be included in 2.6.22, this should be included in linus#master
too (applies well there as well).
* This is the second post. Something went wrong with the recipients
list on the first posting. Both are same.
Thanks.
drivers/base/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Index: tree0/drivers/base/core.c
===================================================================
--- tree0.orig/drivers/base/core.c
+++ tree0/drivers/base/core.c
@@ -93,6 +93,9 @@ static void device_release(struct kobjec
{
struct device * dev = to_dev(kobj);
+ kfree(dev->devt_attr);
+ dev->devt_attr = NULL;
+
if (dev->release)
dev->release(dev);
else if (dev->class && dev->class->dev_release)
@@ -650,10 +653,8 @@ void device_del(struct device * dev)
if (parent)
klist_del(&dev->knode_parent);
- if (dev->devt_attr) {
+ if (dev->devt_attr)
device_remove_file(dev, dev->devt_attr);
- kfree(dev->devt_attr);
- }
if (dev->class) {
sysfs_remove_link(&dev->kobj, "subsystem");
/* If this is not a "fake" compatible device, remove the
-
As I don't think we should be adding your sysfs rework to 2.6.22 just yet, any objections to me just sending this to Linus for 2.6.22 and waiting on your previous one for when the whole sysfs rework patchset is sent? thanks, greg k-h -
No objection at all. That actually is exactly my intention. The make-attrs-static patch will conflict with this fix tho. Just let me know when it breaks, I'll post updated one. Thanks. -- tejun -
If it breaks, I'll fix up the merge issues, don't worry about it :) Thanks again for taking the time and tracking it down and fixing this. greg k-h -
Hi Tejun, your rework removes the "owner" field from the attributes. I think we kept the "dev" and "uevent" attribute as part of "struct device" only to be able to assign it the actual owner of the module that has created the device. The attribute can probably just live as one instance statically in the driver core now? Thanks, Kay -
Yeah, that's -mm and this is for -stable and -22. For -mm, we can just make all those attributes static which is the other patch is this thread. :-) -- tejun -
Although sysfs-immediate-disconnect may not be included in 2.6.22, the old attribute-orphan code by Oliver Neukum is present there and also in 2.6.21. Shouldn't that suffice? Alan Stern -
sysfs_release() still needs to deference attr->owner to put it, so I think there's the same problem even with attribute-orphan. We end up calling module_put() on garbage. -- tejun -
devt_attr and uevent_attr are either allocated dynamically with or
embedded in device and class_device as they needed their owner field
set to the module implementing the driver. Now that sysfs implements
immediate disconnect and owner field removed from struct attribute,
there is no reason to do this. Remove these attributes from
[class_]device and use static attribute structures instead.
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
drivers/base/class.c | 44 ++++++++++++++++----------------------------
drivers/base/core.c | 45 +++++++++++++++------------------------------
include/linux/device.h | 5 -----
3 files changed, 31 insertions(+), 63 deletions(-)
Index: tree0/drivers/base/class.c
===================================================================
--- tree0.orig/drivers/base/class.c
+++ tree0/drivers/base/class.c
@@ -312,9 +312,6 @@ static void class_dev_release(struct kob
pr_debug("device class '%s': release.\n", cd->class_id);
- kfree(cd->devt_attr);
- cd->devt_attr = NULL;
-
if (cd->release)
cd->release(cd);
else if (cls->release)
@@ -564,6 +561,9 @@ static ssize_t show_dev(struct class_dev
return print_dev_t(buf, class_dev->devt);
}
+static struct class_device_attribute class_devt_attr =
+ __ATTR(dev, S_IRUGO, show_dev, NULL);
+
static ssize_t store_uevent(struct class_device *class_dev,
const char *buf, size_t count)
{
@@ -571,6 +571,9 @@ static ssize_t store_uevent(struct class
return count;
}
+static struct class_device_attribute class_uevent_attr =
+ __ATTR(uevent, S_IWUSR, NULL, store_uevent);
+
void class_device_initialize(struct class_device *class_dev)
{
kobj_set_kset_s(class_dev, class_obj_subsys);
@@ -620,30 +623,15 @@ int class_device_add(struct class_device
&parent_class->subsys.kobj, "subsystem");
if (error)
goto out3;
- class_dev->uevent_attr.attr.name = "uevent";
- class_dev->uevent_attr.attr.mode = S_IWUSR;
- class_dev->uevent_attr.store = ...