Re: [linux-usb-devel] Bug creating USB endpoints in 2.6.20.x (kernel bug 8198)

Previous thread: please pull from the trivial tree by Adrian Bunk on Wednesday, May 9, 2007 - 2:30 am. (1 message)

Next thread: lguest re-review by Andrew Morton on Wednesday, May 9, 2007 - 2:51 am. (5 messages)
From: Tejun Heo
Date: Wednesday, May 9, 2007 - 2:40 am

[adding back linux-usb-devel.  please don't drop cc]
[also adding lkml and Greg K-H]


Alright, it took a lot of work but the winner is the "dev" node.  Gee...
We could have caught this earlier unless I forgot to print the name of
the last component in the initial debug patch.  Well, better late than
never.  :-(

Anyways, the problem is that the attribute "dev" is dynamically
allocated using kmalloc() in device_add() and freed immediately after
the file is removed in device_del().  It's basically assuming
immediate-disconnect but that apparently isn't true yet.

The dynamic allocation is to set the owner of the attribute to the owner
of the device being registered.  This is twisted in that the field is
added to work around sysfs/module lifetime problem but the workaround
itself is broken with regard to lifetime rules.  Aieeeeeeee...  This
sort of things are why I implemented immediate disconnect on sysfs in
the first place.

So, we can fix the problem Chris is seeing by breaking module unload (by
allowing it to unload too early).  It doesn't sound too hot but module
unloading race is much less likely than sysfs node deletion/open race.

Thanks.

-- 
tejun
-

From: Chris Rankin
Date: Wednesday, May 9, 2007 - 5:24 am

Yikes! Just temporary breakage, I hope :-)!! The only modules I unload on a regular basis these
days are things like "microcode", which the init scripts take care of as part of the boot-up
process.

Thanks for all that hard work,
Cheers,
Chris



      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
-

From: Tejun Heo
Date: Wednesday, May 9, 2007 - 6:30 am

Okay, here's a half-assed fix.  With this patch applied, if you try to
unload a module while you're opening it's dev attribute, kernel will
oops later when the file is accessed or closed later but it should fix
the bug winecfg triggers.  I really dunno how to fix this the right way
in the stable kernel.  Better ideas?  Anyone?

-- 
tejun
From: Chris Rankin
Date: Wednesday, May 9, 2007 - 6:56 am

How about a WARN() and a small(?) memory leak? Better than an oops, surely?

Cheers,
Chris



      ___________________________________________________________ 
Yahoo! Mail is the world's favourite email. Don't settle for less, sign up for
your free account today http://uk.rd.yahoo.com/evt=44106/*http://uk.docs.yahoo.com/mail/winter07.html
-

From: Tejun Heo
Date: Wednesday, May 9, 2007 - 7:11 am

Device node creation/deletion can be quite often depending on
configuration, so I don't think we can afford memory leak here.  It can
develop into a big problem for long running hosts.  IMHO, just
introducing module unload/deletion race is much better.  It's the lesser
evil, difficult to trigger and already broken in other places anyway.

I think we need to hear what other people are thinking about it.  Cc'ing
Maneesh, Dmitry and Cornelia.  The whole thread can be read at...

  http://thread.gmane.org/gmane.linux.usb.devel/53559
  http://thread.gmane.org/gmane.linux.usb.devel/53846

The thread is rather long but just reading the message from the second
URL should be enough.  The problem is that dev->devt_attr (class dev has
the same problem) is deallocated when the device is deleted.  If the dev
sysfs attribute has users at that point, the dev sysfs node is left with
garbled struct attribute causing oops later.

IMHO, the proper fix for this is immediate-disconnect which is no in -mm
as the problem is caused by expecting immediate-disconnect behavior when
it isn't implemented.

As written above, I think it's better to risk module unload / sysfs race
than keeping the current sysfs deletion / open race.  What do you guys
think?

Thanks.

-- 
tejun
-

From: Dmitry Torokhov
Date: Wednesday, May 9, 2007 - 7:35 am

Hi Tejun,


How about embedding struct attribute fro devt into struct
[class_]device for now? It is not too big and device is still going to
be pinned into memory while there are sysfs users... I don't like
fattening of device structures but leaks and/or oopses are worse in my
book.

-- 
Dmitry
-

From: Tejun Heo
Date: Wednesday, May 9, 2007 - 7:58 am

Right, your book is apparently much better than mine.  Actually, we can
just free devt_attr in device_release().  Looking at the code, class
device is already doing it that way, so here's the full-assed fix.
Chris, can you please test the attached patch?

Thanks.

-- 
tejun
From: Chris Rankin
Date: Wednesday, May 9, 2007 - 2:09 pm

Tejun,

So far so good; my 2.6.20.11+patch kernel hasn't oopsed yet. I'm going to start using this kernel
as my default boot and see how it goes.

Cheers,
Chris


      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
-

From: Greg KH
Date: Wednesday, May 9, 2007 - 7:57 am

This looks like the correct fix, the only reason I made this dynamic was
to properly set up the module owner.  And now that you have removed that
link in your sysfs rework, it can go and become static again.

Oh, and this isn't a "device node", it's just a text file that contains
the major:minor number for a device node to be created from.

So, care to fix up the class code too and send a patch with a
signed-off-by:?

thanks,

greg k-h
-

From: Tejun Heo
Date: Wednesday, May 9, 2007 - 8:01 am

I think the proper fix for -stable is to free the structure in
device_release().  I asked Chris to test it.  After he confirms, I'll
send a proper patch with S-O-B.

For -mm, yeap, we can simply make the whole thing static.  I'll send
patch for that soon.

Thanks.

-- 
tejun
-

From: Greg KH
Date: Wednesday, May 9, 2007 - 8:40 am

Ah, yes, that's the correct fix for this.

thanks,

greg k-h
-

From: Tejun Heo
Date: Thursday, May 10, 2007 - 7:45 am

Currently, devt_attr for the "dev" file is freed immediately on device
removal, but if the "dev" sysfs file is open when a device is removed,
sysfs will access its attribute structure for further access including
close resulting in jumping to garbled address.  Fix it by postponing
freeing devt_attr to device release time.

Note that devt_attr for class_device is already freed on release.

This bug is reported by Chris Rankin as bugzilla bug#8198.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Chris Rankin <rankincj@yahoo.com>
---
Applies well to 2.6.20 and 21.  As sysfs-immediate-disconnect doesn't
seem to be included in 2.6.22, this should be included in linus#master
too (applies well there as well).

* This is the second post.  Something went wrong with the recipients
  list on the first posting.  Both are same.

Thanks.

 drivers/base/core.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: tree0/drivers/base/core.c
===================================================================
--- tree0.orig/drivers/base/core.c
+++ tree0/drivers/base/core.c
@@ -93,6 +93,9 @@ static void device_release(struct kobjec
 {
 	struct device * dev = to_dev(kobj);
 
+	kfree(dev->devt_attr);
+	dev->devt_attr = NULL;
+
 	if (dev->release)
 		dev->release(dev);
 	else if (dev->class && dev->class->dev_release)
@@ -650,10 +653,8 @@ void device_del(struct device * dev)
 
 	if (parent)
 		klist_del(&dev->knode_parent);
-	if (dev->devt_attr) {
+	if (dev->devt_attr)
 		device_remove_file(dev, dev->devt_attr);
-		kfree(dev->devt_attr);
-	}
 	if (dev->class) {
 		sysfs_remove_link(&dev->kobj, "subsystem");
 		/* If this is not a "fake" compatible device, remove the
-

From: Greg KH
Date: Thursday, May 10, 2007 - 8:05 am

As I don't think we should be adding your sysfs rework to 2.6.22 just
yet, any objections to me just sending this to Linus for 2.6.22 and
waiting on your previous one for when the whole sysfs rework patchset is
sent?

thanks,

greg k-h
-

From: Tejun Heo
Date: Thursday, May 10, 2007 - 8:13 am

No objection at all.  That actually is exactly my intention.  The
make-attrs-static patch will conflict with this fix tho.  Just let me
know when it breaks, I'll post updated one.

Thanks.

-- 
tejun
-

From: Greg KH
Date: Thursday, May 10, 2007 - 8:17 am

If it breaks, I'll fix up the merge issues, don't worry about it :)

Thanks again for taking the time and tracking it down and fixing this.

greg k-h
-

From: Kay Sievers
Date: Thursday, May 10, 2007 - 8:33 am

Hi Tejun,
your rework removes the "owner" field from the attributes. I think we
kept the "dev" and "uevent" attribute as part of "struct device" only
to be able to assign it the actual owner of the module that has
created the device. The attribute can probably just live as one
instance statically in the driver core now?

Thanks,
Kay
-

From: Tejun Heo
Date: Thursday, May 10, 2007 - 8:41 am

Yeah, that's -mm and this is for -stable and -22.  For -mm, we can just
make all those attributes static which is the other patch is this
thread.  :-)

-- 
tejun
-

From: Alan Stern
Date: Thursday, May 10, 2007 - 8:52 am

Although sysfs-immediate-disconnect may not be included in 2.6.22, the old 
attribute-orphan code by Oliver Neukum is present there and also in 
2.6.21.  Shouldn't that suffice?

Alan Stern

-

From: Tejun Heo
Date: Thursday, May 10, 2007 - 9:18 am

sysfs_release() still needs to deference attr->owner to put it, so I
think there's the same problem even with attribute-orphan.  We end up
calling module_put() on garbage.

-- 
tejun
-

From: Tejun Heo
Date: Thursday, May 10, 2007 - 7:25 am

devt_attr and uevent_attr are either allocated dynamically with or
embedded in device and class_device as they needed their owner field
set to the module implementing the driver.  Now that sysfs implements
immediate disconnect and owner field removed from struct attribute,
there is no reason to do this.  Remove these attributes from
[class_]device and use static attribute structures instead.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/base/class.c   |   44 ++++++++++++++++----------------------------
 drivers/base/core.c    |   45 +++++++++++++++------------------------------
 include/linux/device.h |    5 -----
 3 files changed, 31 insertions(+), 63 deletions(-)

Index: tree0/drivers/base/class.c
===================================================================
--- tree0.orig/drivers/base/class.c
+++ tree0/drivers/base/class.c
@@ -312,9 +312,6 @@ static void class_dev_release(struct kob
 
 	pr_debug("device class '%s': release.\n", cd->class_id);
 
-	kfree(cd->devt_attr);
-	cd->devt_attr = NULL;
-
 	if (cd->release)
 		cd->release(cd);
 	else if (cls->release)
@@ -564,6 +561,9 @@ static ssize_t show_dev(struct class_dev
 	return print_dev_t(buf, class_dev->devt);
 }
 
+static struct class_device_attribute class_devt_attr =
+	__ATTR(dev, S_IRUGO, show_dev, NULL);
+
 static ssize_t store_uevent(struct class_device *class_dev,
 			    const char *buf, size_t count)
 {
@@ -571,6 +571,9 @@ static ssize_t store_uevent(struct class
 	return count;
 }
 
+static struct class_device_attribute class_uevent_attr =
+	__ATTR(uevent, S_IWUSR, NULL, store_uevent);
+
 void class_device_initialize(struct class_device *class_dev)
 {
 	kobj_set_kset_s(class_dev, class_obj_subsys);
@@ -620,30 +623,15 @@ int class_device_add(struct class_device
 				  &parent_class->subsys.kobj, "subsystem");
 	if (error)
 		goto out3;
-	class_dev->uevent_attr.attr.name = "uevent";
-	class_dev->uevent_attr.attr.mode = S_IWUSR;
-	class_dev->uevent_attr.store = ...
Previous thread: please pull from the trivial tree by Adrian Bunk on Wednesday, May 9, 2007 - 2:30 am. (1 message)

Next thread: lguest re-review by Andrew Morton on Wednesday, May 9, 2007 - 2:51 am. (5 messages)