-mm has a debugging patch which warns when atomic_dec_and_test() takes an atomic_t negative (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20/2.6.20-mm2/broken...). When it is applied to current mainline, a simple `rmmod ipw2200' gives: [ 75.825072] BUG: atomic counter underflow at: [ 75.825180] [<c01c6eb4>] kref_put+0x66/0x82 [ 75.825278] [<c022e4d4>] bus_remove_driver+0x66/0x75 [ 75.825383] [<c022ee2c>] driver_unregister+0x8/0x13 [ 75.825484] [<c01d7add>] pci_unregister_driver+0xc/0x45 [ 75.825593] [<c0132147>] sys_delete_module+0x157/0x17c [ 75.825703] [<c013c663>] audit_syscall_entry+0x10d/0x137 [ 75.825818] [<c0103b14>] syscall_call+0x7/0xb [ 75.825913] [<c02d0000>] xfrm4_dst_destroy+0xe/0xd5 This didn't happen in 2.6.20-mm2, so this bug was introduced by a patch which was not in the -mm lineup twelve days ago. Presumably the effect of this is a memory leak or a use-after-free. -
Ugh. I'll add it to my local tree here and try to bisect to find the problem. thanks for the pointer, greg k-h -
Ok, after a zillion bisects, I've tracked this down to: commit 63ce18cfe685115ff8d341bae4c9204a79043cf0 Author: Mike Galbraith <efault@gmx.de> Date: Wed Feb 21 12:45:35 2007 -0800 driver core: refcounting fix Fix a reference counting bug exposed by commit 725522b5453dd680412f2b6463a988e4fd148757. If driver.mod_name exists, we take a reference in module_add_driver(), and never release it. Undo that reference in module_remove_driver(). Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Mike, I've reverted this patch, and I don't see any references leaking. And, as your patch released the reference on the driver, and the module_add_driver() call would not grab a reference to the driver, only the module kobject, I don't see what you were trying to fix with this patch. Do you have a test case that this fixes? Otherwise, I'll just revert it. thanks, greg k-h -
What it fixed for me was the hard hang reported below. http://lkml.org/lkml/2007/2/16/96 -Mike -
What specific module are you trying to unload that causes the hang? I think it might just be a problem with that module, and not with all others. So, I'm going to revert your patch and work to try to find the real cause of this problem. thanks, greg k-h -
Yeah, my stab at it seems busted. I'll take another poke at it to see if I can find out why (post 725522b5453dd680412f2b6463a988e4fd148757) I'm left with a reference. -Mike -
Ok, stab #2.
My reference count woes stem from module_remove_driver() not removing
the link created in module_add_driver(). With the below, my box boots
fine. Since I obviously know spit about driver layer glue, I'll just
call this one a diagnostic, and head for the hills :)
--- linux-2.6.20-rc3/kernel/module.c.org 2007-03-10 15:16:47.000000000 +0100
+++ linux-2.6.20-rc3/kernel/module.c 2007-03-10 15:43:09.000000000 +0100
@@ -2411,14 +2411,28 @@ void module_remove_driver(struct device_
return;
sysfs_remove_link(&drv->kobj, "module");
- if (drv->owner && drv->owner->mkobj.drivers_dir) {
- driver_name = make_driver_name(drv);
- if (driver_name) {
- sysfs_remove_link(drv->owner->mkobj.drivers_dir,
+ driver_name = make_driver_name(drv);
+ if (!driver_name)
+ return;
+ if (drv->owner && drv->owner->mkobj.drivers_dir)
+ sysfs_remove_link(drv->owner->mkobj.drivers_dir,
driver_name);
- kfree(driver_name);
- }
+ else if (drv->mod_name) {
+ struct module_kobject *mk;
+ struct kobject *mkobj;
+
+ /* Lookup built-in module entry in /sys/modules */
+ mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
+ if (!mkobj)
+ goto out_free;
+ mk = container_of(mkobj, struct module_kobject, kobj);
+ module_create_drivers_dir(mk);
+ sysfs_remove_link(mk->drivers_dir, driver_name);
+ /* Release reference taken via lookup */
+ kobject_put(mkobj);
}
+out_free:
+ kfree(driver_name);
}
EXPORT_SYMBOL(module_remove_driver);
#endif
-
P.S. forgot to include diagnostic log. Kobject c0644890 is the source of my woes. Printk's come below WARN_ON(is_ipmi_si_kobj). Post-tinker log is huge, and probably not interesting. [ 30.397160] kobject ipmi_devintf: registering. parent: <NULL>, set: module [ 30.404033] kobject_uevent_env [ 30.407098] fill_kobj_path: path = '/module/ipmi_devintf' [ 30.412524] BUG: at lib/kobject.c:448 kobject_get() [ 30.417402] [<c0105086>] show_trace_log_lvl+0x1a/0x30 [ 30.422564] [<c01057b2>] show_trace+0x12/0x14 [ 30.427031] [<c0105856>] dump_stack+0x16/0x18 [ 30.431501] [<c02d3c2b>] kobject_get+0x66/0x87 [ 30.436056] [<c02d3f1e>] kobject_shadow_add+0x10/0x1e8 [ 30.441312] [<c02d4100>] kobject_add+0xa/0xc [ 30.445695] [<c06804c0>] kernel_param_sysfs_setup+0x4d/0x7f [ 30.451376] [<c068067a>] param_sysfs_init+0x188/0x1c3 [ 30.456538] [<c06725bc>] init+0x144/0x26c [ 30.460661] [<c0104cfb>] kernel_thread_helper+0x7/0x1c [ 30.465907] ======================= [ 30.469486] get: c18f65c0 count after get is 2 [ 30.473927] kobject ipmi_si: registering. parent: <NULL>, set: module [ 30.480372] kobject_uevent_env [ 30.483430] fill_kobj_path: path = '/module/ipmi_si' ...... [ 73.266556] bus platform: add driver ipmi [ 73.278013] kobject ipmi: registering. parent: <NULL>, set: drivers [ 73.291847] kobject_uevent_env [ 73.302358] fill_kobj_path: path = '/bus/platform/drivers/ipmi' [ 73.315943] ipmi message handler version 39.1 [ 73.327839] ipmi device interface [ 73.338524] device class 'ipmi': registering [ 73.350158] subsystem ipmi: registering [ 73.361309] kobject ipmi: registering. parent: <NULL>, set: class [ 73.374841] bus platform: add driver ipmi_si [ 73.386442] BUG: at lib/kobject.c:448 kobject_get() [ 73.398617] [<c0105086>] show_trace_log_lvl+0x1a/0x30 [ 73.411079] [<c01057b2>] show_trace+0x12/0x14 [ 73.422780] [<c0105856>] dump_stack+0x16/0x18 [ 73.434324] [<c02d3c2b>] kobject_get+0x66/0x87 [ ...
Does ipmi_si not have a "owner"? Ah, that makes sense, not all modules
That's pretty good for not knowing much about the subject matter here.
But can you try this version instead? It should work a bit better than
yours.
thanks for your patience,
greg k-h
Subject: modules: fix reference counting logic for drivers without module pointers.
We weren't dropping the sysfs link for the module driver name if we
didn't happen to have the "owner" pointer in the driver.
Based on a patch from Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
kernel/module.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2405,20 +2405,30 @@ EXPORT_SYMBOL(module_add_driver);
void module_remove_driver(struct device_driver *drv)
{
+ struct module_kobject *mk = NULL;
+ struct kobject *mkobj = NULL;
char *driver_name;
if (!drv)
return;
sysfs_remove_link(&drv->kobj, "module");
- if (drv->owner && drv->owner->mkobj.drivers_dir) {
- driver_name = make_driver_name(drv);
- if (driver_name) {
- sysfs_remove_link(drv->owner->mkobj.drivers_dir,
- driver_name);
- kfree(driver_name);
- }
+ driver_name = make_driver_name(drv);
+ if (!driver_name)
+ return;
+
+ if (drv->owner && drv->owner->mkobj.drivers_dir)
+ mk = &drv->owner->mkobj;
+ else {
+ /* Lookup built-in module entry in /sys/modules */
+ mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
+ if (!mkobj)
+ return;
+ mk = container_of(mkobj, struct module_kobject, kobj);
}
+ sysfs_remove_link(mk->drivers_dir, driver_name);
+ kobject_put(mkobj);
+ kfree(driver_name);
}
EXPORT_SYMBOL(module_remove_driver);
#endif
-
It exploded in strcmp(). Unfortunately, the full oops didn't make it to
either console or serial console.
[ 30.783048] ipmi message handler version 39.1
[ 30.787632] ipmi device interface
[ 30.791166] IPMI System Interface driver.
[ 30.816961] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 30.832700] printing eip:
[ 30.842383] c02d4098
[ 30.851458] *pde = 00000000
[ 30.861089] Oops: 0000 [#1]
[ 30.870585] PREEMPT SMP
[ 30.879724] Modules linked in:
[ 30.889288] CPU: 1
[ 30.889290] EIP: 0060:[<c02d4098>] Not tainted VLI
[ 30.889292] EFLAGS: 00010282 (2.6.21-rc3-smp #81)
I did this...
--- kernel/module.c.org 2007-03-15 07:20:15.000000000 +0100
+++ kernel/module.c 2007-03-15 08:13:36.000000000 +0100
@@ -2419,15 +2419,19 @@ void module_remove_driver(struct device_
if (drv->owner && drv->owner->mkobj.drivers_dir)
mk = &drv->owner->mkobj;
- else {
+ else if (drv->mod_name) {
/* Lookup built-in module entry in /sys/modules */
mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
if (!mkobj)
- return;
+ goto out_free;
mk = container_of(mkobj, struct module_kobject, kobj);
+ } else {
+ WARN_ON(1);
+ goto out_free;
}
sysfs_remove_link(mk->drivers_dir, driver_name);
kobject_put(mkobj);
+out_free:
kfree(driver_name);
}
EXPORT_SYMBOL(module_remove_driver);
...and it booted.
[ 24.670410] ipmi message handler version 39.1
[ 24.675000] ipmi device interface
[ 24.678542] IPMI System Interface driver.
[ 24.703956] BUG: at kernel/module.c:2429 module_remove_driver()
[ 24.716837] [<c0105086>] show_trace_log_lvl+0x1a/0x30
[ 24.728909] [<c01057b2>] show_trace+0x12/0x14
[ 24.740239] [<c0105856>] dump_stack+0x16/0x18
[ 24.751469] [<c01441ce>] module_remove_driver+0xa5/0xa7
[ 24.763584] [<c035a014>] bus_remove_driver+0x6d/0x82
[ 24.775390] [<c035a978>] driver_unregister+0xb/0x18
[ 24.787019] [<c034c8c2>] ...Gah, that just happened to me too, sorry for not booting with that That's good. But why don't we have a module name for this driver? And if we don't have a module name, why would there be a symlink to With the above change, it all works correctly? If the ipmi driver is loaded, what does the /sys/module/MODULE_NAME/ tree look like (replacing MODULE_NAME with whatever the module name really is, sorry, I don't know)? thanks, greg k-h -
Ya got me, but according to my debug logs, what's causing my lockup is the reference we add while making the symlink when we hit... if (driver_name) in module_add_driver(). Maybe we go through there I don't know about _correctly_, but my diag patch _boots_, as does your Well, I will never see that, because ipmi_si finds no interfaces, so always backs out. After backout, with my patch and yours + my addons, I have there leftovers. root@Homer: ls -R /sys/module/ipmi_si /sys/module/ipmi_si: drivers parameters /sys/module/ipmi_si/drivers: /sys/module/ipmi_si/parameters: bt_debug hotmod kcs_debug smic_debug -
I found the log (i think), and even with the patched kernel, gdb still
says...
(gdb) list *module_add_driver+0x44
0xc01442e5 is in module_add_driver (kernel/module.c:2401).
2396 driver_name = make_driver_name(drv);
2397 if (driver_name) {
2398 module_create_drivers_dir(mk);
2399 no_warn = sysfs_create_link(mk->drivers_dir, &drv->kobj,
2400 driver_name);
2401 kfree(driver_name);
2402 }
2403 }
2404 EXPORT_SYMBOL(module_add_driver);
2405
(gdb)
See: kobject filter function caused the event to drop!
(erm, that spot caused some hmm action here. if that drop is bad, i have
another one as well.)
[ 30.397160] kobject ipmi_devintf: registering. parent: <NULL>, set: module
[ 30.404033] kobject_uevent_env
[ 30.407098] fill_kobj_path: path = '/module/ipmi_devintf'
[ 30.412524] BUG: at lib/kobject.c:448 kobject_get()
[ 30.417402] [<c0105086>] show_trace_log_lvl+0x1a/0x30
[ 30.422564] [<c01057b2>] show_trace+0x12/0x14
[ 30.427031] [<c0105856>] dump_stack+0x16/0x18
[ 30.431501] [<c02d3c2b>] kobject_get+0x66/0x87
[ 30.436056] [<c02d3f1e>] kobject_shadow_add+0x10/0x1e8
[ 30.441312] [<c02d4100>] kobject_add+0xa/0xc
[ 30.445695] [<c06804c0>] kernel_param_sysfs_setup+0x4d/0x7f
[ 30.451376] [<c068067a>] param_sysfs_init+0x188/0x1c3
[ 30.456538] [<c06725bc>] init+0x144/0x26c
[ 30.460661] [<c0104cfb>] kernel_thread_helper+0x7/0x1c
[ 30.465907] =======================
[ 30.469486] get: c18f65c0 count after get is 2
[ 30.473927] kobject ipmi_si: registering. parent: <NULL>, set: module
[ 30.480372] kobject_uevent_env
[ 30.483430] fill_kobj_path: path = '/module/ipmi_si'
......
[ 73.266556] bus platform: add driver ipmi
[ 73.278013] kobject ipmi: registering. parent: <NULL>, set: drivers
[ 73.291847] kobject_uevent_env
[ 73.302358] fill_kobj_path: path = '/bus/platform/drivers/ipmi'
[ ...On http://test.kernel.org, elm3b132 is showing a similar error message for the IPS driver on 2.6.21-rc2-mm1. There is no such device in the machine so this is a normal error path for no devices found. The warning looks like BUG: atomic counter underflow at: [<c027ccb4>] kref_put+0x7d/0x9d [<c02a4e38>] bus_remove_driver+0x36/0x41 [<c02a5997>] driver_unregister+0xb/0x13 [<c0286bdd>] pci_unregister_driver+0xb/0x13 [<c04c653a>] ips_module_init+0x41/0x57 [<c04b2bd2>] do_initcalls+0x58/0xf5 [<c01353d5>] register_irq_proc+0x75/0x92 [<c04b2c97>] init+0x0/0x8b [<c04b2ce0>] init+0x49/0x8b [<c01030ff>] kernel_thread_helper+0x7/0x10 This is essentially identical to the warning on ipw2200. It occurs whether the driver is compiled into the kernel or as a module. Reverting Mike's patch fixes the problem - or at least the warning has disappeared. However, I've cc'd the IPS maintainers so they can confirm they are calling pci_unregister_driver() correctly. I note that many drivers in drivers/scsi do not call pci_unregister_driver() in the module_init path. ipw2200 also calls pci_unregister_driver() in the module_init path when an error is encountered. So, maybe this is an error in how the drivers use pci_[un]register_driver() -- -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH | [003/136] p54usb: add Zcomax XG-705A usbid |
| Magnus Damm | [PATCH 03/07] ARM: Use shared GIC entry macros on Realview |
| Oliver Neukum | Re: [Bug #13682] The webcam stopped working when upgrading from 2.6.29 to 2.6.30 |
| Martin Schwidefsky | Re: [PATCH] optimized ktime_get[_ts] for GENERIC_TIME=y |
git: | |
| Junio C Hamano | Re: Some advanced index playing |
| Jeff King | Re: confusion over the new branch and merge config |
| Robin Rosenberg | Re: cvs2svn conversion directly to git ready for experimentation |
| Linus Torvalds | git binary size... |
| Ævar Arnfjörð Bjarmason | Re: Challenge with Git-Bash |
| Linux Kernel Mailing List | md: move allocation of ->queue from mddev_find to md_probe |
| Linux Kernel Mailing List | md: raid0: Represent zone->zone_offset in sectors. |
