When I try to load i82875p_edac module on ASUS P4C800 Deluxe in 2.6.26 or 2.6.27 it fails due to BAR 0 collision. On 2.6.25 the i82875p_edac works just fine. Should i82875p_setup_overfl_dev() do some additional work to fix the missing resource of the hidden overflow device? When I try load i82875p_edac module on 2.6.27 I get # modprobe i82875p_edac FATAL: Error inserting i82875p_edac (/lib/modules/2.6.27.4/kernel/drivers/edac/i82875p_edac.ko): No such device And dmesg shows (from 2.6.27.4): EDAC MC: Ver: 2.1.0 Oct 30 2008 EDAC DEBUG: edac_pci_dev_parity_clear() ... EDAC DEBUG: edac_pci_dev_parity_clear() EDAC DEBUG: edac_sysfs_setup_mc_kset() EDAC DEBUG: edac_sysfs_setup_mc_kset() Registered '.../edac/mc' kobject EDAC DEBUG: i82875p_init_one() EDAC i82875p: i82875p init one EDAC DEBUG: i82875p_probe1() PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff] pci 0000:00:06.0: device not available because of BAR 0 [0xfecf0000-0xfecf0fff] collisions EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow device EDAC DEBUG: 875p init fail On 2.6.25.19 loading i82875p_edac works just fine and dmesg shows: EDAC MC: Ver: 2.1.0 Nov 4 2008 EDAC i82875p: i82875p init one EDAC MC0: Giving out device to 'i82875p_edac' 'i82875p': DEV 0000:00:00.0 EDAC PCI0: Giving out device to module 'i82875p_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED) Cheers Jarkko Lavinen --
(cc's added) Might be an EDAC driver regression. It might also be a consequence of PCI address space management fiddlings, but I think most of the changes there post-date 2.6.26? --
I tried having pci_bus_assign_resources() before enabling the overflow device in 2.6.27. The driver gets now loaded, but then I get false memory error reports and removing the module segfaults. Those false memory errors with i82875p edac driver are nothing new have been reported many times over the past years. On most kernels i82875p_edac is totally unusable because of these false errors. I ran "modprobe i82875p_edac; sleep 5; rmmod i82875p_edac" in 2.6.27.5. i82875p_setup_overfl_dev() is modified to use pci_bus_assign_resources() and to be more verbose: [ 235.248583] EDAC DEBUG: i82875p_init_one() [ 235.248592] EDAC i82875p: i82875p init one [ 235.248689] EDAC DEBUG: i82875p_probe1() [ 235.248700] i82875p_setup_overfl_dev pci_get_device: 00000000 [ 235.248813] pci 0000:00:06.0: found [8086/257e] class 000880 header type 00 [ 235.248827] PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff] [ 235.248875] pci 0000:00:06.0: calling pci_fixup_transparent_bridge+0x0/0x2b [ 235.249676] i82875p_setup_overfl_dev pci_scan_device: f6c4f000 [ 235.250982] i82875p_setup_overfl_dev calling pci_bus_assign_resources for dev->bus before enabling the device [ 235.251178] pci 0000:00:06.0: BAR 0: got res [0x88101000-0x88101fff] bus [0x88101000-0x88101fff] flags 0x20200 [ 235.251190] pci 0000:00:06.0: BAR 0: moved to bus [0x88101000-0x88101fff] flags 0x20200 [ 235.251196] pci 0000:00:01.0: PCI bridge, secondary bus 0000:01 [ 235.251290] pci 0000:00:01.0: IO window: disabled [ 235.251386] pci 0000:00:01.0: MEM window: 0xfd900000-0xfe9fffff [ 235.251483] pci 0000:00:01.0: PREFETCH window: 0x000000f3f00000-0x000000f7efffff [ 235.251631] pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02 [ 235.251727] pci 0000:00:1e.0: IO window: 0xd000-0xdfff [ 235.251823] pci 0000:00:1e.0: MEM window: 0xfea00000-0xfeafffff [ 235.251921] pci 0000:00:1e.0: PREFETCH window: 0x00000088000000-0x000000880fffff [ 235.252092] EDAC DEBUG: edac_mc_register_sysfs_main_kobj() [ ...
I can get around the modprobe problem by adding the missing resource after
the hidden overflow device is revealed. The diff below is against 2.6.27.
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -295,6 +295,7 @@ static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
"%s(): pci_bus_add_device() Failed\n",
__func__);
}
+ pci_bus_assign_resources(dev->bus);
}
*ovrfl_pdev = dev;
The access violation when doing "rmmod i82875p_edac" occurs
because the module exit function i82875p_exit() runs
pci_unregister_driver() while the edac_mc_workq_function() is
scheduled to be run. When the work queue runs, it accesses
something not available anymore.
static void __exit i82875p_exit(void)
{
debugf3("%s()\n", __func__);
+ mci_saved->op_state = OP_OFFLINE;
pci_unregister_driver(&i82875p_driver);
if (!i82875p_registered) {
i82875p_remove_one(mci_pdev);
pci_dev_put(mci_pdev);
}
}
I tried to stop the workqueue by saving mci pointer at the time
of its allocation and just set its state to OFFLINE just before
calling pci_unregister_driver.
This isn't the right way to remove the module and edac_dore has
refcount 2 after i82875p_edac has been removed. The refcount
should be 0.
Cheers
Jarkko Lavinen
--
There were 3 issues:
1, PCI resources of the hidden overflow device had to be added separately
2. In nodule exit there is one more mci struct kobject put than
correspondig gets.
3. The edac_mc waitqueue must be stopped before the polled memory
area disappears.
Th attached patch fixes these problems. Module loads and removes now
without problems, with pci, edac, slab and kobject debug options
enabled.
i82875p code looks so simlar to i82875p that is has likely the same
ref count issues as i82875p.
Cheers
Jarkko Lavinen
From 3abc62242a219d1466b32b59bb88b4c1b0e86a65 Mon Sep 17 00:00:00 2001
From: Jarkko Lavinen <jlavi@iki.fi>
Date: Sun, 23 Nov 2008 22:18:39 +0200
Subject: [PATCH] i82875p_edac: Fix module init and exit
The PCI resources of the hidden overflow device are missing after the overflow
device is revealed and resources must be added separately.
When exiting both edac_remove_sysfs_mci_device() in edac_mc_del_mc()
and edac_mc_free() in i82875p_remove_one() decrement the mci ref count.
Use an additional kobject_get() to keep mci valid edac_mc_del_mc() till the
final edac_mc_free().
Also i82875p_remove_one() should be called before pci_unregister_driver()
to stop the polling before the checked memory area disappearr.
Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
---
drivers/edac/i82875p_edac.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index e43bdc4..ebb037b 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -182,8 +182,6 @@ static struct pci_dev *mci_pdev; /* init dev: in case that AGP code has
* already registered driver
*/
-static int i82875p_registered = 1;
-
static struct edac_pci_ctl_info *i82875p_pci;
static void i82875p_get_error_info(struct mem_ctl_info *mci,
@@ -295,6 +293,7 @@ static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
"%s(): pci_bus_add_device() Failed\n",
...It's a little funky for a driver to be calling pci_bus_assign_resources; but if it really does act like a bus (sounds like this device does) it might be appropriate. I don't have any problems with the patch, but it's really up to Doug. -- Jesse Barnes, Intel Open Source Technology Center --
