Re: i82875p_edac: BAR 0 collision

Previous thread: [PATCH 0/4] I/OAT fixes by Maciej Sosnowski on Friday, November 7, 2008 - 4:45 am. (5 messages)

Next thread: [PATCH] ftrace: Allow section alignment by Matt Fleming on Friday, November 7, 2008 - 6:12 am. (7 messages)
From: Jarkko Lavinen
Date: Friday, November 7, 2008 - 4:43 am

When I try to load i82875p_edac module on ASUS P4C800 Deluxe in 2.6.26
or 2.6.27 it fails due to BAR 0 collision. On 2.6.25 the
i82875p_edac works just fine.

Should i82875p_setup_overfl_dev() do some additional work to fix the
missing resource of the hidden overflow device?

When I try load i82875p_edac module on 2.6.27 I get

  # modprobe i82875p_edac
  FATAL: Error inserting i82875p_edac
  (/lib/modules/2.6.27.4/kernel/drivers/edac/i82875p_edac.ko): No such device

And dmesg shows (from 2.6.27.4):

  EDAC MC: Ver: 2.1.0 Oct 30 2008
  EDAC DEBUG: edac_pci_dev_parity_clear()
  ...
  EDAC DEBUG: edac_pci_dev_parity_clear()
  EDAC DEBUG: edac_sysfs_setup_mc_kset()
  EDAC DEBUG: edac_sysfs_setup_mc_kset() Registered '.../edac/mc' kobject
  EDAC DEBUG: i82875p_init_one()
  EDAC i82875p: i82875p init one
  EDAC DEBUG: i82875p_probe1()
  PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
  pci 0000:00:06.0: device not available because of BAR 0 [0xfecf0000-0xfecf0fff] collisions
  EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow device
  EDAC DEBUG: 875p init fail

On 2.6.25.19 loading i82875p_edac works just fine and dmesg shows:

  EDAC MC: Ver: 2.1.0 Nov  4 2008
  EDAC i82875p: i82875p init one
  EDAC MC0: Giving out device to 'i82875p_edac' 'i82875p': DEV 0000:00:00.0
  EDAC PCI0: Giving out device to module 'i82875p_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED)

Cheers
Jarkko Lavinen
--

From: Andrew Morton
Date: Tuesday, November 11, 2008 - 1:01 am

(cc's added)


Might be an EDAC driver regression.  It might also be a consequence of
PCI address space management fiddlings, but I think most of the changes
there post-date 2.6.26?

--

From: Jarkko Lavinen
Date: Wednesday, November 12, 2008 - 2:06 pm

I tried having pci_bus_assign_resources() before enabling the
overflow device in 2.6.27.  The driver gets now loaded, but then 
I get false memory error reports and removing the module segfaults.

Those false memory errors with i82875p edac driver are nothing
new have been reported many times over the past years. On most
kernels i82875p_edac is totally unusable because of these false
errors.

I ran "modprobe i82875p_edac; sleep 5; rmmod i82875p_edac" in
2.6.27.5.  i82875p_setup_overfl_dev() is modified to use
pci_bus_assign_resources() and to be more verbose:

[  235.248583] EDAC DEBUG: i82875p_init_one()
[  235.248592] EDAC i82875p: i82875p init one
[  235.248689] EDAC DEBUG: i82875p_probe1()
[  235.248700] i82875p_setup_overfl_dev pci_get_device: 00000000
[  235.248813] pci 0000:00:06.0: found [8086/257e] class 000880 header type 00
[  235.248827] PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
[  235.248875] pci 0000:00:06.0: calling pci_fixup_transparent_bridge+0x0/0x2b
[  235.249676] i82875p_setup_overfl_dev pci_scan_device: f6c4f000
[  235.250982] i82875p_setup_overfl_dev calling pci_bus_assign_resources for dev->bus before enabling the device
[  235.251178] pci 0000:00:06.0: BAR 0: got res [0x88101000-0x88101fff] bus [0x88101000-0x88101fff] flags 0x20200
[  235.251190] pci 0000:00:06.0: BAR 0: moved to bus [0x88101000-0x88101fff] flags 0x20200
[  235.251196] pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
[  235.251290] pci 0000:00:01.0:   IO window: disabled
[  235.251386] pci 0000:00:01.0:   MEM window: 0xfd900000-0xfe9fffff
[  235.251483] pci 0000:00:01.0:   PREFETCH window: 0x000000f3f00000-0x000000f7efffff
[  235.251631] pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02
[  235.251727] pci 0000:00:1e.0:   IO window: 0xd000-0xdfff
[  235.251823] pci 0000:00:1e.0:   MEM window: 0xfea00000-0xfeafffff
[  235.251921] pci 0000:00:1e.0:   PREFETCH window: 0x00000088000000-0x000000880fffff
[  235.252092] EDAC DEBUG: edac_mc_register_sysfs_main_kobj()
[  ...
From: Jarkko Lavinen
Date: Tuesday, November 18, 2008 - 2:57 pm

I can get around the modprobe problem by adding the missing resource after 
the hidden overflow device is revealed. The diff below is against 2.6.27.

	--- a/drivers/edac/i82875p_edac.c
	+++ b/drivers/edac/i82875p_edac.c
	@@ -295,6 +295,7 @@ static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
	                                "%s(): pci_bus_add_device() Failed\n",
	                                __func__);
	                }
	+               pci_bus_assign_resources(dev->bus);
	        }
	 
	        *ovrfl_pdev = dev;

The access violation when doing "rmmod i82875p_edac" occurs
because the module exit function i82875p_exit() runs
pci_unregister_driver() while the edac_mc_workq_function() is
scheduled to be run.  When the work queue runs, it accesses
something not available anymore.

	static void __exit i82875p_exit(void)
	{
		debugf3("%s()\n", __func__);

	+	mci_saved->op_state = OP_OFFLINE;

		pci_unregister_driver(&i82875p_driver);

		if (!i82875p_registered) {
			i82875p_remove_one(mci_pdev);
			pci_dev_put(mci_pdev);
		}
	}
		

I tried to stop the workqueue by saving mci pointer at the time
of its allocation and just set its state to OFFLINE just before
calling pci_unregister_driver.

This isn't the right way to remove the module and edac_dore has 
refcount 2 after i82875p_edac has been removed. The refcount 
should be 0.

Cheers
Jarkko Lavinen
--

From: Jarkko Lavinen
Date: Sunday, November 23, 2008 - 1:44 pm

There were 3 issues:
 1, PCI resources of the hidden overflow device had to be added separately
 2. In nodule exit there is one more mci struct kobject put than
     correspondig gets.
 3. The edac_mc waitqueue must be stopped before the polled memory
    area disappears.

Th attached patch fixes these problems. Module loads and removes now
without problems, with pci, edac, slab and kobject debug options
enabled.

i82875p code looks so simlar to i82875p that is has likely the same
ref count issues as i82875p.

Cheers
Jarkko Lavinen

From 3abc62242a219d1466b32b59bb88b4c1b0e86a65 Mon Sep 17 00:00:00 2001
From: Jarkko Lavinen <jlavi@iki.fi>
Date: Sun, 23 Nov 2008 22:18:39 +0200
Subject: [PATCH] i82875p_edac: Fix module init and exit

The PCI resources of the hidden overflow device are missing after the overflow
device is revealed and resources must be added separately.

When exiting both edac_remove_sysfs_mci_device() in edac_mc_del_mc()
and edac_mc_free() in i82875p_remove_one() decrement the mci ref count.
Use an additional kobject_get() to keep mci valid edac_mc_del_mc() till the
final edac_mc_free().

Also i82875p_remove_one() should be called before pci_unregister_driver()
to stop the polling before the checked memory area disappearr.

Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
---
 drivers/edac/i82875p_edac.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index e43bdc4..ebb037b 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -182,8 +182,6 @@ static struct pci_dev *mci_pdev;	/* init dev: in case that AGP code has
 					 * already registered driver
 					 */
 
-static int i82875p_registered = 1;
-
 static struct edac_pci_ctl_info *i82875p_pci;
 
 static void i82875p_get_error_info(struct mem_ctl_info *mci,
@@ -295,6 +293,7 @@ static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
 				"%s(): pci_bus_add_device() Failed\n",
 ...
From: Jesse Barnes
Date: Tuesday, December 16, 2008 - 12:39 pm

It's a little funky for a driver to be calling pci_bus_assign_resources; but 
if it really does act like a bus (sounds like this device does) it might be 
appropriate.  I don't have any problems with the patch, but it's really up to 
Doug.

-- 
Jesse Barnes, Intel Open Source Technology Center

--

Previous thread: [PATCH 0/4] I/OAT fixes by Maciej Sosnowski on Friday, November 7, 2008 - 4:45 am. (5 messages)

Next thread: [PATCH] ftrace: Allow section alignment by Matt Fleming on Friday, November 7, 2008 - 6:12 am. (7 messages)