Re: [PATCH 2.6.34-rcX] Do not expect PCI devices to return zeroes in PCIe space

From: Petr Vandrovec
Date: Friday, April 30, 2010 - 7:54 pm

Hello,
  openSUSE 11.3 32-bit kernels hang when installed in VMware VMs because the Moorestown
fixed-BAR capability detection code enters an endless loop on Intel AGP bridges (device
ID 0x7191).  See https://bugzilla.kernel.org/show_bug.cgi?id=15888 for additional
details.  arch/x86/pci/mrst.c was introduced after 2.6.33, so only 2.6.34-rcX kernels are
affected.
				Thanks,
					Petr Vandrovec


commit 11a35e56ad8275cbf62882d9c0dc2f17c2b5628b
Author: Petr Vandrovec <petr@vandrovec.name>
Date:   Fri Apr 30 19:17:43 2010 -0700

    Do not expect PCI devices to return zeroes in PCIe space

    There is no reason why old pre-PCIe/PCI-X devices should return zeroes when
    configuration space at 0x100 and above is accessed.  If such a device
    decodes just the low 8 bits of the register number, conventional space
    repeats 15 more times in PCIe config space.  The Moorestown parser for
    fixed BARs can then enter an endless loop on Intel AGP bridge device
    0x7191 with the secondary latency timer programmed to 0x40: the code
    keeps alternating between reading register 0x718 (which returns
    0x40010100) and register 0x400 (which returns 0x71918086).

    This change adds an additional condition to the test: if the device
    ID/vendor ID dword matches the first PCIe capability, then the device is
    not really PCIe.  It should not cause any problems: fixed_bar_cap is
    invoked only on Intel devices, so the only possibility of a false match
    would be if the first PCIe capability had ID 0x8086, and even then it
    seems highly unlikely that the next-capability pointer and capability
    version would match the device ID.

    This fix unbreaks 32bit 2.6.33+ kernels configured with Moorestown
    support to boot on AMD rev 10h+ processors under VMware in VMs which
    lack PCIe support.
    
    Signed-off-by: Petr Vandrovec <petr@vandrovec.name>

diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 8bf2fcb..cd6c277 100644
--- a/arch/x86/pci/mrst.c
+++ ...
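
Illustration (not part of the posted patch): the loop described in the commit
message can be reproduced in a small user-space simulation.  This is a minimal
sketch, assuming a device that decodes only the low 8 bits of the register
number and dword-aligned reads.  The names legacy_read32, aliased_read32,
walk_ext_caps and looks_aliased are hypothetical, and the step limit exists
only so that the sketch terminates.  looks_aliased is one possible reading of
the extra condition described in the commit message, not the actual mrst.c
change.

```c
#include <stdint.h>

/* Conventional (first 256 bytes) config space of the bridge; only the
 * dwords that matter for this example are modelled. */
static uint32_t legacy_read32(unsigned int reg)
{
	switch (reg & 0xfc) {		/* dword-aligned access */
	case 0x00:
		return 0x71918086;	/* device ID 0x7191, vendor ID 0x8086 */
	case 0x18:
		return 0x40010100;	/* sec. latency 0x40, bus numbers 1/1/0 */
	default:
		return 0;
	}
}

/* Device that decodes only the low 8 bits of the register number, so
 * conventional space is mirrored throughout the 4 KiB extended space. */
static uint32_t aliased_read32(unsigned int reg)
{
	return legacy_read32(reg & 0xff);
}

/* Walk the extended capability list, taking the next pointer from bits
 * 31:20 of each header.  Returns the number of headers visited, or -1 if
 * max_steps is reached without finding the end of the list. */
static int walk_ext_caps(unsigned int start, int max_steps)
{
	unsigned int pos = start;
	int steps = 0;

	while (pos) {
		uint32_t hdr;

		if (steps == max_steps)
			return -1;	/* would loop forever without a limit */
		hdr = aliased_read32(pos);
		if (hdr == 0xffffffff)
			break;
		pos = hdr >> 20;
		steps++;
	}
	return steps;
}

/* Sketch of the extra check: the dword at 0x100 equal to the vendor/device
 * dword at 0x00 suggests mirrored conventional space, not a real capability. */
static int looks_aliased(void)
{
	return aliased_read32(0x100) == aliased_read32(0x00);
}
```

With these values walk_ext_caps(0x100, 1000) returns -1: the walk goes
0x100 -> 0x719 (read as dword 0x718) -> 0x400 -> 0x719 and so on, and never
reaches a zero next pointer.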
From: Pan, Jacob jun
Date: Monday, May 3, 2010 - 11:21 pm

Hi Petr,

Other code in the kernel makes a similar assumption when accessing PCI config space above 0x100 (but it does not hang in a loop),
e.g. in drivers/pci/probe.c:
/* ...
 * accesses, or the device is behind a reverse Express bridge.  So we try
 * reading the dword at 0x100 which must either be 0 or a valid extended
 * capability header.
 */
int pci_cfg_space_size_ext(struct pci_dev *dev)
{
	u32 status;
	int pos = PCI_CFG_SPACE_SIZE;

	if (pci_read_config_dword(dev, pos, &status) != PCIBIOS_SUCCESSFUL)
		goto fail;
	if (status == 0xffffffff)
		goto fail;


Back to the problem itself: hpa has suggested that a better fix might be to check cfg_size in fixed_bar_cap. We cannot use it right now, though, since cfg_size is set to 0x100 on MRST (due to the lack of PCI_CAP_ID_EXP in the PCI shim). I will work with the FW guys so that pci_cfg_space_size() returns the correct value on Moorestown.

Until then, your current fix should be good.

Thanks,


--

From: Petr Vandrovec
Date: Tuesday, May 4, 2010 - 12:31 am

Thanks.  Another possibility would be to modify amd_bus.c to verify that
the ENABLE_CF8_EXT_CFG bit in MSR_AMD64_NB_CFG is actually writable, and
not to set PCI_HAS_IO_ECS if the bit reads as zero.  That would fix both
the Moorestown code and pci_cfg_space_size_ext; the patch below can be
applied instead of the mrst.c changes.
							Petr


Do not report AMD processors in VMware as ECS capable

In a VM, AMD processors do not have an integrated northbridge, so their
northbridge-related MSRs do not work and do not enable PCIe configuration
space accesses via I/O ports 0xCF8/0xCFC.  A virtualized processor can be
detected by its NB_CFG register being read-only.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index fc1e8fe..cf03bff 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -349,6 +349,8 @@ static int __init early_fill_mp_bus_info(void)

 #define ENABLE_CF8_EXT_CFG      (1ULL << 46)

+static int ecs_ok = 1;
+
 static void enable_pci_io_ecs(void *unused)
 {
 	u64 reg;
@@ -356,6 +358,10 @@ static void enable_pci_io_ecs(void *unused)
 	if (!(reg & ENABLE_CF8_EXT_CFG)) {
 		reg |= ENABLE_CF8_EXT_CFG;
 		wrmsrl(MSR_AMD64_NB_CFG, reg);
+		/* VMware implements NB_CFG MSR as read-only.  Verify write worked... */
+		rdmsrl(MSR_AMD64_NB_CFG, reg);
+		if (!(reg & ENABLE_CF8_EXT_CFG))
+			ecs_ok = 0;
 	}
 }

@@ -390,7 +396,8 @@ static int __init pci_io_ecs_init(void)
 	for_each_online_cpu(cpu)
 		amd_cpu_notify(&amd_cpu_notifier, (unsigned long)CPU_ONLINE,
 			       (void *)(long)cpu);
-	pci_probe |= PCI_HAS_IO_ECS;
+	if (ecs_ok)
+		pci_probe |= PCI_HAS_IO_ECS;

 	return 0;
 }
--
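
Illustration (not part of the posted patch): the write-then-read-back check
used in the amd_bus.c patch above can be sketched in user space.  This is a
minimal sketch under stated assumptions: struct fake_msr, fake_rdmsr,
fake_wrmsr and try_enable_ecs are hypothetical stand-ins for the real MSR
accessors, and the writable flag models the difference between real hardware
and an emulation that silently drops writes.

```c
#include <stdint.h>

#define ENABLE_CF8_EXT_CFG	(1ULL << 46)

/* Hypothetical model of one MSR.  writable == 0 mimics an emulated
 * register that ignores writes, as described for NB_CFG under VMware. */
struct fake_msr {
	uint64_t value;
	int writable;
};

static uint64_t fake_rdmsr(const struct fake_msr *m)
{
	return m->value;
}

static void fake_wrmsr(struct fake_msr *m, uint64_t v)
{
	if (m->writable)
		m->value = v;	/* a read-only register drops the write */
}

/* Try to set ENABLE_CF8_EXT_CFG and verify that the write took effect.
 * Returns 1 if the bit is set afterwards, 0 if it still reads as zero. */
static int try_enable_ecs(struct fake_msr *nb_cfg)
{
	uint64_t reg = fake_rdmsr(nb_cfg);

	if (!(reg & ENABLE_CF8_EXT_CFG)) {
		fake_wrmsr(nb_cfg, reg | ENABLE_CF8_EXT_CFG);
		reg = fake_rdmsr(nb_cfg);	/* read back to verify */
	}
	return (reg & ENABLE_CF8_EXT_CFG) != 0;
}
```

A caller would advertise extended config space access only when
try_enable_ecs returns 1, which mirrors the ecs_ok flag in the patch.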

From: H. Peter Anvin
Date: Friday, May 14, 2010 - 11:37 am

Hi Petr,

Could you check if this patch fixes your problem, and if so let me know
as soon as possible?

Sorry for the delay.

Thanks,

	-hpa
From: Petr Vandrovec
Date: Friday, May 14, 2010 - 1:51 pm

Thanks for the fix.  Yes, it fixes the hang too, and seems much nicer.

Petr
--

From: tip-bot for H. Peter Anvin
Date: Friday, May 14, 2010 - 2:39 pm

Commit-ID:  e9b1d5d0ff4d3ae86050dc4c91b3147361c7af9e
Gitweb:     http://git.kernel.org/tip/e9b1d5d0ff4d3ae86050dc4c91b3147361c7af9e
Author:     H. Peter Anvin <hpa@linux.intel.com>
AuthorDate: Fri, 14 May 2010 13:55:57 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 14 May 2010 13:55:57 -0700

x86, mrst: Don't blindly access extended config space

Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform.  The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.

This fixes booting certain VMware configurations with CONFIG_MRST=y.

Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.

Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>
---
 arch/x86/pci/mrst.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 8bf2fcb..1cdc02c 100644
--- a/arch/x86/pci/mrst.c
+++ b/arch/x86/pci/mrst.c
@@ -247,6 +247,10 @@ static void __devinit pci_fixed_bar_fixup(struct pci_dev *dev)
 	u32 size;
 	int i;
 
+	/* Must have extended configuration space */
+	if (dev->cfg_size < PCIE_CAP_OFFSET + 4)
+		return;
+
 	/* Fixup the BAR sizes for fixed BAR devices and make them unmoveable */
 	offset = fixed_bar_cap(dev->bus, dev->devfn);
 	if (!offset || PCI_DEVFN(2, 0) == dev->devfn ||
--