Hello, openSUSE11.3 32bit kernels hang when installed to the VMware's VMs because Moorestown fixed capabilities detection code enters endless loop on Intel's AGP bridges (with device ID=7191). See https://bugzilla.kernel.org/show_bug.cgi?id=15888 for additional details. arch/x86/pci/mrst.c was introduced after 2.6.33, so only 2.6.34-rcX are affected. Thanks, Petr Vandrovec commit 11a35e56ad8275cbf62882d9c0dc2f17c2b5628b Author: Petr Vandrovec <petr@vandrovec.name> Date: Fri Apr 30 19:17:43 2010 -0700 Do not expect PCI devices to return zeroes in PCIe space There is no reason why old pre-PCIe/PCI-X devices should return zeroes when configuration space above 0x100 is accessed. If these devices decode just low 8 bits of register number, conventional space repeats 15 times in PCIe config space. And Moorestown parser for fixed bars then can enter endless loop when finding Intel AGP bridge device 0x7191 with secondary latency timer programmed to 0x40 - when such device is encountered, code will enter endless loop of reading registers 0x718 (reading 0x40010100) and 0x400 (reading 0x71918086). This change adds additional condition to the test: if device id/vendor match first PCIe capability, then device is not really PCIe. It should not cause any problems: fixed_bar_cap is invoked only on Intel's devices, so only time there is possibilty to have false match would be if first PCIe capability would have ID 0x8086, and even then that address of next capability pointer and capability version will match device ID seems highly unlikely. This fix unbreaks 32bit 2.6.33+ kernels configured with Moorestown support to boot on AMD rev 10h+ processors under VMware in VMs which lack PCIe support. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c index 8bf2fcb..cd6c277 100644 --- a/arch/x86/pci/mrst.c +++ ...
Hi Petr,
There are other code in the kernel makes similar assumption of accessing pci cfg above 0x100. (but they do not hang in a loop)
e.g. in drivers/pci/probe.c
* accesses, or the device is behind a reverse Express bridge. So we try
* reading the dword at 0x100 which must either be 0 or a valid extended
* capability header.
*/
int pci_cfg_space_size_ext(struct pci_dev *dev)
{
u32 status;
int pos = PCI_CFG_SPACE_SIZE;
if (pci_read_config_dword(dev, pos, &status) != PCIBIOS_SUCCESSFUL)
goto fail;
if (status == 0xffffffff)
goto fail;
Back to the problem itself, hpa has suggested a better fix might be using cfg_size for checking in fixed_bar_cap. But we can not use it right now since we have cfg_size set to 0x100 on MRST (due to lack of PCI_CAP_ID_EXP in the PCI shim). I will negotiate with FW guys so that we have the correct return from pci_cfg_space_size() for Moorestown.
Until then, your current fix should be good.
Thanks,
--
Thanks. Other possibility would be to modify amd_bus.c to verify that
ENABLE_CF8_EXT_CFG bit in MSR_AMD64_NB_CFG is actually writeable, and
not set PCI_HAS_IO_ECS if bit is read-as-zero. That would fix both
Moorestown code as well as pci_cfg_space_size_ext - patch below can be
applied instead of mrst.c changes.
Petr
Do not report AMD processors in VMware as ECS capable
In a VM AMD processors do not have integrated northbridge, and so their
northbridge-related MSRs do not work, and do not enable PCIe configuration
space accesses via I/O ports 0xCF8/0xCFC. Virtualized processor can be
detected by having NB_CFG register read-only.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index fc1e8fe..cf03bff 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -349,6 +349,8 @@ static int __init early_fill_mp_bus_info(void)
#define ENABLE_CF8_EXT_CFG (1ULL << 46)
+static int ecs_ok = 1;
+
static void enable_pci_io_ecs(void *unused)
{
u64 reg;
@@ -356,6 +358,10 @@ static void enable_pci_io_ecs(void *unused)
if (!(reg & ENABLE_CF8_EXT_CFG)) {
reg |= ENABLE_CF8_EXT_CFG;
wrmsrl(MSR_AMD64_NB_CFG, reg);
+ /* VMware implements NB_CFG MSR as read-only. Verify write worked... */
+ rdmsrl(MSR_AMD64_NB_CFG, reg);
+ if (!(reg & ENABLE_CF8_EXT_CFG))
+ ecs_ok = 0;
}
}
@@ -390,7 +396,8 @@ static int __init pci_io_ecs_init(void)
for_each_online_cpu(cpu)
amd_cpu_notify(&amd_cpu_notifier, (unsigned long)CPU_ONLINE,
(void *)(long)cpu);
- pci_probe |= PCI_HAS_IO_ECS;
+ if (ecs_ok)
+ pci_probe |= PCI_HAS_IO_ECS;
return 0;
}
--
Hi Petr, Could you check if this patch fixes your problem, and if so let me know as soon as possible? Sorry for the delay. Thanks, -hpa
Thanks for the fix. Yes, it fixes hang too, and seems much nicer.. Petr --
Commit-ID: e9b1d5d0ff4d3ae86050dc4c91b3147361c7af9e Gitweb: http://git.kernel.org/tip/e9b1d5d0ff4d3ae86050dc4c91b3147361c7af9e Author: H. Peter Anvin <hpa@linux.intel.com> AuthorDate: Fri, 14 May 2010 13:55:57 -0700 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Fri, 14 May 2010 13:55:57 -0700 x86, mrst: Don't blindly access extended config space Do not blindly access extended configuration space unless we actively know we're on a Moorestown platform. The fixed-size BAR capability lives in the extended configuration space, and thus is not applicable if the configuration space isn't appropriately sized. This fixes booting certain VMware configurations with CONFIG_MRST=y. Moorestown will add a fake PCI-X 266 capability to advertise the presence of extended configuration space. Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Jacob Pan <jacob.jun.pan@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com> --- arch/x86/pci/mrst.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c index 8bf2fcb..1cdc02c 100644 --- a/arch/x86/pci/mrst.c +++ b/arch/x86/pci/mrst.c @@ -247,6 +247,10 @@ static void __devinit pci_fixed_bar_fixup(struct pci_dev *dev) u32 size; int i; + /* Must have extended configuration space */ + if (dev->cfg_size < PCIE_CAP_OFFSET + 4) + return; + /* Fixup the BAR sizes for fixed BAR devices and make them unmoveable */ offset = fixed_bar_cap(dev->bus, dev->devfn); if (!offset || PCI_DEVFN(2, 0) == dev->devfn || --
