This Dell Precision WorkStation T3400 doesn't boot 2.6.34-rc1 (tried 522dba71). 2.6.33 was fine, and the machine has been running various stable kernels for the last 18 months. Unfortunately, I can't reasonably bisect, as I need this machine to be usable, but I can test specific patches or options (three or four reboots are fine; 15 are not).

Full dmesg from the failing boot and from a successful boot is at http://web.hexapodia.org/~adi/tmp/20100406-pci-ahci-reset-fail/

I suspect it's due to:

[ 3.094038] pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff]
[ 3.103001] pci 0000:00:1f.2: can't reserve [mem 0xff970000-0xff9707ff]

so I've CCed a few recent committers to setup-res.c.

dmesg up to the point of failure:

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.34-rc1-00005-g522dba7 (andy@farthing) (gcc version 4.3.3 (Debian 4.3.3-5) ) #4 SMP Tue Apr 6 12:20:02 PDT 2010
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.34-rc1-00005-g522dba7 root=UUID=a2359eda-9295-451c-924f-c181c6f49d0d ro console=tty1 console=ttyS0,115200
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000bfe01c00 (usable)
[ 0.000000] BIOS-e820: 00000000bfe01c00 - 00000000bfe53c00 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000bfe53c00 - 00000000bfe55c00 (ACPI data)
[ 0.000000] BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[ 0.000000] BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[ 0.000000] BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[ 0.000000] ...
Can you try booting with pci=nocrs? Also, please check with -rc4.

YH
--
All I see in git is -rc3 + 299 commits ending with 0fdf867... That still fails with the same "controller reset failed" message from ahci. -andy --
Thanks a lot for reporting this! No need to bisect it. I'm pretty sure 2.6.34-rc1 will boot fine if you use "pci=use_crs" (obviously that's only a temporary workaround until we fix the real problem).

The BIOS apparently reported this window:

pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]

which doesn't enclose the [mem 0xff970000-0xff9707ff] region where the BIOS put the AHCI device, so we moved the AHCI device. Unfortunately, we put it at [mem 0x000a0000-0x000a07ff], which wasn't a very good choice because that's probably already used by a VGA device.

If you happen to have Windows on this box, I'd love to know whether *it* moves the AHCI device, too, or whether Windows interprets the BIOS information differently than we do. If you have Windows and can collect screenshots of the Device Manager resources for the PCI bus and the AHCI controller, that would be a good start.

Would you mind trying the patch below, plus the patch and kernel args here:

https://bugzilla.kernel.org/show_bug.cgi?id=15533#c5

This will (1) reserve the VGA area, so we should put the AHCI device elsewhere, and (2) collect a few more details about exactly what the BIOS is reporting.

Bjorn

commit 46b6e80aae2ec1d073767c92bba1d98896bce700
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Tue Apr 6 21:44:12 2010 -0600

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 86b1506..f4c0fe4 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(void) { }
 extern unsigned long saved_video_mode;
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 #ifdef CONFIG_X86_MRST
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index b2e2460..966b37f 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init ...
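The check Bjorn describes ("doesn't enclose ... so we moved the AHCI device") can be stated concretely. The following is a minimal sketch, not the kernel's code: `struct mem_range`, `window_encloses()` and `find_enclosing_window()` are hypothetical names, loosely modeled on the inclusive start/end ranges of the kernel's `struct resource`. With the window and AHCI BAR from the log, no window encloses the BAR, which is the "no compatible bridge window" condition that leads Linux to reassign it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct resource:
 * an inclusive [start, end] range, as printed in "[mem 0x...-0x...]". */
struct mem_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* True if every byte of res lies inside win. */
static bool window_encloses(const struct mem_range *win,
			    const struct mem_range *res)
{
	return res->start >= win->start && res->end <= win->end;
}

/* Index of the first window that fully encloses res, or -1 if none
 * does (the "no compatible bridge window" case). */
static int find_enclosing_window(const struct mem_range *wins, int nwins,
				 const struct mem_range *res)
{
	for (int i = 0; i < nwins; i++)
		if (window_encloses(&wins[i], res))
			return i;
	return -1;
}
```

Calling `find_enclosing_window()` with the window {0xff97c000, 0xff97ffff} and the BIOS-assigned BAR {0xff970000, 0xff9707ff} returns -1, because the BAR starts below the window.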
pci=nocrs worked on 2.6.34-rc3-00299-g0fdf867. I won't be back in front
The machine currently exposes one VGA controller; there may be
another, integrated Intel video controller on the motherboard that is
disabled by the BIOS.
01:00.0 VGA compatible controller: nVidia Corporation Quadro NVS 290 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: nVidia Corporation Device 0492
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at dc80 [size=128]
	Expansion ROM at fde00000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
The machine only has Linux installed, but I may have access to another
I'll try that on Thursday.
-andy
--
Oops, sorry, I meant it would probably work with "pci=nocrs", as you already confirmed. Don't bother trying "pci=use_crs"; that's turned on

Great, thanks!

Oh, and I forgot to ask: what BIOS version are you running? Google found several reports of USB issues in Windows on this box, e.g., http://tim.cexx.org/?p=529 .

I think we still have a Linux bug in that we should be reserving the legacy VGA area, but if the BIOS is reporting an incorrect host bridge window, that will cause us to move the AHCI controller and tickle this bug when we wouldn't otherwise.

Bjorn
--
On another T3400 with BIOS A03, Win7's Device Manager -> IDE ATA/ATAPI
controllers -> Standard AHCI 1.0 -> Resources -> Memory Range setting is
ff97f800-ff97ffff. (If that's not the info you needed, let me know
BIOS Information
	Vendor: Dell Inc.
	Version: A04
I'll try the debug patch tomorrow morning.
-andy
--
Assuming this is the same AHCI controller (it probably is, because I only see one mentioned in your logs), I think Win7 moved it from where the BIOS left it. It probably started at 0xff970000, and Win7 moved it into one of the host bridge windows (but not the legacy VGA one):

pci_root PNP0A03:00: host bridge window [mem 0xff980800-0xff980bff]
pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff]

Bjorn
--
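One observation about the Win7 numbers: the reported range ff97f800-ff97ffff is exactly the topmost 2 KB (0x800) of the [mem 0xff97c000-0xff97ffff] window, i.e. 0xff980000 - 0x800 = 0xff97f800, while the other window (0xff980800-0xff980bff) is only 0x400 bytes and too small. That is consistent with a top-down placement policy, but nothing in this thread establishes how Windows actually allocates, and it assumes the A03 machine reports the same windows as the A04 one. The sketch below assumes a power-of-two BAR aligned to its own size; `place_top_down()` and `struct mem_range` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive [start, end] address range (hypothetical helper type). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Place a BAR of `size` bytes at the highest address inside win.
 * Assumes size is a power of two and the BAR must be aligned to its
 * own size, as PCI memory BARs are.  Returns false if it cannot fit.
 */
static bool place_top_down(const struct mem_range *win, uint64_t size,
			   struct mem_range *out)
{
	if (size == 0 || win->end - win->start + 1 < size)
		return false;

	/* Highest size-aligned start whose last byte is still <= win->end. */
	uint64_t start = (win->end + 1 - size) & ~(size - 1);
	if (start < win->start)
		return false;

	out->start = start;
	out->end = start + size - 1;
	return true;
}
```

For {0xff97c000, 0xff97ffff} and a 0x800-byte BAR this yields 0xff97f800-0xff97ffff; for {0xff980800, 0xff980bff} it returns false.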
Yes, there's only one AHCI controller mentioned on either machine. -andy --
We established that the patch in the message above wasn't enough (the patch reserved 0xa0000-0xbffff, and Linux moved the AHCI controller to 0xc0000 instead of 0xa0000). But I'd still like to see the details of what ACPI is telling us, so if you wouldn't mind trying that patch from bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=15533#c5

then collecting an acpidump and attaching both to the bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=15744

that would be great.

Linux thinks the windows are:

pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]

The 0xa0000-0xbffff one makes good sense. That's normally MMIO that's routed via PCI to the VGA device frame buffer, and we should be able to figure out how to avoid that area, e.g., by using BIOS info, PCI class codes, etc.

Now we need to figure out how to avoid the 0xc0000-0xeffff and 0xf0000-0xfffff windows. Maybe there's something special about how ACPI describes them. Or maybe we're just unlucky because these are the first windows in the _CRS list, and Linux tries them in order, while Windows uses a different strategy.

Bjorn
--
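The "Linux tries them in order" explanation can be checked against both observed outcomes with a simplified first-fit, bottom-up allocator. This is a sketch under stated assumptions, not the kernel's actual resource allocator: windows are walked in _CRS order (the three low windows first, then the two high ones from the earlier log, in an assumed order), each BAR is a power-of-two size aligned to itself, and `place_first_fit()`, `find_conflict()` and `align_up()` are hypothetical helpers. With nothing reserved, a 0x800-byte BAR lands at 0xa0000; with 0xa0000-0xbffff reserved, as the VGA patch did, it lands at 0xc0000, matching what was reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] address range (hypothetical helper type). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Return the first busy range overlapping [start, end], or NULL. */
static const struct mem_range *find_conflict(const struct mem_range *busy,
					     size_t nbusy,
					     uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < nbusy; i++)
		if (start <= busy[i].end && busy[i].start <= end)
			return &busy[i];
	return NULL;
}

/*
 * First fit, bottom up: walk the windows in list (_CRS) order and take
 * the lowest size-aligned address in each that avoids every busy range.
 */
static bool place_first_fit(const struct mem_range *wins, size_t nwins,
			    const struct mem_range *busy, size_t nbusy,
			    uint64_t size, struct mem_range *out)
{
	for (size_t w = 0; w < nwins; w++) {
		uint64_t start = align_up(wins[w].start, size);

		while (start + size - 1 <= wins[w].end) {
			const struct mem_range *c =
				find_conflict(busy, nbusy, start, start + size - 1);
			if (!c) {
				out->start = start;
				out->end = start + size - 1;
				return true;
			}
			/* Skip past the conflicting range and try again. */
			start = align_up(c->end + 1, size);
		}
	}
	return false;
}
```

Under this model, reserving the VGA hole only pushes the BAR into the next low window, which is why the discussion turns to avoiding the sub-1 MB windows altogether.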
That's confusing. I think I figured it out, but a "try this patch" that links to a message referring to another patch, some command-line options, and some config options, without saying what the goal is, is a lot for me to parse, since I don't actually understand what's going on here. I think I got it all:

https://bugzilla.kernel.org/attachment.cgi?id=25969
https://bugzilla.kernel.org/attachment.cgi?id=25970

Let me know (using small words if necessary) if I screwed something up.

Thanks,
-andy
--
Perhaps it's sufficient to try them in reverse order? --
Why bother? The first megabyte is really special in x86... it is historically used for legacy devices, it has specific functions for PCI firmware, and it has separate MTRRs. Simply put, "here there be dragons". There is no sane reason to allocate unassigned devices there (preassigned devices are another matter).

-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
--
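hpa's rule (never place an unassigned device below 1 MB) can be sketched as a small change to a bottom-up first-fit placement: skip windows that end below 0x100000 and clip any window straddling that boundary. This is a hypothetical illustration of the idea, not a patch from this thread; `place_above_1mb()`, `ONE_MB` and `struct mem_range` are made-up names, busy ranges are ignored for brevity, and BARs are assumed to be power-of-two sized and self-aligned. With the five windows seen in this thread, the 0x800-byte AHCI BAR lands at 0xff97c000, inside the window that also holds the Win7 placement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ONE_MB 0x100000ULL /* end of the legacy x86 first megabyte */

/* Inclusive [start, end] address range (hypothetical helper type). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bottom-up first fit that never places an unassigned BAR below 1 MB:
 * windows entirely below 1 MB are skipped, and a window straddling the
 * boundary is only used from 1 MB upward.  Busy ranges are ignored.
 */
static bool place_above_1mb(const struct mem_range *wins, size_t nwins,
			    uint64_t size, struct mem_range *out)
{
	for (size_t w = 0; w < nwins; w++) {
		if (wins[w].end < ONE_MB)
			continue; /* "here there be dragons" */

		uint64_t lo = wins[w].start < ONE_MB ? ONE_MB : wins[w].start;
		uint64_t start = align_up(lo, size);

		if (start + size - 1 <= wins[w].end) {
			out->start = start;
			out->end = start + size - 1;
			return true;
		}
	}
	return false;
}
```

The window order no longer matters for the low windows under this policy, which sidesteps the "try them in reverse order" question for this machine.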
I strongly suspect that Windows knows that < 1 MB is special, and only ever assigns it upon explicit allocation.

-hpa
--
I created a Bugzilla entry at https://bugzilla.kernel.org/show_bug.cgi?id=15744 for your bug report; please add your address to the CC list there. Thanks!

--
Maciej Rutecki
http://www.maciek.unixy.pl
--
