This message contains a list of some regressions from 2.6.24 reported since 2.6.25-rc1 was released, for which there are no fixes in the mainline I know of. If any of them have been fixed already, please let me know. If you know of any other unresolved regressions from 2.6.24, please let me know either and I'll add them to the list. Also, please let me know if any of the entries below are invalid. Listed regressions statistics: Date Total Pending Unresolved ---------------------------------------- 2008-03-22 159 35 31 2008-03-17 148 38 30 2008-03-16 146 42 35 2008-03-14 145 45 39 2008-03-12 143 51 41 2008-03-11 141 58 43 2008-03-10 138 66 47 2008-03-03 115 65 49 2008-02-25 90 51 39 2008-02-17 61 45 37 Unresolved regressions ---------------------- Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9412 Subject : Since commit 'x86: enable iommu_merge by default' (948062683004d13ca21c8c05ac052d387978a449) 2.6 is no go on SB600 AHCI Submitter : Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au> Date : 2007-11-19 14:43 (124 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962 Subject : mount: could not find filesystem Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Date : 2008-02-12 14:34 (39 days old) References : http://lkml.org/lkml/2008/2/12/91 Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Yinghai Lu <yhlu.kernel@gmail.com> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976 Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops Submitter : Ben Nizette <bn@niasdigital.com> Date : 2008-02-12 12:46 (39 days old) References : http://lkml.org/lkml/2008/2/12/148 Handled-By : Haavard Skinnemoen <hskinnemoen@atm...
Hmm. I just tested mine, and it still works fine (current -git, obviously). But as usual, I tend to run different kernel configs than most people (no modules, only drivers that are actually useful). So I think this needs bisecting or something. Linus --
.. I wonder if this is related to the USB suspend/resume bug I see here? One way to tell, is to rebuild the kernel with CONFIG_USB_SUSPEND=n. If suspend/resume then works, we might well be looking at the same bug. -ml --
Hi, Okay, USB now works fine on resume after fixing the iaa_watchdog issue. But twice now, once before that fix, and again just a minute ago, my machine has failed completely to resume after suspend-to-RAM. Again, resume here is 100% reliable in 2.6.24 and earlier. Symptoms: the usual: -- 2.6.25-rc8 (and -rc7 earlier), plus USB fix. -- not reproduceable on demand. -- black screen of death -- no backlight, no kernel messages. -- no hard drive activity. -- no alt-sysrq-anything. -- 5 second hold of the power button to poweroff and then recover. Not much info to go on, but it's worth knowing there's an issue here somewhere. Cheers --
.. Mmmm... Now that 2.6.25-rc* is usable here, there's another symptom that's been happening regularly enough that it's got to be a regression of some sort. This machine has an ATI X1400 video card, which doesn't work with any open source X server that I know of. Maybe latest RadeonHD would work but it didn't when I tried it in January. So I'm using the ATI binary fglrx X server, but without their kernel module, so no 3D acceleration (fine.. only affects Google Earth, really). Now.. on 2.6.25, after a suspend/resume cycle (or three), the framebuffer frequently starts going wonky. "snowy" pixels appear, and stay. Just a moment ago here, the entire background changed to zebra stripes. And so on.. Peculiar stuff. I'm wondering if something to do with video mode switching, or video register save/restore, has changed since 2.6.24. Because it's broken in 2.6.25, yet works fine in all earlier kernels. ??? --
Yes, that's probably related. However, I'm unable to reproduce the breakage. If you can do that, it would probably help a lot if you identified the commit causing that to happen. I also wonder if this report might be related to it: http://marc.info/?l=linux-acpi&m=120669137831005&w=4 Thanks, Rafael --
.. Well, it would help then if you identified the commits which hacked at the video mode switching. Then I can revert those and see what happens. There's probably not many commits there, and since it doesn't do it consistently (and sometimes the machine just doesn't resume at all with 2.6.25), it is tricky to bisect. Works perfectly in 2.6.24, though, which is what I'm now running again. Thanks --
I guess this is related to http://bugzilla.kernel.org/show_bug.cgi?id=10319 . I have no idea what can cause that to happen. It apparently is not related to CONFIG_DRM (that actually may help if you have an Intel graphics adapter), so it looks like an ACPI issue. Thanks, Rafael --
.. Something's weird there. I poked at my suspend/resume script again this morning, and discovered that it did have some vbetool hacks to deal with textmode on earlier kernels. So I wrapped those vbetool hacks with some logic to prevent them on 2.6.25, and for now suspend/resume seem to be working with good video on -rc8. It'll take a bit longer to know for sure, but by the end of today we'll see. cheers --
I already tried this, without success: $ grep CONFIG_USB_SUSPEND /boot/config-2.6.25-rc7 # CONFIG_USB_SUSPEND is not set Regards, Tino --
I just tested current -git (a9edadbf790d72adf6ebed476cb5caf7743e7e4a), and it's still broken. I attached my kernel config. Regards, Tino
This bug is alas an orphan. It is /not/ handled by me, because it is
not an IEEE 1394 subsystem bug and I have no idea what to do about it.
Note the following:
- Several or all of ohci1394's MMIO reads return ~0 (all bits set
to one) --- or 0 --- where different values are expected.
- In http://lkml.org/lkml/2008/2/23/244 we get to see a
WARNING: at arch/x86/mm/ioremap.c:137 __ioremap+0xa7/0x16a()
which is "WARN_ON_ONCE(page_is_ram(pfn));".
After that, the failures start.
But before that, "Unknown symbol" messages pop up when ohci1394
is loaded. These symbols are implemented by ieee1394 on which
ohci1394 depends.
...
MMIO read expected 0x00008400, got 0x00000000.
MMIO read got random bits in the interrupt event register.
The values for Max Packet and IR/IT contexts came from bogus MMIO reads.
Thomas, you wrote in http://lkml.org/lkml/2008/3/17/316 that the problem
resurfaced.
- Are the "Unknown symbol"s still there? These are not supposed to
happen.
- Is the "WARNING: at arch/x86/mm/ioremap.c" still there?
- Can you reproduce it without the atheros driver?
Rafael, no matter how it looks, this is not "handled-by: me". :-)
I don't know nothing about kbuild nor about arch/x86/mm nor about WLAN.
The IEEE 1394 bits in the bug of which I know something about are purely
accidental.
--
Stefan Richter
-=====-==--- --== =-==-
http://arcgraph.de/sr/
--I removed the ath driver, see bug I moved ohci1394.ko + ieee1394.ko file manually to avoid automatic loading. then i used insmod on those files and forgot to insmod ieee1394 Yes. Seems so. --
Thanks. Summary from today's bugzilla comments: No ioremap warning, but MMIO reads still give bogus values and let ohci1394 fail. ohci1394 still got the MMIO region 0x1'0000'0000 - 0x1'0000'07ff, FWIW. The length of the region is correct, but its contents bogus. -- Stefan Richter -=====-==--- --== =-==- http://arcgraph.de/sr/ --
Can an MMIO region reside above 0x1'0000'0000 on x86-32? ... Apparently yes, if CONFIG_RESOURCES_64BIT=y. From Thomas' dmesg: http://bugzilla.kernel.org/attachment.cgi?id=15397 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000ede00 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000007f0c8000 (usable) BIOS-e820: 000000007f0c8000 - 000000007f2c9000 (ACPI NVS) BIOS-e820: 000000007f2c9000 - 000000007feb9000 (ACPI data) BIOS-e820: 000000007feb9000 - 000000007feef000 (ACPI NVS) BIOS-e820: 000000007feef000 - 000000007ff00000 (ACPI data) BIOS-e820: 000000007ff00000 - 0000000080000000 (reserved) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fed14000 - 00000000fed1a000 (reserved) BIOS-e820: 00000000fed1c000 - 00000000fed20000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved) 1136MB HIGHMEM available. 896MB LOWMEM available. ... ACPI: RSDP 000FE020, 0024 (r2 APPLE ) ACPI: XSDT 7FEFD120, 0074 (r1 APPLE Apple00 55 1000013) ACPI: FACP 7FEFB000, 00F4 (r3 APPLE Apple00 55 Loki 5F) ACPI: DSDT 7FEF0000, 48C0 (r1 APPLE MacBookP 10001 INTL 20050309) From Thomas' .config: http://bugzilla.kernel.org/attachment.cgi?id=15396 CONFIG_RESOURCES_64BIT=y I have a Mac mini but run x86-64 on it. However, I've got another i945GM based PC with x86-32 and three OHCI-1394 controllers in it. I always had CONFIG_RESOURCES_64BIT switched off. I shall try CONFIG_RESOURCES_64BIT=y on that PC. Thomas, did you use CONFIG_RESOURCES_64BIT=y already under Linux 2.6.24? All, is there anything special that drivers need to take care of for CONFIG_RESOURCES_64BIT=y? We use resource_size_t ohci_base to request the MMIO region in drivers/ieee139...
Hmm. It would only work if PAE (HIGHMEM64G) is enabled too. And obviously the hardware has to have working 64-bit BAR's. AND no, I don't think our x86-32 ioremap() actually works for this case, because while the resource data may have the full 64 bits, when the ioremap() happens it gets truncated to 32 bits. Ingo/Thomas - should ioremap*() perhaps take "resource_size_t" or a "u64" for the address (and then "__ioremap()" should probably take a PFN, not a physical address, and that one can remain just a "unsigned long"?) Has anybody ever had a working 64-bit BAR on x86? Ivan? Maybe I'm missing something.. Linus --
Hmm. No idea. I look into that on monday (tomorrow is family Same here. Thanks, tglx --
Does this patch make any difference?
(ENTIRELY untested, I checked that it compiles on x86-64, but didn't even
test a 32-bit build, I'm hoping whoever sees this issue can also fix up
the inevitable small missed pieces)
Linus
---
arch/x86/mm/ioremap.c | 6 +++---
include/asm-x86/io_32.h | 6 +++---
include/asm-x86/io_64.h | 6 +++---
lib/iomap.c | 2 +-
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 8fe576b..4afaba0 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -106,7 +106,7 @@ static int ioremap_change_attr(unsigned long vaddr, unsigned long size,
* have to convert them into an offset in a page-aligned mapping, but the
* caller shouldn't need to know that small detail.
*/
-static void __iomem *__ioremap(unsigned long phys_addr, unsigned long size,
+static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size,
enum ioremap_mode mode)
{
unsigned long pfn, offset, last_addr, vaddr;
@@ -193,13 +193,13 @@ static void __iomem *__ioremap(unsigned long phys_addr, unsigned long size,
*
* Must be freed with iounmap.
*/
-void __iomem *ioremap_nocache(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
{
return __ioremap(phys_addr, size, IOR_MODE_UNCACHED);
}
EXPORT_SYMBOL(ioremap_nocache);
-void __iomem *ioremap_cache(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
{
return __ioremap(phys_addr, size, IOR_MODE_CACHED);
}
diff --git a/include/asm-x86/io_32.h b/include/asm-x86/io_32.h
index 58d2c45..d4d8fbd 100644
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -114,13 +114,13 @@ static inline void * phys_to_virt(unsigned long address)
* If the area you are trying to map is a PCI BAR you should have a
* look at pci_iomap().
*/
-extern...Yes. With this patch applied the error messages doesn't appear anymore. $ git describe v2.6.25-rc6-268-gd2532dd $ git status # On branch master # Changed but not updated: # (use "git add <file>..." to update what will be committed) # # modified: arch/x86/mm/ioremap.c # modified: include/asm-x86/io_32.h # modified: include/asm-x86/io_64.h # modified: lib/iomap.c # # Untracked files: # (use "git add <file>..." to include in what will be committed) # # arch/x86/boot/offsets.h no changes added to commit (use "git add" and/or "git commit -a") See also attached dmesg.
[ Ahh, I was reading email in the wrong order, so I saw your other email first. ] Does the firewire port work too? Linus --
Thomas, if you don't have anything to plug in, something like the following log messages after module loading should be enough confirmation that the issue is resolved: (new driver) ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19 firewire_ohci: Added fw-ohci device 0000:03:03.0, OHCI version 1.0 firewire_core: created device fw0: GUID 0017f2fffe66fb80, S400 (or old driver) ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19 ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[90000000-900007ff] Max Packet=[2048] IR/IT contexts=[8/8] ieee1394: Host added: ID:BUS[0-00:1023] GUID[0017f2fffe66fb80] -- Stefan Richter -=====-==--- --== ==--- http://arcgraph.de/sr/ --
ioremap() fails now. -- Stefan Richter -=====-==--- --== ==--- http://arcgraph.de/sr/ --
64-bit ioremaps never worked on 32-bit, so we are in totally unchartered
waters now - but due to the unification we have a realistic chance to
make them work. At minimum we need the fix below in addition to Linus'
patch - does it make any difference?
Ingo
---
arch/x86/mm/ioremap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: v/arch/x86/mm/ioremap.c
===================================================================
--- v.orig/arch/x86/mm/ioremap.c
+++ v/arch/x86/mm/ioremap.c
@@ -39,7 +39,7 @@ EXPORT_SYMBOL(__phys_addr);
int page_is_ram(unsigned long pagenr)
{
- unsigned long addr, end;
+ resource_size_t addr, end;
int i;
/*
@@ -109,7 +109,8 @@ static int ioremap_change_attr(unsigned
static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size,
enum ioremap_mode mode)
{
- unsigned long pfn, offset, last_addr, vaddr;
+ unsigned long pfn, offset, vaddr;
+ resource_size_t last_addr;
struct vm_struct *area;
pgprot_t prot;
--could you please try x86.git/latest: http://people.redhat.com/mingo/x86.git/README which has all the fixes and debug patches integrated (no extra patching should be needed). If it still doesnt work then please send the new dmesg and /sys/kernel/debug/kernel_page_tables dump. head 0fef904c33841be92f or later. Ingo --
The kernel boots, but hangs at "populating dev tree with devices usings uevents". After removing both firewire drivers (ohci1394 and firewire_ohci) the system comes up correctly. See attached file. Modprobing either ohci1394 or firewire_ohci seems to lock up the system. mfg thomas
that's weird. If you do the modprobe from a VGA console and do a 'dmesg -n 8', do you get any ioremap printk shortly before the hard lockup? Ingo --
All i get is: [ 1175.600768] ACPI: PCI Interrupt 0000:0c:03.0[A] -> GSI 19 (level, low) -> IRQ 19 no ioremap debug offset. --
basically, old ioremap did this: [ 162.485605] ACPI: PCI Interrupt 0000:0c:03.0[A] -> GSI 19 (level, low) -> IRQ 19 [ 162.485695] ioremap: 00000000(00000800) => f8978000 the theory (fact?) was that the zero physical address there (the '00000000') was some 4GB+ address truncated down to 32-bits. OTOH, before this system worked for you before, i start to suspect that ioremap is a red herring here and that it's the code that gets to that physical address (which is ioremap-ed) is at fault here. the hard hang might be your southbridge totally dumbfounded by the host OS attempting to do an MMIO access to an above-4GB address? so the question is - what physical address did that ioremap do in 2.6.24 (which presumly had a working ohci1394, right?), and why did it change to something else in -git? Ingo --
See file attachments of this bug: http://bugzilla.kernel.org/show_bug.cgi?id=10080 and compare lspci -vv from 2.6.25 and 2.6.24: 2.6.25: 0c:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61) (prog-if 10 [OHCI]) Subsystem: Agere Systems FW323 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B+ DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 248 (3000ns min, 6000ns max), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 19 Region 0: [virtual] Memory at 100000000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: ohci1394 Kernel modules: firewire-ohci, ohci1394 2.6.24: 0c:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61) (prog-if 10 [OHCI]) Subsystem: Agere Systems FW323 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B+ DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 248 (3000ns min, 6000ns max), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 19 Region 0: Memory at 8c000000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME+ Kernel driver in use: ohci1394 Maybe. Is it important that i have an core duo? (32 bit only - not a the Is the lspci output sufficient for you? --
Ok, so it didn't use to be at the 4GB mark. This seems to be a PCI and resource alloc issue. It would be really interesting to see where the 4GB allocation started. Ie ignore anything else (warnings, driver loadings etc), and _just_ look at lspci -vv output for where the memory got allocated. Did you already bisect that and I just missed it? Linus --
I didn't bisect this.But I recompiled with CONFIG_RESOURCES_64BIT
*unset* and everything is fine now, i.e. no error occurs anymore.
lspci -vv output:
0c:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61) (prog-if 10
[OHCI])
Subsystem: Agere Systems FW323
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B+ DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 248 (3000ns min, 6000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at 94406000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME+
Kernel driver in use: ohci1394
Kernel modules: firewire-ohci, ohci1394
This is on ingo's git tree (branch: x86-latest) (commit
0fef904c33841be92f5ebdbdb9339dd11a133c92 - x86: ioremap of 64-bit
resource on 32-bit kernel fix)
mfg
thomas
--Note that 'Memory at 100000000' is totally bogus as this is 32-bit BAR and respective PCI bridge is 32-bit either. Most interesting thing is that under 2.6.24 Tomas had PCI: Bridge: 0000:00:1e.0 IO window: disabled. MEM window: 8c000000-8c0fffff PREFETCH window: disabled. but with 2.6.25 PCI: Bridge: 0000:00:1e.0 IO window: disabled. MEM window: 0x00000000-0x000fffff PREFETCH window: disabled. I'm sure that MEM window was actually 0x100000000-0x1000fffff as that Yeah, it's resource issue for sure. The 00:1e.0 is a transparent bridge, so I blame commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f (PCI: remove transparent bridge sizing). It's wrong for two reasons: - we cannot ignore standard windows of a transparent bridge as they always positive decode, so they are potential source of address conflicts; - that patch just broke whole bridge setup logic in unpredictable way. What confused me a lot initially is that the patch was already there in 2.6.24. But I think that issue was somehow masked by 'unsigned longs' used in setup-bus.c all over the place instead of resource_size_t, which has been fixed by Ben in 2.6.25... I think that's why Thomas has everything working again without CONFIG_RESOURCES_64BIT. Thomas, can you put CONFIG_RESOURCES_64BIT=y back and either revert commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f, or just comment out these two lines in drivers/pci/setup-bus.c: if (bus->self->transparent) break; and check if it helps? Ivan. --
alternatively, try x86.git/latest which has the revert below included. Ingo ------------------> Subject: pci: revert "PCI: remove transparent bridge sizing" From: Ingo Molnar <mingo@elte.hu> Date: Wed Mar 26 14:38:07 CET 2008 revert commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f as it is wrong. Requested by Ivan Kokshaysky. Signed-off-by-if-Thomas-reports-success: Ingo Molnar <mingo@elte.hu> --- drivers/pci/setup-bus.c | 5 ----- 1 file changed, 5 deletions(-) Index: linux-x86.q/drivers/pci/setup-bus.c =================================================================== --- linux-x86.q.orig/drivers/pci/setup-bus.c +++ linux-x86.q/drivers/pci/setup-bus.c @@ -486,12 +486,7 @@ void __ref pci_bus_size_bridges(struct p break; case PCI_CLASS_BRIDGE_PCI: - /* don't size subtractive decoding (transparent) - * PCI-to-PCI bridges */ - if (bus->self->transparent) - break; pci_bridge_check_ranges(bus); - /* fall through */ default: pbus_size_io(bus); /* If the bridge supports prefetchable range, size it --
Good news everyone.
Everything works again.
lspci -vv:
0c:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61) (prog-if 10
[OHCI])
Subsystem: Agere Systems FW323
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B+ DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 248 (3000ns min, 6000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at 94400000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME+
Kernel driver in use: ohci1394
Kernel modules: firewire-ohci, ohci1394
See also attached cfs-debug-info for config and dmesg.
I tried ingos's x86/latest:
$ git describe
v2.6.25-rc7-684-g74d77fe
thanks to everybody for the help.
thomasAs the author of the "PCI: remove transparent bridge sizing" change, I apologize for the trouble it seems to have caused. The same change had also exposed an issue reported by Paul Martin that has been causing an Oops while hotplugging ThinkPads to a ThinkPad Dock II. re: http://lkml.org/lkml/2008/2/19/405 http://bugzilla.kernel.org/show_bug.cgi?id=9961 I have a fix for the ThinkPad docking Oops but if the issue being discussed here is caused by the transparent bridge sizing removal change I totally agree that it should be reverted. The transparent bridge sizing removal change was motivated by insufficient PCI memory resource for a transparent bridge window that was being created as a result of expansion ROM(s) being included in the transparent bridge sizing calculations. A later "PCI: Remove default PCI expansion ROM memory allocation" change ( re: http://lkml.org/lkml/2007/12/11/361 ) removes the expansion ROM(s) from the transparent bridge sizing calculations which actually resolves the original issue in a different manner. So, even if the "PCI: remove transparent bridge sizing" is not problematic it is no longer needed anyway. Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc --
Ok, I reverted it, but the docking Oops is still interesting, in that it implies that some piece of code wasn't exactly robust. I haven't seen the patch, but it sounds like that should be fixed independently.. Linus --
Could be but without the sizing removal I doubt that the fix I was going to propose (see below) would be the same. What I had found is that the removal of the transparent bridge sizing was leaving the resource record for at least region 7 (IO) of a hot-added transparent bridge on the docking station in a state that was not palatable with later executed code in pdev_sort_resource(). Even though the restoration of the transparent bridge sizing corrects the problem, pdev_sort_resource() could probably use some bulletproofing. I will take a look at this. Thanks, Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc --- linux-2.6.25-rc6/drivers/pci/probe.c.orig 2008-03-20 12:09:14.000000000 -0700 +++ linux-2.6.25-rc6/drivers/pci/probe.c 2008-03-20 12:11:29.000000000 -0700 @@ -328,6 +328,8 @@ void __devinit pci_read_bridge_bases(str if (!res->end) res->end = limit + 0xfff; } + if (!res->start && dev->transparent) + res->flags = 0; res = child->resource[1]; pci_read_config_word(dev, PCI_MEMORY_BASE, &mem_base_lo); @@ -339,6 +341,8 @@ void __devinit pci_read_bridge_bases(str res->start = base; res->end = limit + 0xfffff; } + if (!res->start && dev->transparent) + res->flags = 0; res = child->resource[2]; pci_read_config_word(dev, PCI_PREF_MEMORY_BASE, &mem_base_lo); @@ -373,6 +377,8 @@ void __devinit pci_read_bridge_bases(str res->start = base; res->end = limit + 0xfffff; } + if (!res->start && dev->transparent) + res->flags = 0; } static struct pci_bus * pci_alloc_bus(void) --
Ahh, ok, so it was directly caused by simply not sizing and setting up the Thanks, Linus --
Indeed.
This should prevent an oops in all cases.
Ivan.
---
PCI: improved sanity check for pdev_sort_resources()
Prevent potential oops with unsized PCI bridge resources.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
---
drivers/pci/setup-res.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 4be7ccf..fb57c8b 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -226,18 +226,17 @@ void pdev_sort_resources(struct pci_dev *dev, struct resource_list *head)
if (r->flags & IORESOURCE_PCI_FIXED)
continue;
- r_align = r->end - r->start;
-
+ r_align = (i < PCI_BRIDGE_RESOURCES) ? r->end - r->start + 1 :
+ r->start;
if (!(r->flags) || r->parent)
continue;
- if (!r_align) {
+ if (r_align <= 1) {
printk(KERN_WARNING "PCI: Ignore bogus resource %d "
"[%llx:%llx] of %s\n",
i, (unsigned long long)r->start,
(unsigned long long)r->end, pci_name(dev));
continue;
}
- r_align = (i < PCI_BRIDGE_RESOURCES) ? r_align + 1 : r->start;
for (list = head; ; list = list->next) {
resource_size_t align = 0;
struct resource_list *ln = list->next;
--Hmm. This makes the code look totally nonsensical.
It seems to come from the fact that we had a very odd way to check bogus
resources (namely "start == end"), but your code makes it _really_ hard to
see what is going on.
In fact, I think the old code was buggy too, because we actually *do* have
single-byte resources where start == end, as showb by google:
PCI: Ignore bogus resource 1 [3f6:3f6] of Symphony Labs SL82c105
PCI: Ignore bogus resource 3 [376:376] of Symphony Labs SL82c105
where that resource actually looks valid, and should have a single byte
alignment! Admittedly I think it was created with a quirk (can you get
that kind of resource from actually _probing_ a PCI device?) but I do
think that a single-byte resource is valid.
So I wonder if we shouldn't just make this a bit more readable and also a
bit more explicit with something like the appended..
NOTE! This will also consider a bridge resource at 0 to be an invalid
resource (since now the alignment will be zero), which is a bit odd and
makes me worry a bit. I wouldn't be surprised if some non-PC architectures
have PCI bridges at zero. But maybe they should be (or already are?)
marked IORESOURCE_PCI_FIXED?
Ben - the obvious "odd PCI bus resources" architecture would be POWER. Any
commentary?
Linus
---
drivers/pci/setup-res.c | 21 +++++++++++++++++----
1 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 4be7ccf..048ed77 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -211,6 +211,20 @@ int pci_assign_resource_fixed(struct pci_dev *dev, int resno)
EXPORT_SYMBOL_GPL(pci_assign_resource_fixed);
#endif
+static inline resource_size_t get_resource_alignment(int resno, struct resource *r)
+{
+ resource_size_t start = r->start, end = r->end;
+ resource_size_t alignment = 0;
+
+ /* End == start == 0 - invalid resource */
+ if (end && start <= end) {
+ alignment = end -...Good point - even though 1-byte size/alignment is invalid for a regular BAR (minimum is 8 bytes for IO and 16 bytes for MEM, IIRC), nothing prevents us from using this code for non-standard stuff, including single-byte Well, at this point (pdev_sort_resources call) bridge resource->start has nothing to do with a bus address, it just a temporary storage for required alignment, filled by sizing routines (and 0 is definitely invalid here). I know, this is quite confusing, but I didn't want to add extra fields to existing structures or create temporary per-bus trees... But after pci_assign_resource() that resource can certainly be at 0, Extra 4 or 8 bytes per resource? Well, if you think that people won't start complain too much about that, I'll be absolutely happy with that. If the new "align" field (and then, maybe, "size" instead of "end"?) is OK, then I'm definitely willing to give it a try. Ivan. --
In fact, this is the resource assignment code, and PCIBIOS_MIN_* is set to non-0 for powerpc too so we don't try to assign things down at 0 anyway, so we don't have a problem. We only care about things at 0 that have already been put there by the firmware and that we decide not to reassign. So I don't have any objection anymore. Cheers, Ben. --
Adding an alignment field should be a non-issue: the size of this structure is not likely to be a big deal (yeah, we have something like 12 of them in each PCI device etc, so smaller is better, but it's still not going to be something anybody really notices). And yeah, it might be nice to have "size" instead of "end", but the real problem with that one is actually that on 32-bit (without the 64-bit resource configuration) we want it to be "size-1" in order to be able to fit a whole 0-0xffffffff resource into a resource. And *that* would be really ugly with "size": at least right now it makes a certain amount of sense with "end" pointing to the last entry. So while I can understand your wish for "start+len" rather than "start+end", I don't think it's really practical. Linus --
Actually, before we go any further, there might be a less intrusive alternative: add just a couple of flags to the resource flags field (we still have something like 8 unused bits on 32-bit), and use those to implement a generic "resource_alignment()" routine. Two flags would do it: - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device resources) - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources during probing) and then the case of both flags zero (or both bits set) would actually be "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we actually allocate the resource (so that we don't use the "start" field as alignment incorrectly when it no longer indicates alignment). That wouldn't be totally generic, but it would have the nice property of automatically at least add sanity checking for that whole "res->start has the odd meaning of 'alignment' during probing" and remove the need for a new field, and it would allow us to have a generic "resource_alignment()" routine that just gets a resource pointer. Linus --
Sounds good to me. So here we go (completely untested, just for review).
Ivan.
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2db2e4b..152ffa2 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -225,7 +225,7 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
res->flags |= l & ~PCI_BASE_ADDRESS_IO_MASK;
}
res->end = res->start + (unsigned long) sz;
- res->flags |= pci_calc_resource_flags(l);
+ res->flags |= pci_calc_resource_flags(l) | IORESOURCE_SIZEALIGN;
if (is_64bit_memory(l)) {
u32 szhi, lhi;
@@ -278,7 +278,8 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
if (sz) {
res->flags = (l & IORESOURCE_ROM_ENABLE) |
IORESOURCE_MEM | IORESOURCE_PREFETCH |
- IORESOURCE_READONLY | IORESOURCE_CACHEABLE;
+ IORESOURCE_READONLY | IORESOURCE_CACHEABLE |
+ IORESOURCE_SIZEALIGN;
res->start = l & PCI_ROM_ADDRESS_MASK;
res->end = res->start + (unsigned long) sz;
}
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f7cb8e0..5cf8456 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -65,6 +65,7 @@ static void pbus_assign_resources_sorted(struct pci_bus *bus)
res = list->res;
idx = res - &list->dev->resource[0];
if (pci_assign_resource(list->dev, idx)) {
+ /* FIXME: get rid of this */
res->start = 0;
res->end = 0;
res->flags = 0;
@@ -327,6 +328,7 @@ static void pbus_size_io(struct pci_bus *bus)
/* Alignment of the IO window is always 4K */
b_res->start = 4096;
b_res->end = b_res->start + size - 1;
+ b_res->flags |= IORESOURCE_STARTALIGN;
}
/* Calculate the size of the bus and minimal alignment which
@@ -401,6 +403,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, unsigned long
}
b_res->start = min_align;
b_res->end = size + min_align - 1;
+ b_res-&g...Ivan, After adding a resource_alignment() prototype to
include/linux/ioport.h to get rid of
drivers/pci/setup-res.c: In function `pci_assign_resource':
drivers/pci/setup-res.c:141: error: implicit declaration of function `resource_alignment'
make[2]: *** [drivers/pci/setup-res.o] Error 1
make[1]: *** [drivers/pci] Error 2
I verified that pdev_sort_resources() longer encounters an Oops when
the ThinkPad is hot-added to the Dock II with the change that
exposed the problem (transparent bridge sizing removal) included.
So, although it is no suprise, I can at least say that the
pdev_sort_resources() bulletproofing has been test verified. :)
However, while doing the above I did see some "bogus alignment
of resource" messages both during boot with ThinkPad attached to
the Dock II and during hot-add of the ThinkPad to the Dock II.
The messages are associated with the cardbus bridges on the
docking station. dmesg and lspci output collected after the
hot-add included below.
Note that the
PCI: Cannot allocate resource region 7 of bridge 0000:02:03.0
PCI: Cannot allocate resource region 8 of bridge 0000:02:03.0
PCI: Cannot allocate resource region 9 of bridge 0000:02:03.0
messages associated with the transparent p2p bridge on the
docking station are _not_ new and come from
if (!r->start || !pr ||
request_resource(pr, r) < 0) {
printk(KERN_ERR "PCI: Cannot allocate "
"resource region %d "
"of bridge %s\n",
idx, pci_name(dev));
/*
* Something is wrong with the region.
* Invalidate the resource to prevent
* child resource allocations in this
* range.
*/
r->flags = 0;
}
in pcibios_allocate_bus_resources() [arch/x86/pci/i386.c] which
is visited during boot but not during the hot-add. This is where
the resource flags were cleared preventing the Oops from happening
during boot when the ThinkPad was already attached to the docking
station. This code should probably be modified to reduce dmesg
pollution.
Will now try ...Just tried it on an IBM x3850. No obvious problems or unexpected messages spotted during/following boot or during/following hotplug of PCI-X and PCIe cards. Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc --
Thanks for testing. I've also tested an updated patch (with BUG_ON() replaced by printk and a resource_alignment() prototype) on alpha and x86 - works as expected. Ivan. --
Don't do the BUG_ON(). That would just cause a broken machine, and makes it much harder to report this issue. BUG_ON() should be used only for totally unfixable things. In this case, the easy thing to do is to just return an error, possibly with a printk() about bogus resources (ignoring it as a resource, the way we do it in pdev_sort_resource()). But other than that, the thing doesn't look horrible. Linus --
PCI bridges at zero is perfectly valid indeed and I'm sure we have that around at least for IO space. In fact, I'm surprised you don't have that on x86. Typically, things like an HT segment with a P2P bridge and behind that bridge an ISA bridge could well have the P2P bridge with a resource forwarding 0...0x1000 IO downstream for example even on x86 no ? (I'm not -that- familiar with the crazyness of legacy ISA on x86 but I've definitely seen such setup on other archs). For MMIO, it mostly depends whether the code gets to work on raw bus values, in which case 0 will be around, or already fixed up values (ie, translated in CPU bus space) in which case 0 is unlikely. In the case of pdev_sort_resources(), it will manipulate already fixed up resources, so MMIO should work, but I'm a bit worried by PIO. Ben. --
x86 has memory there, always has had, probably always will. Also, the reason I *think* this issue is ok is that I think the only PCI bus resources we can see in the whole pdev_sort_resources() mess is the ones that are behind the bus that we're not sizing for, and they've been set up by pbus_size_mem(). And pbus_size_mem() has this special magic setup where it calculates the size and the alignment of the bus resource, and then makes r->start = alignment; r->end = r->start + size - 1; so using "r->start" *should* be ok in this case because it really means "alignment" in this one case. That said, I'm not going to be willing to bet my life on it. I also wonder if we maybe should just add a separate "alignment" field to the resources. Rather than playing games like these (and having to compare the resource number to decide whether it is a bus resource or a normal PCI device resource), just adding the dang field would be a whole lot saner. I dunno. I'm not going to do anything in this area before 2.6.25 is out because this *does* make me a bit nervous, but if somebody wants to think about this and perhaps write patches for testing, that would be good. And once more: Ivan, can you again double-check my blatherings above? Linus --
One way to kill off some of the assumptions and gunge would be to add pci_resource_assigned(resource) [or indeed just resource_assigned()]. Iomap has similar problems - we have no portable defined "not mapped" at the moment, although we use NULL technically mmio maps of 0 end up at 0 in the implementation today. Alan --
There is IORESOURCE_UNSET... We could use that. I use it to some extent
on powerpc but x86 doesn't. Though I remember spotting a code path in
setup-res.c will not clear it when actually assigning the resource to a
bus. I can't remember if that hits in practice tho. I have a patch
anyway :-)
----
[PATCH] pci: Make pci_assign_resource always clear IORESOURCE_UNSET
For bus resources pci_assign_resrouce() needs to also clear
IORESOURCE_UNSET. (For device resources, it's handled by
pci_update_resource).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Index: linux-merge/drivers/pci/setup-res.c
===================================================================
--- linux-merge.orig/drivers/pci/setup-res.c 2007-12-13 13:06:27.000000000 +1100
+++ linux-merge/drivers/pci/setup-res.c 2007-12-13 13:06:50.000000000 +1100
@@ -167,7 +167,8 @@ int pci_assign_resource(struct pci_dev *
(unsigned long long)res->start, pci_name(dev));
} else if (resno < PCI_BRIDGE_RESOURCES) {
pci_update_resource(dev, res, resno);
- }
+ } else
+ res->flags &= ~IORESOURCE_UNSET;
return ret;
}
--0..0x1000 physical memory (== bus memory on x86) is reserved to the BIOS as RAM in essence and that legacy will be with us for at least 100 or maybe 200 years ;-) Ingo --
