Another week (my weeks do seem to be eight days, don't they? Very odd),
another -rc.

The dirstat pretty much says it all:
43.0% arch/arm/configs/
43.9% arch/arm/
25.5% arch/powerpc/configs/
26.8% arch/powerpc/
73.9% arch/
4.4% drivers/usb/musb/
5.4% drivers/usb/
4.0% drivers/watchdog/
16.0% drivers/
3.5% fs/

yeah, the bulk of it is all config updates, and with arm and powerpc
leading the pack.

But seriously, while the config updates amount to about three quarters of
the diff, and if you don't use a rename-aware diff the blackfin include
file movement pretty much accounts for the rest, hidden behind all those
trivial (but bulky) changes are a lot of small changes that hopefully fix
a number of regressions.

The most exciting (well, for me personally - my life is apparently too
boring for words) was how we had some stack overflows that totally
corrupted some basic thread data structures. That's exciting because we
haven't had those in a long time. The cause turned out to be a somewhat
overly optimistic increase in the maximum NR_CPUS value, but it also
caused some introspection about our stack usage in general. Including
things like a patch to gcc to fix insane stack usage for vararg functions
on x86-64.

But that one would only hit anybody who was a bit too adventurous and
selected the big 4096 CPU configuration. The rest of the regressions fixed
are a bit more pedestrian.

---
Adel Gadllah (1):
block: clean up cmdfilter sysfs interface

Adrian Bunk (5):
ocfs2/cluster/tcp.c: make some functions static
removed unused #include <linux/version.h>'s
KVM: fix userspace ABI breakage
Blackfin arch: let PCI depend on BROKEN
[ARM] use bcd2bin/bin2bcd

Al Viro (8):
fix efs_lookup()
fix osf_getdirents()
fix hpux_getdents()
fix regular readdir() and friends
fix ->llseek() for a bunch of directories
deal with the first call of ->show() generating...
r8169 is not working on an Aspire One. It looked like it was working for some time,
but now it has begun to say:

Sep 1 01:09:35 one klogd: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Sep 1 01:09:35 one klogd: r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Sep 1 01:09:35 one klogd: r8169 0000:02:00.0: cache line size of 32 is not supported
Sep 1 01:09:35 one klogd: r8169 0000:02:00.0: PCI INT A disabled
Sep 1 01:09:35 one klogd: r8169: probe of 0000:02:00.0 failed with error -22

Any ideas? Any more info needed?
TIA
one:/var/log# lspci -vv -s 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E PCI Express Fast Ethernet controller (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel modules: r8169

--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2009.0 (Cooker) for i586
Linux 2.6.25-jam18 (gcc 4.3.1 20080626 (GCC) #1 SMP
--
Doesn't boot on my quad core test box, apparently because of an AHCI failure.
Bisecting ...
Rafael
--
Bisection turned up commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd as the culprit:
commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Mon Aug 25 00:56:08 2008 -0700

x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
Reverting this commit helps.
The symptom is that AHCI probe fails with this commit applied.
Thanks,
Rafael
--
Just to be sure... Does "helps" imply that unresolved AHCI behavior
exists after reverting that commit?

Thanks,
Jeff
--
No, after reverting this commit AHCI works normally.
Thanks,
Rafael
--
Heh, interesting, since we were talking about reverting that one for other
reasons entirely.

See the thread "x86: split e820 reserved entries record to late" (yeah, I
know that subject isn't very grammatical or sensible) for some patches
worth trying _after_ you've reverted that one.

Anyway, clearly that commit needs to be reverted regardless, so I'll do
the revert. Can you please test the appended test-patch by Yinghai on top
of the revert?

(This is not the final version, but it should be sufficient to be tested)
And if you have the whole dmesg, that would be useful.
Linus
---
From: Yinghai Lu <yhlu.kernel@gmail.com>
Subject: [PATCH] x86: split e820 reserved entries record to late v3
Date: Thu, 28 Aug 2008 17:41:29 -0700

so could let BAR res register at first, or even pnp?
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
please drop the one in tip/x86/core use this one instead

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/e820.c | 20 ++++++++++++++++++--
arch/x86/pci/i386.c | 3 +++
include/asm-x86/e820.h | 1 +
3 files changed, 22 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1271,13 +1271,15 @@ static inline const char *e820_type_to_s
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
+struct resource __initdata *e820_res;
void __init e820_reserve_resources(void)
{
int i;
- struct resource *res;
u64 end;
+ struct resource *res;

res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+ e820_res = res;
for (i = 0; i < e820.nr_map; i++) {
end = e820.map[i].addr + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
@@ -1291,7 +1293,8 @@ void __init e820_reserve_resources(void)
res->end = end;

res->f...
dmesg from -rc5 with the offending commit reverted and with the patch
below applied is at:

http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/2.6.27-rc5-git.log
Thanks,
--
Ok, the more I look at this, the more interesting it gets.
In particular, this:
...
ACPI: bus type pnp registered
pnp 00:08: mem resource (0xfec00000-0xfec00fff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:08: mem resource (0xfee00000-0xfee00fff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:09: mem resource (0xffb80000-0xffbfffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:09: mem resource (0xfff00000-0xffffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:0b: mem resource (0xe0000000-0xefffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:0c: mem resource (0xfec00000-0xffffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp: PnP ACPI: found 13 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
pci 0000:00:00.0: BAR 3: can't allocate resource
...

there's a few things to note here:
- the resource at 0000:00:00.0 BAR 3 is totally bogus.
We know it's totally bogus because you actually have other resources in
the 0xf....... range, and they work fine. It's also likely to be
totally bogus because it so happens that the end-point of 0xffffffff is
commonly something that the BIOS leaves as a "I sized this resource",
because that's how resources are sized (you write all ones into them
and look what you can read back).

But your lspci -vxx output clearly shows that (a) MEM is enabled in
the command word, and yes, the BAR register at 0x18 does indeed have
value 0xe0000000. So it's just the length that is really bogus.

- pnp clearly sees that bogus resource at 0xe0000000-0xffffffff
- BUT: the "can't allocate resource" thing is from
pcibios_allocate_resour...
Well, I thought something like this happened, but I wasn't quite sure about the
exact mechanism. Thanks for the explanation. :-)

Rafael
--
On Sat, Aug 30, 2008 at 10:39 AM, Linus Torvalds
again, should use MCFG end as the res _end
YH
--
No. Again - we shouldn't DO that insane crap.
We simply shouldn't try to compare the BAR start with randomly chosen
things.

So the crap got reverted, and it's not going to get done again. Get over
it.

Linus
--
Btw, looking at that bogus BAR#3 some more: I don't actually think it's
even an MCFG resource.

I think it's literally the resource that describes the HT window for the
host bridge. So it's literally like the "root" resource - all external
MMIO resources that go over HT have to be in that window.

IOW, I'm starting to think that it's not even broken. It is probably
perfectly real. It's not a "PCI bridge" in the sense that it doesn't
bridge one PCI bus to another, but it's a host bridge, and it bridges the
CPU memory accesses to another bus.

The fact that the MCFG area happens to be at the start of that window is
probably just a random detail.

Does anybody know how to find chipset docs for AMD/ATI chipsets? I find
CPU docs, and the GPU docs, but not the 790 chipset docs anywhere (yeah,
it looks promising with a link that says "AMD 790FX Chipset
Specifications", but the link just takes you to some trivial overview, not
any actual specs).

Anybody?
Linus
--
On Sat, Aug 30, 2008 at 12:14 PM, Linus Torvalds
The AMD CPU/NB (quad core, aka fam 10h and later) has an MSR that states MMCONFIG, and
the ATI bridge BAR that has the same address as MMCONFIG doesn't even get a
chance to decode that.

It seems the ATI chipset doesn't have a public version of the docs... like reg
info and a BIOS/Kernel porting guide.

YH
--
Ok, so it's similar to the local APIC in that respect (and presumably IO
APIC too, I haven't checked).

But that still just implies that the BAR probably means something else
totally, and the fact that it happens to have the same value as the MCFG

Yeah, I'm not finding anything either. The 690G databook that Rafael
pointed to does mention the config registers in passing, but it's really
just about electricals (pin setup etc). No BIOS writers guide indeed..

Linus
--
On Sat, Aug 30, 2008 at 12:41 PM, Linus Torvalds
Those BIOS porting guides need an extra NDA...
they don't want everyone to know that there are lots of workarounds for
their silicon bugs.

YH
--
There are some at:
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_151...

Well, that's 690/SB600 only and I'm not sure how useful this is.
Thanks,
Rafael
--
that is for HW guys.
YH
--
On Sat, Aug 30, 2008 at 11:43 AM, Linus Torvalds
do you agree to use a quirk to make the BAR res have the correct end
between pci_probe and pci_resource_survey?

YH
--
In general I would agree, but now that I've looked at it a bit more, I
actually don't think it's a bug in the chipset any more. See my previous
email that crossed with yours.

I suspect that that northbridge resource is basically acting as a bridge
resource. So 0xe0000000 - 0xffffffff is actually _correct_. And MCFG being
in that window (and being first in it) is just a detail.

Look at the resource allocations on Rafael's machine: there are two
different classes:

- outside that BAR3 window:
The "external gfx0 port A" decode (bridged by device 0000:02.0):
d8000000-dfffffff : PCI Bus 0000:01
d8000000-dfffffff : 0000:01:00.0
d8000000-d8ffffff : vesafb

and suspect the graphics port is special (considering that this is an
ATI chipset)

- inside that BAR3 window: everything else (PCI express):
e0000000-efffffff : PCI MMCONFIG 0
fe6f4000-fe6f7fff : 0000:00:14.2
fe6f4000-fe6f7fff : ICH HD audio
fe6fa000-fe6fafff : 0000:00:13.4
fe6fa000-fe6fafff : ohci_hcd
fe6fb000-fe6fbfff : 0000:00:13.3
fe6fb000-fe6fbfff : ohci_hcd
fe6fc000-fe6fcfff : 0000:00:13.2
fe6fc000-fe6fcfff : ohci_hcd
fe6fd000-fe6fdfff : 0000:00:13.1
fe6fd000-fe6fdfff : ohci_hcd
fe6fe000-fe6fefff : 0000:00:13.0
fe6fe000-fe6fefff : ohci_hcd
fe6ff000-fe6ff0ff : 0000:00:13.5
fe6ff000-fe6ff0ff : ehci_hcd
fe6ff800-fe6ffbff : 0000:00:12.0
fe6ff800-fe6ffbff : ahci
fe700000-fe7fffff : PCI Bus 0000:01
fe7c0000-fe7dffff : 0000:01:00.0
fe7e0000-fe7effff : 0000:01:00.1
fe7f0000-fe7fffff : 0000:01:00.0
fe800000-fe8fffff : PCI Bus 0000:02
fe8ffc00-fe8fffff : 0000:02:00.0
fe8ffc00-fe8fffff : ahci
fe900000-fe9fffff : PCI Bus 0000:03
fe9c0000-fe9dffff : 0000:03:00.0
fe9fc000-fe9fffff : 0000:03:00.0
fe9fc000-fe9fffff : sky2
fea00000-feafffff : PCI Bus 0000:04
feaffc00-feafffff : 0000:04:00.0
feaffc00-feafffff : ahci
feb00000-febfffff : PCI Bus 0000:05
febff000-febfffff : 0000:05:08.0
febff000-febff7ff : oh...
On Sat, Aug 30, 2008 at 12:31 PM, Linus Torvalds
wonder:
in old kernel, after BAR3 request_filed, pci_assigned_unassigned
should get update resource for that... but it could find that big
space for it.

that is interesting...
YH
--
Exactly. So what happens is that it doesn't actually re-allocate it at
all. Not that it is necessarily even possible - it's quite possible that
that field is effectively locked some way and is read-only. Without
knowing the chipset details, we can only guess.

Linus
--
On Sat, Aug 30, 2008 at 3:41 PM, Linus Torvalds
wait, THAT BAR is 64BIT capable, So kernel should assign 64bit range to it...
not touch it at this time...
except Jordan could find some clue with the DOC.
YH
--
I don't think we've ever done new allocations in 64 bits. Although looking
for it, I have to admit that I don't see what would limit us right now.
There used to be some paths that weren't 64-bit clean, but I think we
fixed all of those.

Linus
--
On Sat, Aug 30, 2008 at 4:28 PM, Linus Torvalds
would be some corner case...
didn't see anything there.
calling pcibios_assign_resources+0x0/0x90
request_resource: root: (PCI Bus 0000:01) [fe700000, fe7fffff], new:
(0000:01:00.0) [fe7c0000, fe7dffff] conflict 0
request_resource: root: (PCI Bus 0000:03) [fe900000, fe9fffff], new:
(0000:03:00.0) [fe9c0000, fe9dffff] conflict 0
pci 0000:00:02.0: PCI bridge, secondary bus 0000:01
pci 0000:00:02.0: IO window: 0x9000-0x9fff
pci 0000:00:02.0: MEM window: 0xfe700000-0xfe7fffff
pci 0000:00:02.0: PREFETCH window: 0x000000d8000000-0x000000dfffffff
pci 0000:00:04.0: PCI bridge, secondary bus 0000:02
pci 0000:00:04.0: IO window: 0xa000-0xbfff
pci 0000:00:04.0: MEM window: 0xfe800000-0xfe8fffff
pci 0000:00:04.0: PREFETCH window: disabled
pci 0000:00:06.0: PCI bridge, secondary bus 0000:03
pci 0000:00:06.0: IO window: 0xc000-0xcfff
pci 0000:00:06.0: MEM window: 0xfe900000-0xfe9fffff
pci 0000:00:06.0: PREFETCH window: disabled
pci 0000:00:07.0: PCI bridge, secondary bus 0000:04
pci 0000:00:07.0: IO window: 0xd000-0xefff
pci 0000:00:07.0: MEM window: 0xfea00000-0xfeafffff
pci 0000:00:07.0: PREFETCH window: disabled
pci 0000:00:14.4: PCI bridge, secondary bus 0000:05
pci 0000:00:14.4: IO window: disabled
pci 0000:00:14.4: MEM window: 0xfeb00000-0xfebfffff
pci 0000:00:14.4: PREFETCH window: disabled
pci 0000:00:02.0: setting latency timer to 64
pci 0000:00:04.0: setting latency timer to 64
pci 0000:00:06.0: setting latency timer to 64
pci 0000:00:07.0: setting latency timer to 64

YH
--
pci_assign_unassigned_resources==>pci_bus_assign_resources==>pbus_assign_resources_sorted

static void pbus_assign_resources_sorted(struct pci_bus *bus)
{
	struct pci_dev *dev;
	struct resource *res;
	struct resource_list head, *list, *tmp;
	int idx;

	head.next = NULL;
	list_for_each_entry(dev, &bus->devices, bus_list) {
		u16 class = dev->class >> 8;

		/* Don't touch classless devices or host bridges or ioapics. */
		if (class == PCI_CLASS_NOT_DEFINED ||
		    class == PCI_CLASS_BRIDGE_HOST)
			continue;

it skips the host bridge...
YH
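
For anyone who wants to poke at that test outside the kernel, here is a minimal sketch of what the class comparison does. It assumes the usual PCI layout where the 24-bit class value is base class, subclass and programming interface, so shifting by 8 drops the programming interface. The two constants match the kernel's pci_ids.h as far as I know; skip_in_assignment() is a made-up name, not a kernel function.

```c
#include <stdint.h>

/* Values as found in the kernel's include/linux/pci_ids.h. */
#define PCI_CLASS_NOT_DEFINED 0x0000
#define PCI_CLASS_BRIDGE_HOST 0x0600

/*
 * Hypothetical helper: returns 1 if the generic allocator would leave
 * this device's BARs alone, 0 otherwise. 'class_reg' is the 24-bit
 * class value (base class << 16 | subclass << 8 | prog-if).
 */
static int skip_in_assignment(uint32_t class_reg)
{
	uint16_t class = (uint16_t)(class_reg >> 8);

	return class == PCI_CLASS_NOT_DEFINED ||
	       class == PCI_CLASS_BRIDGE_HOST;
}
```

A host bridge (class value 0x060000) is skipped, while a PCI-to-PCI bridge (0x060400) or an AHCI controller (0x010601) is not.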
--
what's story for not touching host bridges?
YH
--
Ahh. Exactly because of things like this. The host bridge BAR's are often
special.

That code comes from almost four years ago, the commit message was:
Author: Maciej W. Rozycki <macro@mips.com>
Date: Thu Dec 16 21:44:31 2004 -0800

[PATCH] PCI: Don't touch BARs of host bridges
BARs of host bridges often have special meaning and AFAIK are best left
to be setup by the firmware or system-specific startup code and kept
intact by the generic resource handler. For example a couple of host
bridges used for MIPS processors interpret BARs as target-mode decoders
for accessing host memory by PCI masters (which is quite reasonable).
For them it's desirable to keep their decoded address range overlapping
with the host RAM for simplicity if nothing else (I can imagine running
out of address space with lots of memory and 32-bit PCI with no DAC
support in the participating devices).

This is already the case with the i386 and ppc platform-specific PCI
resource allocators. Please consider the following change for the generic
allocator. Currently we have a pile of hacks implemented for host bridges
to be left untouched and I'd be pleased to remove them.

From: "Maciej W. Rozycki" <macro@mips.com>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>

and we've had other things where host bridges are special (ie iirc, if you
turn off PCI_COMMAND_MEM from a host bridge, it stops access to real RAM
from the CPU for some bridges - so you must never turn those things off or
you get a dead system).

(But at least Intel host bridges will just ignore writes to the CMD
register, I think - you cannot turn MEM off).

Linus
--
On Sat, Aug 30, 2008 at 8:00 PM, Linus Torvalds
then
1. we should not probe them in probe.c
2. at least we should not try to request_resource for them in
pcibios_resource_survey... just pretend that they don't exist.
YH
--
You are missing the fact that we need to know where existing resources
are, even if we can't do anything about them!

Read my explanation from yesterday about why we need to add the e820
resources to the resource map in the first place.

Short recap:
- we need to populate the resource map with as much possible information
about the system as we can..

- .. because when we assign _dynamic_ resources, we need to make sure
that they don't clash with random system resources that we don't really
otherwise have a lot of visibility into.

So the resource tree is not just about resources we control, it's also
about resources that others control(led) and we don't necessarily know a
lot about.

Linus
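
The recap above can be illustrated with a toy first-fit allocator. This is only a sketch under stated assumptions, not the kernel's allocator: the flat array stands in for the resource tree, the four known ranges are taken from the /proc/iomem and e820 output quoted in this thread, and the window, size and alignment used by demo_alloc() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive, like /proc/iomem */

static int overlaps(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * First fit: lowest aligned start in [lo, hi] whose 'size' bytes avoid
 * every known range. Returns 0 if nothing fits.
 */
static uint64_t alloc_range(const struct range *known, size_t n,
			    uint64_t lo, uint64_t hi,
			    uint64_t size, uint64_t align)
{
	uint64_t start;

	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
	start = align_up(lo, align);
	while (start <= hi && size - 1 <= hi - start) {
		struct range want = { start, start + size - 1 };
		size_t i;

		for (i = 0; i < n; i++)
			if (overlaps(want, known[i]))
				break;
		if (i == n)
			return start;
		/* Skip past the range we clashed with and try again. */
		start = align_up(known[i].end + 1, align);
	}
	return 0;
}

/* Ranges quoted in this thread. */
static const struct range known_map[] = {
	{ 0xd7fe0000ull, 0xd7ffffffull },	/* e820 reserved */
	{ 0xd8000000ull, 0xdfffffffull },	/* PCI Bus 0000:01 */
	{ 0xe0000000ull, 0xefffffffull },	/* PCI MMCONFIG 0 */
	{ 0xfff00000ull, 0xffffffffull },	/* e820 reserved */
};

/* Hypothetical request: 1MB, 1MB aligned, window starting at 0xd7f00000. */
static uint64_t demo_alloc(int use_map)
{
	size_t n = use_map ? sizeof(known_map) / sizeof(known_map[0]) : 0;

	return alloc_range(known_map, n, 0xd7f00000ull, 0xffffffffull,
			   0x100000ull, 0x100000ull);
}
```

With the map populated, demo_alloc(1) steps over the reserved area, the graphics window and MMCONFIG before it finds room. With an empty map, demo_alloc(0) hands out 0xd7f00000, right on top of the reserved e820 entry, which is the kind of clash the recap is about.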
--
Btw, this is a problem that we seldom actually have on most desktops,
because the BIOS will normally set up just about _all_ the resources, and
we seldom have to worry about anything but just enumerating them (and the
occasional buggy setup).

The problems with resource allocation mostly happen on laptops, and
especially with cardbus controllers. Now, that's obviously going away
(people mostly use USB for most things that Cardbus/PCMCIA was used for a
few years ago), but it still exists and with docking stations etc it can
actually be even worse (although that's mainly because access to docking
stations is much more limited, I suspect).

So what used to happen _all_ the time was that cardbus worked fine on 99%
of all machines, but then some machines would lock up when you inserted a
card in them, or the card just wouldn't work. And the reason was that some
stupid motherboard resource (like the ACPI sleeping registers or the LPC
control regs) were not done as a normal BAR, so the kernel wouldn't know
about them, and the BIOS didn't necessarily even list it because it never
mattered with Windows (since Windows has a different algorithm for laying
out the bus resources, and wouldn't hit the magic resource).

So this is why we populate the resources with everything we can _possibly_
try to find, including hardware-specific quirks (see things like
quirk_ali7101_acpi or all the quirk_ich4_lpc_acpi things etc) for finding
resources that aren't done by BAR's.

And the hardware quirks have generally worked pretty well. I'd love to add
some quirk for the RD790 chipset, but I'd like to know what the rules are
for it. I know we have some AMD contacts, I wonder if they could give docs
(I don't personally do NDA's, but I can do "gentleman's agreements" where
I just say I won't spread things further, as long as I can write code
based on them. I know other kernel developers do similar things).

Jordan?
Linus
--
please check
[PATCH] x86: split e820 reserved entries record to late v4
[PATCH] x86: split e820 reserved entries record to late v4 - fix v6

YH
--
What kernel should I apply those to and in what order?
Rafael
--
linus git tree
1. [PATCH] x86: split e820 reserved entries record to late v4
2. [PATCH] x86: split e820 reserved entries record to late v4 - fix v6

tip/master
1. Resource handling: add 'insert_resource_expand_to_fit()' function
2. [PATCH] x86: split e820 reserved entries record to late v4 - fix v6

YH
actually it is almost the same as the tarball sent to you for your system...
YH
--
I've just tested these two patches on top of the current Linus' tree and the
system works normally.

Thanks,
Rafael
--
Can you try the attached in addition to those two patches?
want to check if the BAR3 gets a new resource..., and after that what
could happen...

YH
Works, dmesg is at:
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/2.6.27-rc5-test.log

Please let me know if you want it with any more command line options.
Thanks,
Rafael
--
That BAR is indeed "locked". Now that we try to reallocate it, you get
this in the log:

pci 0000:00:00.0: BAR 3: error updating (0x40000004 != 0xe0000004)
pci 0000:00:00.0: BAR 3: error updating (high 0x000001 != 0x000000)

ie now the code _tried_ to update the BAR to point to 0x1_4000_0000
instead, but the hardware refused, and it is still at 0x0_e000_0000.

So Yinghai's patch "worked", but it worked by doing nothing.
See my earlier guess about locked read-only resources a few emails back.
IOW, I'm not at all surprised. I really do suspect that that BAR is some
very special "this is the HT->PCIE region" BAR.

Linus
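
As a rough illustration of how the two "error updating" lines relate to the address 0x1_4000_0000, here is a sketch of splitting a 64-bit memory BAR value into its two dwords and checking the read-back. The helper names are invented for this example and the mask and flag values are the standard memory-BAR encoding as I understand it; this is not the kernel's update code.

```c
#include <stdint.h>

#define BAR_MEM_TYPE_64 0x4u		/* memory BAR, 64-bit type flag */
#define BAR_ADDR_MASK   0xfffffff0u	/* address bits of the low dword */

/* Dword written to the BAR register itself. */
static uint32_t bar64_low(uint64_t addr)
{
	return (uint32_t)(addr & BAR_ADDR_MASK) | BAR_MEM_TYPE_64;
}

/* Dword written to the register that follows it. */
static uint32_t bar64_high(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}

/* Address the device decodes, rebuilt from what was read back. */
static uint64_t bar64_addr(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | (low & BAR_ADDR_MASK);
}

/* 1 if the read-back does not match the wanted address. */
static int bar64_update_failed(uint64_t wanted,
			       uint32_t read_low, uint32_t read_high)
{
	return bar64_addr(read_low, read_high) != wanted;
}
```

For the wanted address 0x140000000 the low dword is 0x40000004 and the high dword is 0x1, which are the "wanted" values in the log. The device answered with 0xe0000004 and 0x0, so bar64_update_failed() reports a failure and the BAR is still at 0xe0000000.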
--
On Sun, Aug 31, 2008 at 10:42 AM, Linus Torvalds
so the code could allocate the 64 bit resource above 4g,...
wonder how the probe could find out that the size is 1fff_ffff..
YH
--
Heh. That's how PCI sizing works: you write all ones to the register, and
read back the result. The low bits won't change, and that indicates the
size.

But if _none_ of the bits change, then that simply means that the size
will be calculated to be 0xffffffff-start.

So the sizing will "work", it will just always report that the BAR covers
everything from start to the 4G limit.

Linus
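
A minimal model of that sizing probe, for the 32-bit half only. It is a simplification and not the kernel's pci_size(): the BAR is reduced to a value plus a mask of bits the hardware lets software change, and sized_bar_end() is a made-up name. Restoring the original value after the probe is left out because the model never modifies anything.

```c
#include <stdint.h>

#define BAR_FLAG_BITS 0xfu	/* low bits of a memory BAR hold type flags */

/*
 * Hypothetical model of one 32-bit memory BAR. 'value' is what the
 * register holds, 'writable' marks the address bits that accept writes.
 * Returns the last address the sizing probe would report.
 */
static uint32_t sized_bar_end(uint32_t value, uint32_t writable)
{
	uint32_t start = value & ~BAR_FLAG_BITS;
	/* Write all ones: only the writable bits take the new value. */
	uint32_t readback = (value & ~writable) | (0xffffffffu & writable);
	uint32_t probe = readback & ~BAR_FLAG_BITS;
	uint32_t size;

	if (probe == 0)
		return start;	/* no address bits at all */

	size = probe & (~probe + 1);	/* lowest set address bit */
	return start + (size - 1);
}
```

An ordinary 1KB BAR at 0xfe6ff800 (writable mask 0xfffffc00) sizes to 0xfe6ffbff, as expected. The locked BAR holding 0xe0000004 with no writable bits reads back unchanged, its lowest set address bit is 0x20000000, and the reported end is 0xffffffff.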
--
On Sun, Aug 31, 2008 at 11:03 AM, Linus Torvalds
how about
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cce2f4c..3b5269a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -240,6 +240,11 @@ static int __pci_read_base(struct pci_dev *dev,
enum pci_bar_type type,
pci_read_config_dword(dev, pos, &l);
pci_write_config_dword(dev, pos, mask);
pci_read_config_dword(dev, pos, &sz);
+
+ /* sticky and non changable */
+ if (sz == l)
+ goto fail;
+
pci_write_config_dword(dev, pos, l);

/*
Rafael,
can you check the attached one to see if we still have the warning?
YH
No, because a resource really _can_ be at the end. It's perfectly ok to
have something like a memory resource at 0xff000000-0xffffffff, and then
the BAR register would always read 0xff000000 (or 0x...4 for a 64-bit
resource).

So calling that a failure case would be wrong.
Linus
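
To make that objection concrete, here is a small sketch of the proposed "sz == l" test run against a modelled BAR. Same assumptions as any toy model: the register is reduced to a value plus a mask of writable address bits, and sticky_check_rejects() is an invented name.

```c
#include <stdint.h>

/*
 * Returns 1 if the "sz == l" test from the proposed patch would reject
 * the BAR. 'value' is the current register content, 'writable' marks
 * the address bits that accept writes (a hypothetical hardware model).
 */
static int sticky_check_rejects(uint32_t value, uint32_t writable)
{
	uint32_t l = value;
	/* What reads back after writing all ones. */
	uint32_t sz = (value & ~writable) | (0xffffffffu & writable);

	return sz == l;
}
```

The locked BAR (0xe0000004, nothing writable) is rejected, but so is a perfectly healthy 16MB BAR sitting at 0xff000000, because writing all ones to it reads back 0xff000000 again. An ordinary BAR lower down, such as 0xfe6ff800 with 1KB size, passes.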
--
thanks,
David, can you test those two patches on top of linus tree?
YH
--
can i get whole bootlog with "debug"?
YH
--
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/broken.log
Thanks,
Rafael
--
pci 0000:00:00.0: BAR has MMCONFIG at e0000000-ffffffff
pci 0000:00:12.0: BAR 5: can't allocate resource
pci 0000:00:13.0: BAR 0: can't allocate resource
pci 0000:00:13.1: BAR 0: can't allocate resource
pci 0000:00:13.2: BAR 0: can't allocate resource
pci 0000:00:13.3: BAR 0: can't allocate resource
pci 0000:00:13.4: BAR 0: can't allocate resource
pci 0000:00:13.5: BAR 0: can't allocate resource
pci 0000:00:14.2: BAR 0: can't allocate resource

your mmconf in BAR is broken....
after forcibly inserting that, it blocks all the others...
YH
--
And that seems utter crap to begin with.
PCI: Using MMCONFIG at e0000000 - efffffff
Where did it get that bogus "ffffffff" end address?
Anyway, that whole MMCONFIG/BAR thing was totally broken to begin with,
and it's reverted now in my tree, so I guess it doesn't much matter.

Linus
--
On Fri, Aug 29, 2008 at 5:08 PM, Linus Torvalds
we need to handle it. otherwise if the BAR goes first, it will stop
other BARs from being registered...

a quirk should do the work....
Rafael, can you send out lspci -tv and lspci -vvxxx too.
YH
--
cat /proc/iomem please.
YH
--
00:00.0 Host bridge: ATI Technologies Inc RD790 Northbridge only dual
slot PCI-e_GFX and HT3 K8 part
Subsystem: ATI Technologies Inc RD790 Northbridge only dual slot
PCI-e_GFX and HT3 K8 part
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Region 3: Memory at <ignored> (64-bit, non-prefetchable)
Capabilities: [c4] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=0 UnitCnt=12 MastHost- DefDir- DUL-
Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
IsocEn- LSEn- ExtCTL- 64b-
Link Config 0: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut- LWI=16bit
DwFcInEn- LWO=16bit DwFcOutEn-
Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0
IsocEn- LSEn- ExtCTL- 64b-
Link Config 1: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit
DwFcInEn- LWO=8bit DwFcOutEn-
Revision ID: 3.00
Link Frequency 0: [b]
Link Error 0: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 0: 200MHz+ 300MHz- 400MHz+ 500MHz- 600MHz+
800MHz+ 1.0GHz+ 1.2GHz+ 1.4GHz- 1.6GHz- Vend-
Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD-
Link Frequency 1: 200MHz
Link Error 1: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 1: 200MHz- 300MHz- 400MHz- 500MHz- 600MHz-
800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend-
Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE- CRCFE- SERRFE- CF-
RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE-
Prefetchable memory behind bridge Upper: 00-00
Bus Number: 00
Capabilities: [40] HyperTransport: Retry Mode
Capabilities: [54] HyperTransport: UnitID Clumping
Capabilities: [9c] HyperTransport: #1a
00: 02 10 56 59 06 00 30 22 00 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 e0

so bar3 of 00:00.0 has 0xe0000000 - 0xffffffff
and request_resource failed, so
Region 3: Memory at <ignore...
Could you please rebase them on top of current -git?
Rafael
--
please check attached quilt series based on linus tree.
YH
there is some problem with fix -v4...on one test machine.
please don't use it now
YH
--
this one should work.
YH
dmesg -s 262144
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/dmesg-test.log
cat /proc/iomem
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/iomem-test.txt
Thanks,
Rafael
--
calling pci_subsys_init+0x0/0x120
PCI: Using ACPI for IRQ routing
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:01)
[9000, 9fff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus
0000:01) [fe700000, fe7fffff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus
0000:01) [d8000000, dfffffff] conflict 0
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:02)
[a000, bfff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus
0000:02) [fe800000, fe8fffff] conflict 0
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:03)
[c000, cfff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus
0000:03) [fe900000, fe9fffff] conflict 0
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:04)
[d000, efff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus
0000:04) [fea00000, feafffff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus
0000:05) [feb00000, febfffff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new:
(0000:00:00.0) [e0000000, ffffffff] conflict 1
pci 0000:00:00.0: BAR 3: can't allocate resource

so pci_resource_survey is depth first. sub buses request some resource
at first...we don't need quirk to handle that strange BAR res.
and we got reserved register correctly
in /proc/iomem
d7fe0000-d7ffffff : reserved
..
fff00000-ffffffff : reserved

for
BIOS-e820: 00000000d7fe0000 - 00000000d8000000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)

YH
--
On Fri, Aug 29, 2008 at 5:08 PM, Linus Torvalds
the BAR is from pci_read_bases..., so that chipset is broken...
they are even supposed to hide that BAR from the OS.

YH
--
Ok, can we please
- *do* get a quirk for known-broken chipsets (at a *PCI* level, this is
not an x86 issue)

- *not* get any more random PCI work-arounds that go through the x86 tree
and aren't even looked at by the (very few) people who actually
understand the PCI resource handling?

IOW, for the first issue, just teach pci_mmcfg_check_hostbridge() about
this broken bridge, and have it fix things up (including hiding the thing,
but also just verifying that the dang thing even -works- etc).

For the second issue - please do realize that we have had much over a
_decade_ of work on the PCI resource handling, and it's fragile. The thing
I reverted really isn't something that Ingo should ever have committed in
the first place. It's not something an x86 maintainer can even make sane
decisions on.

Resource handling things _need_ to get ACK's from people like Ivan
Kokshaysky or me. Or at least _several_ other people who actually really
understand not just PCI resource handling, but have actually seen all the
horrible crap it causes, and understand how fragile this stuff is. It's
all different, and it's all about all the million of broken machines out
there that screw things up.

Linus
--
Btw, what was the original regression that commit
a2bd7274b47124d2fc4dfdb8c0591f545ba749dd was trying to fix?

It's not listed in that commit, even though the commit has a "Bisected-by:
David Witbrodt <dawitbro@sbcglobal.net>".

In fact, I can find it with google by searching for
David Witbrodt bisect
and I see that it is 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f.
I'm wondering why that commit wasn't just reverted? Because now that I see
it, I notice that _that_ is the real bug to begin with.

That commit really was buggy. NO WAY can you insert the code/bss/data
resources before you've done e820 handling, because it may well be that
some strange e820 table contains things that cross the resources.

So that original thing was buggy, and made x86-64 do odd things. They were
doubly odd, since x86-32 did it differently (and better, I think).

Then, when actually doing the common arch/x86/kernel/setup.c, the commit
that does so _claims_ that the common code came from the 32-bit version,
but that doesn't seem to be true, at least wrt this thing. The current
setup.c comes from the *broken* cleanup of setup_64.c that had been
bisected to be broken.

And that, in turn, happened in 41c094fd3ca54f1a71233049cf136ff94c91f4ae
("x86: move e820_resource_resources to e820.c") which also did "and make
32-bit resource registration more like 64 bit.", so it got the bug into
32-bit code that had been introduced in 64-bit code.

Ugh.
So why was then that other broken commit added to paper it over, even
though the original broken commit had been bisected and the breakage was
known to have been due to _that_?

Hmm?
Yinghai - I'm hoping that the code movement is all over and done with, but
you need to be a _lot_ more careful here. And Ingo, this really wasn't
very well done.

Linus
--
On Fri, Aug 29, 2008 at 6:11 PM, Linus Torvalds
we reverted the commit, and David's problem still happens.
the root cause is:
before 2.6.26, we call init_apic_mapping, which does insert_resource for
the lapic address,
and then call e820_resource_resouce (with request_resource) to
register the e820 entries.
so the lapic entry in the resource tree prevents some entries in
e820 from being registered,
and the later request_resource for the BAR res (==hpet) succeeds.

from 2.6.26, we moved the lapic address registration to late_initcall, so
the entry that is reserved in e820 gets into the resource tree first,
and the later pci_resource_survey::request_resource for the BAR res (==hpet,
0xfed00000) fails. so pci_assign_unsigned... gets a new
res for the BAR, and that messes up the hpet setting.

solutions would be
1. use a quirk to protect the hpet in the BAR; Ingo said it is not generic.
2. or the one you reverted... check_bar_with_valid. (hpet, ioapic,
mmconfig) --> it happened to reveal another problem with Rafael's
system/chipset.
3. or sticky resources..., but they could be partially overlapping
4. or don't register reserved entries in e820... Eric NACKed that.
5. or, as you suggest, register some reserved entries later..., and have
insert_resource_expand_to_fit...

YH
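
The ordering effect described in this message can be sketched in user space. The following is only a toy model, not kernel code: it treats one level of the resource tree as a flat list and mimics the "refuse on any overlap" behaviour of request_resource(). The names toy_res, toy_root, toy_request and toy_boot are hypothetical, and so are the lapic and e820 ranges; only the HPET base 0xfed00000 comes from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy, flat model of one level of the resource tree (not kernel code). */
struct toy_res {
    const char *name;
    uint64_t start, end;            /* inclusive range */
};

#define TOY_MAX 8

struct toy_root {
    const struct toy_res *slot[TOY_MAX];
    size_t n;
};

/* Like request_resource(): refuse anything overlapping an existing entry. */
static int toy_request(struct toy_root *root, const struct toy_res *new)
{
    for (size_t i = 0; i < root->n; i++) {
        const struct toy_res *r = root->slot[i];

        if (new->start <= r->end && r->start <= new->end)
            return -1;              /* conflict; the kernel returns -EBUSY */
    }
    if (root->n == TOY_MAX)
        return -1;
    root->slot[root->n++] = new;
    return 0;
}

/* Replay the boot ordering and report whether the HPET BAR can be claimed. */
static int toy_boot(int lapic_before_e820)
{
    /* Assumed ranges for illustration; only the HPET base is from the thread. */
    static const struct toy_res lapic = { "lapic",    0xfee00000u, 0xfee00fffu };
    static const struct toy_res e820  = { "reserved", 0xfec00000u, 0xfeffffffu };
    static const struct toy_res bar   = { "hpet BAR", 0xfed00000u, 0xfed003ffu };
    struct toy_root root = { .n = 0 };

    if (lapic_before_e820)
        toy_request(&root, &lapic);     /* pre-2.6.26: lapic goes in first */
    toy_request(&root, &e820);          /* bounces off lapic if it is there */
    return toy_request(&root, &bar);    /* 0 = BAR keeps its BIOS address */
}
```

In this model toy_boot(1) returns 0, because the reserved entry was rejected and the BAR range stays free, while toy_boot(0) returns -1, because the reserved entry got in first and now blocks the BAR.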
--
So the problem there was that traditionally, e820_reserve_resource()
expected to be the first one to populate any resources. That's changed,
and that's why it now needs to use "insert_resource()" rather than

Yeah, I don't like it. The quirk I was talking about was the one about
Yeah, no, we do want reserved entries from e820 to show up to at least
Yes. And I do think this is a workable model.
Linus
--
On Fri, Aug 29, 2008 at 7:33 PM, Linus Torvalds
originally it works, because lapic address entry open the big hole for
if update res->end according mmconfig end, before insert it forcibly,
BTW, insert_resource_expand_to_fit needs to be replaced with
insert_resource_split_to_fit....
the test stub reveals that expand will make __request_region not work for
some devices...because reserved entries from e820 take
IORESOURCE_BUSY...

YH
--
Except it's still a horrible patch that special-cases all the wrong things
(ie random resources that we just happen to know that ACPI etc cares
about).

There's no way to know in general if ACPI might care deeply where some
random resource is (say, graphics memory) and it might be done with a BAR.

Well, we should probably just remove the IORESOURCE_BUSY part.
Again, that comes from the fact that the e820 resources used to _override_
everything - they were inserted first, and nothing else was _ever_ allowed
to allocate in that region.

But if we're changing that, then the whole IORESOURCE_BUSY part doesn't
make sense.

In fact, in general, IORESOURCE_BUSY doesn't much make sense any more in
general, because it was actually more of an ISA-timeframe locking model
saying "you can't touch this region". But if the whole point is that we
now try to allow PCI device BAR's and the e820 maps to co-exist, then the
whole - and only - reason for IORESOURCE_BUSY for them goes away..

Linus
--
Ok, and here's the patch to do
insert_resource_expand_to_fit(root, new);
and while I still haven't actually tested it, it looks sane and compiles
to code that also looks sane.

I'll happily commit this as basic infrastructure as soon as somebody ack's
it and tests that it works (and I'll try it myself soon enough, just for
fun)

Linus
---
include/linux/ioport.h | 1 +
kernel/resource.c | 88 ++++++++++++++++++++++++++++++++++-------------
2 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 22d2115..8d3b7a9 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -109,6 +109,7 @@ extern struct resource iomem_resource;
extern int request_resource(struct resource *root, struct resource *new);
extern int release_resource(struct resource *new);
extern int insert_resource(struct resource *parent, struct resource *new);
+extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
extern int allocate_resource(struct resource *root, struct resource *new,
resource_size_t size, resource_size_t min,
resource_size_t max, resource_size_t align,
diff --git a/kernel/resource.c b/kernel/resource.c
index f5b518e..72ee95b 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -362,35 +362,21 @@ int allocate_resource(struct resource *root, struct resource *new,

EXPORT_SYMBOL(allocate_resource);
-/**
- * insert_resource - Inserts a resource in the resource tree
- * @parent: parent of the new resource
- * @new: new resource to insert
- *
- * Returns 0 on success, -EBUSY if the resource can't be inserted.
- *
- * This function is equivalent to request_resource when no conflict
- * happens. If a conflict happens, and the conflicting resources
- * entirely fit within the range of the new resource, then the new
- * resource is inserted and the conflicting resources become children of
- * the new resource.
+/*
+ * Insert a resourc...
.. and it even works (apart from a missing '\n' for the expansion report
;).

I tested it with the appended silly test-case, and it shows
...
BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000160000000 (usable)
Expanded resource Kernel dummy due to conflict with Kernel code
Expanded resource Kernel dummy due to conflict with Kernel data
last_pfn = 0x160000 max_arch_pfn = 0x3ffffffff
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
...

and /proc/iomem shows
...
00100000-9cf64fff : System RAM
  00200000-006ea27f : Kernel dummy
    00200000-00561f37 : Kernel code
    00561f38-006ea27f : Kernel data
  00777000-007d6cc7 : Kernel bss
...

so it correctly expanded that "Kernel dummy" resource to cover the
resources it had clashed with.

And no, it's not perfect. We certainly _could_ split things instead. But I
hope that odd "e820 resources were bogus" case almost never would actually
trigger in practice, and the expansion case is not only simpler, it's also
slightly more robust in the sense that a single big resource is more likely to
fit the things we need than multiple smaller resources that have been
chopped up.

Linus
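
The expand-to-fit behaviour reported above can be sketched on plain ranges. This is a user-space illustration only, not the function from the patch: it ignores the tree and just grows a new range until nothing straddles its ends. The names range, expand_to_fit and demo_kernel_dummy are hypothetical, and the starting range of the dummy resource is an assumption, since the test patch's values are not shown; the three existing ranges are the ones from the /proc/iomem excerpt.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start, end;            /* inclusive */
};

/*
 * Grow *new until no existing range straddles one of its ends.
 * Ranges fully inside *new would become its children, and a range that
 * fully encloses *new would become its parent, so neither forces growth.
 */
static void expand_to_fit(const struct range *existing, size_t n,
                          struct range *new)
{
    int changed;

    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            const struct range *r = &existing[i];
            int overlaps = new->start <= r->end && r->start <= new->end;
            int inside   = r->start >= new->start && r->end <= new->end;
            int encloses = r->start <= new->start && r->end >= new->end;

            if (!overlaps || inside || encloses)
                continue;
            if (r->start < new->start)
                new->start = r->start;
            if (r->end > new->end)
                new->end = r->end;
            changed = 1;            /* rescan: growth may hit a new neighbour */
        }
    } while (changed);
}

/* Replays the /proc/iomem layout quoted above. */
static struct range demo_kernel_dummy(void)
{
    static const struct range existing[] = {
        { 0x00200000, 0x00561f37 },     /* Kernel code */
        { 0x00561f38, 0x006ea27f },     /* Kernel data */
        { 0x00777000, 0x007d6cc7 },     /* Kernel bss  */
    };
    /* Hypothetical starting range that straddles code and data. */
    struct range dummy = { 0x00400000, 0x00600000 };

    expand_to_fit(existing, sizeof(existing) / sizeof(existing[0]), &dummy);
    return dummy;
}
```

With that assumed starting range, demo_kernel_dummy() ends up at 00200000-006ea27f, the same span the "Kernel dummy" entry shows in the excerpt, and "Kernel bss" is left alone because it never overlapped.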
--- dummy test patch for the 'insert-resource-expand-to-fit' thing ---
arch/x86/kernel/setup.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 362d4e7..6265a38 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -578,6 +578,14 @@ static struct x86_quirks default_x86_quirks __initdata;

struct x86_quirks *x86_quirks __initdata = &default_x86_quirks;
+static struct resource dummy_resource = {
+ .name = "Kernel dummy",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
+
+
/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use the...
On Fri, Aug 29, 2008 at 7:56 PM, Linus Torvalds
we need to use insert_resource_split_to_fit instead...
otherwise __request_region will not be happy.
have one shrink one
only work with
|----------------|
|---------------------|

still has problem with
|----------------| |------------| |-----------|
|------------------------------------|

need to get rid of middle one too.
YH
---
arch/x86/kernel/e820.c | 20 +++++++++++++-
include/linux/ioport.h | 2 +
kernel/resource.c | 66 ++++++++++++++++++++++++++++++++++++++++---------
3 files changed, 74 insertions(+), 14 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1319,8 +1319,24 @@ void __init e820_reserve_resources_late(

res = e820_res;
for (i = 0; i < e820.nr_map; i++) {
- if (!res->parent && res->end)
- insert_resource(&iomem_resource, res);
+#if 1
+ /* test for shrink_with fit */
+ if (!res->parent && res->end) {
+ if (res->start == 0xe0000000)
+ res->start = 0xde000000;
+ }
+#endif
+
+ if (!res->parent && res->end && insert_resource(&iomem_resource, res)) {
+
+ printk(KERN_WARNING "found conflict for %s [%08llx, %08llx], try to insert with shrink\n",
+ res->name, res->start, res->end);
+
+ insert_resource_shrink_to_fit(&iomem_resource, res);
+
+ printk(KERN_WARNING " shrink to %s [%08llx, %08llx]\n",
+ res->name, res->start, res->end);
+ }
res++;
}
}
Index: linux-2.6/include/linux/ioport.h
=...
Are you really really sure?
Try just removing the IORESOURCE_BUSY. As mentioned, if we expect the PCI
BAR's to work with the e820 resources, then BUSY really is simply not
right any more. Not that I think it should matter either..

The ones that are added _early_ should be IORESOURCE_BUSY (ie the ones
that cover RAM), but the others we now expect to nest with PCI BARs.

But since we add them after we have parsed the BAR's, I don't even see why
the BUSY bit should even matter - we've already added the fixed BARs, and
any newly allocated non-fixed ones shouldn't be allocated in e820 areas
_regardless_ of whether the BUSY bit is set or not.

So pls explain why it matters?
Linus
--
On Fri, Aug 29, 2008 at 8:24 PM, Linus Torvalds
please check
__request_region: conflict: (reserved) [dd000000, efffffff], res:
(qla2xxx) [ddffc000, ddffffff]
busy flag
qla2xxx 0000:83:00.0: BAR 1: can't reserve mem region [0xddffc000-0xddffffff]

YH
...
Initializing cgroup subsys cpuset
Linux version 2.6.27-rc5-tip-00672-ge5c5407-dirty (yhlu@linux-zpir)
(gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision
135036] (SUSE Linux) ) #220 SMP Fri Aug 29 22:02:53 PDT 2008
Command line: console=uart8250,io,0x3f8,115200n8
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug
show_msr=1 i8042.noaux initcall_debug apic=verbose pci=routeirq
ip=dhcp load_ramdisk=1 ramdisk_size=131072
BOOT_IMAGE=kernel.org/bzImage_2.6.27_k8.1
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000097400 (usable)
BIOS-e820: 0000000000097400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000d7fa0000 (usable)
BIOS-e820: 00000000d7fae000 - 00000000d7fb0000 (usable)
BIOS-e820: 00000000d7fb0000 - 00000000d7fbe000 (ACPI data)
BIOS-e820: 00000000d7fbe000 - 00000000d7ff0000 (ACPI NVS)
BIOS-e820: 00000000d7ff0000 - 00000000d8000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000002028000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (Kernel
code) [200000,...
Ok, this is actually when the driver wants to reserve the BAR, and then it
notices that there is an existing "reservation" there.

So yes, drivers will care - they literally will think that somebody else
owns their resource if they have a BUSY resource inside of them. So this
is a driver protecting against another driver.

The sad part is that it looks like it's entirely due to the PCI code
trying to emulate an ISA driver model, and use a flat resource space - so
it hits the upper resources first.

Does this patch make a difference? It actually removes a fair chunk of
code, by just saying "we really don't care if the resource is IO or MEM,
we just want to reserve space inside of it, regardless of type".

Untested - obviously.
Linus
---
drivers/pci/pci.c | 26 +++++++++-----------------
1 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c9884bb..a3de4fe 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1304,15 +1304,11 @@ pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge)
void pci_release_region(struct pci_dev *pdev, int bar)
{
struct pci_devres *dr;
+ struct resource *res = pdev->resource + bar;

if (pci_resource_len(pdev, bar) == 0)
return;
- if (pci_resource_flags(pdev, bar) & IORESOURCE_IO)
- release_region(pci_resource_start(pdev, bar),
- pci_resource_len(pdev, bar));
- else if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM)
- release_mem_region(pci_resource_start(pdev, bar),
- pci_resource_len(pdev, bar));
+ __release_region(res, pci_resource_start(pdev, bar), pci_resource_len(pdev, bar));

dr = find_pci_dr(pdev);
if (dr)
@@ -1336,20 +1332,16 @@ void pci_release_region(struct pci_dev *pdev, int bar)
int pci_request_region(struct pci_dev *pdev, int bar, const char *res_name)
{
struct pci_devres *dr;
+ struct resource *res = pdev->resource + bar;

if (pci_resource_len(pdev, bar) == 0)
return 0;
-
- if (pci_...
On Fri, Aug 29, 2008 at 8:24 PM, Linus Torvalds
not all. some are MMCONF, some are for GART, and some for fixed lapic,
if we don't add the IORESOURCE_BUSY, why bother to add these entries...
with a good layout from the BIOS, it should only reserve mmio ranges that do not show up in a BAR.
for example:
0xdc000000 - 0xdd000000 for GART ( some offset BAR 0x94)
0xdd000000 - 0xde000000 is for bus 0x80
0xde000000 - 0xdf000000 is for bus 0x00
0xe0000000 - 0xf0000000 is for mmconfig ( CPU set it in MSR for amd fam 10h)

if one stupid BIOS sets
0xdc000000 - 0x100000000 for reserved.

then when we insert that range late
we should still have set ranges other than range 0xdd000000 - 0xdf000000

also do we need to set other leaf ranges in 0xdd000000 - 0xdf000000 ?
YH
--
You don't understand how the resource allocator works.
IORESOURCE_BUSY is really more of a "legacy bit". It has almost no bearing
on the actual allocations.

Just grep for IORESOURCE_BUSY in kernel/resource.c. The _only_ thing that
cares about busy/non-busy is the legacy "request_region()" function. That
one isn't actually used by any core PCI code - it's more of a driver
issue to claim exclusive ownership of particular resources by inserting a
marker in that resource.

So IORESOURCE_BUSY is a red herring. The only reason I said you can clear
it is because you claimed it causes problems, but the more I look at it,
the more I think you're likely just mistaken - because IORESOURCE_BUSY
doesn't make any difference at all to normal resource handling until you
get to actual drivers.

The bigger issue is that just inserting the resource (and it really
doesn't matter if it is marked busy or not) is in itself a mark of
"there's something here". THAT is what all the resource code cares about.
The IORESOURCE_BUSY bit is almost immaterial (ie _is_ immaterial except
for some very specific cases).

And the reason we need to add the e820 resources is exactly so that we
don't try to allocate PCI resources on top of some system resources we

I agree, but "good layout" and "BIOS" don't really go together. There's

Sure, but really, the only point of even caring about e820 resources in
the first place has really nothing to do with the BAR's we can see
(because the kernel can handle _those_ perfectly well on its own), and has
everything to do with the fact that a lot of devices have invisible
resources that we _cannot_ see (ie magic non-standard BAR's for the
motherboard chips).

And those are exactly why we want to populate the resource map with the
e820 information - to avoid having dynamic resources (like Cardbus or PCI
hotplug, or just devices that weren't set up statically by the BIOS) be
then allocated by the kernel on top of those "invisible" resources.

A...
And just to clarify - I think that while you get that error for the
qla2xxx driver, I suspect that your actual resource tree is all good, and
that the PCI allocations were fine.

And then the problem you hit is now that the driver literally thinks that
some other driver already took that resource.

The patch I just sent is not actually the patch I think you should do: the
proper patch is to just remove IORESOURCE_BUSY from the e820 resources,
simply because they are _not_ indicative of a driver already holding on to
the resource.

Of course, the sad part is that potentially IORESOURCE_BUSY might actually
be a really good bit for exactly that - we've had tons of issues with
hardware sensors literally having a kernel driver _and_ a system level
driver (ie ACPI), and things get confused exactly because there are now
two drivers trying to drive the same piece of hardware.

But basically, if you have BAR's and the e820 resource areas co-existing,
then the e820 resources shouldn't be marked BUSY.

Anyway - to just re-cap - you might as well just ignore the patch I just
sent out, and instead just avoid doing that BUSY bit to begin with in the
"late e820" case. Simpler and more correct.

Linus
--
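The qla2xxx failure discussed in the message above can be modelled in a few lines. This is a hedged user-space sketch, not kernel code: it imitates how a request_region()-style walk descends into a non-busy resource that encloses the request but stops at a busy one. The names toy_res, TOY_BUSY, toy_overlap, toy_request_region and toy_claim_qla2xxx are hypothetical, and the tree is reduced to a root plus the one "reserved" entry from the conflict message; the real tree also contains the PCI bus and BAR resources.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BUSY 0x1u               /* stands in for IORESOURCE_BUSY */

struct toy_res {
    const char *name;
    uint64_t start, end;            /* inclusive */
    unsigned int flags;
    struct toy_res *child, *sibling;
};

/* First child of parent that overlaps [start, end], or NULL. */
static struct toy_res *toy_overlap(struct toy_res *parent,
                                   uint64_t start, uint64_t end)
{
    for (struct toy_res *r = parent->child; r; r = r->sibling)
        if (start <= r->end && r->start <= end)
            return r;
    return NULL;
}

/* 0 if a driver may claim [start, end] under root, -1 if it looks taken. */
static int toy_request_region(struct toy_res *root,
                              uint64_t start, uint64_t end)
{
    struct toy_res *parent = root;

    for (;;) {
        struct toy_res *conflict = toy_overlap(parent, start, end);

        if (!conflict)
            return 0;               /* free slot inside parent */
        if (conflict->flags & TOY_BUSY)
            return -1;              /* "somebody else owns this" */
        if (conflict->start > start || conflict->end < end)
            return -1;              /* partial overlap cannot nest */
        parent = conflict;          /* non-busy and enclosing: descend */
    }
}

/* The conflict from the log: (reserved) [dd000000, efffffff] vs qla2xxx. */
static int toy_claim_qla2xxx(unsigned int e820_flags)
{
    struct toy_res reserved = { "reserved", 0xdd000000u, 0xefffffffu,
                                e820_flags, NULL, NULL };
    struct toy_res root = { "PCI mem", 0, UINT64_MAX, 0, &reserved, NULL };

    return toy_request_region(&root, 0xddffc000u, 0xddffffffu);
}
```

In this model toy_claim_qla2xxx(TOY_BUSY) fails while toy_claim_qla2xxx(0) succeeds, which is the difference the recap above is pointing at: the same late e820 entry stops blocking the driver once it no longer carries the busy bit.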
please check fix v3
[PATCH] x86: split e820 reserved entries record to late v4 - fix v3
try to insert_resource second time, by expand the resource...
for case: e820 reserved entry is partially overlapped with bar res...
hope it will never happen
v3: use reserve_region_with_split() instead to handle overlapping
with test case by extend 0xe0000000 - 0xeffffff to 0xdd800000 -
get
e0000000-efffffff : PCI MMCONFIG 0
e0000000-efffffff : reserved
in /proc/iomem
get
found conflict for reserved [dd800000, efffffff], try
to reserve with split
__reserve_region_with_split: (PCI Bus #80)
[dd000000, ddffffff], res: (reserved) [dd800000, efffffff]
__reserve_region_with_split: (PCI Bus #00)
[de000000, dfffffff], res: (reserved) [de000000, efffffff]
initcall pci_subsys_init+0x0/0x121 returned 0 after 381 msecs
in dmesg

YH
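
The splitting shown in that dmesg output can be sketched on plain ranges. This is an illustration under assumptions, not the code from the patch: it carves a requested range around already-registered ranges and keeps only the conflict-free pieces. The names range, reserve_with_split, demo_pieces and demo_late_e820 are hypothetical; the two bus ranges and the widened "reserved" range are the ones printed in the log, and the MMCONFIG entry is left out because the reserved entry ends up sharing its range.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start, end;            /* inclusive */
};

/* Append the parts of req that overlap nothing in taken[] to out[]. */
static void reserve_with_split(const struct range *taken, size_t n,
                               struct range req,
                               struct range *out, size_t cap, size_t *count)
{
    for (size_t i = 0; i < n; i++) {
        const struct range *c = &taken[i];

        if (req.start > c->end || c->start > req.end)
            continue;               /* no overlap with this entry */
        if (c->start > req.start) {
            struct range left = { req.start, c->start - 1 };
            reserve_with_split(taken, n, left, out, cap, count);
        }
        if (c->end < req.end) {
            struct range right = { c->end + 1, req.end };
            reserve_with_split(taken, n, right, out, cap, count);
        }
        return;                     /* req itself was split or swallowed */
    }
    if (*count < cap)
        out[(*count)++] = req;      /* conflict-free piece */
}

static struct range demo_pieces[4];

/* Replays the test case from the log; returns the number of pieces kept. */
static size_t demo_late_e820(void)
{
    static const struct range taken[] = {
        { 0xdd000000u, 0xddffffffu },   /* PCI Bus #80 */
        { 0xde000000u, 0xdfffffffu },   /* PCI Bus #00 */
    };
    struct range req = { 0xdd800000u, 0xefffffffu };    /* widened "reserved" */
    size_t count = 0;

    reserve_with_split(taken, 2, req, demo_pieces, 4, &count);
    return count;
}
```

For this input the sketch keeps a single piece, e0000000-efffffff, matching the "reserved" line in the /proc/iomem excerpt. It also copes with the three-blocks-under-one-span layout drawn earlier in the thread, since every overlapping entry triggers another split.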
--
we may not need put reserve entries from e820 into resource tree.
and only insert those sticky resources (with _BUSY) before
pci_assign_unassign and _request_region etc.

YH
--
On Fri, Aug 29, 2008 at 5:45 PM, Linus Torvalds
the quirk works at the first point for David's system.
[PATCH] x86: protect hpet in BAR for one ATI chipset v3
so the kernel doesn't allocate a new resource for it because it can not
allocate the old
address from BIOS.

the same way as some IO APIC address in BAR handling
v3: device id should be 0x4385
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
drivers/pci/quirks.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

Index: linux-2.6/drivers/pci/quirks.c
===================================================================
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -1918,6 +1918,22 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_B
PCI_DEVICE_ID_NX2_5709S,
quirk_brcm_570x_limit_vpd);

+static void __init quirk_hpet_in_bar(struct pci_dev *pdev)
+{
+ int i;
+ u64 base, size;
+
+ /* the BAR1 is the location of the HPET...we must
+ * not touch this, so forcibly insert it into the resource tree */
+ base = pci_resource_start(pdev, 1);
+ size = pci_resource_len(pdev, 1);
+ if (base && size) {
+ insert_resource(&iomem_resource, &pdev->resource[1]);
+ dev_info(&pdev->dev, "HPET at %08llx-%08llx\n", base, base + size - 1);
+ }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x4385, quirk_hpet_in_bar);
+
#ifdef CONFIG_PCI_MSI
/* Some chipsets do not support MSI. We cannot easily rely on setting
* PCI_BUS_FLAGS_NO_MSI in its bus flags because there are actually

stop working on following path?
[PATCH] x86: split e820 reserved entries record to late v4

sound good, will look at after get lspci -tv and lspci -vvxxx from Rafael.
also quirk between probe::pci_read_bases and pci_resource_survey
YH
--
Now, this is probably fine too in theory, but
- you didn't check if the BAR is even enabled, afaik
- the other patch - to move the reserved e820 range later - should make
No, I think this is worth doing, BUT IT MUST NOT BE MERGED BY JUST SENDING
IT TO INGO.

It's not an "x86 patch". It's about the PCI resources.
And those kinds of patches need to be acked by people who know and
understand the PCI resource issues and have some memory of just how
broken machines can exist.

Linus
--
On Fri, Aug 29, 2008 at 7:16 PM, Linus Torvalds
i see.
YH
--
can you try tip/master? we have another fix according to Linus..
YH
--
I have tested the patch that Linus sent me and it works. Please see my reply
to Linus for the link to the dmesg output.

Thanks,
Rafael
--
(Don't know who's responsible for this one, so I've just added Rafael to CC)
I only noticed this recently but it's probably been happening for a while
(doesn't seem to happen on 2.6.26):

ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 83033595 0.0 0 0 ? S< 10:30 21114574:23 [kthreadd]
root 1740 83078470 0.0 0 0 ? S< 10:31 21114574:23 [md0_raid1]

Seems to happen only to kernel threads and at random. Last time I booted
it was two XFS threads.

Before I start another bisection, does anybody have any ideas?
--
Cheers,
Alistair.
--
Okay this is a duplicate report of:
http://bugzilla.kernel.org/show_bug.cgi?id=11209
Which seems to have stalled..
--
Cheers,
Alistair.
--
