Another week (my weeks do seem to be eight days, don't they? Very odd),
another -rc.
The dirstat pretty much says it all:
43.0% arch/arm/configs/
43.9% arch/arm/
25.5% arch/powerpc/configs/
26.8% arch/powerpc/
73.9% arch/
4.4% drivers/usb/musb/
5.4% drivers/usb/
4.0% drivers/watchdog/
16.0% drivers/
3.5% fs/
yeah, the bulk of it is all config updates, and with arm and powerpc
leading the pack.
But seriously, while the config updates amount to about three quarters of
the diff, and if you don't use a rename-aware diff the blackfin include
file movement pretty much accounts for the rest, hidden behind all those
trivial (but bulky) changes are a lot of small changes that hopefully fix
a number of regressions.
The most exciting (well, for me personally - my life is apparently too
boring for words) was how we had some stack overflows that totally
corrupted some basic thread data structures. That's exciting because we
haven't had those in a long time. The cause turned out to be a somewhat
overly optimistic increase in the maximum NR_CPUS value, but it also
caused some introspection about our stack usage in general. Including
things like a patch to gcc to fix insane stack usage for vararg functions
on x86-64.
But that one would only hit anybody who was a bit too adventurous and
selected the big 4096 CPU configuration. The rest of the regressions fixed
are a bit more pedestrian.
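For a rough feel of why a bigger NR_CPUS eats stack, here is a small user-space sketch. It assumes the cost comes from per-CPU bitmaps (one bit per possible CPU, as in cpumask_t) that end up as local variables; cpumask_bytes() is a made-up helper for illustration, not a kernel function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Bytes needed for a bitmap with one bit per CPU, rounded up to whole
 * unsigned longs. cpumask_bytes() is a hypothetical helper that only
 * models the size of such a bitmap; it is not kernel code.
 */
static size_t cpumask_bytes(unsigned int nr_cpus)
{
	const size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs;

	assert(nr_cpus > 0);
	longs = (nr_cpus + bits_per_long - 1) / bits_per_long;
	return longs * sizeof(unsigned long);
}
```

With nr_cpus = 4096 this comes to 512 bytes per mask, so a handful of on-stack masks in one call chain adds up quickly on a kernel stack that is only a few kilobytes.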
---
Adel Gadllah (1):
block: clean up cmdfilter sysfs interface
Adrian Bunk (5):
ocfs2/cluster/tcp.c: make some functions static
removed unused #include <linux/version.h>'s
KVM: fix userspace ABI breakage
Blackfin arch: let PCI depend on BROKEN
[ARM] use bcd2bin/bin2bcd
Al Viro (8):
fix efs_lookup()
fix osf_getdirents()
fix hpux_getdents()
fix regular readdir() and friends
fix ->llseek() for a bunch of directories
deal with the first call of ->show() generating no ...
(Don't know who's responsible for this one, so I've just added Rafael to CC)
I only noticed this recently but it's probably been happening for a while
(doesn't seem to happen on 2.6.26):
ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 83033595 0.0 0 0 ? S< 10:30 21114574:23 [kthreadd]
root 1740 83078470 0.0 0 0 ? S< 10:31 21114574:23 [md0_raid1]
Seems to happen only to kernel threads and at random. Last time I booted it
was two XFS threads.
Before I start another bisection, does anybody have any ideas?
--
Cheers,
Alistair.
--
Okay this is a duplicate report of: http://bugzilla.kernel.org/show_bug.cgi?id=11209 Which seems to have stalled.. -- Cheers, Alistair. --
Doesn't boot on my quad core test box, apparently because of an AHCI failure. Bisecting ... Rafael --
Bisection turned up commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd as the culprit:
commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Mon Aug 25 00:56:08 2008 -0700
x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
Reverting this commit helps.
The symptom is that AHCI probe fails with this commit applied.
Thanks,
Rafael
--
can i get whole bootlog with "debug"? YH --
can you try tip/master? we have another fix according to Linus.. YH --
I have tested the patch that Linus sent me and it works. Please see my reply to Linus for the link to the dmesg output. Thanks, Rafael --
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/broken.log Thanks, Rafael --
pci 0000:00:00.0: BAR has MMCONFIG at e0000000-ffffffff
pci 0000:00:12.0: BAR 5: can't allocate resource
pci 0000:00:13.0: BAR 0: can't allocate resource
pci 0000:00:13.1: BAR 0: can't allocate resource
pci 0000:00:13.2: BAR 0: can't allocate resource
pci 0000:00:13.3: BAR 0: can't allocate resource
pci 0000:00:13.4: BAR 0: can't allocate resource
pci 0000:00:13.5: BAR 0: can't allocate resource
pci 0000:00:14.2: BAR 0: can't allocate resource
your mmconf in BAR is broken.... after it is forcibly inserted, it blocks
all the others...
YH
--
And that seems utter crap to begin with. PCI: Using MMCONFIG at e0000000 - efffffff Where did it get that bogus "ffffffff" end address? Anyway, that whole MMCONFIG/BAR thing was totally broken to begin with, and it's reverted now in my tree, so I guess it doesn't much matter. Linus --
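For reference, a sketch of where an end address like efffffff comes from. It assumes the standard PCI Express config layout (32 devices x 8 functions x 4k, so 1MB per bus) and a window covering all 256 buses; mmconfig_end() is a made-up name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Last byte of an MMCONFIG window that starts at 'base' and covers buses
 * 0..end_bus. Each bus takes 32 devices * 8 functions * 4096 bytes = 1MB.
 * mmconfig_end() is a hypothetical helper, not an existing kernel function.
 */
static uint64_t mmconfig_end(uint64_t base, unsigned int end_bus)
{
	const uint64_t bytes_per_bus = 32ULL * 8 * 4096;

	assert(end_bus <= 255);
	return base + ((uint64_t)end_bus + 1) * bytes_per_bus - 1;
}
```

Under those assumptions a window at e0000000 for buses 0-255 ends at efffffff, which is what the "PCI: Using MMCONFIG" line reports; an end of ffffffff does not fit that layout.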
On Fri, Aug 29, 2008 at 5:08 PM, Linus Torvalds
the BAR is from pci_read_bases..., so that chipset is broken...
they are even supposed to hide that BAR from the OS.
YH
--
Ok, can we please
 - *do* get a quirk for known-broken chipsets (at a *PCI* level, this is
   not an x86 issue)
 - *not* get any more random PCI work-arounds that go through the x86 tree
   and aren't even looked at by the (very few) people who actually
   understand the PCI resource handling?
IOW, for the first issue, just teach pci_mmcfg_check_hostbridge() about
this broken bridge, and have it fix things up (including hiding the thing,
but also just verifying that the dang thing even -works- etc).
For the second issue - please do realize that we have had much over a
_decade_ of work on the PCI resource handling, and it's fragile. The thing
I reverted really isn't something that Ingo should ever have committed in
the first place. It's not something an x86 maintainer can even make sane
decisions on.
Resource handling things _need_ to get ACK's from people like Ivan
Kokshaysky or me. Or at least _several_ other people who actually really
understand not just PCI resource handling, but have actually seen all the
horrible crap it causes, and understand how fragile this stuff is. It's all
different, and it's all about all the million of broken machines out there
that screw things up.
Linus
--
On Fri, Aug 29, 2008 at 5:45 PM, Linus Torvalds
the quirk works at the first point for David's system.
[PATCH] x86: protect hpet in BAR for one ATI chipset v3
so the kernel doesn't allocate a new resource for it, because it cannot
allocate the old address from the BIOS.
the same way as some IO APIC address in BAR handling
v3: device id should be 0x4385
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
drivers/pci/quirks.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6/drivers/pci/quirks.c
===================================================================
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -1918,6 +1918,22 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_B
PCI_DEVICE_ID_NX2_5709S,
quirk_brcm_570x_limit_vpd);
+static void __init quirk_hpet_in_bar(struct pci_dev *pdev)
+{
+ int i;
+ u64 base, size;
+
+ /* the BAR1 is the location of the HPET...we must
+ * not touch this, so forcibly insert it into the resource tree */
+ base = pci_resource_start(pdev, 1);
+ size = pci_resource_len(pdev, 1);
+ if (base && size) {
+ insert_resource(&iomem_resource, &pdev->resource[1]);
+ dev_info(&pdev->dev, "HPET at %08llx-%08llx\n", base, base + size - 1);
+ }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x4385, quirk_hpet_in_bar);
+
#ifdef CONFIG_PCI_MSI
/* Some chipsets do not support MSI. We cannot easily rely on setting
* PCI_BUS_FLAGS_NO_MSI in its bus flags because there are actually
stop working on following path?
[PATCH] x86: split e820 reserved entries record to late v4
sound good, will look at after get lspci -tv and lspci -vvxxx from Rafael.
also quirk between probe::pci_read_bases and pci_resource_survey
YH
--
Now, this is probably fine too in theory, but
 - you didn't check if the BAR is even enabled, afaik
 - the other patch - to move the reserved e820 range later - should make
No, I think this is worth doing, BUT IT MUST NOT BE MERGED BY JUST SENDING
IT TO INGO.
It's not an "x86 patch". It's about the PCI resources. And those kinds of
patches need to be acked by people who know and understand the PCI resource
issues and have some memory of just how broken machines can exist.
Linus
--
On Fri, Aug 29, 2008 at 7:16 PM, Linus Torvalds i see. YH --
Btw, what was the original regression that commit
a2bd7274b47124d2fc4dfdb8c0591f545ba749dd was trying to fix? It's not listed
in that commit, even though the commit has a "Bisected-by: David Witbrodt
<dawitbro@sbcglobal.net>".
In fact, I can find it with google by searching for
David Witbrodt bisect
and I see that it is 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f. I'm
wondering why that commit wasn't just reverted?
Because now that I see it, I notice that _that_ is the real bug to begin
with. That commit really was buggy. NO WAY can you insert the code/bss/data
resources before you've done e820 handling, because it may well be that
some strange e820 table contains things that cross the resources.
So that original thing was buggy, and made x86-64 do odd things. They were
doubly odd, since x86-32 did it differently (and better, I think).
Then, when actually doing the common arch/x86/kernel/setup.c, the commit
that does so _claims_ that the common code came from the 32-bit version,
but that doesn't seem to be true, at least wrt this thing. The current
setup.c comes from the *broken* cleanup of setup_64.c that had been
bisected to be broken.
And that, in turn, happened in 41c094fd3ca54f1a71233049cf136ff94c91f4ae
("x86: move e820_resource_resources to e820.c") which also did "and make
32-bit resource registration more like 64 bit.", so it got the bug into
32-bit code that had been introduced in 64-bit code. Ugh.
So why was then that other broken commit added to paper it over, even
though the original broken commit had been bisected and the breakage was
known to have been due to _that_?
Hmm?
Yinghai - I'm hoping that the code movement is all over and done with, but
you need to be a _lot_ more careful here. And Ingo, this really wasn't very
well done.
Linus
--
On Fri, Aug 29, 2008 at 6:11 PM, Linus Torvalds
we reverted the commit, David's problem still happens.
the root cause is:
before 2.6.26, we call init_apic_mapping, which will insert_resource for
the lapic address,
and then call e820_resource_resouce (with request_resource) to register
e820 entries.
so the lapic entry in the resource tree will prevent some entry in e820
from being registered.
later request_resource for BAR res (==hpet) will succeed.
from 2.6.26, we moved lapic address registering to late_initcall,
so the reserved entry in e820 gets into the resource tree first,
and later pci_resource_survey::request_resource for BAR res (==hpet,
0xfed00000) will fail.
so pci_assign_unsigned... will get a new res for the BAR, and that messes
up the hpet setting.
solutions will be
1. use quirk to protect hpet in BAR, Ingo said it is not generic.
2. or the one you reverted... check_bar_with_valid. (hpet, ioapic,
mmconfig) --> happened to reveal another problem with Rafael's
system/chipset.
3. or sticky resource..., but could have partially overlapping
4. or don't register reserved entries in e820.. Eric, Nacked.
5. or as you suggest, register some reserved entries later..., and have
insert_resource_expand_to_fit...
YH
--
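The ordering problem described above can be modelled in a few lines of user-space C. This is only a sketch: flat_request() stands in for request_resource() on a single level, the HPET address 0xfed00000 is from the mail, and the sizes plus the wide e820 "reserved" range are hypothetical values picked to show the effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start, end;
	const char *name;
};

#define MAX_RANGES 8

struct flat_tree {
	struct range r[MAX_RANGES];
	size_t n;
};

/*
 * One-level model of request_resource(): refuse anything that overlaps an
 * existing entry. Returns 0 on success, -1 on conflict.
 */
static int flat_request(struct flat_tree *t, uint64_t start, uint64_t end,
			const char *name)
{
	size_t i;

	assert(start <= end && t->n < MAX_RANGES);
	for (i = 0; i < t->n; i++)
		if (start <= t->r[i].end && end >= t->r[i].start)
			return -1;
	t->r[t->n].start = start;
	t->r[t->n].end = end;
	t->r[t->n].name = name;
	t->n++;
	return 0;
}

/*
 * Register things in the old order (lapic first) or the new order (lapic
 * late, so e820 goes in unopposed) and report whether the HPET BAR can
 * still be claimed afterwards. The reserved range is a made-up example.
 */
static int hpet_bar_request(int lapic_first)
{
	struct flat_tree t = { .n = 0 };

	if (lapic_first)
		flat_request(&t, 0xfee00000, 0xfee00fff, "Local APIC");
	/* hypothetical e820 entry covering both the lapic and the hpet */
	flat_request(&t, 0xfec00000, 0xffffffff, "reserved");
	return flat_request(&t, 0xfed00000, 0xfed003ff, "BAR (HPET)");
}
```

In this model hpet_bar_request(1) succeeds only because the lapic entry knocked out the reserved range by accident, and hpet_bar_request(0) fails, which is the 2.6.26 behaviour the mail describes.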
So the problem there was that traditionally, e820_reserve_resource()
expected to be the first one to populate any resources. That's changed, and
that's why it now needs to use "insert_resource()" rather than
Yeah, I don't like it. The quirk I was talking about was the one about
Yeah, no, we do want reserved entries from e820 to show up to at least
Yes. And I do think this is a workable model.
Linus
--
Ok, and here's the patch to do
insert_resource_expand_to_fit(root, new);
and while I still haven't actually tested it, it looks sane and compiles to
code that also looks sane.
I'll happily commit this as basic infrastructure as soon as somebody ack's
it and tests that it works (and I'll try it myself soon enough, just for
fun)
Linus
---
include/linux/ioport.h | 1 +
kernel/resource.c | 88 ++++++++++++++++++++++++++++++++++-------------
2 files changed, 64 insertions(+), 25 deletions(-)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 22d2115..8d3b7a9 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -109,6 +109,7 @@ extern struct resource iomem_resource;
extern int request_resource(struct resource *root, struct resource *new);
extern int release_resource(struct resource *new);
extern int insert_resource(struct resource *parent, struct resource *new);
+extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
extern int allocate_resource(struct resource *root, struct resource *new,
resource_size_t size, resource_size_t min,
resource_size_t max, resource_size_t align,
diff --git a/kernel/resource.c b/kernel/resource.c
index f5b518e..72ee95b 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -362,35 +362,21 @@ int allocate_resource(struct resource *root, struct resource *new,
EXPORT_SYMBOL(allocate_resource);
-/**
- * insert_resource - Inserts a resource in the resource tree
- * @parent: parent of the new resource
- * @new: new resource to insert
- *
- * Returns 0 on success, -EBUSY if the resource can't be inserted.
- *
- * This function is equivalent to request_resource when no conflict
- * happens. If a conflict happens, and the conflicting resources
- * entirely fit within the range of the new resource, then the new
- * resource is inserted and the conflicting resources become children of
- * the new resource.
+/*
+ * Insert a ...
On Fri, Aug 29, 2008 at 7:56 PM, Linus Torvalds
we need to use insert_resource_split_to_fit instead...
otherwise __request_region will not be happy.
have one shrink one
only work with
|----------------|
|---------------------|
still has problem with
|----------------| |------------| |-----------|
|------------------------------------|
need to get rid of middle one too.
YH
---
arch/x86/kernel/e820.c | 20 +++++++++++++-
include/linux/ioport.h | 2 +
kernel/resource.c | 66 ++++++++++++++++++++++++++++++++++++++++---------
3 files changed, 74 insertions(+), 14 deletions(-)
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1319,8 +1319,24 @@ void __init e820_reserve_resources_late(
res = e820_res;
for (i = 0; i < e820.nr_map; i++) {
- if (!res->parent && res->end)
- insert_resource(&iomem_resource, res);
+#if 1
+ /* test for shrink_with fit */
+ if (!res->parent && res->end) {
+ if (res->start == 0xe0000000)
+ res->start = 0xde000000;
+ }
+#endif
+
+ if (!res->parent && res->end && insert_resource(&iomem_resource, res)) {
+
+ printk(KERN_WARNING "found conflict for %s [%08llx, %08llx], try to insert with shrink\n",
+ res->name, res->start, res->end);
+
+ insert_resource_shrink_to_fit(&iomem_resource, res);
+
+ printk(KERN_WARNING " shrink to %s [%08llx, %08llx]\n",
+ res->name, res->start, res->end);
+ }
res++;
}
}
Index: linux-2.6/include/linux/ioport.h
===================================================================
--- ...
Are you really really sure? Try just removing the IORESOURCE_BUSY.
As mentioned, if we expect the PCI BAR's to work with the e820 resources,
then BUSY really is simply not right any more.
Not that I think it should matter either.. The ones that are added _early_
should be IORESOURCE_BUSY (ie the ones that cover RAM), but the others we
now expect to nest with PCI BARs.
But since we add them after we have parsed the BAR's, I don't even see why
the BUSY bit should even matter - we've already added the fixed BARs, and
any newly allocated non-fixed ones shouldn't be allocated in e820 areas
_regardless_ of whether the BUSY bit is set or not.
So pls explain why it matters?
Linus
--
On Fri, Aug 29, 2008 at 8:24 PM, Linus Torvalds
not all. some are MMCONF, some are for GART, and some for fixed lapic,
if we don't add the IORESOURCE_BUSY, why bother to add these entries...
with a good layout from the BIOS, it should only reserve mmio ranges that
are not showing in a BAR.
for example:
0xdc000000 - 0xdd000000 for GART ( some offset BAR 0x94)
0xdd000000 - 0xde000000 is for bus 0x80
0xde000000 - 0xdf000000 is for bus 0x00
0xe0000000 - 0xf0000000 is for mmconfig ( CPU set it in MSR for amd fam 10h)
if one stupid BIOS sets 0xdc000000 - 0x100000000 as reserved,
then when we insert that range late we should still have set ranges other
than range 0xdd000000 - 0xdf000000
also do we need to set other leaf ranges in 0xdd000000 - 0xdf000000 ?
YH
--
we may not need put reserve entries from e820 into resource tree. and only insert those sticky resources (with _BUSY) before pci_assign_unassign and _request_region etc. YH --
You don't understand how the resource allocator works.
IORESOURCE_BUSY is really more of a "legacy bit". It has almost no bearing
on the actual allocations. Just grep for IORESOURCE_BUSY in
kernel/resource.c. The _only_ thing that cares about busy/non-busy is the
legacy "request_region()" function. That one isn't actually used by any
core PCI code - it's more of a driver issue to claim exclusive ownership of
particular resources by inserting a marker in that resource.
So IORESOURCE_BUSY is a red herring. The only reason I said you can clear
it is because you claimed it causes problems, but the more I look at it,
the more I think you're likely just mistaken - because IORESOURCE_BUSY
doesn't make any difference at all to normal resource handling until you
get to actual drivers.
The bigger issue is that just inserting the resource (and it really doesn't
matter if it is marked busy or not) is in itself a mark of "there's
something here". THAT is what all the resource code cares about. The
IORESOURCE_BUSY bit is almost immaterial (ie _is_ immaterial except for
some very specific cases).
And the reason we need to add the e820 resources is exactly so that we
don't try to allocate PCI resources on top of some system resources we
I agree, but "good layout" and "BIOS" don't really go together. There's
Sure, but really, the only point of even caring about e820 resources in the
first place has really nothing to do with the BAR's we can see (because the
kernel can handle _those_ perfectly well on its own), and has everything to
do with the fact that a lot of devices have invisible resources that we
_cannot_ see (ie magic non-standard BAR's for the motherboard chips).
And those are exactly why we want to populate the resource map with the
e820 information - to avoid having dynamic resources (like Cardbus or PCI
hotplug, or just devices that weren't set up statically by the BIOS) be
then allocated by the kernel on top of those "invisible" ...
And just to clarify - I think that while you get that error for the qla2xxx
driver, I suspect that your actual resource tree is all good, and that the
PCI allocations were fine. And then the problem you hit is now that the
driver literally thinks that some other driver already took that resource.
The patch I just sent is not actually the patch I think you should do: the
proper patch is to just remove IORESOURCE_BUSY from the e820 resources,
simply because they are _not_ indicative of a driver already holding on to
the resource.
Of course, the sad part is that potentially IORESOURCE_BUSY might actually
be a really good bit for exactly that - we've had tons of issues with
hardware sensors literally having a kernel driver _and_ a system level
driver (ie ACPI), and things get confused exactly because there are now two
drivers trying to drive the same piece of hardware.
But basically, if you have BAR's and the e820 resource areas co-existing,
then the e820 resources shouldn't be marked BUSY.
Anyway - to just re-cap - you might as well just ignore the patch I just
sent out, and instead just avoid doing that BUSY bit to begin with in the
"late e820" case. Simpler and more correct.
Linus
--
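To make the "BUSY only matters to request_region()" point concrete, here is a simplified user-space model of that walk. It is a sketch under assumptions: RES_BUSY stands in for IORESOURCE_BUSY, the struct and function names are made up, and the tree shape (BAR under the bus window under the expanded "reserved" entry) is a guess reconstructed from the ranges quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_BUSY 0x1	/* stands in for IORESOURCE_BUSY */

struct res {
	uint64_t start, end;
	unsigned int flags;
	struct res *child, *sibling;
};

/* First child of 'parent' that overlaps [start, end], or NULL. */
static struct res *find_conflict(struct res *parent, uint64_t start,
				 uint64_t end)
{
	struct res *c;

	for (c = parent->child; c; c = c->sibling)
		if (start <= c->end && end >= c->start)
			return c;
	return NULL;
}

/*
 * Simplified model of the request_region() walk: descend through non-busy
 * resources that contain the request, give up at the first busy one.
 * Returns 0 if the region could be claimed, -1 otherwise.
 */
static int model_request_region(struct res *root, uint64_t start,
				uint64_t end)
{
	struct res *parent = root, *conflict;

	assert(start <= end);
	while ((conflict = find_conflict(parent, start, end)) != NULL) {
		if (conflict->flags & RES_BUSY)
			return -1;	/* "somebody else owns this" */
		if (start < conflict->start || end > conflict->end)
			return -1;	/* partial overlap cannot nest */
		parent = conflict;
	}
	return 0;
}

/* The qla2xxx case, with the e820 entry marked busy or not. */
static int qla2xxx_claim(unsigned int reserved_flags)
{
	struct res bar  = { 0xddffc000, 0xddffffff, 0, NULL, NULL };
	struct res bus  = { 0xdd000000, 0xddffffff, 0, &bar, NULL };
	struct res resv = { 0xdd000000, 0xefffffff, reserved_flags, &bus, NULL };
	struct res root = { 0, UINT64_MAX, 0, &resv, NULL };

	return model_request_region(&root, 0xddffc000, 0xddffffff);
}
```

In this model qla2xxx_claim(RES_BUSY) fails at the reserved entry while qla2xxx_claim(0) walks straight down to the BAR, even though the tree itself is identical in both cases.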
please check fix v3
[PATCH] x86: split e820 reserved entries record to late v4 - fix v3
try to insert_resource second time, by expand the resource...
for case: e820 reserved entry is partially overlapped with bar res...
hope it will never happen
v3: use reserve_region_with_split() instead to handle overlapping
with test case by extending 0xe0000000 - 0xefffffff to 0xdd800000 -
get
e0000000-efffffff : PCI MMCONFIG 0
e0000000-efffffff : reserved
in /proc/iomem
get
found conflict for reserved [dd800000, efffffff], try
to reserve with split
__reserve_region_with_split: (PCI Bus #80)
[dd000000, ddffffff], res: (reserved) [dd800000, efffffff]
__reserve_region_with_split: (PCI Bus #00)
[de000000, dfffffff], res: (reserved) [de000000, efffffff]
initcall pci_subsys_init+0x0/0x121 returned 0 after 381 msecs
in dmesg
YH
--
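The split behaviour in the dmesg above can be approximated with a flat model: keep only the parts of the wanted range that no existing resource already covers. This is a sketch, not the kernel's reserve_region_with_split() (which works on the real resource tree); struct span and split_around() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct span {
	uint64_t start, end;
};

/*
 * Write into 'out' the parts of 'want' that are not covered by any of the
 * 'n' entries in 'taken' (sorted by start, non-overlapping). Returns the
 * number of pieces written; 'out' needs room for n + 1 entries.
 */
static size_t split_around(struct span want, const struct span *taken,
			   size_t n, struct span *out)
{
	uint64_t pos = want.start;	/* first byte not yet accounted for */
	size_t i, count = 0;
	int done = 0;

	assert(want.start <= want.end);
	for (i = 0; i < n && !done; i++) {
		if (taken[i].end < pos)
			continue;	/* entirely before what is left */
		if (taken[i].start > want.end)
			break;		/* entirely after the wanted range */
		if (taken[i].start > pos) {
			out[count].start = pos;
			out[count].end = taken[i].start - 1;
			count++;
		}
		if (taken[i].end >= want.end)
			done = 1;	/* nothing left after this entry */
		else
			pos = taken[i].end + 1;
	}
	if (!done) {
		out[count].start = pos;
		out[count].end = want.end;
		count++;
	}
	return count;
}
```

Fed the test case from the mail (reserved [dd800000, efffffff] against the two PCI bus windows), the model leaves a single piece, [e0000000, efffffff], which matches the "reserved" line shown in /proc/iomem. With three blocks and a range that starts in the first and ends in the third, it returns the two gaps in between.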
On Fri, Aug 29, 2008 at 8:24 PM, Linus Torvalds
please check
__request_region: conflict: (reserved) [dd000000, efffffff], res:
(qla2xxx) [ddffc000, ddffffff]
busy flag
qla2xxx 0000:83:00.0: BAR 1: can't reserve mem region [0xddffc000-0xddffffff]
YH
...
Initializing cgroup subsys cpuset
Linux version 2.6.27-rc5-tip-00672-ge5c5407-dirty (yhlu@linux-zpir)
(gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision
135036] (SUSE Linux) ) #220 SMP Fri Aug 29 22:02:53 PDT 2008
Command line: console=uart8250,io,0x3f8,115200n8
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug
show_msr=1 i8042.noaux initcall_debug apic=verbose pci=routeirq
ip=dhcp load_ramdisk=1 ramdisk_size=131072
BOOT_IMAGE=kernel.org/bzImage_2.6.27_k8.1
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000097400 (usable)
BIOS-e820: 0000000000097400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000d7fa0000 (usable)
BIOS-e820: 00000000d7fae000 - 00000000d7fb0000 (usable)
BIOS-e820: 00000000d7fb0000 - 00000000d7fbe000 (ACPI data)
BIOS-e820: 00000000d7fbe000 - 00000000d7ff0000 (ACPI NVS)
BIOS-e820: 00000000d7ff0000 - 00000000d8000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000002028000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (Kernel
code) ...
Ok, this is actually when the driver wants to reserve the BAR, and then it
notices that there is an existing "reservation" there.
So yes, drivers will care - they literally will think that somebody else
owns their resource if they have a BUSY resource inside of them. So this
is a driver protecting against another driver.
The sad part is that it looks like it's entirely due to the PCI code
trying to emulate an ISA driver model, and use a flat resource space - so
it hits the upper resources first.
Does this patch make a difference? It actually removes a fair chunk of
code, by just saying "we really don't care if the resource is IO or MEM,
we just want to reserve space inside of it, regardless of type".
Untested - obviously.
Linus
---
drivers/pci/pci.c | 26 +++++++++-----------------
1 files changed, 9 insertions(+), 17 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c9884bb..a3de4fe 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1304,15 +1304,11 @@ pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge)
void pci_release_region(struct pci_dev *pdev, int bar)
{
struct pci_devres *dr;
+ struct resource *res = pdev->resource + bar;
if (pci_resource_len(pdev, bar) == 0)
return;
- if (pci_resource_flags(pdev, bar) & IORESOURCE_IO)
- release_region(pci_resource_start(pdev, bar),
- pci_resource_len(pdev, bar));
- else if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM)
- release_mem_region(pci_resource_start(pdev, bar),
- pci_resource_len(pdev, bar));
+ __release_region(res, pci_resource_start(pdev, bar), pci_resource_len(pdev, bar));
dr = find_pci_dr(pdev);
if (dr)
@@ -1336,20 +1332,16 @@ void pci_release_region(struct pci_dev *pdev, int bar)
int pci_request_region(struct pci_dev *pdev, int bar, const char *res_name)
{
struct pci_devres *dr;
+ struct resource *res = pdev->resource + bar;
if (pci_resource_len(pdev, bar) == 0)
return 0;
-
- if ...
.. and it even works (apart from a missing '\n' for the expansion report
;).
I tested it with the appended silly test-case, and it shows
...
BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000160000000 (usable)
Expanded resource Kernel dummy due to conflict with Kernel code
Expanded resource Kernel dummy due to conflict with Kernel data
last_pfn = 0x160000 max_arch_pfn = 0x3ffffffff
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
...
and /proc/iomem shows
...
00100000-9cf64fff : System RAM
00200000-006ea27f : Kernel dummy
00200000-00561f37 : Kernel code
00561f38-006ea27f : Kernel data
00777000-007d6cc7 : Kernel bss
...
so it correctly expanded that "Kernel dummy" resource to cover the
resources it had clashed with.
And no, it's not perfect. We certainly _could_ split things instead. But I
hope that odd "e820 resources were bogus" case almost never would actually
trigger in practice, and the expansion case is not only simpler, it's also
slightly more robust in the sense that a single big resource is likely to
fit the things we need than multiple smaller resources that have been
chopped up.
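As a rough user-space model of the expand-to-fit idea (not the actual patch): whenever the new range only partially overlaps an existing one, grow it until it covers that one completely, and repeat until nothing partially overlaps any more. struct span and expand_to_fit() are made-up names, and the starting range used in the note below is hypothetical; the case where an existing resource fully contains the new one (where the real code would nest instead) is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct span {
	uint64_t start, end;
};

/*
 * Grow 'want' until every entry in 'taken' is either fully inside it or
 * does not touch it at all. Returns how many times it had to expand.
 */
static unsigned int expand_to_fit(struct span *want, const struct span *taken,
				  size_t n)
{
	unsigned int expansions = 0;
	int changed;

	assert(want->start <= want->end);
	do {
		size_t i;

		changed = 0;
		for (i = 0; i < n; i++) {
			const struct span *t = &taken[i];

			if (t->end < want->start || t->start > want->end)
				continue;	/* no overlap */
			if (t->start >= want->start && t->end <= want->end)
				continue;	/* fits inside: would become a child */
			if (t->start < want->start)
				want->start = t->start;
			if (t->end > want->end)
				want->end = t->end;
			expansions++;
			changed = 1;
		}
	} while (changed);
	return expansions;
}
```

Starting from a hypothetical dummy range of [00500000, 00600000] against the "Kernel code", "Kernel data" and "Kernel bss" ranges above, the model expands twice and ends at [00200000, 006ea27f], the same range /proc/iomem shows for "Kernel dummy".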
Linus
--- dummy test patch for the 'insert-resource-expand-to-fit' thing ---
arch/x86/kernel/setup.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 362d4e7..6265a38 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -578,6 +578,14 @@ static struct x86_quirks default_x86_quirks __initdata;
struct x86_quirks *x86_quirks __initdata = &default_x86_quirks;
+static struct resource dummy_resource = {
+ .name = "Kernel dummy",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
+
+
/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use ...
On Fri, Aug 29, 2008 at 7:33 PM, Linus Torvalds
originally it works, because lapic address entry open the big hole for
if update res->end according mmconfig end, before insert it forcibly,
BTW, insert_resource_expand_to_fit needs to be replaced with
insert_resource_split_to_fit....
the test stub reveals that expand will make __request_region not work for
some devices...because reserved entries from e820 take IORESOURCE_BUSY...
YH
--
Except it's still a horrible patch that special-cases all the wrong things (ie random resources that we just happen to know that ACPI etc cares about). There's no way to know in general if ACPI might care deeply where some random resource is (say, graphics memory) and it might be done with a BAR. Well, we should probably just remove the IORESOURCE_BUSY part. Again, that comes from the fact that the e820 resources used to _override_ everything - they were inserted first, and nothing else was _ever_ allowed to allocate in that region. But if we're changing that, then the whole IORESOURCE_BUSY part doesn't make sense. In fact, in general, IORESOURCE_BUSY doesn't much make sense any more in general, because it was actually more of an ISA-timeframe locking model saying "you can't touch this region". But if the whole point is that we now try to allow PCI device BAR's and the e820 maps to co-exist, then the whole - and only - reason for IORESOURCE_BUSY for them goes away.. Linus --
On Fri, Aug 29, 2008 at 5:08 PM, Linus Torvalds
we need to handle it. otherwise if the BAR goes first, it will stop other
BARs from being registered...
a quirk should do the work....
Rafael, can you send out lspci -tv and lspci -vvxxx too.
YH
--
00:00.0 Host bridge: ATI Technologies Inc RD790 Northbridge only dual slot PCI-e_GFX and HT3 K8 part
Subsystem: ATI Technologies Inc RD790 Northbridge only dual slot PCI-e_GFX and HT3 K8 part
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Region 3: Memory at <ignored> (64-bit, non-prefetchable)
Capabilities: [c4] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=0 UnitCnt=12 MastHost- DefDir- DUL-
Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 0: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut- LWI=16bit DwFcInEn- LWO=16bit DwFcOutEn-
Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 1: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn-
Revision ID: 3.00
Link Frequency 0: [b]
Link Error 0: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 0: 200MHz+ 300MHz- 400MHz+ 500MHz- 600MHz+ 800MHz+ 1.0GHz+ 1.2GHz+ 1.4GHz- 1.6GHz- Vend-
Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD-
Link Frequency 1: 200MHz
Link Error 1: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 1: 200MHz- 300MHz- 400MHz- 500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend-
Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE- CRCFE- SERRFE- CF- RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE-
Prefetchable memory behind bridge Upper: 00-00
Bus Number: 00
Capabilities: [40] HyperTransport: Retry Mode
Capabilities: [54] HyperTransport: UnitID Clumping
Capabilities: [9c] HyperTransport: #1a
00: 02 10 56 59 06 00 30 22 00 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 e0
so bar3 of 00:00.0 has 0xe0000000 - 0xffffffff
and request_resource failed, so
Region 3: Memory at <ignored> (64-bit, non-prefetchable)
BIOS should hide ...
Could you please rebase them on top of current -git? Rafael --
please check attached quilt series based on linus tree. YH
there is some problem with fix -v4...on one test machine. please don't use it now YH --
dmesg -s 262144 http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/dmesg-test.log cat /proc/iomem http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/iomem-test.txt Thanks, Rafael --
calling pci_subsys_init+0x0/0x120
PCI: Using ACPI for IRQ routing
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:01) [9000, 9fff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus 0000:01) [fe700000, fe7fffff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus 0000:01) [d8000000, dfffffff] conflict 0
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:02) [a000, bfff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus 0000:02) [fe800000, fe8fffff] conflict 0
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:03) [c000, cfff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus 0000:03) [fe900000, fe9fffff] conflict 0
request_resource: root: (PCI IO) [0, ffff], new: (PCI Bus 0000:04) [d000, efff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus 0000:04) [fea00000, feafffff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (PCI Bus 0000:05) [feb00000, febfffff] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new: (0000:00:00.0) [e0000000, ffffffff] conflict 1
pci 0000:00:00.0: BAR 3: can't allocate resource
so pci_resource_survey is depth first. sub buses request some resource at
first...
we don't need quirk to handle that strange BAR res.
and we got reserved register correctly in /proc/iomem
d7fe0000-d7ffffff : reserved
..
fff00000-ffffffff : reserved
for
BIOS-e820: 00000000d7fe0000 - 00000000d8000000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
YH
--
Heh, interesting, since we were talking about reverting that one for other
reasons entirely.
See the thread "x86: split e820 reserved entries record to late" (yeah, I
know that subject isn't very grammatical or sensible) for some patches
worth trying _after_ you've reverted that one.
Anyway, clearly that commit needs to be reverted regardless, so I'll do
the revert. Can you please test the appended test-patch by Yinghai on top
of the revert?
(This is not the final version, but it should be sufficient to be tested)
And if you have the whole dmesg, that would be useful.
Linus
---
From: Yinghai Lu <yhlu.kernel@gmail.com>
Subject: [PATCH] x86: split e820 reserved entries record to late v3
Date: Thu, 28 Aug 2008 17:41:29 -0700
so could let BAR res register at first, or even pnp?
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
please drop the one in tip/x86/core use this one instead
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/e820.c | 20 ++++++++++++++++++--
arch/x86/pci/i386.c | 3 +++
include/asm-x86/e820.h | 1 +
3 files changed, 22 insertions(+), 2 deletions(-)
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1271,13 +1271,15 @@ static inline const char *e820_type_to_s
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
+struct resource __initdata *e820_res;
void __init e820_reserve_resources(void)
{
int i;
- struct resource *res;
u64 end;
+ struct resource *res;
res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+ e820_res = res;
for (i = 0; i < e820.nr_map; i++) {
end = e820.map[i].addr + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
@@ -1291,7 +1293,8 @@ void __init e820_reserve_resources(void)
res->end = end;
res->flags = ...
--
dmesg from -rc5 with the offending commit reverted and with the patch
below applied is at:
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/2.6.27-rc5-git.log
Thanks,
--
Ok, the more I look at this, the more interesting it gets.
In particular, this:
...
ACPI: bus type pnp registered
pnp 00:08: mem resource (0xfec00000-0xfec00fff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:08: mem resource (0xfee00000-0xfee00fff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:09: mem resource (0xffb80000-0xffbfffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:09: mem resource (0xfff00000-0xffffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:0b: mem resource (0xe0000000-0xefffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp 00:0c: mem resource (0xfec00000-0xffffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
pnp: PnP ACPI: found 13 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
pci 0000:00:00.0: BAR 3: can't allocate resource
...
there's a few things to note here:
- the resource at 0000:00:00.0 BAR 3 is totally bogus. We know it's
totally bogus because you actually have other resources in the
0xf....... range, and they work fine.
It's also likely to be totally bogus because it so happens that the
end-point of 0xffffffff is commonly something that the BIOS leaves as a
"I sized this resource", because that's how resources are sized (you
write all ones into them and look what you can read back).
But your lspci -vxx output clearly shows that (a) MEM is enabled in the
command word, and yes, the BAR register at 0x18 does indeed have value
0xe0000000. So it's just the length that is really bogus.
- pnp clearly sees that bogus resource at 0xe0000000-0xffffffff
- BUT: the "can't allocate resource" thing is from ...
On Sat, Aug 30, 2008 at 10:39 AM, Linus Torvalds wrote:
again, we should use the MCFG end as the res end
YH
--
No. Again - we shouldn't DO that insane crap. We simply shouldn't try to
compare the BAR start with randomly chosen things.
So the crap got reverted, and it's not going to get done again. Get over
it.
Linus
--
On Sat, Aug 30, 2008 at 11:43 AM, Linus Torvalds wrote:
do you agree to use a quirk to give the BAR res the correct end between
pci_probe and pci_resource_survey?
YH
--
In general I would agree, but now that I've looked at it a bit more, I
actually don't think it's a bug in the chipset any more. See my previous
email that crossed with yours.
I suspect that that northbridge resource is basically acting as a bridge
resource. So 0xe0000000 - 0xffffffff is actually _correct_. And MCFG being
in that window (and being first in it) is just a detail.
Look at the resource allocations on Rafael's machine: there are two
different classes:
- outside that BAR3 window:
The "external gfx0 port A" decode (bridged by device 0000:02.0):
d8000000-dfffffff : PCI Bus 0000:01
  d8000000-dfffffff : 0000:01:00.0
    d8000000-d8ffffff : vesafb
and I suspect the graphics port is special (considering that this is an
ATI chipset)
- inside that BAR3 window: everything else (PCI express):
e0000000-efffffff : PCI MMCONFIG 0
fe6f4000-fe6f7fff : 0000:00:14.2
  fe6f4000-fe6f7fff : ICH HD audio
fe6fa000-fe6fafff : 0000:00:13.4
  fe6fa000-fe6fafff : ohci_hcd
fe6fb000-fe6fbfff : 0000:00:13.3
  fe6fb000-fe6fbfff : ohci_hcd
fe6fc000-fe6fcfff : 0000:00:13.2
  fe6fc000-fe6fcfff : ohci_hcd
fe6fd000-fe6fdfff : 0000:00:13.1
  fe6fd000-fe6fdfff : ohci_hcd
fe6fe000-fe6fefff : 0000:00:13.0
  fe6fe000-fe6fefff : ohci_hcd
fe6ff000-fe6ff0ff : 0000:00:13.5
  fe6ff000-fe6ff0ff : ehci_hcd
fe6ff800-fe6ffbff : 0000:00:12.0
  fe6ff800-fe6ffbff : ahci
fe700000-fe7fffff : PCI Bus 0000:01
  fe7c0000-fe7dffff : 0000:01:00.0
  fe7e0000-fe7effff : 0000:01:00.1
  fe7f0000-fe7fffff : 0000:01:00.0
fe800000-fe8fffff : PCI Bus 0000:02
  fe8ffc00-fe8fffff : 0000:02:00.0
    fe8ffc00-fe8fffff : ahci
fe900000-fe9fffff : PCI Bus 0000:03
  fe9c0000-fe9dffff : 0000:03:00.0
  fe9fc000-fe9fffff : 0000:03:00.0
    fe9fc000-fe9fffff : sky2
fea00000-feafffff : PCI Bus 0000:04
  feaffc00-feafffff : 0000:04:00.0
    feaffc00-feafffff : ahci
feb00000-febfffff : PCI Bus 0000:05
  febff000-febfffff : 0000:05:08.0
    febff000-febff7ff ...
On Sat, Aug 30, 2008 at 12:31 PM, Linus Torvalds wrote:
wonder: in the old kernel, after the BAR3 request failed,
pci_assign_unassigned_resources should update the resource for that...
but it could find that big space for it. that is interesting...
YH
--
please check
[PATCH] x86: split e820 reserved entries record to late v4
[PATCH] x86: split e820 reserved entries record to late v4 - fix v6
YH
--
What kernel should I apply those to and in what order?
Rafael
--
linus git tree
1. [PATCH] x86: split e820 reserved entries record to late v4
2. [PATCH] x86: split e820 reserved entries record to late v4 - fix v6
tip/master
1. Resource handling: add 'insert_resource_expand_to_fit()' function
2. [PATCH] x86: split e820 reserved entries record to late v4 - fix v6
YH
--
actually it is almost the same as the tarball I sent you for your
system...
YH
--
I've just tested these two patches on top of the current Linus' tree and
the system works normally.
Thanks,
Rafael
--
thanks,
David, can you test those two patches on top of linus tree?
YH
--
Can you try the attached in addition to those two patches?
I want to check if BAR3 gets a new resource..., and what could happen
after that...
YH
--
Works, dmesg is at:
http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/2.6.27-rc5-test.log
Please let me know if you want it with any more command line options.
Thanks,
Rafael
--
That BAR is indeed "locked". Now that we try to reallocate it, you get
this in the log:
pci 0000:00:00.0: BAR 3: error updating (0x40000004 != 0xe0000004)
pci 0000:00:00.0: BAR 3: error updating (high 0x000001 != 0x000000)
ie now the code _tried_ to update the BAR to point to 0x1_4000_0000
instead, but the hardware refused, and it is still at 0x0_e000_0000.
So Yinghai's patch "worked", but it worked by doing nothing. See my
earlier guess about locked read-only resources a few emails back.
IOW, I'm not at all surprised. I really do suspect that that BAR is some
very special "this is the HT->PCIE region" BAR.
Linus
--
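The two "error updating" lines correspond to the two 32-bit halves of a 64-bit memory BAR. Below is a minimal sketch of that arithmetic, assuming the standard PCI encoding in which the low four bits of a memory BAR are flags and the value 0x4 marks a 64-bit BAR. The helper names are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Flag value in the low bits of a memory BAR meaning "64-bit BAR". */
#define BAR_MEM_TYPE_64 0x4u

/* Hypothetical helper: dword written to the BAR register itself. */
static uint32_t bar_low_dword(uint64_t addr)
{
	/* Keep the address bits, then put the 64-bit type flag back in. */
	return ((uint32_t)addr & ~0xfu) | BAR_MEM_TYPE_64;
}

/* Hypothetical helper: dword written to the following register. */
static uint32_t bar_high_dword(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}
```

For the address 0x1_4000_0000 this gives 0x40000004 and 0x1, the values on the left of each "!=" in the log; the values read back, 0xe0000004 and 0x0, are the same split applied to 0x0_e000_0000.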
On Sun, Aug 31, 2008 at 10:42 AM, Linus Torvalds wrote:
so the code could allocate the 64 bit resource above 4g,...
I wonder how the probe could find out that the size is 1fff_ffff..
YH
--
Heh. That's how PCI sizing works: you write all ones to the register, and
read back the result. The low bits won't change, and that indicates the
size.
But if _none_ of the bits change, then that simply means that the size
will be calculated to be 0xffffffff-start. So the sizing will "work", it
will just always report that the BAR covers everything from start to the
4G limit.
Linus
--
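The sizing procedure described above can be sketched with a toy model of a 32-bit memory BAR. This is a simplified illustration, not the kernel's probe code: `struct bar_model`, `bar_write` and `bar_probe_extent` are hypothetical names, and the real logic has additional sanity checks.

```c
#include <stdint.h>

#define BAR_FLAG_MASK 0xfu	/* low four bits of a memory BAR are flags */

/* Hypothetical BAR model: only the bits set in `writable` can change. */
struct bar_model {
	uint32_t value;
	uint32_t writable;
};

static void bar_write(struct bar_model *bar, uint32_t v)
{
	bar->value = (bar->value & ~bar->writable) | (v & bar->writable);
}

/* Size the BAR; returns the extent (size - 1), or 0 if unimplemented. */
static uint32_t bar_probe_extent(struct bar_model *bar)
{
	uint32_t orig = bar->value;
	uint32_t readback;

	bar_write(bar, 0xffffffffu);		/* write all ones */
	readback = bar->value & ~BAR_FLAG_MASK;	/* read back the result */
	bar_write(bar, orig);			/* restore the old value */

	if (readback == 0)
		return 0;
	/* The lowest address bit that reads back as 1 is the size. */
	return (readback & (~readback + 1u)) - 1u;
}
```

With a normal 1MB BAR (`value` 0xfe900000, `writable` 0xfff00000) the extent comes out as 0xfffff. With a locked BAR (`value` 0xe0000004, `writable` 0) nothing changes, the lowest set bit of 0xe0000000 is taken as the size, and the extent is 0x1fffffff, so the BAR appears to run from 0xe0000000 to 0xffffffff.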
On Sun, Aug 31, 2008 at 11:03 AM, Linus Torvalds wrote:
how about
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cce2f4c..3b5269a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -240,6 +240,11 @@ static int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
pci_read_config_dword(dev, pos, &l);
pci_write_config_dword(dev, pos, mask);
pci_read_config_dword(dev, pos, &sz);
+
+ /* sticky and non changable */
+ if (sz == l)
+ goto fail;
+
pci_write_config_dword(dev, pos, l);
/*
Rafael,
can you check the attached one to see if we still have the warning?
YH
--
No, because a resource really _can_ be at the end.
It's perfectly ok to have something like a memory resource at
0xff000000-0xffffffff, and then the BAR register would always read
0xff000000 (or 0x...4 for a 64-bit resource).
So calling that a failure case would be wrong.
Linus
--
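To make the objection concrete, here is a small sketch of why an "sz == l" test cannot tell a locked BAR from a legitimate one at the top of the address space. It assumes a toy model in which a BAR is described by its current value and a mask of writable address bits; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: the value read back after writing all ones to a
 * BAR in which only the bits set in `writable` can change.
 */
static uint32_t readback_after_all_ones(uint32_t value, uint32_t writable)
{
	return (value & ~writable) | writable;
}

/* The proposed "sz == l" test: did writing all ones change nothing? */
static bool looks_sticky(uint32_t value, uint32_t writable)
{
	return readback_after_all_ones(value, writable) == value;
}
```

A genuine 16MB BAR at 0xff000000 (`writable` 0xff000000) and a locked BAR at 0xe0000004 (`writable` 0) both make `looks_sticky()` return true, while a 1MB BAR at 0xfe900000 (`writable` 0xfff00000) does not.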
Exactly. So what happens is that it doesn't actually re-allocate it at
all.
Not that it is necessarily even possible - it's quite possible that that
field is effectively locked some way and is read-only. Without knowing the
chipset details, we can only guess.
Linus
--
On Sat, Aug 30, 2008 at 3:41 PM, Linus Torvalds wrote:
wait, THAT BAR is 64-bit capable, so the kernel should assign a 64-bit
range to it...
we should not touch it at this time... unless Jordan can find some clue in
the docs.
YH
--
I don't think we've ever done new allocations in 64 bits.
Although looking for it, I have to admit that I don't see what would limit
us right now. There used to be some paths that weren't 64-bit clean, but I
think we fixed all of those.
Linus
--
On Sat, Aug 30, 2008 at 4:28 PM, Linus Torvalds wrote:
would be some corner case...
didn't see anything there.
calling pcibios_assign_resources+0x0/0x90
request_resource: root: (PCI Bus 0000:01) [fe700000, fe7fffff], new: (0000:01:00.0) [fe7c0000, fe7dffff] conflict 0
request_resource: root: (PCI Bus 0000:03) [fe900000, fe9fffff], new: (0000:03:00.0) [fe9c0000, fe9dffff] conflict 0
pci 0000:00:02.0: PCI bridge, secondary bus 0000:01
pci 0000:00:02.0: IO window: 0x9000-0x9fff
pci 0000:00:02.0: MEM window: 0xfe700000-0xfe7fffff
pci 0000:00:02.0: PREFETCH window: 0x000000d8000000-0x000000dfffffff
pci 0000:00:04.0: PCI bridge, secondary bus 0000:02
pci 0000:00:04.0: IO window: 0xa000-0xbfff
pci 0000:00:04.0: MEM window: 0xfe800000-0xfe8fffff
pci 0000:00:04.0: PREFETCH window: disabled
pci 0000:00:06.0: PCI bridge, secondary bus 0000:03
pci 0000:00:06.0: IO window: 0xc000-0xcfff
pci 0000:00:06.0: MEM window: 0xfe900000-0xfe9fffff
pci 0000:00:06.0: PREFETCH window: disabled
pci 0000:00:07.0: PCI bridge, secondary bus 0000:04
pci 0000:00:07.0: IO window: 0xd000-0xefff
pci 0000:00:07.0: MEM window: 0xfea00000-0xfeafffff
pci 0000:00:07.0: PREFETCH window: disabled
pci 0000:00:14.4: PCI bridge, secondary bus 0000:05
pci 0000:00:14.4: IO window: disabled
pci 0000:00:14.4: MEM window: 0xfeb00000-0xfebfffff
pci 0000:00:14.4: PREFETCH window: disabled
pci 0000:00:02.0: setting latency timer to 64
pci 0000:00:04.0: setting latency timer to 64
pci 0000:00:06.0: setting latency timer to 64
pci 0000:00:07.0: setting latency timer to 64
YH
--
pci_assign_unassigned_resources ==> pci_bus_assign_resources ==> pbus_assign_resources_sorted(struct pci_bus *bus)
static void pbus_assign_resources_sorted(struct pci_bus *bus)
{
	struct pci_dev *dev;
	struct resource *res;
	struct resource_list head, *list, *tmp;
	int idx;
	head.next = NULL;
	list_for_each_entry(dev, &bus->devices, bus_list) {
		u16 class = dev->class >> 8;
		/* Don't touch classless devices or host bridges or ioapics. */
		if (class == PCI_CLASS_NOT_DEFINED ||
		    class == PCI_CLASS_BRIDGE_HOST)
			continue;
it skips the host bridge...
YH
--
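As a small illustration of the check in the excerpt above: `dev->class` holds a 24-bit value (base class, sub-class, programming interface), so shifting right by 8 leaves the 16-bit class/sub-class pair that is compared against the class constants. The sketch below copies the constant values from the kernel's pci_ids.h; the helper `skipped_by_assigner` is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the kernel's include/linux/pci_ids.h. */
#define PCI_CLASS_NOT_DEFINED	0x0000
#define PCI_CLASS_BRIDGE_HOST	0x0600
#define PCI_CLASS_BRIDGE_PCI	0x0604

/* Hypothetical helper mirroring the test in the quoted excerpt. */
static bool skipped_by_assigner(uint32_t dev_class)
{
	uint16_t cls = dev_class >> 8;	/* drop the prog-if byte */

	return cls == PCI_CLASS_NOT_DEFINED ||
	       cls == PCI_CLASS_BRIDGE_HOST;
}
```

A host bridge (class 0x060000) is skipped, while a PCI-to-PCI bridge (0x060400) or an AHCI controller (0x010601) is not.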
what's the story for not touching host bridges?
YH
--
Ahh. Exactly because of things like this. The host bridge BARs are often
special.
That code comes from almost four years ago, the commit message was:
Author: Maciej W. Rozycki <macro@mips.com>
Date: Thu Dec 16 21:44:31 2004 -0800
[PATCH] PCI: Don't touch BARs of host bridges
BARs of host bridges often have special meaning and AFAIK are best left
to be setup by the firmware or system-specific startup code and kept
intact by the generic resource handler. For example a couple of host
bridges used for MIPS processors interpret BARs as target-mode decoders
for accessing host memory by PCI masters (which is quite reasonable).
For them it's desirable to keep their decoded address range overlapping
with the host RAM for simplicity if nothing else (I can imagine running
out of address space with lots of memory and 32-bit PCI with no DAC
support in the participating devices).
This is already the case with the i386 and ppc platform-specific PCI
resource allocators. Please consider the following change for the generic
allocator. Currently we have a pile of hacks implemented for host bridges
to be left untouched and I'd be pleased to remove them.
From: "Maciej W. Rozycki" <macro@mips.com>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
and we've had other things where host bridges are special (ie iirc, if you
turn off PCI_COMMAND_MEM from a host bridge, it stops access to real RAM
from the CPU for some bridges - so you must never turn those things off or
you get a dead system).
(But at least Intel host bridges will just ignore writes to the CMD
register, I think - you cannot turn MEM off).
Linus
--
On Sat, Aug 30, 2008 at 8:00 PM, Linus Torvalds wrote:
then
1. we should not probe them in probe.c
2. at least we should not try to request_resource for them in
pcibios_resource_survey...
just pretend that they do not exist.
YH
--
You are missing the fact that we need to know where existing resources
are, even if we can't do anything about them!
Read my explanation from yesterday about why we need to add the e820
resources to the resource map in the first place. Short recap:
- we need to populate the resource map with as much possible information
about the system as we can..
- .. because when we assign _dynamic_ resources, we need to make sure that
they don't clash with random system resources that we don't really
otherwise have a lot of visibility into.
So the resource tree is not just about resources we control, it's also
about resources that others control(led) and we don't necessarily know a
lot about.
Linus
--
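The recap above can be illustrated with a toy first-fit allocator. This is only a sketch under simplifying assumptions (a flat, sorted, non-overlapping list of inclusive ranges, no alignment handling, and a window that does not start at 0 so that 0 can signal failure); `struct range` and `find_free` are hypothetical and unrelated to the kernel's resource tree code.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start, end;	/* inclusive, as in /proc/iomem */
};

/*
 * Hypothetical first-fit search: return the lowest start of a
 * `size`-byte slot inside [lo, hi] that avoids every known range,
 * or 0 if there is none. `known` must be sorted and non-overlapping.
 */
static uint64_t find_free(const struct range *known, size_t n,
			  uint64_t lo, uint64_t hi, uint64_t size)
{
	uint64_t cand = lo;
	size_t i;

	for (i = 0; i < n; i++) {
		if (known[i].end < cand)
			continue;	/* entirely below the candidate */
		if (known[i].start > cand && known[i].start - cand >= size)
			break;		/* the gap before it is big enough */
		cand = known[i].end + 1;	/* skip past the busy range */
	}
	if (cand > hi || hi - cand + 1 < size)
		return 0;
	return cand;
}
```

With the reserved range d7fe0000-d7ffffff, the bus window d8000000-dfffffff and the reserved range fff00000-ffffffff in the list, a 1MB request starting the search at 0xd7fe0000 lands at 0xe0000000. With an empty list the same request is placed at 0xd7fe0000, right on top of the reserved region, which is the kind of clash the resource map is there to prevent.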
Btw, this is a problem that we seldom actually have on most desktops,
because the BIOS will normally set up just about _all_ the resources, and
we seldom have to worry about anything but just enumerating them (and the
occasional buggy setup).
The problems with resource allocation mostly happen on laptops, and
especially with cardbus controllers. Now, that's obviously going away
(people mostly use USB for most things that Cardbus/PCMCIA was used for a
few years ago), but it still exists and with docking stations etc it can
actually be even worse (although that's mainly because access to docking
stations is much more limited, I suspect).
So what used to happen _all_ the time was that cardbus worked fine on 99%
of all machines, but then some machines would lock up when you inserted a
card in them, or the card just wouldn't work. And the reason was that some
stupid motherboard resource (like the ACPI sleeping registers or the LPC
control regs) were not done as a normal BAR, so the kernel wouldn't know
about them, and the BIOS didn't necessarily even list it because it never
mattered with Windows (since Windows has a different algorithm for laying
out the bus resources, and wouldn't hit the magic resource).
So this is why we populate the resources with everything we can
_possibly_ try to find, including hardware-specific quirks (see things
like quirk_ali7101_acpi or all the quirk_ich4_lpc_acpi things etc) for
finding resources that aren't done by BAR's.
And the hardware quirks have generally worked pretty well. I'd love to add
some quirk for the RD790 chipset, but I'd like to know what the rules are
for it.
I know we have some AMD contacts, I wonder if they could give docs (I
don't personally do NDA's, but I can do "gentleman's agreements" where I
just say I won't spread things further, as long as I can write code based
on them. I know other kernel developers do similar things).
Jordan?
Linus
--
Btw, looking at that bogus BAR#3 some more: I don't actually think it's
even an MCFG resource. I think it's literally the resource that describes
the HT window for the host bridge.
So it's literally like the "root" resource - all external MMIO resources
that go over HT have to be in that window.
IOW, I'm starting to think that it's not even broken. It is probably
perfectly real. It's not a "PCI bridge" in the sense that it doesn't
bridge one PCI bus to another, but it's a host bridge, and it bridges the
CPU memory accesses to another bus.
The fact that the MCFG area happens to be at the start of that window is
probably just a random detail.
Does anybody know how to find chipset docs for AMD/ATI chipsets? I find
CPU docs, and the GPU docs, but not the 790 chipset docs anywhere (yeah,
it looks promising with a link that says "AMD 790FX Chipset
Specifications", but the link just takes you to some trivial overview, not
any actual specs).
Anybody?
Linus
--
There are some at:
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_15137,00.html
Well, that's 690/SB600 only and I'm not sure how useful this is.
Thanks,
Rafael
--
On Sat, Aug 30, 2008 at 12:14 PM, Linus Torvalds wrote:
The AMD CPU/NB (quad core, aka fam 10h and later) has an MSR that states
the MMCONFIG range, and the ATI bridge BAR that has the same address as
MMCONFIG doesn't even have a chance to decode that.
it seems the ATI chipset doesn't have a public version of the docs... like
register info and a BIOS/Kernel porting guide.
YH
--
Ok, so it's similar to the local APIC in that respect (and presumably IO
APIC too, I haven't checked).
But that still just implies that the BAR probably means something else
totally, and the fact that it happens to have the same value as the MCFG
Yeah, I'm not finding anything either. The 690G databook that Rafael
pointed to does mention the config registers in passing, but it's really
just about electricals (pin setup etc). No BIOS writers guide indeed..
Linus
--
On Sat, Aug 30, 2008 at 12:41 PM, Linus Torvalds wrote:
Those BIOS porting guides need an extra NDA... they don't want everyone to
know that there are lots of workarounds for their silicon bugs.
YH
--
Well, I thought something like this happened, but I wasn't quite sure
about the exact mechanism.
Thanks for the explanation. :-)
Rafael
--
Just to be sure...
Does "helps" imply that unresolved AHCI behavior exists after reverting
that commit?
Thanks,
Jeff
--
No, after reverting this commit AHCI works normally.
Thanks,
Rafael
--
r8169 is not working on an Aspire One. It seemed to work for some time,
but now it has started to say:
Sep 1 01:09:35 one klogd: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Sep 1 01:09:35 one klogd: r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Sep 1 01:09:35 one klogd: r8169 0000:02:00.0: cache line size of 32 is not supported
Sep 1 01:09:35 one klogd: r8169 0000:02:00.0: PCI INT A disabled
Sep 1 01:09:35 one klogd: r8169: probe of 0000:02:00.0 failed with error -22
Any ideas ? Any more info needed ?
TIA
one:/var/log# lspci -vv -s 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E PCI Express Fast Ethernet controller (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel modules: r8169
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2009.0 (Cooker) for i586
Linux 2.6.25-jam18 (gcc 4.3.1 20080626 (GCC) #1 SMP
--
