This adds arch_remove_reservations(), which an arch can implement if it
needs to protect part of the address space from allocation.
Sometimes that can be done by just requesting a resource. This hook is to
cover cases where protected area doesn't fit well in the hierarchical
resource tree. For example, x86 BIOS E820 reservations are not related
to devices, so they may overlap part of, all of, or more than a device
resource.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
include/linux/ioport.h | 1 +
kernel/resource.c | 8 ++++++++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index d377ea8..b92b8b4 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -124,6 +124,7 @@ extern void reserve_region_with_split(struct resource *root,
extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new);
extern int insert_resource(struct resource *parent, struct resource *new);
extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
+extern void arch_remove_reservations(struct resource *avail);
extern int allocate_resource(struct resource *root, struct resource *new,
resource_size_t size, resource_size_t min,
resource_size_t max, resource_size_t align,
diff --git a/kernel/resource.c b/kernel/resource.c
index 9fad33e..246957e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -374,6 +374,10 @@ int __weak page_is_ram(unsigned long pfn)
return walk_system_ram_range(pfn, 1, NULL, __is_ram) == 1;
}
+void __weak arch_remove_reservations(struct resource *avail)
+{
+}
+
static resource_size_t simple_align_resource(void *data,
const struct resource *avail,
resource_size_t size,
@@ -426,6 +430,7 @@ static int find_resource_from_top(struct resource *root, struct resource *new,
struct resource *this;
struct resource tmp, avail, alloc;
+ tmp.flags = ...This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas. This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.
We previously avoided this area in pcibios_align_resource(). This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/resource.c | 14 ++++++++++++++
arch/x86/pci/i386.c | 3 ---
3 files changed, 15 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/kernel/resource.c
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9e13763..1e99475 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -45,6 +45,7 @@ obj-y += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
obj-y += tsc.o io_delay.o rtc.o
obj-y += pci-iommu_table.o
+obj-y += resource.o
obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o
obj-y += process.o
diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
new file mode 100644
index 0000000..5dd6473
--- /dev/null
+++ b/arch/x86/kernel/resource.c
@@ -0,0 +1,14 @@
+#include <linux/ioport.h>
+#include <asm/e820.h>
+
+void arch_remove_reservations(struct resource *avail)
+{
+ /*
+ * Trim out the area reserved for BIOS (low 1MB). We could also remove
+ * E820 "reserved" areas here.
+ */
+ if (avail->flags & IORESOURCE_MEM) {
+ if (avail->start < BIOS_END)
+ avail->start = BIOS_END;
+ }
+}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index c4bb261..b261011 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -77,9 +77,6 @@ pcibios_align_resource(void *data, const struct resource *res,
*/
if (!skip_isa_ioresource_align(dev))
start &= ~0x300;
- } else if ...This allows platform quirks to fabricate PNP devices to describe things
that should have been described via ACPI. For example, if the BIOS writer
forgets to describe a device, we may assign its address space to another
device, causing a conflict. We can avoid the conflict by making a fake PNP
device to stand in for the one the BIOS missed.
In that case, there's no ACPI or PNPBIOS device, so we need a new
pnp_protocol that doesn't go back to firmware for get/set/wakeup/
suspend/etc.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
drivers/pnp/base.h | 2 ++
drivers/pnp/core.c | 9 ++++++++-
drivers/pnp/quirks.c | 15 +++++++++++++++
3 files changed, 25 insertions(+), 1 deletions(-)
diff --git a/drivers/pnp/base.h b/drivers/pnp/base.h
index 19bc736..dca301e 100644
--- a/drivers/pnp/base.h
+++ b/drivers/pnp/base.h
@@ -7,6 +7,8 @@ extern spinlock_t pnp_lock;
extern struct device_attribute pnp_interface_attrs[];
void *pnp_alloc(long size);
+void platform_pnp_fixups(void);
+
int pnp_register_protocol(struct pnp_protocol *protocol);
void pnp_unregister_protocol(struct pnp_protocol *protocol);
diff --git a/drivers/pnp/core.c b/drivers/pnp/core.c
index 0f34d96..5076493 100644
--- a/drivers/pnp/core.c
+++ b/drivers/pnp/core.c
@@ -212,7 +212,14 @@ void __pnp_remove_device(struct pnp_dev *dev)
static int __init pnp_init(void)
{
- return bus_register(&pnp_bus_type);
+ int ret;
+
+ ret = bus_register(&pnp_bus_type);
+ if (ret)
+ return ret;
+
+ platform_pnp_fixups();
+ return 0;
}
subsys_initcall(pnp_init);
diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index dfbd5a6..f18bb69 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -15,6 +15,7 @@
#include <linux/types.h>
#include <linux/kernel.h>
+#include <linux/dmi.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/pnp.h>
@@ -337,3 +338,17 @@ void pnp_fixup_device(struct pnp_dev *dev)
f->quirk_function(dev);
}
...The HP nx6325 BIOS doesn't report any devices in the [0xf8000000-0xfbffffff] region via ACPI devices or the E820 memory map, but when we assign it to the 00:14.4 bridge as a prefetchable memory window, the machine hangs. I determined experimentally that there are only three 1MB regions in that area that cause trouble, so this fixup builds a fake PNP device that consumes those regions. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23332 Reported-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> --- drivers/pnp/quirks.c | 30 ++++++++++++++++++++++++++++++ 1 files changed, 30 insertions(+), 0 deletions(-) diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c index f18bb69..e7de402 100644 --- a/drivers/pnp/quirks.c +++ b/drivers/pnp/quirks.c @@ -343,7 +343,37 @@ static struct pnp_protocol pnp_fixup_protocol = { .name = "Plug and Play fixup", }; +static int __init hp_nx6325_fixup(const struct dmi_system_id *d) +{ + struct pnp_dev *dev; + + /* + * The BIOS apparently forgot to describe some regions in the + * address map. See https://bugzilla.kernel.org/show_bug.cgi?id=23332 + */ + + dev = pnp_alloc_dev(&pnp_fixup_protocol, 0, "LNXHAZRD"); + if (!dev) + return 0; + + dev->active = 1; + pnp_add_mem_resource(dev, 0xf8300000, 0xf83fffff, 0); + pnp_add_mem_resource(dev, 0xf8500000, 0xf85fffff, 0); + pnp_add_mem_resource(dev, 0xf9100000, 0xf91fffff, 0); + pnp_add_device(dev); + dev_info(&dev->dev, "added to work around BIOS defect\n"); + return 0; +} + static const struct dmi_system_id pnp_fixup_table[] __initconst = { + { + .callback = hp_nx6325_fixup, + .ident = "HP nx6325 laptop", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "HP Compaq nx6325"), + }, + }, {} }; --
Quite frankly, I think this patch sucks.
It sucks because these kinds of hw-specific patches are fundamentally
a sign of something else being wrong. Why didn't windows hit this? Why
do we need this total hack?
And is there any reason at all to believe that that one particular
laptop is really special? I doubt it. And what happens for the next
random machine that comes along an hits this?
Maybe we should just say that if we know the bridge is negative
decode, and it hasn't been set up by the BIOS, we just don't allocate
it at all. And try to look like Windows.
Or figure out what else Windows is doing differently.
The whole "allocate bottom up" old PCI allocation has _years_ of
testing and quirk that have been gathered over a long time. We can't
just say "we'll do the same thing for the top-down allocator".
The WHOLE AND ONLY POINT of the top-down allocator was to act lik
Windows and not need crap like this. If that doesn't work, then I
seriously don't think we should change bottom-up to top-down at all,
and for 2.6.37 we should just revert the "set to top-down by default".
Seriously. That "whole and only point" thing is important. If we need
hacks like this, then we shouldn't do it. We're much better off with
the model that has year of testing an not the upheaval. Top-down
allocation is in _no_ way inherently better, the only excuse for it
was supposed to be "we don't need these kinds of hooks".
Linus
--
On Sun, Dec 12, 2010 at 1:30 PM, Linus Torvalds I agree, I've got an NX6125 predecessor of this and I'll take any money that is has an equally insane BIOS, considering its the worst laptop in my pile in nearly every way imaginable. and I suspect the HP laptop braindamage won't end with those two. Dave. --
I agree, it *does* suck, and I *am* quite worried about how many
issues like this we might trip over.
Windows didn't hit this because of other differences:
- Windows relies on subtractive decode; Linux programs a window
- Windows gives the downstream CardBus bridge a 4K and a 64M mem window;
Linux gives it two 64M windows
- Windows doesn't align the CardBus windows on their size; Linux does
Under Windows, the CardBus windows don't conflict with the unreported
devices. The larger windows allocated by Linux do. So relying on
subtractive decode would work until we plug in a CardBus device that
I do like this idea more than I did at first, even though it's not a
complete fix, because it's much better to have a non-working CardBus
device than a hanging machine. But I guess we'd still want to fix
that device, and then we're back at a hw-specific quirk like this one.
True (although most of the quirks I can think of are in the form of
hard-coded legacy device reservations, and we're keeping those).
If we didn't care about host bridge _CRS, there'd be no reason to
change the old PCI allocation scheme. But I think we *do* care and
will care more in the future. We frequently assign resources to
devices the BIOS didn't configure, and that only works if we're lucky
Not really -- the main point here is to make multi-host bridge
machines work reliably, and I really don't see a way to do that
without using _CRS.
If we're going to use _CRS, I think in the long run we'll be better
off if we do it similarly to Windows, despite these early problems.
Bjorn
--
It's not about any "despite these early problems".
It's about "clearly we're not doing things at all like Windows, and
it's just broken".
The thing is, we will never be able to match Windows exactly. It may
well have random hardcoded quirks we simply don't know about.
I'm perfectly happy with you aiming to use _CRS. I am _not_ happy with
you then using that as an excuse to then do things that don't work.
We will NOT start doing random BIOS-specific quirks just because
top-down allocations hit other bugs than bottom-up ones do. Just no.
We'll continue doing that we have tried to do, which is to perhaps
have quirks that are specific to *hardware* (like the ones in
drivers/pci/quirks.c) and just filling in stuff that some BIOSes are
known to get wrong.
That's a maintainable approach. But it's maintainable ONLY if we then
don't do other random changes that invalidates all the years of
testing we've had.
Linus
--
On Tue, Dec 14, 2010 at 12:34 PM, Linus Torvalds
Btw, looking at all the x86-specific commits that have gone in, I'm
*extremely* unhappy that they apparently stopped honoring that
"resource_alloc_from_bottom" flag that I explicitly asked for.
So it looks like it's not enough to just set that flag. We have to
actually revert all the commits in this area as broken.
Which is sad, but since they clearly *are* broken and don't honor the
flag that was there explicitly to avoid this problem and make it easy
to test reverting it, I'm really pissed off. The WHOLE POINT of that
flag was to give people an option to say "use the old resource
allocation order because the new one doesn't work for me".
So at this point the only question is whether I should just revert the
whole effing lot, or whether there are patches to fix the code to
honor the "allocate from bottom" bit and then just set it by default
again.
Bjorn? Preferences?
Linus
--
In 20-20 hindsight, I should have made that switch affect more things. I tried to do what you asked; I obviously just didn't do enough, and Let me identify the set of reversion candidates and the consequences, and then we can figure out whether it's better to retreat or push forward. Bjorn --
I'm not sure there's value in having both "pci=nocrs" and
"resource_alloc_from_bottom" flags. What if I made it so
we allocate top-down if and only if we're using _CRS?
Then old boxes where we don't look at _CRS would use the same
bottom-up behavior they always have (this is another major
screwup in the current tree -- we currently do top-down on
these boxes for no good reason).
And on new boxes, we default to using _CRS and allocating
top-down, but if that doesn't work, we can use "pci=nocrs"
and go back to the old "ignore _CRS and allocate bottom-up"
behavior.
Here's a sample patch which I will test and document if you
think it's a reasonable approach:
commit e39250083dbdd0b42e886aa858d0ffbc86e618c4
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Tue Dec 14 22:10:16 2010 -0700
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 21c6746..85268f8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -769,7 +769,6 @@ void __init setup_arch(char **cmdline_p)
x86_init.oem.arch_setup();
- resource_alloc_from_bottom = 0;
iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
setup_memory_map();
parse_setup_data();
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 0972315..ca770fc 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -68,6 +68,9 @@ void __init pci_acpi_crs_quirks(void)
"if necessary, use \"pci=%s\" and report a bug\n",
pci_use_crs ? "Using" : "Ignoring",
pci_use_crs ? "nocrs" : "use_crs");
+
+ if (pci_use_crs)
+ resource_alloc_from_top = 1;
}
static acpi_status
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index c4bb261..57a1374 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -65,9 +65,16 @@ pcibios_align_resource(void *data, const struct resource *res,
resource_size_t size, resource_size_t align)
{
struct pci_dev *dev = data;
- resource_size_t start = round_down(res->end - size + 1, ...I've only proposed one BIOS-specific quirk, which is the one for the
nx6325 unreported regions, and I identified things we do differently
than Windows that explain why we see the problem and Windows doesn't.
If we stop opening windows on subtractive-decode bridges, we don't
need that quirk to avoid the hang. We will still need it if we
want to use more than 40-odd MB of space on a PC Card.
I'm pretty confident that if we could find PC Cards that require
enough space, they wouldn't work under Windows either.
I don't know whether the other patches in this series make you
unhappy. I'm really not happy with how I implemented the avoidance
of ACPI devices when doing PCI allocation, but I do think we need
to avoid them *somehow*, and I was looking for a minimal quick
fix at this point in the cycle.
Avoiding ACPI devices fixes the Matthew's 2530p problem. We can
also avoid that particular problem with the simple PCIBIOS_MAX_MEM_32
change you proposed. However, avoiding ACPI devices fixes other
problems at the same time, such as this one:
https://bugzilla.kernel.org/show_bug.cgi?id=23802
where we put the intel-gtt "flush page" on top of an ACPI TPM
device.
Bjorn
--
So the "avoid ACPI devices" part makes sense, and doesn't involved quirks, so I don't hate it at all the same way I hated the HP quirk. However, I hate how it makes the allocation logic opaque. You can no longer tell from the regular non-debug dmesg and the /proc/iomem _why_ something got allocated the way it did, because there are hidden rules. That makes things awkward, methinks. Also, quite frankly, I wonder what happens after release when somebody shows another machine that simply stopped working because the allocation strategy didn't work for it. The hw coverage that -rc6 gets is tiny compared to a real release. IOW, what's the long-term strategy for this? The only sane long-term strategy I can see is the one we have _always_ done, which is to try to populate the memory resource tree with what simply matches reality. The whole "ok, we know the hardware better than the BIOS does" is a _stable_ strategy. In contrast, the things you propose are NOT stable strategies, they all depend on basically "we match windows exactly and/or trust ACPI". Both of which are *known* to be failing models. That's why I'm somewhat upset. Your whole strategy seems to depend on a known broken model. We _know_ ACPI tables are crap much of the time. So we know that "avoiding ACPI resources" is inevitably insufficient. And that's why I hate the "switch everything around" model. Yes, we have a known way to fix things up - namely to actually detect the hardware itself properly when firmware inevitably screws up - but the very act of switching things around will pretty much guarantee that all our years of effort is of dubious value, and we'll end up finding other laptops that used to work and no longer does. Only switching around when _CRS is used is possible, and shouldn't cause any regressions if we continue to default to not using _CRS. But you want to switch that default around at some point, don't you? At which point we'll be up sh*t creek again. See what I'm saying? Which all makes ...
ACPI devices tend to be at high addresses, so allocating top-down is definitely more dangerous unless we explicitly avoid them. I should have realized that and done something like patches 1-3 of this series before the top-down patches. Doing it bottom-up would very likely work better than the "top-down without avoiding ACPI regions" model we currently have, at least in the short term. We *would* have to do something to avoid E820 reservations to fix this: https://bugzilla.kernel.org/show_bug.cgi?id=16228, but that's doable. So here's my proposal for .37: - Keep the current state of _CRS enabled by default (for 2008 and newer machines). - Allocate bottom-up always - Avoid E820 reservations That should fix all the regressions I'm aware of. I'll work on the patches this afternoon. Bjorn --
At the same time, I would like to see a few things done as a matter of course: a) reserve the top 2 MiB of the 32-bit address space. There *will* be ROM at the top of the 32-bit address space; it's a fact of the architecture, and on at least older systems it was common to have a shadow 1 MiB below. b) we may want to consider doing special things in the 0xFExxxxxx memory range, which is used by the CPU-APIC-MSI system in recent processors. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
Sounds sane to me. And yes, the fact that the BIOS tends to allocate
things top-down probably does mean that the whole bottom-up approach
tends to be the safer model.
Linus
--
Remove PNP resources from "available space" used by allocate_resource(). This would be better done by putting the PNP resources directly in the resource maps (iomem_resource, etc), but there are some issues that keep us from doing that yet. This patch keeps us from handing out PNP device address space to other callers of allocate_resource(). Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23332 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23542 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23802 Reported-by: Matthew Garrett <mjg@redhat.com> Reported-by: Dan Williams <dcbw@redhat.com> Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> --- arch/x86/kernel/resource.c | 78 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 78 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c index 5dd6473..1daee92 100644 --- a/arch/x86/kernel/resource.c +++ b/arch/x86/kernel/resource.c @@ -1,6 +1,80 @@ #include <linux/ioport.h> +#include <linux/pnp.h> #include <asm/e820.h> +#ifdef CONFIG_PNP +static bool resource_conflict(struct resource *res, resource_size_t start, + resource_size_t end) +{ + /* + * Return true if and only if "res" conflicts with the [start-end] + * range. + */ + return res->start <= end && res->end >= start; +} + +static void resource_split(struct resource *res, resource_size_t start, + resource_size_t end, struct resource *low, + struct resource *high) +{ + /* + * If "res" conflicts with [start-end], split "res" into the + * part below "start" (low) and the part above "end" (high), + * either (or both) of which may be empty. + * + * If there's no conflict, return the entire "res" as "low". + */ + *low = *res; + low->start = res->start; + low->end = res->start - 1; /* default to empty (size 0) */ + + *high = *res; + high->end = res->end; + high->start = res->end + 1; /* default to ...
Crap, here's the cover letter I meant to include. [RFC] PCI, PNP, resources: effectively reserve ACPI device resources 2.6.37-rc includes some patches of mine to change the way we allocate address space for devices, and these caused some regressions. Most device resource allocation is done by the BIOS, but Linux does allocate space if the BIOS doesn't do it (e.g., for CardBus or other hotplug devices) or if the BIOS did it wrong (e.g., it put a PCI device outside the host bridge windows). My original 2.6.37-rc patches make Linux allocate space from the top down instead of from the bottom up. We did this because that's what Windows does, and doing it the same way helps us avoid some BIOS defects, such as https://bugzilla.kernel.org/show_bug.cgi?id=16228 . The regressions caused by these changes include: https://bugzilla.kernel.org/show_bug.cgi?id=23332 Here, we allocated a 64MB bridge window for a subtractive-decode bridge leading to a CardBus controller, and the window conflicts with devices not reported by the BIOS. Windows doesn't see this problem because it relies on the fact that the bridge is in subtractive-decode mode and doesn't program the window at all. https://bugzilla.kernel.org/show_bug.cgi?id=23542 https://bugzilla.kernel.org/show_bug.cgi?id=23802 In both cases, Linux allocated space that conflicts with an ACPI device because it ignores resource usage reported by ACPI. Windows avoids this problem because it pays attention to the ACPI _CRS methods that report device resource usage. I think the best way to address these problems would be to put ACPI device resources in the resource maps, just like we do for PCI devices. We tried this a couple years ago (http://lkml.org/lkml/2007/11/1/214) and tripped over some subtle issues. We should try again, but not at this stage of the -rc. Instead, these patches take the approach of adding a filter in the allocate_resource() path so each arch can prevent allocation of ...
On Wed, 08 Dec 2010 14:36:06 -0700 Hm, this is bigger than the simple change of just avoiding the high 2M; Linus have you checked it out yet? It's nicer than simply adjusting PCIBIOS_MAX_MEM since it will affect all resource callers rather than just PCI, but it's definitely bigger. If you want just the simple change for 2.6.37 I can push that, but we'll need to get a tested-by from Matthew: diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index ca0437c..aef9f77 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -141,7 +141,7 @@ void dma32_reserve_bootmem(void); /* generic pci stuff */ #include <asm-generic/pci.h> -#define PCIBIOS_MAX_MEM_32 0xffffffff +#define PCIBIOS_MAX_MEM_32 0xfff00000 #ifdef CONFIG_NUMA /* Returns the node based on pci bus */ and I'll queue up this set for 2.6.38. Thanks, -- Jesse Barnes, Intel Open Source Technology Center --
[Actually cc'ing Matthew this time] On Fri, 10 Dec 2010 12:30:08 -0800 -- Jesse Barnes, Intel Open Source Technology Center --
Dan Williams also reproduced the problem on a 2530p and tested this series (though I think he actually tested a backport in Fedora): https://bugzilla.kernel.org/show_bug.cgi?id=23542#c7 https://bugzilla.kernel.org/show_bug.cgi?id=23542#c23 The simple PCIBIOS_MAX_MEM_32 change below will fix the 2530p problem, but it won't help fix the nx6325 problem Rafael reported; details at: https://bugzilla.kernel.org/show_bug.cgi?id=23332#c20 Rafael did confirm on #acpi Wednesday that the full series fixed his nx6325. --
On Fri, 10 Dec 2010 14:07:24 -0700 Thanks, I'll add Dan and Rafael's tested-bys to the patches (they're already in my for-linus tree). Unless Linus has a problem with them I'll send them over to him this weekend or Monday. -- Jesse Barnes, Intel Open Source Technology Center --
See my other email I just sent out.
I really am not going to take some totally new experimental and hacky
major PCI resource management thing this late in the -rc game. No way,
no how.
If the top-down allocator is causing regressions that cannot be fixed
by _simple_ patches, we're simply going to have to undo it. What's the
advantage of top-down? None. Not if we then need all this crap, which
we could as easily do on top of the bottom-up one WITHOUT any
regressions.
Why isn't anybody else questioning the whole basic premise here?
Linus
--
On Sat, 11 Dec 2010 19:34:05 -0800 Questioning the whole premise is fine, but so far we've gone in (or at least think we're going in) a consistent direction: behave like Windows on platforms designed for Windows to avoid bugs that Windows doesn't hit and enable all the same devices Windows allows. But yes, I really don't like the nx6325 patch either; there's obviously something we're still missing that's preventing us from doing the right thing on that platform. Quirking it isn't a good long term answer. -- Jesse Barnes, Intel Open Source Technology Center --
OK, so I guess the best thing we can do for 2.6.37 is to revert 1af3c2e (x86: allocate space within a region top-down), right? Rafael --
That's a possibility, but I'm not sure it's the best. It would avoid the nx6325 landmine, but we could easily step on a different one on other machines. If Windows uses the end of available space first, that's the best-tested territory, and I think it's better to try to stay in it rather revert 1af3c2e and start mapping different territory that really isn't tested by Windows at all. On the nx6325, we stray out of that tested territory because Linux opens windows on subtractive-decode bridges and allocates more space to CardBus bridges. I think it'd be fairly simple to leave subtractive-decode bridges alone, and that might make the nx6325 work well enough for .37, even without the quirk. This feels like a change that might be the right thing to do in general. Using subtractive rather than positive decode would give up a little performance, but I doubt it's important There's another regression, https://bugzilla.kernel.org/show_bug.cgi?id=23132, that seems similar, though we haven't debugged it yet. It also has to do with assigning resources that aren't strictly needed to bridges, so there might be more changes we should make to that strategy. Bjorn --
Is there a patch for that that people could try? The regression needs to be addressed one way or another: either by improving the "compatible with Windows" approach to actually work (without quirks), or by reverting to the tested-for-years Linux solution. ( What we always try to avoid is trading in a set of regressions for another set of regressions. That way lies madness, it can not result in reliable, provable progress. ) Thanks, Ingo --
Yes. https://bugzilla.kernel.org/show_bug.cgi?id=23332#c22 It works for me, but it's such a new idea to me that I'm not quite used to it and don't know how confident to be yet. Bjorn --
