Re: [PATCH -v2 2/2] x86,pci, acpi: Inherent BUSY flag when setup_resource for root bus

Previous thread: [PATCH] lis3: add support for HP ProBook 432x/442x/452x/522x by Takashi Iwai on Wednesday, April 7, 2010 - 1:52 pm. (2 messages)

Next thread: [patch,rfc v2] ext3/4: enhance fsync performance when using cfq by Jeff Moyer on Wednesday, April 7, 2010 - 2:18 pm. (19 messages)
From: Bjorn Helgaas
Date: Wednesday, April 7, 2010 - 2:06 pm

Currently, we only reserve the legacy VGA area [mem 0xa0000-0xbffff] on
x86_32.  But this legacy area is also used on x86_64, so this patch
reserves it there, too.

If we don't reserve it, we may mistakenly move a PCI device to that area,
as we did here:

  pci_root PNP0A03:00: host bridge window [mem 0xff980800-0xff980bff]
  pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
  pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff]
  pci 0000:00:1f.2: BAR 5: assigned [mem 0x000a0000-0x000a07ff]

as reported by Andy Isaacson at http://lkml.org/lkml/2010/4/6/375

I think the fact that the BAR is not within a host bridge window is a
BIOS defect, and it's now more visible because we have "pci=use_crs" as
the default.  Using "pci=nocrs" is a workaround, because then we won't
attempt to move the device.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---

 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/head32.c     |    3 +--
 arch/x86/kernel/setup.c      |   25 ++++++++-----------------
 3 files changed, 9 insertions(+), 20 deletions(-)


diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 86b1506..f4c0fe4 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(void) { }
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index b2e2460..b6de8f8 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -20,9 +20,8 @@
 
 static void __init i386_default_early_setup(void)
 {
-	/* Initilize 32bit specific setup functions */
+	/* Initialize 32bit specific setup functions */
 	x86_init.resources.probe_roms = probe_roms;
-	x86_init.resources.reserve_resources = i386_reserve_resources;
 ...
From: Yinghai
Date: Wednesday, April 7, 2010 - 3:45 pm

that doesn't look right.

It seem another thread, erission has one model without VGA, and they use that area for other device MMIO.

current for 64bit, We remove [0xa0000, 0x100000) from e820 map if those area is E820_RAM.

in e820_reserve_resources(), kernel will reserve range < 1M according to e820 map.
that is before pci BAR is claimed.

or you can add
boot_params.screen_info.orig_video_isVGA == 1
or double check scan pci tree to see if video is there or not

YH
--

From: Bjorn Helgaas
Date: Wednesday, April 7, 2010 - 4:05 pm

I'm sorry, I can't understand what you're saying.

It seems like you're saying we already have a mechanism to keep us
from assigning that VGA range to another device, but obviously it's
not working.

Maybe it will be clearer if you propose a different patch that prevents
us from assigning 0xa0000 to the USB controller.

Bjorn
--

From: Yinghai
Date: Wednesday, April 7, 2010 - 4:22 pm

for 64 bit, you may check boot_params.screen_info.orig_video_isVGA to see if you need to reserve that VGA range.

when the system only have one peer root bus, can you skip the _CRS for it?

YH
--

From: Bjorn Helgaas
Date: Wednesday, April 7, 2010 - 8:24 pm

If you're saying that on x86_64, you have a reliable way to determine
whether to reserve the legacy VGA MMIO area, I hope you'll provide a
patch, because I don't know about this sort of x86_32 vs x86_64
difference.

I actually did propose doing something in pci_setup_device(), similar to
what we already do for legacy IDE resources, but HPA thought it should
be done in the arch code instead, again for reasons I don't completely

That's ugly.  When we discover the first host bridge, we have no idea
whether there will be more.  I don't want to mess around with trying to
count the number of bridges, then go back and add them in another pass.

I think it's fairly clear from http://lkml.org/lkml/2010/4/7/284 that
Windows is paying attention the _CRS, even though there's only one host
bridge.  So we ought to be able to make it work, too.

This is not a problem with pci=use_crs.  This is only a problem because
we don't reserve the VGA area when we should.  That's the problem we
should fix.

Bjorn


--

From: H. Peter Anvin
Date: Wednesday, April 7, 2010 - 9:42 pm

"Non-PCI devices" is hard to understand?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Bjorn Helgaas
Date: Thursday, April 8, 2010 - 6:04 am

I understand that part, and the patch I posted is for the arch code, as
you suggested.

I just thought you were suggesting an x86 change *instead* of something
in PCI, even though it seems like the PCI spec does mention that devices
with VGA class code respond at 0xa0000.  Or maybe you were just
suggesting that we should do something in *both* PCI and in the arch
code?

Of course, the PCI part might be complicated by the VGA arbiter,
although I would think that if the system contains at least one VGA
device, reserving the 0xa0000 region once would solve 99% of the
problem.  If the arbiter changes the actual device that responds there,
the reservation will be for the wrong device, but we still want to avoid
putting anything else there.

Honestly, I don't know enough about x86_32 vs x86_64 to understand
Yinghai's objection.  Is there some platform architecture difference
that means x86_64 can reliably determine whether a non-PCI VGA device is
present, when x86_32 can't?  Unless there's a real difference, we ought
to handle them both the same, as my patch does.

Yinghai mentioned a box with no VGA and another device using 0xa0000.
Unless there's some x86_64-specific spec or convention that addresses
this, maybe we could handle that by (1) doing the legacy reservation as
in this patch, and (2) special-casing the PCI device setup so that if we
find a BAR set to that area, we undo the legacy reservation.  Or maybe
we just leave the legacy reservation unused and relocate the device that
BIOS left there.

Bjorn


--

From: Bjorn Helgaas
Date: Friday, April 9, 2010 - 9:04 am

Why is this different for 64-bit vs 32-bit?  Can you point me to any
references where I can learn about this?

Bjorn
--

From: H. Peter Anvin
Date: Friday, April 9, 2010 - 9:44 am

It's not.  Any differences between 32 and 64 bits in this area are just
plain wrong.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Yinghai
Date: Friday, April 9, 2010 - 10:22 am

then please check this one

update e820 at first, and later put them resource tree, but give pci BAR chance to claim
the range, before assign unassigned resources.

in case some system doesn't have VGA, and use [0xa0000, ..] for mmio of some other pci devices.

Yinghai

Subject: [RFC PATCH] x86: reserve [0xa0000, 0x100000] in e820

And put those range in resource tree after pci BAR claim those range

---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/e820.c       |    2 +-
 arch/x86/kernel/head32.c     |    1 -
 arch/x86/kernel/setup.c      |   19 +------------------
 4 files changed, 2 insertions(+), 21 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(vo
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+		if (e820.map[i].type != E820_RESERVED) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
Index: linux-2.6/arch/x86/kernel/head32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head32.c
+++ linux-2.6/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init i386_default_early_se
 {
 	/* Initilize 32bit specific setup functions */
 	x86_init.resources.probe_roms = ...
From: H. Peter Anvin
Date: Friday, April 9, 2010 - 11:23 am

[Adding Guenter Roeck as he was commenting on something writing to the
VGA area...]


Do we have any evidence of any such system?

Either which way, the patch looks saner...

	-hpa
--

From: Guenter Roeck
Date: Friday, April 9, 2010 - 11:55 am

Yes, we have a system which does not have VGA and uses the VGA memory
space for another device. I don't know exactly what for, but I can find
out if it is relevant.

Guenter


--

From: Yinghai
Date: Friday, April 9, 2010 - 12:55 pm

update version for test

Subject: [PATCH] x86: reserve [0xa0000, 0x100000] in e820

Update e820 at first, and later put them resource tree.
But give pci BAR chance to claim the range, before assign unassigned resources.

In case some system doesn't have VGA, and use [0xa0000, ..] for mmio of some
other pci devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/e820.c       |    2 +-
 arch/x86/kernel/head32.c     |    1 -
 arch/x86/kernel/setup.c      |   19 +------------------
 4 files changed, 2 insertions(+), 21 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(vo
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+		if (e820.map[i].type != E820_RESERVED) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
Index: linux-2.6/arch/x86/kernel/head32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head32.c
+++ linux-2.6/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static ...
From: Andy Isaacson
Date: Friday, April 9, 2010 - 3:21 pm

This doesn't help, my AHCI still fails to initialize in the same way.

-andy
--

From: Yinghai Lu
Date: Friday, April 9, 2010 - 3:27 pm

bootlog?

what is the address used for AHCI?

YH
--

From: Andy Isaacson
Date: Friday, April 9, 2010 - 3:35 pm

Full bootlog attached to
https://bugzilla.kernel.org/show_bug.cgi?id=15744

Snippets:


[    0.000000]  BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 00000001bc000000 (usable)
--
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a301 base: 0xfed00000
[    0.000000] SMP: Allowing 8 CPUs, 4 hotplug CPUs
[    0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 00000000bfe01000 - 00000000bfe02000
[    0.000000] PM: Registered nosave memory: 00000000bfe02000 - 00000000bfe53000
[    0.000000] PM: Registered nosave memory: 00000000bfe53000 - 00000000bfe54000
--
[    0.000000] PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
[    0.000000] PM: Registered nosave memory: 00000000fec00000 - 00000000fed00000
[    0.000000] PM: Registered nosave memory: 00000000fed00000 - 00000000fed20000
[    0.000000] PM: Registered nosave memory: 00000000fed20000 - 00000000feda0000
[    0.000000] PM: Registered nosave memory: 00000000feda0000 - 00000000fee00000
[    0.000000] PM: Registered nosave memory: 00000000fee00000 - 00000000fef00000
[    0.000000] PM: Registered nosave memory: 00000000fef00000 - 00000000ffb00000
[    0.000000] PM: Registered nosave memory: 00000000ffb00000 - 0000000100000000
--
[    0.000000] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:2304
[ ...
From: H. Peter Anvin
Date: Friday, April 9, 2010 - 3:44 pm

Looks like this is something the kernel is assigning to it.  It *does*

... but that doesn't seem to keep the PCI code from assigning anything
there.  This is a Very Bad thing in general... if we're assigning
devices to areas marked reserved, we have a huge problem.

	-hpa
--

From: Yinghai
Date: Friday, April 9, 2010 - 3:54 pm

_CRS report those range are used devices under the peer root bus.

and in e820_reserve_resources_late, we are using insert_resource_expand_to_fit()
to register E820_RESERVED region.

could be that insert_resource_expand_to_fit doesn't work in that case.

like in the tree when we have
[    0.704003] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
[    0.705002] pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
[    0.706002] pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]

and then insert_resource...
[0x9fec00, 0x100000)

it is not inserted to tree properly

YH
--

From: Yinghai
Date: Friday, April 9, 2010 - 4:01 pm

Please check

Subject: [PATCH] x86: reserve [0xa0000, 0x100000] in e820

Update e820 at first, and later put them resource tree.

-V2: reserved that early, no PCI BAR can use it, force them to get new one

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/e820.c       |    2 +-
 arch/x86/kernel/head32.c     |    1 -
 arch/x86/kernel/setup.c      |   19 +------------------
 4 files changed, 2 insertions(+), 21 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(vo
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
Index: linux-2.6/arch/x86/kernel/head32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head32.c
+++ linux-2.6/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init i386_default_early_se
 {
 	/* Initilize 32bit specific setup functions */
 	x86_init.resources.probe_roms = probe_roms;
-	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 
 	reserve_ebda_region();
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -693,7 +693,7 @@ static void __init trim_bios_range(void)
 	 * area (640->1Mb) as ram even though it is not.
 	 * take them out.
 	 */
-	e820_remove_range(BIOS_BEGIN, ...
From: Andy Isaacson
Date: Friday, April 9, 2010 - 5:00 pm

Still no go.

[    0.000000] Linux version 2.6.34-rc3-00411-g9e969ac (andy@farthing) (gcc version 4.3.3 (Debian 4.3.3-5) ) #11 SMP Fri Apr 9 16:27:51 PDT 2010
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.34-rc3-00411-g9e969ac root=UUID=a2359eda-9295-451c-924f-c181c6f49d0d ro console=tty0 console=ttyS0,115200
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe01c00 (usable)
[    0.000000]  BIOS-e820: 00000000bfe01c00 - 00000000bfe53c00 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bfe53c00 - 00000000bfe55c00 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 00000001bc000000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.5 present.
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1bc000 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] last_pfn = 0xbfe01 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[    0.000000] init_memory_mapping: 0000000000000000-00000000bfe01000
[    0.000000] init_memory_mapping: 0000000100000000-00000001bc000000
[    0.000000] RAMDISK: 37c8e000 - 37ff0000
[    0.000000] ACPI: RSDP 00000000000febf0 00024 (v02 DELL  )
[    0.000000] ACPI: XSDT 00000000000fcea4 0006C (v01 DELL    B9K     00000015 ASL  00000061)
[    0.000000] ACPI: FACP ...
From: Yinghai
Date: Friday, April 9, 2010 - 6:10 pm

in addition to -v2 patch

please apply this patch too

also please boot with "debug" in boot command line.

Thanks

Yinghai

[PATCH] x86,acpi: use request_resource instead of instead of insert_resource

So make pci root resouce from _CRS honor the one We reserve in e820 below 1M

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/acpi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -168,7 +168,7 @@ setup_resource(struct acpi_resource *acp
 		return AE_OK;
 	}
 
-	conflict = insert_resource_conflict(root, res);
+	conflict = request_resource_conflict(root, res);
 	if (conflict) {
 		dev_err(&info->bridge->dev,
 			"address space collision: host bridge window %pR "
--

From: Andy Isaacson
Date: Friday, April 9, 2010 - 6:43 pm

That works.  dmesg.gz attached (I'm pretty sure I'm skating close to
vger's size limit without gz).  "debug" didn't seem to make much
difference?

-andy
From: Yinghai
Date: Friday, April 9, 2010 - 6:48 pm

[    0.636805] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.654025] pci_root PNP0A03:00: address space collision: host bridge window [io  0x0000-0x0cf7] conflicts with dma1 [io  0x0000-0x001f]
[    0.654370] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
[    0.654586] pci_root PNP0A03:00: address space collision: host bridge window [mem 0x000a0000-0x000bffff] conflicts with reserved [mem 0x000a0000-0x000fffff]
[    0.654933] pci_root PNP0A03:00: address space collision: host bridge window [mem 0x000c0000-0x000effff] conflicts with reserved [mem 0x000a0000-0x000fffff]
[    0.655002] pci_root PNP0A03:00: address space collision: host bridge window [mem 0x000f0000-0x000fffff] conflicts with reserved [mem 0x000a0000-0x000fffff]
[    0.655367] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfebfffff]
[    0.655706] pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
[    0.656003] pci_root PNP0A03:00: host bridge window [mem 0xff980800-0xff980bff]
[    0.656360] pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
[    0.657002] pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]

still not optimal.

YH
--

From: Andy Isaacson
Date: Friday, April 9, 2010 - 6:57 pm

Well, thanks for looking at it; if there are any more patches I can try,
let me know (but I may not get to it until Monday morning, since I can't
reliably reboot this machine remotely).

Is this a BIOS error?  I am currently running A04; there's an A10
available, but I didn't want to perturb the situation during the
debugging cycle without guidance.

-andy
--

From: Yinghai
Date: Friday, April 9, 2010 - 7:46 pm

thanks.

please use this one replace second patch
[PATCH] x86,pci, acpi: Inherent BUSY flag when setup_resource for root bus

So make pci root resource from _CRS honor the range We reserve in e820 below 1M

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/acpi.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -175,6 +175,10 @@ setup_resource(struct acpi_resource *acp
 			"conflicts with %s %pR\n",
 			res, conflict->name, conflict);
 	} else {
+		/* In case it falls in big reserved region */
+		if (res->parent->flags & IORESOURCE_BUSY)
+			res->flags |= IORESOURCE_BUSY;
+
 		pci_bus_add_resource(info->bus, res, 0);
 		info->res_num++;
 		if (addr.translation_offset)
--

From: Andy Isaacson
Date: Monday, April 12, 2010 - 11:54 am

That doesn't work:

[    0.000000] Linux version 2.6.34-rc3-00412-gef61642 (andy@farthing) (gcc version 4.3.3 (Debian 4.3.3-5) ) #13 SMP Mon Apr 12 11:12:43 PDT 2010
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.34-rc3-00412-gef61642 root=UUID=a2359eda-9295-451c-924f-c181c6f49d0d ro console=tty0 console=ttyS0,115200
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe01c00 (usable)
[    0.000000]  BIOS-e820: 00000000bfe01c00 - 00000000bfe53c00 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bfe53c00 - 00000000bfe55c00 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 00000001bc000000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.5 present.
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1bc000 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] last_pfn = 0xbfe01 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[    0.000000] init_memory_mapping: 0000000000000000-00000000bfe01000
[    0.000000] init_memory_mapping: 0000000100000000-00000001bc000000
[    0.000000] RAMDISK: 37c8e000 - 37ff0000
[    0.000000] ACPI: RSDP 00000000000febf0 00024 (v02 DELL  )
[    0.000000] ACPI: XSDT 00000000000fcea4 0006C (v01 DELL    B9K     00000015 ASL  00000061)
[    0.000000] ACPI: FACP ...
From: Yinghai
Date: Monday, April 12, 2010 - 12:34 pm

please check

[PATCH] x86,pci, acpi: Inherent BUSY flag when setup_resource for root bus

So make pci root resource from _CRS honor the range We reserve in e820 below 1M
then  will not assign them to unsigned pci BAR

-v2: let pci_bus_alloc_resource() honor IORESOURCE_BUSY
     driver can not use those range, so skip it

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/acpi.c |    4 ++++
 drivers/pci/bus.c   |    4 ++++
 2 files changed, 8 insertions(+)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -175,6 +175,10 @@ setup_resource(struct acpi_resource *acp
 			"conflicts with %s %pR\n",
 			res, conflict->name, conflict);
 	} else {
+		/* In case it falls in big reserved region */
+		if (res->parent->flags & IORESOURCE_BUSY)
+			res->flags |= IORESOURCE_BUSY;
+
 		pci_bus_add_resource(info->bus, res, 0);
 		info->res_num++;
 		if (addr.translation_offset)
Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -103,6 +103,10 @@ pci_bus_alloc_resource(struct pci_bus *b
 		if (!r)
 			continue;
 
+		/* Driver can not reserve it later, so don't use it */
+		if (r->flags & IORESOURCE_BUSY)
+			continue;
+
 		/* type_mask must match */
 		if ((res->flags ^ r->flags) & type_mask)
 			continue;
--

From: Andy Isaacson
Date: Monday, April 12, 2010 - 12:48 pm

What patches should I have applied?  I'm testing with d620a7cf from
linus' tree, plus your "reserve [0xa0000,0x1000000] in e820", plus this
patch.  Please let me know if I should be testing something else (or
just push a git head for me to pull, that would remove all the
uncertainty).

-andy
--

From: yinghai.lu@oracle.com
Date: Monday, April 12, 2010 - 12:55 pm

Yes
--

From: Andy Isaacson
Date: Monday, April 12, 2010 - 1:02 pm

Success:


[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.34-rc3-00412-gbb33c1c (andy@farthing) (gcc version 4.3.3 (Debian 4.3.3-5) ) #14 SMP Mon Apr 12 12:52:26 PDT 2010
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.34-rc3-00412-gbb33c1c root=UUID=a2359eda-9295-451c-924f-c181c6f49d0d ro
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe01c00 (usable)
[    0.000000]  BIOS-e820: 00000000bfe01c00 - 00000000bfe53c00 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bfe53c00 - 00000000bfe55c00 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 00000001bc000000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.5 present.
[    0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1bc000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-D3FFF write-protect
[    0.000000]   D4000-EFFFF uncachable
[    0.000000]   F0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000 mask 000000000 write-back
[    0.000000]   1 base ...
From: Yinghai
Date: Monday, April 12, 2010 - 3:32 pm

Update e820 at first, and later put them resource tree.

-V2: reserved that early, no PCI BAR can use it, force them to get new one

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/head32.c     |    1 -
 arch/x86/kernel/setup.c      |   19 +------------------
 3 files changed, 1 insertion(+), 20 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(vo
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
Index: linux-2.6/arch/x86/kernel/head32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head32.c
+++ linux-2.6/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init i386_default_early_se
 {
 	/* Initilize 32bit specific setup functions */
 	x86_init.resources.probe_roms = probe_roms;
-	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 
 	reserve_ebda_region();
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -693,7 +693,7 @@ static void __init trim_bios_range(void)
 	 * area (640->1Mb) as ram even though it is not.
 	 * take them out.
 	 */
-	e820_remove_range(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RAM, 1);
+	e820_add_region(BIOS_BEGIN, ...
From: Bjorn Helgaas
Date: Tuesday, April 13, 2010 - 2:02 pm

I like the fact that this makes x86_64 and x86_32 handle the legacy VGA
framebuffer the same way.

What about arch/x86/kernel/probe_roms_32.c?  That deals with expansion
ROMs in the 0xc0000-0xfffff range, including the VGA ROM.  We only build

Let me see if I understand this.  On Andy's machine, the e820 map
doesn't mention the 0xa0000-0xf0000 range at all:

  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)

e820_reserve_resources() inserts resources for some e820 entries (those
that start before 0x100000 or are not E820_RESERVED).  Andy's machine
didn't supply any e820 entries that cover 0xa0000-0xf0000, so we didn't
insert any resources there, and PCI assumed that range was available.

This patch adds the [0xa0000-0x100000] range as E820_RESERVED.  Since
that starts below 0x100000, e820_reserve_resources() will insert a
corresponding resource marked as BUSY.

Then the second patch prevents PCI from using that BUSY region to
allocate resources to devices.

Is my understanding correct?

I don't feel like I know enough about x86 architecture to ack this
patch, but I don't object to it.


--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 2:09 pm

[Note: it's worth noting that the e820 spec specifically says that e820

I guess the real question (which I haven't looked at myself) is if the
E820_RESERVED -> BUSY will cause an explicitly assigned BAR from being
moved.  That's bad, not so much for this particular range, but from BARs
which may be assigned by SMM.  Hacking that up in a simulator
(Qemu/Bochs) and testing it is probably on the to do list...

	-hpa
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 2:11 pm

no, if some device BAR fall in that range, it should still use that range, and will not be relocated.

will update the change log.

Thanks

Yinghai
--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 2:18 pm

Good, that's what we want.

ROM probing in this region should really be possible to skip, since the
only thing safe is to treat the whole region as special.  This is not in
any way 32-bit-specific.

	-hpa
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 2:22 pm

Update e820 at first, and later put them resource tree.

Reserved that early, will not be allocated to unassigned PCI BAR

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/head32.c     |    1 -
 arch/x86/kernel/setup.c      |   19 +------------------
 3 files changed, 1 insertion(+), 20 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(vo
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
Index: linux-2.6/arch/x86/kernel/head32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head32.c
+++ linux-2.6/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init i386_default_early_se
 {
 	/* Initilize 32bit specific setup functions */
 	x86_init.resources.probe_roms = probe_roms;
-	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 
 	reserve_ebda_region();
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -693,7 +693,7 @@ static void __init trim_bios_range(void)
 	 * area (640->1Mb) as ram even though it is not.
 	 * take them out.
 	 */
-	e820_remove_range(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RAM, 1);
+	e820_add_region(BIOS_BEGIN, BIOS_END - ...
From: H. Peter Anvin
Date: Friday, April 23, 2010 - 4:30 pm

Hi Jesse,

I am considering putting this patch plus the two followup patches in my
next set of fixes to Linus.  However, the two followup patches are
really more PCI-related than x86-related, so I would appreciate your OK.

I'm using this version (-v3) because the latter cleanup (removing
probe_rom_32.c for example) do not seem to be .34 material.

For the followup patches:

http://marc.info/?i=4BC4E0E9.7000203@oracle.com
http://marc.info/?i=4BC4E115.90805@oracle.com

	-hpa


--

From: Jesse Barnes
Date: Monday, April 26, 2010 - 8:35 am

On Fri, 23 Apr 2010 16:30:33 -0700

I was going to congratulate Yinghai on writing good changelogs, but
then I saw Bjorn actually wrote these ones for him. :)  At any rate they
looks good:

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>



-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 2:23 pm

If a host bridge window falls inside a region the architecture has
marked busy, the window should inherit the busy flag so we don't
try to assign that region to a device.

[commit log is from Bjorn]

Tested-by: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/acpi.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -175,6 +175,10 @@ setup_resource(struct acpi_resource *acp
 			"conflicts with %s %pR\n",
 			res, conflict->name, conflict);
 	} else {
+		/* In case it falls in big reserved region */
+		if (res->parent->flags & IORESOURCE_BUSY)
+			res->flags |= IORESOURCE_BUSY;
+
 		pci_bus_add_resource(info->bus, res, 0);
 		info->res_num++;
 		if (addr.translation_offset)
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 2:24 pm

Drivers typically use pci_request_regions() to reserve the resources
  they use, but that fails if the resource is already busy.  Therefore,
  we should ignore busy resources when we're assigning resources to a
  device.

[commit log is from Bjorn]

Tested-by: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -103,6 +103,10 @@ pci_bus_alloc_resource(struct pci_bus *b
 		if (!r)
 			continue;
 
+		/* Driver can not reserve it later, so don't use it */
+		if (r->flags & IORESOURCE_BUSY)
+			continue;
+
 		/* type_mask must match */
 		if ((res->flags ^ r->flags) & type_mask)
 			continue;
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 2:42 pm

the driver for that device later can not use pci_request_region(). because that region is BUSY already.

YH

--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 2:58 pm

That's not good (in general - for devices in this particular range it's
not such a big deal, but it is potentially really bad for devices marked
reserved for them not to be moved.)

We have talked about a need to resolve this before.

	-hpa

--

From: Yinghai
Date: Tuesday, April 13, 2010 - 3:29 pm

current code for mmio that is just below 4g, if some PCI BAR use that range, and those range is falling into E820_RESERVED,

those range still can be claimed, but driver can not use pci_request_region() later.

So We still
1. rely that BIOS does not reserve the [0xa0000, 0xe0000)
2. kernel only reserve the range when we make sure these is legacy device on that range.

YH
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 3:39 pm

should be

--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 3:41 pm

This really isn't sufficient.  There are systems in the field which
marks a memory range reserved in E820 because it a device pointed there,
and it doesn't want that device moved because it is used by an SMM handler.

This was reported quite a while ago (like two years.)  I can dig up the
thread if it matters.

	-hpa

--

From: Yinghai
Date: Tuesday, April 13, 2010 - 3:58 pm

Are you sure? what is BAR range? greater than 1M ?

e820_reserve_resources() will make that range to be reserved and BUSY in resource tree.
and if driver for that device want to call pci_request_region, it will get failure...

YH
--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 4:02 pm

Yes, > 1 MB in that case, I'm fairly sure.

	-hpa

--

From: Yinghai
Date: Tuesday, April 13, 2010 - 4:03 pm

that is ok. actually that is handled by e820_reserve_resource_late(), and it will not put BUSY on the entry at all.

YH
--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 4:07 pm

OK... why is that handled differently?

	-hpa

--

From: Yinghai
Date: Tuesday, April 13, 2010 - 4:09 pm

about one year ago, Linus made that change to use insert_resource_expand_to_fit() to honor PCI device BAR than E820_RESERVED.

YH
--

From: Yinghai
Date: Tuesday, April 20, 2010 - 10:33 pm

Update e820 at first, and later put them resource tree.
Reserved that early, will not be allocated to unassigned PCI BAR

v3: remove probe_roms() that is not needed, because whole range is reserved
	 already

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Isaacson <adi@hexapodia.org>

---
 arch/x86/include/asm/setup.h    |    3 
 arch/x86/include/asm/x86_init.h |    4 
 arch/x86/kernel/Makefile        |    1 
 arch/x86/kernel/head32.c        |    2 
 arch/x86/kernel/mrst.c          |    2 
 arch/x86/kernel/probe_roms_32.c |  166 ----------------------------------------
 arch/x86/kernel/setup.c         |   25 ------
 arch/x86/kernel/x86_init.c      |    2 
 8 files changed, 8 insertions(+), 197 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -43,8 +43,8 @@ static inline void visws_early_detect(vo
 
 extern unsigned long saved_video_mode;
 
+void trim_bios_range(void);
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
@@ -96,7 +96,6 @@ void *extend_brk(size_t size, size_t ali
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
-extern void probe_roms(void);
 
 #else
 void __init x86_64_start_kernel(char *real_mode);
Index: linux-2.6/arch/x86/include/asm/x86_init.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/x86_init.h
+++ linux-2.6/arch/x86/include/asm/x86_init.h
@@ -32,14 +32,14 @@ struct x86_init_mpparse {
 
 /**
  * struct x86_init_resources - platform specific resource related ops
- * @probe_roms:			probe BIOS roms
+ * @trim_bios_range:		trim BIOS related range in E820
  * @reserve_resources:		reserve ...
From: Yinghai
Date: Tuesday, April 20, 2010 - 10:34 pm

It will cover the whole region to BUSY, except that some regions that have
children under them.

those children normally is PCI bar but it is falling into E820_RESERVED.
We can not put BUSY on them, otherwise driver can not use pci_request_region()
later

/proc/iomem will have
00010000-00094fff : System RAM
00095000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
  000a0000-000bffff : reserved
000c0000-000cffff : reserved
000d0000-000dffff : PCI Bus 0000:00
  000d0000-000dffff : reserved
000e0000-000fffff : reserved

Tested-by: Guenter Roeck <guenter.roeck@ericsson.com>
Tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/e820.c |   10 +++++++---
 include/linux/ioport.h |    3 +++
 kernel/resource.c      |   24 +++++++++++++++++++-----
 3 files changed, 29 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+		if (e820.map[i].type != E820_RESERVED) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
@@ -1135,8 +1135,12 @@ void __init e820_reserve_resources_late(
 
 	res = e820_res;
 	for (i = 0; i < e820.nr_map; i++) {
-		if (!res->parent && res->end)
-			insert_resource_expand_to_fit(&iomem_resource, res);
+		if (!res->parent && res->end) {
+			if (res->start < (1ULL<<20))
+				reserve_region_with_split_check_child(&iomem_resource, res->start, res->end, res->name);
+			else
+				insert_resource_expand_to_fit(&iomem_resource, res);
+		}
 		res++;
 	}
 
Index: linux-2.6/include/linux/ioport.h
===================================================================
--- ...
From: Yinghai
Date: Tuesday, April 20, 2010 - 10:35 pm

Don't clear root bus resource too early, until We can make sure _CRS works

also restore it, if all _CRS get rejected because of conflicts

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/acpi.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -196,9 +196,6 @@ get_current_resources(struct acpi_device
 	struct pci_root_info info;
 	size_t size;
 
-	if (pci_use_crs)
-		pci_bus_remove_resources(bus);
-
 	info.bridge = device;
 	info.bus = bus;
 	info.res_num = 0;
@@ -217,10 +214,20 @@ get_current_resources(struct acpi_device
 		goto name_alloc_fail;
 	sprintf(info.name, "PCI Bus %04x:%02x", domain, busnum);
 
+	/* Only clear that when _CRS works for sure*/
+	if (pci_use_crs)
+		pci_bus_remove_resources(bus);
+
 	info.res_num = 0;
 	acpi_walk_resources(device->handle, METHOD_NAME__CRS, setup_resource,
 				&info);
 
+	if (pci_use_crs && !info.res_num) {
+		/* Restore default one */
+		bus->resource[0] = &ioport_resource;
+		bus->resource[1] = &iomem_resource;
+	}
+
 	return;
 
 name_alloc_fail:
--

From: Bjorn Helgaas
Date: Wednesday, April 21, 2010 - 8:21 am

Please observe English conventions like capitalizing the first word
of a sentence (and not things like "We" in the middle) and using

If you put a comment here, please at least add a space before the

This is ugly because it just repeats this code from pci_create_bus(),
and there's no indication either here or there that they are connected.

Admittedly, I think it's also sort of ugly that pci_bus_remove_resources()
exists at all -- I'd rather have some sort of hook so we could set the
bus resources correctly the first time.

Maybe you could at least add a pci_bus_set_default_resources() that
could be called both here and from pci_create_bus().

Why are you doing this patch?  Did you see a machine where the host
bridge was left with no resources because of _CRS issues?  If so,
this patch feels like a band-aid.  I'd rather investigate the issue
directly, because that would probably be a Linux problem we could fix.

Also, if there *is* a reported problem, you should include a link to
the bugzilla or email thread.



--

From: Yinghai Lu
Date: Wednesday, April 21, 2010 - 9:45 am

No, just code review.

YH
--

From: Bjorn Helgaas
Date: Wednesday, April 21, 2010 - 9:59 am

In that case, I don't think this patch is a good idea.  Assume we have
a Linux problem with _CRS parsing, and as a result, we don't find any
host bridge resources.  This patch will cause us to silently revert to
the default resources, so we lose the opportunity to debug and fix the
Linux problem.

Even worse, if there's a *BIOS* problem with _CRS, this patch will hide
it and make it harder to diagnose.

Bjorn
--

From: H. Peter Anvin
Date: Wednesday, April 21, 2010 - 3:33 pm

Hi Bjorn,

Do you have opinions on patches 1-2 of the series?

I'm getting concerned about how the size of the patchset has grown, and
we're past -rc5 already... but it is a regression so we can't just defer
it to .35.

	-hpa
--

From: Bjorn Helgaas
Date: Wednesday, April 21, 2010 - 4:04 pm

Part 1: the essential part of this seems to be the trim_bios_range()
change, and that part is not too big.  In v4, Yinghai also removed
probe_roms_32.c.  That sounds like the right thing to do, but I'd
rather have that in separate patch so it doesn't obfuscate the other
change, and I don't know whether it *has* to be done for .34; maybe
it could be deferred.

Part 2: IMHO, we're putting way too much crap in kernel/resource.c.
A name like "reserve_region_with_split_check_child()" is a pretty
good clue that we've lost our way somewhere.  But that's mostly a
cosmetic thing, and the end result does seem to be something that
fixes the current regression.

So I guess that in spite of my issues with the implementation, I'm
still OK with the concept.

Bjorn
--

From: H. Peter Anvin
Date: Wednesday, April 21, 2010 - 4:10 pm

It's not just a good clue we have lost our way, it's also completely
impossible for anyone but Yinghai to divine what the intended semantics
are supposed to be.  This *greatly* concerns me, especially given
previous track record.

Even the checkin comment is almost unparsable, which makes it very
likely that someone is going to trip up on some of this in the future.
I really would like to get a better description.

The use of a string match in:

+       if (check_child && !conflict->child && strstr(conflict->name,
"PCI Bus"))
^^^^^^^^^

... screams "wrong! ugly! bad!" in my opinion.  I utterly fail to see
how that could be acceptable under any circumstances.  I thought that
had been flagged earlier in the conversation, but it is apparently still

OK, but what is "the implementation" and what is "the concept" here?

	-hap

--

From: Yinghai
Date: Wednesday, April 21, 2010 - 4:43 pm

then use -v3 please


I don't know.

insert_resource_expand_to_fit() is added by Linus.



the string checking is to make sure pci device that is hooked into bus0 directly, but pci bar is falling into
0xa0000 - 0x100000.
so can not put "reserved" holder under them.

YH
--

From: H. Peter Anvin
Date: Wednesday, April 21, 2010 - 5:02 pm

That makes a lot of sense for 2.6.34.  I agree with moving the trimming
into subsystems, but I think it's .35 material at this point.


It makes me extremely concerned, because such hacks tend to be extremely
vulnerable.  Strings are designed primarily for human consumption, and
"find string inside another string" is *very* much so.  As such, I
really would like to understand that there isn't any more sensible way,
such as a flag, that can be used to accomplish the objective.

	-hpa
--

From: Yinghai
Date: Wednesday, April 21, 2010 - 5:06 pm

pass function pointer to additional checking?

YH
--

From: Andy Isaacson
Date: Wednesday, April 21, 2010 - 12:31 pm

Test booted this patch series on the problematic t3400, seems to work
fine.  dmesg attached to bug 15744.

-andy
--

From: Bjorn Helgaas
Date: Friday, April 23, 2010 - 4:05 pm

Thanks for testing (again).  I'm not confident that this series is
going to be successful, so I started looking for other approaches.

I can't reproduce the exact problem you're seeing, but in my
kludged-up attempt, the patch below is enough to keep us from
assigning the space below 1MB to a device.

Would you guys (Andy & Andy, what a coincidence :-)) mind giving
it a try?  This is intended to work on top of current upstream,
with no other patches required.

Bjorn


commit 7fb707eb97fdf6dc4fa4b127f127f8d00223afc7
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Fri Apr 23 15:22:10 2010 -0600

x86/PCI: never allocate PCI MMIO resources below BIOS_END
    
When we move a PCI device or assign resources to a device not configured
by the BIOS, we want to avoid the BIOS region below 1MB.  Note that if the
BIOS places devices below 1MB, we leave them there.
    
See https://bugzilla.kernel.org/show_bug.cgi?id=15744
and https://bugzilla.kernel.org/show_bug.cgi?id=15841
    
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>

diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 46fd43f..97da2ba 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -72,6 +72,9 @@ pcibios_align_resource(void *data, const struct resource *res,
 			return start;
 		if (start & 0x300)
 			start = (start + 0x3ff) & ~0x3ff;
+	} else if (res->flags & IORESOURCE_MEM) {
+		if (start < BIOS_END)
+			start = BIOS_END;
 	}
 	return start;
 }
--

From: H. Peter Anvin
Date: Friday, April 23, 2010 - 4:44 pm

This certainly wins from a simplicity standpoint!

	-hpa
--

From: Yinghai Lu
Date: Friday, April 23, 2010 - 5:36 pm

indeed.


--

From: R. Andrew Bailey
Date: Monday, April 26, 2010 - 5:50 am

Good news- that solved it. I tried Yinghai's patches saturday to no
avail (sorry it took me so long to get back to you, I was about 5 bios
revisions behind on this machine and wanted to update it before I
tried any more tests).

  .andy
--

From: Bjorn Helgaas
Date: Monday, April 26, 2010 - 8:40 am

Great, thanks for testing this!  The only problem here is that we
changed two things at once -- the BIOS and the patch, and we need
to figure out which change fixed the problem.

I want Linux to work correctly even on the old BIOS, on the theory
that "if Windows works, Linux should work, too."  Changing a BIOS
is relatively risky, and it's not something I want users to have to
diagnose and deal with.

If we're lucky, the kernel without the patch will still fail on the
updated BIOS.  If the pre-patch kernel fails and the post-patch kernel
works, and you can attach the entire dmesg log of the post-patch kernel
to the bugzilla, we should be able to see Linux making more sensible
BAR assignments when working around the BIOS bug.  Then we can be pretty
confident that my patch fixed the problem.

If the pre-patch kernel works on the updated BIOS, that means one of
the BIOS updates fixed the BIOS bug, and we didn't actually exercise
my patch at all.  If that's the case, we'll have to wait for a report
from the other Andy.  (Or you could temporarily down-grade your BIOS
to the original version you had.  But I don't know whether that's even
possible...  it probably depends on the BIOS update tools.)

Thanks again for all your help.  All progress in Linux depends on early
adopters like you who are willing to test kernels and help work through
issues :-)

Bjorn
--

From: Andy Isaacson
Date: Monday, April 26, 2010 - 11:34 am

From: Jesse Barnes
Date: Monday, April 26, 2010 - 12:31 pm

On Mon, 26 Apr 2010 11:34:36 -0700

Great, thanks for testing.  Applied this one to my for-linus tree.  I
still thing Yinghai's patches should go in as well (marking regions as
busy seems like good housekeeping), but with this fixed they're a better
fit for -next.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Bjorn Helgaas
Date: Monday, April 26, 2010 - 1:27 pm

I'm a little concerned that those patches are a sledgehammer approach.
Previously, IORESOURCE_BUSY has basically been used for mutual exclusion
between drivers that would otherwise claim the same resource.  It hasn't
been used to guide resource assignment in the PCI/PNP/etc core.  Maybe
it's a good idea to also use IORESOURCE_BUSY there, but I'm not sure.
Right now it feels like undesirable overloading to me.

I think it also leads to at least one problem: Guenter's machine has no
VGA but has a PCI device that lives at 0xa0000.  The driver for that
device won't be able to request that region if the arch code has marked
it busy.

Bjorn
--

From: Jesse Barnes
Date: Monday, April 26, 2010 - 1:37 pm

On Mon, 26 Apr 2010 14:27:56 -0600

I guess that's true, removing those regions from the pool entirely
might be better?  Or some other, clear way of expressing that the
regions aren't available to drivers.  Maybe we need a new IO resource

Ah good point, so we'll want another approach at any rate.  Yinghai?

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Yinghai
Date: Monday, April 26, 2010 - 2:07 pm

With current linus's tree, there is some difference between mmio > 1M and MMIO < 1M.

Actually it does not care about E820_RESERVED for > 1M range, because it doesn't have _BUSY.

For < 1M range, 
a. MMIO is not with E820_RESERVED, the device driver will work.
b. MMIO is with E820_RESERVED, the device driver can not reserve it's mmio later. because that MMIO could have parent resource with "reserved" name and _BUSY flag

My patch -v3 will break a.
but my patch -v4 patch will make it work.

but looks -v4 is too tricky.

maybe we just need remove change like following.

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 7bca3c6..9849824 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+		if (e820.map[i].type != E820_RESERVED) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}

May need Linus to ack it.

Those strange devices could be some kind of virtual debug devices. and they like to use those low mmio range.

YH
--

From: H. Peter Anvin
Date: Monday, April 26, 2010 - 2:19 pm

Let's distinguish between what we need for .34 -- we need something
minimal and as soon as possible at this stage -- and concentrate on
getting the right thing to happen going forward.

       -hpa


--

From: H. Peter Anvin
Date: Monday, April 26, 2010 - 2:12 pm

What we need is to keep track of the areas available for address space
allocation by dynamically addressed devices, as distinct from address
space that is in use by a kernel-known device.  There is an in-between,
which one can call "here there be dragons" space, which should never be
used for dynamic device allocation, but if a platform device or
pre-assigned device uses that space then it should be allowed to be
allocated.

In the case of x86, anything that is E820_RESERVED, *or* in the legacy
region (below 1 MB) and is not RAM, is "here there be dragons" space.

    -hpa

--

From: Jesse Barnes
Date: Monday, April 26, 2010 - 2:25 pm

On Mon, 26 Apr 2010 14:12:35 -0700

Agreed.  The trickier part is handling any platform devices that
request_resource against that space.  But maybe we don't need to do
anything special; just making sure we avoid it in the PCI "BIOS" code
as Bjorn did may be sufficient.

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: H. Peter Anvin
Date: Monday, April 26, 2010 - 2:44 pm

Why is that hard?  If a platform device does a request_resource against
that space, it's a request for specific address space and it should be
granted.

        -hpa

--

From: Jesse Barnes
Date: Monday, April 26, 2010 - 2:53 pm

On Mon, 26 Apr 2010 14:44:50 -0700

I was thinking if we made it a special resource type we'd have to
change any platform drivers to use it; i.e. it wouldn't be
IORESOURCE_MEM or IORESOURCE_IO but IORESOURCE_DRAGONS.  That way it
wouldn't be part of the normal resource space.

But that's definitely overkill.  I think Bjorn's fix is sufficient.

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Yinghai Lu
Date: Monday, April 26, 2010 - 2:59 pm

the two regressions from the reporters:

BIOS put 0xa0000-0xb0000, 0xc0000- 0xd0000 with E820_RESERVED.
BIOS ACPI _CRS keep 0xa0000-0xb0000, 0xc0000-0xd0000 as part resources
for peer root bus: BUS 0.

kernel insert 0xa0000-0xb0000 into resource tree with _BUSY in
e820_reserve_resources() at first.
last pci bus scan code, will insert 0xa0000-0xb0000, and it is under
previous reserved entry.

later pci_assign_unassign code, will use bus 0 resources directly, and
don't care if the parent's have _BUSY bit.

solutions:
1. mark _BUSY under bus 0 resource:  ==> -v3
2. split e820 reserve entries to small pieces to fit into bus 0
resources, so will have holder under bus0 resources. it will prevent
those range to be used.
   -v4
3. reject any dynamically allocation under 1M.  ==> Bjorn's new patch.

till now, driver can reserve resource under 1M, only when those range is
not in e820.

case A:
bus 0: --- bus X --- device Y
if the BIOS only assign range to to BUS X bridge with 0xB0000, and
device Y is not assigned.  then with Bojorn's patch, device Y can not
get right resource allocated on first try.

my -v4 can handle that case.

YH
--

From: Yinghai
Date: Monday, April 26, 2010 - 3:04 pm

It will cover the whole region to BUSY, except that some regions that have
children under them.

those children normally is PCI bar but it is falling into E820_RESERVED.
We can not put BUSY on them, otherwise driver can not use pci_request_region()
later

/proc/iomem will have
00010000-00094fff : System RAM
00095000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
  000a0000-000bffff : reserved
000c0000-000cffff : reserved
000d0000-000dffff : PCI Bus 0000:00
  000d0000-000dffff : reserved
000e0000-000fffff : reserved

v5: Add function pointer to put string comparing with caller

Tested-by: Guenter Roeck <guenter.roeck@ericsson.com>
Tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/e820.c |   18 +++++++++++++++---
 include/linux/ioport.h |    3 +++
 kernel/resource.c      |   29 ++++++++++++++++++++++++-----
 3 files changed, 42 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+		if (e820.map[i].type != E820_RESERVED) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
@@ -1128,6 +1128,14 @@ static unsigned long ram_alignment(resou
 
 #define MAX_RESOURCE_SIZE ((resource_size_t)-1)
 
+static int __init check_func(struct resource *cf)
+{
+	if (strstr(cf->name, "PCI Bus"))
+		return 1;
+
+        return 0;
+}
+
 void __init e820_reserve_resources_late(void)
 {
 	int i;
@@ -1135,8 +1143,12 @@ void __init e820_reserve_resources_late(
 
 	res = e820_res;
 	for (i = 0; i < e820.nr_map; i++) {
-		if (!res->parent && ...
From: Yinghai
Date: Monday, April 26, 2010 - 3:04 pm

for memo, sending out -v5 
--

From: Yinghai
Date: Monday, April 26, 2010 - 3:04 pm

That is not needed, because whole range is reserved already

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/setup.h    |    2 
 arch/x86/include/asm/x86_init.h |    4 
 arch/x86/kernel/Makefile        |    1 
 arch/x86/kernel/head32.c        |    1 
 arch/x86/kernel/mrst.c          |    2 
 arch/x86/kernel/probe_roms_32.c |  166 ----------------------------------------
 arch/x86/kernel/setup.c         |    6 -
 arch/x86/kernel/x86_init.c      |    2 
 8 files changed, 7 insertions(+), 177 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -43,6 +43,7 @@ static inline void visws_early_detect(vo
 
 extern unsigned long saved_video_mode;
 
+void trim_bios_range(void);
 extern void reserve_standard_io_resources(void);
 extern void setup_default_timer_irq(void);
 
@@ -95,7 +96,6 @@ void *extend_brk(size_t size, size_t ali
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
-extern void probe_roms(void);
 
 #else
 void __init x86_64_start_kernel(char *real_mode);
Index: linux-2.6/arch/x86/include/asm/x86_init.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/x86_init.h
+++ linux-2.6/arch/x86/include/asm/x86_init.h
@@ -32,14 +32,14 @@ struct x86_init_mpparse {
 
 /**
  * struct x86_init_resources - platform specific resource related ops
- * @probe_roms:			probe BIOS roms
+ * @trim_bios_range:		trim BIOS related range in E820
  * @reserve_resources:		reserve the standard resources for the
  *				platform
  * @memory_setup:		platform specific memory setup
  *
  */
 struct x86_init_resources {
-	void (*probe_roms)(void);
+	void (*trim_bios_range)(void);
 	void (*reserve_resources)(void);
 	char *(*memory_setup)(void);
 };
Index: ...
From: Yinghai
Date: Monday, April 26, 2010 - 3:04 pm

Update e820 at first, and later put them resource tree.

Reserved that early, will not be allocated to unassigned PCI BAR

-v5: same as -v3

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/head32.c     |    1 -
 arch/x86/kernel/setup.c      |   19 +------------------
 3 files changed, 1 insertion(+), 20 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(vo
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
Index: linux-2.6/arch/x86/kernel/head32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head32.c
+++ linux-2.6/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init i386_default_early_se
 {
 	/* Initilize 32bit specific setup functions */
 	x86_init.resources.probe_roms = probe_roms;
-	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 
 	reserve_ebda_region();
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -693,7 +693,7 @@ static void __init trim_bios_range(void)
 	 * area (640->1Mb) as ram even though it is not.
 	 * take them out.
 	 */
-	e820_remove_range(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RAM, ...
From: jacob pan
Date: Monday, April 26, 2010 - 2:44 pm

Moorestown has a similar situation where one of the PCI devices have a BAR
below 1MB. Moorestown also has the option not to have VGA.
--

From: H. Peter Anvin
Date: Tuesday, April 13, 2010 - 4:03 pm

Let me dig up the thread later... this was so long ago my memory might
be rusty.

	-hpa

--

From: Yinghai
Date: Tuesday, April 13, 2010 - 5:57 pm

this one should make both cases work.

Andy, can you check this one together with
v3: x86: Reserve [0xa0000, 0x100000] in e820 map

Guenter, can you try the two patches on the system with special device?

Thanks

Yinghai

--------

Subject: [PATCH] x86, resource: Add reserve_region_with_split_check_child()

It will cover the whole region to BUSY, except that some regions that have
children under them.

those children normally is PCI bar but it is falling into E820_RESERVED.
We can not put BUSY on them, otherwise driver can not use pci_request_region()
later

/proc/iomem will have
00010000-00094fff : System RAM
00095000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
  000a0000-000bffff : reserved
000c0000-000cffff : reserved
000d0000-000dffff : PCI Bus 0000:00
  000d0000-000dffff : reserved
000e0000-000fffff : reserved

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/e820.c |   10 +++++++---
 include/linux/ioport.h |    3 +++
 kernel/resource.c      |   24 +++++++++++++++++++-----
 3 files changed, 29 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+		if (e820.map[i].type != E820_RESERVED) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
@@ -1135,8 +1135,12 @@ void __init e820_reserve_resources_late(
 
 	res = e820_res;
 	for (i = 0; i < e820.nr_map; i++) {
-		if (!res->parent && res->end)
-			insert_resource_expand_to_fit(&iomem_resource, res);
+		if (!res->parent && res->end) {
+			if (res->start < (1ULL<<20)) {
+				reserve_region_with_split_check_child(&iomem_resource, res->start, ...
From: Bjorn Helgaas
Date: Wednesday, April 14, 2010 - 9:55 am

I don't like adding all these special-purpose resource interfaces, e.g.,
insert_resource_expand_to_fit(), reserve_region_with_split(),
reserve_region_with_split_check_child(), etc.  They tend to be only
used by one caller, which is a clue to me that we're doing something
wrong.

Sometimes we'd like to print debug information in them (as we do in
insert_resource_expand_to_fit()), but we don't have enough context,
so the message isn't very meaningful.

I think it's better to have more generic interfaces that allow us to
implement this sort of functionality where it's used and where we
have the context to print useful messages.  For example, could the
reserve_region_with_split_check_child() functionality be implemented

This is kind of gross, too.  Checking the name of a resource for "PCI Bus"?



--

From: Yinghai Lu
Date: Wednesday, April 14, 2010 - 10:21 am

yes, in case some system have device on BUS 0 directly. but have device
have pci BAR below 1M and fall into E820_RESERVED.

YH
--

From: Andy Isaacson
Date: Wednesday, April 14, 2010 - 12:25 pm

I've booted successfully with the patch below, on top of the patch in
Message-ID: <4BC4E09F.9000608@oracle.com>

I'll attach the dmesg to bug 15744.

--

From: Yinghai
Date: Wednesday, April 14, 2010 - 12:27 pm

thanks. please put /proc/iomem there too.

Yinghai
--

From: Andy Isaacson
Date: Wednesday, April 14, 2010 - 12:43 pm

I attached it to the bug.

-andy
--

From: Yinghai
Date: Wednesday, April 14, 2010 - 12:49 pm

looks good.

000a0000-000bffff : PCI Bus 0000:00
  000a0000-000bffff : reserved
000c0000-000effff : PCI Bus 0000:00
  000c0000-000effff : reserved
000f0000-000fffff : PCI Bus 0000:00
  000f0000-000fffff : reserved

Thanks

Yinghai
--

From: Yinghai
Date: Tuesday, April 13, 2010 - 2:08 pm

looks that file could be dropped.



Thanks

Yinghai
--

From: Yinghai
Date: Monday, April 12, 2010 - 3:33 pm

So make pci root resource from _CRS honor the range We reserve in e820 below 1M
then  will not assign them to unsigned pci BAR

-v2: let pci_bus_alloc_resource() honor IORESOURCE_BUSY
     driver can not use those range, so skip it

Tested-by: Andy Isaacson <adi@hexapodia.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/acpi.c |    4 ++++
 drivers/pci/bus.c   |    4 ++++
 2 files changed, 8 insertions(+)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -175,6 +175,10 @@ setup_resource(struct acpi_resource *acp
 			"conflicts with %s %pR\n",
 			res, conflict->name, conflict);
 	} else {
+		/* In case it falls in big reserved region */
+		if (res->parent->flags & IORESOURCE_BUSY)
+			res->flags |= IORESOURCE_BUSY;
+
 		pci_bus_add_resource(info->bus, res, 0);
 		info->res_num++;
 		if (addr.translation_offset)
Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -103,6 +103,10 @@ pci_bus_alloc_resource(struct pci_bus *b
 		if (!r)
 			continue;
 
+		/* Driver can not reserve it later, so don't use it */
+		if (r->flags & IORESOURCE_BUSY)
+			continue;
+
 		/* type_mask must match */
 		if ((res->flags ^ r->flags) & type_mask)
 			continue;
--

From: Jesse Barnes
Date: Monday, April 12, 2010 - 3:44 pm

On Mon, 12 Apr 2010 15:33:19 -0700

I'd like Bjorn to ack this one before it gets applied.

-- 
Jesse Barnes, Intel Open Source Technology Center
--

From: Bjorn Helgaas
Date: Tuesday, April 13, 2010 - 2:02 pm

This -v2 text doesn't belong in the changelog.  If you need it, it
should be in a cover message so it doesn't clutter the permanent git

I think this makes sense, but I think it should be in two separate
patches.  The first would be the pci_bus_alloc_resource() change, which
is of interest to all architectures, with a changelog like this:

  PCI: don't allocate from a BUSY bus resource

  Drivers typically use pci_request_regions() to reserve the resources
  they use, but that fails if the resource is already busy.  Therefore,
  we should ignore busy resources when we're assigning resources to a
  device.

and the second would be:

  x86/PCI: host bridge windows inherit BUSY flag from parent

  If a host bridge window falls inside a region the architecture has
  marked busy, the window should inherit the busy flag so we don't
  try to assign that region to a device.

Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>


--

From: H. Peter Anvin
Date: Friday, April 9, 2010 - 1:11 pm

Quite.  We really need to consider if that kind of devices should be
supported.

	-hpa
--

From: Guenter Roeck
Date: Friday, April 9, 2010 - 1:31 pm

It is used for a custom FPGA implementing a watchdog timer and some
other functionality.

This is not a matter of supporting such a device in Linux, it is a
matter of explicitly excluding non-VGA devices from using VGA memory
space.

If that is the decision, it might be worthwhile mentioning it somewhere
in the documentation, to prevent others from ever using that memory
space in an embedded system (and to save them the time needed to find
out about it and to identify and implement a workaround).

Guenter


--

From: H. Peter Anvin
Date: Friday, April 9, 2010 - 1:44 pm

"Don't use legacy fixed-function addresses for nonstandard purposes."

There, there is your documentation.

	-hpa
--

From: Guenter Roeck
Date: Friday, April 9, 2010 - 2:01 pm

Fair and good enough, as long as it is going to show up somewhere.

Guenter


--

From: Alan Cox
Date: Friday, April 9, 2010 - 3:42 pm

It's probably a good idea to only reserve it if the space is actually
being used. There are a variety of legitimate reasons to use that space
for other things on embedded x86 boards.

They are only "legacy fixed function" if you have a PCI bus...
--

From: H. Peter Anvin
Date: Friday, April 9, 2010 - 3:42 pm

No they're not.  The 0xa0000...0xbffff range has been a legacy video
area since the very first PC (although the first PC only used
0xb0000..0xbffff, 0xa0000..0xbffff was declared reserved at that time.)

I'm wondering what those legitimate reasons are.  This is particularly
so since it affects our ability to deal with very early errors, long
before we have enumerated anything.  At this point we can at least lay
down bytes in the video area and hope the user can see them.

	-hpa

--

From: Alan Cox
Date: Friday, April 9, 2010 - 3:54 pm

Depending on your definition of "PC". Quite a few early MSDOS systems had
video elsewhere. Some embedded systems without video use the space for
other stuff. Lots of ISA 386/486 PCs had cards borrowing the unused bits

Thats why you have the bios equipment byte and video queries.
--

From: H. Peter Anvin
Date: Friday, April 9, 2010 - 3:55 pm

With "early MSDOS" I presume you mean pre-386 (early on there were
computer makers who thought the MS-DOS universe would work like the CP/M

Yes of course; just because we don't want people to do the wrong thing
doesn't necessarily mean we won't try to be nice about it, as long as it
doesn't cause excessive pains elsewhere.

	-hpa
--

From: Alan Cox
Date: Friday, April 9, 2010 - 3:51 pm

IMHO its wrong for both

You can only reserve the region in question if you actually have a VGA
device and mappings present.

It's wrong for non PCI systems
It's wrong for legacy ISA systems with monochrome video/no video
It's wrong for several embedded platforms.
It's wrong if PCI isn't your root bridge

Basically the reservation is the wrong way to fix it. A much saner way to
fix it would be to simply keep a list of address ranges not to use for
PCI device relocation. For I/O ports of course we just fix up the PCI
resources of the device to list them as we discover it (IDE legacy).

You don't want to put anything at the VGA address that needs assigning an
address. That is *totally* different to the question of whether you want
to believe the space is 'reserved'. If something is at that address then
presumably the firmware knows what it is doing. If a device driver wishes
to reserve that address it's doing so with more information, later in
boot so should be allowed to.

Alan
--

From: H. Peter Anvin
Date: Friday, April 9, 2010 - 3:55 pm

That's what I mean with reserved, I'm using the term in the
E820_RESERVED sense.  As in "don't put anything there without it being
an explicit assignment".

	-hpa
--

From: H. Peter Anvin
Date: Wednesday, April 7, 2010 - 4:43 pm

Removing is not the same thing as reserving!

A range with no type in E820 is address space available for allocation;

Perhaps boot_params.screen_info.orig_video_isVGA != 1 because he's not
booting in a text mode?
	
	-hpa

--

From: Andy Isaacson
Date: Wednesday, April 7, 2010 - 5:19 pm

It's a pretty standard grub2 install (Debian unstable,
1.98~20100128-1.2) with something approaching the default config.  Grub
does seem to put the console in some annoying framebuffer format to draw
the boot menu, but IIRC it switches back to VGA textmode before booting
the kernel.

-andy
--

From: H. Peter Anvin
Date: Wednesday, April 7, 2010 - 10:00 pm

Grub2 gratuitously bypasses the 16-bit initialization code in the Linux
kernel, that could easily be a source of bugs (and with Grub's track
record, probably is.)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Andy Isaacson
Date: Thursday, April 8, 2010 - 2:40 pm

This patch doesn't fix the T3400.

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.34-rc3-00300-g29dc2a0 (andy@farthing) (gcc version 4.3.3 (Debian 4.3.3-5) ) #6 SMP Thu Apr 8 14:22:20 PDT 2010
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.34-rc3-00300-g29dc2a0 root=UUID=a2359eda-9295-451c-924f-c181c6f49d0d ro console=tty0 console=ttyS0,115200
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe01c00 (usable)
[    0.000000]  BIOS-e820: 00000000bfe01c00 - 00000000bfe53c00 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bfe53c00 - 00000000bfe55c00 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 00000001bc000000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.5 present.
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1bc000 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] last_pfn = 0xbfe01 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[    0.000000] init_memory_mapping: 0000000000000000-00000000bfe01000
[    0.000000] init_memory_mapping: 0000000100000000-00000001bc000000
[    0.000000] RAMDISK: 37c8e000 - 37ff0000
[    0.000000] ACPI: RSDP 00000000000febf0 00024 (v02 DELL  )
[    0.000000] ...
Previous thread: [PATCH] lis3: add support for HP ProBook 432x/442x/452x/522x by Takashi Iwai on Wednesday, April 7, 2010 - 1:52 pm. (2 messages)

Next thread: [patch,rfc v2] ext3/4: enhance fsync performance when using cfq by Jeff Moyer on Wednesday, April 7, 2010 - 2:18 pm. (19 messages)