Re: [PATCH] x86: split e820 reserved entries record to late v2

Previous thread: [PATCH 2/2] file capabilities: turn on by default by Serge Hallyn on Thursday, August 28, 2008 - 12:54 pm. (4 messages)

Next thread: Too many stupid questions about the block layer by Roni Feldman on Thursday, August 28, 2008 - 1:38 pm. (1 message)
From: Yinghai Lu
Date: Thursday, August 28, 2008 - 1:34 pm

So that we can let BAR resources register first, or even pnp?

v2: insert e820 reserve resources before pnp_system_init

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 arch/x86/kernel/e820.c |   20 ++++++++++++++++++--
 arch/x86/pci/i386.c    |    3 +++
 include/asm-x86/e820.h |    1 +
 3 files changed, 22 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1271,13 +1271,15 @@ static inline const char *e820_type_to_s
 /*
  * Mark e820 reserved areas as busy for the resource manager.
  */
+struct resource __initdata *e820_res;
 void __init e820_reserve_resources(void)
 {
 	int i;
-	struct resource *res;
 	u64 end;
+	struct resource *res;
 
 	res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+	e820_res = res;
 	for (i = 0; i < e820.nr_map; i++) {
 		end = e820.map[i].addr + e820.map[i].size - 1;
 #ifndef CONFIG_RESOURCES_64BIT
@@ -1291,7 +1293,8 @@ void __init e820_reserve_resources(void)
 		res->end = end;
 
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-		insert_resource(&iomem_resource, res);
+		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
+			insert_resource(&iomem_resource, res);
 		res++;
 	}
 
@@ -1303,6 +1306,19 @@ void __init e820_reserve_resources(void)
 	}
 }
 
+void __init e820_reserve_resources_late(void)
+{
+	int i;
+	struct resource *res;
+
+	res = e820_res;
+	for (i = 0; i < e820.nr_map; i++) {
+		if (e820.map[i].type == E820_RESERVED && res->start >= (1ULL<<20))
+			insert_resource(&iomem_resource, res);
+		res++;
+	}
+}
+
 char *__init default_machine_specific_memory_setup(void)
 {
 	char *who = "BIOS-e820";
Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -36,6 +36,7 @@
 #include ...
From: Linus Torvalds
Date: Thursday, August 28, 2008 - 1:40 pm

Looks ok by me. Now it just needs testing ;)

Does it actually fix the HPET regression on that odd machine (without the 
special hacks to recognize HPET explicitly)?

		Linus
--

From: Yinghai Lu
Date: Thursday, August 28, 2008 - 1:52 pm

On Thu, Aug 28, 2008 at 1:40 PM, Linus Torvalds

David,

can you test attached patch?
also you may try to revert the old patch.

YH
From: Ingo Molnar
Date: Thursday, August 28, 2008 - 1:58 pm

great - i've done the revert of a2bd7274b471 and have applied your patch 
and pushed it out into -tip. David, could you please test whether 
tip/master works for you out of box?

	Ingo
--

From: Ingo Molnar
Date: Thursday, August 28, 2008 - 2:16 pm

Here we have the problem of overlap i outlined earlier: if there's a 
partial overlap at this stage (as i think it can happen in the hpet case 
on David's box), we won't insert the E820_RESERVED resource.

The hpet hang will be solved, because we don't reprogram the BAR, but we 
now keep the formerly e820-reserved area as 'free' - which the PCI code 
could allocate new resources into - which could cause other problems 
(hangs, non-working devices, etc.) down the line.

Which most likely won't happen currently in practice (there's enough free 
space elsewhere), but it's still not a truly 'free' area and it would be 
nice to have a complete and correct picture, based on all sources of 
information we have.
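
To make the partial-overlap case concrete, here is a minimal user-space
sketch. It is not the kernel's resource tree: struct toy_res,
toy_classify() and toy_can_insert() are hypothetical names, and the
only rule modelled is the one described above, that a range which
partially overlaps an already registered range is refused, while a
disjoint or fully nested range is accepted.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct resource: an inclusive [start, end] range. */
struct toy_res {
	uint64_t start;
	uint64_t end;
};

enum toy_overlap { TOY_DISJOINT, TOY_NESTED, TOY_PARTIAL };

/* Classify how two inclusive ranges relate to each other. */
static enum toy_overlap toy_classify(const struct toy_res *a,
				     const struct toy_res *b)
{
	assert(a->start <= a->end && b->start <= b->end);

	if (a->end < b->start || b->end < a->start)
		return TOY_DISJOINT;
	/* One range fully contains the other (equal ranges included). */
	if ((a->start <= b->start && b->end <= a->end) ||
	    (b->start <= a->start && a->end <= b->end))
		return TOY_NESTED;
	return TOY_PARTIAL;
}

/*
 * Assumption for this sketch: an insert is refused (-1) as soon as the
 * new range partially overlaps any registered range, otherwise it is
 * accepted (0).
 */
static int toy_can_insert(const struct toy_res *registered, int n,
			  const struct toy_res *new_res)
{
	int i;

	for (i = 0; i < n; i++) {
		if (toy_classify(&registered[i], new_res) == TOY_PARTIAL)
			return -1;
	}
	return 0;
}
```

With a made-up BAR at 0xfed00000-0xfed00fff already registered, a
reserved range 0xfed00800-0xfed01fff straddles its end and is refused,
which leaves the tail of that range looking free to a later allocator.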

	Ingo
--

From: H. Peter Anvin
Date: Thursday, August 28, 2008 - 5:21 pm

This may be a rehash of things previously discussed in this thread; my 
email seems to be a bit flaky to the point that I don't know if I have 
gotten all the messages.

Either way, Ingo mentioned in a private message four steps, basically 
summarizing the above email:

1 - first we allocate the absolute essentials (e820 RAM and a few low
     RAM specials)
2 - then we register all existing PCI resources - but do not reallocate
     any PCI resources that conflict with existing step #1 resources
3 - then we allocate e820 reserved entries (and whatever special non-PCI
     resources we might know about in general) - these are less trusted
     than any of the existing PCI resources but still it can hurt us
     badly if the PCI code allocates new resources on them.
4 - then the PCI code can run and allocate free resources to all the
     zero, not yet allocated BARs, and can reallocate any resources that
     might conflict with existing [step #1 or step #3] registered
     resources.
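
As a rough illustration of how entries would be sorted between step 1
and step 3, the sketch below reuses the test from the patch at the top
of this thread: anything that is not E820_RESERVED, or that starts
below 1 MiB, goes in early, and the rest is deferred. The toy_* names
and phase constants are hypothetical, and the entry's start address
stands in for res->start; this is a user-space model, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy e820 types; only the two that matter for the split are modelled. */
enum toy_type { TOY_E820_RAM = 1, TOY_E820_RESERVED = 2 };

struct toy_entry {
	uint64_t addr;
	uint64_t size;
	enum toy_type type;
};

/* Phase numbers follow the step numbers in the list above. */
enum toy_phase { TOY_PHASE_EARLY = 1, TOY_PHASE_LATE = 3 };

/*
 * Same predicate as the patch: reserved entries at or above 1 MiB are
 * deferred, everything else is registered before PCI resources.
 */
static enum toy_phase toy_phase_for(const struct toy_entry *e)
{
	if (e->type != TOY_E820_RESERVED || e->addr < (1ULL << 20))
		return TOY_PHASE_EARLY;
	return TOY_PHASE_LATE;
}

/* Count how many map entries fall into the given phase. */
static size_t toy_count_phase(const struct toy_entry *map, size_t n,
			      enum toy_phase phase)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (toy_phase_for(&map[i]) == phase)
			count++;
	}
	return count;
}
```

In this model steps 2 and 4 (claiming existing BARs, then allocating
new ones) would run between and after the two phases; they are left
out because they depend on PCI state the sketch does not have.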

I agree that this is almost certainly what we should be doing; there is 
a difference between claiming resources already allocated and allocating 
resources to new address space, in which case we want to be as 
conservative as possible.

The key, of course, is that nothing goes in #1 unless we are bloody 
damned sure that if a BAR points there, that BAR is unconditionally 
broken and pointing into hyperspace.  Something claiming RAM or, say, 
the legacy KBC might fall in this area.

	-hpa
--
