Re: More E820 brokenness

Previous thread: [PATCH] clockevents: fix bogus next_event reset for oneshot broadcast devices by Thomas Gleixner on Thursday, September 27, 2007 - 3:17 pm. (1 message)

Next thread: [PATCH] removes array_size duplicates by roel on Thursday, September 27, 2007 - 3:51 pm. (6 messages)
From: H. Peter Anvin
Date: Thursday, September 27, 2007 - 3:17 pm

As luck would have it, it's not just an obscure Geode system which has a
broken E820 implementation.  Today I received a bug report about a Dell
system (XPS M1330) with broken E820.

Unfortunately, the workaround for the Geode breaks this system, because
x86-64 doesn't fall back to the e801/88 information like the i386 kernel
does.

I wonder if the relevant people could test out this patch to see how it
works on their respective system.  This patch reverts to 2.6.23-rc8
behaviour of simply truncating the map, but still makes e801/88 info
available to the kernel; this hopefully should match 2.6.22 behaviour.

I want to emphasize that this is seriously broken.  Using a partial e820
map could have disastrous results, since the kernel will have partial
memory map information and not know about reserved areas, etc.  Part of
me feels that the right thing to do is what the current git kernel does
-- either fall back to e801, or stop and error.

(Andi: I would particularly appreciate your opinion on this issue.)

	-hpa
From: Jordan Crouse
Date: Thursday, September 27, 2007 - 3:33 pm

Breaks on the Geode - original behavior.

I think that having boot_prams.e820_entries != 0 makes the kernel

I'm inclined to agree.  

Jordan


-

From: H. Peter Anvin
Date: Thursday, September 27, 2007 - 3:47 pm

Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
because this, to the best of my reading, mimics the 2.6.22 behavior
exactly.  DID IT REALLY, and/or did you make any kind of configuration

Arguably the right thing to do is to find the responsible BIOS engineer
and shoot them, but that's hard to do without robotics.

	-hpa



-

From: Jordan Crouse
Date: Thursday, September 27, 2007 - 4:15 pm

I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map - 
including the ACPI and reserved blocks way up high).

So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how. 

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-

From: H. Peter Anvin
Date: Thursday, September 27, 2007 - 4:22 pm

Was this a stock 2.6.22 kernel, or might it have been modified?

There is, of course, also the possibility that triggering the BIOS bug
in your case depends on some delicate combination of input state.

	-hpa
-

From: H. Peter Anvin
Date: Thursday, September 27, 2007 - 4:27 pm

Oh bugger, looks like this one might be genuinely my fault after all.
The ID check in the new code is buggy.

Can you please test this revised patch out (against current -git)?

	-hpa



From: Jordan Crouse
Date: Thursday, September 27, 2007 - 4:34 pm

That looks the same as the previous patch you sent?

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-

From: H. Peter Anvin
Date: Thursday, September 27, 2007 - 4:36 pm

Sorry, this is the right one...

	-hpa
From: Jordan Crouse
Date: Thursday, September 27, 2007 - 4:54 pm

Worked, but that just raises more questions.  Why didn't more x86 boxes
break or, alternatively, why did a new version of the BIOS fix the problem? 
I guess we shouldn't look a gift horse in the mouth. Or something.

Thanks very much for your help.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-

From: H. Peter Anvin
Date: Thursday, September 27, 2007 - 5:12 pm

Why didn't more x86 boxes break... well, it's pretty natural an
implementation of the BIOS to not clobber registers that aren't outputs.
 Arguably the BIOSes that do are still buggy, since there isn't a
well-defined calling sequence for the BIOS and the convention that has
evolved is "don't clobber anything unless it's an output."

It's still wrong, however, especially since it means omitting the *real*
SMAP check.

	-hpa
-

From: Rafael J. Wysocki
Date: Friday, September 28, 2007 - 6:05 am

I'd like to update http://bugzilla.kernel.org/show_bug.cgi?id=9086 with correct
information.

Should I add a pointer to the patch from your previous message to it?

Greetings,
Rafael
-

Previous thread: [PATCH] clockevents: fix bogus next_event reset for oneshot broadcast devices by Thomas Gleixner on Thursday, September 27, 2007 - 3:17 pm. (1 message)

Next thread: [PATCH] removes array_size duplicates by roel on Thursday, September 27, 2007 - 3:51 pm. (6 messages)