Some BIOSes have been observed to corrupt memory in the low 64k. This patch does two things: - Reserves all memory which does not have to be in that area, to prevent it from being used as general memory by the kernel. Things like the SMP trampoline are still in the memory, however. - Clears the reserved memory so we can observe changes to it. - Adds a function check_for_bios_corruption() which checks and reports on memory becoming unexpectedly non-zero. Currently it's called in the x86 fault handler, and the powermanagement debug output. RFC: What other places should we check for corruption in? [ Alan, Rafał: could you check you see: 1: corruption messages 2: no crashes Thanks -J ] Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Cc: Hugh Dickens <hugh@veritas.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Rafał Miłecki <zajec5@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> --- Documentation/kernel-parameters.txt | 5 ++ arch/x86/Kconfig | 3 + arch/x86/kernel/setup.c | 86 +++++++++++++++++++++++++++++++++++ arch/x86/mm/fault.c | 2 drivers/base/power/main.c | 1 include/linux/kernel.h | 12 ++++ 6 files changed, 109 insertions(+) =================================================================== --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -359,6 +359,11 @@ BayCom Serial Port AX.25 Modem (Half Duplex Mode) Format: <io>,<irq>,<mode> See header of drivers/net/hamradio/baycom_ser_hdx.c. + + bios_corruption_check=0/1 [X86] + Some BIOSes seem to corrupt the first 64k of memory + when doing things like suspend/resume. Setting this + option will scan the memory looking for corruption. boot_delay= Milliseconds to delay each printk during boot. Values larger than 10 seconds (10000) are changed ...
can you please not punish systems without this bios problem? if (!bios_corruption_check) move earlier and bios_corruption_check = 0; BTW: SMI evil damaged that area? YH
Yeah, OK, but I think it should default to ON for now. The problem is
that we had two very different systems (Sony Vaio and Intel desktop)
exhibit the same problem in two different ways. These systems worked
fine until we slightly changed the way that pagetable memory was
allocated. We just don't know how many other systems are doing this
kind of subtle corruption of low memory.
J
--
> >> - Clears the reserved memory so we can observe changes to it. You ought to pick different values on different runs 0x00, 0xFF, 0xAA, So a zillion people should lose a chunk of RAM because of what is probably an obscure bug in a single version of a piece of SMM code on two systems total ? That appears to be totally out of proportion - plus the defaults are *irrelevant* for mass user coverage. If you want large scale coverage ask the Fedora, OpenSuSE and Ubuntu people to turn it on for a testing kernel release and report back. Alan --
T3V0IG9mIGN1cnJlbnQgZGlzY3Vzc2lvbiBJIHRyaWVkIHVzaW5nIHMycmFtIG9uIHBhdGNoZWQg a2VybmVsIChJIGRpZApub3QgdHJ5IHMycmFtIGVhcmxpZXIsIG15IHByb2JsZW0gd2FzICh1bilw bHVnZ2luZyBIRE1JIC0gc29tZSBBQ1BJCmNvZGUgcHJvYmFibHkpLgoKQ29ycnVwdGlvbiBvdXRw dXQgaXMgcXVpdGUgaHVnZSwgSSBhdHRhY2hlZCBpdCB0byBidWcgcmVwb3J0OgpodHRwOi8vYnVn emlsbGEua2VybmVsLm9yZy9zaG93X2J1Zy5jZ2k/aWQ9MTEyMzcKaHR0cDovL2J1Z3ppbGxhLmtl cm5lbC5vcmcvYXR0YWNobWVudC5jZ2k/aWQ9MTc1MjYKCgpJIGRvIG5vdCBrbm93IG11Y2ggYWJv dXQgY3VycmVudCBkaXNjdXNzaW9uIGFuZCBmYWN0IGlmIGNoZWNraW5nIGZvcgpjb3JydXB0aW9u IHNob3VsZCBiZSBlbmFibGVkIGJ5IGRlZmF1bHQuCgpIb3dldmVyLCBwbGVhc2Ugbm90ZSB0aGF0 IGVuZC11c2VyIHdpbGwgbm90IGFibGUgdG8gZGlhZ25vc2UgdGhhdApMaW51eCBkb2Vzbid0IHdv cmsgb24gaGlzIG1hY2hpbmUgYmVjYXVzZSBvZiBsZXNzLW9yLW1vcmUgYnJva2VuIEJJT1MuCkhl IHdpbGwgbm90IGdldCB0byBrbm93IHRoYXQgaGUgbmVlZHMgdG8gZW5hYmxlCmJpb3NfY29ycnVw dGlvbl9jaGVjaz0xIGFuZCBlZGl0IGdydWIgdG8gbWFrZSBpdCBkZWZhdWx0LgoKSSB1bmRlcnN0 YW5kIHRoYXQgZGlzYWJsaW5nIGNodW5rIG9mIFJBTSBvbiBldmVyeSwgZXZlbiBub3QgYWZmZWN0 ZWQsCm1hY2hpbmUgaXMgZmFyIGZyb20gcGVyZmVjdCwgYnV0IHdlIGhhdmUgdG8gcmVtZW1iZXIg YWJvdXQgbm90CmFkdmFuY2VkIGVuZC11c2Vycy4KCi0tIApSYWZhxYIgTWnFgmVja2kK --
Including not advanced end users without the bug ... which is probably all but one or two people. --
Not quite the output we were expecting! I've not got around to trying
it yet, so beware, but I think Jeremy's patch needs the following on top.
Or you may prefer to wait until one of us reports that it is now working
as intended.
--- a/arch/x86/kernel/setup.c=092008-08-29 11:17:16.000000000 +0100
+++ b/arch/x86/kernel/setup.c=092008-08-29 11:19:24.000000000 +0100
@@ -636,11 +636,12 @@ void check_for_bios_corruption(void)
=09=09unsigned long *addr =3D __va(scan_areas[i].addr);
=09=09unsigned long size =3D scan_areas[i].size;
=20
-=09=09for(; size; addr++, size--) {
+=09=09for(; size; addr++, size -=3D sizeof(unsigned long)) {
=09=09=09if (!*addr)
=09=09=09=09continue;
=09=09=09printk(KERN_ERR "Corrupted low memory at %p (%lx phys) =3D %08lx\=
n",
=09=09=09 addr, __pa(addr), *addr);
+=09=09=09*addr =3D 0;
=09=09=09corruption =3D 1;
=09=09}
=09}I tried your patch anyway (after applying Jeremy's patch of course) and it doesn't seem to work. The only output is: scanning 2 areas for BIOS corruption after using s2ram. I do not get any Corrupted low memory at* -- Rafał Miłecki
It seemed to work for me. Did you remember to plug HDMI to trigger the corruption before you used s2ram? [ 71.663828] Back to C! [ 71.663828] Corrupted low memory at ffff8800000083e8 (83e8 phys) = 803c85370cfc0000 [ 71.663828] Corrupted low memory at ffff8800000083f0 (83f0 phys) = 00003000 [ 71.663828] Pid: 7335, comm: pm-suspend Not tainted 2.6.27-rc4-00216-ge1c9c9d-dirty #152 [ 71.663828] [ 71.663828] Call Trace: [ 71.663828] [<ffffffff8020fdd5>] check_for_bios_corruption+0xb5/0xd0 [ 71.663828] [<ffffffff803a3e1a>] pm_dev_dbg+0x2a/0xa0 [ 71.663828] [<ffffffff803a49c2>] dpm_power_up+0x32/0xf0 Alan --
I hope that's what got missed. Here's my version of Jeremy's patch, that I've now tested on my machines, as x86_32 and as x86_64. It addresses none of the points Alan Cox made, and it stays silent for me, even after suspend+resume, unless I actually introduce corruption myself. Omits Jeremy's check in fault.c, but does a check every minute, so should soon detect Rafa=C5=82's HDMI corruption without any need to suspend+resume. Hugh --- 2.6.27-rc5/Documentation/kernel-parameters.txt=092008-08-29 01:02:34.00= 0000000 +0100 +++ linux/Documentation/kernel-parameters.txt=092008-08-29 11:17:16.0000000= 00 +0100 @@ -360,6 +360,11 @@ and is between 256 and 4096 characters.=20 =09=09=09Format: <io>,<irq>,<mode> =09=09=09See header of drivers/net/hamradio/baycom_ser_hdx.c. =20 +=09bios_corruption_check=3D0/1 [X86] +=09=09=09Some BIOSes seem to corrupt the first 64k of memory +=09=09=09when doing things like suspend/resume. Setting this +=09=09=09option will scan the memory looking for corruption. + =09boot_delay=3D=09Milliseconds to delay each printk during boot. =09=09=09Values larger than 10 seconds (10000) are changed to =09=09=09no delay (0). --- 2.6.27-rc5/arch/x86/Kconfig=092008-08-29 01:02:35.000000000 +0100 +++ linux/arch/x86/Kconfig=092008-08-29 11:17:16.000000000 +0100 @@ -201,6 +201,9 @@ config X86_TRAMPOLINE =09depends on X86_SMP || (X86_VOYAGER && SMP) || (64BIT && ACPI_SLEEP) =09default y =20 +config X86_CHECK_BIOS_CORRUPTION + def_bool y + config KTIME_SCALAR =09def_bool X86_32 source "init/Kconfig" --- 2.6.27-rc5/arch/x86/kernel/setup.c=092008-08-29 01:02:35.000000000 +010= 0 +++ linux/arch/x86/kernel/setup.c=092008-08-29 13:50:19.000000000 +0100 @@ -579,6 +579,106 @@ static struct x86_quirks default_x86_qui struct x86_quirks *x86_quirks __initdata =3D &default_x86_quirks; =20 /* + * Some BIOSes seem to corrupt the low 64k of memory during events + * like suspend/resume and unplugging an HDMI cable. Reserve all + * remaining free ...
Ah, stupid mistake, sorry! Indeed, using HDMI and s2ram later works fine: Corrupted low memory at ffff88000000be98 (be98 phys) = b02a000400000000 [<ffffffff8020fc9c>] check_for_bios_corruption+0x94/0xa0 -- Rafał Miłecki
Your periodic test works fine: Corrupted low memory at ffff88000000be9c (be9c phys) = b02a0004 <IRQ> [<ffffffff8020fc9b>] check_for_bios_corruption+0x93/0x9f [<ffffffff8020fca7>] ? periodic_check_for_corruption+0x0/0x25 [<ffffffff8020fcb0>] periodic_check_for_corruption+0x9/0x25 By the way I confirmed this bug on Sony Vaio FW11M (my one is FW11S). Probably more machines from FW11* are affected. -- Rafał Miłecki
If this patch is known to work fine for Sony Vaio FW* and Alan's machine, could it go mainline somehow? -- Rafał Miłecki
Well. Thanks for the prod, and I'm certainly remiss for not following up sooner. But I'm really not at all keen on such a patch going into mainline myself. It's an interesting experiment, and I'd be happy to see such a patch (adjusted to make sure output goes to kerneloops.org) spending a little while in Fedora Rawhide (who'd be the right contact for that?). But so far as mainline goes, I share Alan Cox's opinion that we should not be chopping pages out of every x86 user's memory, just because a couple of machines with faulty BIOSes have been observed. Particularly now it's evident that the 64kB "limit" is no more than a reflection of where the directmap pagetable changes have caught such corruption. If lots more such corruptions are reported, of course I would change my position; but those bad directmap PMD crashes are themselves quite recognizable now we know to look out for them. I would prefer you both to use the minimal memmap=3D solutions for now; but others may disagree. Hugh
I have my original patch with your changes as a followup patch sitting
in my queue. I was planning on sending it in the next day or so. I was
Dave Jones? He used to do it, at least. I guess a WARN_ON() would get
I think that's a worthwhile cost for -rc. We can fix it up (ie, make it
a real config option, defaulting off) for release once we're happy that
We should definitely make the kernel parameter set the banned memory
The fact that we're seeing this problem in two completely different
systems with different BIOSes and everything else makes me worried that
this is quite widespread. It's only the persistence and diligence of
our bug reporters that we managed to work out that they're the same
problem. How many other people are getting strange crashes and haven't
managed to correlate it any particular BIOS interaction? Or just happen
to be corrupting memory we don't care about right now, but is only a
small code change or link order change away from disaster?
J
--
please put this all behind a .config debug option that distros can turn on/off. Also, when it's enabled in the .config, there should be another .config option that marks it disabled by default but it can be enabled via a boot parameter. Distro debug kernels will most likely enable the .config - even release kernels might enable it it, with default off - users can enable the boot switch if they suspect something, without having to build a new kernel. Ingo --
Yes, that wouldn't be a bad idea. The corruption we've seen so far is
Well, we just don't know. We have two systems which started reporting
problems when I made a small change to how the initial pagetables are
allocated. Before that they appeared to work fine, though in reality it
was just some other part of the address space being corrupted. Who
knows how many systems are out there "working fine", except that some
obscure address in the user address space now allows you to write into
kernel memory?
The two systems are *completely different*. One has a Phoenix BIOS, the
other is AMI. One is a Sony Vaio, the other is an OEM Intel desktop.
One corrupts memory on unplugging a HDMI cable, the other corrupts on
suspend/resume. I don't think this is reassuring, or indicates the
The idea is to leave it on by default for a while - maybe a couple of
-rc releases - and see what turns up. We can turn it off for a release
kernel if people really object, though I'd hope something like Fedora
Rawhide would leave it on.
J
--
This is very interresting. I have a gigabyte X48 board, and everything works perfect, however, in memtest86+ i get thousands of errors in the range 0-0.9mb. I am certain my ram is fine, i've tried it in other computers, and reversed. I've run dozens of stress tests within my booted linux, no trouble, also the userspace memtester (allthough im aware it wont actually ever get to grab offending addresses). --
I would expect you'd get crashes if you had massive corruption in that
area. By default it's being used for the kernel pagetables, so stomping
on them will cause problems. In the cases we've been looking at so far,
it's only been a word or two being stomped.
The low 1Mbyte is used as general memory, so corruption over the whole
area will have widespread effects.
But try this patch (Hugh's version) and see what happens.
J
--
MjAwOC84LzI4IEplcmVteSBGaXR6aGFyZGluZ2UgPGplcmVteUBnb29wLm9yZz46Cj4gU29tZSBC SU9TZXMgaGF2ZSBiZWVuIG9ic2VydmVkIHRvIGNvcnJ1cHQgbWVtb3J5IGluIHRoZSBsb3cgNjRr LiAgVGhpcwo+IHBhdGNoIGRvZXMgdHdvIHRoaW5nczoKPiAgLSBSZXNlcnZlcyBhbGwgbWVtb3J5 IHdoaWNoIGRvZXMgbm90IGhhdmUgdG8gYmUgaW4gdGhhdCBhcmVhLCB0bwo+ICAgcHJldmVudCBp dCBmcm9tIGJlaW5nIHVzZWQgYXMgZ2VuZXJhbCBtZW1vcnkgYnkgdGhlIGtlcm5lbC4gIFRoaW5n cwo+ICAgbGlrZSB0aGUgU01QIHRyYW1wb2xpbmUgYXJlIHN0aWxsIGluIHRoZSBtZW1vcnksIGhv d2V2ZXIuCj4gIC0gQ2xlYXJzIHRoZSByZXNlcnZlZCBtZW1vcnkgc28gd2UgY2FuIG9ic2VydmUg Y2hhbmdlcyB0byBpdC4KPiAgLSBBZGRzIGEgZnVuY3Rpb24gY2hlY2tfZm9yX2Jpb3NfY29ycnVw dGlvbigpIHdoaWNoIGNoZWNrcyBhbmQgcmVwb3J0cyBvbgo+ICAgbWVtb3J5IGJlY29taW5nIHVu ZXhwZWN0ZWRseSBub24temVyby4gIEN1cnJlbnRseSBpdCdzIGNhbGxlZCBpbiB0aGUKPiAgIHg4 NiBmYXVsdCBoYW5kbGVyLCBhbmQgdGhlIHBvd2VybWFuYWdlbWVudCBkZWJ1ZyBvdXRwdXQuCj4K PiBSRkM6IFdoYXQgb3RoZXIgcGxhY2VzIHNob3VsZCB3ZSBjaGVjayBmb3IgY29ycnVwdGlvbiBp bj8KPgo+IFsgQWxhbiwgUmFmYcWCOiBjb3VsZCB5b3UgY2hlY2sgeW91IHNlZToKPiAgIDE6IGNv cnJ1cHRpb24gbWVzc2FnZXMKPiAgIDI6IG5vIGNyYXNoZXMKPiAgVGhhbmtzIC1KCj4gXQoKSSB3 YXMgdHJ5aW5nIG15IGJlc3QgdG8gY3Jhc2ggc3lzdGVtIHdpdGggdGhpcyBwYXRjaCBhcHBsaWVk IGFuZCBmYWlsZWQgOikKCldvcmtzIGdyZWF0LgoKSnVzdCB3b25kZXIgaWYgSSBzaG91bGQgZXhw ZWN0IGFueSBwcmludGsgZnJvbQpjaGVja19mb3JfYmlvc19jb3JydXB0aW9uPyBJIGRvIG5vdCBz ZWUgYW55OgoKemFqZWNAc29ueTp+PiBkbWVzZyB8IGdyZXAgLWkgY29ycgpzY2FubmluZyAyIGFy ZWFzIGZvciBCSU9TIGNvcnJ1cHRpb24KCi0tIApSYWZhxYIgTWnFgmVja2kK --
that's _very_ weird. maybe the BIOS expects _zeroes_ somewhere? Do you suddenly see crashes if you change this line in Jeremy's patch: + memset(__va(addr), 0, size); to something like: + memset(__va(addr), 0x55, size); If this does not tickle any messages either, then maybe the problem is in the identity of the entities we allocate in the first 64K. Is there a list of allocations that go there when Jeremy's patch is not applied? but ... i think with an earlier patch you saw corruption, right? Far-fetched idea: maybe it's some CPU erratum during suspend/resume that corrupts pagetables if the pagetables are allocated in the first 64K of RAM? In that case we should use a bootmem allocation for pagetables that give a minimum address of 64K. Ingo --
No, it's expected. Rafał only got corruption when plugging his HDMI
cable, and I didn't put any corruption checks on that path (I'm not even
sure what kernel code would get executed in that case). Hugh's original
patch put a check in the hot path of the fault handler - and so it would
get called regularly - but I put it in the kernel-bug path, which is
fairly pointless given that we expect this patch to prevent the crashes.
It does, however, do the check in the pm state changes, so doing a
suspend should make it print some of the corruption it found. Alan's
case would be a better test for that though.
It does raise the question of where the good places to put the check
are. It shouldn't be too hot, given that it's scanning ~64k of memory,
but often enough to actually show something. I was thinking of putting
some calls in the acpi code itself, but got, erm, discouraged.
Rafał's corruption was definitely non-zero. I think the corruption is
happening, but it's just not reported.
J
--
i thought we did a check after coming back from resume? That would be a natural place to do the check. (the corruption is suspend/resume related after all, right?) Ingo --
No, in Rafał's case it happens when he unplugs an HDMI cable.
J
--
I just followed up in detail to Ingo, but I think it's because I didn't
put any checks where you would hit them. If you suspend/resume it
should print something.
J
--
Thanks for taking this on, Jeremy: I had too many doubts to do so. hpa introduced the 64k idea, and we've all been repeating it; but I've not heard the reasoning behind it. Is it a fundamental addressing limitation within the BIOS memory model? Or a case that Windows treats the bottom 64k as scratch, so BIOS testers won't notice if they corrupt it? The two instances of corruption we've been studying have indeed been below 64k (one in page 8 and one in page 11), but that's because they were both recognizable corruptions of direct map PMDs. If there is not a very strong justification for that 64k limit, then I don't think this approach will be very useful, and we should simply continue to rely on analyzing corruption when it appears, and recommend memmap=3D as a way of avoiding it once analyzed. If there I don't know: the easy answer would be just to do it once every minute. As the patch stands, we'll only learn more on machines going through suspend+resume (disk or ram). If the corruption avoidance works, =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= It's actually a bottom_of_memory corruption_check: it would pick up corruption there whether it's caused by the BIOS or not. The boot parameter description ought to refer to the config option: ah, the config option is always on for x86, hmm. If the 64k is in any doubt, maybe the corruption_check boot option =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= Always on? For the moment, perhaps. I'm very pleased to see you've set it on for x86_32 as well as for x86_64, I'd been wanting to ask =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= Small point, but since we're doing the same on 32-bit and 64-bit, I think it would be better to operate on unsigned ints, to get The purpose of the dump_stack being to insert a recognizable and irritating spew into the log, prompting people to report these cases: the stacktrace itself ...
I don't think there's been a strong rationale. 64k is significant because it's the size of an old 16-bit real-mode segment, and perhaps the bios code in question is 16-bit code. But I think it's really a question of limiting the scope of the problem: yes, the BIOS could corrupt any memory anywhere, but what can we possibly do about that? Such a machine is as useless as a system with intermittent faulty memory hardware. We have to assume that it is basically possible to use the machine, and that presumably whatever bug the BIOS has isn't making Windows crash. Windows might be avoiding the low 64k for some ancient-history DOS reason, or perhaps it's just full of boot code which isn't used again? It would be easy to add another kernel parameter to select how much Right. Once a minute sounds good, though it might be a bit too long to Yes, that sounds good. Also, it might be worth being able to set the rate. Doing it once every 10 seconds wouldn't add too much overhead, and once a second would be reasonable if you think there's an actual I couldn't see any reason the problem would be restricted to 64-bit. It Well, no, I was thinking about it would give a clue about what went wrong. If you see the spew out of the pm code, then it was likely suspend/resume. If it's out of the timer function then it won't tell you much other than "something else". If we put it elsewhere (even if temporarily for detecting other suspected corruption symptoms), then it will be useful for distinguishing those. An alternative would be to pass a flag to say whether a backtrace would That would break Xen. I needed to do the pagetable reuse to make sure We could add NX. What's the behaviour of setting NX in a non-NX-supporting CPU? I don't think it would trigger a "reserved bit" exception (the other high pte flags don't). Or failing that, we could mask out NX once we've worked out the CPU doesn't support it (at the same time it relocates the pagetables to the kernel's ...
Or just make the boot_corruption_check parameter take a range instead of a boolean. -hpa --
I've no experience of what happens if NX is set to a non-NX-supporting CPU, so can't advise on that at all. But I think you're looking at it the wrong way round - or else I am. Here's the declaration and comment on level2_ident_pgt in head_64.S: NEXT_PAGE(level2_ident_pgt) /* Since I easily can, map the first 1G. * Don't set NX because code runs from these pages. */ PMDS(0, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD) (The "Since I easily can" comment is there because it used to map only a subset needed for early kernel startup, not the whole 1G.) So it's very intentionally leaving NX out there. I believe the level2_ident_pgt page appears twice or more in the pagetable layout, used to map two or more areas of virtual address space - once to provide the direct map at ffff880000000000 and once to provide the kernel image virtual mapping at ffffffff80200000. I think we don't want NX on Linux kernel text ;-? Before your 2.6.27-rc changes, init_memory_mapping subsequently replaced the direct map usage by a separately constructed pagetable, similar to it but with NX set throughout. After your 2.6.27-rc changes, level2_ident_pgt is found there already so left untouched - leaving NX out of that first 1G of the direct map forever (when CPA splits it up, the smaller pages inherit the lack of NX too). I noticed when changing /proc/meminfo to show DirectMap in kB, and tried to fix it, but only gave myself a non-booting system (I probably shrank my direct map to 0 while implicitly using it). And at that stage I didn't realize at all that it was a recent regression arising from your mods. I'm reluctant to delve in there again at present, and unsure what the right fix should be: perhaps the code which checks if an entry is already there, should check if it has the desired flags? Hugh --
Well, if we never want the direct map to be non-executable (which I
think would be OK, since all the code is either core kernel or modules
which are mapped elsewhere), then we can set NX on the level4 for the
Well, the specific reason I made these changes was to make sure that
there was never more than one entry mapping any kernel page, so that you
can update the page permissions on a kernel page with just one update.
This is pretty much a requirement for Xen, and very convenient at other
times. Native will use either L2 or L3 mappings for the kernel and
linear space, and Xen uses L1 mappings, so if we break the aliasing at
the L2 level I can still keep the L1s aliased, but it seems simpler to
set NX on the linear mapping's L4.
J
--
That would be much the neatest answer: I hadn't realized that inheritance (perhaps I'm still living in early-i386 days, when IIRC there was a bug in inheriting WP from higher levels). But then I stumbled across static_protections() in pageattr.c (takes both addr and pfn, latter seems weird), whose BIOS_BEGIN and BIOS_END seem to echo the ISA_START_ADDR and ISA_END_ADDR used by is_ISA_range() in ioremap.c. And peering at the pagetables I've got here for that area of the direct map in 2.6.26 x86_64, yes, I'm missing NX from 0xc0000 to 0xfffff (presumably nothing tried to ioremap 0xa0000 to 0xbffff). A simple answer might be to go the way you suggest, but remove the special casing from pageattr.c and ioremap.c; but I fear that will slow something down, or introduce further bugs. Hugh --
The 64K number was empirical, of course. The bottom 64K is somewhat special, however, in that it is what you can address in real mode (as opposed to big real mode) with your segments parked at zero, so you end up with something approaching a flat real mode. Especially the first 32K (below 0x7c00) are frequently used by various BIOS items, but I believe the corruption observed was at 0x8000, so it's beyond even this first barrier. Obviously, it's extremely hard to predict where BIOS vendors will have choosen to scribble, but the observations in this particular case seemed to finger this particular area. -hpa --
I should point out that I have seen one particular bug quite a few times poking around with boot loaders: the BIOS accesses memory at an otherwise valid address, but with the segment base set to either zero or 0x400 instead of whatever it should have been. -hpa --
