Changes to previous versions:
- Ported to the latest git-x86 including the PAT patchkit
This undoes some changes in the PAT patches and reimplements them
in a different way. End result should be equivalent, but this
made it easier for me to merge the patches.
- Fix NX bit handling (I think even after Jeremy's fixes it was
still not completely right)
- Minor fixes based on feedback
-Andi
--
thanks Andi for porting your CPA queue ontop of PAT. Now that PAT
support is getting into shape i've test-merged your CPA series to
x86.git.

v2.6.25 merging of CPA is still somewhat in limbo but worst-case i think
we can still get away with just doing wbinvd instead of clflush and get
rid of most of the risks that way. Could you please add a boot option
and Kconfig option that does that? Something like "noclflush" and a
.config option to achieve the same - just like we do for PAT.

We've got way too much stuff going on at the moment - and the PAT bits
are more fundamental and more important than nice but non-essential
optimizations like CPA. There's still a lot of cruft all around this
area.

One thing, you undid a cleanup patch:
| Subject: CPA: Undo white space changes
| From: Andi Kleen <ak@suse.de>
|
| Undo random white space changes. This reverts
| ddb53b5735793a19dc17bcd98b050f672

this is perfectly fine as we do not want to make your merging harder via
cleanups, as long as you redo the cleanups after your series. Your new
code is pretty ugly to look at, and this very much shows in the
checkpatch metrics too:

                               errors   lines of code   errors/KLOC
arch/x86/mm/pageattr_32.c          29             419          69.2
arch/x86/mm/pageattr_64.c          31             384          80.7

prior to the undo it was:

                               errors   lines of code   errors/KLOC
arch/x86/mm/pageattr_32.c           0             294             0
arch/x86/mm/pageattr_64.c           0             275             0

please restore that cleanliness state. Thanks,
Ingo
--
hm, i just found a failing 64-bit .config while testing your CPA
patchset:

[ 1.916541] CPA mapping 4k 0 large 2048 gb 0 x 0[0-0] miss 0
[ 1.919874] Unable to handle kernel paging request at 000000000335aea8 RIP:
[ 1.919874] [<ffffffff8021d2d3>] change_page_attr+0x3/0x61
[ 1.919874] PGD 0
[ 1.919874] Oops: 0000 [1]
[ 1.919874] CPU 0

config and full crash.log attached. Fully reproducible. I've also pushed
out the current x86.git with the new CPA bits included.

Ingo
--
hm, and your CPA queue is not bisectable, due to:

  Subject: Undo pat cpa patch
  From: Andi Kleen <ak@suse.de>

  Going to implement this differently

  commit 5ec5c5a2302ca8794da03f8bedec931a2a814ae9
  Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
  Date: Tue Jan 15 09:36:03 2008 +0100

  patches/x86-pat-cpa_i386.patch

could you please make your queue bisectable?
Also, did you actually try to port your queue ontop of the cleanups
patch? There's an easy technique: first do an undo patch, then apply
your patch, then apply the re-do patch. Whatever rejects come from the
re-do patch can be dropped from the cleanup. Continue with this until
all patches are covered. Then generate the undo+your-old-patch+redo
patch into a single your-new-patch.

Ingo
--
The idea was that you git revert the original patches I referenced
and then drop the undo patches since I reimplement all that in different ways
(except for the white space changes, but that can be redone once everything
settled down again). Then it will be bisectable.

Sorry for not making this more clear in the original mail.
-Andi
--
and how does that again make things bisectable in the middle of the PAT
queue? For example if i undo:

  Subject: x86: pat: cpa, 32-bit
  From: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

in the PAT series, the rest of PAT wont apply. The proper approach is to
do incremental updates to the existing codebase, i.e. truly base CPA
ontop of PAT.

it's a revert barrier (within v2.6.25), so it would be nicer and more
maintainable to integrate the whitespace changes into your patches, via
the method i suggested. (it can even be scripted up)

Ingo
--
Ok sorry only the pageattr_32 hunks would need to be undone.
So it would be something like:
drop venki's 32bit patch
insert my patch to add cpa_addr()
reinsert venki's patch with the pageattr_32 hunks dropped after it

drop venki's 64bit patch
insert my patch to add reserved checking on 64bit cpa
reinsert venki's patch with the pageattr_64 hunks dropped after it

What is a revert barrier?
Anyways of course the way to handle that is the same as with the other undo
patches: drop the original white space changes (ddb53b5735793a19dc17bcd98b050f672f)
completely and then drop that undo patch too. The white space changes haven't reached
Linus yet so you can just make them disappear completely from known history.

-Andi
--
we dont want them to disappear, due to the second half of:
http://lkml.org/lkml/2008/1/18/112
we want easy-cleanups first, difficult changes applied second. It's a
well-established concept.

Ingo
--
That rule of thumb makes sense if someone does a series from scratch, but
redoing a large existing series just because someone else sneaked in a white space
patch at the wrong time does not seem to be very efficient to me.

After all these rules are not to make our lives more complicated,
but to make things more efficient.

And most of the code in the white space cleanup changes anyways in the later
CPA series.

And what is left you'll get them again anyways once the CPA stuff is
in and people tested it a bit more so it has settled down.

I promise to redo them on top of the end result.
-Andi
--
i pointed it out how to port a larger series ontop of a whitespace
cleanup patch:

http://lkml.org/lkml/2008/1/18/281

the "there's an easy technique" bit. Repeat that method for every patch
and you'll have your series ported ontop the whitespace cleanups. (with
no risks)

Ingo
--
But it will be even easier to just redo the cleanup stuff at the
end. If I do what you describe here I'm sure I will make a mistake
somewhere and I would rather not risk that.

-Andi
--
FYI, i've done the proper splitup of your CPA patchset - see today's
x86.git#mm for the details. I've extracted all the c_p_a() fixes from
your series and eliminated the 'undo cleanups' patch as well.

It's a first shot so it might not yet be perfect - although so far it
looks good in testing on 4-5 testsystems here, on mixed 64-bit and
32-bit boxes. Doing it this way was a pretty straightforward process, it
took less than an hour - and the end result feels much better in terms
of maintainability.

I left the clflush feature bits out for now - fixes and cleanups go
first. We first need to see whether this is robust enough before making
other changes to c_p_a(). There's enough on the arch/x86 plate for
v2.6.25 already - we can try the clflush optimizations in v2.6.26.
(since there's no high-freq in-kernel user of the c_p_a() API at the
moment, there's no pressing need for this either.)

Anyway, could you check today's x86.git and see whether any of those
fixes have some implicit dependency on other changes i left out of this
splitup? That's the main high-level risk i can see for now. (besides the
large number of changes to this fragile API)

Also, CPA_DEBUG still produces warnings all around the place - as it did
with your series.

Ingo
--
You still kept Venki's redundant reference count change for 32bit.
The code handled that already by doing reserved bits check.

IMHO it would have been cleaner to also do that for the 64bit version
instead of abusing the reference counting for this (like my

Ok I'll redo it. Thank you for your support.
Probably in larger chunks now though -- with your somewhat
random patch applying methodology larger fine-grained series are just too
painful for me.

First priority will be gbpages on top of it.
I would appreciate if you could either prevent or warn against further
wide scale changes on these files before .26 then -- otherwise I'll have

Yes I'll do a patch later. I had wanted to fix it over the weekend, but
was fighting instead with all the other problems that were in git-x86
at that time.

-Andi
--
Hmm. Which patch are you referring to ? There is no patch from Venki

I don't understand what you mean. The "CPA Handle 4K split pages at
boot on 64bit" patch is in x86.git:

http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commit...

Please keep them fine-grained and keep fixes separate and prior to

First priority is getting CPA and PAT consolidated before we put new
functionality on top of it. This implies a possible unification of the
32 and 64 bit code as well. There is no real good reason to have

FYI, the consolidation of CPA and PAT is changing that code, so flux
is expected.

Thanks,
tglx
--
> First priority is getting CPA and PAT consolidated before we put new
PAT seems to be still quite unstable and frankly for me it is
unclear how long it will take for it to become stable. It would
not surprise me if it takes longer than the .26 merge window.

You're saying you want to delay a relatively simple and imho
relatively mature feature like gbpages after that complicated and risky
feature PAT? Please take a look at the patches; they're really
not very complicated.

That seems to me like against your own principles -- simple stuff
first -- that you two harped on so extensively earlier in this thread.

For me it would make much more sense to put the gbpages first
than to delay them for PAT. I only didn't argue this strongly
earlier because PAT was already rushed in (for me quite surprisingly)
and I didn't want to argue for dropping it.

But now that it is gone again anyways delaying the gbpages for it again
would be quite unfortunate from my perspective.

-Andi
--
Definitely, if we change the code further without doing anything to
consolidate it in the first place.

Have you even cared to look, why PAT is so ugly and fragile ? Simply
because it interferes/interacts with CPA and the page table code. So
adding further stuff to that area without considering the requirements

It's not a question of complicated or not. Fact is, that PAT is
interfering with all this and any new feature will make it harder to

Not at all. If the simple stuff makes it harder to do something else,
then it is no longer simple. Then it is simply in the way.

If your patches are so simple, then they can be done on top of a

I can understand that, because it is in the way of your particular
interests, but we have to look at the global picture and not at the
personal preferences of you or anyone else.

Thanks,
tglx
--
Well I was second generation hacker on the patchkit (after Eric B.) and

No that is not its main problem I believe. Main problem are
all the driver and other subsystem interactions (it is a little
bit similar to power management where you have lots of little
bits all over right instead of a single big one). The actual
page table handling is the smallest issue and well understood
anyways.

gbpages on the other hand does not change the driver interaction

I don't think gbpages has much to do with how well PAT works or not.
It is just a different way to map the large areas of the direct mapping
that do not contain any mmio or aperture mappings. These areas
are not affected by PAT. By definition (in Linux) if PAT is active
for something there are no gbpages anymore.

PAT essentially only works on areas which are already split into

Sure they can -- i did that in fact with PAT only -- my worry is just that
there is no time frame when someone will actually produce
working PAT and then consolidated CPA. So basically my relatively
simple (and imho not very intrusive) feature is queued behind two very
complicated projects with unclear time frame and might
be delayed forever for those.

And the rationales I so far heard for this particular prioritization
were not very convincing to say the least. Frankly I suspect Ingo
hadn't actually looked at the gbpage code really before coming up with

According to you and Ingo "the global perspective" is to get
simple stuff first in. But in this case you're doing the complicated
(and worse the unfinished) stuff first which seems to be against
your own principles.

-Andi
--
No, the global perspective is to get a stable and reliable system,
which allows us to do new features like gbpages, PAT and whatever
comes up next in a clean way.

Your patches just shove another extra into the existing code base
without doing any consolidation work and without any consideration of
problems we need to urgently solve in this area.

Your only care is to get stuff merged which is interesting for you. I
can understand that, but it should be entirely clear to you as an
engineer that ignoring the existing problems and adding more (even
simple) stuff makes it more complex to consolidate and is nothing else
than bad engineering.

PAT is high on the requirements list, not because it's not complex (it
definitely is), but simply because Linux has a years-long backlog
(it's the only modern OS on the planet still not using PAT) and
hardware makers are stepping beyond the limits of MTRRs. There is an
increasing number of systems which don't work under Linux properly due
to the MTRR limitations, but work perfectly fine with other
OSs. Should we ignore that ?

While PAT is a 10 years old hardware feature, gbpages is a feature for
a brand new chip, which is not even available to mere mortals in a
useable form. And there is no real problem with not having gbpages for
some time. So where is the pressure to get that in? Just because it
can be done and happens to work on some test machine?

PAT patches have been around for years and nothing happened - while
the first time gbpages were submitted was 19 days ago by you.

Of all pending features, PAT has a priority simply because it
affects users. The lack of gbpages does not. We are not going to rush
PAT in before it is stable, but we hold everything off which
interferes with getting it to that point.

Please stop arguing around with the subtle undertone of us having no
clue about the topics. We looked into the whole set of pending issues,
including your gbpages patchset and we well understand the
implications. It is quite cl...
I fixed the problems in CPA I was aware of -- I'm not aware of

Very true -- by definition I'm not interested in things I'm not interested
in. Thanks for reminding me of that :-) However it would surprise

Can you elaborate on the existing problems in the CPA code?
(excluding issues already fixed in my CPA patch series)

Actually I'm not aware of any shipping box that doesn't work currently on Linux
because of no PAT or MTRRs. Do you have an example? I know BIOS
people have been grumbling about it, but I don't think there were
any real show stoppers so far.

It is pretty hard to imagine that ever being the case anyways. We already
did non caching mappings for quite some time using the page tables
(although admittedly not fully correct and a little unsafe, but probably
well enough in practice). The only value add that you get from
true PAT support is write-combining and write combining over uncacheable
is always only an optimization; nothing required to make
boxes work.

Admittedly it is helpful for 3d graphics, but the current state
is that the big out of tree 3d stuff reprograms the PAT registers
on its own. While replacing that with an in tree solution
will be a good idea it is not really all that urgent.

But I'm not saying that PAT shouldn't be merged
anyways -- i wouldn't have worked on these patches earlier
if that was the case -- i'm just disagreeing on you
saying it is more important than anything else. I also think
it will take longer to make it really stable enough to be mergeable
(.26 target would be probably ambitious) so I don't think other patches

AMD shipped over 400k of them last quarter and they are perfectly usable

For me it's mostly that I was sitting too long on that patch
(ok that's my own fault) so I finally want to get it out.

Also I don't know of any real reason to delay it much longer -- it is
not particularly tricky and contrary to your claims it does not actually
interact in a great way with PAT or anything else prett...
that is (yet another) major misconception on your part. "Drivers" are an
easy to blame target (i guess because there's no one out there to defend
a vague "drivers" accusation), and they are not the problem here _at
all_.

Drivers tell the architecture code which physical pages they'd like to
have access to (or which page range they'd like to see different cache
attributes on) and that's it. They are plain users of the ioremap() and
change_page_attr() APIs. Nothing more, nothing less.

It is the utmost duty of architecture code to make those APIs
fool-proof. Hardware _will_ mess up the physical parameters that get
passed in every possible way - and drivers just try to use what the
hardware tells them to use. So robustness is key and there's just no
"driver reason" why these APIs cannot be robust.

so you are delusional if you think that the c_p_a() problems are "driver
and other subsystem interactions".

And your analogy with power management could not be more mistaken. Power
management and suspend/resume in particular is so complex because it is
analogous to a _full bootup and shutdown cycle_, with the following,
hard to meet expectation from the user: 'this stuff must work all the
time, and must be instantaneous'. Suspend/resume is an _incredibly
complex_ machinery and the user does not realize (and does not accept
the consequences) of this complexity. It is a codepath that is affected
by tens and tens of thousands of driver and core kernel code. Just one
single mistake and "resume does not work".

ioremap() and change_page_attr() on the other hand is a small, few
hundred lines codebase for a stable and well-defined purpose. There's no
significant "subsystem interactions" whatsoever.

by far the most intense and most high-frequency user of the
change_page_attr() code is CONFIG_DEBUG_PAGEALLOC=y. It does a cpa call
for every single page and slab allocation/freeing. But this debug
feature ... is not enabled on the 64-bit side - why? So unfortunately w...
In this case the problem is that drivers ask for different caching mode
on overlapping IO mappings. That only gets resolved by the underlying

That's true, but the problem is that they give different conflicting

Does it?

ioremap as an ABI is robust I believe, but still it has some basic
requirements like nobody passing in conflicting requests.

Are you saying the underlying ioremap() interface should silently
change the caching mode if the driver passes in a conflicting one?

I have my doubts that would be a good strategy. Better probably

Point of that being? I added a separate stress tester instead
that actually tests far more cases and actually verifies that it
works.

My take is rather that with my changes DEBUG_PAGEALLOC is significantly
cheaper than it was before (e.g. it runs lockless now) so actually
more people can use it for debugging.

And after all we have millions of lines of code who can benefit
from DEBUG_PAGEALLOC and will benefit from it being cheaper, while cpa
is just a few hundred lines of code that we will hopefully eventually
get right anyways.

-Andi
--
Or rather instead of git reverting drop them completely. I'm sure it can
be done somehow. You should also move

  CPA: Implement change_page_attr_addr entry point for i386
  Similar to 64bit.
  Needed by PAT patches. Replaces 5ec5c5a2302ca8794da03f8bedec931a2a814ae9
  Note: should probably be put before PAT patches to avoid bisect failures later

and

  CPA Handle 4K split pages at boot on 64bit
  Port the code to check for already split 4K pages at boot over from
  32bit to 64bit.

  Note: should be probably put before PAT patches to avoid bisect failures later
  Signed-off-by: Andi Kleen <ak@suse.de>

in front of the PAT patches like described in their descriptions. I put
them at the beginning and they are independent so that should be possible
without conflicts.

I'm not sure what git commands to use for that, but I'm sure it can be
done somehow.

-Andi
--
updated config attached. (the previous one ran through 'make oldconfig'
on the upstream kernel so it lost its x86.git#mm entries)

Ingo
--
The SMP trampoline always runs in real mode, so making it executable
in the page tables doesn't make much sense because it executes
before page tables are set up. That was the only user of
set_kernel_exec(). Remove set_kernel_exec().

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/kernel/smpboot_32.c | 11 -----------
arch/x86/mm/init_32.c | 30 ------------------------------
include/asm-x86/pgtable_32.h | 12 ------------
3 files changed, 53 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -535,36 +535,6 @@ static void __init set_nx(void)
}
}
}
-
-/*
- * Enables/disables executability of a given kernel page and
- * returns the previous setting.
- */
-int __init set_kernel_exec(unsigned long vaddr, int enable)
-{
- pte_t *pte;
- int ret = 1;
- int level;
-
- if (!nx_enabled)
- goto out;
-
- pte = lookup_address(vaddr, &level);
- BUG_ON(!pte);
-
- if (!pte_exec_kernel(*pte))
- ret = 0;
-
- if (enable)
- pte->pte_high &= ~(1 << (_PAGE_BIT_NX - 32));
- else
- pte->pte_high |= 1 << (_PAGE_BIT_NX - 32);
- pte_update_defer(&init_mm, vaddr, pte);
- __flush_tlb_all();
-out:
- return ret;
-}
-
#endif

/*
Index: linux/include/asm-x86/pgtable_32.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_32.h
+++ linux/include/asm-x86/pgtable_32.h
@@ -195,18 +195,6 @@ static inline void clone_pgd_range(pgd_t
*/
extern pte_t *lookup_address(unsigned long address, int *level);

-/*
- * Make a given kernel text page executable/non-executable.
- * Returns the previous executability setting of that page (which
- * is used to restore the previous state). Used by the SMP bootup code.
- * NOTE: this is an __init function for security reasons.
- */
-#ifdef...
- Rename it to pte_exec() from pte_exec_kernel(). There is nothing
kernel specific in there.
- Move it into the common file because _PAGE_NX is 0 on !PAE and then
pte_exec() will always evaluate to true.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/fault_32.c | 2 +-
include/asm-x86/pgtable-2level.h | 8 --------
include/asm-x86/pgtable-3level.h | 8 --------
include/asm-x86/pgtable.h | 1 +
4 files changed, 2 insertions(+), 17 deletions(-)

Index: linux/include/asm-x86/pgtable-2level.h
===================================================================
--- linux.orig/include/asm-x86/pgtable-2level.h
+++ linux/include/asm-x86/pgtable-2level.h
@@ -56,14 +56,6 @@ static inline pte_t native_ptep_get_and_
#define pte_pfn(x) (pte_val(x) >> PAGE_SHIFT)

/*
- * All present pages are kernel-executable:
- */
-static inline int pte_exec_kernel(pte_t pte)
-{
- return 1;
-}
-
-/*
* Bits 0, 6 and 7 are taken, split up the 29 bits of offset
* into this range:
*/
Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -153,6 +153,7 @@ static inline int pte_write(pte_t pte)
static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; }
static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_PSE; }
static inline int pte_global(pte_t pte) { return pte_val(pte) & _PAGE_GLOBAL; }
+static inline int pte_exec(pte_t pte) { return !(pte_val(pte) & _PAGE_NX); }

static inline int pmd_large(pmd_t pte) {
return (pmd_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
Index: linux/include/asm-x86/pgtable-3level.h
===================================================================
--- linux.orig/include/asm-x86/pgtable-3level.h
+++ linux/include/asm-x86/pgtable-3level.h
@@ -19,14 +19,6 @@
#define pu...
Someone setting NX on the kernel text tends to result in nasty failures
and triple faults, so BUG_ON early for that.

Does not cover __inittext.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 8 ++++++++
1 file changed, 8 insertions(+)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -242,6 +242,14 @@ __change_page_attr(struct page *page, pg
BUG_ON(PageLRU(kpte_page));
BUG_ON(PageCompound(kpte_page));

+ /*
+ * Better fail early if someone sets the kernel text to NX.
+ * Does not cover __inittext
+ */
+ BUG_ON(address >= (unsigned long)&_text &&
+ address < (unsigned long)&_etext &&
+ (pgprot_val(prot) & _PAGE_NX));
+
set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
level < 3);
--
And clarify description a bit.
Only for 64bit, but the interfaces are identical for 32bit and kerneldoc should
merge them (?)

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
Documentation/DocBook/kernel-api.tmpl | 8 +++++
arch/x86/mm/pageattr_64.c | 46 +++++++++++++++++++++++++---------
2 files changed, 42 insertions(+), 12 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -266,19 +266,19 @@ __change_page_attr(unsigned long address
return 0;
}

-/*
- * Change the page attributes of an page in the linear mapping.
- *
- * This should be used when a page is mapped with a different caching policy
- * than write-back somewhere - some CPUs do not like it when mappings with
- * different caching policies exist. This changes the page attributes of the
- * in kernel linear mapping too.
+/**
+ * change_page_attr_addr - Change page table attributes in linear mapping
+ * @address: Virtual address in linear mapping.
+ * @numpages: Number of pages to change
+ * @prot: New page table attribute (PAGE_*)
*
- * The caller needs to ensure that there are no conflicting mappings elsewhere.
- * This function only deals with the kernel linear map.
- *
- * Caller must call global_flush_tlb() after this.
+ * Change page attributes of a page in the direct mapping. This is a variant
+ * of change_page_attr() that also works on memory holes that do not have
+ * mem_map entry (pfn_valid() is false).
+ *
+ * See change_page_attr() documentation for more details.
*/
+
int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
{
int err = 0, kernel_map = 0;
@@ -315,13 +315,35 @@ int change_page_attr_addr(unsigned long
return err;
}

-/* Don't call this for MMIO areas that may not have a mem_map entry */
+/**
+ * change_page_at...
The boot direct mapping initialization used a different test to check if a
page was part of the kernel mapping than c_p_a(). Make them use
a common function.

Also round up to a large page size to be sure and check for the beginning
of the kernel address to handle highly loaded kernels better.

This gives a small semantic change of NX applying to always 2MB areas
on !PSE && NX systems, but that's an obscure case even considering
DEBUG_PAGEALLOC.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/init_32.c | 16 ++--------------
arch/x86/mm/pageattr_32.c | 9 ++++++++-
include/asm-x86/pgtable_32.h | 2 +-
3 files changed, 11 insertions(+), 16 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -184,6 +184,13 @@ static int cache_attr_changed(pte_t pte,
return a != (pgprot_val(prot) & _PAGE_CACHE);
}

+int text_address(unsigned long addr)
+{
+ unsigned long start = ((unsigned long)&_text) & LARGE_PAGE_MASK;
+ unsigned long end = ((unsigned long)&__init_end) & LARGE_PAGE_MASK;
+ return addr >= start && addr < end + LARGE_PAGE_SIZE;
+}
+
/*
* Mark the address for flushing later in global_tlb_flush().
*
@@ -238,7 +245,7 @@ __change_page_attr(struct page *page, pg
set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
level < 3);

- if ((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
+ if (text_address(address))
ref_prot = PAGE_KERNEL_EXEC;

ref_prot = canon_pgprot(ref_prot);
Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -136,13 +136,6 @@ static void __init page_table_range_init
}
}

-static inline int is_kernel_te...
When changing a page that has already been modified to non standard attributes
before don't change the reference count. And when changing back a page
only decrease the ref count if the old attributes were non standard.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 44 +++++++++++++++++++++++++-------------------
arch/x86/mm/pageattr_64.c | 16 ++++++++++++----
include/asm-x86/pgtable.h | 2 ++
3 files changed, 39 insertions(+), 23 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -206,20 +206,26 @@ __change_page_attr(unsigned long address
{
pte_t *kpte;
struct page *kpte_page;
- pgprot_t ref_prot2;
+ pgprot_t ref_prot2, oldprot;
int level;

kpte = lookup_address(address, &level);
if (!kpte) return 0;
kpte_page = virt_to_page(kpte);
+ oldprot = pte_pgprot(*kpte);
BUG_ON(PageCompound(kpte_page));
BUG_ON(PageLRU(kpte_page));

set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
level < 4);

+ ref_prot = canon_pgprot(ref_prot);
+ prot = canon_pgprot(prot);
+
if (pgprot_val(prot) != pgprot_val(ref_prot)) {
if (level == 4) {
+ if (pgprot_val(oldprot) == pgprot_val(ref_prot))
+ page_private(kpte_page)++;
set_pte(kpte, pfn_pte(pfn, prot));
} else {
/*
@@ -234,12 +240,14 @@ __change_page_attr(unsigned long address
pgprot_val(ref_prot2) &= ~_PAGE_NX;
set_pte(kpte, mk_pte(split, ref_prot2));
kpte_page = split;
+ page_private(kpte_page)++;
}
- page_private(kpte_page)++;
} else if (level == 4) {
+ if (pgprot_val(oldprot) != pgprot_val(ref_prot)) {
+ BUG_ON(page_private(kpte_page) <= 0);
+ page_private(kpte_page)--;
+ }
set_pte(kpte, pfn_pte(pfn, ref_prot));
- BUG_ON(page_private(kpte_page) == 0);
- page_private(kpte_page)--;
...
Various CPUs have errata when using INVLPG to flush large pages.
This includes Intel Penryn (AV2) and AMD K7 (#16 in Athlon 4).
While these happen only in specific circumstances it is still
a little risky and it's not clear the kernel can avoid them all.

Avoid this can of worms by always flushing the full TLB (but
not the full cache) when splitting a large page. This should
not be that expensive anyways and initial splitting should be
hopefully infrequent.

This also removes the previously hard coded workaround for K7
Athlon on 32bit.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 17 +++++++++++------
arch/x86/mm/pageattr_64.c | 12 ++++++++++--
2 files changed, 21 insertions(+), 8 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -113,10 +113,7 @@ static void flush_kernel_map(void *arg)
}
}

- /* Handle errata with flushing large pages in early Athlons */
- if (a->full_flush ||
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
- boot_cpu_data.x86 == 7))
+ if (a->full_flush)
__flush_tlb_all();

/*
@@ -198,7 +195,7 @@ static int cache_attr_changed(pte_t pte,
* data structure to keep track of the flush. This has the added bonus that
* it will work for MMIO holes without mem_map too.
*/
-static void set_tlb_flush(unsigned long address, int cache)
+static void set_tlb_flush(unsigned long address, int cache, int large)
{
enum flush_mode mode = cache ? FLUSH_CACHE : FLUSH_TLB;
struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
@@ -210,6 +207,13 @@ static void set_tlb_flush(unsigned long
f->addr = address;
f->mode = mode;
list_add_tail(&f->l, &flush_pages);
+
+ /*
+ * Work around large page INVLPG bugs in early K7 and in Penryn.
+ * When we split a ...
Otherwise the kernel will likely always run with 4K pages instead of 2MB pages,
which is costly in terms of TLB entries.

Also optimize it a little bit by using only a single change_page_attr() call.
This is particularly useful if debugging is enabled inside it because it spams
the logs much less.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/init_64.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -694,13 +694,16 @@ void free_init_pages(char *what, unsigne
init_page_count(virt_to_page(addr));
memset((void *)(addr & ~(PAGE_SIZE-1)),
POISON_FREE_INITMEM, PAGE_SIZE);
- if (addr >= __START_KERNEL_map)
- change_page_attr_addr(addr, 1, __pgprot(0));
free_page(addr);
totalram_pages++;
}
- if (addr > __START_KERNEL_map)
+#ifdef CONFIG_DEBUG_RODATA
+ if (begin >= __START_KERNEL_map) {
+ change_page_attr_addr(begin, (end - begin)/PAGE_SIZE,
+ __pgprot(0));
global_flush_tlb();
+ }
+#endif
}

void free_initmem(void)
--
virt_to_page does not care about the bits below the page granularity.
So don't mask them.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -204,7 +204,7 @@ __change_page_attr(unsigned long address

kpte = lookup_address(address, &level);
if (!kpte) return 0;
- kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
+ kpte_page = virt_to_page(kpte);
BUG_ON(PageCompound(kpte_page));
BUG_ON(PageLRU(kpte_page));
--
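Why the mask removed by the virt_to_page() change above is redundant can be checked in user space. This is only a minimal sketch, assuming 4K pages and assuming that the page lookup boils down to a right shift by the page shift. The SKETCH_* names and both functions are hypothetical and not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Page frame index of an address: the shift already discards the offset. */
static uintptr_t sketch_addr_to_pfn(uintptr_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Masking the low bits first cannot change the result. */
static void sketch_check_mask_is_redundant(uintptr_t addr)
{
	assert(sketch_addr_to_pfn(addr) ==
	       sketch_addr_to_pfn(addr & SKETCH_PAGE_MASK));
}
```

Calling sketch_check_mask_is_redundant() with any address passes, which is the whole argument for dropping the "& PAGE_MASK".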
With the separate data structure added for flush earlier, save_page()
now only needs to be called on pte pages that have already been reverted.

Also move all freeing checks into the caller.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 4 +---
arch/x86/mm/pageattr_64.c | 7 +++----
2 files changed, 4 insertions(+), 7 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -270,9 +270,9 @@ __change_page_attr(struct page *page, pg
* replace it with a largepage.
*/

- save_page(kpte_page);
if (!PageReserved(kpte_page)) {
if (cpu_has_pse && (page_private(kpte_page) == 0)) {
+ save_page(kpte_page);
paravirt_release_pt(page_to_pfn(kpte_page));
revert_page(kpte_page, address);
}
@@ -349,8 +349,6 @@ void global_flush_tlb(void)
list_for_each_entry_safe(pg, next, &free_pages, lru) {
list_del(&pg->lru);
ClearPageDeferred(pg);
- if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)
- continue;
ClearPagePrivate(pg);
__free_page(pg);
}
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -243,9 +243,10 @@ __change_page_attr(unsigned long address
BUG();
}

- save_page(kpte_page);
- if (!PageReserved(kpte_page) && page_private(kpte_page) == 0)
+ if (!PageReserved(kpte_page) && page_private(kpte_page) == 0) {
+ save_page(kpte_page);
revert_page(address, ref_prot);
+ }
return 0;
}

@@ -335,8 +336,6 @@ void global_flush_tlb(void)
if (PageReserved(pg))
continue;
ClearPageDeferred(pg);
- if (page_private(pg) != 0)
- continue;
ClearPagePrivate(pg);
__free_page(pg);
}
--
No code changes.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -120,7 +120,7 @@ static void flush_kernel_map(void *arg)
wbinvd();
}

-/* both protected by init_mm.mmap_sem */
+/* All protected by init_mm.mmap_sem */
static enum flush_mode full_flush;
static LIST_HEAD(deferred_pages);
static LIST_HEAD(flush_pages);
@@ -132,7 +132,7 @@ static inline void save_page(struct page
}

/*
- * No more special protections in this 2/4MB area - revert to a
+ * No more special protections in this 2MB area - revert to a
* large page again.
*/
static void revert_page(unsigned long address, pgprot_t ref_prot)
--
When c_p_a() detects an inconsistency in the kernel page tables
it BUGs. When this happens dump the page table first to avoid one
bug reporting round trip.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 11 ++++++++++-
arch/x86/mm/pageattr_64.c | 11 ++++++++++-
2 files changed, 20 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -12,6 +12,7 @@
#include <asm/processor.h>
#include <asm/tlbflush.h>
#include <asm/io.h>
+#include <asm/kdebug.h>

enum flush_mode { FLUSH_NONE, FLUSH_CACHE, FLUSH_TLB };
@@ -231,8 +232,16 @@ __change_page_attr(unsigned long address
set_pte(kpte, pfn_pte(pfn, ref_prot));
BUG_ON(page_private(kpte_page) == 0);
page_private(kpte_page)--;
- } else
+ } else {
+ /*
+ * When you're here you either set the same page to PAGE_KERNEL
+ * two times in a row or the page table reference counting is
+ * broken again. To catch the later bug for now (sorry)
+ */
+ printk(KERN_ERR "address %lx\n", address);
+ dump_pagetable(address);
BUG();
+ }

save_page(kpte_page);
if (!PageReserved(kpte_page) && page_private(kpte_page) == 0)
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -13,6 +13,7 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/sections.h>
+#include <asm/kdebug.h>

#define PG_deferred PG_arch_1
@@ -252,8 +253,16 @@ __change_page_attr(struct page *page, pg
set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
BUG_ON(page_private(kpte_page) == 0);
page_private(kpte_page)--;
- } else
+ } else {
+ /*
+ * When you're here you either...
Previously change_page_attr always flushed caches even for
pages that only change a non caching related attribute (like RO for
read/write protection).

This changes the flush code to only flush the cache when the
caching attributes actually change.

I made some effort to already handle reprogrammed PAT bits, although this
is not strictly needed right now by the core kernel (but that will
probably change soon).

This will only make a difference on AMD CPUs or older Intel CPUs,
because all newer Intel CPUs support "self-snoop" and do not require
this cache flushing anyways.

Another advantage of this patch is that it prevents recursive
slab calls with slab debugging.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 43 +++++++++++++++++++++++++++++++++----------
arch/x86/mm/pageattr_64.c | 42 ++++++++++++++++++++++++++++++++++--------
include/asm-x86/pgtable.h | 8 ++++++++
3 files changed, 75 insertions(+), 18 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -13,7 +13,10 @@
#include <asm/tlbflush.h>
#include <asm/io.h>

+enum flush_mode { FLUSH_NONE, FLUSH_CACHE, FLUSH_TLB };
+
struct flush {
+ enum flush_mode mode;
struct list_head l;
unsigned long addr;
};
@@ -76,7 +79,7 @@ static struct page *split_large_page(uns
}

struct flush_arg {
- int full_flush;
+ enum flush_mode full_flush;
struct list_head l;
};

@@ -91,14 +94,19 @@ static void flush_kernel_map(void *arg)
{
struct flush_arg *a = (struct flush_arg *)arg;
struct flush *f;
+ int cache_flush = a->full_flush == FLUSH_CACHE;

/* When clflush is available always use it because it is
much cheaper than WBINVD. */
list_for_each_entry(f, &a->l, l) {
if (!a->full_flush)
__flush_tlb_one(f->addr);
-...
Use the page table level instead of the PSE bit to check if the PTE
is for a 4K page or not. This makes the code more robust when the PAT
bit is changed because the PAT bit on 4K pages is in the same position as the
PSE bit.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 4 ++--
arch/x86/mm/pageattr_64.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -209,7 +209,7 @@ __change_page_attr(struct page *page, pg
set_tlb_flush(address);

if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
- if (!pte_huge(*kpte)) {
+ if (level == 3) {
set_pte_atomic(kpte, mk_pte(page, prot));
} else {
pgprot_t ref_prot;
@@ -225,7 +225,7 @@ __change_page_attr(struct page *page, pg
kpte_page = split;
}
page_private(kpte_page)++;
- } else if (!pte_huge(*kpte)) {
+ } else if (level == 3) {
set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
BUG_ON(page_private(kpte_page) == 0);
page_private(kpte_page)--;
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -184,7 +184,7 @@ __change_page_attr(unsigned long address
set_tlb_flush(address);

if (pgprot_val(prot) != pgprot_val(ref_prot)) {
- if (!pte_huge(*kpte)) {
+ if (level == 4) {
set_pte(kpte, pfn_pte(pfn, prot));
} else {
/*
@@ -201,7 +201,7 @@ __change_page_attr(unsigned long address
kpte_page = split;
}
page_private(kpte_page)++;
- } else if (!pte_huge(*kpte)) {
+ } else if (level == 4) {
set_pte(kpte, pfn_pte(pfn, ref_prot));
BUG_ON(page_private(kpte_page) == 0);
page_private(kpte_page)--;
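
The reason the level check is more robust than the bit check can be shown with a small user-space model. This is a hedged sketch, not the kernel code: it only assumes that bit 7 of an entry means PSE at the large page level and PAT in a 4K PTE, as the commit message states. All sketch_* and SKETCH_* names are hypothetical, and the level numbers follow the patch above (4 is the 4K level on 64bit).

```c
#include <assert.h>
#include <stdint.h>

/* Bit 7 means PSE in a large page entry but PAT in a 4K PTE. */
#define SKETCH_PAGE_BIT7 ((uint64_t)1 << 7)

/* Old style test: trusts bit 7 alone. */
static int sketch_is_4k_by_bit(uint64_t entry)
{
	return !(entry & SKETCH_PAGE_BIT7);
}

/* New style test: trusts the level reported by the page table walk. */
static int sketch_is_4k_by_level(int level, int lowest_level)
{
	return level == lowest_level;
}

/* A 4K PTE with PAT set fools the bit test but not the level test. */
static void sketch_pat_demo(void)
{
	uint64_t pte_4k_with_pat = 0x1000 | SKETCH_PAGE_BIT7 | 0x1;

	assert(!sketch_is_4k_by_bit(pte_4k_with_pat));
	assert(sketch_is_4k_by_level(4, 4));
}
```

The bit test misclassifies the 4K entry as soon as PAT is set, while the level test does not depend on the entry's contents at all.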
--
Intel recommends first flushing the TLBs and then the caches
on caching attribute changes. c_p_a() previously did it the
other way round. Reorder that.

The procedure is still not fully compliant with the Intel documentation
because Intel recommends an all-CPU synchronization step between
the TLB flushes and the cache flushes.

However on all new Intel CPUs this is now meaningless anyways
because they support Self-Snoop and can skip the cache flush
step.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 13 ++++++++++---
arch/x86/mm/pageattr_64.c | 14 ++++++++++----
2 files changed, 20 insertions(+), 7 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -97,9 +97,6 @@ static void flush_kernel_map(void *arg)
struct flush_arg *a = (struct flush_arg *)arg;
struct flush *f;

- if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4 &&
- !cpu_has_ss)
- wbinvd();
list_for_each_entry(f, &a->l, l) {
if (!a->full_flush && !cpu_has_ss)
clflush_cache_range((void *)f->addr, PAGE_SIZE);
@@ -112,6 +109,16 @@ static void flush_kernel_map(void *arg)
(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
boot_cpu_data.x86 == 7))
__flush_tlb_all();
+
+ /*
+ * RED-PEN: Intel documentation ask for a CPU synchronization step
+ * here and in the loop. But it is moot on Self-Snoop CPUs anyways.
+ */
+
+ if ((!cpu_has_clflush || a->full_flush) &&
+ !cpu_has_ss && boot_cpu_data.x86_model >= 4)
+ wbinvd();
+
}

static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
...
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -95,7 +95,7 @@ static void flush_kernel_map(void *arg)
/* When clflush is available always use it because it is
much cheaper than WBINVD. */
if ((a->full_flush || !cpu_has_clflush) && !cpu_has_ss)
- asm volatile("wbinvd" ::: "memory");
+ wbinvd();
list_for_each_entry(f, &a->l, l) {
if (!a->full_flush && !cpu_has_ss)
clflush_cache_range((void *)f->addr, PAGE_SIZE);
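
The cache flush policy visible in the context lines of the hunk above can be condensed into one decision function. This is a simplified user-space sketch: it folds the two conditions into a single choice and ignores the TLB side entirely. The enum and function names are hypothetical; only the three inputs (full_flush, clflush support, self-snoop support) come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_cache_action {
	SKETCH_CACHE_NONE,	/* self-snoop: leave the caches alone */
	SKETCH_CACHE_WBINVD,	/* write back and invalidate everything */
	SKETCH_CACHE_CLFLUSH	/* flush only the queued pages */
};

static enum sketch_cache_action
sketch_pick_cache_action(bool full_flush, bool has_clflush, bool has_ss)
{
	if (has_ss)
		return SKETCH_CACHE_NONE;
	if (full_flush || !has_clflush)
		return SKETCH_CACHE_WBINVD;
	return SKETCH_CACHE_CLFLUSH;
}

static void sketch_cache_action_demo(void)
{
	assert(sketch_pick_cache_action(true, true, true) == SKETCH_CACHE_NONE);
	assert(sketch_pick_cache_action(false, false, false) == SKETCH_CACHE_WBINVD);
	assert(sketch_pick_cache_action(false, true, false) == SKETCH_CACHE_CLFLUSH);
}
```

Self-snoop wins over everything else, which is why the series expects the cache flush cost to disappear on newer Intel CPUs.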
--
When the self-snoop CPUID bit is set change_page_attr() only needs to flush
TLBs, but not the caches.

The description of self-snoop in the Intel manuals is a bit vague
but I got confirmation that this is what SS really means.

This should improve c_p_a() performance significantly on newer
Intel CPUs.

Note: the line > 80 characters will be modified again in a followup.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 5 +++--
arch/x86/mm/pageattr_64.c | 4 ++--
include/asm-x86/cpufeature.h | 1 +
3 files changed, 6 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -97,10 +97,11 @@ static void flush_kernel_map(void *arg)
struct flush_arg *a = (struct flush_arg *)arg;
struct flush *f;

- if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4)
+ if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4 &&
+ !cpu_has_ss)
wbinvd();
list_for_each_entry(f, &a->l, l) {
- if (!a->full_flush)
+ if (!a->full_flush && !cpu_has_ss)
clflush_cache_range((void *)f->addr, PAGE_SIZE);
if (!a->full_flush)
__flush_tlb_one(f->addr);
Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -168,6 +168,7 @@
#define cpu_has_clflush boot_cpu_has(X86_FEATURE_CLFLSH)
#define cpu_has_bts boot_cpu_has(X86_FEATURE_BTS)
#define cpu_has_pat boot_cpu_has(X86_FEATURE_PAT)
+#define cpu_has_ss boot_cpu_has(X86_FEATURE_SELFSNOOP)

#if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
# define cpu_has_invlpg 1
Index: linux/arch/x86/mm/pageattr_64.c
===================...
With the infrastructure added for CLFLUSH it is possible
to only TLB flush the actually changed pages in change_page_attr().

Take care of old Athlon K7 Errata on the 32bit version.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 15 ++++++++-------
arch/x86/mm/pageattr_64.c | 10 +++++-----
2 files changed, 13 insertions(+), 12 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -97,19 +97,20 @@ static void flush_kernel_map(void *arg)
struct flush_arg *a = (struct flush_arg *)arg;
struct flush *f;

- if (!cpu_has_clflush)
- a->full_flush = 1;
- if (a->full_flush && boot_cpu_data.x86_model >= 4)
+ if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4)
wbinvd();
list_for_each_entry(f, &a->l, l) {
if (!a->full_flush)
clflush_cache_range((void *)f->addr, PAGE_SIZE);
+ if (!a->full_flush)
+ __flush_tlb_one(f->addr);
}

- /* Flush all to work around Errata in early athlons regarding
- * large page flushing.
- */
- __flush_tlb_all();
+ /* Handle errata with flushing large pages in early Athlons */
+ if (a->full_flush ||
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 == 7))
+ __flush_tlb_all();
}

static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -92,18 +92,18 @@ static void flush_kernel_map(void *arg)
struct flush_arg *a = (struct flush_arg *)arg;
struct flush *f;

- if (!cpu_has_clflush)
- a->full_flush = 1;
-
/* When clflush is available always use it because it is
much...
Instead of open coding the bit accesses use standard style
*PageDeferred* macros.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 14 ++++++++++----
arch/x86/mm/pageattr_64.c | 11 +++++++++--
2 files changed, 19 insertions(+), 6 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -18,6 +18,13 @@ struct flush {
unsigned long addr;
};

+#define PG_deferred PG_arch_1
+
+#define PageDeferred(p) test_bit(PG_deferred, &(p)->flags)
+#define SetPageDeferred(p) set_bit(PG_deferred, &(p)->flags)
+#define ClearPageDeferred(p) clear_bit(PG_deferred, &(p)->flags)
+#define TestSetPageDeferred(p) test_and_set_bit(PG_deferred, &(p)->flags)
+
pte_t *lookup_address(unsigned long address, int *level)
{
pgd_t *pgd = pgd_offset_k(address);
@@ -106,7 +113,7 @@ static LIST_HEAD(flush_pages);

static inline void save_page(struct page *fpage)
{
- if (!test_and_set_bit(PG_arch_1, &fpage->flags))
+ if (!TestSetPageDeferred(fpage))
list_add(&fpage->lru, &deferred_pages);
}

@@ -286,7 +293,7 @@ void global_flush_tlb(void)
list_del(&pg->lru);
if (PageReserved(pg))
continue;
- clear_bit(PG_arch_1, &pg->flags);
+ ClearPageDeferred(pg);
if (page_private(pg) != 0)
continue;
ClearPagePrivate(pg);
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -14,8 +14,14 @@
#include <asm/pgalloc.h>
#include <asm/sections.h>

-/* Protected by init_mm.mmap_sem */
-/* Variables protected by cpa_lock */
+#define PG_deferred PG_arch_1
+
+#define PageDeferred(p) test_bit(PG_deferred, &(p)->flags)
+#define SetPageDeferr...
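
What the TestSetPageDeferred() pattern buys save_page() can be modelled in user space with C11 atomics. This is a sketch under assumptions: atomic_fetch_or() stands in for the kernel's test_and_set_bit(), the flag bit position is made up, and a counter stands in for adding the page to the deferred list. None of the sketch_* names exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PG_DEFERRED (1UL << 0)	/* hypothetical flag bit */

struct sketch_page {
	atomic_ulong flags;
	int times_queued;	/* stands in for list_add() onto deferred_pages */
};

/* Set the flag and report whether it was already set, in one atomic step. */
static bool sketch_test_set_deferred(struct sketch_page *p)
{
	return (atomic_fetch_or(&p->flags, SKETCH_PG_DEFERRED) &
		SKETCH_PG_DEFERRED) != 0;
}

/* Queue the page only the first time it is saved. */
static void sketch_save_page(struct sketch_page *p)
{
	if (!sketch_test_set_deferred(p))
		p->times_queued++;
}

static void sketch_deferred_demo(void)
{
	struct sketch_page pg;

	atomic_init(&pg.flags, 0);
	pg.times_queued = 0;
	sketch_save_page(&pg);
	sketch_save_page(&pg);
	assert(pg.times_queued == 1);
}
```

Saving the same page twice queues it once, so the list can never contain a page two times.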
Queue individual data pages for flushing with CLFLUSH in change_page_attr(),
instead of doing global WBINVDs. WBINVD is a very painful operation
for the CPU (can take msecs) and quite slow too. Worse it is not interruptible
and can cause long latencies on hypervisors on older Intel VT systems.

CLFLUSH on the other hand only flushes the cache lines that actually need to be
flushed and since it works in smaller chunks is more preemptible.

To do this c_p_a needs to save the addresses to be flushed for global_flush_tlb()
later. This is done using a separate data structure, not struct page,
because page->lru is often used or not there for memory holes.

Also the flushes are done in FIFO order now, not LIFO.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 78 ++++++++++++++++++++++++++++++++++------------
arch/x86/mm/pageattr_64.c | 77 ++++++++++++++++++++++++++++++++++-----------
2 files changed, 118 insertions(+), 37 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -13,6 +13,11 @@
#include <asm/tlbflush.h>
#include <asm/io.h>

+struct flush {
+ struct list_head l;
+ unsigned long addr;
+};
+
pte_t *lookup_address(unsigned long address, int *level)
{
pgd_t *pgd = pgd_offset_k(address);
@@ -63,6 +68,11 @@ static struct page *split_large_page(uns
return base;
}

+struct flush_arg {
+ int full_flush;
+ struct list_head l;
+};
+
void clflush_cache_range(void *adr, int size)
{
int i;
@@ -72,27 +82,27 @@ void clflush_cache_range(void *adr, int

static void flush_kernel_map(void *arg)
{
- struct list_head *l = (struct list_head *)arg;
- struct page *pg;
+ struct flush_arg *a = (struct flush_arg *)arg;
+ struct flush *f;
+
+ if (!cpu_has_clflush)
+ a->full_flush = 1;/* When clflu...
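
The queueing idea from this patch (a separate per-address record, appended at the tail so flushes drain in FIFO order) can be sketched in user space as follows. This is not the patch's implementation: it uses a plain singly linked list and malloc() instead of list_head and kmalloc(), and falling back to a full flush when the allocation fails is an assumption made for the sketch. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_flush {
	struct sketch_flush *next;
	unsigned long addr;
};

struct sketch_flush_queue {
	struct sketch_flush *head;
	struct sketch_flush **tail;	/* points at the link to fill next */
	bool full_flush;		/* assumed fallback if allocation fails */
};

static void sketch_queue_init(struct sketch_flush_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
	q->full_flush = false;
}

/* Append at the tail so entries are later drained in FIFO order. */
static void sketch_queue_flush(struct sketch_flush_queue *q, unsigned long addr)
{
	struct sketch_flush *f = malloc(sizeof(*f));

	if (!f) {
		q->full_flush = true;
		return;
	}
	f->addr = addr;
	f->next = NULL;
	*q->tail = f;
	q->tail = &f->next;
}

/* Pop the oldest entry; returns false once the queue is empty. */
static bool sketch_queue_pop(struct sketch_flush_queue *q, unsigned long *addr)
{
	struct sketch_flush *f = q->head;

	if (!f)
		return false;
	*addr = f->addr;
	q->head = f->next;
	if (!q->head)
		q->tail = &q->head;
	free(f);
	return true;
}

static void sketch_queue_demo(void)
{
	struct sketch_flush_queue q;
	unsigned long a = 0;
	bool ok;

	sketch_queue_init(&q);
	sketch_queue_flush(&q, 0x1000);
	sketch_queue_flush(&q, 0x2000);
	ok = sketch_queue_pop(&q, &a);
	assert(q.full_flush || (ok && a == 0x1000));
	while (sketch_queue_pop(&q, &a))
		;
}
```

Keeping the records outside struct page is what lets the same queue cover addresses that have no struct page behind them, as the commit message notes for memory holes.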
Now that debug pagealloc uses a separate function it is better
to change standard change_page_attr back to init_mm semaphore locking like 64bit.
Various things are simpler when sleeping is allowed.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -14,10 +14,9 @@
#include <asm/pgalloc.h>
#include <asm/sections.h>

-static DEFINE_SPINLOCK(cpa_lock);
+/* Protected by init_mm.mmap_sem */
static struct list_head df_list = LIST_HEAD_INIT(df_list);

-
pte_t *lookup_address(unsigned long address, int *level)
{
pgd_t *pgd = pgd_offset_k(address);
@@ -46,9 +45,7 @@ static struct page *split_large_page(uns
struct page *base;
pte_t *pbase;

- spin_unlock_irq(&cpa_lock);
base = alloc_pages(GFP_KERNEL, 0);
- spin_lock_irq(&cpa_lock);
if (!base)
return NULL;

@@ -224,15 +221,14 @@ int change_page_attr(struct page *page,
{
int err = 0;
int i;
- unsigned long flags;

- spin_lock_irqsave(&cpa_lock, flags);
+ down_write(&init_mm.mmap_sem);
for (i = 0; i < numpages; i++, page++) {
err = __change_page_attr(page, prot);
if (err)
break;
}
- spin_unlock_irqrestore(&cpa_lock, flags);
+ up_write(&init_mm.mmap_sem);
return err;
}

@@ -259,9 +255,9 @@ void global_flush_tlb(void)
BUG_ON(irqs_disabled());
- spin_lock_irq(&cpa_lock);
+ down_write(&init_mm.mmap_sem);
list_replace_init(&df_list, &l);
- spin_unlock_irq(&cpa_lock);
+ up_write(&init_mm.mmap_sem);
flush_map(&l);
list_for_each_entry_safe(pg, next, &l, lru) {
list_del(&pg->lru);
--
CONFIG_DEBUG_PAGEALLOC uses change_page_attr to map/unmap mappings for catching
stray kernel mappings. But standard c_p_a() does a lot of unnecessary work for
this simple case with pre-split mappings.

Change kernel_map_pages to just access the page table directly which
is simpler and faster.

I also fixed it to use INVLPG if available.

This is required for changes to c_p_a() later that make it use kmalloc. Without
this we would risk infinite recursion. Also in general things are easier when
sleeping is allowed.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/pageattr_32.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -274,22 +274,36 @@ void global_flush_tlb(void)
}

#ifdef CONFIG_DEBUG_PAGEALLOC
+/* Map or unmap pages in the kernel direct mapping for kernel debugging. */
void kernel_map_pages(struct page *page, int numpages, int enable)
{
+ unsigned long addr;
+ int i;
+
if (PageHighMem(page))
return;
+ addr = (unsigned long)page_address(page);
if (!enable)
- debug_check_no_locks_freed(page_address(page),
- numpages * PAGE_SIZE);
+ debug_check_no_locks_freed((void *)addr, numpages * PAGE_SIZE);
+
+ /* Bootup has forced 4K pages so this is very simple */
+
+ for (i = 0; i < numpages; i++, addr += PAGE_SIZE, page++) {
+ int level;
+ pte_t *pte = lookup_address(addr, &level);

- /* the return value is ignored - the calls cannot fail,
- * large pages are disabled at boot time.
- */
- change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0));
- /* we should perform an IPI and flush all tlbs,
- * but that can deadlock->flush only current cpu.
- */
- __flush_tlb_all();
+ BUG_ON(level != 3);
+ if (enable) {
+ se...
Since change_page_attr() is tricky code it is good to have some regression
test code. This patch maps and unmaps some random pages in the direct mapping
at boot and then dumps the state and does some simple sanity checks.

Add it with a CONFIG option.

Optional patch, but I find it useful.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/Makefile_32 | 1
arch/x86/mm/Makefile_64 | 1
arch/x86/mm/pageattr-test.c | 233 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 235 insertions(+)

Index: linux/arch/x86/mm/Makefile_64
===================================================================
--- linux.orig/arch/x86/mm/Makefile_64
+++ linux/arch/x86/mm/Makefile_64
@@ -7,3 +7,4 @@ obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpag
obj-$(CONFIG_NUMA) += numa_64.o
obj-$(CONFIG_K8_NUMA) += k8topology_64.o
obj-$(CONFIG_ACPI_NUMA) += srat_64.o
+obj-$(CONFIG_CPA_DEBUG) += pageattr-test.o
Index: linux/arch/x86/mm/pageattr-test.c
===================================================================
--- /dev/null
+++ linux/arch/x86/mm/pageattr-test.c
@@ -0,0 +1,233 @@
+/*
+ * self test for change_page_attr.
+ *
+ * Clears the global bit on random pages in the direct mapping, then reverts
+ * and compares page tables forwards and afterwards.
+ */
+
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <asm/cacheflush.h>
+#include <asm/pgtable.h>
+#include <asm/kdebug.h>
+
+enum {
+ NTEST = 400,
+#ifdef CONFIG_X86_64
+ LOWEST_LEVEL = 4,
+ LPS = (1 << PMD_SHIFT),
+#elif defined(CONFIG_X86_PAE)
+ LOWEST_LEVEL = 3,
+ LPS = (1 << PMD_SHIFT),
+#else
+ LOWEST_LEVEL = 3, /* lookup_address lies here */
+ LPS = (1 << 22),
+#endif
+ GPS = (1<<30)
+};
+
+#ifdef CONFIG_X86_64
+#include <asm/proto.h>
+#define max_mapped end_pfn_map
...
Needed for the next change.

And change all the callers.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/fault_32.c | 3 ++-
arch/x86/mm/init_32.c | 3 ++-
arch/x86/mm/pageattr_32.c | 10 +++++++---
arch/x86/mm/pageattr_64.c | 7 +++++--
arch/x86/xen/mmu.c | 9 ++++++---
include/asm-x86/pgtable_32.h | 2 +-
include/asm-x86/pgtable_64.h | 2 +-
7 files changed, 24 insertions(+), 12 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -13,7 +13,7 @@
#include <asm/tlbflush.h>
#include <asm/io.h>

-pte_t *lookup_address(unsigned long address)
+pte_t *lookup_address(unsigned long address, int *level)
{
pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
@@ -27,8 +27,10 @@ pte_t *lookup_address(unsigned long addr
pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
return NULL;
+ *level = 3;
if (pmd_large(*pmd))
return (pte_t *)pmd;
+ *level = 4;
pte = pte_offset_kernel(pmd, address);
if (pte && !pte_present(*pte))
pte = NULL;
@@ -129,8 +131,9 @@ __change_page_attr(unsigned long address
pte_t *kpte;
struct page *kpte_page;
pgprot_t ref_prot2;
+ int level;

- kpte = lookup_address(address);
+ kpte = lookup_address(address, &level);
if (!kpte) return 0;
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
BUG_ON(PageLRU(kpte_page));
Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -254,7 +254,7 @@ extern struct list_head pgd_list;

extern int kern_addr_valid(unsigned long addr);
-pte_t *lookup_address(unsigned long addr);
+pte_t *lookup_address(unsigned long addr, int *level...
Similar to x86-64. This is useful in other situations where we want
the page table dumped too.

Besides anything that makes i386 do_page_fault shorter is good.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
arch/x86/mm/fault_32.c | 72 ++++++++++++++++++++++++++-----------------------
1 file changed, 39 insertions(+), 33 deletions(-)

Index: linux/arch/x86/mm/fault_32.c
===================================================================
--- linux.orig/arch/x86/mm/fault_32.c
+++ linux/arch/x86/mm/fault_32.c
@@ -28,6 +28,44 @@
#include <asm/desc.h>
#include <asm/segment.h>

+void dump_pagetable(unsigned long address)
+{
+ typeof(pte_val(__pte(0))) page;
+
+ page = read_cr3();
+ page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
+#ifdef CONFIG_X86_PAE
+ printk("*pdpt = %016Lx ", page);
+ if ((page >> PAGE_SHIFT) < max_low_pfn
+ && page & _PAGE_PRESENT) {
+ page &= PAGE_MASK;
+ page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
+ & (PTRS_PER_PMD - 1)];
+ printk(KERN_CONT "*pde = %016Lx ", page);
+ page &= ~_PAGE_NX;
+ }
+#else
+ printk("*pde = %08lx ", page);
+#endif
+
+ /*
+ * We must not directly access the pte in the highpte
+ * case if the page table is located in highmem.
+ * And let's rather not kmap-atomic the pte, just in case
+ * it's allocated already.
+ */
+ if ((page >> PAGE_SHIFT) < max_low_pfn
+ && (page & _PAGE_PRESENT)
+ && !(page & _PAGE_PSE)) {
+ page &= PAGE_MASK;
+ page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
+ & (PTRS_PER_PTE - 1)];
+ printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
+ }
+
+ printk("\n");
+}
+
/*
* Page fault error code bits
* bit 0 == 0 means no page found, 1 means protection fault
@@ -574,7 +612,6 @@ no_context:
bust_spinlocks(1);

if (oops_may_print()) {
...
The pte_* modifier functions that cleared bits dropped the NX bit on 32bit
PAE because they only worked in int, but NX is in bit 63. Fix that
by adding appropriate casts so that the arithmetic happens as long long
on PAE kernels.

I decided to just use 64bit arithmetic instead of open coding like
pte_modify() because gcc should generate good enough code for that now.

While this looks in theory like a .24 candidate this might trigger
some subtle latent bugs so it's better to delay it for .25 for more
testing.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
include/asm-x86/pgtable.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -151,17 +151,17 @@ static inline int pmd_large(pmd_t pte) {
(_PAGE_PSE|_PAGE_PRESENT);
}

-static inline pte_t pte_mkclean(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_DIRTY); }
-static inline pte_t pte_mkold(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_ACCESSED); }
-static inline pte_t pte_wrprotect(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_RW); }
-static inline pte_t pte_mkexec(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_NX); }
+static inline pte_t pte_mkclean(pte_t pte) { return __pte(pte_val(pte) & ~(pteval_t)_PAGE_DIRTY); }
+static inline pte_t pte_mkold(pte_t pte) { return __pte(pte_val(pte) & ~(pteval_t)_PAGE_ACCESSED); }
+static inline pte_t pte_wrprotect(pte_t pte) { return __pte(pte_val(pte) & ~(pteval_t)_PAGE_RW); }
+static inline pte_t pte_mkexec(pte_t pte) { return __pte(pte_val(pte) & ~(pteval_t)_PAGE_NX); }
static inline pte_t pte_mkdirty(pte_t pte) { return __pte(pte_val(pte) | _PAGE_DIRTY); }
static inline pte_t pte_mkyoung(pte_t pte) { return __pte(pte_val(pte) | _PAGE_ACCESSED); }
static inline pte_t pte_mkwrite(pte_t p...
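
The kind of truncation this patch guards against can be reproduced in user space. The following is a minimal sketch, not the kernel code: it assumes the flag constant has a 32-bit unsigned type while the PTE value is 64 bits wide, which is one way the mask can end up narrower than the PTE. SKETCH_PAGE_RW, SKETCH_PAGE_NX and the sketch_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_pteval_t;	/* 64-bit PTE value, as on PAE */

#define SKETCH_PAGE_RW ((uint32_t)1 << 1)	/* assumed 32-bit flag constant */
#define SKETCH_PAGE_NX ((sketch_pteval_t)1 << 63)

/* Mask is built in 32 bits, then zero-extended: bits 32..63 are cleared. */
static sketch_pteval_t sketch_wrprotect_narrow(sketch_pteval_t pte)
{
	return pte & ~SKETCH_PAGE_RW;
}

/* Mask is built in the PTE's own width: only the RW bit is cleared. */
static sketch_pteval_t sketch_wrprotect_wide(sketch_pteval_t pte)
{
	return pte & ~(sketch_pteval_t)SKETCH_PAGE_RW;
}

static void sketch_nx_demo(void)
{
	sketch_pteval_t pte = SKETCH_PAGE_NX | SKETCH_PAGE_RW | 0x1000;

	assert((sketch_wrprotect_narrow(pte) & SKETCH_PAGE_NX) == 0);
	assert((sketch_wrprotect_wide(pte) & SKETCH_PAGE_NX) == SKETCH_PAGE_NX);
}
```

The cast in sketch_wrprotect_wide() plays the same role as the (pteval_t) casts added by the patch: the complement is taken at full PTE width, so NX survives.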
64bit already had it.

Needed for later patches.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>

---
include/asm-x86/pgtable.h | 2 ++
include/asm-x86/pgtable_64.h | 2 --
2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -191,6 +191,8 @@ static inline pte_t pte_modify(pte_t pte
return __pte(val);
}

+#define pte_pgprot(x) __pgprot(pte_val(x) & (0xfff | _PAGE_NX))
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -120,8 +120,6 @@ static inline void native_pgd_clear(pgd_

#define pte_same(a, b) ((a).pte == (b).pte)
-#define pte_pgprot(a) (__pgprot((a).pte & ~PHYSICAL_PAGE_MASK))
-
#endif /* !__ASSEMBLY__ */

#define PMD_SIZE (_AC(1,UL) << PMD_SHIFT)
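
Editor's note: the new pte_pgprot() keeps the low 12 flag bits plus the NX bit and discards the page frame bits. A rough user-space sketch of that masking follows. It assumes a 64-bit PTE value with NX in bit 63; the DEMO_* names and demo_* functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

typedef uint64_t pteval_t;                     /* assumption: 64-bit PTE value */

#define DEMO_PAGE_NX    (1ULL << 63)           /* hypothetical NX bit position */
#define DEMO_PROT_MASK  (0xfffULL | DEMO_PAGE_NX)

/* Protection part: low 12 flag bits plus NX, as in the macro above. */
static pteval_t demo_pte_pgprot(pteval_t pte)
{
    return pte & DEMO_PROT_MASK;
}

/* Everything the protection mask does not cover (the frame address bits). */
static pteval_t demo_pte_frame(pteval_t pte)
{
    return pte & ~DEMO_PROT_MASK;
}
```

Splitting a value this way and OR-ing the two halves back together returns the original PTE value.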
--
Needed for some test code.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
---
include/asm-x86/pgtable.h | 3 +++
1 file changed, 3 insertions(+)

Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -144,6 +144,7 @@ static inline int pte_young(pte_t pte)
static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW; }
static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; }
static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_PSE; }
+static inline int pte_global(pte_t pte) { return pte_val(pte) & _PAGE_GLOBAL; }

static inline int pmd_large(pmd_t pte) {
return (pmd_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
@@ -159,6 +160,8 @@ static inline pte_t pte_mkyoung(pte_t pt
static inline pte_t pte_mkwrite(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); }
static inline pte_t pte_mkhuge(pte_t pte) { return __pte(pte_val(pte) | _PAGE_PSE); }
static inline pte_t pte_clrhuge(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_PSE); }
+static inline pte_t pte_mkglobal(pte_t pte) { return __pte(pte_val(pte) | _PAGE_GLOBAL); }
+static inline pte_t pte_clrglobal(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_GLOBAL); }

extern pteval_t __supported_pte_mask;
--
When CONFIG_DEBUG_RODATA is enabled undo the ro mapping and redo it again.
This gives some simple testing for change_page_attr()

Optional patch, but I find it useful.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
---
arch/x86/Kconfig.debug | 5 +++++
arch/x86/mm/init_32.c | 26 ++++++++++++++++++++++++++
arch/x86/mm/init_64.c | 10 ++++++++++
3 files changed, 41 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -744,6 +744,16 @@ void mark_rodata_ro(void)
* of who is the culprit.
*/
global_flush_tlb();
+
+#ifdef CONFIG_CPA_DEBUG
+ printk("Testing CPA: undo %lx-%lx\n", start, end);
+ change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL);
+ global_flush_tlb();
+
+ printk("Testing CPA: again\n");
+ change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL_RO);
+ global_flush_tlb();
+#endif
}
#endif

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -822,6 +822,20 @@ void mark_rodata_ro(void)
change_page_attr(virt_to_page(start),
size >> PAGE_SHIFT, PAGE_KERNEL_RX);
printk("Write protecting the kernel text: %luk\n", size >> 10);
+
+#ifdef CONFIG_CPA_DEBUG
+ global_flush_tlb();
+
+ printk("Testing CPA: Reverting %lx-%lx\n", start, start+size);
+ change_page_attr(virt_to_page(start), size>>PAGE_SHIFT,
+ PAGE_KERNEL_EXEC);
+ global_flush_tlb();
+
+ printk("Testing CPA: write protecting again\n");
+ change_page_attr(virt_to_page(start), size>>PAGE_SHIFT,
+ PAGE_KERNEL_RX);
+ global_flush_tlb();
+#endif
}
#endif
start += size;
@@ -838,6 +852,18 @@ void mark_rodata_ro(void)
* of who is the culprit.
*/
global_flus...
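
Editor's note: the CONFIG_CPA_DEBUG self-test above write protects a range, reverts it, and write protects it again. The same shape can be tried in user space with mprotect(). This is only an analogy under stated assumptions: it uses the POSIX mmap()/mprotect() calls rather than change_page_attr(), and toggle_protection_demo() is a hypothetical name.

```c
#define _DEFAULT_SOURCE               /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Protect, undo, protect again on one anonymous page.
 * Returns 0 if every step succeeded, -1 otherwise. */
static int toggle_protection_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    size_t len;
    int rc = -1;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0x5a;
    if (mprotect(p, len, PROT_READ) != 0)               /* write protect */
        goto out;
    if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0)  /* undo */
        goto out;
    p[0] = 0xa5;                                        /* writable again */
    if (mprotect(p, len, PROT_READ) != 0)               /* protect again */
        goto out;
    rc = (p[0] == 0xa5) ? 0 : -1;                       /* still readable */
out:
    munmap(p, len);
    return rc;
}
```

A zero return only shows that the protection changes were accepted and the data survived the round trip; it does not exercise large-page splitting the way the kernel test does.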
No need to make it 64bit there.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
---
arch/x86/mm/init_32.c | 4 ++--
include/asm-x86/pgtable.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -353,9 +353,9 @@ static void __init set_highmem_pages_ini
#define set_highmem_pages_init(bad_ppro) do { } while (0)
#endif /* CONFIG_HIGHMEM */

-unsigned long long __PAGE_KERNEL = _PAGE_KERNEL;
+pteval_t __PAGE_KERNEL = _PAGE_KERNEL;
EXPORT_SYMBOL(__PAGE_KERNEL);
-unsigned long long __PAGE_KERNEL_EXEC = _PAGE_KERNEL_EXEC;
+pteval_t __PAGE_KERNEL_EXEC = _PAGE_KERNEL_EXEC;

#ifdef CONFIG_NUMA
extern void __init remap_numa_kva(void);
Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -74,7 +74,7 @@
#define _PAGE_KERNEL (_PAGE_KERNEL_EXEC | _PAGE_NX)

#ifndef __ASSEMBLY__
-extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
+extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
#endif /* __ASSEMBLY__ */
#else
#define __PAGE_KERNEL_EXEC \
--
Port the code to check for already split 4K pages at boot over from
32bit to 64bit.

Note: should probably be put before PAT patches to avoid bisect failures later
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/mm/pageattr_64.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -160,11 +160,8 @@ __change_page_attr(unsigned long address
} else
BUG();

- /* on x86-64 the direct mapping set at boot is not using 4k pages */
- BUG_ON(PageReserved(kpte_page));
-
save_page(kpte_page);
- if (page_private(kpte_page) == 0)
+ if (!PageReserved(kpte_page) && page_private(kpte_page) == 0)
revert_page(address, ref_prot);
return 0;
}
@@ -243,6 +240,8 @@ void global_flush_tlb(void)

list_for_each_entry_safe(pg, next, &l, lru) {
list_del(&pg->lru);
+ if (PageReserved(pg))
+ continue;
clear_bit(PG_arch_1, &pg->flags);
if (page_private(pg) != 0)
continue;
--
Similar to 64bit.
Needed by PAT patches. Replaces 5ec5c5a2302ca8794da03f8bedec931a2a814ae9
Note: should probably be put before PAT patches to avoid bisect failures later
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/mm/pageattr_32.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -233,6 +233,21 @@ int change_page_attr(struct page *page,
return err;
}

+int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot)
+{
+ int i;
+ unsigned long pfn = (addr >> PAGE_SHIFT);
+ for (i = 0; i < numpages; i++) {
+ if (!pfn_valid(pfn + i)) {
+ break;
+ } else {
+ pte_t *pte = lookup_address(addr + i*PAGE_SIZE);
+ BUG_ON(pte && !pte_none(*pte));
+ }
+ }
+ return change_page_attr(virt_to_page(addr), i, prot);
+}
+
void global_flush_tlb(void)
{
struct list_head l;
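
Editor's note: the loop in change_page_attr_addr() above stops at the first page frame that is not valid and passes the count reached so far on to change_page_attr(). A small stand-alone sketch of that counting pattern follows. The predicate type, count_leading_valid() and demo_valid_below_100() are hypothetical; the kernel's pfn_valid() is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validity predicate standing in for pfn_valid(). */
typedef bool (*pfn_valid_fn)(unsigned long pfn);

/* Count how many consecutive frames starting at first_pfn are valid,
 * stopping at the first invalid one or after numpages frames. */
static size_t count_leading_valid(unsigned long first_pfn, size_t numpages,
                                  pfn_valid_fn valid)
{
    size_t i;

    for (i = 0; i < numpages; i++) {
        if (!valid(first_pfn + i))
            break;
    }
    return i;
}

/* Example predicate: pretend only frames below 100 exist. */
static bool demo_valid_below_100(unsigned long pfn)
{
    return pfn < 100;
}
```

The caller then operates on only that many pages, which is how the patch avoids touching frames past the end of valid memory.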
--
Undo random white space changes. This reverts ddb53b5735793a19dc17bcd98b050f672f28f1ea
I simply don't have the nerves to port a 20+ patch series to the
reformatted version. And the patch series changes most lines
anyways and drops the trailing white spaces there.

And since this was a nop losing it for now isn't a problem.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
---
arch/x86/mm/pageattr_32.c | 149 ++++++++++++++++++++--------------------------
arch/x86/mm/pageattr_64.c | 137 ++++++++++++++++++------------------------
2 files changed, 126 insertions(+), 160 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -1,29 +1,28 @@
-/*
- * Copyright 2002 Andi Kleen, SuSE Labs.
+/*
+ * Copyright 2002 Andi Kleen, SuSE Labs.
* Thanks to Ben LaHaise for precious feedback.
- */
+ */

+#include <linux/mm.h>
+#include <linux/sched.h>
#include <linux/highmem.h>
#include <linux/module.h>
-#include <linux/sched.h>
#include <linux/slab.h>
-#include <linux/mm.h>
-
+#include <asm/uaccess.h>
#include <asm/processor.h>
#include <asm/tlbflush.h>
-#include <asm/sections.h>
-#include <asm/uaccess.h>
#include <asm/pgalloc.h>
+#include <asm/sections.h>

static DEFINE_SPINLOCK(cpa_lock);
static struct list_head df_list = LIST_HEAD_INIT(df_list);

-pte_t *lookup_address(unsigned long address)
-{
+
+pte_t *lookup_address(unsigned long address)
+{
pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
pmd_t *pmd;
-
if (pgd_none(*pgd))
return NULL;
pud = pud_offset(pgd, address);
@@ -34,22 +33,21 @@ pte_t *lookup_address(unsigned long addr
return NULL;
if (pmd_large(*pmd))
return (pte_t *)pmd;
-
return pte_offset_kernel(pmd, address);
-}
+}

-static ...
Not sure what the point of that change was
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jan 15 16:53:24 2008 +0100

patches/x86-pat-usable_only_map.patch
x86_64: Map only usable memory in identity map. Reserved memory maps to a
zero page.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/mm/pageattr_64.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -53,11 +53,9 @@ split_large_page(unsigned long address,
/*
* page_private is used to track the number of entries in
* the page table page have non standard attributes.
- * Count of 1 indicates page split by split_large_page(),
- * additional count indicates the number of pages with non-std attr.
*/
SetPagePrivate(base);
- page_private(base) = 1;
+ page_private(base) = 0;

address = __pa(address);
addr = address & LARGE_PAGE_MASK;
@@ -178,8 +176,11 @@ __change_page_attr(unsigned long address
BUG();
}

+ /* on x86-64 the direct mapping set at boot is not using 4k pages */
+ BUG_ON(PageReserved(kpte_page));
+
save_page(kpte_page);
- if (page_private(kpte_page) == 1)
+ if (page_private(kpte_page) == 0)
revert_page(address, ref_prot);
return 0;
}
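
Editor's note: with this revert, page_private() on the page-table page goes back to counting only the entries that carry non-standard attributes, starting from 0 after a split, and the large page is reverted once the count returns to 0. A minimal sketch of that bookkeeping follows, assuming a plain counter; struct demo_pt_page and both functions are hypothetical names, and locking and the actual revert are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per page-table-page counter. */
struct demo_pt_page {
    unsigned long nonstd;   /* entries with non-standard attributes */
};

/* An entry was changed away from the default protection. */
static void demo_set_nonstd(struct demo_pt_page *pt)
{
    pt->nonstd++;
}

/* An entry was changed back to the default protection.
 * Returns true when no non-standard entries remain, i.e. the
 * caller could now revert to a large page. */
static bool demo_set_std(struct demo_pt_page *pt)
{
    if (pt->nonstd == 0)
        return false;       /* underflow is rejected in this sketch */
    pt->nonstd--;
    return pt->nonstd == 0;
}
```

In the scheme being undone, the counter started at 1 after a split, so the revert check compared against 1 instead of 0.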
--
Undoes pageattr_32.c parts of
Not sure what the point of that change was anyways.
commit 11c9734cbcf4c5862260442a5d56dd4779799fcc
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jan 15 09:36:03 2008 +0100

patches/x86-pat-usable_only_map_i386.patch
i386: Map only usable memory in identity map. Reserved memory maps to a
zero page.

Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/mm/pageattr_32.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -55,11 +55,9 @@ split_large_page(unsigned long address,
/*
* page_private is used to track the number of entries in
* the page table page that have non standard attributes.
- * Count of 1 indicates page split by split_large_page(),
- * additional count indicates the number of pages with non-std attr.
*/
SetPagePrivate(base);
- page_private(base) = 1;
+ page_private(base) = 0;

address = __pa(address);
addr = address & LARGE_PAGE_MASK;
@@ -205,7 +203,7 @@ static int __change_page_attr(struct pag

save_page(kpte_page);
if (!PageReserved(kpte_page)) {
- if (cpu_has_pse && (page_private(kpte_page) == 1)) {
+ if (cpu_has_pse && (page_private(kpte_page) == 0)) {
paravirt_release_pt(page_to_pfn(kpte_page));
revert_page(kpte_page, address);
}
--
Going to implement this differently
commit 5ec5c5a2302ca8794da03f8bedec931a2a814ae9
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Jan 15 09:36:03 2008 +0100

patches/x86-pat-cpa_i386.patch
This makes 32 bit cpa similar to x86_64 and makes it easier for following PAT
patches.

Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/mm/pageattr_32.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -153,12 +153,15 @@ static inline void save_page(struct page
list_add(&kpte_page->lru, &df_list);
}

-static int __change_page_attr(unsigned long address, unsigned long pfn,
- pgprot_t prot)
+static int __change_page_attr(struct page *page, pgprot_t prot)
{
struct page *kpte_page;
+ unsigned long address;
pte_t *kpte;

+ BUG_ON(PageHighMem(page));
+ address = (unsigned long)page_address(page);
+
kpte = lookup_address(address);
if (!kpte)
return -EINVAL;
@@ -169,7 +172,7 @@ static int __change_page_attr(unsigned l

if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
if (!pte_huge(*kpte)) {
- set_pte_atomic(kpte, pfn_pte(pfn, prot));
+ set_pte_atomic(kpte, mk_pte(page, prot));
} else {
struct page *split;
pgprot_t ref_prot;
@@ -187,7 +190,7 @@ static int __change_page_attr(unsigned l
page_private(kpte_page)++;
} else {
if (!pte_huge(*kpte)) {
- set_pte_atomic(kpte, pfn_pte(pfn, PAGE_KERNEL));
+ set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
BUG_ON(page_private(kpte_page) == 0);
page_private(kpte_page)--;
} else
@@ -228,15 +231,14 @@ static inline void flush_map(struct list
*
* Caller must call global_flush_tlb() after this.
*/
-int change_page_attr_addr(unsigned long address, int ...
