Re: [PATCH] [5/7] Readd rdmsrl_safe

Previous thread: [2.6.25-rc5-mm1] BUG() at mnt_want_write(). by penguin-kernel on Tuesday, March 11, 2008 - 9:37 pm. (10 messages)

Next thread: How to compile kernel so it automatically reboots upon a crash instead of generating stack trace and then hang. by Prakhar Krishna on Wednesday, March 12, 2008 - 12:52 am. (5 messages)
To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

[Old patch; repost, but needed for further patches in the series]

Even on 32bit 2MB pages can map more memory than is in the true
max_low_pfn if end_pfn is not highmem and not aligned to 2MB.
Add a end_pfn_map similar to x86-64 that accounts for this
fact. This is important for code that really needs to know about
all mapping aliases.

Signed-off-by: Andi Kleen <ak@suse.de>

---
arch/x86/mm/init_32.c | 4 ++++
include/asm-x86/page.h | 4 +++-
include/asm-x86/page_64.h | 1 -
3 files changed, 7 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -50,6 +50,8 @@

unsigned int __VMALLOC_RESERVE = 128 << 20;

+unsigned long end_pfn_map;
+
DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
unsigned long highstart_pfn, highend_pfn;

@@ -193,6 +195,7 @@ static void __init kernel_physical_mappi
set_pmd(pmd, pfn_pmd(pfn, prot));

pfn += PTRS_PER_PTE;
+ end_pfn_map = pfn;
continue;
}
pte = one_page_table_init(pmd);
@@ -207,6 +210,7 @@ static void __init kernel_physical_mappi

set_pte(pte, pfn_pte(pfn, prot));
}
+ end_pfn_map = pfn;
}
}
}
Index: linux/include/asm-x86/page.h
===================================================================
--- linux.orig/include/asm-x86/page.h
+++ linux/include/asm-x86/page.h
@@ -36,7 +36,7 @@
#define max_pfn_mapped end_pfn_map
#else
#include <asm/page_32.h>
-#define max_pfn_mapped max_low_pfn
+#define max_pfn_mapped end_pfn_map
#endif /* CONFIG_X86_64 */

#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
@@ -51,6 +51,8 @@
extern int page_is_ram(unsigned long pagenr);
extern int devmem_is_allowed(unsigned long pagenr);

+extern unsigned long end_pfn_map;
+
struct page;

static void inline clear_user_page(void *page, unsigned long vaddr,
Index: linux/include/asm-x86/page_...

To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

Add information about the mapping state of the direct mapping to
/proc/meminfo.

This way we can see how many large pages are really used for it and how
many are split.

Useful for debugging and general insight into the kernel.

v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
32bit doesn't need it because it doesn't support hotadd for lowmem.
Fix some typos

Signed-off-by: Andi Kleen <ak@suse.de>

---
arch/x86/mm/init_32.c | 2 ++
arch/x86/mm/init_64.c | 2 ++
arch/x86/mm/pageattr.c | 24 ++++++++++++++++++++++++
fs/proc/proc_misc.c | 7 +++++++
include/asm-x86/pgtable.h | 3 +++
5 files changed, 38 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -319,6 +319,8 @@ __meminit void early_iounmap(void *addr,
static unsigned long __meminit
phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
{
+ unsigned long flags;
+ unsigned pages = 0;
int i = pmd_index(address);

for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
@@ -335,9 +337,15 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
if (pmd_val(*pmd))
continue;

+ pages++;
set_pte((pte_t *)pmd,
pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
}
+
+ /* Protect against CPA */
+ spin_lock_irqsave(&pgd_lock, flags);
+ dpages_cnt[PG_LEVEL_2M] += pages;
+ spin_unlock_irqrestore(&pgd_lock, flags);
return address;
}

@@ -356,6 +364,8 @@ phys_pmd_update(pud_t *pud, unsigned lon
static unsigned long __meminit
phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
{
+ unsigned long flags;
+ unsigned pages = 0;
unsigned long true_end = end;
int i = pud_index(addr);

@@ -380,6 +390,7 @@ phys_pud_init(pud_t *pud_page, unsigned
}

if (direct_gbpages) {
+ dpages_cnt[PG_LEVEL_1G]++;
set_pte((pte_t *)pud,
...

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:41 pm

Please make the update a debugfs conditional function in the CPA
code. That way it can be compile out and the statistic internals are

Can we have some intuitive name for that like direct_pages_stats, so

debugfs please

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:55 pm

But it's not debugging. Or are you saying all memory state information
in /proc/meminfo is debugging information? I would say it is generic
memory state tracking, similar to VmallocUsed or Slab: or NFS_Unstable:
e.g. I would expect a machine to go slower if all direct mapping pages
are split due to increase TLB misses and that information is generally
useful and should not be forced into some debugging ghetto.

-Andi

--

To: Andi Kleen <andi@...>
Cc: Thomas Gleixner <tglx@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Saturday, March 22, 2008 - 5:50 am

[I didn't switch to debugfs because I strongly disagreed with that
suggestion. But all the other points you made are addressed.]

---

CPA: Add statistics about state of direct mapping v3

Add information about the mapping state of the direct mapping to
/proc/meminfo. I chose /proc/meminfo because that is where all the other
memory statistics are too and it is a generally useful metric even
outside debugging situations. A lot of split kernel pages means the
kernel will run slower.

This way we can see how many large pages are really used for it and how
many are split.

Useful for general insight into the kernel.

v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
32bit doesn't need it because it doesn't support hotadd for lowmem.
Fix some typos
v3: Rename dpages_cnt
Add CONFIG ifdef for count update as requested by tglx
Expand description

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <andi@firstfloor.org>

---
arch/x86/mm/init_32.c | 2 ++
arch/x86/mm/init_64.c | 2 ++
arch/x86/mm/pageattr.c | 24 ++++++++++++++++++++++++
fs/proc/proc_misc.c | 7 +++++++
include/asm-x86/pgtable.h | 3 +++
5 files changed, 38 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -316,9 +316,22 @@ __meminit void early_iounmap(void *addr,
__flush_tlb_all();
}

+static void update_page_count(int level, unsigned long pages)
+{
+#ifdef CONFIG_PROC_FS
+ unsigned long flags;
+ /* Protect against CPA */
+ spin_lock_irqsave(&pgd_lock, flags);
+ direct_pages_count[PG_LEVEL_2M] += pages;
+ spin_unlock_irqrestore(&pgd_lock, flags);
+#endif
+}
+
static unsigned long __meminit
phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
{
+ unsigned long pages = 0;
+
int i = pmd_index(address);

for (; i < PTRS_...

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 11:40 am

Also I asked to move the update function to pageattr.c where we have
the variable, so we dont need to make it global and can change the

update via function call with pages = 0 ?

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 12:14 pm

Didn't get that one. Can you clarify?

-Andi
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 12:16 pm

You have the direct increment and the function call in
phys_pud_init. The function is always called with pages=0 because
nothing ever increments pages.

Thanks,
tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 1:01 pm

All comments addressed now hopefully.

-Andi

---

CPA: Add statistics about state of direct mapping v4

Add information about the mapping state of the direct mapping to
/proc/meminfo. I chose /proc/meminfo because that is where all the other
memory statistics are too and it is a generally useful metric even
outside debugging situations. A lot of split kernel pages means the
kernel will run slower.

This way we can see how many large pages are really used for it and how
many are split.

Useful for general insight into the kernel.

v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
32bit doesn't need it because it doesn't support hotadd for lowmem.
Fix some typos
v3: Rename dpages_cnt
Add CONFIG ifdef for count update as requested by tglx
Expand description
v4: Fix stupid bugs added in v3
Move update_page_count to pageattr.c

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <andi@firstfloor.org>

---
arch/x86/mm/init_32.c | 2 ++
arch/x86/mm/init_64.c | 2 ++
arch/x86/mm/pageattr.c | 24 ++++++++++++++++++++++++
fs/proc/proc_misc.c | 7 +++++++
include/asm-x86/pgtable.h | 3 +++
5 files changed, 38 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -319,6 +319,8 @@ __meminit void early_iounmap(void *addr,
static unsigned long __meminit
phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
{
+ unsigned long pages = 0;
+
int i = pmd_index(address);

for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
@@ -335,9 +337,11 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
if (pmd_val(*pmd))
continue;

+ pages++;
set_pte((pte_t *)pmd,
pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
}
+ update_page_count(PG_LEVEL_2M, pages);
return address;
}

@@ -356,6 +360,7 @@ ...

To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

On AMD SMM protected memory is part of the address map, but handled
internally like an MTRR. That leads to large pages getting split
internally which has some performance implications. Check for the
AMD TSEG MSR and split the large page mapping on that area
explicitely if it is part of the direct mapping.

There is also SMM ASEG, but it is in the first 1MB and already covered by
the earlier split first page patch.

Idea for this came from an earlier patch by Andreas Herrmann

On a RevF dual Socket Opteron system kernbench shows a clear
improvement from this:
(together with the earlier patches in this series, especially the
split first 2MB patch)

[lower is better]
no split stddev split stddev delta
Elapsed Time 87.146 (0.727516) 84.296 (1.09098) -3.2%
User Time 274.537 (4.05226) 273.692 (3.34344) -0.3%
System Time 34.907 (0.42492) 34.508 (0.26832) -1.1%
Percent CPU 322.5 (38.3007) 326.5 (44.5128) +1.2%

=> About 3.2% improvement in elapsed time for kernbench.

With GB pages on AMD Fam1h the impact of splitting is much higher of course,
since it would split two full GB pages (together with the first
1MB split patch) instead of two 2MB pages. I could not benchmark
a clear difference in kernbench on gbpages, so I kept it disabled
for that case

That was only limited benchmarking of course, so if someone
was interested in running more tests for the gbpages case
that could be revisited (contributions welcome)

I didn't bother implementing this for 32bit because it is very
unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB
and the 2MB low split is already handled for both.

Signed-off-by: Andi Kleen <ak@suse.de>

---
arch/x86/kernel/setup_64.c | 13 +++++++++++++
include/asm-x86/msr-index.h | 1 +
2 files changed, 14 insertions(+)

Index: linux/arch/x86/kernel/setup_64.c
===================================================================
--- linux.orig/arch/x86/kernel/setu...

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 12:44 pm

warning: passing argument 2 of 'rdmsrl_safe' from incompatible pointer type

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 12:54 pm

Nothing obvious. We could add it, but then would need to add a (imho ugly)
vendor check there first.

Yes the type has to be updated after the earlier inline change.
Easiest you just do the trivial change yourself.

-Andi

--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:55 pm

Please keep 32bit and 64bit in sync. The setup code has been merged
and we don't want to have needless separation.

Thanks,
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 7:56 am

--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

--

To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

RDMSR for 64bit values with exception handling.

Makes it easier to deal with 64bit valued MSRs. The old 64bit code
base had that too as checking_rdmsrl(), but it got dropped somehow.

Needed for followup patch.

Signed-off-by: Andi Kleen <ak@suse.de>

---
include/asm-x86/msr.h | 3 +++
include/asm-x86/paravirt.h | 4 ++++
2 files changed, 7 insertions(+)

Index: linux/include/asm-x86/msr.h
===================================================================
--- linux.orig/include/asm-x86/msr.h
+++ linux/include/asm-x86/msr.h
@@ -150,6 +150,9 @@ static inline int wrmsr_safe(unsigned ms
__err; \
})

+#define rdmsrl_safe(msr,p) \
+ ({ int __err; *(p) = native_read_msr_safe(msr, &__err); __err; })
+
#define rdtscl(low) \
((low) = (u32)native_read_tsc())

Index: linux/include/asm-x86/paravirt.h
===================================================================
--- linux.orig/include/asm-x86/paravirt.h
+++ linux/include/asm-x86/paravirt.h
@@ -687,6 +687,10 @@ static inline int paravirt_write_msr(uns
(*b) = _l >> 32; \
_err; })

+#define rdmsrl_safe(msr, p) ({ \
+ int _err; \
+ *(p) = paravirt_read_msr(msr, &_err); \
+ _err; })

static inline u64 paravirt_read_tsc(void)
{
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:06 pm

ditto

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Saturday, March 22, 2008 - 5:59 am

Readd rdmsrl_safe v2

RDMSR for 64bit values with exception handling.

Makes it easier to deal with 64bit valued MSRs. The old 64bit code
base had that too as checking_rdmsrl(), but it got dropped somehow.

Needed for followup patch.

v2: switch to inline

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <andi@firstfloor.org>

---
include/asm-x86/msr.h | 3 +++
include/asm-x86/paravirt.h | 4 ++++
2 files changed, 7 insertions(+)

Index: linux/include/asm-x86/msr.h
===================================================================
--- linux.orig/include/asm-x86/msr.h
+++ linux/include/asm-x86/msr.h
@@ -150,6 +150,13 @@ static inline int wrmsr_safe(unsigned ms
__err; \
})

+static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
+{
+ int err;
+ *p = native_read_msr_safe(msr, &err);
+ return err;
+}
+
#define rdtscl(low) \
((low) = (u32)native_read_tsc())

Index: linux/include/asm-x86/paravirt.h
===================================================================
--- linux.orig/include/asm-x86/paravirt.h
+++ linux/include/asm-x86/paravirt.h
@@ -687,6 +687,12 @@ static inline int paravirt_write_msr(uns
(*b) = _l >> 32; \
_err; })

+static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
+{
+ int err;
+ *p = paravirt_read_msr(msr, &err);
+ return err;
+}

static inline u64 paravirt_read_tsc(void)
{
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:16 pm

Well all of paravirt.h uses macros. I did the same for consistency. If you want
inlines it would be better to just convert it all in one go (but please only
after this patch was applied)

-Andi
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:58 pm

Of course. We don't want to burden work on your shoulders.

Thanks

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 2:06 pm

At least I consider it clean and consistent to use similar

Well you signed up for the work yourself last year ;-) It was your choice.

-Andi
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 2:14 pm

inlines are generally preferred and the existing macro pile is no

Yeah, I signed up for maintaining. And review is one of the tasks of a
maintainer. So when I do a review and ask for a macro -> inline change
I don't see that this means that I have to do the change myself.

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 2:46 pm

Never mind I retract the clean up patch. While doing the actual
reordering is not that difficult I would need to retest and I don't
have time for that now. And since reordering tends to requiring
typing in everything again and I make occasional typos (and I added
bugs in the past during such reordering exercises) the retesting
would be needed. But for that clean up it is probably not worth it.
After all it just makes the code a little nicer, but doesn't fix
or improve anything.

Please still consider the other patches in the serioes.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Thomas Gleixner <tglx@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 2:48 pm

Oops that ended up in the wrong thread sorry. The comment
was regarding the early exception recursion -> iteration change,
not for rdmsrl_safe (which is not a clean up)

-Andi
--

To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

Intel recommends to not use large pages for the first 1MB
of the physical memory because there are fixed size MTRRs there
which cause splitups in the TLBs.

On AMD doing so is also a good idea.

The implementation is a little different between 32bit and 64bit.
On 32bit I just taught the initial page table set up about this
because it was very simple to do. This also has the advantage
that the risk of a prefetch ever seeing the page even
if it only exists for a short time is minimized.

On 64bit that is not quite possible, so use set_memory_4k() a little
later (in check_bugs) instead.

Signed-off-by: Andi Kleen <ak@suse.de>

---
arch/x86/kernel/bugs_64.c | 12 ++++++++++++
arch/x86/mm/init_32.c | 6 +++++-
2 files changed, 17 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/bugs_64.c
===================================================================
--- linux.orig/arch/x86/kernel/bugs_64.c
+++ linux/arch/x86/kernel/bugs_64.c
@@ -9,6 +9,7 @@
#include <asm/bugs.h>
#include <asm/processor.h>
#include <asm/mtrr.h>
+#include <asm/cacheflush.h>

void __init check_bugs(void)
{
@@ -18,4 +19,15 @@ void __init check_bugs(void)
print_cpu_info(&boot_cpu_data);
#endif
alternative_instructions();
+
+ /*
+ * Make sure the first 2MB area is not mapped by huge pages
+ * There are typically fixed size MTRRs in there and overlapping
+ * MTRRs into large pages causes slow downs.
+ *
+ * Right now we don't do that with gbpages because there seems
+ * very little benefit for that case.
+ */
+ if (!direct_gbpages)
+ set_memory_4k((unsigned long)__va(0), 1);
}
Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -181,8 +181,13 @@ static void __init kernel_physical_mappi
/*
* Map with big pages if possible, otherwise
* create normal page tables:
+ *
+ * Don't use a...

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:45 pm

And why exactly ? Does slowdown not matter with gbpages ?

Also we split the first GB mapping anyway due to the various regions
(NX, RO, UC) in there.

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 1:59 pm

I didn't think so unless you have DEBUG_RODATA enabled? Also there
should be no UC region there as known by the kernel. There might
be a WC region there from the frame buffer code, but that is an MTRR,
not a pageattr.

-Andi
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 2:03 pm

NX is independent of DEBUG_RODATA and the RODATA protection should be

The first ioremap of the PCI space splits the GB page as well.

Thanks,

tglx
--

To: Thomas Gleixner <tglx@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, March 21, 2008 - 2:44 pm

Sure, but it is still not split by default. At least I don't see any code

Requiring hundreds instead of two TLB entries for the kernel text?

I must say I personally cannot ever remember any bug caught by RODATA

PCI space is normally in the fourth or sometimes third GB page,
not in the first. I am not aware of any system that has the PCI hole
in the first GB.

-Andi

--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Wednesday, March 12, 2008 - 1:38 am

Should we then change CONFIG_PHYSICAL_START from 0x100000 to 0x400000 ?

Thank you
--

To: Eric Dumazet <dada1@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Wednesday, March 12, 2008 - 5:19 am

> Should we then change CONFIG_PHYSICAL_START from 0x100000 to 0x400000 ?

Yes that would be probably a good idea. This means for PAE and 64bit
kernels 2MB is ok too, for i386 non PAE 4MB. The SUSE 64bit kernels have been
using that for quite some time.

-Andi
--

To: Andi Kleen <andi@...>
Cc: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 7:31 am

This should especially boost performance on 32 bit. Have you numbers for

--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

--

To: Joerg Roedel <joerg.roedel@...>
Cc: Andi Kleen <andi@...>, <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 25, 2008 - 7:39 am

No numbers for 32bit no.

-Andi
--

To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

Add a new function to force split large pages into 4k pages.
This is needed for some followup optimizations.

I had to add a new field to cpa_data to pass down the information
that try_preserve_large_page should not run.

Right now no set_page_4k() because I didn't need it and all the
specialized users I have in mind would be more comfortable with
pure addresses. I also didn't export it because it's unlikely
external code needs it.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/x86/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr.c
+++ linux/arch/x86/mm/pageattr.c
@@ -28,6 +28,7 @@ struct cpa_data {
int numpages;
int flushtlb;
unsigned long pfn;
+ unsigned force_split : 1;
};

#ifdef CONFIG_X86_64
@@ -259,6 +260,9 @@ try_preserve_large_page(pte_t *kpte, uns
int i, do_split = 1;
unsigned int level;

+ if (cpa->force_split)
+ return 1;
+
spin_lock_irqsave(&pgd_lock, flags);
/*
* Check for races, another CPU might have split this page
@@ -693,7 +697,8 @@ static inline int cache_attr(pgprot_t at
}

static int change_page_attr_set_clr(unsigned long addr, int numpages,
- pgprot_t mask_set, pgprot_t mask_clr)
+ pgprot_t mask_set, pgprot_t mask_clr,
+ int force_split)
{
struct cpa_data cpa;
int ret, cache, checkalias;
@@ -704,7 +709,7 @@ static int change_page_attr_set_clr(unsi
*/
mask_set = canon_pgprot(mask_set);
mask_clr = canon_pgprot(mask_clr);
- if (!pgprot_val(mask_set) && !pgprot_val(mask_clr))
+ if (!pgprot_val(mask_set) && !pgprot_val(mask_clr) && !force_split)
return 0;

/* Ensure we are PAGE_SIZE aligned */
@@ -721,6 +726,7 @@ static int change_page_attr_set_clr(unsi
cpa.mask_set = mask_set;
cpa.mask_clr = mask_clr;
cpa.flushtlb = 0;
+ cpa.force_split = force_split;

/* No alias checking for _NX bit modifications */
checkalias = (pgprot_val(mask_set) | pgp...

To: <andreas.herrmann3@...>, <tglx@...>, <mingo@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 10:53 pm

[old patch repost, needed for further patches in the series)

When end_pfn is not aligned to 2MB (or 1GB) then the kernel might
map more memory than end_pfn. Account this in end_pfn_mapped.

Signed-off-by: Andi Kleen <ak@suse.de>

---
arch/x86/kernel/setup_64.c | 2 +-
arch/x86/mm/init_64.c | 33 +++++++++++++++++++++++----------
include/asm-x86/proto.h | 3 ++-
3 files changed, 26 insertions(+), 12 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -296,7 +296,7 @@ __meminit void early_iounmap(void *addr,
__flush_tlb_all();
}

-static void __meminit
+static unsigned long __meminit
phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
{
int i = pmd_index(address);
@@ -318,21 +318,25 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
set_pte((pte_t *)pmd,
pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
}
+ return address;
}

-static void __meminit
+static unsigned long __meminit
phys_pmd_update(pud_t *pud, unsigned long address, unsigned long end)
{
+ unsigned long true_end;
pmd_t *pmd = pmd_offset(pud, 0);
spin_lock(&init_mm.page_table_lock);
- phys_pmd_init(pmd, address, end);
+ true_end = phys_pmd_init(pmd, address, end);
spin_unlock(&init_mm.page_table_lock);
__flush_tlb_all();
+ return true_end;
}

-static void __meminit
+static unsigned long __meminit
phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
{
+ unsigned long true_end = end;
int i = pud_index(addr);

for (; i < PTRS_PER_PUD; i++, addr = (addr & PUD_MASK) + PUD_SIZE) {
@@ -351,13 +355,14 @@ phys_pud_init(pud_t *pud_page, unsigned

if (pud_val(*pud)) {
if (!pud_large(*pud))
- phys_pmd_update(pud, addr, end);
+ true_end = phys_pmd_update(pud, addr, end);
continue;
}

if (direct_gbpages) {
set_pte((pte_t *)...

Previous thread: [2.6.25-rc5-mm1] BUG() at mnt_want_write(). by penguin-kernel on Tuesday, March 11, 2008 - 9:37 pm. (10 messages)

Next thread: How to compile kernel so it automatically reboots upon a crash instead of generating stack trace and then hang. by Prakhar Krishna on Wednesday, March 12, 2008 - 12:52 am. (5 messages)