Re: [git pull] x86 PAT changes

Previous thread: [git pull] scheduler/misc fixes by Ingo Molnar on Thursday, April 24, 2008 - 6:55 pm. (13 messages)

Next thread: [PATCH] AFS: Support the CB.ProbeUuid RPC op by David Howells on Thursday, April 24, 2008 - 6:57 pm. (1 message)
To: Linus Torvalds <torvalds@...>
Cc: <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>
Date: Thursday, April 24, 2008 - 6:56 pm

Linus, please pull the x86-pat git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat.git for-linus

this adds the second (and final) phase of the x86 PAT changes. Due to
generic impact (the drivers/char/mem.c and include/asm-generic/iomap.h
changes) this is offered as a separate tree.

The new CONFIG_NONPROMISC_DEVMEM defaults to disabled so PAT will be
disabled, to stay compatible and to be a bit cautious/gradual.

Thanks,

Ingo

------------------>
Arjan van de Ven (1):
x86: introduce /dev/mem restrictions with a config option

Ingo Molnar (2):
pat: cleanups
/dev/mem: make promisc the default

Venki Pallipadi (1):
devmem: add range_is_allowed() check to mmap of /dev/mem

venkatesh.pallipadi@intel.com (4):
x86: PAT avoid aliasing in /dev/mem read/write
x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
x86: PAT use reserve free memtype in mmap of /dev/mem
generic: add ioremap_wc() interface wrapper

arch/x86/Kconfig.debug | 11 +++
arch/x86/mm/init_32.c | 19 +++++
arch/x86/mm/init_64.c | 20 +++++
arch/x86/mm/ioremap.c | 29 +++++++
arch/x86/mm/pat.c | 175 +++++++++++++++++++++++++++++++++++++++----
drivers/char/mem.c | 133 +++++++++++++++++++++++++--------
include/asm-generic/iomap.h | 4 +
include/asm-x86/io.h | 8 ++
include/asm-x86/io_32.h | 6 --
include/asm-x86/io_64.h | 6 --
include/asm-x86/page.h | 1 +
include/asm-x86/pgtable.h | 9 ++
12 files changed, 363 insertions(+), 58 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 610aaec..239fd9f 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -5,6 +5,17 @@ config TRACE_IRQFLAGS_SUPPORT

source "lib/Kconfig.debug"

+config NONPROMISC_DEVMEM
+ bool "Disable promiscuous /dev/mem"
+ help
+ The /dev/mem file by default only allows userspace access to PCI
+ space a...

To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 9:40 am

Hi Ingo,

When using an 64bit ( didn't got time to test 32bit now ) kernel[1] with PAT enabled , kvm-intel
does not work anymore.

When modprobing kvm-intel , kvm is saying VT extension is disable by BIOS which isn't true.
When disabling PAT again ( no changes to BIOS ) kvm-intel works again here.

Is that an known problem ?

If you need more infos just let me know.

Gabriel

[1] 2.6.25-05096-gb1721d0-dirty
with following patches :
http://lkml.org/lkml/2008/4/26/37
http://lkml.org/lkml/2008/4/26/24
http://lkml.org/lkml/2008/4/25/107

--

To: Gabriel C <nix.or.die@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Arjan van de Ven <arjan@...>, Siddha, Suresh B <suresh.b.siddha@...>, Avi Kivity <avi@...>
Date: Saturday, April 26, 2008 - 10:42 am

thanks - that should be enough for now. We'll try to reproduce these
problems.

A blind guess: maybe it's the CONFIG_NONPROMISC_DEVMEM somehow breaks
Qemu. With the patch below you'd be able to disable NONPROMISC_DEVMEM
without disabling PAT.

Ingo

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4d350b5..4aa4180 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1049,9 +1049,9 @@ config MTRR
See <file:Documentation/mtrr.txt> for more information.

config X86_PAT
- def_bool y
+ bool
prompt "x86 PAT support"
- depends on MTRR && NONPROMISC_DEVMEM
+ depends on MTRR
help
Use PAT attributes to setup page level cache control.

--

To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Arjan van de Ven <arjan@...>, Siddha, Suresh B <suresh.b.siddha@...>, Avi Kivity <avi@...>
Date: Saturday, April 26, 2008 - 11:37 am

It seems like the update triggers some compiler bug.

Setting CC_OPTIMIZE_FOR_SIZE=n fixed that btw.

I don't have the time right now but later on today I will compile
gcc43_svn_branch and test again with CC_OPTIMIZE_FOR_SIZE=y and see what I get.

Ingo sorry for the noise.

Gabriel
--

To: Gabriel C <nix.or.die@...>
Cc: Linus Torvalds <torvalds@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Arjan van de Ven <arjan@...>, Siddha, Suresh B <suresh.b.siddha@...>, Avi Kivity <avi@...>
Date: Saturday, April 26, 2008 - 11:41 am

no problem and it's not noise even if it turns out to be a compiler
problem - we very much want to know about them when they affect the
kernel.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Gabriel C <nix.or.die@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Arjan van de Ven <arjan@...>, Siddha, Suresh B <suresh.b.siddha@...>, Avi Kivity <avi@...>
Date: Saturday, April 26, 2008 - 12:43 pm

Indeed. Often these kinds of compiler issues are also actually our bugs,
where some code generation thing just happened to hide the fact that we
did something wrong.

So Gabriel, it would be wondeful if you could try to pinpoint where the
problem happens. That said, especially if you are using something like a
self-compiled gcc from the current SVN tree, that obviously does make it a
*lot* more likely that you are hitting a real compiler bug.

(But even then, if you can pinpoint it, I bet the gcc people would be
really happy to hear about it.)

Linus
--

To: Ingo Molnar <mingo@...>
Cc: <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>
Date: Friday, April 25, 2008 - 7:43 pm

So why is PAT dependent on NONPROMISC_DEVMEM here?

It seems pointless and wrong. Somebody might want to have both the full
/dev/mem and PAT.

Also, it causes this message for me on one of my machines:

EXT3-fs: mounted filesystem with ordered data mode.
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
mtrr: no more MTRRs available
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000
EXT3 FS on dm-0, internal journal
...
eth0: no IPv6 routers present
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xd0000000-0xd0020000
mtrr: no more MTRRs available
Overlap at 0xd0000000-0xe0000000
Overlap at 0xe0300000-0xe0400000
Overlap at 0xe0300000-0xe0380000

which is a bit annoying. Forgotten debug printk, perhaps?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: <linux-kernel@...>, Andrew Morton <akpm@...>, H. Peter Anvin <hpa@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 5:57 am

yeah - i already toned it down to KERN_INFO once in commit 28eb559b5 -
but it should be pr_debug() instead. The idea in pat.c is to only print
out if it's a material condition that the user should know about. I've
queued up the patch below.

Ingo

-------->
Subject: x86 PAT: tone down debugging messages
From: Ingo Molnar <mingo@elte.hu>
Date: Sat Apr 26 11:40:31 CEST 2008

turn that into a pr_debug().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/pat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/mm/pat.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pat.c
+++ linux-x86.q/arch/x86/mm/pat.c
@@ -334,7 +334,7 @@ int reserve_memtype(u64 start, u64 end,
break;
}

- printk("Overlap at 0x%Lx-0x%Lx\n",
+ pr_debug("Overlap at 0x%Lx-0x%Lx\n",
saved_ptr->start, saved_ptr->end);
/* No conflict. Go ahead and add this new entry */
list_add(&new_entry->nd, saved_ptr->nd.prev);
--

To: Linus Torvalds <torvalds@...>
Cc: Ingo Molnar <mingo@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Friday, April 25, 2008 - 8:06 pm

The problem is that that can create cached/uncached aliases, which can
cause some processors to lock up (especially AMD is known to have a lot
of errata in this area.)

-hpa
--

To: H. Peter Anvin <hpa@...>
Cc: Ingo Molnar <mingo@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Friday, April 25, 2008 - 9:12 pm

Umm.. I don't think you understand. Right now, NONPROMISC_DEVMEM doesn't
just disable mmap() on /dev/mem, it disables totally regular reads and
writes too. That seems pretty damn excessive.

If it was just mmap(), I don't think it would matter much. I don't think
we traditionally even supported mmap() on real RAM (because the page
counting would get confused), and that actually got supported only thanks
to VM changes that made it possible.

But read/write has always been supported, and shouldn't cause any
cached/uncached aliases!

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: H. Peter Anvin <hpa@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 4:56 am

You are right, there should be no architectural need to make PAT
dependent on nonpromisc-devmem, and thus the patch below should be safe.

In theory even mmap() of /dev/mem should be safe this way - as all
memtypes are properly tracked.

The thinking behind this dependency was three-fold:

- historic: from the days when the PAT patchset didnt do fully correct
tracking yet

- practical: that PAT would be utilized in newer distros on newer
systems - with older distros on older systems not really wanting
(or needing) neither /dev/mem restrictions nor PAT

- paranoia: one less degree of freedom to take into account

Ingo

----------------------->
Subject: x86 PAT: decouple from nonpromisc devmem
From: Ingo Molnar <mingo@elte.hu>
Date: Sat Apr 26 10:26:52 CEST 2008

Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/Kconfig | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-x86.q/arch/x86/Kconfig
===================================================================
--- linux-x86.q.orig/arch/x86/Kconfig
+++ linux-x86.q/arch/x86/Kconfig
@@ -1042,9 +1042,9 @@ config MTRR
See <file:Documentation/mtrr.txt> for more information.

config X86_PAT
- def_bool y
+ bool
prompt "x86 PAT support"
- depends on MTRR && NONPROMISC_DEVMEM
+ depends on MTRR
help
Use PAT attributes to setup page level cache control.

--

To: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>
Cc: H. Peter Anvin <hpa@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 12:54 pm

Agreed that NONPROMISC_DEVMEM is not really needed for read/write. But,
we will still need it for /dev/mem.

The problem with /dev/mem maps of RAM is situation like this:
1) drivers does vmalloc(), followed by set_memory_uc.
2) User does a /dev/mem map of that vmalloced physical address. User
will get a UC mapping for /dev/mem.
3) driver changes the memory to set_memory_wb and frees the memory.
4) user mapping for this address is still UC which will lead to
aliasing.

Read/write is ok, as they will just use __va for RAM to access and that
will always be consistent.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Ingo Molnar <mingo@...>, H. Peter Anvin <hpa@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 1:15 pm

If so, just disable it unconditionally for mmap.

As mentioned, that's really just a return to original Linux /dev/mmap
semantics: long ago (well, not _that_ long ago) we never used to be able
to mmap() normal kernel memory, because the page counts would get screwed
up on pages that weren't marked PG_Reserved.

So the traditional Linux behavior for mmap() on /dev/mem was always to
only allow it on memory that either had no "struct page *" backing at all,
or that was marked PG_Reserved (ie the ISA hole ay 640k-1M and things like
the BIOS tables etc).

Going back to that doesn't sound horrible.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Ingo Molnar <mingo@...>, H. Peter Anvin <hpa@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 2:32 pm

OK. Below is the quick to disable /dev/mem mmap of RAM with PAT.
This should go along with Ingo's patch that removes PAT dependency on
NONPROMISC_DEVMEM. It makes things safer and eliminates aliasing.
Still somewhat unclean as the range_is_allowed is duplicated.
And also, just compile tested right now.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
arch/x86/mm/pat.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c 2008-04-26 09:34:31.000000000 -0700
+++ linux-2.6/arch/x86/mm/pat.c 2008-04-26 11:25:57.000000000 -0700
@@ -16,6 +16,7 @@
#include <asm/msr.h>
#include <asm/tlbflush.h>
#include <asm/processor.h>
+#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/pat.h>
#include <asm/e820.h>
@@ -477,6 +478,33 @@ pgprot_t phys_mem_access_prot(struct fil
return vma_prot;
}

+#ifdef CONFIG_NONPROMISC_DEVMEM
+/* This check is done in drivers/char/mem.c in case of NONPROMISC_DEVMEM*/
+static inline int range_is_allowed(unsigned long pfn, unsigned long size)
+{
+ return 1;
+}
+#else
+static inline int range_is_allowed(unsigned long pfn, unsigned long size)
+{
+ u64 from = ((u64)pfn) << PAGE_SHIFT;
+ u64 to = from + size;
+ u64 cursor = from;
+
+ while (cursor < to) {
+ if (!devmem_is_allowed(pfn)) {
+ printk(KERN_INFO
+ "Program %s tried to access /dev/mem between %Lx->%Lx.\n",
+ current->comm, from, to);
+ return 0;
+ }
+ cursor += PAGE_SIZE;
+ pfn++;
+ }
+ return 1;
+}
+#endif /* CONFIG_NONPROMISC_DEVMEM */
+
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
@@ -485,6 +513,9 @@ int phys_mem_access_prot_allowed(struct
unsigned long ret_flags;
int retval;

+ if (!range_is_allowed(pfn, size))
+ return ...

To: Venki Pallipadi <venkatesh.pallipadi@...>
Cc: Linus Torvalds <torvalds@...>, H. Peter Anvin <hpa@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Thomas Gleixner <tglx@...>
Date: Saturday, April 26, 2008 - 3:07 pm

thanks, i've queued up the patch below. I'll do some testing and then
send it to Linus.

Ingo

--------------->
Subject: x86, PAT: disable /dev/mem mmap RAM with PAT
From: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date: Sat, 26 Apr 2008 11:32:12 -0700

disable /dev/mem mmap of RAM with PAT. It makes things safer and
eliminates aliasing. A future improvement would be to avoid the
range_is_allowed duplication.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/pat.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

Index: linux-x86.q/arch/x86/mm/pat.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pat.c
+++ linux-x86.q/arch/x86/mm/pat.c
@@ -16,6 +16,7 @@
#include <asm/msr.h>
#include <asm/tlbflush.h>
#include <asm/processor.h>
+#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/pat.h>
#include <asm/e820.h>
@@ -477,6 +478,33 @@ pgprot_t phys_mem_access_prot(struct fil
return vma_prot;
}

+#ifdef CONFIG_NONPROMISC_DEVMEM
+/* This check is done in drivers/char/mem.c in case of NONPROMISC_DEVMEM*/
+static inline int range_is_allowed(unsigned long pfn, unsigned long size)
+{
+ return 1;
+}
+#else
+static inline int range_is_allowed(unsigned long pfn, unsigned long size)
+{
+ u64 from = ((u64)pfn) << PAGE_SHIFT;
+ u64 to = from + size;
+ u64 cursor = from;
+
+ while (cursor < to) {
+ if (!devmem_is_allowed(pfn)) {
+ printk(KERN_INFO
+ "Program %s tried to access /dev/mem between %Lx->%Lx.\n",
+ current->comm, from, to);
+ return 0;
+ }
+ cursor += PAGE_SIZE;
+ pfn++;
+ }
+ return 1;
+}
+#endif /* CONFIG_NONPROMISC_DEVMEM */
+
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
@@ -485,6 +513,9 @@ int phys_mem_access_p...

Previous thread: [git pull] scheduler/misc fixes by Ingo Molnar on Thursday, April 24, 2008 - 6:55 pm. (13 messages)

Next thread: [PATCH] AFS: Support the CB.ProbeUuid RPC op by David Howells on Thursday, April 24, 2008 - 6:57 pm. (1 message)