Re: [patch 1/2] x86: track memtype for RAM in page struct - v3

Previous thread: [git pull] PCI fixes by Jesse Barnes on Friday, September 12, 2008 - 5:02 pm. (1 message)

Next thread: [patch 0/2] PAT fix/optimization by Venkatesh Pallipadi on Friday, September 12, 2008 - 5:00 pm. (1 message)
From: Venkatesh Pallipadi
Date: Friday, September 12, 2008 - 5:00 pm

From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: [patch 1/2] x86: track memtype for RAM in page struct

Track the memtype for RAM pages in page struct instead of using the memtype
list. This avoids the explosion in the number of entries in memtype list
(of the order of 20,000 with AGP) and makes the PAT tracking simpler. We are
using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 arch/x86/mm/ioremap.c  |   19 +++++++++++
 arch/x86/mm/pat.c      |   83 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/page.h |    1 
 3 files changed, 103 insertions(+)

Index: tip/arch/x86/mm/ioremap.c
===================================================================
--- tip.orig/arch/x86/mm/ioremap.c	2008-09-12 16:03:33.000000000 -0700
+++ tip/arch/x86/mm/ioremap.c	2008-09-12 16:23:33.000000000 -0700
@@ -102,6 +102,25 @@ int page_is_ram(unsigned long pagenr)
 	return 0;
 }
 
+int pagerange_is_ram(unsigned long start, unsigned long end)
+{
+	int ram_page = 0, not_rampage = 0;
+	unsigned long page_nr;
+
+	for (page_nr = (start >> PAGE_SHIFT); page_nr < (end >> PAGE_SHIFT);
+	     ++page_nr) {
+		if (page_is_ram(page_nr))
+			ram_page = 1;
+		else
+			not_rampage = 1;
+
+		if (ram_page == not_rampage)
+			return -1;
+	}
+
+	return ram_page;
+}
+
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
Index: tip/arch/x86/mm/pat.c
===================================================================
--- tip.orig/arch/x86/mm/pat.c	2008-09-12 16:03:33.000000000 -0700
+++ tip/arch/x86/mm/pat.c	2008-09-12 16:23:33.000000000 -0700
@@ -211,6 +211,75 @@ static struct memtype *cached_entry;
 static u64 cached_start;
 
 /*
+ * RED-PEN:  TODO: Add PageReserved() check aswell here,
+ * once we add SetPageReserved() to all the drivers using
+ * set_memory_* or ...
From: Jeremy Fitzhardinge
Date: Saturday, September 13, 2008 - 10:03 am

Please define PG_arch_1 a proper name so that its easy to tell its being
used just by looking at page-flags.h.

    J
--

From: Ingo Molnar
Date: Sunday, September 14, 2008 - 6:29 am

it should be defined in include/asm-x86/page.h though, not in 
page-flags.h - other architectures are using this flag for other 
purposes.

	Ingo
--

From: Jeremy Fitzhardinge
Date: Sunday, September 14, 2008 - 7:18 am

No, other shared-use flags are all defined in page-flags.h:

enum pageflags {
[...]
	__NR_PAGEFLAGS,

	/* Filesystems */
	PG_checked = PG_owner_priv_1,

	/* XEN */
	PG_pinned = PG_owner_priv_1,
	PG_savepinned = PG_dirty,

	/* SLOB */
	PG_slob_page = PG_active,
	PG_slob_free = PG_private,

	/* SLUB */
	PG_slub_frozen = PG_active,
	PG_slub_debug = PG_error,
};


We could #ifdef CONFIG_X86 just to make it clear we're talking about a
specific X86 usage.  But page-flags.h does seem to have become the
central authority on all struct page flags usage.

    J
--

From: Ingo Molnar
Date: Sunday, September 14, 2008 - 7:22 am

well, in case of the PG_arch_* flags, they are all defined in 
architecture files:

./include/asm-parisc/pgtable.h:#define PG_dcache_dirty         PG_arch_1
./include/asm-mips/cacheflush.h:#define PG_dcache_dirty			PG_arch_1
./arch/sparc64/mm/init.c:#define PG_dcache_dirty		PG_arch_1
./arch/sh/include/cpu-sh4/cpu/cacheflush.h:#define PG_mapped	PG_arch_1
./arch/sh/include/cpu-sh3/cpu/cacheflush.h:#define PG_mapped	PG_arch_1
./arch/arm/include/asm/cacheflush.h:#define PG_dcache_dirty PG_arch_1

they are explicitly reserved for per architecture details.

	Ingo
--

From: Venki Pallipadi
Date: Tuesday, September 23, 2008 - 2:46 pm

Below is the updated patch.

Thanks,
Venki

x86: track memtype for RAM in page struct

From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86: track memtype for RAM in page struct

Track the memtype for RAM pages in page struct instead of using the memtype
list. This avoids the explosion in the number of entries in memtype list
(of the order of 20,000 with AGP) and makes the PAT tracking simpler. We are
using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 arch/x86/mm/ioremap.c        |   19 +++++++++
 arch/x86/mm/pat.c            |   83 +++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/cacheflush.h |    1 
 include/asm-x86/page.h       |    1 
 4 files changed, 104 insertions(+)

Index: tip/arch/x86/mm/ioremap.c
===================================================================
--- tip.orig/arch/x86/mm/ioremap.c	2008-09-23 13:31:31.000000000 -0700
+++ tip/arch/x86/mm/ioremap.c	2008-09-23 13:32:26.000000000 -0700
@@ -102,6 +102,25 @@ int page_is_ram(unsigned long pagenr)
 	return 0;
 }
 
+int pagerange_is_ram(unsigned long start, unsigned long end)
+{
+	int ram_page = 0, not_rampage = 0;
+	unsigned long page_nr;
+
+	for (page_nr = (start >> PAGE_SHIFT); page_nr < (end >> PAGE_SHIFT);
+	     ++page_nr) {
+		if (page_is_ram(page_nr))
+			ram_page = 1;
+		else
+			not_rampage = 1;
+
+		if (ram_page == not_rampage)
+			return -1;
+	}
+
+	return ram_page;
+}
+
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
Index: tip/arch/x86/mm/pat.c
===================================================================
--- tip.orig/arch/x86/mm/pat.c	2008-09-23 13:31:31.000000000 -0700
+++ tip/arch/x86/mm/pat.c	2008-09-23 13:32:26.000000000 -0700
@@ -211,6 +211,75 @@ static struct memtype *cached_entry;
 static u64 cached_start;
 
 /*
+ * ...
From: Christoph Lameter
Date: Tuesday, September 23, 2008 - 2:59 pm

I'd suggest to avoid the open coding of page flag handling using set bit etc.

It may be best to slowly migrate the arch specific definitions into
page-flags.h. Otherwise we wont have a central authority on page flags.

--

From: Venki Pallipadi
Date: Wednesday, September 24, 2008 - 8:53 am

OK. Below is the updated patch that does not use set/clear directly on page
flag.

x86: track memtype for RAM in page struct

From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86: track memtype for RAM in page struct

Track the memtype for RAM pages in page struct instead of using the memtype
list. This avoids the explosion in the number of entries in memtype list
(of the order of 20,000 with AGP) and makes the PAT tracking simpler. We are
using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 arch/x86/mm/ioremap.c        |   19 +++++++++
 arch/x86/mm/pat.c            |   83 +++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/cacheflush.h |    2 +
 include/asm-x86/page.h       |    1 
 4 files changed, 105 insertions(+)

Index: tip/arch/x86/mm/ioremap.c
===================================================================
--- tip.orig/arch/x86/mm/ioremap.c	2008-09-23 13:31:31.000000000 -0700
+++ tip/arch/x86/mm/ioremap.c	2008-09-24 08:39:29.000000000 -0700
@@ -102,6 +102,25 @@ int page_is_ram(unsigned long pagenr)
 	return 0;
 }
 
+int pagerange_is_ram(unsigned long start, unsigned long end)
+{
+	int ram_page = 0, not_rampage = 0;
+	unsigned long page_nr;
+
+	for (page_nr = (start >> PAGE_SHIFT); page_nr < (end >> PAGE_SHIFT);
+	     ++page_nr) {
+		if (page_is_ram(page_nr))
+			ram_page = 1;
+		else
+			not_rampage = 1;
+
+		if (ram_page == not_rampage)
+			return -1;
+	}
+
+	return ram_page;
+}
+
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
Index: tip/arch/x86/mm/pat.c
===================================================================
--- tip.orig/arch/x86/mm/pat.c	2008-09-23 13:31:31.000000000 -0700
+++ tip/arch/x86/mm/pat.c	2008-09-24 08:39:27.000000000 -0700
@@ -211,6 +211,75 @@ static struct memtype *cached_entry;
 ...
From: Ingo Molnar
Date: Saturday, September 27, 2008 - 10:59 am

applied to tip/x86/pat, thanks! Nice patch.

	Ingo
--

From: Nick Piggin
Date: Tuesday, September 30, 2008 - 12:28 am

I'd rather we didn't add any more uses of PageReserved without a
really good reason.

At this point in time (or at least last time I looked, a year or
two ago), it isn't a whole lot of work to remove PG_reserved
completely. If the waters get muddied up again, it could require
another significant rework to remove it in future...

Thanks,
Nick
--

From: Ingo Molnar
Date: Tuesday, September 30, 2008 - 4:21 am

agreed.

	Ingo
--

From: Suresh Siddha
Date: Tuesday, September 30, 2008 - 2:14 pm

If a driver by mistake free's a RAM page before changing its memory attribute
back to WB, we want the generic -mm to catch it.

Today, free_pages_check() prevents freeing the page with PageReserved set. We
want to use this and make sure that either set_page_uc/wc() or the driver
calling these API's set the PageReserved bit. There are already some drivers
which do SetPageReserved() before changing the attribute.

I don't know the history behind PageReserved. But is there a recommended
way to achieve what we want? Either we need to use PageReserved bit or  add
some arch specific checks (in the x86 case, check arch_1) in free_pages_check().
Right?

thanks,
suresh
--

From: Ingo Molnar
Date: Wednesday, October 1, 2008 - 2:29 am

no, the generic -mm does not 'catch' PageReserved, it simply _ignores_ 
it. So i agree about having a debug check there, it's just that what you 
propose does not achieve that.

PageReserved is a legacy thing that should not be used in new code.  

the best way i think would be to add arch_1 to PAGE_FLAGS_CHECK_AT_FREE. 
That way the mm becomes very noisy if this is ever freed.

Nick, Andrew, any preferences?

	Ingo
--

From: Nick Piggin
Date: Wednesday, October 1, 2008 - 7:27 pm

It would be better if another flag is used, yes.

Setting PageReserved used to prevent put_page from decrementing the page's
refcount. Most drivers still do it just because we kept that code in there
to catch any bugs caused by the removal of PageReserved check from the
refcounting.
--

From: Frans Pop
Date: Saturday, September 13, 2008 - 10:24 am

Corrections for a few typos and some suggestions for minor improvements
in the comment.

Cheers,
FJP




"one (driver)"

--

From: Ingo Molnar
Date: Sunday, September 14, 2008 - 7:12 am

setting it PageReserved is a hack. We should set it PG_arch_1 and extend 
the page allocator to emit a kernel warning if a PG_arch_1 page is 
freed.

	Ingo
--

From: Pallipadi, Venkatesh
Date: Tuesday, September 23, 2008 - 2:48 pm

Marking PageReserved will take care of all the places where Reserved is being checked,
like swap etc. And we don't have to add checks for extra bit in architecture independent
code.

Thanks,
Venki
--

Previous thread: [git pull] PCI fixes by Jesse Barnes on Friday, September 12, 2008 - 5:02 pm. (1 message)

Next thread: [patch 0/2] PAT fix/optimization by Venkatesh Pallipadi on Friday, September 12, 2008 - 5:00 pm. (1 message)