Re: AGP and PAT (induced?) problem (on AMD family 6)

Previous thread: [PATCH 04/10] introduce generic iommu_num_pages function by Joerg Roedel on Monday, August 4, 2008 - 9:04 am. (19 messages)

Next thread: [PATCH] AMD IOMMU: initialize dma_ops after sysfs registration by Joerg Roedel on Monday, August 4, 2008 - 9:35 am. (1 message)
From: Rene Herman
Date: Monday, August 4, 2008 - 9:30 am

Hi Dave.

A while ago I sent a message about long AGP delays upon starting and 
exiting X:

http://marc.info/?l=linux-kernel&m=121647129632110&w=2

There was no reply (if that was due to the linux.ie address, could you 
perhaps update it in MAINTAINERS?) but today Shaohua Li posted a patch 
that made me wonder about PAT in this context:

http://marc.info/?l=linux-kernel&m=121783222306075&w=2
http://marc.info/?l=linux-kernel&m=121783222406078&w=2
http://marc.info/?l=linux-kernel&m=121783222406081&w=2

His patch does not solve anything appreciable for me -- the delays are 
still as described in that previous post, with an exception for (with 
Option "AGSize" "64") delays upon exiting X that are now sometimes as 
bad as a full 12 seconds.

What _does_ solve this though is booting with the "nopat" command line 
parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself. 
With "nopat", there's no problem to be seen anymore -- exiting X 
specifically is instantaneous.

With or without PAT, my /proc/mtrr is always:

reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
reg02: base=0xe8000000 (3712MB), size=  64MB: write-combining, count=1

under X joined by:

reg03: base=0xe4000000 (3648MB), size=  32MB: write-combining, count=2

This is a machine with 768M, the AGP aperture set to 64MB and a 32MB 
Matrox Millenium G550 AGP card. More detail in previous post.

Is this something inherent to PAT? Inherent to PAT on AMD family 6? 
Inherent to DRM/AGP with PAT? On AMD family 6?

This is probably fairly important to get sorted because although I don't 
know what's where at the moment, last I saw was a patch in x86/tip that 
enabled PAT on many more models including all of AMD.

For reference, /proc/cpuinfo:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 7
model name	: AMD Duron(tm) Processor
stepping	: 1
cpu MHz		: 1313.094
cache size	: 64 KB
fdiv_bug	: ...
From: Andreas Herrmann
Date: Wednesday, August 6, 2008 - 6:51 am

To get some more debug data, can you please retest with latest kernel
(2.6.27-rc2) using "debugpat" kernel option and provide dmesg
output plus contents of <debugfs>/x86/pat_memtype_list?


Thanks,

Andreas


--

From: Rene Herman
Date: Wednesday, August 6, 2008 - 1:57 pm

No... my kernel message buffer isn't large enough for that :-(

Right, I guess I now know where the delay is coming from. I suppose this 
is not expected. dmesg as captured after starting X and without 
"debugpat" at:

http://members.home.nl/rene.herman/pat/dmesg.x

Truncated dmesg with "debugpat":


Before starting X (1K):

http://members.home.nl/rene.herman/pat/pat_memtype_list.console.debugpat

After starting X (625K):

http://members.home.nl/rene.herman/pat/pat_memtype_list.x.debugpat

(This is with 64MB AGP memory)

More data:

http://members.home.nl/rene.herman/pat/config-2.6.27-rc2-current
http://members.home.nl/rene.herman/pat/xorg.conf
http://members.home.nl/rene.herman/pat/Xorg.0.log

Thanks,
Rene
--

From: Rene Herman
Date: Monday, August 11, 2008 - 2:46 am

<ping>

Please note this a 2.6.27 problem (given that PAT isn't enabled by 
default, not a _pure_ regression I guess, but still).

I also still don't know if you (Andreas), Dave or Yinghai should be the 
To: on this but given that you've been the only one to react at all...

--

From: Ingo Molnar
Date: Friday, August 15, 2008 - 7:22 am

(more people Cc:-ed)


agreed - +12 seconds wait suggest some rather fundamental breakage. Did 
we go back to uncached for some critical display area that makes X start 
up (shut down) that slowly? Did we mark the BIOS uncacheable perhaps, 
causing X to execute BIOS code very slowly?

	Ingo
--

From: Rene Herman
Date: Friday, August 15, 2008 - 8:24 am

Quite a lot "uncached-minus" in those lists. I am desperately trying to 
avoid a clue about mostly anything graphics related so, "I dunno".

I haven't just disabled PAT yet (although I was about to just do so) and 
am available for testing.

Rene.
--

From: Rene Herman
Date: Tuesday, August 19, 2008 - 3:11 am

<waiting with bated breath>

Additional observation with respect to first,next shutdown:

With Option "AGPSize" "64", and booted with "nopat", X startup (from 
startx<enter> to functional desktop) is approximately 5 seconds, 
shutdown is 1 second as calibration times.

Booted without "nopat", X startup seems to alternate between 10+ and 16+ 
seconds and for shutdown -- the first shutdown after boot takes some 14 
seconds total, subsequent shutdowns settle at around 5 seconds.

Rene.
--

From: Ingo Molnar
Date: Tuesday, August 19, 2008 - 3:26 am

would it be possible to start up and shut down X in the slow case via 
strace, by doing something like this:

  strace -f -ttt -TTT -o trace.log startx

and see which system calls (or other activities) took suspiciously long?

	Ingo
--

From: Rene Herman
Date: Tuesday, August 19, 2008 - 7:19 am

It wouldn't it seems. Root X (needed for the strace) works fine but 
started this way hangs indefinitely.

I believe the 14 seconds for first shutdown to 5 later might be telling. 
Sounds like something might have fixed up uncached entries.

I'd really like a reply from the AGP or PAT side right about now.

Rene.
--

From: Venki Pallipadi
Date: Tuesday, August 19, 2008 - 12:07 pm

Hmm. Looks like there are more than 16000 entries in the PAT list!

This delay may be due to the overhead of parsing this linked list everytime
for a new entry, rather than any problem with cache setting itself.

I am working on a patch to optimize this pat list parsing for the simple case.
Should be able to send it out later today, for testing.

Thanks,
Venki

--

From: Rene Herman
Date: Tuesday, August 19, 2008 - 12:22 pm

Thanks for the reply. It's with 64MB of AGP memory which I guess is at 
the low end these days. Would your reply mean that basically everyone on 
2.6.27 should now be experiencing this?

I noticed it was PAT related due to Shaohua Li's:

http://marc.info/?l=linux-kernel&m=121783222306075&w=2

which lists very different times (patch there did not help any).

As another by the way, probably not surprising but I earlier also tried 
  both unmounting and completely compiling out debugfs just in case I 
was seeing a debugging related sysmptom. No help either.

It's evening here so I'll probably not be able to test until tomorrow.

Rene.
--

From: Venki Pallipadi
Date: Tuesday, August 19, 2008 - 4:28 pm

Below is the patch I am testing. Let me know if this patch helps.

Thanks,
Venki


Test patch. Adds cached_entry to list add routine, in order to speed up the
lookup for sequential reserve_memtype calls.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 arch/x86/mm/pat.c |   33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c	2008-08-19 15:21:07.000000000 -0700
+++ linux-2.6/arch/x86/mm/pat.c	2008-08-19 16:00:52.000000000 -0700
@@ -207,6 +207,9 @@ static int chk_conflict(struct memtype *
 	return -EBUSY;
 }
 
+static struct memtype *cached_entry;
+static u64 cached_start;
+
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -280,11 +283,17 @@ int reserve_memtype(u64 start, u64 end, 
 
 	spin_lock(&memtype_lock);
 
+	if (cached_entry && start >= cached_start)
+		entry = cached_entry;
+	else
+		entry = list_entry(&memtype_list, struct memtype, nd);
+
 	/* Search for existing mapping that overlaps the current range */
 	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
+	list_for_each_entry_continue(entry, &memtype_list, nd) {
 		if (end <= entry->start) {
 			where = entry->nd.prev;
+			cached_entry = list_entry(where, struct memtype, nd);
 			break;
 		} else if (start <= entry->start) { /* end > entry->start */
 			err = chk_conflict(new, entry, new_type);
@@ -292,6 +301,8 @@ int reserve_memtype(u64 start, u64 end, 
 				dprintk("Overlap at 0x%Lx-0x%Lx\n",
 					entry->start, entry->end);
 				where = entry->nd.prev;
+				cached_entry = list_entry(where,
+							struct memtype, nd);
 			}
 			break;
 		} else if (start < entry->end) { /* start > entry->start */
@@ -299,7 +310,20 @@ int reserve_memtype(u64 start, u64 end, 
 			if (!err) {
 				dprintk("Overlap at 0x%Lx-0x%Lx\n",
 					entry->start, entry->end);
-				where ...
From: Ingo Molnar
Date: Wednesday, August 20, 2008 - 3:09 am

i've queued this fix up in tip/x86/urgent for more testing - as ~10 
seconds delays are serious enough to warrant a quick fix.

Rene, you might want to try tip/master, which has this integrated as 
well:

  http://people.redhat.com/mingo/tip.git/README

	Ingo
--

From: Ingo Molnar
Date: Wednesday, August 20, 2008 - 3:04 am

hm, btw., why is that?

	Ingo
--

From: Rene Herman
Date: Wednesday, August 20, 2008 - 3:50 am

Because 64M of AGP memory divided by 4K pages is 16K. That is, the 
underlying problem seems to be AGP drivers using order 0 allocations. 
I'm looking.

Do note also that this means that Venki's change would not constitite a 
correct/final fix. Sure, caching the last entry speeds up traversing a 
16K entry list but the issue is that there shouldn't be a 16K entry 
list. Through AGP, or maybe even by coalescing entries in the PAT list 
if that's at all possible (I guess it's not really).

Even if such a more fundamental fix isn't (easily) available, the PAT 
code already comments that the list, which is sorted by ->start value, 
is expected to be short, and should be turned into an rbtree if it isn't 
which might be slightly less of a bandaid.

Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it 
seems so I've added Dave Jones for a possible comment from the AGP side.
If I'm reading this right upto now, still many AGP driver (among which 
my amd-k7-agp) are affected.

In the short run and if I'm not just mistaken, the best fix might be to 
make PAT dependent on not having a dumb AGP driver (but as said, still 
looking).

Note that my chipset is capable of a 2G AGP aperture. That's 512K pages 
if fully used, 256K for 1G, 128K for 512M, ...

Rene.
--

From: Rene Herman
Date: Wednesday, August 20, 2008 - 7:27 am

This was based on a wrong reading; I was looking at the GATT allocation.

I'm giving up looking until someone can tell me whether or not those 16K 
entries are expected though. I have just one AGP card in a PAT capable 
machine.

How many entries in /debug/x86/pat_memtype_list are there on other AGP 
systems with Option "AGPSize" "64" in their xorg.conf:"Device" section 
(and their AGP aperture set to 64M or bigger in the BIOS)?

Rene.
--

From: Venki Pallipadi
Date: Wednesday, August 20, 2008 - 12:41 pm

OK. I have reproduced this list size issue locally and this order 1
allocation and set_memory_uc on that allocation is actually coming from
agp_allocate_memory() -> agp_generic_alloc_page() -> map_page_into_agp()
agp_allocate_memory breaks higher order page requests into order 1 allocs.

On my system I see multiple agp_allocate_memory requests for nrpages
8841, 1020, 16, 2160, 2160, 8192. Together they end up resulting in
more than 22K entries in PAT pages.

Thanks,
Venki

--

From: Rene Herman
Date: Wednesday, August 20, 2008 - 2:40 pm

Okay, thanks for the confirmation.

Now, how to fix...

Firstly, it seems we can conclude that any expectancy of a short PAT 
list is simply destroyed by AGP. I believe the best thing migh be to 
look into "fixing" AGP rather than PAT for now?

In a sense the entire purpose of the AGP GART is collecting non 
contiguous pages but given that in practice it's generally still just 
one or at most a few regions, going to multi-page allocs sounds most 
appetising to me.

All in tree AGP drivers except sgi-agp use agp_generic_alloc_page(), ali 
via m1541_alloc_page and i460 via i460_alloc_page.

Rene.
--

From: Dave Airlie
Date: Wednesday, August 20, 2008 - 2:46 pm

In the future we will be getting more smaller AGP allocs, so the other
problem needs a fix as well.

http://git.kernel.org/?p=linux/kernel/git/airlied/agp-2.6.git;a=shortlog;h=agp-pageattr2

contains some code I started on before that moves the interfaces
around,  Shaohua has been looking at
it as it needs the changes to the set_pages interface as well, which
is where I ran out of time/steam last time.

However with alloc/free pages we could change to a higher order
allocation function as long as it fell back to lower
orders internally.

--

From: Venki Pallipadi
Date: Wednesday, August 20, 2008 - 3:16 pm

Yes. Atleast during the bootup, we should be able to get higher order pages.
And if we cannot get that, agp can internally fall back to zero order pages.
That will reduce the number of times set_memory_* gets called, even without
pat.

We are also looking at changing the reserve_memtype in PAT, not to use linked
list for RAM backed pages and track them in page struct. That way we will be
using the list only for reserved region which should still be less in
number and agp set_memory_* call will not have the list overhead.

BTW, Rene: Did the patch from yday, caching the last list add, help in
eliminating the slowdown in X startup?

Thanks,
Venki

--

From: Andi Kleen
Date: Wednesday, August 20, 2008 - 8:42 pm

Back when I hacked on this I explicitely chose to not do this because
it would make it impossible to put any normal anonymous pages into
the PAT list. While that's not done today there's no reason it couldn't
be done in the future.

Also it doesn't fix the scalability of the data structure anyways
(a list is a list),  just saves some memory.

-Andi
--

From: Suresh Siddha
Date: Thursday, August 21, 2008 - 2:13 pm

Andi, we are planning to add couple of page flags which will track
the memory attribute of the page. We need to do some checks like,
allow the memory attribute of the page to be changed, only if it is not
mapped any where and not on free lists(like the in the X driver case,
where they allocate the page and then change the attribute). Similarly,
in generic -mm, we need to ensure that the page before it gets added to free
list, has the right memory attribute etc. If the driver is exposing this
page with special attribute, then it is drivers responsibility to
use the same attribute across all the mappings.

Is there a reason why this won't work with anonymous pages? Can you please

With this, we will track only the reserved regions using the linked list
and typically these reserved regions will be small number (may be huge
contiguous chunks but total number of such chunks will be reasonably smaller).

thanks,
suresh
--

From: Andi Kleen
Date: Thursday, August 21, 2008 - 7:12 pm

> Andi, we are planning to add couple of page flags which will track

page flags are still scarce unfortunately, at least on 32bit.
It's a little better now than it used to be (at some point
they were nearly out before some were reclaimed), but adding a lot 
of flags is still difficult. 

In interest of full disclosure I need at least two for other
work too.

On 64bit adding a lot of new flags is fine, but 64bit only

You want to handle that in __free_pages? 

I would have thought that should be handled in some higher level
function which could just check the memattr.


The issue is just if you reuse the two list heads in struct page
because they're already used by the


Reserved region defined how exactly?

-Andi
--

From: Ingo Molnar
Date: Thursday, August 21, 2008 - 5:06 am

Would be nice to test tip/master - it has both that patch included and 
the latest pageattr-array API (with enablement in AGP drivers) patchset, 
done by Shaohua Li, based on Dave's original patch.

	Ingo
--

From: Rene Herman
Date: Thursday, August 21, 2008 - 10:15 am

Yes, it does. I was reluctant to say so as I feel it works around the 
problem instead of solving it but yes, it does (free_memtype() would 

That patch by itself doesn't help any -- the new set_memory_array_uc() 
still calls reserve_memtype() for each single page in the array. To 
help, it needs to coalesce adjacent entries, which is itself easy to do 
were it not for the need to keep the list of reserved (base,end) pairs 
around for free_memtype() time (and halfway fail time).

Also just now looked at multipage allocs in AGP but that has the same 
issue really -- if you do multipage allocs, you then need to keep the 
list of regions around for free time.

With worst case lists of AGPSize / 4K entries, this is not nice either 
(althought worst case is veryvery much worst and unlikely and best case 
is a 1 entry list).

Should PAT just coalesce the memtype list itself?

Rene.

P.S.

The pageattr-array patch that you currently have in tip/master only 
enables it for intel-agp, not the others. The attached enables it for 
all drivers currently directly using agp_generic_alloc_page() and 
agp_generic_destroy_page() (ocal driver is amd-k7-agp).

From: Rene Herman
Date: Thursday, August 21, 2008 - 3:10 pm

Worse yet, it appears to be broken. {reserve,free}_memtype() expect phys 
addresses but it's being passed virtual ones...

Rene.


From: Pallipadi, Venkatesh
Date: Thursday, August 21, 2008 - 3:16 pm

Yes. Noticed that too and sent a patch here for x86/tip.

http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/2270.html

It is not very critical as it sounds as only set_memory_uc sets PAT
bits for RAM pages. Most other users (devmem mmap, ioramep, pci)
set PAT bits on the reserved memory. And there will not be conflicts
across RAM and reserveed regions. Regardless, this was a stupid
bug that we had missed earlier.

Thanks,
Venki
--

From: Rene Herman
Date: Thursday, August 21, 2008 - 3:26 pm

And unfortunately I don't think the above fully fixes it for AGP. __pa() 
gets the real physical address and the memtypes should be on the GART 
remapped physical addresses it seems.

Rene.
--

From: Pallipadi, Venkatesh
Date: Thursday, August 21, 2008 - 3:57 pm

Page being marked here as uncached is the page got from alloc_page().
We are not really marking GART physical address as uncacheable. And
that page returned from alloc_page is what we are tracking with
reserve and free.
IOW, the tracking is only to keep CPU accesses consistent across
different va->pa and va across different CPUs and has nothing to do
with GART physical address here.

Thanks,
Venki
--

From: Rene Herman
Date: Thursday, August 21, 2008 - 4:06 pm

Okay, if you say so... it _used_ to be before this array change to AGP 
that the GART addresses were in the memtype list, but I'll take your 
word for that being okay.

Rene
--

From: Rene Herman
Date: Thursday, August 21, 2008 - 4:02 pm

Actually, might as well simply reconstruct the memtype list at free time 
I guess. How is this for a coalescing version of the array functions?

NOTE: I am posting this because I'm going to bed but haven't stared 
comfortably long at this and might be buggy. Compiles, boots and 
provides me with:

root@7ixe4:~# wc -l /debug/x86/pat_memtype_list
53 /debug/x86/pat_memtype_list

otherwise (down from 16384+).

<snore>

Rene.
From: Ingo Molnar
Date: Thursday, August 21, 2008 - 9:15 pm

cool!

I'd do this in v2.6.27 but i forced myself to be reasonable and applied 
your patches to tip/x86/pat instead, for tentative v2.6.28 merging 
(assuming it all passes testing, etc.):

 # 9a79f4f: x86: {reverve,free}_memtype() take a physical address
 # c5e147c: x86: have set_memory_array_{uc,wb} coalesce memtypes.
 # 5f310b6: agp: enable optimized agp_alloc_pages methods

( note that i flipped them around a bit and have put your 
  enable-agp_alloc_pages()-widely patch last, so that we get better 
  bisection behavior. )

The frontside cache itself is in x86/urgent:

 # 80c5e73: x86: fix Xorg startup/shutdown slowdown with PAT

... and should at least solve the symptom that you've hit in practice 
(the slowdown), without changing the underlying PAT machinery. (which 
would be way too dangerous for v2.6.27)

And it's all merged up in tip/master, you might want to test that too to 
check whether all the pieces fit together nicely.

Tens of thousands of page granular memtypes was Not Nice.

Venki, Suresh, Shaohua, Dave, Arjan - any observations about this line 
of action?

	Ingo
--

From: Venki Pallipadi
Date: Friday, August 22, 2008 - 12:08 pm

The concern I have here is that the coalescing is not guaranteed to work.
We may still end up having horrible worst case latency, even though this
improves the normal case (boot the system, start X, exit X, reboot the system).
It depends on how pages are allocated and how much memory is there in the
system and what else is running etc.

Here on my test system, without this coalescing change I see

[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
19528

With the coalescing change I see
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
135

quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
985
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
1468
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
1749
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
1916
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
2085

untar a kernel tar.bz2 and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
2737

compile the kernel and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
3089


I think this as a good workaround for now. But, for long run we still need to
look at other ways of eliminating this overhead (like using page struct
that Suresh mentioned in the other thread).


Also, there seems to be a bug in the error path of the patch. Below should
fix it.

Thanks,
Venki

Fix the start addr for free_memtype calls in the error path.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 arch/x86/mm/pageattr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: t/arch/x86/mm/pageattr.c
===================================================================
--- t.orig/arch/x86/mm/pageattr.c	2008-08-22 10:38:35.000000000 -0700
+++ ...
From: Rene Herman
Date: Friday, August 22, 2008 - 1:15 pm

Yes, I agree. Independent of the current trigger PAT wants a more 
scalable approach and yes, worst case is still single page entries.

That worst case is the guaranteed case now though, so I do feel it's a 
generic fix. After all, there wouldn't seem to be a reason to _not_ 

[ constantly growing number of entries ]

Yes, absolutely right, PAT definitely needs something other than the 
simple linked list. I do believe we also want the coalescing change 
though - it seems to make sense regardless of trigger and it's only 

Ah, yes, thanks, just sent out a final version with this fixed as well.

Rene.

--

From: Ingo Molnar
Date: Saturday, August 23, 2008 - 8:33 am

applied to tip/x86/pat, thanks.

	Ingo
--

From: Rene Herman
Date: Friday, August 22, 2008 - 1:02 pm

Thanks. Stared at it a little longer now and there  was a small cut and 
paste error in the error path (s/start/tmp/) but other than that, yes, 
I'll stand by this. set_memory_array_{uc,wb}() set all pages to the same
type, so coalescing them makes sense in any usage case it seems.

The attached version fixes the out path, is otherwise identical and this 
time comes with a proper changelog and a sign-off. Given that you needed 
the changelog and the sign-off anyway, I thought there wouldn't be much 
point in doing that incrementally, but if you disagree and refactor -- a 
changelog for the out path fix would be a simple:

===
x86: fix "have set_memory_array_{uc,wb} coalesce memtypes".

Fix copy and paste error in out path

Signed-off-by: Rene Herman <rene.herman@gmail.com>


Well, please note that that specific commit only fixes X startup -- it 
doesn't do anything for shutdown. With only that one, I'm still at 14 
seconds for X shutdown (first time after boot that is, 5 seconds 
subsequent shutdowns) versus 1 (or sub 1, feels immediate) normally.

It's also a black-screen "hang", so we'll probably be getting a lot of 
"long hang at shutdown" reports without something additionally for .27.


Wasn't in tip/master when I just now fetched it. It does indeed sit in 
tip/x86/pat.

Rene.
From: Rene Herman
Date: Wednesday, September 10, 2008 - 12:52 pm

Haven't been subcribed to any lists recently and someone was talking 
about "the other thread" before but just noticed that an -rc6 was cut.

Please note that the shutdown issue remains unfixed in it (I'm doing my 
coalescing changes locally).

Rene
--

From: Ingo Molnar
Date: Thursday, September 11, 2008 - 1:17 am

here's what is pending in tip/x86/pat for v2.6.28:

110e035: x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious
c09ff7e: linux-next: fix x86 tree build failure
01de05a: x86: have set_memory_array_{uc,wb} coalesce memtypes, fix
5f310b6: agp: enable optimized agp_alloc_pages methods
c5e147c: x86: have set_memory_array_{uc,wb} coalesce memtypes.
9a79f4f: x86: {reverve,free}_memtype() take a physical address
8b53b57: Merge branch 'x86/urgent' into x86/pat
ab7e792: x86: fix pageattr-test
bd07928: agp: add agp_generic_destroy_pages()
37acee1: agp: generic_alloc_pages()
d75586a: x86, pageattr: introduce APIs to change pageattr of a page array
cacf890: Revert "introduce two APIs for page attribute"
9326d61: Revert "reduce tlb/cache flush times of agpgart memory allocation"
5843d9a: x86, pat: avoid highmem cache attribute aliasing
466ae83: reduce tlb/cache flush times of agpgart memory allocation
1ac2f7d: introduce two APIs for page attribute
012f09e: x86: compile pat debugfs interface only if CONFIG_X86_PAT is set

that includes your coalescing optimizations and fixes. None of that is 
really v2.6.27 material i think.

Btw., if you track -tip as per:

  http://people.redhat.com/mingo/tip.git/README

you can merge these changes alone into latest -git just by doing:

  git merge tip/x86/pat

or you can use tip/master for the full range of pending features and 
fixes.

	Ingo
--

From: Rene Herman
Date: Thursday, September 11, 2008 - 1:30 am

Only talking about .27 and just making sure again the issue is known.

The above mentioned subject for the entry cache one ("fix Xorg 
startup/shutdown slowdown with PAT") after all mistakingly says it does 
something for shutdown.

Rene.
--

From: Pallipadi, Venkatesh
Date: Friday, September 12, 2008 - 5:26 pm

Can you try the patch here
http://www.ussg.iu.edu/hypermail/linux/kernel/0809.1/2074.html

That should resolve both reserve and free issues..

Thanks,
Venki
--

From: Rene Herman
Date: Friday, September 12, 2008 - 5:44 pm

Not for .27 though.

Rene.
--

From: Thomas Hellstrom
Date: Thursday, October 9, 2008 - 8:53 am

Hi!
What's the status on this?

There are graphics devices we're working with that require large 
(non-AGP) write-combined memory buffers at any time.
We can solve the tlb- and cache flush latencies by using pools of these 
pages to allocate from and free to, but sooner or later we'll probably 
end up with _huge_ memtype lists. In the (very unlikely) worst case I 
guess they'd contain every other lowmem page in they system.

If I understand things correctly the patch above will fix this issue?
Will it be considered for future kernel inclusion.

Any enlightenment on this subject would be greatly appreciated.
Thanks,
/Thomas



--

From: Pallipadi, Venkatesh
Date: Monday, October 13, 2008 - 10:10 am

Yes this patch will help the above case.

Thanks,
Venki
--

From: Thomas Hellström
Date: Monday, October 13, 2008 - 12:26 pm

Thanks, Venki.
Much appreciated.

/Thomas



--

From: Dave Airlie
Date: Wednesday, August 20, 2008 - 2:02 pm

I haven't anything to add, I'm the maintainer not the author, all the
people who wrote the offending code were
already involved.

--

From: Rene Herman
Date: Wednesday, August 20, 2008 - 2:16 pm

The underlying problem is the order 0 allocations (agp_allocate_memory 
--> agp_generic_allocate_page) where each single page is set uncached 
individually, creating a PAT entry.

Non order 0 allocations generally would ofcourse help. That's very much 
AGP internal -- do you feel that's the way to go?

All the current AGP drivers except sgi-agp use agp_generic_alloc_page().

Doing a quick local hack to collect pages in agp_allocate_memory() into 
regions and set the regions (generally 1) UC in one fell swoop, but I 
don't know if that's safe (and it feels like a rather poor hack anyway).

(not to mention that it's time for bed again).

Rene.
--

Previous thread: [PATCH 04/10] introduce generic iommu_num_pages function by Joerg Roedel on Monday, August 4, 2008 - 9:04 am. (19 messages)

Next thread: [PATCH] AMD IOMMU: initialize dma_ops after sysfs registration by Joerg Roedel on Monday, August 4, 2008 - 9:35 am. (1 message)