Re: [PATCH] mm: make do_move_pages() complexity linear

From: Brice Goglin
Date: Friday, September 12, 2008 - 5:31 am

Page migration is currently very slow because its overhead is quadratic
with the number of pages. This is caused by each single page migration
doing a linear lookup in the page array in new_page_node().
    
Since pages are stored in array order in the pagelist and do_move_pages()
processes this list in order, new_page_node() can advance the "pm" pointer
into the page array so that the next iteration finds the next page in
zero or a few lookup steps.
    
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Nathalie Furmento <Nathalie.Furmento@labri.fr>

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -837,14 +837,23 @@ struct page_to_node {
 	int status;
 };
 
+/*
+ * Allocate a page on the node given as a page_to_node in private.
+ * Increase private to point to the next page_to_node so that the
+ * next iteration does not have to traverse the whole pm array.
+ */
 static struct page *new_page_node(struct page *p, unsigned long private,
 		int **result)
 {
-	struct page_to_node *pm = (struct page_to_node *)private;
+	struct page_to_node **pmptr = (struct page_to_node **)private;
+	struct page_to_node *pm = *pmptr;
 
 	while (pm->node != MAX_NUMNODES && pm->page != p)
 		pm++;
 
+	/* prepare for the next iteration */
+	*pmptr = pm + 1;
+
 	if (pm->node == MAX_NUMNODES)
 		return NULL;
 
@@ -926,10 +935,12 @@ set_status:
 		pp->status = err;
 	}
 
-	if (!list_empty(&pagelist))
+	if (!list_empty(&pagelist)) {
+		/* new_page_node() will modify tmp */
+		struct page_to_node *tmp = pm;
 		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm);
-	else
+				    (unsigned long)&tmp);
+	} else
 		err = -ENOENT;
 
 	up_read(&mm->mmap_sem);
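
[Illustrative sketch, not part of the patch.] The complexity argument in
the changelog can be checked in user space with a simplified model. The
names below (struct entry, END_MARKER, MAX_PAGES, lookup_from_start,
lookup_from_cursor, count_steps) are assumptions made for this example:
struct entry stands in for struct page_to_node and END_MARKER for
MAX_NUMNODES. Unlike the patch, the cursor is left untouched when no entry
matches, so that it never moves past the end marker.

```c
#include <assert.h>
#include <stddef.h>

#define END_MARKER (-1)   /* hypothetical stand-in for MAX_NUMNODES */
#define MAX_PAGES  64     /* arbitrary limit for this demo */

struct entry {            /* simplified stand-in for struct page_to_node */
	const void *page;
	int node;
};

/* Old behaviour: every lookup restarts at the beginning of the array. */
static size_t lookup_from_start(const struct entry *pm, const void *page,
				const struct entry **found)
{
	size_t steps = 0;

	while (pm->node != END_MARKER && pm->page != page) {
		pm++;
		steps++;
	}
	*found = (pm->node == END_MARKER) ? NULL : pm;
	return steps;
}

/* Patched behaviour: resume at *cursor and remember where to continue. */
static size_t lookup_from_cursor(const struct entry **cursor, const void *page,
				 const struct entry **found)
{
	const struct entry *pm = *cursor;
	size_t steps = 0;

	while (pm->node != END_MARKER && pm->page != page) {
		pm++;
		steps++;
	}
	if (pm->node == END_MARKER) {
		*found = NULL;        /* no match: keep the cursor where it was */
		return steps;
	}
	*found = pm;
	*cursor = pm + 1;             /* prepare for the next lookup */
	return steps;
}

/* Look up n pages in array order; return how many entries were skipped. */
static size_t count_steps(size_t n, int use_cursor)
{
	static char pages[MAX_PAGES]; /* addresses serve as fake page pointers */
	struct entry pm[MAX_PAGES + 1];
	const struct entry *cursor = pm;
	const struct entry *found;
	size_t i, steps = 0;

	assert(n <= MAX_PAGES);
	for (i = 0; i < n; i++) {
		pm[i].page = &pages[i];
		pm[i].node = 0;
	}
	pm[n].page = NULL;
	pm[n].node = END_MARKER;

	for (i = 0; i < n; i++) {
		steps += use_cursor
			? lookup_from_cursor(&cursor, &pages[i], &found)
			: lookup_from_start(pm, &pages[i], &found);
		assert(found == &pm[i]);
	}
	return steps;
}
```

In this model, count_steps(8, 0) skips 0 + 1 + ... + 7 = 28 entries in
total, while count_steps(8, 1) skips none, which is the quadratic versus
linear difference the changelog describes.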


--

From: Christoph Lameter
Date: Friday, September 12, 2008 - 6:45 am

Page migration in general is not affected by this issue. This is specific to
the sys_move_pages() system call. The API was so far only used to migrate a
limited number of pages. For more, one would use either the cpuset or the
sys_migrate_pages() APIs since these do not require an array that describes

I agree. It would be good to increase the speed of sys_move_pages().

However, note that your patch assumes that new_page_node() is called in
sequence for each of the pages in the page descriptor array. new_page_node()
is skipped in the loop if

1. The page is not present
2. The page is reserved
3. The page is already on the intended node
4. The page is shared between processes.

If any of those cases occurs, then your patch will associate page
descriptors with the wrong pages for the remaining pages in the array.
--

From: Brice Goglin
Date: Friday, September 12, 2008 - 6:54 am

No, it assumes that pages are stored in pagelist in order. But some of

I don't think so. If this happens, the while loop will skip those pages.
(while in the regular case, the while loop does 0 iterations).
The while loop is still here to make sure we are processing the right pm
entry. What the patch changes is only that we don't uselessly look at
the already-processed beginning of pm.

thanks,
Brice
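
[Illustrative sketch, not from the thread.] The point about skipped
entries can be modelled in user space: entries that were never queued are
simply stepped over by the forward scan, so each queued page still gets
the node from its own entry. All names here (struct entry, END_MARKER,
node_for_page, check_skipped_entries) are hypothetical, and the node
numbers are arbitrary. As an assumption of this sketch, the cursor is not
moved when nothing matches.

```c
#include <assert.h>
#include <stddef.h>

#define END_MARKER (-1)   /* hypothetical stand-in for MAX_NUMNODES */

struct entry {            /* simplified stand-in for struct page_to_node */
	const void *page;
	int node;
};

/* Scan forward from *cursor; return the node recorded for this page. */
static int node_for_page(const struct entry **cursor, const void *page)
{
	const struct entry *pm;

	assert(cursor != NULL && *cursor != NULL);
	pm = *cursor;
	while (pm->node != END_MARKER && pm->page != page)
		pm++;                 /* steps over entries that were not queued */
	if (pm->node == END_MARKER)
		return END_MARKER;    /* not found: cursor stays put */
	*cursor = pm + 1;
	return pm->node;
}

/*
 * Five entries, but only entries 0, 2 and 3 are "queued" (entries 1 and 4
 * model pages that were not present, reserved, and so on). Returns how
 * many queued pages were matched with the node from their own entry.
 */
static int check_skipped_entries(void)
{
	static char pages[5];     /* addresses serve as fake page pointers */
	const struct entry pm[] = {
		{ &pages[0], 10 }, { &pages[1], 11 }, { &pages[2], 12 },
		{ &pages[3], 13 }, { &pages[4], 14 }, { NULL, END_MARKER },
	};
	const int queued[] = { 0, 2, 3 };
	const struct entry *cursor = pm;
	int ok = 0;
	size_t i;

	for (i = 0; i < sizeof(queued) / sizeof(queued[0]); i++) {
		int idx = queued[i];

		if (node_for_page(&cursor, &pages[idx]) == pm[idx].node)
			ok++;
	}
	return ok;
}
```

In this model all three queued pages are matched correctly; the lookup for
entry 2 takes one extra step to pass over the unqueued entry 1.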
--

From: Christoph Lameter
Date: Friday, September 12, 2008 - 7:21 am

Ahh.. I missed that.

Acked-by: Christoph Lameter <cl@linux-foundation.org>
--

From: Brice Goglin
Date: Thursday, September 25, 2008 - 5:58 am

Actually, this "pm+1" breaks the case where migrate_pages() calls
unmap_and_move() multiple times on the same page. In this case, we need
the while loop to look at pm instead of pm+1 first. So we can't cache
pm+1 in private, but caching pm is ok. There will be 1 while loop
iteration instead of 0 in the regular case. Updated patch (with more comments)
coming soon.

Brice
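
[Illustrative sketch, not the updated patch.] The retry problem described
above can be reproduced in a small user-space model: the same page is
looked up twice in a row, then the next page is looked up. The names
(struct entry, END_MARKER, lookup, lookups_found_with_retry) and the
advance_past_match flag are assumptions made for this example. As a
further assumption, the cursor is left untouched when nothing matches.

```c
#include <assert.h>
#include <stddef.h>

#define END_MARKER (-1)   /* hypothetical stand-in for MAX_NUMNODES */

struct entry {            /* simplified stand-in for struct page_to_node */
	const void *page;
	int node;
};

/*
 * Scan forward from *cursor. On a match, cache either the matching entry
 * (advance_past_match == 0) or the one after it (advance_past_match != 0).
 */
static const struct entry *lookup(const struct entry **cursor,
				  const void *page, int advance_past_match)
{
	const struct entry *pm;

	assert(cursor != NULL && *cursor != NULL);
	pm = *cursor;
	while (pm->node != END_MARKER && pm->page != page)
		pm++;
	if (pm->node == END_MARKER)
		return NULL;          /* not found: cursor stays put */
	*cursor = advance_past_match ? pm + 1 : pm;
	return pm;
}

/* Look up page 0 twice (a retry), then page 1; count successful lookups. */
static int lookups_found_with_retry(int advance_past_match)
{
	static char pages[2];     /* addresses serve as fake page pointers */
	const struct entry pm[] = {
		{ &pages[0], 0 }, { &pages[1], 1 }, { NULL, END_MARKER },
	};
	const void *requests[] = { &pages[0], &pages[0], &pages[1] };
	const struct entry *cursor = pm;
	int found = 0;
	size_t i;

	for (i = 0; i < sizeof(requests) / sizeof(requests[0]); i++)
		if (lookup(&cursor, requests[i], advance_past_match))
			found++;
	return found;
}
```

When pm+1 is cached, the repeated lookup of page 0 starts past its entry
and fails, so only two of the three lookups succeed. When pm is cached,
all three succeed, at the cost of one extra loop iteration when moving on
to the next page.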

--
