[PATCH] fix to putback_lru_page()/unevictable page handling rework v3

Previous thread: [PATCH]rcu,inet,fib_trie,route,radix-tree,DECnet,mac80211: fix meaningless rcu_dereference(local_var) by Lai Jiangshan on Saturday, June 21, 2008 - 2:54 am. (5 messages)

Next thread: hdparm -M acoustic management for sata disks? by Soeren Sonnenburg on Saturday, June 21, 2008 - 3:14 am. (11 messages)
From: KOSAKI Motohiro
Date: Saturday, June 21, 2008 - 3:00 am

Hi,

I merged Kamezawa-san's SHMEM-related fix.
This patch has run well for more than 2 hours,
and I am going to test it under a stress workload this weekend.

But I would like to receive review first,
so I am posting it now.

Thanks.


V2 -> V3
   o remove lock_page() from scan_mapping_unevictable_pages() and
     scan_zone_unevictable_pages().
   o revert the ipc/shm.c and mm/shmem.c changes of the SHMEM unevictable patch.
     They become unnecessary with this patch.

V1 -> V2
   o undo unintended comment removal.
   o move putback_lru_page() from move_to_new_page() to unmap_and_move().
   o fold in dependent patches
       http://marc.info/?l=linux-mm&m=121337119621958&w=2
       http://marc.info/?l=linux-kernel&m=121362782406478&w=2
       http://marc.info/?l=linux-mm&m=121377572909776&w=2


From: KAMEZAWA Hiroyuki <kamezawa.hiroy@jp.fujitsu.com>

putback_lru_page()/unevictable page handling rework.

Currently, putback_lru_page() requires that the page be locked,
and in some special cases it implicitly unlocks it.

This patch tries to make putback_lru_pages() lock_page()-free.
(Of course, some callers must still take the lock.)

The main reason that putback_lru_page() assumes the page is locked
is to prevent the page's status from changing between Mlocked and Not-Mlocked.

Once a page is added to the unevictable list, it is removed from that
list only when it is munlocked. (There are other special cases, but we
ignore them here.)
So a status change during putback_lru_page() is fatal, and the page must
be locked.

putback_lru_page() in this patch introduces a new approach.
After it adds a page to the unevictable list, it checks again whether
the page's status has changed. If it has, it retries the putback.

This patch also changes the callers and cleans up their lock_page()/unlock_page() calls.
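
The "add, recheck, retry" idea can be sketched as a small userspace model. This is not the kernel implementation: struct page, the two-entry lru_list enum, putback_lru_page_model() and the concurrent_event hook are hypothetical stand-ins, and the real code must also isolate the page from the unevictable list before retrying, which the model reduces to simply choosing the list again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of "add, recheck, retry". Not kernel code:
 * the types and names below are illustrative stand-ins.
 */
enum lru_list { LRU_INACTIVE, LRU_UNEVICTABLE };

struct page {
	bool mlocked;          /* stands in for PageMlocked() */
	enum lru_list lru;     /* list the page currently sits on */
};

static bool page_evictable(const struct page *page)
{
	return !page->mlocked;
}

/* Optional callback simulating a munlock that races with the putback. */
static void (*concurrent_event)(struct page *page);

/* Returns the number of retries taken. No page lock is modeled. */
static int putback_lru_page_model(struct page *page)
{
	enum lru_list lru;
	int retries = 0;

redo:
	lru = page_evictable(page) ? LRU_INACTIVE : LRU_UNEVICTABLE;
	page->lru = lru;                  /* "add" the page to the chosen list */

	if (concurrent_event) {           /* let the simulated race run once */
		void (*ev)(struct page *) = concurrent_event;
		concurrent_event = NULL;
		ev(page);
	}

	/*
	 * Recheck: a page that went onto the unevictable list but is now
	 * evictable could be stranded there, so move it again.
	 */
	if (lru == LRU_UNEVICTABLE && page_evictable(page)) {
		retries++;
		goto redo;
	}
	return retries;
}

/* Simulated concurrent munlock. */
static void munlock_race(struct page *page)
{
	page->mlocked = false;
}
```

For example, a page that is mlocked when the putback starts but is munlocked by the concurrent_event callback ends up on LRU_INACTIVE after exactly one retry, while an undisturbed mlocked page stays on LRU_UNEVICTABLE with no retry.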

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

---
 ipc/shm.c     |   16 +-------
 mm/internal.h |    2 -
 mm/migrate.c  |   60 ++++++++++---------------------
 mm/mlock.c    |  ...
From: KOSAKI Motohiro
Date: Monday, June 23, 2008 - 7:43 pm

Unfortunately, my machine crashed last night ;-)
I'll dig into it.


--

From: Lee Schermerhorn
Date: Tuesday, June 24, 2008 - 10:10 am

I ran 26-rc5-mm3 with the 5 split/unevictable lru patches that you posted on
19 June.  I replaced patch 5 of that series with the subject patch
[rework v3, merged SHMEM fix].  This kernel ran my 'usex' stress load
overnight for 23+ hours on both ia64 and x86_64 platforms with no
problems.  I evidently did not hit the problem you did.

I'm rebuilding with a fix for a small problem that I discovered, along
with your recent patch to "prevent incorrect oom...".  I'll let you know
how that goes as well.

I'll send along two additional patches shortly.

Lee

--

From: Andrew Morton
Date: Tuesday, June 24, 2008 - 10:55 am

My chances of working out which patches I need to apply to -mm are
near-zero.  I'm working through my vacation backlog in reverse order
and haven't got up to this topic yet.

As you've been paying attention it would be appreciated if you could
send me some stuff, please.

--

From: Lee Schermerhorn
Date: Tuesday, June 24, 2008 - 12:11 pm

I saw your prior mail to Rik about this, but seem to have deleted it :(.

The stack that I'm currently running atop 26-rc5-mm3 contains the following:
[-mm][PATCH 1/5]  fix munlock page table walk
[-mm][PATCH 2/5] migration_entry_wait fix.
[-mm][PATCH 3/5] collect lru meminfo statistics from correct offset
[-mm][PATCH 4/5]  fix incorrect Mlocked field of /proc/meminfo.
The following patch replaces 5/5:
[RFC][PATCH] putback_lru_page()/unevictable page handling rework v3
The following "rfc" was acked by Rik:
[RFC][PATCH] prevent incorrect oom under split_lru

Two that I posted today [24Jun]--fixes to the "rework v3" patch:
[PATCH] fix to putback_lru_page()/unevictable page handling rework
[PATCH] fix2 to putback_lru_page()/unevictable page handling

The resulting kernel has been running well on my largish ia64 and x86_64
platforms under a work load that I use to stress reclaim, swapping,
mlocking, ...  However, Kosaki-san is apparently still experiencing
panics with a cpuset migration scenario discovered by Daisuke
Nishimura.  We're still investigating the crash, but the patches listed
above, despite the "rfc" on a couple of them, are an improvement over
26-rc5-mm3.

I believe that Rik has at least one other fix related to "loopback over
tmpfs" or such.

Is the list above sufficient to extract the patches from your mail
backlog, or would you prefer that we resend them?

I'll also send along a patch to update the document to match the
reworked lru handling methodology that Kamezawa Hiroyuki did.

Lee

--

From: Andrew Morton
Date: Tuesday, June 24, 2008 - 12:19 pm

On Tue, 24 Jun 2008 15:11:29 -0400

Please spoon-feed me ;) I can apply them and then I can pick through
--

From: Lee Schermerhorn
Date: Tuesday, June 24, 2008 - 10:29 am

PATCH revert shmem_lock() prototypes to return int

Against: 26-rc5-mm3 with Kosaki Motohiro's splitlru unevictable lru
fixes.

Fix to putback_lru_page()/unevictable page handling rework v3 patch.

The subject patch reverted a prior change that made shmem_lock() return a
struct address_space pointer, so it returns an int again.  This patch
updates the prototypes in mm.h to match.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/mm.h |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Index: linux-2.6.26-rc5-mm3/include/linux/mm.h
===================================================================
--- linux-2.6.26-rc5-mm3.orig/include/linux/mm.h	2008-06-24 12:54:41.000000000 -0400
+++ linux-2.6.26-rc5-mm3/include/linux/mm.h	2008-06-24 13:25:29.000000000 -0400
@@ -706,13 +706,12 @@ static inline int page_mapped(struct pag
 extern void show_free_areas(void);
 
 #ifdef CONFIG_SHMEM
-extern struct address_space *shmem_lock(struct file *file, int lock,
-					struct user_struct *user);
+extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
 #else
-static inline struct address_space *shmem_lock(struct file *file, int lock,
+static inline int shmem_lock(struct file *file, int lock,
 					struct user_struct *user)
 {
-	return NULL;
+	return 0;
 }
 #endif
 struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags);
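
The prototype fix above follows a common header pattern: a real extern declaration when CONFIG_SHMEM is set and a static inline stub otherwise. The two must agree on the return type so that callers compile the same way in either configuration. A minimal sketch of that pattern, using a hypothetical CONFIG_FEATURE_X and feature_lock() rather than the kernel's own names:

```c
#include <assert.h>

/* Hypothetical feature switch; left undefined here so the stub is used. */
/* #define CONFIG_FEATURE_X */

struct file;          /* opaque, as in the real header */
struct user_struct;   /* opaque */

#ifdef CONFIG_FEATURE_X
/* The real implementation would live in another translation unit. */
extern int feature_lock(struct file *file, int lock, struct user_struct *user);
#else
/* Stub with the same signature and return type: does nothing, reports 0. */
static inline int feature_lock(struct file *file, int lock,
			       struct user_struct *user)
{
	(void)file;
	(void)lock;
	(void)user;
	return 0;
}
#endif

/* A caller written once against the int-returning contract. */
static int lock_segment(struct file *file, struct user_struct *user)
{
	/* 0 on success; a negative error code is an assumed convention. */
	return feature_lock(file, 1, user);
}
```

If the stub returned a pointer while the extern declaration returned an int (or the reverse), lock_segment() would only compile cleanly in one of the two configurations.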


--

From: KOSAKI Motohiro
Date: Tuesday, June 24, 2008 - 10:38 am

Sure.
I forgot the "quilt add mm.h" step ;-)

Thank you!

     Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
--

From: KOSAKI Motohiro
Date: Tuesday, June 24, 2008 - 2:49 am

At this point, we should call ClearPageUnevictable();
otherwise, BUG() is triggered in isolate_lru_pages().



--

From: KAMEZAWA Hiroyuki
Date: Tuesday, June 24, 2008 - 2:59 am

On Tue, 24 Jun 2008 18:49:05 +0900
Sure, thanks.
-Kame

--

From: Lee Schermerhorn
Date: Tuesday, June 24, 2008 - 10:15 am

To which BUG() are you referring here?  There used to be a
BUG_ON(PageUnevictable(page)) in page_evictable(), but Kame-san removed
that.

By the way, we'll never take the retry, because 'lru' never ==
LRU_UNEVICTABLE in this version of putback_lru_page().  Patch to follow.

Lee

--

From: Lee Schermerhorn
Date: Tuesday, June 24, 2008 - 10:19 am

PATCH fix to rework of putback_lru_page locking.

Against:  26-rc5-mm3 atop Kosaki Motohiro's v3 rework of Kamezawa
Hiroyuki's putback_lru_page rework patch.

'lru' was not being set to LRU_UNEVICTABLE when the page was, in fact,
unevictable [really "nonreclaimable" :-)], so the retry would never
happen, and culled pages were never counted.

Also, the redundant mem_cgroup_move_lists() calls for unevictable pages,
one of which passes an incorrect 'lru', mess up memory controller
tracking [I think].

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-23 11:45:26.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-24 12:45:15.000000000 -0400
@@ -514,8 +514,8 @@ redo:
 		 * Put unevictable pages directly on zone's unevictable
 		 * list.
 		 */
+		lru = LRU_UNEVICTABLE;
 		add_page_to_unevictable_list(page);
-		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
 	}
 
 	mem_cgroup_move_lists(page, lru);
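
A minimal userspace model of the hunk above may help show both problems described in this patch. It is a sketch, not kernel code: struct model_page, putback_once() and cg_move() are hypothetical, cg_move() merely stands in for mem_cgroup_move_lists(), and the assumption that 'lru' otherwise holds an evictable list value is ours.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the vmscan.c hunk; not kernel code. */
enum lru_list { LRU_INACTIVE, LRU_UNEVICTABLE };

struct model_page {
	bool evictable;          /* result of page_evictable() */
	enum lru_list on_list;   /* list the page was actually added to */
	enum lru_list cg_list;   /* list the modeled memcg thinks it is on */
	int cg_moves;            /* number of modeled mem_cgroup_move_lists() calls */
};

/* Stand-in for mem_cgroup_move_lists(): record the list and count calls. */
static void cg_move(struct model_page *p, enum lru_list lru)
{
	p->cg_list = lru;
	p->cg_moves++;
}

/*
 * One pass of the putback decision; 'fixed' selects the patched code.
 * Returns true when the "lru == LRU_UNEVICTABLE" recheck could fire.
 */
static bool putback_once(struct model_page *p, bool fixed)
{
	enum lru_list lru = LRU_INACTIVE;   /* assumed evictable default */

	if (p->evictable) {
		p->on_list = lru;
	} else {
		if (fixed)
			lru = LRU_UNEVICTABLE;          /* the added line */
		p->on_list = LRU_UNEVICTABLE;
		if (!fixed)
			cg_move(p, LRU_UNEVICTABLE);    /* the removed, redundant call */
	}
	cg_move(p, lru);
	return lru == LRU_UNEVICTABLE;
}
```

Without the fix, an unevictable page can never trigger the recheck, and the modeled memcg is moved twice and ends up on LRU_INACTIVE although the page sits on the unevictable list. With the fix, the recheck can fire and the memcg is moved once, to LRU_UNEVICTABLE.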


--

From: KOSAKI Motohiro
Date: Tuesday, June 24, 2008 - 10:35 am

Indeed.
Sorry, I forgot to send this fix.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

But I still hit a panic with usex and Nishimura-san's cpuset migration test.
  -> http://marc.info/?l=linux-mm&m=121375647720110&w=2


I'll investigate it tomorrow.
--

From: Lee Schermerhorn
Date: Tuesday, June 24, 2008 - 10:48 am

I saw the description of the cpuset migration test.  Have you wrapped
this in a script suitable for running under usex?  If so, I would like
to get a copy.  Actually, please send me any automation you have for
this test and I'll incorporate it into the usex load.  Meanwhile, I'll
take a cut at adding such a test to the load.  However, we know that

Later, then,
Lee



--

From: KOSAKI Motohiro
Date: Tuesday, June 24, 2008 - 11:19 am

Ah, no, sorry.
I run the cpuset test and usex in separate consoles.
--
