Re: Deadlock possibly caused by too_many_isolated.

From: Wu Fengguang
Date: Wednesday, September 15, 2010 - 1:28 am

Neil,

Sorry for the rushed and poorly thought-out ideas this morning.


The above patch should behave like this: it returns SWAP_CLUSTER_MAX,
tricking every caller up the chain into believing that "enough pages
have been reclaimed". So __alloc_pages_direct_reclaim() sees a non-zero
*did_some_progress and goes on to call get_page_from_freelist(). That
normally fails because the task did not really scan the LRU lists.
However, it can succeed: when many processes are doing direct reclaim
concurrently, it may get lucky and grab a free page reclaimed by
another task. What's more, if it does fail to get a free page, the
upper layer __alloc_pages_slowpath() will keep calling
__alloc_pages_direct_reclaim() again. So, sooner or later, it will
succeed in "stealing" a free page reclaimed by other tasks.

In summary, the patch's behavior for !__GFP_IO/FS allocations is:
- won't do any page reclaim
- won't fail the page allocation (unexpected)
- will wait and steal one free page from others (unreasonable)

So it will address the problem you encountered; however, that sounds
like pretty unexpected and illogical behavior, right?

I believe the patch below will address the problem equally well.
What do you think?

Thanks,
Fengguang
---

mm: Avoid possible deadlock caused by too_many_isolated()

Neil finds that if too_many_isolated() returns true while performing
direct reclaim we can end up waiting for other threads to complete their
direct reclaim.  If those threads are allowed to enter the FS or IO to
free memory, but this thread is not, then it is possible that those
threads will be waiting on this thread and so we get a circular
deadlock.

some task enters direct reclaim with GFP_KERNEL
  => too_many_isolated() false
    => vmscan and run into dirty pages
      => pageout()
        => take some FS lock
	  => fs/block code does GFP_NOIO allocation
	    => enter direct reclaim again
	      => too_many_isolated() true
		=> waits for others to progress, but those tasks may
		   in turn be waiting for the FS lock

The fix is to let !__GFP_IO and !__GFP_FS direct reclaims enjoy higher
priority than normal ones, by giving them a higher throttle threshold.

Now !__GFP_IO/FS reclaims won't be waiting for __GFP_IO/FS reclaims to
progress. They will be blocked only when there are too many concurrent
!__GFP_IO/FS reclaims; however, that's very unlikely because IO-less
direct reclaims are able to progress much faster, and they won't
deadlock each other. The threshold is raised to 8 times the inactive
page count for them, so that there can be sufficient parallel progress
of !__GFP_IO/FS reclaims.

Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- linux-next.orig/mm/vmscan.c	2010-09-15 11:58:58.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-09-15 15:36:14.000000000 +0800
@@ -1141,36 +1141,39 @@ int isolate_lru_page(struct page *page)
 	return ret;
 }
 
 /*
  * Are there way too many processes in the direct reclaim path already?
  */
 static int too_many_isolated(struct zone *zone, int file,
 		struct scan_control *sc)
 {
 	unsigned long inactive, isolated;
+	int ratio;
 
 	if (current_is_kswapd())
 		return 0;
 
 	if (!scanning_global_lru(sc))
 		return 0;
 
 	if (file) {
 		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
 		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
 	} else {
 		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
 		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
 	}
 
-	return isolated > inactive;
+	ratio = sc->gfp_mask & (__GFP_IO | __GFP_FS) ? 1 : 8;
+
+	return isolated > inactive * ratio;
 }
 
 /*
  * TODO: Try merging with migrations version of putback_lru_pages
  */
 static noinline_for_stack void
 putback_lru_pages(struct zone *zone, struct scan_control *sc,
 				unsigned long nr_anon, unsigned long nr_file,
 				struct list_head *page_list)
 {
--