Re: Why PAGEOUT_IO_SYNC stalls for a long time

From: KOSAKI Motohiro
Date: Thursday, July 29, 2010 - 3:34 am

May I explain a bit more? Generally, whether retrying is worthwhile depends on the success ratio.
Currently, shrink_page_list() can't free the page in the following situations.

1. trylock_page() failure
2. page is unevictable
3. zone reclaim and page is mapped
4. PageWriteback() is true and not synchronous lumpy reclaim
5. page is swapbacked and swap is full
6. add_to_swap() fails (note: this fails more often than expected because
    it uses GFP_NOMEMALLOC)
7. page is dirty and the gfp mask doesn't have GFP_IO, GFP_FS
8. page is pinned
9. IO queue is congested
10. pageout() started IO, but it has not finished

So (4) and (10) are perfectly good conditions to wait on. (1) and (8) might be
solved by sleeping a while, but that is unrelated to IO congestion, and they
might not be solved at all. It only works by luck, and I don't like to depend
on luck. (9) can be solved by waiting for IO, but congestion_wait() is NOT the
correct wait. congestion_wait() means "sleep until one or more block devices in
the system are no longer congested". In other words, if the system has two or
more disks, congestion_wait() doesn't work well for synchronous lumpy reclaim
purposes. By the way, desktop users often use USB storage devices. (2), (3),
(5), (6) and (7) can't be solved by waiting. Waiting there is just silly.
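
To make the grouping above concrete, here is a minimal userspace sketch that maps the ten failure cases to the verdicts given in the paragraph. Every identifier in it (enum reclaim_fail, enum wait_verdict, classify(), worth_sync_wait()) is hypothetical and invented for illustration; none of them exist in the kernel, and this is not how shrink_page_list() is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the ten cases listed above (not kernel names). */
enum reclaim_fail {
	FAIL_TRYLOCK = 1,		/* (1) trylock_page() failure */
	FAIL_UNEVICTABLE,		/* (2) page is unevictable */
	FAIL_ZONE_RECLAIM_MAPPED,	/* (3) zone reclaim and page is mapped */
	FAIL_WRITEBACK,			/* (4) PageWriteback(), not sync lumpy reclaim */
	FAIL_SWAP_FULL,			/* (5) swapbacked page, swap is full */
	FAIL_ADD_TO_SWAP,		/* (6) add_to_swap() failed */
	FAIL_GFP_NO_IO_FS,		/* (7) dirty page, gfp mask forbids IO/FS */
	FAIL_PINNED,			/* (8) page is pinned */
	FAIL_CONGESTED,			/* (9) IO queue is congested */
	FAIL_PAGEOUT_STARTED,		/* (10) pageout() started IO, not finished */
};

/* Hypothetical verdicts, following the paragraph above. */
enum wait_verdict {
	WAIT_ON_PAGE,	/* waiting for this page's writeback helps */
	WAIT_ON_DEVICE,	/* helps only if the wait targets the right device */
	WAIT_BY_LUCK,	/* sleeping may or may not help */
	WAIT_USELESS,	/* waiting cannot change the outcome */
};

static enum wait_verdict classify(enum reclaim_fail why)
{
	assert(why >= FAIL_TRYLOCK && why <= FAIL_PAGEOUT_STARTED);

	switch (why) {
	case FAIL_WRITEBACK:
	case FAIL_PAGEOUT_STARTED:
		return WAIT_ON_PAGE;
	case FAIL_CONGESTED:
		return WAIT_ON_DEVICE;
	case FAIL_TRYLOCK:
	case FAIL_PINNED:
		return WAIT_BY_LUCK;
	default:
		return WAIT_USELESS;
	}
}

/* Only cases (4) and (10) justify a synchronous wait in this model. */
static bool worth_sync_wait(enum reclaim_fail why)
{
	return classify(why) == WAIT_ON_PAGE;
}
```

In this model, worth_sync_wait() is true only for (4) and (10); (9) is kept separate because the wait is only useful if it is tied to the device that actually backs the page.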

On the other hand, synchronous lumpy reclaim works fine in the following situation.

1. shrink_page_list(PAGEOUT_IO_ASYNC) is called
2. pageout() kicks off IO
3. we wait in wait_on_page_writeback()
4. the application touches the page again, and the page becomes dirty again
5. IO finishes and wakes up the reclaim thread
6. pageout() is called
7. wait_on_page_writeback() is called again
8. OK, the high-order reclaim succeeds
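
The sequence above can be sketched as a toy userspace model. This is only an illustration under stated assumptions: struct toy_page and the toy_* functions are hypothetical stand-ins for the page flags, pageout() and wait_on_page_writeback(), and the "application touches the page during IO" event is modelled by a simple counter.

```c
#include <stdbool.h>

/* Toy model of one page; the fields are illustrative, not struct page. */
struct toy_page {
	bool dirty;
	bool writeback;
	int redirty_budget;	/* times the application re-dirties it during IO */
};

/* Stand-in for pageout(): start IO on a dirty page. */
static void toy_pageout(struct toy_page *p)
{
	p->dirty = false;
	p->writeback = true;
}

/*
 * Stand-in for wait_on_page_writeback(): the IO completes, but the
 * application may have touched the page in the meantime.
 */
static void toy_wait_on_writeback(struct toy_page *p)
{
	if (p->redirty_budget > 0) {
		p->redirty_budget--;
		p->dirty = true;
	}
	p->writeback = false;
}

/*
 * Retry pageout + wait until the page is clean and not under writeback.
 * Returns the number of pageout calls that were needed, or -1 if the
 * page was still busy after max_tries pageouts.
 */
static int toy_sync_reclaim(struct toy_page *p, int max_tries)
{
	int pageouts = 0;

	while (p->dirty || p->writeback) {
		if (pageouts == max_tries)
			return -1;
		if (p->dirty) {
			toy_pageout(p);
			pageouts++;
		}
		toy_wait_on_writeback(p);
	}
	return pageouts;
}
```

With a dirty page that is re-dirtied once during IO, the model needs two pageout calls before the page becomes freeable, which corresponds to steps 2 through 8. The retry pays off here because each wait is tied to the page's own writeback.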


Well, outside pageout(), probably only XFS does writeout under PF_MEMALLOC,
because PF_MEMALLOC is enabled only in very limited situations. But I don't know

I think this is an unrelated issue. Actually, page_referenced() is called before try_to_unmap(),
and page_referenced() will drop the pte young bit. This logic has a very narrow race, but
 I don't think this is ...
From: Mel Gorman
Date: Thursday, July 29, 2010 - 7:24 am

In this case, waiting a while really is the right thing to do. It stalls
the caller, but it's a high-order allocation. The alternative is for it
to keep scanning which when under memory pressure could result in far
too many pages being evicted. How long to wait is a tricky one to answer


Indeed not. Eliminating congestion_wait there and depending instead on



All direct reclaimers have PF_MEMALLOC set so it's not that limited a
situation. See here

        p->flags |= PF_MEMALLOC;
        lockdep_set_current_reclaim_state(gfp_mask);
        reclaim_state.reclaimed_slab = 0;
        p->reclaim_state = &reclaim_state;

        *did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);

        p->reclaim_state = NULL;
        lockdep_clear_current_reclaim_state();

Ok.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Mel Gorman
Date: Friday, July 30, 2010 - 3:30 am

No reason. Using lock_page() in the synchronous case would be a sensible
choice. As you are realising, there are a number of warts around lumpy

In what case is a munlocked page's reference count permanently increased and

Right now, I can't think of a problem with calling lock_page instead of

Not that I'm aware of but it's not something I would know offhand. Will
go digging.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: KOSAKI Motohiro
Date: Sunday, August 1, 2010 - 1:47 am

V4L, audio, GEM and/or other multimedia driver?



--

From: Mel Gorman
Date: Wednesday, August 4, 2010 - 4:10 am

Ok, that is quite likely. Have you made a start on a series related to
lumpy reclaim? I was holding off making a start on such a thing while I
reviewed the other writeback issues and travelling to MM Summit is going
to delay things for me. If you haven't started when I get back, I'll
make some sort of stab at it.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: KOSAKI Motohiro
Date: Wednesday, August 4, 2010 - 11:20 pm

Yup, I posted them today. In my light testing, they work as intended. That means they
 - reduce low-order reclaim latency
 - keep a high success rate for order-9 reclaim under heavy IO workload

However, they obviously need more testing. Comments are welcome :)




--

From: Andreas Mohr
Date: Thursday, August 5, 2010 - 1:09 am

Thanks a lot!

I've been following recent discussions,
however testing is planned for the near future
since I'm currently "recovering" from a large backlog (Thesis).

Andreas Mohr
--
