May I explain a bit more? Generally, whether retrying is worthwhile depends on the success ratio.
Currently, shrink_page_list() can't free a page in the following situations:
1. trylock_page() failure
2. page is unevictable
3. zone reclaim and page is mapped
4. PageWriteback() is true and this is not synchronous lumpy reclaim
5. page is swapbacked and swap is full
6. add_to_swap() fails (note: this fails more often than expected because
it uses GFP_NOMEMALLOC)
7. page is dirty and gfpmask doesn't have GFP_IO, GFP_FS
8. page is pinned
9. IO queue is congested
10. pageout() started IO, but it has not finished
So (4) and (10) are perfectly good conditions to wait on. (1) and (8) might be solved
by sleeping a while, but they are unrelated to IO congestion, so sleeping might not help. It only works
by luck, and I don't like to depend on luck. (9) can be solved by waiting
for IO, but congestion_wait() is NOT the correct wait. congestion_wait() means
"sleep until one or more block devices in the system are no longer congested". That said,
if the system has two or more disks, congestion_wait() doesn't work well for
synchronous lumpy reclaim purposes. BTW, desktop users often use USB storage
devices. (2), (3), (5), (6) and (7) can't be solved by waiting. Waiting there is just silly.
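To make the grouping above concrete, here is a small userspace sketch of it. All names (fail_reason, wait_verdict, classify_failure(), worth_retrying()) are hypothetical and exist only for this illustration; none of them are kernel symbols, and the kernel does not record failure reasons this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the ten cases listed above (not kernel symbols). */
enum fail_reason {
	FAIL_TRYLOCK = 1,          /* (1) trylock_page() failure */
	FAIL_UNEVICTABLE,          /* (2) page is unevictable */
	FAIL_ZONE_RECLAIM_MAPPED,  /* (3) zone reclaim and page is mapped */
	FAIL_WRITEBACK,            /* (4) PageWriteback(), not sync lumpy reclaim */
	FAIL_SWAP_FULL,            /* (5) swapbacked page and swap is full */
	FAIL_ADD_TO_SWAP,          /* (6) add_to_swap() failed */
	FAIL_DIRTY_NO_IO_FS,       /* (7) dirty page, gfpmask forbids IO/FS */
	FAIL_PINNED,               /* (8) page is pinned */
	FAIL_QUEUE_CONGESTED,      /* (9) IO queue is congested */
	FAIL_IO_IN_FLIGHT,         /* (10) pageout() started IO, not finished */
};

enum wait_verdict {
	WAIT_ON_PAGE,      /* waiting for this page's writeback is well defined */
	WAIT_IS_LUCK,      /* a sleep may help, but it is unrelated to IO */
	WAIT_WRONG_EVENT,  /* IO wait helps, but a system-wide congestion wait is the wrong event */
	WAIT_USELESS,      /* no amount of waiting changes the outcome */
};

enum wait_verdict classify_failure(enum fail_reason r)
{
	switch (r) {
	case FAIL_WRITEBACK:
	case FAIL_IO_IN_FLIGHT:
		return WAIT_ON_PAGE;
	case FAIL_TRYLOCK:
	case FAIL_PINNED:
		return WAIT_IS_LUCK;
	case FAIL_QUEUE_CONGESTED:
		return WAIT_WRONG_EVENT;
	case FAIL_UNEVICTABLE:
	case FAIL_ZONE_RECLAIM_MAPPED:
	case FAIL_SWAP_FULL:
	case FAIL_ADD_TO_SWAP:
	case FAIL_DIRTY_NO_IO_FS:
		return WAIT_USELESS;
	}
	assert(!"unknown fail_reason");
	return WAIT_USELESS;
}

/* Only the cases tied to this page's own writeback justify a retry. */
bool worth_retrying(enum fail_reason r)
{
	return classify_failure(r) == WAIT_ON_PAGE;
}
```

With this split, only (4) and (10) make worth_retrying() return true, which is the argument above in code form.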
On the other hand, synchronous lumpy reclaim works fine in the following situation:
1. shrink_page_list(PAGEOUT_IO_ASYNC) is called
2. pageout() kicks off IO
3. reclaim waits in wait_on_page_writeback()
4. the application touches the page again, and the page becomes dirty again
5. IO finishes and wakes up the reclaim thread
6. pageout() is called
7. wait_on_page_writeback() is called again
8. OK, the high-order reclaim succeeds
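The sequence above can be modelled with a toy userspace state machine. This is only a sketch under loud assumptions: struct toy_page, toy_pageout(), toy_wait_on_page_writeback() and toy_sync_reclaim() are hypothetical stand-ins, and the "application touches the page during IO" step is simulated by a counter rather than by real concurrency.

```c
#include <stdbool.h>

/* Toy stand-in for a page; not the kernel's struct page. */
struct toy_page {
	bool dirty;
	bool writeback;
	int redirty_left;  /* times the application re-dirties the page during IO */
};

/* Models pageout(): start IO on a dirty page. */
void toy_pageout(struct toy_page *p)
{
	if (p->dirty) {
		p->dirty = false;
		p->writeback = true;
	}
}

/* Models wait_on_page_writeback(): returns once the IO has finished.
 * The application may touch the page while the IO is in flight. */
void toy_wait_on_page_writeback(struct toy_page *p)
{
	if (!p->writeback)
		return;
	if (p->redirty_left > 0) {
		p->redirty_left--;
		p->dirty = true;
	}
	p->writeback = false;
}

/* Repeat pageout + wait until the page is clean.
 * Returns the number of pageout passes used, or -1 if the page is
 * still not clean after max_passes. */
int toy_sync_reclaim(struct toy_page *p, int max_passes)
{
	int passes = 0;

	while (p->dirty || p->writeback) {
		if (passes == max_passes)
			return -1;
		toy_pageout(p);
		passes++;
		toy_wait_on_page_writeback(p);
	}
	return passes;
}
```

A page that is re-dirtied once during the first IO needs two passes, matching steps 2 to 7 above. In this model the wait is always on the page's own writeback, never on device congestion.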
Well, outside pageout(), probably only XFS does PF_MEMALLOC + writeout,
because PF_MEMALLOC is enabled only in very limited situations. But I don't know
I think this is an unrelated issue. Actually, page_referenced() is called before try_to_unmap()
and page_referenced() will drop the pte young bit. This logic has a very narrow race, but
I don't think this is ...
In this case, waiting a while really is the right thing to do. It stalls
the caller, but it's a high-order allocation. The alternative is for it
to keep scanning which when under memory pressure could result in far
too many pages being evicted. How long to wait is a tricky one to answer
Indeed not. Eliminating congestion_wait there and depending instead on
All direct reclaimers have PF_MEMALLOC set so it's not that limited a
situation. See here
	p->flags |= PF_MEMALLOC;
	lockdep_set_current_reclaim_state(gfp_mask);
	reclaim_state.reclaimed_slab = 0;
	p->reclaim_state = &reclaim_state;
	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
	p->reclaim_state = NULL;
	lockdep_clear_current_reclaim_state();
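The shape of the excerpt above (set the flag, reclaim, undo the bookkeeping) can be shown with a small userspace model. Everything here is hypothetical: TOY_PF_MEMALLOC is an illustrative bit, not the kernel's value, and clearing the flag after the call is an assumption, since the excerpt above stops before that point.

```c
#include <stdbool.h>

#define TOY_PF_MEMALLOC 0x1u  /* illustrative bit, not the kernel's value */

/* Toy stand-in for the task; only the flags word is modelled. */
struct toy_task {
	unsigned int flags;
};

/* Records whether the flag was visible while "reclaim" ran. */
bool toy_saw_memalloc;

/* Stand-in for try_to_free_pages(): only observes the caller's flags. */
unsigned long toy_try_to_free_pages(struct toy_task *p)
{
	toy_saw_memalloc = (p->flags & TOY_PF_MEMALLOC) != 0;
	return 1;  /* pretend some progress was made */
}

/* Every direct reclaimer goes through this path, so every one of them
 * runs the reclaim code with the flag set. */
unsigned long toy_direct_reclaim(struct toy_task *p)
{
	unsigned long progress;

	p->flags |= TOY_PF_MEMALLOC;
	progress = toy_try_to_free_pages(p);
	p->flags &= ~TOY_PF_MEMALLOC;  /* assumed cleanup, not in the excerpt */
	return progress;
}
```

The point of the model is only that the flag is set by the direct reclaim path itself, for any caller, which is why it is not that limited a situation.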
Ok.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
No reason. Using lock_page() in the synchronous case would be a sensible choice.
As you are realising, there are a number of warts around lumpy
In what case is a munlocked pages reference count permanently increased and
Right now, I can't think of a problem with calling lock_page instead of
Not that I'm aware of but it's not something I would know offhand. Will go digging.
--
V4L, audio, GEM and/or other multimedia driver?
--
Ok, that is quite likely. Have you made a start on a series related to lumpy reclaim?
I was holding off making a start on such a thing while I reviewed the other
writeback issues and travelling to MM Summit is going to delay things for me.
If you haven't started when I get back, I'll make some sort of stab at it.
--
Yup, I posted them today. In my light testing, they work as intended. That means:
- low-order reclaim latency is reduced
- the high success rate of order-9 reclaim is kept under heavy IO workload
However, they obviously need more testing. Comments are welcome :)
--
Thanks a lot! I've been following the recent discussions; however, testing is planned for the
near future, since I'm currently "recovering" from a large backlog (thesis).
Andreas Mohr
--
