Just an idea I had, it seems like a good idea to wait for RCU callbacks
in reclaim so that we won't get all of memory stuck there.If this location is too aggressive we might stick it next to
disable_swap_token().---
Couple RCU and reclaim.There could be a lot of memory stuck in RCU callbacks. Wait for RCU to
finish before giving it another go.Placed in kswapd and not direct reclaim path because kswapd never holds
rcu_read_lock() at this point and can thus not deadlock. Direct reclaim
callers might hold rcu_read_lock() and would suffer from deadlocks if
sync_rcu() were to be called.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/vmscan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -1435,8 +1435,10 @@ loop_again:
unsigned long lru_pages = 0;/* The swap token gets in the way of swapout... */
- if (!priority)
+ if (!priority) {
+ synchronize_rcu();
disable_swap_token();
+ }all_zones_ok = 1;
-
I think it would be much too aggressive (_especially_ with preemptible
RCU, I would have thought?). And not just for the case of adding a lot
of idle time while other CPUs are busy not being preempted -- also
that if you have lots of CPUs reclaiming and asking to synchronize_rcu,I don't see a really good reason to add this kind of wait here. I mean,
we have to wait for RCU anyway, so we may as well scan what we can
until then?Putting it in a slowpath at some point when we think we're heading for
OOM I think would be a bit better.
-
Interesting change
1. I suspect that synchronize_rcu() is most likely to free up
slab pages, so shrink_slab() will clean up all the freed
pages. Could we add a comment to indicate the same?--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
On Mon, 24 Sep 2007 16:12:19 +0530 Balbir Singh
Bah, it seems I send the wrong patch out :-/
this is the one against disable_swap_token(). I meant to send out this
one:@@ -1527,8 +1527,10 @@ loop_again:
* OK, kswapd is getting into trouble. Take a nap, then take
* another pass across the zones.
*/
- if (total_scanned && priority < DEF_PRIORITY - 2)
+ if (total_scanned && priority < DEF_PRIORITY - 2) {
+ synchronize_rcu();
congestion_wait(WRITE, HZ/10);Uhm, this is balance_pgdat() (both these changes) :-)
Only kswapd can do this, direct reclaim has deadlock potential.
-
Yes, but not in all cases, do you want to add any gfp_mask
based smartness for direct reclaim?--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
On Mon, 24 Sep 2007 16:52:15 +0530 Balbir Singh
---
mm/vmscan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -1527,8 +1527,10 @@ loop_again:
* OK, kswapd is getting into trouble. Take a nap, then take
* another pass across the zones.
*/
- if (total_scanned && priority < DEF_PRIORITY - 2)
+ if (total_scanned && priority < DEF_PRIORITY - 2) {
+ synchronize_rcu();
congestion_wait(WRITE, HZ/10);
+ }/*
gfp_mask doesn't carry the needed information. It depends on whether
the current context holds a rcu_read_lock().so something like:
rcu_read_lock()
foo = kmalloc(sizeof(foo))
new_slab()
__alloc_pages()
try_to_free_pages()
synchronise_rcu() <-- deadlock
rcu_read_unlock()
-
What I meant was that nobody would hold rcu_read_lock() and pass
At this point, you really can't use GFP_KERNEL, since rcu_read_lock()
disables pre-emption in the current kernel, ideally you should see--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
On Mon, 24 Sep 2007 18:18:30 +0530 Balbir Singh
I guess I've been using preemptibe rcu for way too long. Would need to
clean up some of -rt to make this true there, but should be doable
indeed.-
