Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Nick Piggin <npiggin@...>
Cc: Christoph Lameter <clameter@...>, <linux-mm@...>, <linux-kernel@...>, <akpm@...>, <dkegel@...>, David Miller <davem@...>, Daniel Phillips <phillips@...>
Date: Wednesday, August 15, 2007 - 9:12 am

On Wed, 2007-08-15 at 14:22 +0200, Nick Piggin wrote:
.

Mainly because we seem to go in circles :-(


Honestly, I don't. They very much do not solve the problem, they just
displace it.

Christoph's suggestion to set min_free_kbytes to 20% is ridiculous - nor
does it solve all deadlocks :-(



Please do ponder the problem and its proposed solutions, because I'm
going crazy here.

The problem with networked swap is:

TX
 - we need some memory to initiate writeout
 - writeout needs to be throttled in order to make this bounded

  (currently sort-of done by throttle_vm_writeout() - but evginey and
   daniel phillips are working on a more generic approach)

RX
 - we basically need infinite memory to receive the network reply
   to complete writeout. Consider the following scenario:

   3 machines, A, B, C;

     A: * networked swapped
        * networked service

     B: * client for networked service

     C: * server for networked swap

   C becomes unreachable/slow for a while
   B sends massive amounts of traffic A wards
   A consumes all memory with non-critical traffic from B and wedges

 - so we need a threshold of some sorts to start tossing non-critical
   network packets away. (because the consumer of these packets may be
   the one swapping and is therefore frozen)

 - we also need to ensure memory doesn't fragment too badly during the
   receiving -> tossing phase. Otherwise we might again wedge due to OOM

and then there is an TCP specific deadlock: TCP has a global limit on
the amount of skb memory that can be in socket receive queues. Once we
hit this limit with non-critical data (because the consumers are waiting
on swap) all further packets will be tossed and we'll never receive C's
completion


<> Now my solution was to have a reserve just big enough to fit:
  - TX
  - RX (large enough to overflow the IP fragment reassembly)

that way, whenever we receive a packet and find we need the reserve
to back this packet we must only use this for critical services.
(this provides the threshold previously mentioned)

we then process the packet until socket demux (where the skb gets
associated with a sk - and can therefore determine whether it is
critical or not) and toss all packets that are non-critical. This frees
up the memory to receive the next packet, and this can continue ad
infinitum - until we finally do get C's completion and get out of the
tight spot.


<> What Christoph is proposing is doing recursive reclaim and not
initiating writeout. This will only work _IFF_ there are clean pages
about. Which in the general case need not be true (memory might be
packed with anonymous pages - consider an MPI cluster doing computation
stuff). So this gets us a workload dependant solution - which IMHO is
bad!

Also his suggestion to crank up min_free_kbytes to 20% of machine memory
is not workable (again imagine this MPI cluster loosing 20% of its
collective memory, very much out of the question).

Nor does that solve the TCP deadlock, you need some additional condition
to break that.


Do get well.
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Tue Aug 14, 10:21 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Daniel Phillips, (Wed Sep 5, 5:20 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Wed Sep 5, 6:42 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Daniel Phillips, (Wed Sep 5, 12:16 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Mon Sep 10, 3:25 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Mon Sep 10, 3:55 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Mon Sep 10, 4:22 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Mon Sep 10, 4:48 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Pavel Machek, (Fri Oct 26, 1:44 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Daniel Phillips, (Sat Oct 27, 7:08 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Fri Oct 26, 1:55 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Daniel Phillips, (Sat Oct 27, 6:58 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Mike Snitzer, (Sat Sep 8, 1:12 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Daniel Phillips, (Mon Sep 17, 8:28 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Mike Snitzer, (Mon Sep 17, 11:27 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Tue Sep 18, 5:30 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Daniel Phillips, (Tue Sep 18, 1:37 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Wed Sep 5, 7:42 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Wed Sep 5, 8:14 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Wed Sep 12, 6:52 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Wed Sep 12, 6:47 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Thu Sep 13, 4:19 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Thu Sep 13, 2:32 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Thu Sep 13, 3:24 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Wed Sep 5, 8:19 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Mon Sep 10, 3:29 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Tue Sep 11, 3:41 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Mon Sep 10, 3:37 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Mon Sep 10, 3:41 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Mon Sep 10, 3:55 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Mon Sep 10, 4:17 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Mon Sep 10, 4:48 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Wed Aug 15, 8:22 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Wed Aug 15, 9:12 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Wed Aug 15, 11:29 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Sun Aug 19, 11:51 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Mon Aug 20, 8:28 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Wed Sep 12, 6:39 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Tue Aug 21, 11:29 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Wed Aug 22, 11:02 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Mon Aug 20, 3:15 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Nick Piggin, (Mon Aug 20, 8:32 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Thu Aug 16, 4:27 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Wed Aug 15, 4:29 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Andi Kleen, (Wed Aug 15, 10:15 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Wed Aug 15, 9:55 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Wed Aug 15, 4:32 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Andi Kleen, (Wed Aug 15, 10:34 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Tue Aug 14, 10:36 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Tue Aug 14, 11:29 am)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Peter Zijlstra, (Tue Aug 14, 3:32 pm)
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC), Christoph Lameter, (Tue Aug 14, 3:41 pm)