Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

To: Mike Snitzer <snitzer@...>
Cc: Christoph Lameter <clameter@...>, <linux-mm@...>, <linux-kernel@...>, <akpm@...>, <dkegel@...>, Peter Zijlstra <a.p.zijlstra@...>, David Miller <davem@...>, Nick Piggin <npiggin@...>
Date: Monday, September 17, 2007 - 8:28 pm

On Friday 07 September 2007 22:12, Mike Snitzer wrote:

Sorry, I was incommunicado out on the high seas all last week.  OK, the
measures that actually prevent our ddsnap driver from deadlocking are:

  - Statically prove bounded memory use of all code in the writeout
    path.

  - Implement any special measures required to be able to make such a
    proof.

  - All allocations performed by the block driver must have access
    to dedicated memory resources.

  - Disable the congestion_wait mechanism for our code as much as
    possible, at least enough to obtain the maximum memory resources
    that can be used on the writeout path.

The specific measure we implement in order to prove a bound is:

  - Throttle IO on our block device to a known amount of traffic for
    which we are sure that the MEMALLOC reserve will always be
    adequate.
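
As an illustration only (this is not ddsnap code), the throttling idea
above can be sketched in C as a counter of in-flight units that is never
allowed to exceed a fixed limit.  The struct and function names are
hypothetical, and the sketch is single-threaded; a real driver would need
locking or atomics and a way to put the submitter to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical throttle state; names are illustrative, not ddsnap's. */
struct io_throttle {
    size_t limit;     /* most units the reserve is assumed to cover */
    size_t in_flight; /* units currently charged against the limit */
};

static void throttle_init(struct io_throttle *t, size_t limit)
{
    t->limit = limit;
    t->in_flight = 0;
}

/*
 * Try to charge `units` of IO against the limit.  Returns false when
 * the submission would exceed the limit, in which case the caller
 * must wait until earlier IO completes.
 */
static bool throttle_try_charge(struct io_throttle *t, size_t units)
{
    /* in_flight never exceeds limit, so this subtraction cannot wrap */
    if (units > t->limit - t->in_flight)
        return false;
    t->in_flight += units;
    return true;
}

/* Return `units` to the throttle when the corresponding IO completes. */
static void throttle_release(struct io_throttle *t, size_t units)
{
    assert(units <= t->in_flight);
    t->in_flight -= units;
}
```

A submitter would call throttle_try_charge() before issuing IO and
throttle_release() from the completion path, so the amount of traffic in
flight stays at or below the limit chosen for the reserve.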

Note that the boundedness proof we use is somewhat loose at the moment. 
It goes something like "we only need at most X kilobytes of reserve and 
there are X megabytes available".  Much of Peter's patch set is aimed 
at getting more precise about this, but to be sure, handwaving just 
like this has been part of core kernel since day one without too many 
ill effects.
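
To make the shape of that loose argument concrete, here is a minimal
sketch of the arithmetic, assuming a known reserve size and a known
worst-case allocation per request.  Both quantities and the function name
are assumptions for illustration; the hard part in practice is
establishing the per-request worst case, not the division.

```c
#include <stddef.h>

/*
 * Largest number of requests that may be in flight at once if each
 * one can allocate at most worst_case_bytes_per_request and all of
 * them draw on a reserve of reserve_bytes.  Hypothetical helper.
 */
static size_t max_requests_in_flight(size_t reserve_bytes,
                                     size_t worst_case_bytes_per_request)
{
    if (worst_case_bytes_per_request == 0)
        return 0; /* no per-request bound known, so admit nothing */
    return reserve_bytes / worst_case_bytes_per_request;
}
```

For example, with a 1 MB reserve and a 16 KB worst case per request, the
result is 64 requests in flight.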

The way we provide guaranteed access to memory resources is:

  - Run critical daemons in PF_MEMALLOC mode, including
    any userspace daemons that must execute in the block IO path
    (cluster coders take note!).

Right now, all writeout submitted to ddsnap gets handed off to a daemon
running in PF_MEMALLOC mode.  This is a needless inefficiency that we
want to remove in future by handling as many of those submissions as
possible entirely in the context of the submitter.  To do this, further
measures are needed:

  - Network writes performed by the block driver must have access to
    dedicated memory resources.

We have not yet managed to trigger network read memory deadlock, but it 
is just a matter of time, additional fancy virtual block devices, and 
enough stress.  So:

  - Network reads need some fancy extra support because dedicated
    memory resources must be consumed before knowing whether the
    network traffic belongs to a block device or not.

Now, the interesting thing about this whole discussion is, none of the 
measures that we are actually using at the moment are implemented in 
either Peter's or Christoph's patch set.  In other words, at present we 
do not require either patch set in order to run under heavy load 
without deadlocking.  But in order to generalize our solution to a 
wider range of virtual block devices and other problematic systems such 
as userspace filesystems, we need to incorporate a number of elements 
of Peter's patch set.

As far as Christoph's proposal goes, it is not required to prevent
deadlocks.  Whether or not it is a good optimization is an open
question.

Of all the patches posted so far related to this work, the only
indispensable one is the bio throttling patch developed by Evgeniy and
me in a parallel thread.  The other essential pieces are all implemented
in our block driver for now.  Some of those can be generalized and
moved at least partially into core, and some cannot.

I do need to write some sort of primer on this, because there is no 
fire-and-forget magic core kernel solution.  There are helpful things 
we can do in core, but some of it can only be implemented in the 
drivers themselves.

Regards,

Daniel