Re: [00/41] Large Blocksize Support V7 (adds memmap support)

To: Nick Piggin <nickpiggin@...>
Cc: Christoph Lameter <clameter@...>, <andrea@...>, <torvalds@...>, <linux-fsdevel@...>, <linux-kernel@...>, Christoph Hellwig <hch@...>, Mel Gorman <mel@...>, William Lee Irwin III <wli@...>, David Chinner <dgc@...>, Jens Axboe <jens.axboe@...>, Badari Pulavarty <pbadari@...>, Maxim Levitsky <maximlevitsky@...>, Fengguang Wu <fengguang.wu@...>, swin wang <wangswin@...>, <totty.lu@...>, <hugh@...>, <joern@...>
Date: Tuesday, September 11, 2007 - 11:36 am

On Tue, 2007-09-11 at 04:52 +1000, Nick Piggin wrote:

I thought we had discussed this already at the VM Summit and reached something
resembling a conclusion. It was acknowledged that depending on
contiguous allocations to always succeed will get a caller into trouble
and they need to deal with fallback - whether the problem was
theoretical or not. It was also strongly pointed out that the large
block patches as presented would be vulnerable to that problem.

The alternatives were fs-block and increasing the size of order-0. It
was felt that fs-block was far off because of its complexity, and I
thought that increasing the page size as Andrea suggested would lead to
internal fragmentation problems. Regrettably, we didn't discuss Andrea's
approach in depth.
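To put a rough number on the internal fragmentation concern, here is a
minimal sketch in Python. It is purely illustrative: the function names
and the workload (1000 files of 2 KiB each) are hypothetical. Each cached
file is rounded up to whole pages and the bytes that hold no file data
are reported for a given page size.

```python
def page_cache_footprint(file_sizes, page_size):
    """Bytes of page cache needed if every file is rounded up to whole pages."""
    pages = sum(-(-size // page_size) for size in file_sizes)  # ceiling division
    return pages * page_size


def internal_waste(file_sizes, page_size):
    """Bytes allocated but not holding file data (internal fragmentation)."""
    return page_cache_footprint(file_sizes, page_size) - sum(file_sizes)


# Hypothetical workload: 1000 small files of 2 KiB each.
small_files = [2048] * 1000
print(internal_waste(small_files, 4096))    # 2048000
print(internal_waste(small_files, 65536))   # 63488000
```

With 64 KiB pages the footprint in this toy workload is 32 times the
data actually cached; real workloads will differ, but small files are
where a larger order-0 size hurts most.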

I *thought* that the end conclusion was that we would go with
Christoph's approach pending two things being resolved:

o mmap() support that we agree is good
o A clear statement, possibly logged, to users who mount a large
  block filesystem that it might blow up and they get to keep both parts
  when it does. Basically, for now it's only suitable in specialised
  environments.

I also thought there was an acknowledgement that long-term, fs-block was
the way to go - possibly using contiguous pages optimistically instead
of virtually mapping the pages. At that point, it would be a general
solution and we could remove the warnings.
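A rough userspace model of "use contiguous pages optimistically, fall
back otherwise" might look like the sketch below. It is an
assumption-laden illustration, not fs-block code: free memory is modelled
as a set of page frame numbers, alloc_buffer() is a hypothetical helper,
and the fallback path simply hands back scattered order-0 pages that the
kernel would then have to map virtually.

```python
def find_contiguous(free, order):
    """Return a naturally aligned run of 2**order free page numbers, or None."""
    size = 1 << order
    for start in sorted(free):
        if start % size == 0 and all(p in free for p in range(start, start + size)):
            return list(range(start, start + size))
    return None


def alloc_buffer(free, order):
    """Hypothetical fs-block style allocation with fallback.

    Returns (pages, contiguous). Tries a contiguous aligned block first;
    otherwise takes any 2**order free pages, which would then need to be
    mapped virtually. Returns (None, False) if there are not enough pages.
    """
    size = 1 << order
    pages = find_contiguous(free, order)
    contiguous = pages is not None
    if not contiguous:
        if len(free) < size:
            return None, False
        pages = sorted(free)[:size]  # scattered order-0 pages
    free.difference_update(pages)
    return pages, contiguous


free = {1, 3, 5, 7, 9}          # fragmented: no aligned pair of pages is free
print(alloc_buffer(free, 1))    # ([1, 3], False) -> fallback path taken
```

The point of the design is that a failed contiguous allocation degrades
performance (extra mapping work) instead of failing the I/O.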

Basically, to start out with, this was going to be an SGI-only thing so
they get to rattle out the issues we expect to encounter with large
blocks and help steer the direction of the
more-complex-but-safer-overall fs-block.


When that brushing occurred, I thought I made it very clear what the
expectations were and that without fallback they would be taking a risk.
I am not sure if that message actually sank in or not.

That said, the filesystem people can experiment to some extent against
Christoph's approach as long as they don't think they are 100% safe.
Again, their experimenting will help steer the direction of fs-block.


That's the absolute worst case but yes, in theory this can occur and
it's safest to assume the situation will occur somewhere to someone. It
would be difficult to craft an attack to do it but conceivably a machine
running for a long enough time would trigger it particularly if the
large block allocations are GFP_NOIO or GFP_NOFS.
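The failure mode is easy to see with a toy model. This is a hypothetical
sketch, not the kernel's buddy allocator: memory is a list of used/free
flags, and a large block is usable only if a naturally aligned run of
2**order pages is entirely free. If one page in every 16 is pinned, over
93% of memory is free yet no order-4 block exists.

```python
def free_pages(used):
    """Number of free pages in a used/free page map."""
    return used.count(False)


def free_aligned_blocks(used, order):
    """Number of naturally aligned, completely free blocks of 2**order pages."""
    size = 1 << order
    return sum(
        1
        for start in range(0, len(used) - size + 1, size)
        if not any(used[start:start + size])
    )


# Hypothetical long-running machine: 1024 pages, one pinned page every 16.
used = [pfn % 16 == 0 for pfn in range(1024)]
print(free_pages(used))               # 960
print(free_aligned_blocks(used, 3))   # 64
print(free_aligned_blocks(used, 4))   # 0
```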


The -mm kernels have patches related to watermarking that will not be
making it to mainline for reasons we don't need to revisit right now.
The lack of the watermarking patches may turn out to be a non-issue but
the point is that what's in mainline is not exactly the same as -mm and
mainline will be running for longer periods of time in a different
environment.

We expected this patchset to be used in specialised environments
*only*. The SGI people can mitigate their mixed
fragmentation problems somewhat by setting slub_min_order ==
large_block_order so that blocks get allocated and freed at the same
size. This is part of the way towards Andrea's solution of raising the size
of an order-0 allocation. The point of printing out the warnings at
mount time was not so much for a general user who may miss the logs but
for distributions that consider turning large block use on by default to
discourage them until such time as we have proper fallback in place.


If the mmap() support is poor and going to be an obstacle in the future,
then that is a reason to hold it up. I haven't actually read the mmap()
support patch yet so I have no worthwhile opinion yet.

If the mmap() mess can be agreed on, the large block patchset as it is
could give us important information from the users willing to deal with
this risk about what sort of behaviour to expect. If they find it fails
all the time, then fs-block having the complexity of optimistically
using large pages is not worthwhile either. That is useful data.


This was also brought up at VM Summit but for the benefit of the people
who were not there:

It was emphasised that large block support is not the solution to all
scalability problems. There was a strong emphasis that fixing up the
order-0 uses should be encouraged. In particular, readahead should be
batched so that each page is not individually locked. There were also
other page-related operations that should be done in batch. On a similar
note, it was pointed out that dcache lookup is something that should be
scaled better - possibly before spending too much time on things like
page cache or radix locks.
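As a sketch of what batching buys, the toy model below counts lock
acquisitions when populating a cache one page at a time versus in
batches. CountingLock is a hypothetical stand-in for whatever per-page
locking readahead does today (for example the page cache radix tree
lock); the function names are illustrative only and do not correspond to
kernel functions.

```python
import threading


class CountingLock:
    """A lock that records how many times it has been acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def readahead_per_page(cache, lock, first, count):
    """Insert pages one by one, taking the lock for every page."""
    for index in range(first, first + count):
        with lock:
            cache[index] = f"page-{index}"


def readahead_batched(cache, lock, first, count, batch):
    """Insert pages in batches, taking the lock once per batch."""
    for start in range(first, first + count, batch):
        end = min(start + batch, first + count)
        with lock:
            for index in range(start, end):
                cache[index] = f"page-{index}"


lock = CountingLock()
readahead_per_page({}, lock, 0, 32)
print(lock.acquisitions)   # 32
lock = CountingLock()
readahead_batched({}, lock, 0, 32, 16)
print(lock.acquisitions)   # 2
```

The cache ends up identical either way; only the number of lock round
trips changes, which is where the order-0 scalability win would come from.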

For scalability, it was also pointed out at some point that heavy users
of large blocks may now find themselves contending on the zone->lock and
they might well find that order-0 pages were what they wanted to use
anyway.


My magic 8-ball is in the garage.

I thought the following plan was sane, but I could be la-la:

1. Go with large block + explosions to start with
   - Second class feature at this point, not fully supported
   - Experiment in different places to see what it gains (if anything)
2. Get fs-block in slowly over time with the fallback options replacing
   Christoph's patches bit by bit
3. Kick away warnings
   - First class feature at this point, fully supported

Independently of that, we would work on order-0 scalability,
particularly readahead and batching operations on ranges of pages as
much as possible.

-- 
Mel "la-la" Gorman
