[00/17] [RFC] Virtual Compound Page Support

Previous thread: [04/17] vmalloc: clean up page array indexing by Christoph Lameter on Tuesday, September 18, 2007 - 8:36 pm. (1 message)

Next thread: [01/17] Vmalloc: Move vmalloc_to_page to mm/vmalloc. by Christoph Lameter on Tuesday, September 18, 2007 - 8:36 pm. (1 message)
From: Christoph Lameter
Date: Tuesday, September 18, 2007 - 8:36 pm

Currently there is a strong tendency to avoid larger page allocations in
the kernel because of past fragmentation issues, and the current
defragmentation methods are still evolving. It is not clear to what extent
they can provide reliable allocations for higher order pages (plus the
definition of "reliable" seems to be in the eye of the beholder).

Currently we use vmalloc allocations in many locations to provide a safe
way to allocate larger arrays. That is due to the danger of higher order
allocations failing. Virtual Compound pages allow the use of regular
page allocator allocations that will fall back only if there is an actual
problem with acquiring a higher order page.

This patch set provides a way for a higher order page allocation to fall back.
Instead of a physically contiguous page a virtually contiguous page
is provided. The functionality of the vmalloc layer is used to provide
the necessary page tables and control structures to establish a virtually
contiguous area.
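The fallback decision described above can be modelled in userspace. This is a minimal sketch, not the patch set's code: struct vcompound, alloc_contig(), vcompound_alloc() and vcompound_free() are hypothetical names, malloc() stands in for the page allocator, and the fragmented flag simulates a higher order request that fails even though enough memory is free. A real implementation would also map the order-0 pages into one virtually contiguous range (the vmalloc layer's job); the model only collects them in an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Hypothetical model of a virtual compound page: either one physically
 * contiguous block, or 2^order separate order-0 pages that the kernel
 * would map into a virtually contiguous area via the vmalloc layer. */
struct vcompound {
	bool virtual_fallback;	/* true if we fell back to order-0 pages */
	unsigned int order;
	void *contig;		/* valid when !virtual_fallback */
	void **pages;		/* valid when virtual_fallback */
};

/* Stand-in for the page allocator. 'fragmented' simulates a higher
 * order request failing because no contiguous block is available. */
static void *alloc_contig(unsigned int order, bool fragmented)
{
	if (fragmented && order > 0)
		return NULL;
	return malloc(PAGE_SIZE << order);
}

static int vcompound_alloc(struct vcompound *vc, unsigned int order,
			   bool fragmented)
{
	size_t n = (size_t)1 << order;

	assert(order < 11);	/* arbitrary upper bound for the model */
	vc->order = order;
	vc->pages = NULL;
	vc->virtual_fallback = false;

	/* Fast path: a physically contiguous higher order block. */
	vc->contig = alloc_contig(order, fragmented);
	if (vc->contig)
		return 0;

	/* Fallback: collect 2^order individual order-0 pages. */
	vc->pages = calloc(n, sizeof(*vc->pages));
	if (!vc->pages)
		return -1;
	for (size_t i = 0; i < n; i++) {
		vc->pages[i] = alloc_contig(0, fragmented);
		if (!vc->pages[i]) {
			while (i--)	/* undo the partial allocation */
				free(vc->pages[i]);
			free(vc->pages);
			vc->pages = NULL;
			return -1;
		}
	}
	vc->virtual_fallback = true;
	return 0;
}

/* The free side must know which path the allocation took. */
static void vcompound_free(struct vcompound *vc)
{
	if (!vc->virtual_fallback) {
		free(vc->contig);
		return;
	}
	for (size_t i = 0; i < ((size_t)1 << vc->order); i++)
		free(vc->pages[i]);
	free(vc->pages);
}
```

Recording which path was taken in the descriptor is what lets the free routine dispatch correctly.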

Advantages:

- If higher order allocations are failing then virtual compound pages
  consisting of a series of order-0 pages can stand in for those
  allocations.

- "Reliability" as long as the vmalloc layer can provide virtual mappings.

- Ability to reduce the use of the vmalloc layer significantly by using
  physically contiguous memory instead of virtually contiguous memory.
  Most uses of vmalloc() can be converted to page allocator calls.

- The use of physically contiguous memory instead of vmalloc may allow the
  use of larger TLB entries, thus reducing TLB pressure. It also reduces
  the need for page table walks.
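To put a rough number on the TLB argument: assuming 4 KiB base pages for vmalloc mappings and 2 MiB large pages for the kernel's direct mapping (as on x86-64; other architectures and configurations differ), mapping a 2 MiB buffer needs 512 TLB entries in the first case and one in the second. The hypothetical helper below just performs that division.

```c
#include <assert.h>
#include <stddef.h>

#define KIB ((size_t)1024)
#define MIB (1024 * KIB)

/* TLB entries needed to map 'size' bytes when every entry maps one
 * page of 'page_size' bytes; a partial page still needs a whole entry. */
static size_t tlb_entries(size_t size, size_t page_size)
{
	assert(page_size > 0);
	return (size + page_size - 1) / page_size;
}
```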

Disadvantages:

- In order to use the fallback, the code accessing the memory must be
  aware that the memory could be backed by a virtual mapping and take
  precautions. virt_to_page() and page_address() may not work, and
  vmalloc_to_page() and vmalloc_address() (introduced through this
  patch set) may have to be called instead.

- Virtual mappings are less efficient than physical mappings.
  ...
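The precaution described in the first disadvantage amounts to checking which kind of mapping an address belongs to before translating it. Current kernels provide is_vmalloc_addr() for this check (shown in a comment below). The userspace sketch models only the range check; the window bounds and the names model_is_vmalloc_addr() and pick_lookup() are hypothetical, and real bounds are architecture- and configuration-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vmalloc window, for illustration only. */
#define MODEL_VMALLOC_START ((uintptr_t)0xf8000000UL)
#define MODEL_VMALLOC_END   ((uintptr_t)0xff000000UL)

/* Same idea as the kernel's is_vmalloc_addr(): an address inside the
 * vmalloc window is backed by a virtual mapping. */
static bool model_is_vmalloc_addr(uintptr_t addr)
{
	return addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END;
}

enum page_lookup {
	USE_VIRT_TO_PAGE,	/* direct-mapped: virt_to_page() is valid */
	USE_VMALLOC_TO_PAGE,	/* virtually mapped: walk the page tables */
};

/* In kernel code the equivalent dispatch would be:
 *   page = is_vmalloc_addr(addr) ? vmalloc_to_page(addr)
 *                                : virt_to_page(addr);
 */
static enum page_lookup pick_lookup(uintptr_t addr)
{
	return model_is_vmalloc_addr(addr) ? USE_VMALLOC_TO_PAGE
					   : USE_VIRT_TO_PAGE;
}
```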
From: Anton Altaparmakov
Date: Wednesday, September 19, 2007 - 12:34 am

Hi Christoph,


I like this a lot.  It will get rid of all the silly games we have to
play when needing both large allocations and efficient allocations
where possible.  In NTFS I can then just allocate higher order pages
instead of having to mess about with the allocation size, allocating
a single page if the requested size is <= PAGE_SIZE or using vmalloc()
if the size is bigger.  And it will make things a lot faster, because
a lot of the time a higher order page allocation will succeed with
your patchset without resorting to vmalloc().

So I could get rid of the mess below, which I currently have in
fs/ntfs/malloc.h, completely and just use the normal page
allocator/deallocator instead...

static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
{
	if (likely(size <= PAGE_SIZE)) {
		BUG_ON(!size);
		/* kmalloc() has per-CPU caches so is faster for now. */
		return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
		/* return (void *)__get_free_page(gfp_mask); */
	}
	if (likely(size >> PAGE_SHIFT < num_physpages))
		return __vmalloc(size, gfp_mask, PAGE_KERNEL);
	return NULL;
}

And other places in the kernel can make use of the same.  I think XFS  
does very similar things to NTFS in terms of larger allocations at  
least and there are probably more places I don't know about off the  
top of my head...

I am looking forward to your patchset going into mainline.  (-:

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


-

From: Eric Dumazet
Date: Wednesday, September 19, 2007 - 1:34 am

On Wed, 19 Sep 2007 08:34:47 +0100

Sure, it sounds *really* good. But...

1) Only power of two allocations are good candidates, or we waste RAM

2) On i386 machines, we have a small vmalloc window (128 MB by default).
  Many servers with >4GB of memory (PAE) boot with the vmalloc=32M option to get 992MB of LOWMEM.
  If we allow some SLUB caches to fall back to vmalloc space, tuning this will become a problem.

3) A fallback to vmalloc means an allocation of one vm_struct per compound page.

4) vmalloc() currently uses a linked list of vm_struct. Might need something more scalable.
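Points 1 and 2 come down to simple arithmetic. The sketch below, assuming 4 KiB pages, reimplements the idea behind the kernel's get_order() to show how much a non-power-of-two request wastes when rounded up, and uses a simplified model of the i386 3G/1G split in which the vmalloc reservation is carved out of the kernel's 1 GiB of virtual address space and the remainder can be direct-mapped as LOWMEM. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE ((size_t)4096)

/* Smallest order with (PAGE_SIZE << order) >= size: the same idea as
 * the kernel's get_order(), reimplemented for userspace. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Bytes lost when a request is rounded up to a power-of-two block. */
static size_t rounding_waste(size_t size)
{
	return (PAGE_SIZE << order_for_size(size)) - size;
}

/* Simplified i386 3G/1G split: whatever part of the kernel's 1 GiB of
 * virtual address space is not reserved for vmalloc can hold LOWMEM. */
static unsigned long lowmem_mb(unsigned long vmalloc_reserve_mb)
{
	assert(vmalloc_reserve_mb < 1024);
	return 1024 - vmalloc_reserve_mb;
}
```

For example, a five-page request rounds up to an order-3 block of eight pages, wasting three of them (37.5%), and shrinking the reservation from 128 MB to 32 MB raises LOWMEM from 896 MB to the 992 MB quoted above.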

-

From: Christoph Lameter
Date: Wednesday, September 19, 2007 - 10:33 am

We would first do the vmalloc conversion to GFP_VFALLBACK, which would
reduce the vmalloc requirements of drivers and core significantly. The
patchset should actually reduce the vmalloc space requirements
significantly. Virtual mappings are only needed in situations where the page allocator
cannot provide a contiguous mapping and that gets rarer the better Mel's 

If it's rarely used then it's not that big of a deal. The better the
anti-fragmentation measures, the less vmalloc use.
-

From: Andi Kleen
Date: Wednesday, September 19, 2007 - 1:24 am

Christoph Lameter <clameter@sgi.com> writes:

It seems like a good idea simply because the same functionality
is already open coded in a couple of places and unifying

Is there a reason this needs to be a GFP flag versus a wrapper
around alloc_page/free_page? page_alloc.c is already too complicated,
and it's better to keep new features separate. The only drawback
would be that free_pages would need a different call, but that
doesn't seem like a big problem.

Especially integrating it into slab would seem wrong to me.
slab is already too complicated, and for users who need such
large areas, rounding to page granularity is probably fine.

Also such a wrapper could do the old alloc_page_exact() trick:
instead of always rounding up to the next order, return the leftover
pages to the VM. In some cases this can save significant memory.
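The trick can be modelled with a toy array of page frames: take the power-of-two block the buddy allocator would hand out, then immediately free the frames past the last one the caller needs. This is a sketch under stated assumptions, not kernel code; struct model_zone, order_for_pages() and alloc_exact_model() are hypothetical. In a real kernel the higher order block must first be split into order-0 pages (split_page()) so the tail pages can be freed individually; later kernels gained an alloc_pages_exact() helper that works this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE ((size_t)4096)
#define MODEL_FRAMES 64

/* Toy zone: one flag per page frame, true = allocated. */
struct model_zone {
	bool used[MODEL_FRAMES];
};

/* Smallest order with (1 << order) >= npages. */
static unsigned int order_for_pages(size_t npages)
{
	unsigned int order = 0;

	while (((size_t)1 << order) < npages)
		order++;
	return order;
}

/* Allocate 'size' bytes at frame 'base' the "exact" way and return the
 * number of tail frames handed back to the zone. */
static size_t alloc_exact_model(struct model_zone *z, size_t base, size_t size)
{
	size_t needed = (size + PAGE_SIZE - 1) / PAGE_SIZE;
	size_t block = (size_t)1 << order_for_pages(needed);

	assert(needed > 0);
	/* Buddy blocks are naturally aligned to their own size. */
	assert(base % block == 0 && base + block <= MODEL_FRAMES);

	/* Step 1: what a plain power-of-two allocation would take. */
	for (size_t i = 0; i < block; i++)
		z->used[base + i] = true;

	/* Step 2: give back the frames beyond what was asked for. */
	for (size_t i = needed; i < block; i++)
		z->used[base + i] = false;

	return block - needed;
}
```

With a five-page request the block is eight frames and three go straight back to the zone.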

I'm also a little dubious about your attempts to do vmalloc in
interrupt context. Is that really needed? GFP_ATOMIC allocations of
large areas seem extremely unreliable to me and not something to design
for. Even if it works sometimes, freeing probably wouldn't work there due
to the flushes, which is very nasty. It would be better to drop that.

-

From: Christoph Lameter
Date: Wednesday, September 19, 2007 - 10:36 am

I tried to make this a wrapper, but there is a lot of logic in
__alloc_pages() that would have to be replicated. Also, there are specific
places in __alloc_pages() where we can establish that we have enough memory
but it's the memory fragmentation that prevents us from satisfying the 

The flushes are only done on virtually mapped architectures (xtensa) and
are simple assembly code that can run in interrupt context.
-
