Use virtual compound pages for the large swap maps. This only works for
swap maps that are smaller than a MAX_ORDER block though. If the swap map
is larger then there is no way around the use of vmalloc.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
mm/swapfile.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.25-rc5-mm1/mm/swapfile.c
===================================================================
--- linux-2.6.25-rc5-mm1.orig/mm/swapfile.c 2008-03-20 20:32:12.793950570 -0700
+++ linux-2.6.25-rc5-mm1/mm/swapfile.c 2008-03-20 20:37:43.367821147 -0700
@@ -1312,7 +1312,7 @@ asmlinkage long sys_swapoff(const char _
p->flags = 0;
spin_unlock(&swap_lock);
mutex_unlock(&swapon_mutex);
- vfree(swap_map);
+ __free_vcompound(swap_map);
inode = mapping->host;
if (S_ISBLK(inode->i_mode)) {
struct block_device *bdev = I_BDEV(inode);
@@ -1636,13 +1636,13 @@ asmlinkage long sys_swapon(const char __
goto bad_swap;

/* OK, set up the swap map and apply the bad block list */
- if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {
+ if (!(p->swap_map = __alloc_vcompound(GFP_KERNEL | __GFP_ZERO,
+ get_order(maxpages * sizeof(short))))) {
error = -ENOMEM;
goto bad_swap;
}

error = 0;
- memset(p->swap_map, 0, maxpages * sizeof(short));
for (i = 0; i < swap_header->info.nr_badpages; i++) {
int page_nr = swap_header->info.badpages[i];
if (page_nr <= 0 || page_nr >= swap_header->info.last_page)
@@ -1718,7 +1718,7 @@ bad_swap_2:
if (!(swap_flags & SWAP_FLAG_PREFER))
++least_priority;
spin_unlock(&swap_lock);
- vfree(swap_map);
+ __free_vcompound(swap_map);
if (swap_file)
filp_close(swap_file, NULL);
out:
--
--
Have you considered the potential memory wastage from rounding up
to the next page order now? (The same applies to all the other patches
that change vmalloc.) E.g. if the old size was 64k + 1 byte it will
suddenly get 128k now. That is actually not an uncommon situation
in my experience; there are often power-of-two buffers with
some small headers.

A long time ago (in 2.4-aa) I did something similar for module loading
as an experiment to avoid too many TLB misses. The module loader
would first try to get a contiguous range in the direct mapping and
only then fall back to vmalloc.

But I used a simple trick to avoid the waste problem: it allocated a
contiguous range rounded up to the next page-size order and then freed
the excess pages back into the page allocator. That was called
alloc_exact(). If you replace vmalloc with alloc_pages you should
use something like that too, I think.

-Andi
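
The arithmetic behind the 64k + 1 example can be checked with a small
userspace sketch. It is not kernel code and not the 2.4-aa alloc_exact()
implementation: the 4 KiB page size and every sketch_* name are assumptions
made for illustration. It compares what a plain order-N allocation consumes
with what remains after the tail pages are handed back.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Smallest order with (PAGE_SIZE << order) >= size, mirroring get_order(). */
static unsigned int sketch_get_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of whole pages required to hold size bytes. */
static unsigned long sketch_pages_needed(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Bytes consumed by a plain power-of-two (order-N) allocation. */
static unsigned long sketch_order_bytes(unsigned long size)
{
	return SKETCH_PAGE_SIZE << sketch_get_order(size);
}

/* Tail pages an alloc_exact-style helper would give back to the allocator. */
static unsigned long sketch_tail_pages(unsigned long size)
{
	return (1UL << sketch_get_order(size)) - sketch_pages_needed(size);
}

/* Bytes still held after the tail pages have been returned. */
static unsigned long sketch_exact_bytes(unsigned long size)
{
	return sketch_pages_needed(size) << SKETCH_PAGE_SHIFT;
}
```

For a request of 65537 bytes the plain order allocation takes 32 pages,
while the exact variant keeps 17 and returns 15, so the remaining loss is
always less than one page.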
--
One way of dealing with it would be to define an additional allocation
variant that allows the limiting of the loss? I noted that both the swap
and the wait tables vary significantly between allocations. So we could
specify an upper bound on the loss that is acceptable. If too much memory
would be lost then use vmalloc unconditionally.
---
include/linux/vmalloc.h | 12 ++++++++----
mm/page_alloc.c | 4 ++--
mm/swapfile.c | 4 ++--
mm/vmalloc.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 46 insertions(+), 8 deletions(-)

Index: linux-2.6.25-rc5-mm1/include/linux/vmalloc.h
===================================================================
--- linux-2.6.25-rc5-mm1.orig/include/linux/vmalloc.h 2008-03-24 12:51:47.457231129 -0700
+++ linux-2.6.25-rc5-mm1/include/linux/vmalloc.h 2008-03-24 12:52:05.449313572 -0700
@@ -88,14 +88,18 @@ extern void free_vm_area(struct vm_struc
/*
* Support for virtual compound pages.
*
- * Calls to vcompound alloc will result in the allocation of normal compound
- * pages unless memory is fragmented. If insufficient physical linear memory
- * is available then a virtually contiguous area of memory will be created
- * using the vmalloc functionality.
+ * Calls to vcompound_alloc and friends will result in the allocation of
+ * a normal physically contiguous compound page unless memory is fragmented.
+ * If insufficient physical linear memory is available then a virtually
+ * contiguous area of memory will be created using vmalloc.
*/
struct page *alloc_vcompound(gfp_t flags, int order);
+struct page *alloc_vcompound_maxloss(gfp_t flags, unsigned long size,
+ unsigned long maxloss);
void free_vcompound(struct page *);
void *__alloc_vcompound(gfp_t flags, int order);
+void *__alloc_vcompound_maxloss(gfp_t flags, unsigned long size,
+ unsigned long maxloss);
void __free_vcompound(void *addr);
struct page *vcompound_head_page(const void *x);

Index: linux-2.6.25-rc5-mm1/mm...
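
The mm/vmalloc.c part of the patch is cut off here, so the body of
__alloc_vcompound_maxloss() is not shown. As a rough userspace sketch of the
decision described in the mail (not the actual implementation), the check
could look like the following. The 4 KiB page size and all ml_* names are
assumptions, and the MAX_ORDER limit is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: 4 KiB pages. */
#define ML_PAGE_SHIFT 12
#define ML_PAGE_SIZE  (1UL << ML_PAGE_SHIFT)

/* Smallest order with (PAGE_SIZE << order) >= size, mirroring get_order(). */
static unsigned int ml_get_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((ML_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Bytes lost when size is rounded up to a full order-N block. */
static unsigned long ml_order_loss(unsigned long size)
{
	return (ML_PAGE_SIZE << ml_get_order(size)) - size;
}

/*
 * Hypothetical policy check: true means the compound-page path is
 * acceptable, false means the caller should go straight to vmalloc.
 */
static bool ml_compound_acceptable(unsigned long size, unsigned long maxloss)
{
	return ml_order_loss(size) <= maxloss;
}
```

With a maxloss of one page, a 65537-byte request would be sent to vmalloc
(65535 bytes would be lost), while a request 100 bytes short of 128k would
still use a compound page.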
Right. It just requires a page allocator rewrite. Which is overdue.

Well. Guess we need a definition of preserving memory. All allocations
typically have some kind of overhead.
--
Not when the trick of getting a high-order block and returning the leftover
pages is used. I meant just updating the GFP_COMPOUND code to always
use the number of pages instead of the order, so that it could deal with a
compound page where the excess pages have already been returned. That is not
actually that much work (I reimplemented this recently for dma alloc and
it's < 20 LOC).

Of course the full rewrite would also be great, agreed :)
-Andi
--
Would you post the patch here?
--
That trick is still in use for alloc_large_system_hash....
But cutting off the tail of compound pages would make treating them as
order N pages difficult. The vmalloc fallback situation is easy to deal
with.

Maybe we can think about making compound pages N consecutive pages
of PAGE_SIZE rather than an order O page? The API would be a bit
different then and it would require changes to the page allocator. There
would be more fragmentation if pages like that are freed.
--
