The per cpu allocator allows dynamic allocation of memory on all
processors simultaneously. A bitmap is used to track used areas.
The allocator implements tight packing to reduce the cache footprint
and increase speed since cacheline contention is typically not a concern
for memory mainly used by a single cpu. Small objects will fill up gaps
left by larger allocations that required alignments.
The size of the cpu_alloc area can be changed via make menuconfig.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/percpu.h | 46 +++++++++++++
include/linux/vmstat.h | 2
mm/Kconfig | 6 +
mm/Makefile | 2
mm/cpu_alloc.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++
mm/vmstat.c | 1
6 files changed, 222 insertions(+), 2 deletions(-)
Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h 2008-05-29 19:41:21.000000000 -0700
+++ linux-2.6/include/linux/vmstat.h 2008-05-29 20:15:37.000000000 -0700
@@ -37,7 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
FOR_ALL_ZONES(PGSCAN_KSWAPD),
FOR_ALL_ZONES(PGSCAN_DIRECT),
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
- PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ PAGEOUTRUN, ALLOCSTALL, PGROTATED, CPU_BYTES,
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
#endif
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2008-05-29 19:41:21.000000000 -0700
+++ linux-2.6/mm/Kconfig 2008-05-29 20:13:39.000000000 -0700
@@ -205,3 +205,9 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config CPU_ALLOC_SIZE
+ int "Size of cpu alloc area"
+ default "30000"
+ help
+ Sets the maximum amount of memory that can be allocated via cpu_alloc
Index: ...strange choice of a default? I guess it makes it clear that there's no Perhaps it should return UNIT_TYPE? (ugh). This is kinda bitmap_find_free_region(), only bitmap_find_free_region() isn't quite strong enough. Generally I think it would have been better if you had added new primitives to the bitmap library (or enhanced existing ones) and used If this assertion triggers for someone, you'll wish like hell that it eek, a major interface function which is ALL IN CAPS! --
The cpu alloc has a cpu_bytes field in vmstat that shows how much memory is being used. 30000 seemed to be reasonable after staring at these No. The UNIT_TYPE is the basic allocation unit. This returns the number of We could go to finer or coarser grained someday? Maybe if the area becomes size_to_units is fairly basic for most of the logic. These are variaables The scope of the patchset is already fairly large. The search here is different and not performance critical. Not sure if this is useful for No. This is a macro and therefore uppercase (there is macro magic going on that ppl need to be aware of). AFAICR you wanted it this way last year. C function not possible because of the type checking. --
I think that strengthening bitmap_find_free_region() would end up urgh. This is a C-convention versus kernel-convention thing. The C convention exists for very good reasons. But it sure does suck. What do others think? --
A few, key symbols get to be special ... short but distinctive names
that become (in)famous. The classic was "u", for the per-user
structure, aka the "user area", in old Unix kernels. In people's
names, a few one word or first names such as "Ike", "Madonna", "Ali",
"Tiger", "Cher", "Mao", "OJ", "Plato", "Linus", ... have become
distinctive and well known to many people.
How about "_pcpu", instead of CPU_PTR? "_pcpu" is a short, unique
(not currently in use) symbol that, tersely, says what we want to say.
Yes - it violates multiple conventions. "The Boss" (Bruce Springsteen)
gets to do that.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--
On Thu, 29 May 2008 22:10:25 -0700 (PDT) 30000 is suitable for both of 32bits/64bits arch ? Thanks, -kame --
Was developed on 64bit. As ususal 32 bit was an afterthought. --
Andrew Morton wrote: Hi Andrew, I've sort of inherited this now... So are you suggesting that we add new bitmap primitives to: bitmap_fill_offset(bitmap, start, nbits) /* start at bitmap[start] */ bitmap_zero_offset(bitmap, start, nbits) bitmap_find_free_area(bitmap, nbits, size, align) /* size not order */ Thanks, Mike --
area[] is not guaranteed to be aligned on anything but 4 bytes. If someone then needs to call cpu_alloc(8, GFP_KERNEL, 8), it might get an non aligned result. Either you should add an __attribute__((__aligned__(PAGE_SIZE))), or take into account the real address of area[] in cpu_alloc() to avoid waste of up to PAGE_SIZE bytes --
I think cacheline aligning should be sufficient. People should not allocate large page aligned objects here. --
I vaguely recall there were issues with this in the module code. They might be gone now, but failing to meet alignment contraints without a big warning would suck. But modifying your code to consider the actual alignment is actually pretty trivial, AFAICT. Cheers, Rusty. --
So paraphrasing my earlier email, we should add: bitmap_find_free_area(bitmap, nbits, size, align, alignbase) so that > cacheline alignment is possible? My thinking is that if we do go to true dynamically sized cpu_alloc area then allocating PAGE_SIZE units may be both practical and worthwhile...? Thanks, Mike --
Hum, maybe, but then we broke modules that might request up to PAGE_SIZE alignement for their percpu section, if I read your 3rd patch correctly. Taking into account the ((unsigned long)area & (PAGE_SIZE-1)) offset in cpu_alloc() should give up to PAGE_SIZE alignment for free. --
I'm a bit confused. Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned on
ifdef MODULE?
#ifdef MODULE
#define SHARED_ALIGNED_SECTION ".data.percpu"
#else
#define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
#endif
#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name) \
__attribute__((__section__(SHARED_ALIGNED_SECTION))) \
PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name \
____cacheline_aligned_in_smp
Thanks,
Mike
--
Because we had crashes when loading oprofile module, when a previous version of oprofile used to use DEFINE_PER_CPU_SHARED_ALIGNED variable module loader only takes into account the special section ".data.percpu" and ignores ".data.percpu.shared_aligned" I therefore submitted two patches : 1) commit 8b8b498836942c0c855333d357d121c0adeefbd9 oprofile: don't request cache line alignment for cpu_buffer Alignment was previously requested because cpu_buffer was an [NR_CPUS] array, to avoid cache line sharing between CPUS. After commit 608dfddd845da5ab6accef70154c8910529699f7 (oprofile: change cpu_buffer from array to per_cpu variable ), we dont need to force an alignement anymore since cpu_buffer sits in per_cpu zone. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2) and commit 44c81433e8b05dbc85985d939046f10f95901184 per_cpu: fix DEFINE_PER_CPU_SHARED_ALIGNED for modules Current module loader lookups ".data.percpu" ELF section to perform per_cpu relocation. But DEFINE_PER_CPU_SHARED_ALIGNED() uses another section (".data.percpu.shared_aligned"), currently only handled in vmlinux.lds, not by module loader. To correct this problem, instead of adding logic into module loader, or using at build time a module.lds file for all arches to group ".data.percpu.shared_aligned" into ".data.percpu", just use ".data.percpu" for modules. Alignment requirements are correctly handled by ld and module loader. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --
Put one way, putting page-aligned per-cpu data in a separate section is a space-saving hack: one which is not really required for modules because of the low frequency of such variables. Put another way, not respecting the .data.percpu.shared_aligned section in modules is a bug. But a comment would probably be nice! Cheers, Rusty. --
Looks wrong to me. There can be shared objects even without modules. --
Well, MODULE is not CONFIG_MODULES :)
If compiling an object that is going to be statically linked to kernel,
MODULE is not defined, so we have shared objects.
When compiling a module, we cannot *yet* use .data.percpu.shared_aligned
section, since module loader wont handle this section.
Alternative is to change modules linking for all arches to merge
.data.percpu{*} subsections correctly, or tell module loader to take
into account all .data.percpu sections.
AFAIK no module uses DEFINE_PER_CPU_SHARED_ALIGNED() yet...
--
Ahhh. Makes sense. Add a comment to explain this? --
Allocator seems nice and simple, similar to existing one in module.c (which predates cool bitmap operators). Being able to do per-cpu allocations in an interrupt handler seems like enouraging a Bad Idea though: I'd be tempted to avoid the flags word, always zero, and use a mutex instead of a spinlock. Cheers, --
I haven't seen any further discussion on these aspects... is there a consensus to remove the flags from CPU_ALLOC() and use a mutex? Thanks, Mike --
We want to have extensable per cpu areas. This means you need an allocation context. So we need to keep the flags. Mutex is not a bad idea. --
Missing . at end of line, but more important: How do you expect user to answer this?! I mean, can they put 0 there and expect kernel to work? If not, you should disallow that config. And you shoudl really explain how much memory is needed... and perhaps mention that size is in bytes? -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
