Re: [patch 02/41] cpu alloc: The allocator

Previous thread: [patch 07/41] cpu alloc: Workqueue conversion by Christoph Lameter on Thursday, May 29, 2008 - 8:56 pm. (1 message)

Next thread: [patch 03/41] cpu alloc: Use cpu allocator instead of the builtin modules per cpu allocator by Christoph Lameter on Thursday, May 29, 2008 - 8:56 pm. (6 messages)
From: Christoph Lameter
Date: Thursday, May 29, 2008 - 8:56 pm

The per cpu allocator allows dynamic allocation of memory on all
processors simultaneously. A bitmap is used to track used areas.
The allocator implements tight packing to reduce the cache footprint
and increase speed since cacheline contention is typically not a concern
for memory mainly used by a single cpu. Small objects will fill up gaps
left by larger allocations that required alignments.

The size of the cpu_alloc area can be changed via make menuconfig.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/percpu.h |   46 +++++++++++++
 include/linux/vmstat.h |    2 
 mm/Kconfig             |    6 +
 mm/Makefile            |    2 
 mm/cpu_alloc.c         |  167 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/vmstat.c            |    1 
 6 files changed, 222 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h	2008-05-29 19:41:21.000000000 -0700
+++ linux-2.6/include/linux/vmstat.h	2008-05-29 20:15:37.000000000 -0700
@@ -37,7 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 		FOR_ALL_ZONES(PGSCAN_KSWAPD),
 		FOR_ALL_ZONES(PGSCAN_DIRECT),
 		PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
-		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+		PAGEOUTRUN, ALLOCSTALL, PGROTATED, CPU_BYTES,
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2008-05-29 19:41:21.000000000 -0700
+++ linux-2.6/mm/Kconfig	2008-05-29 20:13:39.000000000 -0700
@@ -205,3 +205,9 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config CPU_ALLOC_SIZE
+	int "Size of cpu alloc area"
+	default "30000"
+	help
+	  Sets the maximum amount of memory that can be allocated via cpu_alloc
Index: ...
From: Andrew Morton
Date: Thursday, May 29, 2008 - 9:58 pm

strange choice of a default?  I guess it makes it clear that there's no

Perhaps it should return UNIT_TYPE? (ugh).







This is kinda bitmap_find_free_region(), only bitmap_find_free_region()
isn't quite strong enough.

Generally I think it would have been better if you had added new
primitives to the bitmap library (or enhanced existing ones) and used

If this assertion triggers for someone, you'll wish like hell that it

eek, a major interface function which is ALL IN CAPS!



--

From: Christoph Lameter
Date: Thursday, May 29, 2008 - 10:10 pm

The cpu alloc has a cpu_bytes field in vmstat that shows how much memory 
is being used. 30000 seemed to be reasonable after staring at these 

No. The UNIT_TYPE is the basic allocation unit. This returns the number of 

We could go to finer or coarser grained someday? Maybe if the area becomes 

size_to_units is fairly basic for most of the logic. These are variaables 




The scope of the patchset is already fairly large. The search here is 
different and not performance critical. Not sure if this is useful for 


No. This is a macro and therefore uppercase (there is macro magic going on 
that ppl need to be aware of). AFAICR you wanted it this way last year. C 
function not possible because of the type checking.

--

From: Andrew Morton
Date: Thursday, May 29, 2008 - 10:31 pm

I think that strengthening bitmap_find_free_region() would end up

urgh.  This is a C-convention versus kernel-convention thing.  The C
convention exists for very good reasons.  But it sure does suck.

What do others think?
--

From: Paul Jackson
Date: Monday, June 2, 2008 - 2:29 am

A few, key symbols get to be special ... short but distinctive names
that become (in)famous.  The classic was "u", for the per-user
structure, aka the "user area", in old Unix kernels.  In people's
names, a few one word or first names such as "Ike", "Madonna", "Ali",
"Tiger", "Cher", "Mao", "OJ", "Plato", "Linus", ... have become
distinctive and well known to many people.

How about "_pcpu", instead of CPU_PTR?  "_pcpu" is a short, unique
(not currently in use) symbol that, tersely, says what we want to say.

Yes - it violates multiple conventions.  "The Boss" (Bruce Springsteen)
gets to do that.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214
--

From: KAMEZAWA Hiroyuki
Date: Thursday, May 29, 2008 - 10:56 pm

On Thu, 29 May 2008 22:10:25 -0700 (PDT)

30000 is suitable for both of 32bits/64bits arch ?

Thanks,
-kame

--

From: Christoph Lameter
Date: Thursday, May 29, 2008 - 11:16 pm

Was developed on 64bit. As ususal 32 bit was an afterthought.

--

From: Mike Travis
Date: Wednesday, June 4, 2008 - 7:48 am

Andrew Morton wrote:

Hi Andrew,

I've sort of inherited this now...

So are you suggesting that we add new bitmap primitives to:

	bitmap_fill_offset(bitmap, start, nbits)   /* start at bitmap[start] */
	bitmap_zero_offset(bitmap, start, nbits)
	bitmap_find_free_area(bitmap, nbits, size, align)  /* size not order */

Thanks,
Mike 
--

From: Eric Dumazet
Date: Thursday, May 29, 2008 - 10:04 pm

area[] is not guaranteed to be aligned on anything but 4 bytes.

If someone then needs to call cpu_alloc(8, GFP_KERNEL, 8), it might get 
an non aligned result.

Either you should add an __attribute__((__aligned__(PAGE_SIZE))),
or take into account the real address of area[] in cpu_alloc() to avoid 
waste of up to PAGE_SIZE bytes




--

From: Christoph Lameter
Date: Thursday, May 29, 2008 - 10:20 pm

I think cacheline aligning should be sufficient. People should not 
allocate large page aligned objects here.
--

From: Rusty Russell
Date: Thursday, May 29, 2008 - 10:52 pm

I vaguely recall there were issues with this in the module code.  They might 
be gone now, but failing to meet alignment contraints without a big warning 
would suck.

But modifying your code to consider the actual alignment is actually pretty 
trivial, AFAICT.

Cheers,
Rusty.
--

From: Mike Travis
Date: Wednesday, June 4, 2008 - 8:30 am

So paraphrasing my earlier email, we should add:

	bitmap_find_free_area(bitmap, nbits, size, align, alignbase)

so that > cacheline alignment is possible?

My thinking is that if we do go to true dynamically sized cpu_alloc area then
allocating PAGE_SIZE units may be both practical and worthwhile...?

Thanks,
Mike
--

From: Rusty Russell
Date: Thursday, June 5, 2008 - 4:48 pm

[Empty message]
From: Eric Dumazet
Date: Thursday, May 29, 2008 - 10:54 pm

Hum, maybe, but then we broke modules that might request up to PAGE_SIZE 
alignement for their percpu section,
if I read your 3rd patch correctly.

Taking into account the ((unsigned long)area & (PAGE_SIZE-1)) offset in 
cpu_alloc()
should give up to PAGE_SIZE alignment for free.





--

From: Mike Travis
Date: Wednesday, June 4, 2008 - 7:58 am

I'm a bit confused.  Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned on
ifdef MODULE?

        #ifdef MODULE
        #define SHARED_ALIGNED_SECTION ".data.percpu"
        #else
        #define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
        #endif

        #define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)                       \
                __attribute__((__section__(SHARED_ALIGNED_SECTION)))            \
                PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name             \
                ____cacheline_aligned_in_smp

Thanks,
Mike
--

From: Eric Dumazet
Date: Wednesday, June 4, 2008 - 8:11 am

Because we had crashes when loading oprofile module, when a previous version of oprofile
used to use DEFINE_PER_CPU_SHARED_ALIGNED variable

module loader only takes into account the special section ".data.percpu" and ignores ".data.percpu.shared_aligned"

I therefore submitted two patches :

1) commit 8b8b498836942c0c855333d357d121c0adeefbd9
oprofile: don't request cache line alignment for cpu_buffer

Alignment was previously requested because cpu_buffer was an [NR_CPUS]
array, to avoid cache line sharing between CPUS.

After commit 608dfddd845da5ab6accef70154c8910529699f7 (oprofile: change
cpu_buffer from array to per_cpu variable ), we dont need to force an
alignement anymore since cpu_buffer sits in per_cpu zone.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


2) and commit 	44c81433e8b05dbc85985d939046f10f95901184
per_cpu: fix DEFINE_PER_CPU_SHARED_ALIGNED for modules

Current module loader lookups ".data.percpu" ELF section to perform
per_cpu relocation.  But DEFINE_PER_CPU_SHARED_ALIGNED() uses another
section (".data.percpu.shared_aligned"), currently only handled in
vmlinux.lds, not by module loader.

To correct this problem, instead of adding logic into module loader, or
using at build time a module.lds file for all arches to group
".data.percpu.shared_aligned" into ".data.percpu", just use ".data.percpu"
for modules.

Alignment requirements are correctly handled by ld and module loader.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>



--

From: Rusty Russell
Date: Thursday, June 5, 2008 - 5:32 pm

Put one way, putting page-aligned per-cpu data in a separate section is a 
space-saving hack: one which is not really required for modules because of 
the low frequency of such variables.  Put another way, not respecting 
the .data.percpu.shared_aligned section in modules is a bug.

But a comment would probably be nice!

Cheers,
Rusty.
--

From: Christoph Lameter
Date: Tuesday, June 10, 2008 - 10:33 am

Looks wrong to me. There can be shared objects even without modules.

--

From: Eric Dumazet
Date: Tuesday, June 10, 2008 - 11:05 am

Well, MODULE is not CONFIG_MODULES :)

If compiling an object that is going to be statically linked to kernel, 
MODULE is not defined, so we have shared objects.

When compiling a module, we cannot *yet* use .data.percpu.shared_aligned 
section, since module loader wont handle this section.

Alternative is to change modules linking for all arches to merge 
.data.percpu{*} subsections correctly, or tell module loader to take 
into account all .data.percpu sections.

AFAIK no module uses DEFINE_PER_CPU_SHARED_ALIGNED() yet...



--

From: Christoph Lameter
Date: Tuesday, June 10, 2008 - 11:28 am

Ahhh. Makes sense. Add a comment to explain this?

--

From: Rusty Russell
Date: Thursday, May 29, 2008 - 10:46 pm

Allocator seems nice and simple, similar to existing one in module.c (which 
predates cool bitmap operators).

Being able to do per-cpu allocations in an interrupt handler seems like 
enouraging a Bad Idea though: I'd be tempted to avoid the flags word, always 
zero, and use a mutex instead of a spinlock.

Cheers,


--

From: Mike Travis
Date: Wednesday, June 4, 2008 - 8:04 am

I haven't seen any further discussion on these aspects... is there a consensus
to remove the flags from CPU_ALLOC() and use a mutex?

Thanks,
Mike
--

From: Christoph Lameter
Date: Tuesday, June 10, 2008 - 10:34 am

We want to have extensable per cpu areas. This means you need an 
allocation context. So we need to keep the flags. Mutex is not a bad idea.


--

From: Pavel Machek
Date: Saturday, May 31, 2008 - 1:58 pm

Missing . at end of line, but more important:

How do you expect user to answer this?!

I mean, can they put 0 there and expect kernel to work? If not, you
should disallow that config. And you shoudl really explain how much
memory is needed... and perhaps mention that size is in bytes?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

Previous thread: [patch 07/41] cpu alloc: Workqueue conversion by Christoph Lameter on Thursday, May 29, 2008 - 8:56 pm. (1 message)

Next thread: [patch 03/41] cpu alloc: Use cpu allocator instead of the builtin modules per cpu allocator by Christoph Lameter on Thursday, May 29, 2008 - 8:56 pm. (6 messages)