login
Header Space

 
 

Re: [rfc][patch 2/3] slab: introduce SMP alignment

Previous thread: Re: [PATCH] V4L/DVB: tuner and radio addresses are missing for the PixelView PlayTV card by Wojciech Migda on Monday, March 3, 2008 - 4:35 am. (1 message)

Next thread: Re: [PATCH] ACPI: Unneccessary to scan the PCI bus already scanned. by Yinghai Lu on Monday, March 3, 2008 - 6:01 am. (10 messages)
To: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:34 am

SLUB should pack even small objects nicely into cachelines if that is what
has been asked for. Use the same algorithm as SLAB for this.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
---
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c
+++ linux-2.6/mm/slub.c
@@ -1896,12 +1896,15 @@ static unsigned long calculate_alignment
 	 * specified alignment though. If that is greater
 	 * then use it.
 	 */
-	if ((flags &amp; SLAB_HWCACHE_ALIGN) &amp;&amp;
-			size &gt; cache_line_size() / 2)
-		return max_t(unsigned long, align, cache_line_size());
+	if (flags &amp; SLAB_HWCACHE_ALIGN) {
+		unsigned long ralign = cache_line_size();
+		while (size &lt;= ralign / 2)
+			ralign /= 2;
+		align = max(align, ralign);
+	}
 
 	if (align &lt; ARCH_SLAB_MINALIGN)
-		return ARCH_SLAB_MINALIGN;
+		align = ARCH_SLAB_MINALIGN;
 
 	return ALIGN(align, sizeof(void *));
 }
--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 3:08 pm

Why? SLAB does not cacheline align objects smaller than cache_line_size() 
/2. If they are already not cache line aligned then we can make them as 
dense as possible. That is what SLUB does.

--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:06 pm

Because when you specify HWCACHE_ALIGN, it means that you want the object
not to cross cacheline boundaries for at least cache_line_size() bytes.
SLAB does this. SLUB does not without this patch.

--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:10 pm

HWCACHE_ALIGN means that you want the object to be aligned at 
cacheline boundaries for optimization. Why does crossing cacheline 
boundaries matter in this case?

--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:17 pm

No, HWCACHE_ALIGN means that you want the object not to cross cacheline
boundaries for at least cache_line_size() bytes. You invented new
semantics with SLUB, but all the callers were written expecting the
existing semantics, presumably. So we should retain that behaviour, and
if you disagree with the actual callers, then that is fine but you
should discuss each case with the relevant people.

--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:16 pm

Interesting new definition....


--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Tuesday, March 4, 2008 - 8:06 pm

Huh?? It is not a new definition, it is exactly what SLAB does. And
then you go and do something different and claim that you follow
what slab does.

--
To: <npiggin@...>
Cc: <clameter@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Tuesday, March 4, 2008 - 8:10 pm

From: Nick Piggin &lt;npiggin@suse.de&gt;

I completely agree with Nick.
--
To: David Miller <davem@...>
Cc: <npiggin@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Wednesday, March 5, 2008 - 5:06 pm

So you also want subalignment because of cacheline crossing for 24 byte 
slabs? We then only have 2 objects per cacheline instead of 3 but no 
crossing anymore.

Well okay if there are multiple requests then lets merge Nick's patch that 
does this. Still think that this will do much ...
Instead of 170 we will only have 128 objects per slab (64 byte 
cacheline).

It will affect the following slab caches (mm) reducing the density of 
objects. 

scsi_bidi_sdb numa_policy fasync_cache xfs_bmap_free_item xfs_dabuf 
fstrm_item dm_target_io

Nothing related to networking....
--
To: Christoph Lameter <clameter@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Wednesday, March 5, 2008 - 10:57 pm

That's what callers expect when they pass the HWCACHE flag. Wouldn't it
be logical to fix the callers if you think it costs too much memory with

There looks like definitely some networking slabs that pass the flag
and can be non-power-of-2. And don't forget cachelines can be anywhere
up to 256 bytes in size. So yeah it definitely makes sense to merge
the patch and then examine the callers if you feel strongly about it.


--
To: Nick Piggin <npiggin@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 6:56 pm

Just do not like to add fluff that has basically no effect. I just tried 
to improve things by not doing anything special if we cannot cacheline 
align object. Least surprise (at least for me). It is bad enough that we 
just decide to ignore the request for alignment for small caches.




--
To: Christoph Lameter <clameter@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 10:23 pm

That's just because you (apparently still) have a misconception about what
the flag is supposed to be for. It is not for aligning things to the start
of a cacheline boundary. It is not for avoiding false sharing on SMP. It
is for ensuring that a given object will span the fewest number of
cachelines. This can actually be important if you do anything like random
lookups or tree walks where the object contains the tree node.

Consider a 64 byte cacheline, and a 24 byte object:
cacheline  |-------|-------|-------
object     |--|--|--|--|--|--|--|--

So if you touch 8 random objects, it is statistically likely to cost you
10 cache misses (so long as the working set is sufficiently cold / larger
than cache that cacheline sharing is insignificant).

If you actually honour HWCACHE_ALIGN, then the same object will be 32
bytes:
cacheline  |-------|
object     |---|---|

Now 8 will cost 8. A 20% saving. Maybe almost a 20% performance improvement.

Before we go around in circles again, do you accept this? If yes, then
what is your argument that SLUB knows better than the caller; if no, then
why not?

--
To: Nick Piggin <npiggin@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 10:26 pm

The alignment of the object to the start of a cacheline is the obvious 
meaning and that is also reflected in the comment in slab.h.

--
To: Christoph Lameter <clameter@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 10:32 pm

It doesn't say start of cache line. It says align them *on* cachelines.
2 32 byte objects on a 64 byte cacheline are aligned on the cacheline.
2.67 24 bytes objects on a 64 byte cacheline are not aligned on the
cacheline.

Anyway, if you want to be myopic about it, then good luck with that.

--
To: Nick Piggin <npiggin@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 10:54 pm

2 32 byte objects means only one is aligned on a cache line.

Certainly cacheline contention is reduced and performance potentially 
increased if there are less objects in a cacheline.

The same argument can be made of aligning 8 byte objects on 32 byte 
boundaries. Instead of 8 objects per cacheline you only have two. Why 8?

Isnt all of this a bit arbitrary and contrary to the intend of avoiding 
cacheline contention?

The cleanest solution to this is to specify the alignment for each 
slabcache if such an alignment is needed. The alignment can be just a 
global constant or function like cache_line_size(). 

I.e. define

	int smp_align;

On bootup check the number of running cpus and then pass smp_align as
the align parameter (most slabcaches have no other alignment needs, if 
they do then we can combine these using max).

If we want to do more sophisticated things then lets have function that
aligns the object on power of two boundaries like SLAB_HWCACHE_ALIGN does
now:

	sub_cacheline_align(size)

Doing so will make it more transparent as to what is going on and which 
behavior we want. And we can get rid of SLAB_HWCACHE_ALIGN with the weird
semantics. Specifying smp_align wil truly always align on a cacheline 
boundary if we are on an SMP system.

--
To: Christoph Lameter <clameter@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 11:10 pm

No, it *is not about avoiding cacheline contention*. As such, the rest
of what you wrote below about smp_align etc is rubbish.

Can you actually read what I posted?

--
To: Nick Piggin <npiggin@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 11:18 pm

I am trying but it barely makes sense. You first state that it 
is *not* about avoiding cacheline contention and then argue that it 
reduces cacheline contention.

--
To: Christoph Lameter <clameter@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 11:22 pm

Where do I argue that it reduces cacheline contention? All arguments
I made apply to a UP system equally.

--
To: Nick Piggin <npiggin@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Thursday, March 6, 2008 - 11:58 pm

Huh? I thought you wanted to avoid cacheline alignment on UP to save 
memory?


--
To: Christoph Lameter <clameter@...>
Cc: David Miller <davem@...>, <netdev@...>, <linux-kernel@...>, <yanmin_zhang@...>, <dada1@...>
Date: Friday, March 7, 2008 - 12:05 am

We're talking about HWCACHE_ALIGN in this thread. SMP_ALIGN is for
reducing cacheline contention and that's where you can avoid it on UP.


--
To: Christoph Lameter <clameter@...>
Cc: Nick Piggin <npiggin@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:30 pm

Hi,



Well, not my definition either but SLAB has guaranteed that for small
objects in the past, so I think Nick has a point here. However, with
all this back and forth, I've lost track why this matters. I suppose
it causes regression on some workload?

                                Pekka
--
To: Pekka Enberg <penberg@...>
Cc: Nick Piggin <npiggin@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:32 pm

Well the guarantee can only be exploited if you would check the cacheline 
sizes and the object size from the code that creates the slab cache. 
Basically you would have to guestimate what the slab allocator is doing.

So the guarantee is basically meaningless. If the object is larger than a 
cacheline then this will never work.

--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Tuesday, March 4, 2008 - 8:08 pm

Of course it works. It fits the object into the fewest number of cachelines
possible. If you need to be accessing such objects in a random manner, then
for highest performance you want to touch as few cachelines as possible.

--
To: Christoph Lameter <clameter@...>
Cc: Nick Piggin <npiggin@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:35 pm

Hi,


Yes, I know that. That's why I am asking why this matters. If there's 
some sort of regression because SLUB does HWCACHE_ALIGN bit differently, 
we need to fix that. Not that it necessarily means we have to change 
HWCACHE_ALIGN but I am assuming Nick has some reason why he wants to 
introduce the SMP alignment flag.

			Pekka
--
To: Pekka Enberg <penberg@...>
Cc: Christoph Lameter <clameter@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Tuesday, March 4, 2008 - 8:28 pm

It started out as a SLUB regression that was exposing poor code in the
percpu allocator due to different SLUB kmalloc alignments. That prompted
some further investigation about the alignment handling in the
allocators and showed up this problem with SLUB's HWCACHE_ALIGN. While
I don't know of a regression caused by it as such, it is totally
unreasonable to just change the semantics of it (seemingly for no good
reason). It is used quite a bit in networking for one, and those guys

The SMP flag was just an RFC. I think some people (like Christoph) were
being confused about the HWCACHE_ALIGN flag being for avoiding false
sharing on SMP systems. It would actually be also generally useful to
have the SMP flag (eg. see the sites I added it to in patch #3).
--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Wednesday, March 5, 2008 - 4:56 pm

That was due to SLUB's support for smaller allocation sizes. AFAICT has 

Hmmm. We could define a global constant for that? Determine it on bootup 
and then pass it as an alignment parameter?
 
--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Wednesday, March 5, 2008 - 10:49 pm

The smaller sizes meant objects were less often aligned on cacheline
  
We could, but I'd rather just use the flag.

--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 6:53 pm

Do you have a case in mind where that would be useful? We had a 
SLAB_HWCACHE_MUST_ALIGN or so at some point but it was rarely to never 
used. Note that there is also KMEM_CACHE which picks up the alignment from 
the compiler.


--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 10:04 pm

It doesn't align small objects to cacheline boundaries in either allocator.
The regression is just because slub can support smaller sizes of objects



Yeah, that's not quite as good either. My allocation flag is dynamic, so
it will not bloat things for no reason on UP machines and SMP kernels. 
It also aligns to the detected machine cacheline size rather than a
compile time constant.

--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 10:20 pm

Those already have SLAB_HWCACHE_ALIGN.

The point is to switch off alignment for UP? Cant we do that with 
SLAB_HWCACHE_ALIGN too since SLOB seemed to work successfully without it 

Is that really a noteworthy effect?
--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 10:25 pm

Yes.

--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 10:27 pm

Numbers?

--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 10:33 pm

Uhh... big number &gt; small number

--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Thursday, March 6, 2008 - 10:33 pm

Can you quantify the effect?
--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Friday, March 7, 2008 - 1:23 am

It obviously won't be a great deal until if/when it starts getting
used more. The space savings on UP are just one reason why my approach
is better than using the static alignment annotations.

--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:44 am

Hi Nick,


The patches look good but this changelog is missing context. Why do we
want to do this?


--
To: Pekka Enberg <penberg@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 8:28 am

You're right, the changelog should be improved if/when it is merged.
Actually the context is discussed in another thread just now on lkml

--
To: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:36 am

Use SLAB_SMP_ALIGN in a few places.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
---
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -332,8 +332,8 @@ void __init bdev_cache_init(void)
 {
 	int err;
 	bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
-			0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-				SLAB_MEM_SPREAD|SLAB_PANIC),
+			0, (SLAB_HWCACHE_ALIGN|SLAB_SMP_ALIGN|
+			SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_PANIC),
 			init_once);
 	err = register_filesystem(&amp;bd_type);
 	if (err)
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -1547,23 +1547,23 @@ void __init proc_caches_init(void)
 {
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_DESTROY_BY_RCU,
+			SLAB_HWCACHE_ALIGN|SLAB_SMP_ALIGN|SLAB_PANIC|SLAB_DESTROY_BY_RCU,
 			sighand_ctor);
 	signal_cachep = kmem_cache_create("signal_cache",
 			sizeof(struct signal_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_SMP_ALIGN|SLAB_PANIC, NULL);
 	files_cachep = kmem_cache_create("files_cache",
 			sizeof(struct files_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_SMP_ALIGN|SLAB_PANIC, NULL);
 	fs_cachep = kmem_cache_create("fs_cache",
 			sizeof(struct fs_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_SMP_ALIGN|SLAB_PANIC, NULL);
 	vm_area_cachep = kmem_cache_create("vm_area_struct",
 			sizeof(struct vm_area_struct), 0,
 			SLAB_PANIC, NULL);
 	mm_cachep = kmem_cache_create("mm_struct",
 			sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_SMP_ALIGN|SLAB_PANIC, NULL);
 }
 
 /*
Index: linux-2.6/...
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>
Date: Monday, March 3, 2008 - 5:53 am

I dont understand why you added SLAB_SMP_ALIGN, without removing 




--
To: Eric Dumazet <dada1@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>
Date: Monday, March 3, 2008 - 8:41 am

Because I thought that in most of the cases, we also want some cacheline
alignment on UP systems as well because we care about the layout of the
structure WRT the cachelines for the mandatory/capacity miss cases, as
well as wanting to avoid false sharing misses on SMP.

Actually I didn't think _too_ hard about them, possibly some could be
removed. But the problem is that these things do require careful
thought so I should not change them unless I have done that ;)

I guess there are some basic guidelines -- if size is a problem (ie if
there can be lots of these structures), then that is going to be a
factor; if the total pool of objects is likely to be fairly densely
resident in cache, then it will start to favour dense packing rather
--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>
Date: Monday, March 3, 2008 - 9:00 am

Well, if a kmem_cache_create() is used, this is probably because number 
of objects can be large, so kmalloc() power-of-two granularity could 
waste lot of ram.

But yes, you are right that SLAB_SMP_ALIGN doesnt imply SLAB_HWCACHE_ALIGN

- SMP_ALIGN is a hint about false sharing (when object contains a refcnt 
for example), that is a concern only if

num_possible_cpus() &gt; 1

While HWCACHE_ALIGN might be a hint saying :
- The writer carefully designed the structure so that max performance is 
obtained when all objects starts on a cache line boundary, even on 
Uniprocessor.

But I suspect some uses of HWCACHE_ALIGN are not a hint but a strong 
requirement.

Maybe we need to use three flags to separate the meanings ?


SLAB_HINT_SMP_ALIGN
SLAB_HINT_HWCACHE_ALIGN
SLAB_HWCACHE_ALIGN /* strong requirement that two objects dont share a 
cache line */





--
To: Eric Dumazet <dada1@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>
Date: Monday, March 3, 2008 - 9:46 am

Yes. In which case, we are also happy if the objects are small if they
share cachelines (so long as they are still nicely aligned, which slub

Possibly, but I'm beginning to prefer that strong requirements should
request the explicit alignment (they can even use cache_line_size() after
Pekka's patch to make it generic). I don't like how the name implies
that you get a guarantee, however I guess in practice people are using it
more as a hint (or because they vaguely hope it makes their code run
faster :))

So I wouldn't be adverse to a rename...
--
To: Nick Piggin <npiggin@...>
Cc: Eric Dumazet <dada1@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>
Date: Monday, March 3, 2008 - 9:53 am

Hi,


At least historically SLAB_HWCACHE_ALIGN has been just a hint,
although slab tries very hard to satisfy it (see the comments in
mm/slab.c). Why do we need stronger guarantees than that, btw?
--
To: Pekka Enberg <penberg@...>
Cc: Nick Piggin <npiggin@...>, Eric Dumazet <dada1@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>
Date: Monday, March 3, 2008 - 3:09 pm

Its a hint. Alignment is specified as a parameter to kmem_cache_create not 
as a flag. Maybe we could remove SLAB_HWCACHE_ALIGN because it is causing 
so much confusion?
 
--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, Eric Dumazet <dada1@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>
Date: Monday, March 3, 2008 - 4:10 pm

Yeah, that's what I thought too, when I got confused by these new
SLUB semantics that you made up. Actually if you look at SLAB,
it has very precise and rational semantics. SLUB should respect that.

If you really configure for tiny memory footprint, then I'm fine with
it going away. In that respect, it is still a hint (the callers can't
rely on it being a particular alignment), and that also applies for
SLAB_SMP_ALIGN, in case you are concerned that flags must only be hints.

--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, Eric Dumazet <dada1@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>
Date: Monday, March 3, 2008 - 4:12 pm

These crappy semantics are SLAB semantics that SLUB must support.

--
To: Christoph Lameter <clameter@...>
Cc: Pekka Enberg <penberg@...>, Eric Dumazet <dada1@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>
Date: Monday, March 3, 2008 - 4:18 pm

They are only crappy in SLUB. In SLAB they are actually useful.
--
To: Nick Piggin <npiggin@...>
Cc: Pekka Enberg <penberg@...>, Eric Dumazet <dada1@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>
Date: Monday, March 3, 2008 - 5:14 pm

Cannot figure out what you are talking about....
 
--
To: Pekka Enberg <penberg@...>
Cc: Nick Piggin <npiggin@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>
Date: Monday, March 3, 2008 - 10:15 am

This reminds me a previous attempt of removing SLAB_HWCACHE_ALIGN

http://kernel.org/pub/linux/kernel/people/christoph/patch-archive/2007/2.6.21-rc6/remo...

At that time Christoph didnt took into account the CONFIG_SMP thing 
(false sharing avoidance), but also that L1_CACHE_SIZE is a compile 
constant, that can differs with cache_line_size()




--
To: Eric Dumazet <dada1@...>
Cc: Pekka Enberg <penberg@...>, Nick Piggin <npiggin@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>
Date: Monday, March 3, 2008 - 3:10 pm

If cache_line_size() is universally available then we can specify it as 
the alignment parameter to kmem_cache_create(). That would allow removal 
of SLAB_HWCACHE_ALIGN.

--
To: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Christoph Lameter <clameter@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:35 am

Introduce SLAB_SMP_ALIGN, for allocations where false sharing must be
minimised on SMP systems.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
---
Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h
+++ linux-2.6/include/linux/slab.h
@@ -22,7 +22,8 @@
 #define SLAB_RED_ZONE		0x00000400UL	/* DEBUG: Red zone objs in a cache */
 #define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
-#define SLAB_CACHE_DMA		0x00004000UL	/* Use GFP_DMA memory */
+#define SLAB_SMP_ALIGN		0x00004000UL	/* Align on cachelines for SMP*/
+#define SLAB_CACHE_DMA		0x00008000UL	/* Use GFP_DMA memory */
 #define SLAB_STORE_USER		0x00010000UL	/* DEBUG: Store the last owner for bug hunting */
 #define SLAB_PANIC		0x00040000UL	/* Panic if kmem_cache_create() fails */
 #define SLAB_DESTROY_BY_RCU	0x00080000UL	/* Defer freeing slabs to RCU */
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -174,13 +174,13 @@
 /* Legal flag mask for kmem_cache_create(). */
 #if DEBUG
 # define CREATE_MASK	(SLAB_RED_ZONE | \
-			 SLAB_POISON | SLAB_HWCACHE_ALIGN | \
+			 SLAB_POISON | SLAB_HWCACHE_ALIGN | SLAB_SMP_ALIGN | \
 			 SLAB_CACHE_DMA | \
 			 SLAB_STORE_USER | \
 			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #else
-# define CREATE_MASK	(SLAB_HWCACHE_ALIGN | \
+# define CREATE_MASK	(SLAB_HWCACHE_ALIGN | SLAB_SMP_ALIGN | \
 			 SLAB_CACHE_DMA | \
 			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
@@ -2245,6 +2245,9 @@ kmem_cache_create (const char *name, siz
 		ralign = BYTES_PER_WORD;
 	}
 
+	if ((flags &amp; SLAB_SMP_ALIGN) &amp;&amp; num_possible_cpus() &gt; 1)
+		ralign = max_t(unsigned long, ralign, cache_line_size());
+
 	/*
 	 * Redzoning and user stor...
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 3:06 pm

Mandatory alignment are specified as a parameter to kmem_cache_create not 
as flags. Ycan specify the alignment as a parameter. i.e. L1_CACHE_BYTES 
or cache_line_size().

Maybe we should drop SLAB_HWCACHE_ALIGN because of the confusion is 
creates because it seems that flags can be used to enforce alignments?


--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:03 pm

We don't want either of those, though.

I don't see why this is a problem to do inside slab.
--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:09 pm

I am not sure why you need to have two ways of specifying alignments.
 
--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:12 pm

I want a way to ask for SMP friendly allocation.
--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:17 pm

Then align the objects at cacheline boundaries by providing a value for 
the align parameter to kmem_cache_create().
 
--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:24 pm

max(num_possible_cpus() &gt; 1 ? cache_line_size() : 0, mandatory_alignment)?

Then suppose we want a CONFIG_TINY option to eliminate it?

max(!CONFIG_TINY &amp;&amp; num_possible_cpus() &gt; 1 ? cache_line_size() : 0, mandatory_alignment)

And maybe the VSMP guys will want to blow this out to their internode
alignment?

max(!CONFIG_TINY &amp;&amp; num_possible_cpus() &gt; 1 ? (is_vsmp ? internode_alignemnt : cache_line_size()) : 0, mandatory_alignment)


--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:31 pm

The mandatory alignment is applied anyways. You do not need to max() on 
that.

One could simply specify cache_line_size() with Pekka's patch. 

No slab allocator supports that right now. However, SLOB in the past has 
ignored the alignment in order to reduce memory use. So maybe Matt wants 

No the slab allocators were optimized for VSMP so that the 
internode_alignment is not necessary there. That was actually one of the 
requirements that triggered the slab numa work.

--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Friday, March 7, 2008 - 12:37 am

BTW. can you explain this incredible claim that the slab allocators
do not need internode_alignment to avoid false sharing of allocated
objects on VSMP? I don't understand how that would work at all.

--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Friday, March 7, 2008 - 1:11 am

The queues are per node which means that pages from which we allocate are 
distinct per node. As long as processes allocate and free object on a 
single node all is fine. Problems can arise though if objects are used 
across node. The earlier incarnation of SLAB had a single global queue for 
slab pages and objects could be allocated from an arbitrary processor. 
Objects from the same page were allocated from all processors.

--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Friday, March 7, 2008 - 1:19 am

This does not solve the problem. Say if you have 2 objects allocated
from a given slab and you want to avoid cacheline contention between
them, then you have to always ensure they don't overlap on the same
cacheline. The fact this happens naturally when you allocate 2 objects
from 2 different nodes doesn't really help: the same single CPU can
allocate objects that you want to avoid false sharing with. Consider
a task struct for example: if you start up a whole lot of worker threads
from a single CPU, then you will allocate them all from that one CPU.
Then they are spread over all CPUs and you get cacheline contention.

Which is why VSMP would legitimately want to use internode alignment
on some structures. And internode alignment is total overkill on any
other type of system, so you can't go around annotating all your
structures with it and hope KMEM_CACHE does the right thing.

--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Friday, March 7, 2008 - 1:26 am

Internode alignment is 4k. If you need an object aligned like that then it 
may be better to go to the page allocator.



--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Tuesday, March 11, 2008 - 3:13 am

I never got a reply about this. Again: how can the modern slab allocator
avoid the need for internode alignment in order to prevent cacheline
pingpong on vsmp? Because I'm going to resubmit the SMP_ALIGN patch.

Also, the last I heard about the HWCACHE_ALIGN from you is that it didn't
make sense... I see it's merged now, which I appreciate, but I think it is
really important that we're on the same page here, so let me know if it
still doesn't make sense.

Thanks,
Nick

--
To: Nick Piggin <npiggin@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Wednesday, March 12, 2008 - 2:21 am

You quoted my answer. Use the page allocator to allocate data that has an 

I still think that the semantics are weird because it only works sometimes 
and then performs an alignment within a cacheline that improves the 
situation somewhat in some cases but does not give the user what he 
expected.
--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Friday, March 7, 2008 - 1:37 am

But you don't need an object aligned like that unless you are VSMP.
--
To: Christoph Lameter <clameter@...>
Cc: <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Tuesday, March 4, 2008 - 8:16 pm

What do you mean internode_alignment is not necessary there? This is
about memory returned by the allocator to the caller, and the caller
does not want any false sharing of the cacheline on SMP systems. How
can internode alignemnt be not necessary?

--
To: Nick Piggin <npiggin@...>
Cc: Christoph Lameter <clameter@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 4:41 pm

Hmm... Can't we just fix SLAB_HWCACHE_ALIGN in SLUB to follow the
semantics of SLAB?
--
To: Pekka Enberg <penberg@...>
Cc: Nick Piggin <npiggin@...>, <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, <yanmin_zhang@...>, David Miller <davem@...>, Eric Dumazet <dada1@...>
Date: Monday, March 3, 2008 - 5:23 pm

AFAICT it follows SLAB semantics. The only small difference is for objects 
small than cache_line_size() / 2 where SLUB does not bother to align to a 
fraction of a cacheline since we are already placing multile object into a 
cacheline. We effectively have made the decision to give up the 
organization of objects in separatate cache lines.

Lets say you have a 64 byte cache line size. Then the alignment can be 
as follows. (8 byte alignment is the basic alignment requirement).

Objsize	[C	SLAB	SLUB
33 .. 64	64	64
32		32	32
24		32	24	-&gt; 3 object per cacheline sizes = 72 so overlap.
16		16	16
8		8	8

So there is only one difference for 24 byte sizes slabs.


--
Previous thread: Re: [PATCH] V4L/DVB: tuner and radio addresses are missing for the PixelView PlayTV card by Wojciech Migda on Monday, March 3, 2008 - 4:35 am. (1 message)

Next thread: Re: [PATCH] ACPI: Unneccessary to scan the PCI bus already scanned. by Yinghai Lu on Monday, March 3, 2008 - 6:01 am. (10 messages)
speck-geostationary