On Thu, Mar 06, 2008 at 02:56:42PM -0800, Christoph Lameter wrote:
That's just because you (apparently still) have a misconception about what
the flag is supposed to be for. It is not for aligning things to the start
of a cacheline boundary. It is not for avoiding false sharing on SMP. It
is for ensuring that a given object will span the fewest possible
cachelines. This can actually be important if you do anything like random
lookups or tree walks where the object contains the tree node.
Consider a 64 byte cacheline, and a 24 byte object:
cacheline |-------|-------|-------
object    |--|--|--|--|--|--|--|--
Here 2 out of every 8 objects straddle a cacheline boundary. So if you
touch 8 random objects, it is statistically likely to cost you 10 cache
misses (so long as the working set is sufficiently cold / larger than
cache that cacheline sharing is insignificant).
If you actually honour HWCACHE_ALIGN, then the same object will be padded
out to 32 bytes, so that no object straddles a cacheline boundary:
cacheline |-------|
object    |---|---|
Now 8 objects will cost 8 misses. A 20% saving. Maybe almost a 20%
performance improvement.
Before we go around in circles again, do you accept this? If yes, then
what is your argument that SLUB knows better than the caller; if no, then
why not?