On Sun, 2007-06-24 at 09:40 -0700, Linus Torvalds wrote:
It was.
Hehe, it is the simplest thing I could come up with. It is
deterministic, fast and has a nice physical model :-)
No we don't (well, on 64bit arches we do). I actually only use unsigned
long, and even cast whatever comes out of the percpu_counter thing to
unsigned long.
That is only needed on 64bit arches, and even there, actually
encountering such large values will be rare at best.
Also, this re-normalisation event that uses the call is low frequency.
That is, that part will be used once every ~ total_dirty_limit/nr_bdis
written out.
Agreed.
It all just stems from Andrew asking if I could please re-use something
instread of duplication a lot of things. I picked percpu_counter because
that was the closest to what was needed. An unsigned long based per-cpu
counter would suit better.
There is another problem I have with this percpu_counter, it is rather
space hungry. It does a node affine sizeof(s32) kalloc on each cpu.
Which will end up using the smallest slab, and that is quite a bit
bigger than needed. But should be about the size of a cacheline
(otherwise we might still end up with false sharing).
I've been thinking of extending this per cpu allocator thing a bit to be
a little smarter about these things. What would be needed is a strict
per-cpu slab allocator. The current ones are node affine, which can
still cause false sharing (unless - as should be the case - these
objects are both cacheline aligned and of cacheline size). When we have
that, we can start using smaller objects.
-