> Don Porter wrote:
> > From: Donald E. Porter <porterde@cs.utexas.edu>
> >
> > In the bulk page allocation/free routines in mm/page_alloc.c, the zone
> > lock is held across all iterations. For certain parallel workloads, I
> > have found that releasing and reacquiring the lock for each iteration
> > yields better performance, especially at higher CPU counts. For
> > instance, kernel compilation is sped up by 5% on an 8 CPU test
> > machine. In most cases, there is no significant effect on performance
> > (although the effect tends to be slightly positive). This seems quite
> > reasonable for the very small scope of the change.
> >
> > My intuition is that this patch prevents smaller requests from waiting
> > on larger ones. While grabbing and releasing the lock within the loop
> > adds a few instructions, it can lower the latency for a particular
> > thread's allocation which is often on the thread's critical path.
> > Lowering the average latency for allocation can increase system throughput.
> >
> > More detailed information, including data from the tests I ran to
> > validate this change are available at
> >
http://www.cs.utexas.edu/~porterde/kernel-patch.html .
> >
> > Thanks in advance for your consideration and feedback.
>
> That's an interesting insight. My intuition is that Nick Piggin's
> recently-posted ticket spinlocks patches[1] will reduce the need for this patch,
> though it may be useful to have both. Can you benchmark again with only ticket
> spinlocks, and with ticket spinlocks + this patch? You'll probably want to use
> 2.6.24-rc1 as your baseline, due to the x86 architecture merge.