> Reduce rq->lock contention on try_to_wake_up() by changing the task
> state using a cmpxchg loop.
>
> Once the task is set to TASK_WAKING we're guaranteed to be the only one
> poking at it; we then proceed to pick a new cpu without holding the
> rq->lock (XXX this opens some races).
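Here is a minimal user-space sketch of that claim step. It assumes C11 <stdatomic.h> as a stand-in for the kernel's cmpxchg(). The struct task layout, the state values and try_claim_wakeup() are hypothetical names for illustration, not the patch's code. Whoever's compare-and-swap moves the state from a matching sleeping state to TASK_WAKING wins. Every other waker sees a non-matching state and backs off.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values for this sketch; not the kernel's definitions. */
enum {
    TASK_RUNNING         = 0x0,
    TASK_INTERRUPTIBLE   = 0x1,
    TASK_UNINTERRUPTIBLE = 0x2,
    TASK_WAKING          = 0x100,
};

/* Hypothetical, stripped-down task: only the state word matters here. */
struct task {
    _Atomic unsigned int state;
};

/*
 * Try to claim the wakeup: move the task from a sleeping state matching
 * 'mask' to TASK_WAKING. Returns true if this caller won the race, false
 * if the task was not in a matching state (already running, or another
 * waker already moved it to TASK_WAKING).
 */
static bool try_claim_wakeup(struct task *p, unsigned int mask)
{
    unsigned int old = atomic_load(&p->state);

    for (;;) {
        if (!(old & mask))
            return false;   /* not sleeping in a state we may wake */
        /* On failure 'old' is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&p->state, &old, TASK_WAKING))
            return true;    /* we are now the only one poking at p */
    }
}
```

The weak variant may fail spuriously, which is harmless inside a retry loop and can be cheaper on LL/SC architectures.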
>
> Then, instead of locking the remote rq and activating the task, place
> the task on a remote queue, again using cmpxchg, and notify the remote
> cpu via an IPI if this queue was empty, so that it starts processing its
> wakeups.
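The remote queue can be sketched as a lock-free singly linked list. This again assumes C11 atomics in place of the kernel's cmpxchg()/xchg(). struct wake_node, struct wake_list, wake_list_push() and wake_list_take_all() are hypothetical names. Producers push with a cmpxchg loop and learn whether the list was empty, which is the only case that needs an IPI. The target cpu then detaches the whole list with one atomic exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wake-list node; in the patch the task itself is what gets queued. */
struct wake_node {
    struct wake_node *next;
    int id;                 /* payload for the sketch */
};

/* Hypothetical per-cpu pending-wakeup list. */
struct wake_list {
    _Atomic(struct wake_node *) head;
};

/*
 * Push 'n' with a cmpxchg loop. Returns true if the list was empty
 * beforehand, meaning the caller should send the IPI. If it was
 * non-empty, whoever pushed onto the empty list already sent one and
 * the target has not drained the list since.
 */
static bool wake_list_push(struct wake_list *l, struct wake_node *n)
{
    struct wake_node *old = atomic_load(&l->head);

    do {
        n->next = old;      /* link to the head we expect to replace */
    } while (!atomic_compare_exchange_weak(&l->head, &old, n));

    return old == NULL;
}

/*
 * Run on the target cpu (e.g. from its IPI handler): detach every pending
 * entry in one exchange. The caller walks the result via ->next; entries
 * come back newest-first.
 */
static struct wake_node *wake_list_take_all(struct wake_list *l)
{
    return atomic_exchange(&l->head, NULL);
}
```

Because the consumer always takes the entire list at once rather than popping single entries, this push/take-all pairing avoids the ABA problem a pop-one-at-a-time stack would have. A consumer that cares about wakeup order would need to reverse the detached list.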
>
> In most cases this avoids having to lock the remote runqueue (and with
> it the exclusive cacheline transfer that implies), and it also avoids
> touching all the remote runqueue data structures needed for the actual
> activation.
>
> As measured using:
> http://oss.oracle.com/~mason/sembench.c
>
> $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
> $ ./sembench -t 2048 -w 1900 -o 0
>
> unpatched: run time 30 seconds 537953 worker burns per second
> patched: run time 30 seconds 657336 worker burns per second
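For scale, the relative improvement follows directly from the two reported figures. This trivial helper just spells out the arithmetic. pct_change() is a hypothetical name, not something from the patch or from sembench.

```c
/* Relative change, in percent, from 'before' to 'after'. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

pct_change(537953.0, 657336.0) evaluates to about 22.2. That is roughly a 22% increase in worker burns per second for these two reported runs.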
>
> Still need to sort out all the races marked XXX (non-trivial), and it's
> x86 only for the moment.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> arch/x86/kernel/smp.c | 1
> include/linux/sched.h | 7 -
> kernel/sched.c | 241 ++++++++++++++++++++++++++++++++++--------------
> kernel/sched_fair.c | 5
> kernel/sched_features.h | 3
> kernel/sched_idletask.c | 2
> kernel/sched_rt.c | 4
> kernel/sched_stoptask.c | 3
> 8 files changed, 190 insertions(+), 76 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/smp.c
> +++ linux-2.6/arch/x86/kernel/smp.c
> @@ -205,6 +205,7 @@ void smp_reschedule_interrupt(struct pt_
> /*
> * KVM uses this interrupt to force a cpu out of guest mode
> */
> + sched_ttwu_pending();
> }