[PATCH 2/2] ipc semaphores: order wakeups based on waiter CPU

Previous thread: [PATCH][v2] Changes to more io-controller stats patches to address review comments. by Divyesh Shah on Monday, April 12, 2010 - 11:41 am. (5 messages)

Next thread: [PATCH RFC] Optimize semtimedop by Chris Mason on Monday, April 12, 2010 - 11:49 am. (23 messages)
From: Chris Mason
Date: Monday, April 12, 2010 - 11:49 am

When IPC semaphores are used in a bulk post and wait system, we
can end up waking a very large number of processes per semtimedop call.
At least one major database will use a single process to kick hundreds
of other processes at a time.

This patch tries to reduce the runqueue lock contention by ordering the
wakeups based on the CPU the waiting process was on when it went to
sleep.

A later patch could add some code in the scheduler to help
wake these up in bulk and take the various runqueue locks less often.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
---
 include/linux/sem.h |    1 +
 ipc/sem.c           |   37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index 8b97b51..4a37319 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -104,6 +104,7 @@ struct sem_array {
 struct sem_queue {
 	struct list_head	list;	 /* queue of pending operations */
 	struct task_struct	*sleeper; /* this process */
+	unsigned long		sleep_cpu;
 	struct sem_undo		*undo;	 /* undo structure */
 	int    			pid;	 /* process id of requesting process */
 	int    			status;	 /* completion status of operation */
diff --git a/ipc/sem.c b/ipc/sem.c
index 335cd35..07fe1d5 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -544,6 +544,25 @@ static void wake_up_sem_queue(struct sem_queue *q, int error)
 	preempt_enable();
 }
 
+/*
+ * sorting helper for struct sem_queues in a list.  This is used to
+ * sort by the CPU they are likely to be on when waking them.
+ */
+int list_comp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct sem_queue *qa;
+	struct sem_queue *qb;
+
+	qa = list_entry(a, struct sem_queue, list);
+	qb = list_entry(b, struct sem_queue, list);
+
+	if (qa->sleep_cpu < qb->sleep_cpu)
+		return -1;
+	if (qa->sleep_cpu > qb->sleep_cpu)
+		return 1;
+	return 0;
+}
+
 /**
  * update_queue(sma, semnum): Look for tasks that can be completed.
  * @sma: ...
From: Manfred Spraul
Date: Saturday, April 17, 2010 - 3:24 am

Hi Chris,

What about moving this step much later?

There is no need to hold any locks for the actual wake_up_process().

I've updated my patch:
- improved update_queue that guarantees no O(N^2) for your workload.
- move the actual wake-up after dropping all locks
- optimize setting sem_otime
- cacheline align the ipc spinlock.

But the odd thing:
It doesn't improve the sembench result at all (AMD Phenom X4)
The only thing that is reduced is the system time:
 From ~1 min system time for "sembench -t 250 -w 250 -r 30 -o 0" to ~30 sec.

cpu binding the sembench threads results in an improvement of ~50% - at 
the cost of a significant increase of the system time (from 30 seconds 
to 1 min) and the user time (from 2 seconds to 14 seconds).

Are you sure that the problem is contention on the semaphore array spinlock?
With the above changes, the code that is under the spin_lock is very short.
Especially:
- Why does optimizing ipc/sem.c only reduce the system time [reported by 
time] and not the sembench output?
- Why is there no improvement from the ___cache_line_align?
If there would be  contention, then there should be trashing from 
accessing the lock and writing sem_otime and reading sem_base.
- Additionally: you wrote that reducing the array size does not help much.
But: The arrays are 100% independant, the ipc code scales linearly.
Spreading the work over multiple spinlocks is - like cache line aligning 
- usually a 100% guaranteed improvement if there is contention.

I've attached a modified sembench.c and the proposal for ipc/sem.c
Could you try it?
What do you think?
How many cores do you have in your test system?

--
     Manfred
Previous thread: [PATCH][v2] Changes to more io-controller stats patches to address review comments. by Divyesh Shah on Monday, April 12, 2010 - 11:41 am. (5 messages)

Next thread: [PATCH RFC] Optimize semtimedop by Chris Mason on Monday, April 12, 2010 - 11:49 am. (23 messages)