Re: [RFC] sched: implement the exclusive wait queue as a LIFO queue

Previous thread: [PATCH 1/2 v2] MMC:Add support MMCIF for SuperH by Yusuke Goda on Tuesday, April 27, 2010 - 10:06 pm. (3 messages)

Next thread: linux-next: manual merge of the staging-next tree with the pcmcia tree by Stephen Rothwell on Tuesday, April 27, 2010 - 11:07 pm. (2 messages)
From: Changli Gao
Date: Tuesday, April 27, 2010 - 10:03 pm

implement the exclusive wait queue as a LIFO queue

If the exclusive wait queue is also a LIFO queue as the normal wait queue, the
process who goes to sleep recently, will be woke up first. As its memory is
more likely in cache, we will get better performance. And when there are many
processes waiting on a exclusive wait queue, some of them may not be woke up,
if the others can handle the workload, and it will reduce the load of
the scheduler.

Note: before applying this patch, you need my previous patch patched first.
https://patchwork.kernel.org/patch/95600/

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 fs/eventpoll.c       |    3 +--
 include/linux/wait.h |   17 +++++++----------
 kernel/sched.c       |    8 ++++----
 kernel/wait.c        |    9 +++------
 4 files changed, 15 insertions(+), 22 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index bd056a5..e9b3ebe 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1140,8 +1140,7 @@ retry:
 		 * ep_poll_callback() when events will become available.
 		 */
 		init_waitqueue_entry(&wait, current);
-		wait.flags |= WQ_FLAG_EXCLUSIVE;
-		__add_wait_queue(&ep->wq, &wait);
+		__add_wait_queue_ex(&ep->wq, &wait);
 
 		for (;;) {
 			/*
diff --git a/include/linux/wait.h b/include/linux/wait.h
index a48e16b..95c127d 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -30,8 +30,6 @@ typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, v
 int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key);
 
 struct __wait_queue {
-	unsigned int flags;
-#define WQ_FLAG_EXCLUSIVE	0x01
 	void *private;
 	wait_queue_func_t func;
 	struct list_head task_list;
@@ -50,6 +48,7 @@ struct wait_bit_queue {
 struct __wait_queue_head {
 	spinlock_t lock;
 	struct list_head task_list;
+	struct list_head task_list_ex;
 };
 typedef struct __wait_queue_head wait_queue_head_t;
 
@@ -69,7 +68,8 @@ struct task_struct;
 
 #define ...
From: Changli Gao
Date: Tuesday, April 27, 2010 - 11:22 pm

I am sorry. the later task_list should be task_list_ex. The updated
patch is attached. And I will replace task_list list with hlist for
saving space.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
From: Changli Gao
Date: Wednesday, April 28, 2010 - 1:05 am

This version replaces list with hlist.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
From: Xiaotian Feng
Date: Wednesday, April 28, 2010 - 12:47 am

From: Changli Gao
Date: Wednesday, April 28, 2010 - 12:52 am

Starve? Oh, No. If we don't need these processes, and we can do better
without them, why we wake them up?


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Yong Zhang
Date: Wednesday, April 28, 2010 - 1:15 am

So some processs(at the tail of exclusive list)will be treated abnormally
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 1:23 am

If the work is less than the workers, we don't need the workers at the

If there isn't enough work to be done, we'd better not disrupt them
and  leave them sleeping forever to keep the scheduler happier. Do we
have reason to keep fair to all the workers? Does it have benefit?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Johannes Weiner
Date: Wednesday, April 28, 2010 - 2:25 am

How about starving lock contenders?  See wait_on_bit_lock() and grep
for the users e.g.
--

From: David Howells
Date: Wednesday, April 28, 2010 - 2:29 am

You've made one important assumption: the processes on the wait queue are
sleeping waiting to service things... but what if the wait queue governs
access to a resource, and all the processes on that wait queue need access to
that resource to do things?  Some of the processes waiting for it may never
get a go, and so necessary work may be left undone.

So NACK.

David
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 4:17 am

You are right. I made the wrong assumption. But we indeed need some
primitive to add wait_queue at the head of the wait_queue_head, and I
know epoll needs it, at least.

fs/eventpoll.c: 1443.
                wait.flags |= WQ_FLAG_EXCLUSIVE;
                __add_wait_queue(&ep->wq, &wait);

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Jamie Lokier
Date: Wednesday, April 28, 2010 - 6:21 am

The same thing about assumptions applies here.  The userspace process
may be waiting for an epoll condition to get access to a resource,
rather than being a worker thread interchangeable with others.

For example, userspace might be using a pipe as a signal-safe lock, or
signal-safe multi-token semaphore, and epoll to wait for that pipe.

WQ_FLAG_EXCLUSIVE means there is no point waking all tasks, to avoid a
pointless thundering herd.  It doesn't mean unfairness is ok.

The LIFO idea _might_ make sense for interchangeable worker-thread
situations - including userspace.  It would make sense for pipe
waiters, socket waiters (especially accept), etc.

Do you have any measurements which showing the LIFO mode performing
better than FIFO, and by how much?

-- Jamie
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 6:42 am

Oh, the lines above are the current ones. So the assumptions applies

The users should not make any assumption about the waking up sequence,


I didn't do any test yet. But some work done by LSE project years ago
showed that it is better.

http://lse.sourceforge.net/io/aionotes.txt

" Also in view of
better cache utilization the wake queue mechanism is LIFO by default.
(A new exclusive LIFO wakeup option has been introduced for this purpose)"

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Jamie Lokier
Date: Wednesday, April 28, 2010 - 8:25 am

No, because WQ_FLAG_EXCLUSIVE doesn't have your LIFO semantic at the moment.

Your patch changes the behaviour of epoll, though I don't know if it
matters.  Perhaps all programs which have multiple tasks waiting on

Correct, but they should be able to assume non-starvation (eventual
progress) for all waiters.

It's one of those subtle things, possibly a unixy thing: Non-RT tasks
should always make progress when the competition is just other non-RT
tasks, even if the progress is slow.

Starvation can spread out beyond the starved process, to cause
priority inversions in other tasks that are waiting on a resource
locked by the starved process.  Among other things, that can cause
higher priority tasks, and RT priority tasks, to block permanently.

Occasionally unix socketpairs are occasionally used in the above ways too.

I'm not against your patch, but I worry that starvation is a new
semantic, and it may have a significant effect on something - either

I suspect it's possible to combine LIFO-ish and FIFO-ish queuing to
prevent starvation while getting some of the locality benefit.
Something like add-LIFO and increment a small counter in the next wait
entry, but never add in front of an entry whose counter has reached
MAX_LIFO_WAITERS? :-)

-- Jamie
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 8:49 am

No. You are wrong. I meant epoll implemented LIFO on its own. You


It is a little complex, and I'll keep it simple and improve it when necessary.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Davide Libenzi
Date: Wednesday, April 28, 2010 - 11:57 am

I'm not sure one user deserves a new function, but like it has been 
noticed, the patch for that should eventually be totally isolated from 
other bits.


- Davide

From: David Howells
Date: Wednesday, April 28, 2010 - 6:21 am

You can add at either end of the wait queue with __add_wait_queue() and
__add_wait_queue_tail().

David
--

From: David Howells
Date: Wednesday, April 28, 2010 - 2:34 am

Can you split the wait queue code and differentiate exclusive wait queues with
LIFO functionality from wait queues with FIFO functionality.  I can see why
your suggestion is desirable.

David
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 6:47 am

OK. I will use two lists: one for non-exclusive wait queues, and the
other for exclusive wait queues, and I'll add new interfaces for LIFO
functionality instead of changing the current interfaces.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: David Howells
Date: Wednesday, April 28, 2010 - 2:32 am

It would be preferable it if you could avoid making struct __wait_queue_head
bigger.  That will increase the size of a lot of things.

David
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 6:56 am

I don't know how to do that, as maybe there are non-exclusive and
exclusive wait queues in the same wait queue head. If we want to
enqueue exclusive wait queues at the head of exclusive queues, we have
to know where the head is, otherwise, we have to loop to find the head
when enqueuing.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: David Howells
Date: Wednesday, April 28, 2010 - 7:06 am

I suspect you really want to have the semantics defined per-queue.  _Either_ a
queue is FIFO (such as processes waiting for a resource so they can do
something with it) _or_ it is LIFO (such as a pool of processes waiting to be
given work).

How often do the two actually mix?  And if they do, is that really an error?

David
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 7:53 am

The sock.sk_sleep is used by exclusive and non-exclusive wait queues.
exclusive and non-exclusive is identified by wait queues, not wait
queue heads. Maybe there is a historical reason. It is much like a
hack.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: David Howells
Date: Wednesday, April 28, 2010 - 8:00 am

But is it actually used in both modes at the same time?

David
--

From: Changli Gao
Date: Wednesday, April 28, 2010 - 8:33 am

It should be seldom in good applications. So I only need to add a
function, like this:

add_wait_queue_head_exclusive(wq, wait)
   first_exclusive = find_the_first_exclusive(wq); // it is fast, as
there should no or few non-exclusive wait queues
   list_add(wait, first_exclusive);


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

Previous thread: [PATCH 1/2 v2] MMC:Add support MMCIF for SuperH by Yusuke Goda on Tuesday, April 27, 2010 - 10:06 pm. (3 messages)

Next thread: linux-next: manual merge of the staging-next tree with the pcmcia tree by Stephen Rothwell on Tuesday, April 27, 2010 - 11:07 pm. (2 messages)