Re: [REV2: PATCH 1/2]: workqueue: Implement the kernel API

Previous thread: Re: [kernel-abi] symbol versioning vs. out-of-tree modules. by Alan Young on Sunday, September 28, 2008 - 11:51 pm. (2 messages)

Next thread: [PATCH] markers: fix unregister bug and reenter bug by Lai Jiangshan on Monday, September 29, 2008 - 1:00 am. (23 messages)
From: Krishna Kumar
Date: Monday, September 29, 2008 - 12:03 am

From: Krishna Kumar <krkumar2@in.ibm.com>

Implement two API's for quickly updating delayed works:
	int schedule_update_delayed_work(struct delayed_work *dwork,
					  unsigned long delay);
	int queue_update_delayed_work(struct workqueue_struct *wq,
				       struct delayed_work *dwork,
				       unsigned long delay);

I tested the following combinations, after Oleg suggested some improvements:
	A. - Original submission.
		1 CPU -> Org: 474213	New: 55365	Time saved: 88.3%
		4 CPU -> Org: 3650631	New: 225160	Time saved: 93.8%
	B1. - Oleg's suggestion to avoid costly __cancel_work_timer.
		1 CPU -> Org: 474213	New: 233714	Time saved: 50.7%
		4 CPU -> Org: 3650631	New: 959455	Time saved: 73.7%
	B2. - B1 + check for same expiry.
		1 CPU -> Org: 474213	New: 72276	Time saved: 84.8%
		4 CPU -> Org: 3650631	New: 301510	Time saved: 91.7%
	C. - Merge Oleg's idea with code A.
		1 CPU -> Org: 474213	New: 55160	Time saved: 88.4%
		4 CPU -> Org: 3650631	New: 215369	Time saved: 94.1%

Merged code seems to perform the best, though the difference with the original
submission is only marginal. Code snippets below with comments removed:

A: Original submission:
-----------------------
void queue_update_delayed_work(struct workqueue_struct *wq,
			       struct delayed_work *dwork, unsigned long delay)
{
	struct work_struct *work = &dwork->work;

	if (likely(test_and_set_bit(WORK_STRUCT_PENDING,
				    work_data_bits(work)))) {
		struct timer_list *timer = &dwork->timer;

		if (jiffies + delay == timer->expires)
			return;

		__cancel_work_timer_internal(work, timer);
	}

	__queue_delayed_work(-1, dwork, work, wq, delay);
}

B1: Oleg suggestion:
------------------------
(I optimized a bit as parallel updates are not important, only one needs to
succeed)

int queue_update_delayed_work(struct workqueue_struct *wq,
			      struct delayed_work *dwork, unsigned long delay)
{
	unsigned long when = jiffies + delay;

	if (queue_delayed_work(wq, dwork, delay))
		return ...
From: Krishna Kumar
Date: Monday, September 29, 2008 - 12:03 am

[Empty message]
From: Oleg Nesterov
Date: Monday, September 29, 2008 - 7:27 am

Please note that the above is the open-coded queue_delayed_work().
I'd suggest to just start with

	if (queue_delayed_work(...))
		return 1;

	... slow path ...


I can't understand why do you want to optimize this very unlikely case.
Afaics, it can only improve the benchmark, when we are doing
queue_update_delayed_work() in a loop with the same timeout, no?

But more importantly, this is not right. We can not trust timer->expires.
For example, let's suppose we are doing

	queue_delayed_work(dwork, delay);
	cancel_delayed_work_sync(dwork);	// does not clear ->expires
	queue_work(&dwork->work);		// the same

Now, queue_update_delayed_work(new_delay) sees the pending dwork, and
it is possible that timer->expires == jiffies + new_delay.

Note also that INIT_DELAYED_WORK() does not initialize ->expires. Now,
again, if we do queue_work(&dwork->work) and then update(), we may have
problems.


It is a bit silly we are checking "ret < 0" twice, I'd suggest to just
kill "int ret" and do

	for (;;) {
		...
		if (try_to_grab_pending(work) >= 0)
			break;
		wait_on_work(work);
	}

But this needs a comment. Why wait_on_work() is needed? "ret < 0" means that
the queueing is in progress, and it is not necessary to "flush" this work.

Note!!! queue_update_delayed_work() is _not_ equal to cancel() + queue()
anyway, the fastpath doesn't cancel the work if it is active (I mean,
it is ->current_work and running).

But yes, we do need wait_on_work(). Let's suppose that some rt thread
does queue_update_delayed_work() and preempts work->func() which tries
to re-queue itself, right after it sets WORK_STRUCT_PENDING. This is
livelockable.

But: this also means that 2 concurrent queue_update_delayed_work()s can
livelock in the same manner, perhaps this deserves a note.


I am not really sure it is worth to play with WORK_STRUCT_PENDING,
the simple

	int requeue_delayed_work(...)
	{
		int ret = 1;
	
		while (queue_delayed_work(...)) {
			ret = 0;
			if ...
From: Krishna Kumar2
Date: Tuesday, September 30, 2008 - 4:08 am

Oleg Nesterov <oleg@tv-sign.ru> wrote on 09/29/2008 07:57:34 PM:



The reason I had coded this way was to avoid an unnecessary call -
unnecessary
since the update function should be usually called to update a work and
hence
the work is almost always pending. But I will make this change as it is
more


work.

I had tried to capture this information in the comment above, maybe it
needs

Correct. But the behavior is the same as the existing (say driver) code
which
calls cancel (which also doesn't cancel as the work is already running) and
then calls queue. In both cases, the work is running and we wait for the
work

Assuming the work is already pending (which is why both calls are in the
middle
of this new API):

1. Timer is pending (work is not executing) - update_timer will break out
for
   both calls in any order.
2. Timer is not pending due to timer not yet been added - we loop until
either
   update_timer or try_to_grab_pending() breaks out, which happens when the
   timer is added or fired (if delay=0) and is almost instantaneous.
3. If timer is not pending due to timer handler already having got called -
   try_to_grab_pending returns 0 when run_workqueue clears the PENDING bit.
   This also breaks from the loop before work is finished executing (since
the

I am not insisting either way since both approaches achieve a big
improvement in
time saved. I am open to which way to go, please suggest. Here is the
updated
function. If you feel it is too complicated, I will go with the above
approach.

int queue_update_delayed_work(struct workqueue_struct *wq,
                              struct delayed_work *dwork, unsigned long delay)
{
      int ret = 1;

      if (!queue_delayed_work(wq, dwork, delay)) {
            struct work_struct *work = &dwork->work;
            struct timer_list *timer = &dwork->timer;

            ret = 0;
            for (;;) {
                  unsigned long when = jiffies + delay;

                  if ...
From: Oleg Nesterov
Date: Tuesday, September 30, 2008 - 6:15 am

Yes I see. But actually I meant "see below, I don't think we need

Exactly! And in that case update_timer_expiry() will do its work.

Yes, the games with __queue_delayed_work() can save a couple of clear/set
bit, but this is the slow an unlikely path, that was my point.

And, if we are talking about the function call overhead. Please note
that __queue_delayed_work() adds the "unnecessary" call to the "normal"
queue_delayed_work(), and this path is more common.


Hmm. How so? cancel() always cancells the work.

OK, let's suppose we have

	cancel_delayed_work(dwork);
	queue_delayed_work(dwork, delay);

and now (the next patch you sent) we turn this code into

	queue_update_delayed_work(dwork, delay);

Now we have a subtle difference which _may_ be important for some users.

Suppose that this dwork was queued before, the timer has expired, and
now cwq->thread runs dwork->work->func(). Note that this dwork is not
pending (WORK_STRUCT_PENDING is cleared).

Before the change above, cancel_delayed_work() waits until work->func()
completes, then we re-queue the work.

After the change, the fast path just queues the same dwork again.
If the workqueue is not singlethreaded, it is possible that both
old and new "instances" will run on the different CPUs at the same
time.


No.

dwork is not pending, L is a lower priority thread, H - real time.

L calls queue_update_delayed_work(), and sets WORK_STRUCT_PENDING
successfully(). It is going to do __queue_delayed_work(), but it
is preempted by H.

H does queue_update_delayed_work(), WORK_STRUCT_PENDING is already
set and the timer is not pending. Now it will do try_to_grab_pending()

Yes. I must admit, I prefer the simple, non-intrusive code I suggested
much more.

Once again, the slow path is (at least, supposed to be) unlikely, and
the difference is not that large. (I mean the slow path is when both
queue() and update_timer() fail).

Should we complicate the code to add this minor optimization (and
btw pessimize ...
From: Krishna Kumar
Date: Monday, September 29, 2008 - 12:03 am

From: Krishna Kumar <krkumar2@in.ibm.com>

Modify some users to use the new API.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/wireless/ipw2100.c       |   12 ++++++------
 drivers/net/wireless/libertas/scan.c |    5 ++---
 drivers/net/wireless/libertas/wext.c |    7 +++----
 fs/afs/vlocation.c                   |   11 +++--------
 fs/ocfs2/cluster/heartbeat.c         |    5 ++---
 5 files changed, 16 insertions(+), 24 deletions(-)

diff -ruNp 2.6.27-rc7-org/drivers/net/wireless/ipw2100.c 2.6.27-rc7-new/drivers/net/wireless/ipw2100.c
--- 2.6.27-rc7-org/drivers/net/wireless/ipw2100.c	2008-09-22 03:59:55.000000000 +0530
+++ 2.6.27-rc7-new/drivers/net/wireless/ipw2100.c	2008-09-29 12:06:55.000000000 +0530
@@ -2085,9 +2085,8 @@ static void isr_indicate_rf_kill(struct 
 
 	/* Make sure the RF Kill check timer is running */
 	priv->stop_rf_kill = 0;
-	cancel_delayed_work(&priv->rf_kill);
-	queue_delayed_work(priv->workqueue, &priv->rf_kill,
-			   round_jiffies_relative(HZ));
+	queue_update_delayed_work(priv->workqueue, &priv->rf_kill,
+				  round_jiffies_relative(HZ));
 }
 
 static void send_scan_event(void *data)
@@ -4241,9 +4240,10 @@ static int ipw_radio_kill_sw(struct ipw2
 					  "disabled by HW switch\n");
 			/* Make sure the RF_KILL check timer is running */
 			priv->stop_rf_kill = 0;
-			cancel_delayed_work(&priv->rf_kill);
-			queue_delayed_work(priv->workqueue, &priv->rf_kill,
-					   round_jiffies_relative(HZ));
+
+			queue_update_delayed_work(priv->workqueue,
+						  &priv->rf_kill,
+						  round_jiffies_relative(HZ));
 		} else
 			schedule_reset(priv);
 	}
diff -ruNp 2.6.27-rc7-org/drivers/net/wireless/libertas/scan.c 2.6.27-rc7-new/drivers/net/wireless/libertas/scan.c
--- 2.6.27-rc7-org/drivers/net/wireless/libertas/scan.c	2008-09-22 03:59:55.000000000 +0530
+++ 2.6.27-rc7-new/drivers/net/wireless/libertas/scan.c	2008-09-29 12:06:55.000000000 +0530
@@ -435,9 +435,8 @@ int lbs_scan_networks(struct lbs_private
 ...
From: Oleg Nesterov
Date: Monday, September 29, 2008 - 7:45 am

Looks good, but I don't understand this code.

Krishna, I'd suggest you to send each change in a separate patch with
the maintainer cc'ed.

And please don't forget to mention that update() != cancel() + queue().
Otherwise maintainer can miss this fact.

Oleg.

--

Previous thread: Re: [kernel-abi] symbol versioning vs. out-of-tree modules. by Alan Young on Sunday, September 28, 2008 - 11:51 pm. (2 messages)

Next thread: [PATCH] markers: fix unregister bug and reenter bug by Lai Jiangshan on Monday, September 29, 2008 - 1:00 am. (23 messages)