Re: destroy_workqueue can livelock

To: Michal Schmidt <mschmidt@...>
Cc: <linux-kernel@...>
Date: Wednesday, July 11, 2007 - 6:26 pm

Michal Schmidt wrote:

In short: "while (flush_cpu_workqueue(cwq))" can livelock because a re-niced
caller can add a new barrier before the lower-priority cwq->thread clears
->current_work.


Yes, and the fix is very simple. In fact cleanup_workqueue_thread() doesn't
need the "while" loop. I did it that way to avoid a subtle dependency with
run_workqueue(), and because I failed to invent a good comment which explains
why it is safe to do flush_cpu_workqueue() once.

In short, if we have another barrier when flush_cpu_workqueue() returns,
cwq->thread must be "inside" run_workqueue(), which can't return until
cwq->worklist becomes empty. This means we can do kthread_stop() right now:
kthread_should_stop() won't be checked until run_workqueue() returns.
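
The argument above can be illustrated with a small single-threaded userspace
model. This is only a sketch under stated assumptions, not kernel code: every
name ending in _model, plus racing_work, late_barrier and
simulate_stop_with_pending_barrier, is hypothetical, and the real worker
thread also sleeps when idle, which the model leaves out. It shows a barrier
that is queued while the worker is draining the list still being run even
though a stop was requested in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures; all names are illustrative. */
struct cwq_model;

struct work_model {
    void (*func)(struct cwq_model *cwq);
    struct work_model *next;
};

struct cwq_model {
    struct work_model *head, *tail;  /* models cwq->worklist */
    bool should_stop;                /* models kthread_should_stop() */
    int executed;                    /* number of work items that ran */
};

static void queue_work_model(struct cwq_model *cwq, struct work_model *w)
{
    assert(w->next == NULL);         /* item must not already be queued */
    if (cwq->tail)
        cwq->tail->next = w;
    else
        cwq->head = w;
    cwq->tail = w;
}

/* Like run_workqueue(): does not return until the list is empty. */
static void run_workqueue_model(struct cwq_model *cwq)
{
    while (cwq->head) {
        struct work_model *w = cwq->head;

        cwq->head = w->next;
        if (!cwq->head)
            cwq->tail = NULL;
        w->next = NULL;
        w->func(cwq);
        cwq->executed++;
    }
}

/* Worker loop: the stop flag is only looked at between drains. */
static void worker_thread_model(struct cwq_model *cwq)
{
    while (!cwq->should_stop)
        run_workqueue_model(cwq);
}

static void barrier_func_model(struct cwq_model *cwq)
{
    (void)cwq;                       /* a barrier does no real work here */
}

static struct work_model late_barrier = { barrier_func_model, NULL };

/* While this item runs, another barrier arrives and a stop is requested. */
static void racing_work(struct cwq_model *cwq)
{
    queue_work_model(cwq, &late_barrier);
    cwq->should_stop = true;         /* models kthread_stop() "right now" */
}

/* Returns how many items ran before the worker exited. */
static int simulate_stop_with_pending_barrier(void)
{
    struct cwq_model cwq = { NULL, NULL, false, 0 };
    struct work_model first = { racing_work, NULL };

    late_barrier.next = NULL;
    queue_work_model(&cwq, &first);
    worker_thread_model(&cwq);
    return cwq.executed;
}
```

In this model simulate_stop_with_pending_barrier() returns 2: both the first
item and the late barrier run before the stop flag is examined.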

(Another option is to clear cwq->current_work in wq_barrier_func(), before
 complete(). This is possible because nobody can "see" this barrier except
 flush_cpu_workqueue()).
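
The effect of this other option can also be sketched with a deterministic
userspace model. Again this is an assumption-laden illustration rather than
the actual patch: struct cwq_model, its fields, barrier_tag and the helper
functions are all hypothetical. The model takes the worst case, in which the
flusher always re-checks the queue immediately after complete() and before
the worker gets to clear its own current_work, and it caps the number of
flushes so that it always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the per-cpu workqueue state. */
struct cwq_model {
    const void *current_work;  /* models cwq->current_work */
    int pending;               /* models a non-empty cwq->worklist */
    bool clear_in_barrier;     /* true: barrier clears current_work itself */
    int flushes;               /* how many barriers the flusher inserted */
    int max_flushes;           /* cap so the model always terminates */
};

static const char barrier_tag = 0;  /* stand-in identity of the barrier work */

/* The check flush_cpu_workqueue() makes before inserting a barrier. */
static bool cwq_looks_active(const struct cwq_model *cwq)
{
    return cwq->pending > 0 || cwq->current_work != NULL;
}

/* The worker drains earlier work, then runs the barrier function. */
static void worker_runs_barrier(struct cwq_model *cwq)
{
    cwq->pending = 0;
    cwq->current_work = &barrier_tag;  /* set before calling the function */
    if (cwq->clear_in_barrier)
        cwq->current_work = NULL;      /* the option: clear before complete() */
    /*
     * complete() would happen here. The worker's own clearing of
     * current_work comes later and is modeled as not yet having run,
     * because the higher-priority flusher preempts it.
     */
}

/* Models "while (flush_cpu_workqueue(cwq))" from the flusher's side. */
static int flush_until_idle(struct cwq_model *cwq)
{
    assert(cwq->max_flushes > 0);
    while (cwq->flushes < cwq->max_flushes && cwq_looks_active(cwq)) {
        cwq->flushes++;                /* insert a barrier and wait for it */
        worker_runs_barrier(cwq);
    }
    return cwq->flushes;
}

/* Returns the number of barriers inserted, starting with one pending item. */
static int count_flushes(bool clear_in_barrier)
{
    struct cwq_model cwq = { NULL, 1, clear_in_barrier, 0, 1000 };

    return flush_until_idle(&cwq);
}
```

With the clear in place, count_flushes(true) stops after a single barrier;
without it, count_flushes(false) runs into the cap of 1000, which is the
model's stand-in for the livelock.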

I'll re-check my thinking and send a patch tomorrow.

Thanks a lot, Michal.

Oleg.


Messages in current thread:
Re: destroy_workqueue can livelock, Oleg Nesterov, (Wed Jul 11, 6:26 pm)