Yes, I hate this barrier too. That is why the changelog explicitly mentions it.
With some trivial code modifications we can move set_wq_data() from insert_work()
to __queue_work(), then
void set_wq_data(work, cwq)
{
	struct cpu_workqueue_struct *old = get_wq_data(work);

	if (likely(cwq == old))
		return;

	/* serialize the update against whoever holds the old cwq's lock */
	if (old)
		spin_lock(&old->lock);
	atomic_long_set(&work->data, ...);
	if (old)
		spin_unlock(&old->lock);
}
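For illustration only, the locking pattern above can be modelled in userspace. This is a rough sketch, not kernel code: struct cwq_model, struct work_model, model_get_wq_data() and model_set_wq_data() are hypothetical names, a pthread mutex stands in for the cwq spinlock, and a C11 atomic pointer stands in for work->data.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for cpu_workqueue_struct. */
struct cwq_model {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for work_struct: data points at the owning cwq. */
struct work_model {
	_Atomic(struct cwq_model *) data;
};

static struct cwq_model *model_get_wq_data(struct work_model *work)
{
	return atomic_load(&work->data);
}

static void model_set_wq_data(struct work_model *work, struct cwq_model *cwq)
{
	struct cwq_model *old = model_get_wq_data(work);

	/* Common case: the work stays on the same cwq, nothing to do. */
	if (cwq == old)
		return;

	/*
	 * Take the old cwq's lock around the store, so that anyone who
	 * inspects work->data under that lock sees either the old or the
	 * new value, never a store racing with their critical section.
	 */
	if (old)
		pthread_mutex_lock(&old->lock);
	atomic_store(&work->data, cwq);
	if (old)
		pthread_mutex_unlock(&old->lock);
}
```

The point of the model is only the shape: the fast path takes no lock, and the slow path locks the old owner, not the new one.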
I can't say I like this very much, though. I'd prefer to use smp_mb__before_spinlock().
Probably we can do something else.
But first I'd like to kill cwq_should_stop(). (Gautham, Srivatsa, you were
right, but I was blind, deaf, and stupid).
Oleg.