On Thu, 2007-08-02 at 23:50 +0400, Oleg Nesterov wrote:
I think you have argued very effectively that there are situations in
which the priority should not be taken into account. I also think a
case could easily be made to say that this RT business may be too
complex to justifiably pollute the workqueue infrastructure.
However, I assure you this: The concept of a priority ordered queue
coupled with PI-boost is *very* desirable (IMHO) in the RT kernel for
use as an RPC mechanism. Couple that with the fact that there are many
many parallels to what we need to do for an RT/RPC as there is for
workqueues. To say outright that priority *can* never or *should* never
be taken into account is equally as bogus.
Ideally we can modify workqueues so that they both retain their
usefulness as they are used today, as well as provide new features as an
RT/RPC transport mechanism. It would be a bonus if it could be done in
such a way as to advance the usefulness of the workqueues in general.
Perhaps it is not possible, but I doubt some compromise could not be
found.
The motivation to go in this direction is twofold:
1) To avoid having redundant "schedule something arbitrary on a kthread"
type subsystems
2) To possibly allow all users of the workqueue subsystem to benefit
from any advances we can come up with.
As I mentioned in the last mail, to make our proposal work I agree that
the API proposed in the original patch needs to change. The priority
behavior simply cannot be the default. I believe this right there will
make most of your examples of where it's broken fall away.
But that's just it. The thing we are building *does* have to wait, thus
the concern.
Like I said before, the original non-workqueue implementation *only*
provided for a REQUEST/RESPONSE pattern for RPCs. The translation to
workqueues was not as straightforward as we originally hoped for because
they support more than REQUEST/RESPONSE. I realize now that we need to
consider patterns beyond RPCs if we are to get this thing to work so
that all camps are happy.
That is not a bad idea, actually.
Assume the following:
1) Three tasks: A, B, and C where A has the highest priority and C the
lowest.
2) C is your workqueue-daemon, and is bound to processor 0.
3) B is cpu bound and is currently scheduled on processor 0, and A is on
processor 1.
4) A inserts work to C's workqueue.
When will the job complete? Whats more, what if making any more forward
progress on A was predicated on the completion of that job (as it would
be in your RPC type model).
The answer, of course, is that the work will not complete until B gives
up the processor because C is lower priority than B. Both A and C will
block indefinitely behind B, thus effectively inverting B's priority to
become logically higher than A's. This is an example of
priority-inversion and preventing it is one of the things that the patch
tries to address.
No, that's not livelock. That's preemption and it's by design. Work
will continue when the higher-priority items are finished. Livelock
would be logic like "while (!queue_empty());". Here you are still
retaining your position in the queue relative to others at your priority
and below.
That code is completely broken, so I don't think it matters much. But
regardless, the new API changes will address that.
Ok. I am not against this idea (I think). Like I said, the original
system had the work and barrier logic as one unit so boosting at
queue-insert made sense. We just brought the concept straight over in
the port to workqueues without thinking about it the way you are
proposing. Upon cursory thought, your argument for flush-time boosting
makes sense to me.
No, that's just broken in the other direction. We might as well use
smp_call() at that point.
That's fair. It's very possible that when its all said and done it may
turn out to be illogical to make the requirements harmonize into one
subsystem. They may be simply too different from one another justify
the complexity needed to managed between the two worlds. But I am not
convinced that is the case just yet. I think we can make it work for
everyone with a little more design work.
I will submit a new patch later which addresses these concerns.
Regards,
-Greg
-