Actually I had a discussion on that with Oleg Nesterov. If you remember, my
original solution (i.e. the centralized cpu_isolate_map) was to completely
redirect work onto other cpus. Then you pointed out that it's the flush_() that really
makes the box stuck. So I started thinking about redoing the flush. While
looking at the code I realized that if I only change the flush_() then queued
work can go stale, so to speak: the machine does not get stuck, but some work
submitted on the isolated cpus will sit there for a long time. Oleg pointed
out the exact same thing. So the simplest solution that does not require any
surgery to the workqueue is to just move the threads to other cpus. I did not
want to get into too much detail on the workqueue stuff here. I'll start a
separate thread on this.

As I pointed out, there are a bunch of other kthreads, like kswapd, kacpid,
pdflush, khubd, etc., that clearly do not need any pinning but still
violate cpuset constraints they inherit from kthreadd.

Max