That patch basically just picks an arbitrary cpu for each flow. This
would spread the load out across cpus, but it doesn't allow any input
from userspace.
We have a current application that runs 16 threads on 16 cores. Its
developers would really like to be able to pin one thread to each core and
tell the kernel what packets they're interested in so that the kernel
can process those packets on that core to gain the maximum caching
benefit as well as reduce reordering issues. In our case the hardware
supports filtering packets into multiple queues, so we could pass this
information down to the hardware to avoid software filtering.
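
For reference, the pinning half is already possible from userspace
today. A minimal sketch, assuming Linux with glibc; pin_self_to_cpu()
is just an illustrative helper name, not an existing API:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to a single cpu.
 * Returns 0 on success, -1 on error (errno set by sched_setaffinity).
 * Hypothetical helper, named here for illustration only.
 */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    /* Passing a cpu number outside the mask range is a caller bug. */
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each of the 16 threads would call this once with its own cpu number
before it starts receiving packets. What is still missing is the other
half: a way to tell the kernel which flows belong on that cpu.
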
Either way, it requires some way for userspace to indicate interest in a
particular flow. Has anyone given any thought to what an API like this
would look like?
I suppose we could automatically look at bound network sockets owned by
tasks that are affined to single cpus. This would simplify userspace
but would reduce flexibility for things like packet sockets with socket
filters applied.
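
To make the automatic variant concrete, this is roughly what a task
would have to look like for the kernel to pick it up: affined to
exactly one cpu and owning a bound socket. A userspace sketch, assuming
Linux with glibc; both helper names are made up for illustration, and
the kernel-side lookup is not shown since it doesn't exist:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return the cpu the calling thread is pinned to, or -1 if its affinity
 * mask allows more than one cpu (or the mask cannot be read).
 * Hypothetical helper mirroring the check the kernel would have to make.
 */
int single_affined_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    if (CPU_COUNT(&set) != 1)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;

    /* Not reached: a mask with CPU_COUNT() == 1 has exactly one bit set. */
    assert(0);
    return -1;
}

/*
 * Open a UDP socket bound to the loopback address and the given local
 * port (host byte order, 0 = let the kernel choose).
 * Returns the fd, or -1 on error. Hypothetical helper.
 */
int bound_udp_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A thread for which single_affined_cpu() returns a cpu number and which
holds an fd from bound_udp_socket() is the easy case. A packet socket
with a filter attached has no bound address to key on, which is where
the automatic approach runs out.
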
Chris