Any chance the NIC hardware could provide that guarantee?
8139cp, for example, has two TX DMA rings with hardcoded
characteristics: one is a high prio q, and one a low prio q. The logic
is pretty simple: empty the high prio q first (potentially starving the
low prio q, in the worst case).
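A minimal sketch of that strict-priority dequeue, to make the starvation case concrete. The ring struct, RING_SIZE and the function names are all hypothetical; this models packet IDs in a plain array, not the actual 8139cp descriptor rings.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size ring of packet IDs (NOT the 8139cp layout). */
#define RING_SIZE 8

struct tx_ring {
    int pkts[RING_SIZE];
    size_t head;   /* index of the next packet to dequeue */
    size_t count;  /* number of packets currently queued */
};

static bool ring_push(struct tx_ring *r, int pkt)
{
    if (r->count == RING_SIZE)
        return false;                          /* ring full */
    r->pkts[(r->head + r->count) % RING_SIZE] = pkt;
    r->count++;
    return true;
}

static bool ring_pop(struct tx_ring *r, int *pkt)
{
    if (r->count == 0)
        return false;                          /* ring empty */
    *pkt = r->pkts[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    return true;
}

/* Strict priority: the low-prio ring is serviced only when the
 * high-prio ring is empty, so a constantly busy high-prio ring
 * starves the low-prio one. */
static bool tx_next(struct tx_ring *hi, struct tx_ring *lo, int *pkt)
{
    if (ring_pop(hi, pkt))
        return true;
    return ring_pop(lo, pkt);
}
```

A packet queued on the low prio ring first still goes out last if high prio packets keep arriving; that is the whole "guarantee" such hardware gives you.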
In terms of overall parallelization, for both TX and RX, my gut
feeling is that we want to move towards an MSI-X, multi-core friendly
model where packets are LIKELY to be sent and received by the same set
of [cpus | cores | packages | nodes] that the [userland] processes
dealing with the data run on.
There are already some primitive NUMA bits in skbuff allocation, but
with modern MSI-X and RX/TX flow hashing we could do a whole lot more:
making better CPU scheduling decisions, directing flows to clusters of
cpus, and generally doing a better job of maximizing cache efficiency
in a modern multi-threaded environment.
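To illustrate what "directing flows" by hash means, here is a rough sketch that hashes a flow's IPv4 4-tuple and maps it onto one of N cpus. The struct and function names are hypothetical, and it uses FNV-1a purely for brevity; RSS-capable NICs typically compute a Toeplitz hash in hardware and look the result up in an indirection table rather than taking a simple modulo.

```c
#include <stdint.h>

/* Hypothetical flow key: IPv4 addresses and TCP/UDP ports. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Mix one 32-bit value into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;                        /* FNV-1a 32-bit prime */
    }
    return h;
}

static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;                  /* FNV-1a offset basis */
    h = fnv1a_u32(h, k->saddr);
    h = fnv1a_u32(h, k->daddr);
    h = fnv1a_u32(h, ((uint32_t)k->sport << 16) | k->dport);
    return h;
}

/* Every packet of a given flow maps to the same cpu (nr_cpus > 0),
 * which is what keeps that flow's socket and cache state warm. */
static unsigned int flow_to_cpu(const struct flow_key *k, unsigned int nr_cpus)
{
    return flow_hash(k) % nr_cpus;
}
```

The point is determinism rather than the particular hash: the same 4-tuple always yields the same cpu, so per-flow state never bounces between caches.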
IMO the current model, where each NIC's TX completion and RX processing
are both locked to the same CPU, is outmoded in a multi-core world with
modern NICs. :)
But I readily admit general ignorance about the kernel's process
scheduling, so my only idea for a starting point was to see how far
we could go with the concept of "skb affinity": a mask in sk_buff that
hints at which cpu(s) the NIC should attempt to send and receive
packets on. When going through bonding or netfilter, it is trivial
to OR together affinity masks. All the various layers of the net stack
should attempt to honor the skb affinity where feasible (this may
require interaction with the CFS scheduler?).
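A toy model of the skb affinity idea, assuming a 64-bit mask with one bit per cpu. Nothing here is a real kernel API: struct pkt_hint, hint_merge() and hint_pick_cpu() are hypothetical stand-ins for a field in sk_buff and the logic a layer might use to honor it.

```c
#include <stdint.h>

/* Hypothetical per-packet hint standing in for a proposed sk_buff field.
 * One bit per cpu, so this toy supports at most 64 cpus. */
struct pkt_hint {
    uint64_t affinity;   /* 0 means "no preference" */
};

/* Stacked layers (bonding, netfilter) merge hints with a bitwise OR. */
static void hint_merge(struct pkt_hint *dst, const struct pkt_hint *src)
{
    dst->affinity |= src->affinity;
}

/* Honor the hint "where feasible": stay on the current cpu if the mask
 * allows it (or is empty), else pick the lowest-numbered allowed cpu.
 * cur_cpu must be < 64. */
static unsigned int hint_pick_cpu(const struct pkt_hint *h, unsigned int cur_cpu)
{
    if (h->affinity == 0 || (h->affinity & (UINT64_C(1) << cur_cpu)))
        return cur_cpu;
    unsigned int cpu = 0;
    while (!(h->affinity & (UINT64_C(1) << cpu)))
        cpu++;                                 /* terminates: mask is non-zero */
    return cpu;
}
```

OR-ing is what makes merging trivial: a bonded packet whose slaves prefer cpu 2 and cpu 5 ends up allowed on either, and a layer already running on one of them need not migrate the work.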
Or maybe skb affinity is a dumb idea. I wanted to get people thinking
on the bigger picture. Parallelization starts at the user process.
Jeff