This is also very important for routing performance.
Experience from practical 10GbE routing tests (done by Robert's team and
myself) shows that we can only achieve (close to) 10Gbit/s routing
performance when carefully making sure that the rx-queue and tx-queue run
on the same CPU. (Not doing so really kills performance.)
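
To make the rx/tx pairing concrete, here is a rough sketch (not the setup
used in the tests above) of how one could plan the IRQ affinity so that
queue N's rx and tx interrupts land on the same CPU. It assumes the driver
exposes separate rx and tx IRQs per queue; the IRQ numbers and the helper
names affinity_mask() and plan_affinity() are made up for illustration. The
masks use the hex bitmask format of /proc/irq/<irq>/smp_affinity; actually
writing them there requires root and is left out.

```python
def affinity_mask(cpu: int) -> str:
    """Hex bitmask selecting one CPU, in comma-separated 32-bit groups
    (the format read and written via /proc/irq/<irq>/smp_affinity)."""
    if cpu < 0:
        raise ValueError("cpu must be non-negative")
    mask = 1 << cpu
    groups = []
    while True:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if mask == 0:
            break
    # Most significant group comes first.
    return ",".join(reversed(groups))


def plan_affinity(rx_irqs, tx_irqs, ncpus):
    """Map each IRQ to a CPU mask so that rx-queue i and tx-queue i share
    a CPU. Queues are spread round-robin over ncpus CPUs."""
    if len(rx_irqs) != len(tx_irqs):
        raise ValueError("need the same number of rx and tx queues")
    plan = {}
    for queue, (rx_irq, tx_irq) in enumerate(zip(rx_irqs, tx_irqs)):
        mask = affinity_mask(queue % ncpus)
        plan[rx_irq] = mask
        plan[tx_irq] = mask
    return plan


if __name__ == "__main__":
    # Hypothetical IRQ numbers; the real ones are listed in /proc/interrupts.
    for irq, mask in sorted(plan_affinity([40, 41], [50, 51], ncpus=4).items()):
        print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

The point is only that both IRQs of a queue pair get the identical mask;
how the queues are spread over CPUs is a separate tuning choice.
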
Currently I'm using some patches by Jens L