We have not seen this problem in our testing.
We do keep the skb processing on the same CPU from RX to TX.
This is done by setting the affinity of the queues and using a custom select_queue():
+static u16 select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	/* Forwarded packet: reuse the index of the RX queue it arrived on. */
+	if (dev->real_num_tx_queues && skb_rx_queue_recorded(skb))
+		return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
+
+	/* Otherwise map the current CPU to a TX queue. */
+	return smp_processor_id() % dev->real_num_tx_queues;
+}
+
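To make the mapping concrete outside the kernel, here is a small userspace model of the select_queue() logic. The struct fake_dev and struct fake_skb types and model_select_queue() are hypothetical stand-ins, not kernel APIs: the rx_queue_recorded and rx_queue fields play the role of skb_rx_queue_recorded() and skb_get_rx_queue(), and an assert takes the place of the real_num_tx_queues check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel fields the patch reads. */
struct fake_dev {
	unsigned int real_num_tx_queues;
};

struct fake_skb {
	bool rx_queue_recorded;   /* models skb_rx_queue_recorded() */
	uint16_t rx_queue;        /* models skb_get_rx_queue() */
};

/* Same decision as the select_queue() patch: RX queue if recorded,
 * otherwise the current CPU, both folded onto the TX queue count. */
static uint16_t model_select_queue(const struct fake_dev *dev,
				   const struct fake_skb *skb,
				   unsigned int cpu)
{
	assert(dev->real_num_tx_queues > 0);
	if (skb->rx_queue_recorded)
		return (uint16_t)(skb->rx_queue % dev->real_num_tx_queues);
	return (uint16_t)(cpu % dev->real_num_tx_queues);
}
```

Assuming equal RX and TX queue counts and queue i's interrupts pinned to CPU i, a packet received on queue 6 of an 8-queue device leaves on TX queue 6, so RX and TX work stay on one CPU.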
The hash-based default for selecting the TX queue produces an uneven
spread that is hard to match with the correct affinity settings.
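A rough illustration of why a hash-chosen TX queue is hard to align with affinity: the RX queue is picked by one hash (in the NIC) and the TX queue by another (in the stack), so the two rarely agree. Both hash functions below are hypothetical placeholders, not the NIC's RSS hash or the kernel's TX hash; same_queue_flows() counts how many flows leave on the same queue index they arrived on.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder "NIC" hash: Knuth multiplicative hashing (NOT RSS/Toeplitz). */
static uint32_t rx_hash(uint32_t flow)
{
	return flow * 2654435761u;
}

/* Placeholder "stack" hash with different mixing (NOT the kernel's). */
static uint32_t tx_hash(uint32_t flow)
{
	flow ^= flow >> 16;
	flow *= 0x45d9f3bu;
	flow ^= flow >> 16;
	return flow;
}

/* Count flows whose TX queue equals their RX queue, i.e. flows that stay
 * on one CPU if queue i is pinned to CPU i. With use_rx_queue set, the TX
 * queue is taken from the RX queue as in the select_queue() patch. */
static unsigned int same_queue_flows(unsigned int nflows, unsigned int nq,
				     int use_rx_queue)
{
	unsigned int hits = 0;

	assert(nq > 0);
	for (uint32_t f = 0; f < nflows; f++) {
		unsigned int rxq = rx_hash(f) % nq;
		unsigned int txq = use_rx_queue ? rxq : tx_hash(f) % nq;

		if (txq == rxq)
			hits++;
	}
	return hits;
}
```

With use_rx_queue set every flow stays on its RX queue; with two unrelated hashes only around 1/nq of the flows can be expected to match by chance.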
We have not been able to generate quite as much traffic from the sender.
Sender (64-byte packets):
iface   RX bit/s       RX pkt/s    TX bit/s       TX pkt/s
eth5    4.5 k          3           1233.9 M       2.632 M

Router:
iface   RX bit/s       RX pkt/s    TX bit/s       TX pkt/s
eth0    1077.2 M       2.298 M     1.7 k          1
eth1    744            1           1076.3 M       2.296 M
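A quick check of the packet rates in the figures above, assuming the first pair of values on each interface line is RX and the second is TX. The constant names and ratio() are illustrative only.

```c
#include <assert.h>

/* Packet rates from the measurements, in Mpps. The RX/TX reading of the
 * columns is an assumption based on the traffic direction. */
static const double sender_tx_mpps = 2.632;  /* sender eth5 TX */
static const double router_rx_mpps = 2.298;  /* router eth0 RX */
static const double router_tx_mpps = 2.296;  /* router eth1 TX */

/* Fraction of the 'in' rate that shows up as the 'out' rate. */
static double ratio(double out, double in)
{
	assert(in > 0.0);
	return out / in;
}
```

Under that assumption the router forwards about 99.9% of what eth0 reports receiving (2.296 / 2.298), and eth0 sees about 87% of what the sender reports sending (2.298 / 2.632). The counters were not necessarily sampled over the same interval, so the gap is a rough indication rather than a loss figure.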
I'm not sure I like the proposed concept, since it decouples RX
processing from receiving.
There is no point in collecting lots of packets just to drop them later
in the qdisc.
In fact, this hurts performance: we just consume CPU for nothing.
It is important to have as strong a correlation as possible between RX
and TX, so that we don't receive more packets than we can handle. It is
better to drop on the interface.
We might start thinking of a way for userland to set the policy for
multiq mapping.
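One possible shape for such a userland-settable policy is a per-device table mapping RX queue to TX queue. Everything below (struct queue_map_policy, MAP_MAX_QUEUES, policy_init_identity(), policy_lookup()) is hypothetical and not an existing kernel interface; the identity table reproduces what the select_queue() patch does when RX and TX queue counts match.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_MAX_QUEUES 64   /* hypothetical table size */

/* Hypothetical policy table that userland could fill in. */
struct queue_map_policy {
	unsigned int nentries;
	uint16_t rx_to_tx[MAP_MAX_QUEUES];
};

/* Default policy: RX queue i transmits on TX queue i. */
static void policy_init_identity(struct queue_map_policy *p, unsigned int n)
{
	assert(n > 0 && n <= MAP_MAX_QUEUES);
	p->nentries = n;
	for (unsigned int i = 0; i < n; i++)
		p->rx_to_tx[i] = (uint16_t)i;
}

/* TX queue for a packet received on rx_queue; falls back to modulo when
 * rx_queue is outside the table. */
static uint16_t policy_lookup(const struct queue_map_policy *p,
			      uint16_t rx_queue, unsigned int ntx)
{
	assert(ntx > 0);
	if (rx_queue < p->nentries)
		return (uint16_t)(p->rx_to_tx[rx_queue] % ntx);
	return (uint16_t)(rx_queue % ntx);
}
```

A userland tool could then, for example, steer all traffic from RX queue 3 to TX queue 0 by rewriting a single entry.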
Cheers,
Jens Låås
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html