virtio_net currently only frees old transmit skbs just
before queueing new ones. If the queue is full, it then
enables interrupts and waits for notification that more
work has been performed.
However, a side-effect of this scheme is that there are
always xmit skbs left dangling when no new packets are
sent, against the Documentation/networking/driver.txt
guideline:
"... it is not allowed for your TX mitigation scheme
to let TX packets "hang out" in the TX ring unreclaimed
forever if no new TX packets are sent."
Add a timer to ensure that any time we queue new TX
skbs, we will shortly free them again.
This fixes any easily reproduced hang at shutdown where
iptables attempts to unload nf_conntrack and nf_conntrack
waits for an skb it is tracking to be freed, but virtio_net
never frees it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
drivers/net/virtio_net.c | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 555b70c..f331168 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,6 +41,8 @@ struct virtnet_info
struct net_device *dev;
struct napi_struct napi;
+ struct timer_list xmit_free_timer;
+
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
@@ -227,6 +229,15 @@ static void free_old_xmit_skbs(struct virtnet_info *vi)
}
}
+static void xmit_free(unsigned long data)
+{
+ struct virtnet_info *vi = (void *)data;
+
+ netif_tx_lock(vi->dev);
+ free_old_xmit_skbs(vi);
+ netif_tx_unlock(vi->dev);
+}
+
static int start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -295,6 +306,8 @@ again:
}
vi->svq->vq_ops->kick(vi->svq);
+ mod_timer(&vi->xmit_free_timer, jiffies + (HZ/10));
+
return 0;
}
@@ -396,6 +409,10 @@ static int virtnet_probe(struct virtio_device *vdev)
skb_queue_head_init(&vi->recv);
...Hi Mark, This patch is fine, but it's better to do it from skb_xmit_done(). Of course, this is usually called from an interrupt handler, so it's not entirely trivial: we can't free the skbs there. A softirq is probably the answer here, but AFAICT that's old fashioned. Not sure what the right way of doing this is now... Rusty. --
Hi Rusty,
Sorry 'bout the lag ...
Unless I'm missing something, we only get this callback when we've
stopped the queue and we're waiting for buffers to be freed up.
In the normal case, where the callback is disabled, we don't get any
notification that the host has finished with the buffer ... hence the
need for a timer.
Thanks,
Mark.
Subject: [PATCH] virtio_net: free transmit skbs in a timer
virtio_net currently only frees old transmit skbs just
before queueing new ones. If the queue is full, it then
enables interrupts and waits for notification that more
work has been performed.
However, a side-effect of this scheme is that there are
always xmit skbs left dangling when no new packets are
sent, against the Documentation/networking/driver.txt
guideline:
"... it is not allowed for your TX mitigation scheme
to let TX packets "hang out" in the TX ring unreclaimed
forever if no new TX packets are sent."
Add a timer to ensure that any time we queue new TX
skbs, we will shortly free them again.
This fixes an easily reproduced hang at shutdown where
iptables attempts to unload nf_conntrack and nf_conntrack
waits for an skb it is tracking to be freed, but virtio_net
never frees it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
drivers/net/virtio_net.c | 30 ++++++++++++++++++++++++++++--
1 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f926b5a..69b308a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -44,6 +44,8 @@ struct virtnet_info
/* The skb we couldn't send because buffers were full. */
struct sk_buff *last_xmit_skb;
+ struct timer_list xmit_free_timer;
+
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
@@ -230,9 +232,23 @@ static void free_old_xmit_skbs(struct virtnet_info *vi)
}
}
+static void xmit_free(unsigned long data)
+{
+ struct virtnet_info *vi = (void ...Sorry to barge in late, but IMO the timer should be on the host, which is cheaper than on the guest (well, a 100ms timer is likely zero cost, but I still don't like it). the host should fire a tx completion interrupt whenever the completion queue has "enough" entries, where we can define "enough" now as the halfway mark or a timer expiry, whichever comes earlier. We can later improve "enough" to be "just enough so the timer never triggers" and adjust it dynamically. It probably doesn't matter for Linux, but I don't want to punish guests that can do true async networking and depend on timely completion notification. -- error compiling committee.c: too many arguments to function --
This implies that we should not be supressing notifications in the guest at all (unless we're sure there are more packets to come, which currently we never are: that needs new net infrastructure). But that means we'd get a notification on every xmit at the moment. Benchmarks anyone? Rusty. --
We don't have to be sure, just reasonably confident. If we see a stream of packets, we open the window, but set a timer in case we're wrong. The expectation is that the timer will only fire when tx rate drops (or Notification on every xmit will surely kill performance. I'm trying to get batching to work but also good latency when the link is not saturated. -- error compiling committee.c: too many arguments to function --
I think Rusty is speaking from the POV of the guest driver - i.e. that virtio_net should never disable notifications on the xmit queue using disable_cb()? Sounds like you think agree, but that the host side should throttle the rate of xmit notifications? Cheers, Mark. --
Yes. -- error compiling committee.c: too many arguments to function --
But performance is going to suck in the meantime, as currently our host doesn't do this. Rusty. --
That's what the feature bits are for. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. --
OK. And since the current situation is that the host doesn't throttle, the feature bit should be "don't throttle, host is doing it for you", and Mark's patch should go in... Cheers, Rusty. --
Yes. We should have thought of this before, though, especially as Xen does req_event and rsp_event allow the other side to indicate when it wants a notification. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. --
Well, we do have such a thing, in the ring suppression flags. Note that DaveM is talking about moving network tx queue into the net drivers themselves, which will make them much more efficient (ie. drain entire queue before kick), which may again change the balance of what the Right Thing is. I'll merge Mark's patch, and look at hacking up a feature to change behaviour to never suppress tx-done interrupts. Thanks, Rusty. --
That depends on whether Linux knows whether more packets are coming. -- error compiling committee.c: too many arguments to function --
Ah, sorry, I misunderstood. No, we don't have a threshold like this, we have an all-or-nothing flag in each direction. We have the ability to add new fields to the rings. I've put it on my TODO to benchmark what this does. It may or may not help. In this case, notification when there are no more packets in xmit ring would be sufficient. We already kick the host when we fill a ring even if The current problem is that hard_start_xmit gets called with a packet, and has no idea if there are more in the tx queue. By having the driver control the queue, it can at least tell that. The question remains whether this will be sufficient for tx mitigation. A program may be about to write more to the socket, and we can't know that. But it can hardly hurt. In turn, sending out such packet bursts will have an effect on packet reuse. Cheers, Rusty. --
Notification on ring full is too late with smp. You need to warn the other side in advance. The reason I'm interested in adjustable thresholds is that you can then to tx mitigation without (usually) suffering the worst-case latency when Right. -- error compiling committee.c: too many arguments to function --
Sure, linux/virtio_ring.h: /* The Host uses this in used->flags to advise the Guest: don't kick me when * you add a buffer. It's unreliable, so it's simply an optimization. Guest * will still kick if it's out of buffers. */ #define VRING_USED_F_NO_NOTIFY 1 /* The Guest uses this in avail->flags to advise the Host: don't interrupt me * when you consume a buffer. It's unreliable, so it's simply an * optimization. */ No, you misunderstand; this is not a performance issue. On xmit, the driver cleans up any old used packets before trying to send anyway. So it doesn't care when xmit packets are used, and suppresses the 'used' interrupt on the xmit virtqueue. Only if the xmit ring is full does it enable the xmit-used notification. This is optimal. The issue is that we *do* actually care when xmit packets are used: we're supposed to free them in a timely manner and if the packet flow stops, we don't. By always sending a used interrupt when *all* packets are used, we would cover this case quite nicely without impacting the normal case of Right, this would be a threshold that the host would set, approx. "when you've put this many packets in the xmit ring, tell me" (the opposite direction of the discussion above). Currently we will kick the host on the first packet, and qemu will suppress the notifications based on some timer and we'll notify it anyway if the ring fills (which is suboptimal). With this technique the host could double the threshold up to some maximum percentage of the ring. While I like the Xen scheme, we can do the same thing from within the guest with the existing scheme using an internal threshold. We are always allowed to send "spurious" notifications to the host, so it can't break anything. Added to TODO. Thanks, Rusty. --
