Re: [PATCH] virtio_net: free transmit skbs in a timer

Previous thread: debug advice request by plamen.petrov on Wednesday, April 30, 2008 - 5:37 am. (1 message)

Next thread: [PATCH] iTCO_wdt: ICH9DO support by Gabriel C on Wednesday, April 30, 2008 - 7:51 am. (5 messages)
From: Mark McLoughlin
Date: Wednesday, April 30, 2008 - 7:31 am

virtio_net currently only frees old transmit skbs just
before queueing new ones. If the queue is full, it then
enables interrupts and waits for notification that more
work has been performed.

However, a side-effect of this scheme is that there are
always xmit skbs left dangling when no new packets are
sent, against the Documentation/networking/driver.txt
guideline:

  "... it is not allowed for your TX mitigation scheme
   to let TX packets "hang out" in the TX ring unreclaimed
   forever if no new TX packets are sent."

Add a timer to ensure that any time we queue new TX
skbs, we will shortly free them again.

This fixes any easily reproduced hang at shutdown where
iptables attempts to unload nf_conntrack and nf_conntrack
waits for an skb it is tracking to be freed, but virtio_net
never frees it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/net/virtio_net.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 555b70c..f331168 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,6 +41,8 @@ struct virtnet_info
 	struct net_device *dev;
 	struct napi_struct napi;
 
+	struct timer_list xmit_free_timer;
+
 	/* Number of input buffers, and max we've ever had. */
 	unsigned int num, max;
 
@@ -227,6 +229,15 @@ static void free_old_xmit_skbs(struct virtnet_info *vi)
 	}
 }
 
+static void xmit_free(unsigned long data)
+{
+	struct virtnet_info *vi = (void *)data;
+
+	netif_tx_lock(vi->dev);
+	free_old_xmit_skbs(vi);
+	netif_tx_unlock(vi->dev);
+}
+
 static int start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -295,6 +306,8 @@ again:
 	}
 	vi->svq->vq_ops->kick(vi->svq);
 
+	mod_timer(&vi->xmit_free_timer, jiffies + (HZ/10));
+
 	return 0;
 }
 
@@ -396,6 +409,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 	skb_queue_head_init(&vi->recv);
 ...
From: Rusty Russell
Date: Friday, May 2, 2008 - 3:55 am

Hi Mark,

   This patch is fine, but it's better to do it from skb_xmit_done().  Of 
course, this is usually called from an interrupt handler, so it's not 
entirely trivial: we can't free the skbs there.

   A softirq is probably the answer here, but AFAICT that's old fashioned.  
Not sure what the right way of doing this is now...

Rusty.
--

From: Mark McLoughlin
Date: Monday, May 12, 2008 - 1:37 pm

Hi Rusty,
	Sorry 'bout the lag ...


Unless I'm missing something, we only get this callback when we've
stopped the queue and we're waiting for buffers to be freed up.

In the normal case, where the callback is disabled, we don't get any
notification that the host has finished with the buffer ... hence the
need for a timer.


Thanks,
Mark.

Subject: [PATCH] virtio_net: free transmit skbs in a timer

virtio_net currently only frees old transmit skbs just
before queueing new ones. If the queue is full, it then
enables interrupts and waits for notification that more
work has been performed.

However, a side-effect of this scheme is that there are
always xmit skbs left dangling when no new packets are
sent, against the Documentation/networking/driver.txt
guideline:

  "... it is not allowed for your TX mitigation scheme
   to let TX packets "hang out" in the TX ring unreclaimed
   forever if no new TX packets are sent."

Add a timer to ensure that any time we queue new TX
skbs, we will shortly free them again.

This fixes an easily reproduced hang at shutdown where
iptables attempts to unload nf_conntrack and nf_conntrack
waits for an skb it is tracking to be freed, but virtio_net
never frees it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/net/virtio_net.c |   30 ++++++++++++++++++++++++++++--
 1 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f926b5a..69b308a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -44,6 +44,8 @@ struct virtnet_info
 	/* The skb we couldn't send because buffers were full. */
 	struct sk_buff *last_xmit_skb;
 
+	struct timer_list xmit_free_timer;
+
 	/* Number of input buffers, and max we've ever had. */
 	unsigned int num, max;
 
@@ -230,9 +232,23 @@ static void free_old_xmit_skbs(struct virtnet_info *vi)
 	}
 }
 
+static void xmit_free(unsigned long data)
+{
+	struct virtnet_info *vi = (void ...
From: Avi Kivity
Date: Tuesday, May 13, 2008 - 12:47 am

Sorry to barge in late, but IMO the timer should be on the host, which 
is cheaper than on the guest (well, a 100ms timer is likely zero cost, 
but  I still don't like it).

the host should fire a tx completion interrupt whenever the completion 
queue has "enough" entries, where we can define "enough" now as the 
halfway mark or a timer expiry, whichever comes earlier.

We can later improve "enough" to be "just enough so the timer never 
triggers" and adjust it dynamically.  It probably doesn't matter for 
Linux, but I don't want to punish guests that can do true async 
networking and depend on timely completion notification.

-- 
error compiling committee.c: too many arguments to function

--

From: Rusty Russell
Date: Tuesday, May 13, 2008 - 11:07 pm

This implies that we should not be supressing notifications in the guest at 
all (unless we're sure there are more packets to come, which currently we 
never are: that needs new net infrastructure).

But that means we'd get a notification on every xmit at the moment.  
Benchmarks anyone?

Rusty.
--

From: Avi Kivity
Date: Wednesday, May 14, 2008 - 1:59 am

We don't have to be sure, just reasonably confident.  If we see a stream 
of packets, we open the window, but set a timer in case we're wrong.  
The expectation is that the timer will only fire when tx rate drops (or 

Notification on every xmit will surely kill performance.  I'm trying to 
get batching to work but also good latency when the link is not saturated.

-- 
error compiling committee.c: too many arguments to function

--

From: Mark McLoughlin
Date: Thursday, May 15, 2008 - 8:29 am

I think Rusty is speaking from the POV of the guest driver - i.e. that
virtio_net should never disable notifications on the xmit queue using
disable_cb()?

Sounds like you think agree, but that the host side should throttle the
rate of xmit notifications?

Cheers,
Mark.

--

From: Avi Kivity
Date: Thursday, May 15, 2008 - 8:32 am

Yes.

-- 
error compiling committee.c: too many arguments to function

--

From: Rusty Russell
Date: Thursday, May 15, 2008 - 4:25 pm

But performance is going to suck in the meantime, as currently our host 
doesn't do this.

Rusty.
--

From: Avi Kivity
Date: Saturday, May 17, 2008 - 11:40 pm

That's what the feature bits are for.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Rusty Russell
Date: Sunday, May 18, 2008 - 7:16 am

OK.  And since the current situation is that the host doesn't throttle, the 
feature bit should be "don't throttle, host is doing it for you", and Mark's 
patch should go in...

Cheers,
Rusty.
--

From: Avi Kivity
Date: Sunday, May 18, 2008 - 7:27 am

Yes.

We should have thought of this before, though, especially as Xen does 

req_event and rsp_event allow the other side to indicate when it wants a 
notification.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Rusty Russell
Date: Sunday, May 18, 2008 - 6:52 pm

Well, we do have such a thing, in the ring suppression flags.  Note that DaveM 
is talking about moving network tx queue into the net drivers themselves, 
which will make them much more efficient (ie. drain entire queue before 
kick), which may again change the balance of what the Right Thing is.

I'll merge Mark's patch, and look at hacking up a feature to change behaviour 
to never suppress tx-done interrupts.

Thanks,
Rusty.
--

From: Avi Kivity
Date: Monday, May 19, 2008 - 3:26 am

That depends on whether Linux knows whether more packets are coming.

-- 
error compiling committee.c: too many arguments to function

--

From: Rusty Russell
Date: Monday, May 19, 2008 - 5:21 am

Ah, sorry, I misunderstood.  No, we don't have a threshold like this, we
have an all-or-nothing flag in each direction.

We have the ability to add new fields to the rings.  I've put it on my
TODO to benchmark what this does.  It may or may not help.  In this case,
notification when there are no more packets in xmit ring would be
sufficient.  We already kick the host when we fill a ring even if

The current problem is that hard_start_xmit gets called with a packet, and 
has no idea if there are more in the tx queue.  By having the driver control
the queue, it can at least tell that.

The question remains whether this will be sufficient for tx mitigation.  A
program may be about to write more to the socket, and we can't know that.
But it can hardly hurt.

In turn, sending out such packet bursts will have an effect on packet reuse.  

Cheers,
Rusty.



--

From: Avi Kivity
Date: Monday, May 19, 2008 - 6:26 am

Notification on ring full is too late with smp.  You need to warn the 
other side in advance.

The reason I'm interested in adjustable thresholds is that you can then 
to tx mitigation without (usually) suffering the worst-case latency when 


Right.

-- 
error compiling committee.c: too many arguments to function

--

From: Rusty Russell
Date: Monday, May 19, 2008 - 6:37 pm

Sure, linux/virtio_ring.h:

/* The Host uses this in used->flags to advise the Guest: don't kick me when
 * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
 * will still kick if it's out of buffers. */
#define VRING_USED_F_NO_NOTIFY  1
/* The Guest uses this in avail->flags to advise the Host: don't interrupt me
 * when you consume a buffer.  It's unreliable, so it's simply an
 * optimization.  */

No, you misunderstand; this is not a performance issue.  On xmit, the driver 
cleans up any old used packets before trying to send anyway.  So it doesn't 
care when xmit packets are used, and suppresses the 'used' interrupt on the 
xmit virtqueue.  Only if the xmit ring is full does it enable the xmit-used 
notification.  This is optimal.

The issue is that we *do* actually care when xmit packets are used: we're 
supposed to free them in a timely manner and if the packet flow stops, we 
don't.  By always sending a used interrupt when *all* packets are used, we 
would cover this case quite nicely without impacting the normal case of 

Right, this would be a threshold that the host would set, approx. "when you've 
put this many packets in the xmit ring, tell me" (the opposite direction of 
the discussion above).  Currently we will kick the host on the first packet, 
and qemu will suppress the notifications based on some timer and we'll notify 
it anyway if the ring fills (which is suboptimal).  With this technique the 
host could double the threshold up to some maximum percentage of the ring.

While I like the Xen scheme, we can do the same thing from within the guest 
with the existing scheme using an internal threshold.  We are always allowed 
to send "spurious" notifications to the host, so it can't break anything.

Added to TODO.

Thanks,
Rusty.
--

Previous thread: debug advice request by plamen.petrov on Wednesday, April 30, 2008 - 5:37 am. (1 message)

Next thread: [PATCH] iTCO_wdt: ICH9DO support by Gabriel C on Wednesday, April 30, 2008 - 7:51 am. (5 messages)