[PATCH] tun: orphan an skb on tx

Previous thread: none

Next thread: Re: [PATCH] vhost-net: fix vq_memory_access_ok error checking by Michael S. Tsirkin on Tuesday, April 13, 2010 - 8:01 am. (2 messages)
From: Michael S. Tsirkin
Date: Tuesday, April 13, 2010 - 7:59 am

The following situation was observed in the field:
tap1 sends packets, tap2 does not consume them, as a result
tap1 can not be closed. This happens because
tun/tap devices can hang on to skbs undefinitely.

As noted by Herbert, possible solutions include a timeout followed by a
copy/change of ownership of the skb, or always copying/changing
ownership if we're going into a hostile device.

This patch implements the second approach.

Note: one issue still remaining is that since skbs
keep reference to tun socket and tun socket has a
reference to tun device, we won't flush backlog,
instead simply waiting for all skbs to get transmitted.
At least this is not user-triggerable, and
this was not reported in practice, my assumption is
other devices besides tap complete an skb
within finite time after it has been queued.

A possible solution for the second issue
would not to have socket reference the device,
instead, implement dev->destructor for tun, and
wait for all skbs to complete there, but this
needs some thought, probably too risky for 2.6.34.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Yan Vugenfirer <yvugenfi@redhat.com>

---

Please review the below, and consider for 2.6.34,
and stable trees.

 drivers/net/tun.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 96c39bd..4326520 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
+	/* Orphan the skb - required as we might hang on to it
+	 * for indefinite time. */
+	skb_orphan(skb);
+
 	/* Enqueue packet */
 	skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
 	dev->trans_start = jiffies;
-- 
1.7.0.2.280.gc6f05
--

From: Herbert Xu
Date: Tuesday, April 13, 2010 - 8:12 am

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Jan Kiszka
Date: Tuesday, April 13, 2010 - 8:36 am

And before that, tap1 may not be able to send further packets to anyone
else on the bridge as its TX resources were blocked by tap2 - that's

Nice solution, thanks!

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--

From: Eric Dumazet
Date: Tuesday, April 13, 2010 - 9:40 am

After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to
send one single frame. Is it OK ?

Back to the problem : tap1 cannot be closed.

Why ? because of refcounts ?

When a socket with inflight tx packets is closed, we dont block the
close, we only delay the socket freeing once all packets were delivered
and freed.



--

From: Jan Kiszka
Date: Tuesday, April 13, 2010 - 9:52 am

I think if that's a real issue, you have to apply traffic shaping to the
untrusted nodes. The existing flow-control scheme was fragile anyway as
you had to translate packet lengths on TX side to packet counts on RX.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--

From: Michael S. Tsirkin
Date: Tuesday, April 13, 2010 - 10:39 am

Yes :) This was always possible. Number of senders needed to flood
a receiver might vary depending on send/recv queue size
that you set. External sources can also fill your RX queue
if you let them. In the end, we need to rely on the scheduler for fairness,


Which is wrong, since this is under userspace control, so you get
unkillable processes.

-- 
MST
--

From: Eric Dumazet
Date: Tuesday, April 13, 2010 - 11:31 am

We do not get unkillable processes, at least with sockets I was thinking
about (TCP/UDP ones).

Maybe tun sockets can behave the same ?

Herbert Acked your patch, so I guess its OK, but I think it can be
dangerous.

Anyway my feeling is that we try to add various mechanisms to keep a
hostile user flooding another one.

For example, UDP got memory accounting quite recently, and we added
socket backlog limits very recently. It was considered not needed few
years ago.



--

From: Michael S. Tsirkin
Date: Tuesday, April 13, 2010 - 1:25 pm

Looks like that's what my patch does: ip_rcv seems to call



--

From: Eric Dumazet
Date: Tuesday, April 13, 2010 - 1:38 pm

Well, I was speaking of tx side, you speak of receiving side.
An external flood (coming from another domain) is another problem.

A sender might flood the 'network' inside our domain. How can we
reasonably limit the sender ?

Maybe the answer is 'We can not', but it should be stated somewhere, so
that someone can address this point later.



--

From: Michael S. Tsirkin
Date: Tuesday, April 13, 2010 - 1:43 pm

And whatever's done should ideally work for tap to IP
and IP to IP sockets as well, not just tap to tap.

-- 
MST
--

From: Herbert Xu
Date: Tuesday, April 13, 2010 - 5:58 pm

The tun socket accounting was never designed to stop it from
flooding another tun interface.  It's there to stop it from
transmitting above a destination interface TX bandwidth and
cause unnecessary packet drops.  It also limits the total amount
of kernel memory that can be pinned down by a single tun interface.

In this case, all we're doing is shifting the accounting from the
"hardware" queue to the qdisc queue.

So your ability to flood a tun interface is essentially unchanged.

BTW we do the same thing in a number of hardware drivers, as well
as virtio-net.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Wednesday, April 14, 2010 - 4:55 am

From: Herbert Xu <herbert@gondor.apana.org.au>

Right.  Although this reminds me about the whole SKB
orphaning on xmit issue that keeps coming back to haunt
us.

If there weren't odd references to the SKB's socket in
the packet scheduler et al. we could just orphan these
things right upon entry to the qdisc and not have to
add hacks like this to every driver.

In fact... maybe we can just do it in dev_hard_queue_xmit()
since we are out of the qdisc at that point.... but I guess
there might be weird drivers that want the SKB socket in
their ->xmit routine...  Ho hum.

In any event that's net-next-2.6 exploratory material, and I've
applied this patch to net-2.6, thanks!
--

From: Michael S. Tsirkin
Date: Wednesday, April 21, 2010 - 4:35 am

This is commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5 in net-2.6
Please cherry-pick this fix in stable kernels (2.6.32 and 2.6.33).

Thanks!

-- 
MST
--

From: Jan Kiszka
Date: Wednesday, April 21, 2010 - 4:46 am

Before I forget again:

Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
(namely the field test of our customer)

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--

From: Michael S. Tsirkin
Date: Wednesday, April 21, 2010 - 4:45 am

From: Greg KH
Date: Wednesday, April 21, 2010 - 12:16 pm

David Miller queues up the patches for the network subsystem for the
stable trees, and then forwards them to me when he feels they are ready.

So I'll defer to him on this one and wait for it to come from him.

thanks,

greg k-h
--

From: Michael S. Tsirkin
Date: Tuesday, September 14, 2010 - 8:20 am

I looked at the macvtap driver and it seems that it should
have the below issue, same as tap.
Arnd?

--

Previous thread: none

Next thread: Re: [PATCH] vhost-net: fix vq_memory_access_ok error checking by Michael S. Tsirkin on Tuesday, April 13, 2010 - 8:01 am. (2 messages)