Re: csum offload and af_packet

Previous thread: [PATCH net-2.6 2/2] ipv6: ip6_route.h cleanup. by Rami Rosen on Sunday, August 10, 2008 - 11:08 pm. (2 messages)

Next thread: LRO restructuring? by Andrew Gallatin on Monday, August 11, 2008 - 6:30 am. (10 messages)
From: Rusty Russell
Date: Monday, August 11, 2008 - 12:44 am

This is, from one point of view, a dhclient bug.  It opens a packet socket, 
and we'd tell it that the checksum is incomplete via auxdata, but it doesn't 
have auxdata.

On the other hand, perhaps we should copy the packet and checksum it in this 
case?  Not many devices input packets with partial csums, so dhclient doesn't 
normally hit this.  Xen might though?

ie, in net/packet/af_packet.c's packet_recvmsg():

	if (pkt_sk(sk)->auxdata) {
		struct tpacket_auxdata aux;

		aux.tp_status = TP_STATUS_USER;
		if (skb->ip_summed == CHECKSUM_PARTIAL)
			aux.tp_status |= TP_STATUS_CSUMNOTREADY;
		aux.tp_len = PACKET_SKB_CB(skb)->origlen;
		aux.tp_snaplen = skb->len;
		aux.tp_mac = 0;
		aux.tp_net = skb_network_offset(skb);
		aux.tp_vlan_tci = skb->vlan_tci;

		put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
	}

add an "else if (skb->ip_summed == CHECKSUM_PARTIAL)" case...

Rusty.
--

From: Herbert Xu
Date: Monday, August 11, 2008 - 2:51 am

If you're going to hack the guest kernel, then you might as well
fix the guest DHCP client.  However, we should default checksum
offload to off until we know that the client is capable of handling
it (to handle guests that haven't been fixed).

One easy way of doing this is to hook up the rx checksum offload
option in the guest with the tx offload option in the host.  In
other words, when rx offload is enabled in the guest we enable
tx offload in the host, and disable it vice versa.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Rusty Russell
Date: Monday, August 11, 2008 - 6:50 am

I think this is deeper than that.  This case is actually unusual, in that the 
packet really does arrive with a partial csum.  But usually, we're exposing 
an internal detail of our stack at this point.  Seems like we shouldn't if we 

We can trivially disable it in the guest or host; that's not the problem.  We 
can even disable csum offload just for UDP in the host.  But should we 
really?

Rusty.
--

From: Herbert Xu
Date: Monday, August 11, 2008 - 5:32 pm

I disagree.  If you're using AF_PACKET you're asking to see the
bare details.  If you want to see the censored version you can

It's not about disabling it, it's about enabling it dynamically
once guest user-space is sure that *it* can handle this.

Cheers,`
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Monday, August 11, 2008 - 5:51 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

Unfortunately the censored version doesn't allow you to get at the
link level headers, which is what at least some of these applications
want.
--

From: Herbert Xu
Date: Monday, August 11, 2008 - 5:58 pm

Yes that is certainly true for the DHCP server, but Rusty was
complaining about the DHCP client which certainly does not need
the LL headers on reception (although using it is definitely more
convenient since you need it on transmit anyway).

I agree that handling this in AF_PACKET is certainly possible,
and for that matter not extremely difficult.  However, my point is
that doing this for the purposes of virtualisation is completely
pointless.

The only time you need this is when you have an old guest that
you cannot modify.  If you could modify it you can always give it
a patched DHCP client (I've alread patched dhcp-client for Xen
so that should work properly for KVM/lguest if you're using the
latest version).

And if you can't modify it then patching AF_PACKET is completely
pointless since we need to do this in the guest kernel and not the
host kernel.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Ingo Oeser
Date: Tuesday, August 12, 2008 - 9:17 am

Are you talking about modifying the KVM client image?
There may be reasons, why this is impossible or at least highly undesired.

Before virtualisation developers of embedded stuff need to seal away
their whole machines for their development and test environment 
together with sample hardware.

Now using virtualisation they can at least virtualize their development
and test environment, save lots of cost and not risking that their sealed 
away hardware will not start again, when they need to fix a critical bug 
10-20 years later.

Now try to have them change the KVM client image :-)

Extreme special case, not worth optimizing for, I know. But that special
case is a legal one like any other special case we care about now.


Best Regards

Ingo Oeser
--

From: Herbert Xu
Date: Tuesday, August 12, 2008 - 4:37 pm

Oh I totally agree that there are lots of scenarios where you
want to have an unmolested guest image.  My point was that if
you're going to touch the guest kernel anyway you might as well
fix the guest user-space instead.

This is also why I've argued that the default should be to disable
TX checksums until the guest enables it so that old guests that
know nothing about this can continue to work.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Tuesday, August 12, 2008 - 5:55 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

However, I think Rusty may have a point.

The whole point of the auxdata thing was that if your tool is smart
you can get the notification that the checksum is going to be HW
computed and therefore keep capturing at a very low cost.

But for a stupid AF_PACKET user, like dhclient currently is, there is
no harm in COW'ing the packet header area and calculating the checksum
for them in this case.

I kind of hope that libpcap/tcpdump has been taught about auxdata
by now, has it? :-/

--

From: Herbert Xu
Date: Tuesday, August 12, 2008 - 6:09 pm

Right I'm not against adding this to AF_PACKET.

It's just that the virtualisation problem that started all this
is just as easily resolved by user-space.  In fact the dhcp-client
code in RHEL/Fedora has been patched to do exactly this a long time

Unfortunately no.  This also means that if we do do this in AF_PACKET,
then running a packet dump on a fast outbound interface may become more
expensive.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Tuesday, August 12, 2008 - 6:17 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

It seems pointless for us to add features like mmap() support and
auxdata if the most important client library and tool users of these
features never take it up.

libpcap and tcpdump have been notoriously difficult in this area.
--

From: Herbert Xu
Date: Tuesday, August 12, 2008 - 6:21 pm

Well auxdata at least is used by dhcp-common and therefore dhcp-client

Yes the disconnect between the kernel community and the rest is
large.  The only successful examples seems to be where the user-
space code is maintained by a kernel developer, e.g., iproute.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Tuesday, August 12, 2008 - 6:25 pm

From: Herbert Xu <herbert@gondor.apana.org.au>


Isn't that the truth...

Alexey was maintaining his own semi-fork of libpcap and tcpdump for
years for the same reasons.

Let's write a real packet capturing tool that will be fast and that we
can add quick support for new kernel features to.

I mean, what other alternative is there?  The pcap/tcpdump folks have
had 10+ years to get their act in gear.
--

From: Herbert Xu
Date: Tuesday, August 12, 2008 - 6:37 pm

He had a whole suite of things, some of which are now finally

Maybe we could start from Alexey's code base.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Patrick McHardy
Date: Wednesday, August 13, 2008 - 4:26 am

Actually the current CVS version understands auxdata and uses it to
reconstruct VLAN headers. CVS seems to be broken though, I can send
you a tarball with the current version if you want.
--

From: David Miller
Date: Sunday, August 17, 2008 - 4:08 pm

From: Patrick McHardy <kaber@trash.net>

At first I thought you were saying that the code in their current
CVS is broken, but now I understand that CVS itself is broken for
this project.

That's rediculious.

I'll see if I can put some time together to try and put some trees
together, please send me what you have Patrick, thanks.

If I'm successful I'll put GIT trees up on kernel.org for tcpdump
and libpcap which we can hack on.

Thanks guys.
--

From: Herbert Xu
Date: Sunday, August 17, 2008 - 6:10 pm

That's awesome! Thanks Dave!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: David Miller
Date: Sunday, August 17, 2008 - 6:12 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

There is a libpcap.git up in my dir on kernel.org:

	master.kernel.org:/pub/scm/linux/kernel/git/davem/libpcap.git

It's just a bare 0.9.8 import plus a commit to get rid of all of the
rcsid strings.
--

From: Rusty Russell
Date: Monday, August 11, 2008 - 7:27 pm

Then should we insist the user set PACKET_AUXDATA?  Even then, the format of 
that cmsg will have to be enhanced as we change kernel internals.  Which is 
probably why you *don't* get to see the bare details: you get a flag 
saying "oh, I know the checksum is bad".  Without the csum_start/csum_offset 
fields you can't even calculate what it will be.

The dhcp client thing is a symptom which can be fixed, but are we doing the 
right thing?  (Tho for lguest this is a new problem with the current kernel, 

Oh, I see.  I'd have to think harder; I'm not sure if we have all the pieces 
at the moment for virtio or would need a boutique mechanism for this (usually 
the host doesn't change the features it offers, even if device resets).  May 
be easier to suppress csum offload for dhcp packets in the host...

Rusty.
--

Previous thread: [PATCH net-2.6 2/2] ipv6: ip6_route.h cleanup. by Rami Rosen on Sunday, August 10, 2008 - 11:08 pm. (2 messages)

Next thread: LRO restructuring? by Andrew Gallatin on Monday, August 11, 2008 - 6:30 am. (10 messages)