Re: LRO restructuring?

Previous thread: Re: [PATCH 0/9][RFC] KVM virtio_net performance by Rusty Russell on Monday, August 11, 2008 - 12:44 am. (19 messages)

Next thread: Re: [PATCH diagnostic] Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem by David Witbrodt on Monday, August 11, 2008 - 9:04 am. (2 messages)
From: Andrew Gallatin
Date: Monday, August 11, 2008 - 6:30 am

Hi,

You mentioned in the recent "Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI
initiator" thread that you were planning to restructure LRO to
preserve headers so as to make forwarding possible without totally
disabling LRO.

For lro_receive_frags() based LRO, it would be ideal to locate the
header in place in the frag via the mac_hdr argument to the
get_frag_header() callback.  E.g., I'm hoping that neither the driver
nor the LRO module will need to allocate extra memory per frame and
copy the headers to it in the common case when forwarding is
not enabled.  That would add quite a bit of overhead.
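The "locate the header in place" idea can be sketched in user space as follows. This assumes an untagged Ethernet + IPv4 + TCP frame starting at the beginning of a contiguous fragment buffer. `locate_headers()` is a hypothetical helper, not the kernel's get_frag_header() signature. It only shows that the header pointers can alias the fragment memory, so nothing is allocated or copied per frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: on success returns 0 and sets pointers that point
 * INTO `frag`; no header bytes are copied or allocated. */
static int locate_headers(uint8_t *frag, size_t len,
                          uint8_t **mac_hdr, uint8_t **ip_hdr,
                          uint8_t **tcp_hdr)
{
    const size_t eth_len = 14;               /* untagged Ethernet header */
    size_t ihl;

    if (len < eth_len + 20)
        return -1;
    if (frag[12] != 0x08 || frag[13] != 0x00) /* EtherType 0x0800 = IPv4 */
        return -1;

    ihl = (size_t)(frag[eth_len] & 0x0f) * 4; /* IPv4 header length, bytes */
    if ((frag[eth_len] >> 4) != 4 || ihl < 20 || len < eth_len + ihl + 20)
        return -1;
    if (frag[eth_len + 9] != 6)               /* IP protocol 6 = TCP */
        return -1;

    *mac_hdr = frag;
    *ip_hdr  = frag + eth_len;
    *tcp_hdr = frag + eth_len + ihl;
    return 0;
}
```

The cost in the common (non-forwarding) case is just a few bounds checks and pointer additions.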

With respect to hardware LRO and headers:  Would it be possible
to notify the driver via some sort of callback whether the headers
are required?  I think most hardware LRO implementations are going
to collapse the headers, and having the option to fall back to software
LRO for forwarding might be needed for those devices which will throw
away the intermediate headers.
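One way such a callback could decide between hardware and software LRO, as a sketch only: `struct lro_caps`, the `headers_required` flag and `select_lro_mode()` are hypothetical names, not an existing kernel interface. It also assumes the restructured software LRO keeps enough per-segment header information to forward correctly.

```c
#include <assert.h>
#include <stdbool.h>

enum lro_mode { LRO_OFF, LRO_SOFTWARE, LRO_HARDWARE };

/* Hypothetical per-device capabilities. */
struct lro_caps {
    bool hw_lro;               /* device can aggregate in hardware */
    bool hw_preserves_headers; /* hardware keeps per-segment header info */
    bool sw_lro;               /* driver can use software LRO instead */
};

/* Hypothetical callback body, run when the stack's header requirements
 * change (e.g. forwarding is enabled on the interface). */
static enum lro_mode select_lro_mode(const struct lro_caps *caps,
                                     bool headers_required)
{
    if (caps->hw_lro && (!headers_required || caps->hw_preserves_headers))
        return LRO_HARDWARE;
    if (caps->sw_lro)
        return LRO_SOFTWARE;   /* assumed to preserve header info */
    return LRO_OFF;            /* no safe way to aggregate */
}
```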

Last, have you considered simply allowing "inexact" forwarding, where
the ingress NIC is doing LRO and the egress NIC is doing TSO?  You
lose exact framing information (e.g., what you emit might not be framed
exactly as you receive it), but you can still do filtering, and the
host overhead is very low.
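A small illustration of the lost framing, assuming the egress side simply re-cuts the merged payload at its MSS the way TSO would. The `resegment()` helper and the segment sizes are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Cut `total` payload bytes into segments of at most `mss` bytes, as TSO
 * would on egress. Writes sizes to `out` (stops early if `out` fills up)
 * and returns the number of segments written. */
static size_t resegment(size_t total, size_t mss, size_t *out, size_t max_out)
{
    size_t n = 0;
    while (total > 0 && n < max_out) {
        size_t seg = total < mss ? total : mss;
        out[n++] = seg;
        total -= seg;
    }
    return n;
}
```

For example, segments of 1000, 1448 and 500 bytes merged into 2948 bytes come back out at an MSS of 1448 as 1448, 1448 and 52 bytes: the same data with different segment boundaries.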

Thanks,

Drew
--

From: David Miller
Date: Monday, August 11, 2008 - 2:03 pm

From: Andrew Gallatin <gallatin@myri.com>

Intermediate nodes are not supposed to change the transport layer
checksum if at all possible, especially on routers.

Otherwise it is much more difficult to diagnose checksum errors
and figure out what caused them.

When the router doesn't modify the checksum, we know a bad checksum was
created by an end-node.  Even a firewall only "adjusts" checksums based
upon packet modifications for NAT and such, which preserves errors
created by end-nodes.
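To see why "adjusting" differs from recomputing, here is a sketch using the RFC 1071 ones'-complement checksum and the RFC 1624 incremental update, HC' = ~(~HC + ~m + m'). The three 16-bit words stand in for a real packet; the point is that an incremental update carries a pre-existing checksum error through to the receiver, whereas recomputing from scratch would silently "fix" it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* RFC 1071 checksum over n 16-bit words. */
static uint16_t csum(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    return (uint16_t)~fold(sum);
}

/* RFC 1624 eqn. 3: adjust checksum hc when one word changes m -> m_new. */
static uint16_t csum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~m;
    sum += m_new;
    return (uint16_t)~fold(sum);
}
```

With a correct input checksum the adjusted value matches a full recomputation; with a corrupted one it stays wrong by the same amount, so the end node's error is still visible downstream.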

So no, this isn't really an option.

This is why Herbert wants to preserve the original headers,
we're not supposed to change them.
--

From: Andrew Gallatin
Date: Tuesday, August 12, 2008 - 4:50 am

Indeed.  Nor should they change lengths, or anything else.
Everything about this "inexact" forwarding is illegal as
hell.  However, you have to admit that it is an interesting hack :)

Drew
--

From: Herbert Xu
Date: Tuesday, August 12, 2008 - 7:14 pm

Solutions like this have been deployed.  For instance, many satellite
networks use transparent TCP proxies to mitigate the effect of large
latencies on older TCP stacks that don't have modern congestion
control algorithms.

Surprisingly there are actually very few problems.  The biggest
one (apart from scalability) is with non-TCP traffic masquerading
as TCP such as Cisco's VPN solution.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Herbert Xu
Date: Monday, August 11, 2008 - 5:50 pm

You don't have to save the whole thing, just save enough so we
can easily/exactly reconstruct it on output, i.e., save the lengths.

Cheers,
--

From: David Miller
Date: Monday, August 11, 2008 - 5:54 pm

From: Herbert Xu <herbert@gondor.apana.org.au>

And the checksums :-)  As an intermediate node we don't want
to touch the checksum.

The length and the checksum are two u16 values, which would
fit in a single 32-bit descriptor or something like
that.
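A sketch of such a per-segment record, assuming one 32-bit word per merged segment with the original payload length in the upper 16 bits and the original TCP checksum in the lower 16. The layout and the helper names are illustrative assumptions, not an actual kernel structure. Keeping the lengths lets the output path recreate the original segment boundaries exactly, and the stored checksums can be written back untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: length in bits 31..16, checksum in bits 15..0. */
static uint32_t seg_pack(uint16_t len, uint16_t csum)
{
    return ((uint32_t)len << 16) | csum;
}

static uint16_t seg_len(uint32_t d)  { return (uint16_t)(d >> 16); }
static uint16_t seg_csum(uint32_t d) { return (uint16_t)(d & 0xffff); }

/* Rebuild the start offset of each original segment inside the merged
 * payload from the saved lengths. Returns the total payload length. */
static size_t seg_offsets(const uint32_t *desc, size_t n, size_t *off)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        off[i] = pos;
        pos += seg_len(desc[i]);
    }
    return pos;
}
```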
--

From: Herbert Xu
Date: Monday, August 11, 2008 - 6:00 pm

Yep.

Cheers,
--

From: Rick Jones
Date: Monday, August 11, 2008 - 6:30 pm

Even if it was verified, I think you want to keep the checksums from the
header.  Since an intermediate device isn't supposed to be peeking at
the TCP part anyway, it wouldn't do to drop the segment ourselves; pass
it along to be dropped by the ultimate receiver.  And if there is
something amiss in the verification or the regeneration, we don't want to
introduce silent data corruption.

Likely that also goes for the IP header checksum...

rick jones
--

From: David Miller
Date: Monday, August 11, 2008 - 6:39 pm

From: Rick Jones <rick.jones2@hp.com>

IP header is a little different, intermediate nodes should verify it
(and we do adjust it when decrementing TTL).
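The TTL adjustment is cheap because TTL and protocol share one 16-bit word of the IPv4 header, so the header checksum can be patched incrementally (RFC 1624) instead of recomputed. A minimal sketch, assuming the header is held as ten 16-bit values with byte 2i in the high half of word i, so TTL is the high byte of word 4 and the checksum is word 5. `ip_decrement_ttl()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full RFC 1071 header checksum, skipping word 5 (the checksum field). */
static uint16_t ip_csum(const uint16_t *h, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        if (i != 5)
            sum += h[i];
    return (uint16_t)~fold16(sum);
}

/* Decrement TTL and patch the checksum incrementally (RFC 1624, eqn. 3).
 * The caller must have checked TTL > 1; a forwarder drops it otherwise. */
static void ip_decrement_ttl(uint16_t *h)
{
    uint16_t old = h[4];
    uint16_t new_word = (uint16_t)(old - 0x0100); /* TTL is the high byte */
    uint32_t sum = (uint16_t)~h[5];
    sum += (uint16_t)~old;
    sum += new_word;
    h[4] = new_word;
    h[5] = (uint16_t)~fold16(sum);
}
```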
--

From: Herbert Xu
Date: Monday, August 11, 2008 - 6:53 pm

Well, I wasn't suggesting that it be dropped, but simply that we skip
LRO if the inbound packet fails the checksum check.

But yeah, it's only two bytes so we might as well always have it.
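The gating rule could look like this in a software aggregator, sketched under assumptions: `struct rx_seg`, `struct lro_flow` and `lro_try_merge()` are hypothetical, `csum_ok` stands for whatever verification the driver or NIC already performed, and merging only checks sequence continuity. A segment that fails verification is not merged and is delivered or forwarded unchanged, so the end receiver still gets to drop it; merged segments keep their original length and checksum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 8

struct rx_seg {
    uint32_t seq;     /* TCP sequence number */
    uint16_t len;     /* payload length */
    uint16_t csum;    /* TCP checksum as received */
    bool csum_ok;     /* result of checksum verification */
};

struct lro_flow {
    uint32_t next_seq;
    size_t nsegs;
    uint16_t len[MAX_SEGS];   /* saved per-segment lengths */
    uint16_t csum[MAX_SEGS];  /* saved per-segment checksums (always kept) */
};

/* Returns true if merged; false means deliver the segment as-is. */
static bool lro_try_merge(struct lro_flow *f, const struct rx_seg *s)
{
    if (!s->csum_ok)
        return false;            /* bad checksum: skip LRO, don't drop */
    if (f->nsegs == MAX_SEGS)
        return false;            /* aggregate full; caller flushes it */
    if (f->nsegs > 0 && s->seq != f->next_seq)
        return false;            /* not contiguous with the flow */
    f->len[f->nsegs] = s->len;
    f->csum[f->nsegs] = s->csum;
    f->nsegs++;
    f->next_seq = s->seq + s->len;
    return true;
}
```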

Cheers,
--
