Hi,

You mentioned in the recent "Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator" thread that you were planning to restructure LRO to preserve headers, so as to make forwarding possible without totally disabling LRO.

For lro_receive_frags() based LRO, it would be ideal to locate the header in place in the frag via the mac_hdr argument to the get_frag_header() callback. E.g., I'm hoping that neither the driver nor the LRO module will need to allocate extra memory per frame and copy the headers to it in the common case when forwarding is not enabled. That would add quite a bit of overhead.

With respect to hardware LRO and headers: would it be possible to notify the driver via some sort of callback whether the headers are required? I think most hardware LRO implementations are going to collapse the headers, and having the option to fall back to software LRO for forwarding might be needed for those devices which will throw away the intermediate headers.

Last, have you considered simply allowing "inexact" forwarding, where the ingress NIC is doing LRO and the egress NIC is doing TSO? You lose exact framing information (e.g., what you emit might not be framed exactly as you receive it), but you can still do filtering, and the host overhead is very low.

Thanks,
Drew
--
From: Andrew Gallatin <gallatin@myri.com> Intermediate nodes are not supposed to change the transport layer checksum if at all possible, especially on routers. Otherwise it is much more difficult to diagnose checksum errors and figure out what caused such an error. When the router doesn't modify the checksum, we know the error originated at an end-node. Even a firewall only "adjusts" checksums based upon packet modifications for NAT and such, which will preserve end-node created errors. So no, this isn't really an option. This is why Herbert wants to preserve the original headers: we're not supposed to change them. --
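The claim that a NAT-style "adjustment" preserves end-node errors can be checked with the incremental update from RFC 1624. The following is a minimal userspace sketch, not kernel code; the helper names (fold16, csum_full, csum_adjust, csum_ok) are made up for illustration and the data is treated as 16-bit words in host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (checksum field excluded). */
static uint16_t csum_full(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    assert(w != NULL);
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    return (uint16_t)~fold16(sum);
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m').
 * Only the delta between old_word and new_word is applied, so any
 * error already present in csum is carried through unchanged. */
static uint16_t csum_adjust(uint16_t csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~csum;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}

/* Receiver-side check: data plus stored checksum must sum to 0xffff. */
static int csum_ok(const uint16_t *w, size_t n, uint16_t stored)
{
    uint32_t sum = stored;

    assert(w != NULL);
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    return fold16(sum) == 0xffffu;
}
```

If a segment arrives with a bad checksum and one word is rewritten, csum_adjust() yields a checksum that still fails csum_ok() at the receiver, whereas calling csum_full() on the rewritten data would silently produce a valid checksum and hide the original error.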
Indeed. Nor should they change lengths, or anything else. Everything about this "inexact" forwarding is illegal as hell. However, you have to admit that it is an interesting hack :) Drew --
Solutions like this have been deployed. For instance, many satellite networks use transparent TCP proxies to mitigate the effect of large latencies on older TCP stacks that don't have modern congestion control algorithms. Surprisingly there are actually very few problems. The biggest one (apart from scalability) is with non-TCP traffic masquerading as TCP such as Cisco's VPN solution. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
You don't have to save the whole thing, just save enough so we can easily/exactly reconstruct it on output, i.e., save the lengths. Cheers, Herbert Xu --
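As a rough illustration of "save the lengths": if the aggregation step records each original segment's payload length, the output path can cut the merged payload back at exactly the original boundaries. This is a userspace sketch under that assumption; struct seg_view and resegment() are hypothetical names, not part of the LRO code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one original segment inside the merged payload. */
struct seg_view {
    size_t offset;  /* start of this segment's payload in the merged buffer */
    uint16_t len;   /* payload length recorded when the segment was merged */
};

/*
 * Walk the saved lengths and compute where each original segment starts.
 * Returns the number of segments written to out, or -1 if out is too
 * small or the saved lengths do not add up exactly to merged_len.
 */
static int resegment(const uint16_t *saved_len, size_t nsegs,
                     size_t merged_len, struct seg_view *out, size_t out_cap)
{
    size_t off = 0;

    assert(saved_len != NULL && out != NULL);
    if (nsegs > out_cap)
        return -1;
    for (size_t i = 0; i < nsegs; i++) {
        if (saved_len[i] > merged_len - off)
            return -1;  /* lengths overrun the merged payload */
        out[i].offset = off;
        out[i].len = saved_len[i];
        off += saved_len[i];
    }
    return off == merged_len ? (int)nsegs : -1;
}
```

For example, saved lengths of 1448, 1448 and 512 over a 3408-byte merged payload give segments starting at offsets 0, 1448 and 2896, which is the framing the sender originally used.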
From: Herbert Xu <herbert@gondor.apana.org.au> And the checksums :-) As an intermediate node we don't want to touch the checksum. The length and the checksum are two u16 values, which would be able to fit in a single 32-bit descriptor or something like that. --
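A sketch of what such a 32-bit per-segment descriptor might look like, assuming the length goes in the high half and the original checksum in the low half. The layout and the names (seg_desc_t, seg_desc_pack, and so on) are illustrative assumptions only; nothing in the thread fixes them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-segment record: one 32-bit word per merged segment. */
typedef uint32_t seg_desc_t;

/* Pack the original segment length (high 16 bits) and the original,
 * untouched transport checksum (low 16 bits) into one word. */
static inline seg_desc_t seg_desc_pack(uint16_t len, uint16_t csum)
{
    return ((uint32_t)len << 16) | csum;
}

/* Recover the original segment length. */
static inline uint16_t seg_desc_len(seg_desc_t d)
{
    return (uint16_t)(d >> 16);
}

/* Recover the original checksum exactly as it arrived. */
static inline uint16_t seg_desc_csum(seg_desc_t d)
{
    return (uint16_t)(d & 0xffffu);
}

/* Round-trip sanity check for the packing above. */
static inline void seg_desc_selftest(void)
{
    seg_desc_t d = seg_desc_pack(1448, 0xbeef);

    assert(seg_desc_len(d) == 1448);
    assert(seg_desc_csum(d) == 0xbeef);
}
```

At four bytes per merged segment the bookkeeping stays small compared with copying whole headers aside.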
Yep. Cheers, Herbert Xu --
Even if it was verified I think you want to keep the checksums from the header. Since an intermediate device isn't supposed to be peeking at the TCP part anyway, it wouldn't do to drop the segment ourselves; pass it along to be dropped by the ultimate receiver. And if there is something amiss in the verification or the regeneration, we don't want to introduce silent data corruption. Likely that also goes for the IP header checksum... rick jones --
From: Rick Jones <rick.jones2@hp.com> IP header is a little different, intermediate nodes should verify it (and we do adjust it when decrementing TTL). --
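The verify-then-adjust behaviour for the IP header can be sketched as follows. This is a simplified userspace model, not the kernel's forwarding path: the header is assumed to be a 20-byte IPv4 header without options, held as ten 16-bit words in host order (word 4 is TTL and protocol, word 5 is the checksum), and the function names are hypothetical. The adjustment uses the RFC 1624 incremental update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over nwords 16-bit words. Run over a header
 * that includes a correct checksum field, the result is 0. */
static uint16_t ip_csum(const uint16_t *w, size_t nwords)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < nwords; i++)
        sum += w[i];
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Verify the header checksum, then decrement TTL and adjust the checksum.
 * Returns 0 on success, -1 if the checksum is bad or the TTL would expire. */
static int ip_forward_ttl(uint16_t hdr[10])
{
    uint16_t old_word;

    assert(hdr != NULL);
    if (ip_csum(hdr, 10) != 0)
        return -1;                      /* header checksum failed */
    if ((hdr[4] >> 8) <= 1)
        return -1;                      /* TTL would reach zero */
    old_word = hdr[4];
    hdr[4] = (uint16_t)(old_word - 0x0100);  /* TTL is the high byte */
    hdr[5] = csum_update16(hdr[5], old_word, hdr[4]);
    return 0;
}
```

Unlike the transport checksum, the IP header checksum is checked here before anything is touched, so a header that arrives damaged is rejected rather than adjusted.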
Well I wasn't suggesting that it be dropped, but simply skip LRO if the inbound packet fails the checksum check. But yeah, it's only two bytes so we might as well always have it. Cheers, Herbert Xu --
