Published on KernelTrap (http://kerneltrap.org)

Linux: Window Scaling on the Internet

By Jeremy
Created Jun 14 2006 - 10:15

Mark Lord reported a problem with the upcoming 2.6.17 kernel being unable to access www.everymac.com [1]. He detailed a series of tests that isolated the problem to a recent changeset titled, "set default max buffers from memory pool size", which goes on to explain, "this patch sets the maximum TCP buffer sizes (available to automatic buffer tuning, not to setsockopt) based on the TCP memory pool size. The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more than 1/128 of the memory pressure threshold." John Heffner explained that somewhere between Mark's machine and the webserver was a broken box that needs to be fixed, "in the meantime, disabling window scaling will work around the problem for you."

Linux creator Linus Torvalds suggested, "well, arguably, we shouldn't necessarily have defaults that use window scaling, or we should have ways to recognize automatically when it doesn't work (which may not be possible). It's not like there aren't broken boxes out there, and it might be better to make the default buffer sizes just be low enough that window scaling simply isn't an issue." David Miller responded by pointing out that window scaling has been enabled by default for a long time, but until now has only used a shift value of 1 or 2, "it is impossible to fill a cross-continental connection without using window scaling. A 64K window is all you get without scaling. Big buffers are absolutely necessary, and as John Heffner showed this need is growing exponentially and not slowing down. 6 megabit downlink is pretty commonplace in the US, and the standard is much higher in well connected countries such as South Korea." He also explained that it's not possible to detect broken boxes and dynamically turn off window scaling after it's been negotiated, "it's immutably active for the entire connection once enabled. Window scaling has been standardized and around for 14 years, RFC1323 was published in May of 1992. How much longer can we wait for it to be deployed properly? :-)"


From: Mark Lord [email blocked]
To: Linux Kernel [email blocked]
Subject: 2.6.17: networking bug??
Date:	Tue, 13 Jun 2006 10:08:51 -0400

Not bloody likely, I suppose.

But with 2.6.17-rc6, I am unable to talk to the webserver at www.everymac.com [2]
and with 2.6.16.18 (configured identically), this works just fine.

This is with a very simple text access: {"telnet www.everymac.com [3] 80", "GET /", "", ""}

Does that site work for anyone else here running 2.6.17-rc6 ??

I've tried it on three different machines (two Pentium-M boxes, and an AMD64 box/kernel),
all with the same results.  NFG with rc6, fine with earlier kernels.  Yes, they are all
behind the same Linux (2.6.16.xx) firewall, but that doesn't seem to bother anything.
Again, just switching to the older 2.6.16.xx kernels (or earlier) works fine.

I'm going insane!  Help!

Kernel .config from one of the machines is attached.


From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 10:26:17 -0400

Here are packet traces from the working (2.6.16) and non-working (2.6.17) kernels.

The differences I see are widely varying "window sizes".
What would cause this?

Here's a partial trace of the working connection 2.6.16.18:

IP silvy.localnet.32776 > zippy.localnet.domain: 50718+ A? www.everymac.com [4]. (34)
IP zippy.localnet.domain > silvy.localnet.32776: 50718 1/5/5 A 216-145-246-23.rev.dls.net (234)
IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: S 2933486277:2933486277(0) win 5840 <mss 1460,sackOK,timestamp 730285 0,nop,wscale 2>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: S 2545625510:2545625510(0) ack 2933486278 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134760199 730285>
IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: P 1:206(205) ack 607 win 32798 <nop,nop,timestamp 134760217 730448>
IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 206 win 1728 <nop,nop,timestamp 730626 134760217>
IP silvy.localnet.32776 > zippy.localnet.domain: 24229+ A? www.everymac.com [5]. (34)
IP zippy.localnet.domain > silvy.localnet.32776: 24229 1/5/5 A 216-145-246-23.rev.dls.net (234)
IP silvy.localnet.56225 > 216-145-246-23.rev.dls.net.www: S 2943511062:2943511062(0) win 5840 <mss 1460,sackOK,timestamp 730932 0,nop,wscale 2>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56225: S 3806049331:3806049331(0) ack 2943511063 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134760264 730932>
IP silvy.localnet.56225 > 216-145-246-23.rev.dls.net.www: . ack 1 win 1460 <nop,nop,timestamp 731095 134760264>
IP silvy.localnet.56225 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 1460 <nop,nop,timestamp 731095 134760264>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56225: P 1:206(205) ack 607 win 32798 <nop,nop,timestamp 134760281 731095>
IP silvy.localnet.56225 > 216-145-246-23.rev.dls.net.www: . ack 206 win 1728 <nop,nop,timestamp 731274 134760281>
IP silvy.localnet.32776 > zippy.localnet.domain: 55754+ A? adserver.kylemedia.com. (40)
IP zippy.localnet.domain > silvy.localnet.32776: 55754 1/5/5 A 216-145-246-23.rev.dls.net (249)
IP silvy.localnet.56226 > 216-145-246-23.rev.dls.net.www: S 2940109661:2940109661(0) win 5840 <mss 1460,sackOK,timestamp 731360 0,nop,wscale 2>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56226: S 388231707:388231707(0) ack 2940109662 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134760306 731360>

And again, from the non-working connection 2.6.17-rc6-git2:

IP silvy.localnet.32770 > zippy.localnet.domain: 44986+ A? www.everymac.com [6]. (34)
IP zippy.localnet.domain > silvy.localnet.32770: 44986 1/5/5 A 216-145-246-23.rev.dls.net (234)
IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: S 3000518105:3000518105(0) win 5840 <mss 1460,sackOK,timestamp 4294759165 0,nop,wscale 6>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: S 3368494549:3368494549(0) ack 3000518106 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134771817 4294759165>
IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: . ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294760162 134771817>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: . ack 607 win 32798 <nop,nop,timestamp 134771918 4294760162>
IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: F 607:607(0) ack 1 win 92 <nop,nop,timestamp 4294770176 134771817>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: . ack 608 win 32798 <nop,nop,timestamp 134772918 4294770176>
IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: F 206:206(0) ack 608 win 32798 <nop,nop,timestamp 134772918 4294770176>
IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: R 3000518713:3000518713(0) win 0

The client machine is "silvy", my firewall/dns box is "zippy",
and 216-145-246-23 is www.everymac.com [7].

The differences begin really early in these traces,
with 2.6.16.18 using a win size of 1460,
and 2.6.17-rc6 using a win size of 92 ???

From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 11:00:12 -0400

Mark Lord wrote:
..
> The differences I see are widely varying "window sizes".
> What would cause this?

This is from (working) 2.6.16.18:

> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: P 1:206(205) ack 607 win 32798 <nop,nop,timestamp 134760217 730448>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 206 win 1728 <nop,nop,timestamp 730626 134760217>

This is from (failing) 2.6.17-rc6-git2:

> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: . ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294760162 134771817>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: . ack 607 win 32798 <nop,nop,timestamp 134771918 4294760162>

Both kernels default to /proc/sys/net/ipv4/tcp_window_scaling == 1,
and 2.6.16.18 works regardless of whether I turn it off/on again.

But 2.6.17-rc6-git2 fails to work with the webserver at www.everymac.com [8]
when /proc/sys/net/ipv4/tcp_window_scaling == 1.
Setting this to 0 "fixes" the problem.

BUG.

From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 11:28:51 -0400

Mmm. I notice that 2.6.17 has a new sysctl related to this stuff:

/proc/sys/net/ipv4/tcp_workaround_signed_windows

It makes no difference whatsoever for me here when varied
while /proc/sys/net/ipv4/tcp_window_scaling==1.

The site www.everymac.com [9] is still not browseable until
setting /proc/sys/net/ipv4/tcp_window_scaling===0.

There's one other difference I see in the tcpdump traces.
The first packets from each trace below show different
values for "wscale". The old (working) kernels use "wscale 2",
whereas 2.6.17 uses "wscale 6". In both cases, the value
seen in /proc/sys/net/ipv4/tcp_adv_win_scale is 2.

This is from (working) 2.6.16.18:
>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: S 2933486277:2933486277(0) win 5840 <mss 1460,sackOK,timestamp 730285 0,nop,wscale 2>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: S 2545625510:2545625510(0) ack 2933486278 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134760199 730285>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: P 1:206(205) ack 607 win 32798 <nop,nop,timestamp 134760217 730448>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 206 win 1728 <nop,nop,timestamp 730626 134760217>

This is from (failing) 2.6.17-rc6-git2:
>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: S 3000518105:3000518105(0) win 5840 <mss 1460,sackOK,timestamp 4294759165 0,nop,wscale 6>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: S 3368494549:3368494549(0) ack 3000518106 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134771817 4294759165>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: . ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294760162 134771817>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: . ack 607 win 32798 <nop,nop,timestamp 134771918 4294760162>

Something is broken somewhere.

From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 12:58:05 -0400

..
> The site www.everymac.com [10] is still not browseable until
> setting /proc/sys/net/ipv4/tcp_window_scaling===0.
>
> There's one other difference I see in the tcpdump traces.
> The first packets from each trace below show different
> values for "wscale". The old (working) kernels use "wscale 2",
> whereas 2.6.17 uses "wscale 6". In both cases, the value
> seen in /proc/sys/net/ipv4/tcp_adv_win_scale is 2.

Okay. More progress here. The calculation of the "wscale" values
is based on the "tcp_rmem" sysctl numbers.

The defaults for these *differ* between 2.6.16.18 and 2.6.17-rc*.

    2.6.16: 4096 87380 174760
    2.6.17: 4096 87380 2097152

If I change the tcp_rmem setting on 2.6.17 to match the old value,
then the website www.everymac.com [11] becomes accessible again:

    echo 4096 87380 174760 > /proc/sys/net/ipv4/tcp_rmem

Looking at diffs between 2.6.16 and 2.6.17, I see a big rework
of the tcp_rmem code in linux/net/ipv4/tcp.c

Looks like something got broken there, or possibly the wscale
calculations have a bug that is only triggered by the new rmem values ??

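The two wscale values in the traces are consistent with picking the smallest shift that lets the largest receive buffer fit in TCP's 16-bit window field. The following is a minimal Python sketch of that idea, not the kernel's code: the function name `smallest_wscale` is hypothetical, and the cap of 14 is the largest shift RFC 1323 allows.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value the 16-bit TCP window field can hold
MAX_WSCALE = 14              # largest shift count permitted by RFC 1323


def smallest_wscale(max_rcvbuf: int) -> int:
    """Return the smallest shift that lets max_rcvbuf fit in 16 bits."""
    space = max_rcvbuf
    wscale = 0
    # Halve the buffer size until it fits, counting the shifts needed.
    while space > MAX_UNSCALED_WINDOW and wscale < MAX_WSCALE:
        space >>= 1
        wscale += 1
    return wscale


# Third field of tcp_rmem as reported in the thread.
print(smallest_wscale(174760))   # 2.6.16 maximum
print(smallest_wscale(2097152))  # 2.6.17-rc6 maximum on Mark Lord's machine
```

Under these assumptions the sketch yields 2 for the 2.6.16 maximum and 6 for the 2.6.17 maximum, the same values that appear as "wscale 2" and "wscale 6" in the SYN packets above.
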
From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 13:22:35 -0400

Mark Lord wrote:
> ..
>> The site www.everymac.com [12] is still not browseable until
>> setting /proc/sys/net/ipv4/tcp_window_scaling===0.
>>
>> There's one other difference I see in the tcpdump traces.
>> The first packets from each trace below show different
>> values for "wscale". The old (working) kernels use "wscale 2",
>> whereas 2.6.17 uses "wscale 6". In both cases, the value
>> seen in /proc/sys/net/ipv4/tcp_adv_win_scale is 2.
>
> Okay. More progress here. The calculation of the "wscale" values
> is based on the "tcp_rmem" sysctl numbers.
>
> The defaults for these *differ* between 2.6.16.18 and 2.6.17-rc*.
>
> 2.6.16: 4096 87380 174760
> 2.6.17: 4096 87380 2097152
>
> If I change the tcp_rmem setting on 2.6.17 to match the old value,
> then the website www.everymac.com [13] becomes accessible again:
>
> echo 4096 87380 174760 > /proc/sys/net/ipv4/tcp_rmem
>
> Looking at diffs between 2.6.16 and 2.6.17, I see a big rework
> of the tcp_rmem code in linux/net/ipv4/tcp.c
>
> Looks like something got broken there, or possibly the wscale
> calculations have a bug that is only triggered by the new rmem values ??
>

Okay, here's the blob that broke it.

> [TCP]: Set default max buffers from memory pool size
> author John Heffner [email blocked]
> Sat, 25 Mar 2006 09:34:07 +0000 (01:34 -0800)
> committer David S. Miller [email blocked]
> Sat, 25 Mar 2006 09:34:07 +0000 (01:34 -0800)
> commit 7b4f4b5ebceab67ce440a61081a69f0265e17c2a
> tree ac02c685ce23f2440fecbebaa5b55cd47947c03e tree
> parent 2babf9daae4a3561f3264638a22ac7d0b14a6f52 commit | commitdiff
> [TCP]: Set default max buffers from memory pool size
>
> This patch sets the maximum TCP buffer sizes (available to automatic
> buffer tuning, not to setsockopt) based on the TCP memory pool size.
> The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
> than 1/128 of the memory pressure threshold.
>
> Signed-off-by: John Heffner [email blocked]
> Signed-off-by: David S. Miller [email blocked]

John / David: Any ideas on what's gone awry here?

From: John Heffner [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 13:39:42 -0400

Mark Lord wrote:
> John / David: Any ideas on what's gone awry here?
>
>

Yes, you have some sort of a broken middlebox in your path (firewall,
transparent proxy, or similar) that doesn't correctly handle window
scaling. Check out this thread:
<http://marc.theaimsgroup.com/?l=linux-netdev&m=114478312100641&w=2> [14].

The best thing you can do is try to find this broken box and inform its
owner that it needs to be fixed. (If you can find out what it is, I'd
be interested to know.) In the meantime, disabling window scaling will
work around the problem for you.

-John

From: Linus Torvalds [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 10:50:51 -0700 (PDT)

On Tue, 13 Jun 2006, John Heffner wrote:
>
> The best thing you can do is try to find this broken box and inform its owner
> that it needs to be fixed. (If you can find out what it is, I'd be interested
> to know.) In the meantime, disabling window scaling will work around the
> problem for you.

Well, arguably, we shouldn't necessarily have defaults that use window
scaling, or we should have ways to recognize automatically when it
doesn't work (which may not be possible).

It's not like there aren't broken boxes out there, and it might be better
to make the default buffer sizes just be low enough that window scaling
simply isn't an issue.

I suspect that the people who really want/need window scaling know about
it, and could be assumed to know enough to raise their limits, no?

Linus

From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 14:26:12 -0400

Linus Torvalds wrote:
>
> On Tue, 13 Jun 2006, John Heffner wrote:
>> The best thing you can do is try to find this broken box and inform its owner
>> that it needs to be fixed. (If you can find out what it is, I'd be interested
>> to know.) In the meantime, disabling window scaling will work around the
>> problem for you.
>
> Well, arguably, we shouldn't necessarily have defaults that use window
> scaling, or we should have ways to recognize automatically when it
> doesn't work (which may not be possible).
>
> It's not like there aren't broken boxes out there, and it might be better
> to make the default buffer sizes just be low enough that window scaling
> simply isn't an issue.
>
> I suspect that the people who really want/need window scaling know about
> it, and could be assumed to know enough to raise their limits, no?

Agreed. It's taken me over a month here to realize that the particular
webserver in question (www.everymac.com [15]) wasn't "dead", but merely being
blocked by my 2.6.17 kernel. All was fine with 2.6.16, as I discovered
today.

I wonder how many other "dead sites" there are out there,
that will be shut off from people when they "upgrade" to 2.6.17 ?

I'm a kernel hacker. Most users of 2.6.17 will not be.
The default should be something that works "by default".

Cheers

From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 15:08:59 -0400

Mark Lord wrote:
> Linus Torvalds wrote:
>
>> It's not like there aren't broken boxes out there, and it might be
>> better to make the default buffer sizes just be low enough that window
>> scaling simply isn't an issue.
>>
>> I suspect that the people who really want/need window scaling know
>> about it, and could be assumed to know enough to raise their limits, no?
>
> Agreed. It's taken me over a month here to realize that the particular
> webserver in question (www.everymac.com [16]) wasn't "dead", but merely being
> blocked by my 2.6.17 kernel. All was fine with 2.6.16, as I discovered
> today.
>
> I wonder how many other "dead sites" there are out there,
> that will be shut off from people when they "upgrade" to 2.6.17 ?
>
> I'm a kernel hacker. Most users of 2.6.17 will not be.
> The default should be something that works "by default".

Further to this, the current behaviour is badly unpredictable.

A machine could be working perfectly, not (noticeably) affected
by this bug. And then the user adds another stick of RAM to it.
Poof.. many sites from the internet stop responding.

Obviously the RAM upgrade broke things.. must be bad RAM, right?

Err.. no, the networking stack simply decided to become incompatible
with certain sites, as a result of the user adding more RAM to their
machine.

BbD.

From: David Miller [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 14:26:03 -0700 (PDT)

From: Mark Lord [email blocked]
Date: Tue, 13 Jun 2006 15:08:59 -0400

> Err.. no, the networking stack simply decided to become incompatible
> with certain sites, as a result of the user adding more RAM to their
> machine.

Let's discuss some facts.

First, you are getting window scaling by default with the older
kernel too. It's just a smaller window scale, using a shift
value of say 1 or 2.

What these broken middle boxes do is ignore the window scale
entirely.

So they don't apply a window scale to the advertised windows in each
packet. Therefore, they think a smaller amount of window space is
being advertised than really is. So they will silently drop packets
they think is outside of this bogus window they've calculated.

Now, when the window scale is smaller, the connection can still limp
along, albeit slowly, making forward progress even in the face of such
broken devices because half or a quarter of the window is still
available. It will retransmit a lot, and the congestion window won't
grow at all.

When the window scale is larger, this middle box bug makes it such
that not even one packet can fit into the miscalculated window and
things wedge. The box thinks that your window is "94" instead of
"94 << WINDOW_SCALE".

I think OpenBSD's claim (they did have the bug and probably still do
for all that I know) was that they wanted to make their firewalling
"stateless". This is a bogus argument because by definition you
cannot interpret the TCP window without having seen the initial
connection startup where the parameters are negotiated, and in
particular the window scale which will be used.

And you want to say we should try to work around systems designed by
people who think this is ok? :-)

It is impossible to fill a cross-continental connection without using
window scaling. A 64K window is all you get without scaling. Big
buffers are absolutely necessary, and as John Heffner showed this need
is growing exponentially and not slowing down. 6 megabit downlink is
pretty commonplace in the US, and the standard is much higher in well
connected countries such as South Korea.

Also, as John Heffner mentioned, even if we could detect the broken
boxes you can't just "turn off window scaling" after it's been
negotiated. It's immutably active for the entire connection once
enabled.

Window scaling has been standardized and around for 14 years, RFC1323
was published in May of 1992. How much longer can we wait for it to
be deployed properly? :-)

So the broken boxes, which to be honest are few and far between these
days, need to go, they really do.

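To make the failure mode concrete, here is a small Python sketch of how a box that ignores the negotiated shift would misread the client's advertised window. The helper names are hypothetical, the numbers are taken from Mark Lord's traces earlier in the thread, and the simple "does the segment fit" test is an assumed simplification of whatever such a middlebox really does.

```python
def real_window(win_field: int, wscale: int) -> int:
    """Window the sender actually advertised: the 16-bit field shifted left."""
    return win_field << wscale


def window_seen_by_broken_box(win_field: int, wscale: int) -> int:
    """A box that ignores window scaling takes the raw field at face value."""
    return win_field


SEGMENT_LEN = 205  # size of the server's first data segment in the traces

# (kernel, raw "win" field after the handshake, client's wscale), from the traces
cases = [("2.6.16.18", 1460, 2), ("2.6.17-rc6", 92, 6)]

for kernel, win_field, wscale in cases:
    actual = real_window(win_field, wscale)
    misread = window_seen_by_broken_box(win_field, wscale)
    verdict = "passes" if SEGMENT_LEN <= misread else "is dropped"
    print(f"{kernel}: real window {actual}, box sees {misread}, "
          f"{SEGMENT_LEN}-byte segment {verdict}")
```

Both kernels advertise almost the same real window here (5840 and 5888 bytes). Only the encoding differs, which is why the larger shift leaves a scale-ignoring box with a window too small for even one reply segment.
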
From: Mark Lord [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 17:49:21 -0400

David Miller wrote:
>..
> First, you are getting window scaling by default with the older
> kernel too. It's just a smaller window scale, using a shift
> value of say 1 or 2.
>
> What these broken middle boxes do is ignore the window scale
> entirely.
>
> So they don't apply a window scale to the advertised windows in each
> packet. Therefore, they think a smaller amount of window space is
> being advertised than really is. So they will silently drop packets
> they think is outside of this bogus window they've calculated.
>
> Now, when the window scale is smaller, the connection can still limp
> along, albeit slowly, making forward progress even in the face of such
> broken devices because half or a quarter of the window is still
> available. It will retransmit a lot, and the congestion window won't
> grow at all.
>
> When the window scale is larger, this middle box bug makes it such
> that not even one packet can fit into the miscalculated window and
> things wedge. The box thinks that your window is "94" instead of
> "94 << WINDOW_SCALE".
..

Unilaterally following the standard is all well and good
for those who know how to get around it when a site becomes
inaccessible, but not for Joe User.

If it always fails, or always works, that's not such a big problem.
I would never have complained if I had never been able to access
the web sites in question. But since it IS working in 2.6.16,
and got broken in 2.6.17, I'm bloody well going to complain.

I suppose the most important objection to our current behaviour
is that this behaviour *changes* when something totally unrelated
(to Joe User) happens: adding or removing a stick of RAM.

So I'm not against the window scaling, just against it's apparent
randomness (to the vast majority who are not "in the know").

We should perhaps just have a fixed upper memory setting, as we
currently do in 2.6.16, so that the behaviour is predictable.

On a related note.. I wonder if we can choose better values for
the window size, so that if the scale factor is ignored, we still
end up with reasonably sized packets? So that the other box
will not think our window is a mere "94" when the scale factor
is lost?

-ml

From: Rick Jones <rick.jones2@hp.com>
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 15:12:31 -0700

Mark

From everything I have read so far (which admittedly hasn't been
everything) it sounds like the firewall in question was a ticking
timebomb. If 2.6.17 hadn't set it off, something else might very well
have done so. Or, if you prefer another metaphore, 2.6.17 was simply
the last in a series of straws on the back of the camel what was the
firewall. Meta issues of whether or not the camel that is firewalls
should have ever been allowed to poke its nose in the Internet Tent
notwithstanding :)

At the very least, the firewall, if it is going to be "stateless," has
to strip the window scaling option from the SYN's that go past.
Otherwise, I would be inclined to agree with David that the firewall is
fundamentally broken.

rick jones

From: David Miller [email blocked]
Subject: Re: 2.6.17: networking bug??
Date: Tue, 13 Jun 2006 15:23:01 -0700 (PDT)

From: Mark Lord [email blocked]
Date: Tue, 13 Jun 2006 17:49:21 -0400

> I suppose the most important objection to our current behaviour
> is that this behaviour *changes* when something totally unrelated
> (to Joe User) happens: adding or removing a stick of RAM.

We are pretty much required to choose the TCP memory parameters based
upon how much physical memory is in the machine, and these parameters
in-turn are inextricably linked to what kind of window scale we try to
use for connections.

The behavior is unfortunate, but more unfortunate are the boxes that
create these problems in the first place. I believe their lifespan is
quite limited.

> We should perhaps just have a fixed upper memory setting, as we
> currently do in 2.6.16, so that the behaviour is predictable.

The change in 2.6.17 was exactly that we needed to increase this upper
limit to ~4MB.

> On a related note.. I wonder if we can choose better values for
> the window size, so that if the scale factor is ignored, we still
> end up with reasonably sized packets? So that the other box
> will not think our window is a mere "94" when the scale factor
> is lost?

We have an algorithm that tries to pick something based upon the set
of the values we might need to represent in the window field. If the
scale is too high, you lose accuracy, since the lower bits get chopped
off when the TCP header is being built and the computed window size is
shifted down.

So we try to pick the smallest scale necessary to represent the
largest window size we might end up needing to advertise.

A complication here is that we dynamically size both receive and send
buffers in response to our growing knowledge of the connection's
characteristics over time. So at the beginning we'll use a small
buffer size, and as the congestion window grows we'll increase our
buffer sizes to fill the pipe. This adds even more considerations for
window scale selection, as you can imagine.

One final word about window sizes. If you have a connection whose
bandwidth-delay-product needs an N byte buffer to fill, you actually
have to have an "N * 2" sized buffer available in order for fast
retransmit to work.
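
The arithmetic behind the 64K limit and the "N * 2" remark can be sketched in a few lines of Python. The 100 ms round-trip time is an assumed figure for illustration and does not come from the thread; the 6 megabit rate is the one David Miller mentions, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def max_throughput_bits_per_s(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms


def buffer_for_fast_retransmit(bdp: int) -> int:
    """Buffer size following the "N * 2" rule of thumb from the thread."""
    return 2 * bdp


RTT_MS = 100  # assumed round-trip time, for illustration only

n = bdp_bytes(6_000_000, RTT_MS)
print("window needed to fill a 6 Mbit/s link:", n, "bytes")
print("buffer wanted for fast retransmit:", buffer_for_fast_retransmit(n), "bytes")
print("ceiling with an unscaled 65535-byte window:",
      max_throughput_bits_per_s(65535, RTT_MS), "bit/s")
```

With these assumed numbers the link needs 75000 bytes in flight, which already exceeds the 65535 bytes an unscaled window can advertise, and an unscaled window tops out a little above 5.2 Mbit/s.
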



Source URL:
http://kerneltrap.org/node/6723