Re: [PATCH net-next 13/17] tcp: kill eff_sacks "cache", the sole user can calculate itself

Previous thread: net-next-2.6 [Patch 0/2] dccp: Fix issues in setting the MPS by Gerrit Renker on Saturday, February 28, 2009 - 1:38 am. (5 messages)

Next thread: none
From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

Hi,

The first one is the retrans leak fix for net-2.6, rest for
net-next (IMHO).

The series includes two fixes which are a result of my adventure
to measure large window loss recovery performance. I still have
some work to do in that area but the simplest in-order behaviors
now easily scales beyond 1GbE even on a low-end machine :-).

--
 i.



--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

There's conflicting assumptions in shifting, the caller assumes
that dupsack results in S'ed skbs (or a part of it) for sure but
never gave a hint to tcp_sacktag_one when dsack is actually in
use. Thus DSACK retrans_out -= pcount was not taken and the
counter became out of sync. Remove obstacle from that information
flow to get DSACKs accounted in tcp_sacktag_one as expected.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
---
 net/ipv4/tcp_input.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a6961d7..c28976a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1374,7 +1374,8 @@ static u8 tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
 
 static int tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 			   struct tcp_sacktag_state *state,
-			   unsigned int pcount, int shifted, int mss)
+			   unsigned int pcount, int shifted, int mss,
+			   int dup_sack)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
@@ -1410,7 +1411,7 @@ static int tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 	}
 
 	/* We discard results */
-	tcp_sacktag_one(skb, sk, state, 0, pcount);
+	tcp_sacktag_one(skb, sk, state, dup_sack, pcount);
 
 	/* Difference in this won't matter, both ACKed by the same cumul. ACK */
 	TCP_SKB_CB(prev)->sacked |= (TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS);
@@ -1561,7 +1562,7 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
 
 	if (!skb_shift(prev, skb, len))
 		goto fallback;
-	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss))
+	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))
 		goto out;
 
 	/* Hole filled allows collapsing with the next as well, this is very
@@ -1580,7 +1581,7 @@ static struct sk_buff *tcp_shift_skb_data(struct ...
From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Backtracking to sacked skbs is a horrible performance killer
since the hint cannot be advanced successfully past them...
...And it's totally unnecessary too.

In theory this is 2.6.27..28 regression but I doubt anybody
can make .28 to have worse performance because of other TCP
improvements.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f6f61b3..2471cd4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2065,7 +2065,7 @@ begin_fwd:
 			goto begin_fwd;
 
 		} else if (!(sacked & TCPCB_LOST)) {
-			if (hole == NULL && !(sacked & TCPCB_SACKED_RETRANS))
+			if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED)))
 				hole = skb;
 			continue;
 
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

It is possible that lost_cnt_hint gets underflow in
tcp_clean_rtx_queue because the cumulative ACK can cover
the segment where lost_skb_hint points to only partially,
which means that the hint is not cleared, opposite to what
my (earlier) comment claimed.

Also I don't agree what I ended up writing about non-trivial
case there to be what I intented to say. It was not supposed
to happen that the hint won't get cleared and we underflow
in any scenario.

In general, this is quite hard to trigger in practice.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c28976a..3f2f090 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3273,18 +3273,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 		if (tcp_is_reno(tp)) {
 			tcp_remove_reno_sacks(sk, pkts_acked);
 		} else {
+			int delta;
+
 			/* Non-retransmitted hole got filled? That's reordering */
 			if (reord < prior_fackets)
 				tcp_update_reordering(sk, tp->fackets_out - reord, 0);
 
-			/* No need to care for underflows here because
-			 * the lost_skb_hint gets NULLed if we're past it
-			 * (or something non-trivial happened)
-			 */
-			if (tcp_is_fack(tp))
-				tp->lost_cnt_hint -= pkts_acked;
-			else
-				tp->lost_cnt_hint -= prior_sacked - tp->sacked_out;
+			delta = tcp_is_fack(tp) ? pkts_acked :
+						  prior_sacked - tp->sacked_out;
+			tp->lost_cnt_hint -= min(tp->lost_cnt_hint, delta);
 		}
 
 		tp->fackets_out -= min(pkts_acked, tp->fackets_out);
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

If skb can be sent right away, we certainly should do that
if it's in the middle of the queue because it won't get
more data into it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2471cd4..fa3c81a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1356,6 +1356,10 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
 	if (limit >= sk->sk_gso_max_size)
 		goto send_now;
 
+	/* Middle in queue won't get any more data, full sendable already? */
+	if ((skb != tcp_write_queue_tail(sk)) && (limit >= skb->len))
+		goto send_now;
+
 	if (sysctl_tcp_tso_win_divisor) {
 		u32 chunk = min(tp->snd_wnd, tp->snd_cwnd * tp->mss_cache);
 
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

1) We didn't remove any skbs, so no need to handle stale refs.

2) scoreboard_skb_hint is trivial, no timestamps were changed
   so no need to clear that one

3) lost_skb_hint needs tweaking similar to that of
   tcp_sacktag_one().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fa3c81a..3feab4d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -771,7 +771,6 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 
 	BUG_ON(len > skb->len);
 
-	tcp_clear_retrans_hints_partial(tp);
 	nsize = skb_headlen(skb) - len;
 	if (nsize < 0)
 		nsize = 0;
@@ -854,6 +853,12 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 			tcp_verify_left_out(tp);
 		}
 		tcp_adjust_fackets_out(sk, skb, diff);
+
+		if (tp->lost_skb_hint &&
+		    before(TCP_SKB_CB(skb)->seq,
+			   TCP_SKB_CB(tp->lost_skb_hint)->seq) &&
+		    (tcp_is_fack(tp) || TCP_SKB_CB(skb)->sacked))
+			tp->lost_cnt_hint -= diff;
 	}
 
 	/* Link BUFF into the send queue. */
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment it would get send up in
parts unless we calculate # of segments again. This corner-case
could happen eg. after mtu probe completes or less than
previously sack blocks are required for the opposite direction.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3feab4d..77af7fa 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1921,6 +1921,8 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	if (skb->len > cur_mss) {
 		if (tcp_fragment(sk, skb, cur_mss, cur_mss))
 			return -ENOMEM; /* We'll try again later. */
+	} else {
+		tcp_init_tso_segs(sk, skb, cur_mss);
 	}
 
 	tcp_retrans_try_collapse(sk, skb, cur_mss);
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Arnd Hannemann <hannemann@nets.rwth-aachen.de> noticed and was
puzzled by the fact that !tcp_is_fack(tp) leads to early return
near the beginning and the later on tcp_is_fack(tp) was still
used in an if condition. The later check was a left-over from
RFC3517 SACK stuff (== !tcp_is_fack(tp) behavior nowadays) as
there wasn't clear way how to handle this particular check
cheaply in the spirit of RFC3517 (using only SACK blocks, not
holes + SACK blocks as with FACK). I sort of left it there as
a reminder but since it's confusing other people just remove
it and comment the missing-feature stuff instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
---
 net/ipv4/tcp_input.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3f2f090..125b451 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1178,10 +1178,18 @@ static void tcp_mark_lost_retrans(struct sock *sk)
 		if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS))
 			continue;
 
-		if (after(received_upto, ack_seq) &&
-		    (tcp_is_fack(tp) ||
-		     !before(received_upto,
-			     ack_seq + tp->reordering * tp->mss_cache))) {
+		/* TODO: We would like to get rid of tcp_is_fack(tp) only
+		 * constraint here (see above) but figuring out that at
+		 * least tp->reordering SACK blocks reside between ack_seq
+		 * and received_upto is not easy task to do cheaply with
+		 * the available datastructures.
+		 *
+		 * Whether FACK should check here for tp->reordering segs
+		 * in-between one could argue for either way (it would be
+		 * rather simple to implement as we could count fack_count
+		 * during the walk and do tp->fackets_out - fack_count).
+		 */
+		if (after(received_upto, ack_seq)) {
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
 			tp->retrans_out -= tcp_skb_pcount(skb);
 
-- ...
From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Some comment about its current state added. So far I have
seen very few cases where the thing is actually useful,
usually just marginally (though admittedly I don't usually
see top of window losses where it seems possible that there
could be some gain), instead, more often the cases suffer
from L-marking spike which is certainly not desirable
(I'll bury improving it to my todo list, but on a low
prio position).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   63 +++++++++++++++++++++++++++++++-------------------
 1 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 125b451..03f5ede 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2461,6 +2461,44 @@ static int tcp_time_to_recover(struct sock *sk)
 	return 0;
 }
 
+/* New heuristics: it is possible only after we switched to restart timer
+ * each time when something is ACKed. Hence, we can detect timed out packets
+ * during fast retransmit without falling to slow start.
+ *
+ * Usefulness of this as is very questionable, since we should know which of
+ * the segments is the next to timeout which is relatively expensive to find
+ * in general case unless we add some data structure just for that. The
+ * current approach certainly won't find the right one too often and when it
+ * finally does find _something_ it usually marks large part of the window
+ * right away (because a retransmission with a larger timestamp blocks the
+ * loop from advancing). -ij
+ */
+static void tcp_timeout_skbs(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+
+	if (!tcp_is_fack(tp) || !tcp_head_timedout(sk))
+		return;
+
+	skb = tp->scoreboard_skb_hint;
+	if (tp->scoreboard_skb_hint == NULL)
+		skb = tcp_write_queue_head(sk);
+
+	tcp_for_write_queue_from(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+		if (!tcp_skb_timedout(sk, ...
From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Redundant checks made indentation impossible to follow.
However, it might be useful to make this ca_state+is_sack
indexed array.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_timer.c |   23 ++++++++++-------------
 1 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 0170e91..b144a26 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -328,19 +328,16 @@ static void tcp_retransmit_timer(struct sock *sk)
 	if (icsk->icsk_retransmits == 0) {
 		int mib_idx;
 
-		if (icsk->icsk_ca_state == TCP_CA_Disorder ||
-		    icsk->icsk_ca_state == TCP_CA_Recovery) {
-			if (tcp_is_sack(tp)) {
-				if (icsk->icsk_ca_state == TCP_CA_Recovery)
-					mib_idx = LINUX_MIB_TCPSACKRECOVERYFAIL;
-				else
-					mib_idx = LINUX_MIB_TCPSACKFAILURES;
-			} else {
-				if (icsk->icsk_ca_state == TCP_CA_Recovery)
-					mib_idx = LINUX_MIB_TCPRENORECOVERYFAIL;
-				else
-					mib_idx = LINUX_MIB_TCPRENOFAILURES;
-			}
+		if (icsk->icsk_ca_state == TCP_CA_Disorder) {
+			if (tcp_is_sack(tp))
+				mib_idx = LINUX_MIB_TCPSACKFAILURES;
+			else
+				mib_idx = LINUX_MIB_TCPRENOFAILURES;
+		} else if (icsk->icsk_ca_state == TCP_CA_Recovery) {
+			if (tcp_is_sack(tp))
+				mib_idx = LINUX_MIB_TCPSACKRECOVERYFAIL;
+			else
+				mib_idx = LINUX_MIB_TCPRENORECOVERYFAIL;
 		} else if (icsk->icsk_ca_state == TCP_CA_Loss) {
 			mib_idx = LINUX_MIB_TCPLOSSFAILURES;
 		} else {
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 77af7fa..61445b5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1767,11 +1767,9 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *next_skb = tcp_write_queue_next(sk, skb);
 	int skb_size, next_skb_size;
-	u16 flags;
 
 	skb_size = skb->len;
 	next_skb_size = next_skb->len;
-	flags = TCP_SKB_CB(skb)->flags;
 
 	BUG_ON(tcp_skb_pcount(skb) != 1 || tcp_skb_pcount(next_skb) != 1);
 
@@ -1791,9 +1789,8 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 	/* Update sequence range on original skb. */
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
 
-	/* Merge over control information. */
-	flags |= TCP_SKB_CB(next_skb)->flags; /* This moves PSH/FIN etc. over */
-	TCP_SKB_CB(skb)->flags = flags;
+	/* Merge over control information. This moves PSH/FIN etc. over */
+	TCP_SKB_CB(skb)->flags |= TCP_SKB_CB(next_skb)->flags;
 
 	/* All done, get rid of second SKB and account for it so
 	 * packet counting does not break.
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Similar to what is done elsewhere in TCP code when double
state checks are being done.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_htcp.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_htcp.c b/net/ipv4/tcp_htcp.c
index 937549b..9520eec 100644
--- a/net/ipv4/tcp_htcp.c
+++ b/net/ipv4/tcp_htcp.c
@@ -115,8 +115,7 @@ static void measure_achieved_throughput(struct sock *sk, u32 pkts_acked, s32 rtt
 		return;
 
 	/* achieved throughput calculations */
-	if (icsk->icsk_ca_state != TCP_CA_Open &&
-	    icsk->icsk_ca_state != TCP_CA_Disorder) {
+	if (!((1 << icsk->icsk_ca_state) & (TCPF_CA_Open | TCPF_CA_Disorder))) {
 		ca->packetcount = 0;
 		ca->lasttime = now;
 		return;
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

It seems that implementation in yeah was inconsistent to what
other did as it would increase cwnd one ack earlier than the
others do.

Size benefits:

  bictcp_cong_avoid |  -36
  tcp_cong_avoid_ai |  +52
  bictcp_cong_avoid |  -34
  tcp_scalable_cong_avoid |  -36
  tcp_veno_cong_avoid |  -12
  tcp_yeah_cong_avoid |  -38

= -104 bytes total

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h       |    1 +
 net/ipv4/tcp_bic.c      |   11 +----------
 net/ipv4/tcp_cong.c     |   21 ++++++++++++++-------
 net/ipv4/tcp_cubic.c    |   11 +----------
 net/ipv4/tcp_scalable.c |   10 ++--------
 net/ipv4/tcp_veno.c     |    7 +------
 net/ipv4/tcp_yeah.c     |    9 +--------
 7 files changed, 21 insertions(+), 49 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 218235d..0366a55 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -685,6 +685,7 @@ extern void tcp_get_allowed_congestion_control(char *buf, size_t len);
 extern int tcp_set_allowed_congestion_control(char *allowed);
 extern int tcp_set_congestion_control(struct sock *sk, const char *name);
 extern void tcp_slow_start(struct tcp_sock *tp);
+extern void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w);
 
 extern struct tcp_congestion_ops tcp_init_congestion_ops;
 extern u32 tcp_reno_ssthresh(struct sock *sk);
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index 7eb7636..3b53fd1 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -149,16 +149,7 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 		tcp_slow_start(tp);
 	else {
 		bictcp_update(ca, tp->snd_cwnd);
-
-		/* In dangerous area, increase slowly.
-		 * In theory this is tp->snd_cwnd += 1 / tp->snd_cwnd
-		 */
-		if (tp->snd_cwnd_cnt >= ca->cnt) {
-			if (tp->snd_cwnd < tp->snd_cwnd_clamp)
-				tp->snd_cwnd++;
-			tp->snd_cwnd_cnt = 0;
-		} else
-			tp->snd_cwnd_cnt++;
+		tcp_cong_avoid_ai(tp, ca->cnt);
 ...
From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Also fixes insignificant bug that would cause sending of stale
SACK block (would occur in some corner cases).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/linux/tcp.h      |    1 -
 include/net/tcp.h        |    1 -
 net/ipv4/tcp_input.c     |   15 ++-------------
 net/ipv4/tcp_minisocks.c |    3 +--
 net/ipv4/tcp_output.c    |   12 ++++++------
 5 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 0cd99e6..4b86ad7 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -218,7 +218,6 @@ struct tcp_options_received {
 		snd_wscale : 4,	/* Window scaling received from sender	*/
 		rcv_wscale : 4;	/* Window scaling to send to receiver	*/
 /*	SACKs data	*/
-	u8	eff_sacks;	/* Size of SACK array to send with next packet */
 	u8	num_sacks;	/* Number of SACK blocks		*/
 	u16	user_mss;  	/* mss requested by user in ioctl */
 	u16	mss_clamp;	/* Maximal mss, negotiated at connection setup */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0366a55..055e494 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -926,7 +926,6 @@ extern void tcp_done(struct sock *sk);
 static inline void tcp_sack_reset(struct tcp_options_received *rx_opt)
 {
 	rx_opt->dsack = 0;
-	rx_opt->eff_sacks = 0;
 	rx_opt->num_sacks = 0;
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 03f5ede..e4442a2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4099,7 +4099,6 @@ static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq)
 		tp->rx_opt.dsack = 1;
 		tp->duplicate_sack[0].start_seq = seq;
 		tp->duplicate_sack[0].end_seq = end_seq;
-		tp->rx_opt.eff_sacks = tp->rx_opt.num_sacks + 1;
 	}
 }
 
@@ -4154,8 +4153,6 @@ static void tcp_sack_maybe_coalesce(struct tcp_sock *tp)
 			 * Decrease num_sacks.
 			 */
 			tp->rx_opt.num_sacks--;
-			tp->rx_opt.eff_sacks = tp->rx_opt.num_sacks +
-					       ...
From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

In the pure assignment case, the earlier zeroing is
still in effect.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c  |    4 +---
 net/ipv4/tcp_output.c |    3 +--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e4442a2..464c8a4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4325,8 +4325,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 
 	TCP_ECN_accept_cwr(tp, skb);
 
-	if (tp->rx_opt.dsack)
-		tp->rx_opt.dsack = 0;
+	tp->rx_opt.dsack = 0;
 
 	/*  Queue data for delivery to the user.
 	 *  Packets in sequence go to the receive queue.
@@ -4445,7 +4444,6 @@ drop:
 		/* Initial out of order segment, build 1 SACK. */
 		if (tcp_is_sack(tp)) {
 			tp->rx_opt.num_sacks = 1;
-			tp->rx_opt.dsack     = 0;
 			tp->selective_acks[0].start_seq = TCP_SKB_CB(skb)->seq;
 			tp->selective_acks[0].end_seq =
 						TCP_SKB_CB(skb)->end_seq;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1555bb7..d5c7245 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -441,8 +441,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 			*ptr++ = htonl(sp[this_sack].end_seq);
 		}
 
-		if (tp->rx_opt.dsack)
-			tp->rx_opt.dsack = 0;
+		tp->rx_opt.dsack = 0;
 	}
 }
 
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 464c8a4..071917d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4248,8 +4248,7 @@ static void tcp_sack_remove(struct tcp_sock *tp)
 		this_sack++;
 		sp++;
 	}
-	if (num_sacks != tp->rx_opt.num_sacks)
-		tp->rx_opt.num_sacks = num_sacks;
+	tp->rx_opt.num_sacks = num_sacks;
 }
 
 /* This one checks to see if we can put data from the
-- 
1.5.6.5

--

From: Ilpo Järvinen
Date: Saturday, February 28, 2009 - 7:44 am

From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

copied was assigned zero right before the goto, so if (copied)
cannot ever be true.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 90b2f3c..d3f9bee 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -683,7 +683,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 
 	err = -EPIPE;
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
-		goto do_error;
+		goto out_err;
 
 	while (psize > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
@@ -854,7 +854,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 
 	err = -EPIPE;
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
-		goto do_error;
+		goto out_err;
 
 	while (--iovlen >= 0) {
 		int seglen = iov->iov_len;
-- 
1.5.6.5

--

From: David Miller
Date: Monday, March 2, 2009 - 4:03 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:03 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

I skipped this for the same reason I skipped patch #14.
This could be there to avoid dirtying that part of
the structure when it isn't necessary.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:03 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

I skipped this one.

These tests could be there to avoid dirtying a cacheline
when unnecessary.  And so unless we can prove the condition
always hits and we always do the write, we should keep
the tests there.
--

From: Ilpo Järvinen
Date: Monday, March 2, 2009 - 4:54 am

On Mon, 2 Mar 2009, David Miller wrote:

> From: "Ilpo J
From: David Miller
Date: Monday, March 2, 2009 - 4:57 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

That sounds like a good commit log entry for when you resubmit
this patch :-)
--

From: David Miller
Date: Monday, March 2, 2009 - 4:02 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:02 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied, nice work.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:02 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:02 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:02 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:02 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:01 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:01 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:01 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:01 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.

I can't believe this check was missing :-)
--

From: David Miller
Date: Monday, March 2, 2009 - 4:01 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:01 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied.
--

From: David Miller
Date: Sunday, March 1, 2009 - 1:22 am

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>

Applied, thanks.
--

Previous thread: net-next-2.6 [Patch 0/2] dccp: Fix issues in setting the MPS by Gerrit Renker on Saturday, February 28, 2009 - 1:38 am. (5 messages)

Next thread: none