Re: [PATCH 1/2] dccp: Minimise header option overhead in setting the MPS

Previous thread: [net-next PATCH 0/3] Introduce the Intel 82599 10 Gigabit Network Adapter family by Jeff Kirsher on Friday, February 27, 2009 - 6:44 pm. (9 messages)

Next thread: [PATCH 0/17]: tcp & more tcp by Ilpo Järvinen on Saturday, February 28, 2009 - 7:44 am. (35 messages)
From: Gerrit Renker
Date: Saturday, February 28, 2009 - 1:38 am

Hi Dave,

can you please consider this update, changes are summarized below and
patches have been in the test tree for a long while - it builds upon the
feature-negotiation infrastructure.

Gerrit


Patch summary:
--------------
The set fixes issues in setting the Maximum Packet Size (aka MSS/MTU).

Making efficient use of available packet space is even more important in
DCCP than in UDP, since fragmentation is discouraged as per RFC 4340, 14.


Patch #1: Computes MPS for common data-packet options and dynamically 
	  adapts this value depending on negotiated features.
Patch #2: Fixes a problem that arises for the MPS due to the overlap of
	  half-established and established connection phases in DCCP.


Patch stats:
------------
 ackvec.h |    3 +++
 dccp.h   |    5 ++++-
 output.c |   37 ++++++++++++++++++++++++++++---------
 3 files changed, 35 insertions(+), 10 deletions(-)


The set is also available for viewing online, at
http://eden-feed.erg.abdn.ac.uk/cgi-bin/gitweb.cgi?p=dccp_exp.git;a=commitdiff;h=6a58d...
--

From: Gerrit Renker
Date: Saturday, February 28, 2009 - 1:38 am

This patch resolves a long-standing FIXME to dynamically update the Maximum
Packet Size depending on actual options usage.

It uses the flags set by the feature-negotiation infrastructure to compute
the required header option size.

Most options are fixed-size, a notable exception are Ack Vectors (required
currently only by CCID-2). These can have any length between 3 and 1020
bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
Vector cells) have been found to be sufficient for loss-free situations.

There are currently no CCID-specific header options which may appear on data
packets, thus it is not necessary to define a corresponding CCID field as
suggested in the old comment.

Further changes:
----------------
 Adjusted the type of 'cur_mps' to match the unsigned return type of the
 function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
---
 net/dccp/ackvec.h |    3 +++
 net/dccp/output.c |   22 ++++++++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -20,6 +20,9 @@
 /* We can spread an ack vector across multiple options */
 #define DCCP_MAX_ACKVEC_LEN (DCCP_SINGLE_OPT_MAXLEN * 2)
 
+/* Estimated minimum average Ack Vector length - used for updating MPS */
+#define DCCPAV_MIN_OPTLEN	16
+
 #define DCCP_ACKVEC_STATE_RECEIVED	0
 #define DCCP_ACKVEC_STATE_ECN_MARKED	(1 << 6)
 #define DCCP_ACKVEC_STATE_NOT_RECEIVED	(3 << 6)
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -161,21 +161,27 @@ unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
 	u32 ccmps = dccp_determine_ccmps(dp);
-	int cur_mps = ccmps ? min(pmtu, ccmps) : pmtu;
+	u32 cur_mps = ccmps ? min(pmtu, ccmps) : pmtu;
 
 	/* Account for header lengths and IPv4/v6 option overhead */
 	cur_mps -= (icsk->icsk_af_ops->net_header_len + icsk->icsk_ext_hdr_len +
 		    ...
From: Gerrit Renker
Date: Saturday, February 28, 2009 - 1:38 am

This fixes a problem caused by the overlap of the connection-setup and
established-state phases of DCCP connections.

During connection setup, the client retransmits Confirm Feature-Negotiation
options until a response from the server signals that it can move from the
half-established PARTOPEN into the OPEN state, whereupon the connection is
fully established on both ends (RFC 4340, 8.1.5).

However, since the client may already send data while it is in the PARTOPEN
state, consequences arise for the Maximum Packet Size: the problem is that the
initial option overhead is much higher than for the subsequent established
phase, as it involves potentially many variable-length list-type options
(server-priority options, RFC 4340, 6.4).

Applying the standard MPS is insufficient here: especially with larger
payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

On the other hand, reducing the MPS available for the established phase by
the added initial overhead is highly wasteful and inefficient.

The solution chosen therefore is a two-phase strategy:

   If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
   to carry the options, and the feature-negotiation list is then flushed.

   This means that the server gets two Acks for one Response. If both Acks get
   lost, it is probably better to restart the connection anyway and devising yet
   another special-case does not seem worth the extra complexity.

The result is a higher utilisation of the available packet space for the data
transmission phase (established state) of a connection.

The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
seen values were around 90 bytes for initial feature-negotiation options.

It uses sizeof(u32) to mean "aligned units of 4 bytes".
For consistency, another use of 4-byte alignment is adapted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/dccp.h   |    5 ++++-
 net/dccp/output.c |   15 ++++++++++++++-
 2 ...
From: David Miller
Date: Monday, March 2, 2009 - 4:08 am

From: Gerrit Renker <gerrit@erg.abdn.ac.uk>

Applied.
--

From: David Miller
Date: Monday, March 2, 2009 - 4:08 am

From: Gerrit Renker <gerrit@erg.abdn.ac.uk>

Applied.
--

Previous thread: [net-next PATCH 0/3] Introduce the Intel 82599 10 Gigabit Network Adapter family by Jeff Kirsher on Friday, February 27, 2009 - 6:44 pm. (9 messages)

Next thread: [PATCH 0/17]: tcp & more tcp by Ilpo Järvinen on Saturday, February 28, 2009 - 7:44 am. (35 messages)