[PATCH v2] xmit_compl_seq: information to reclaim vmsplice buffers

From: Tom Herbert
Date: Tuesday, September 21, 2010 - 11:57 am

In this patch we propose adding a socket API to retrieve the
"transmit completion sequence number", essentially a byte counter
for the number of bytes that have been transmitted and will not be
retransmitted.  In the case of TCP, this should correspond to snd_una.

The purpose of this API is to provide information to userspace about
which buffers can be reclaimed when sending with vmsplice() on a
socket.

There are two methods for retrieving the completed sequence number:
through a simple getsockopt (implemented here for TCP), and by
returning the value in the ancillary data of a recvmsg.

The expected flow would be something like:
   - Connection is created
   - Initial completion seq # is retrieved through the sockopt, and is
     stored in userspace "compl_seq" variable for the connection.
   - Whenever a send is done, compl_seq += # bytes sent.
   - When doing a vmsplice the completion sequence number is saved
     for each user space buffer, buffer_compl_seq = compl_seq.
   - When recvmsg returns with a completion sequence number in
     ancillary data, any buffers covered by that sequence number
     (where buffer_compl_seq < recvmsg_compl_seq) are reclaimed
     and can be written to again.
   - If no data is received on a connection (recvmsg does not
     return), a timeout can be used to call the getsockopt and
     reclaim buffers as a fallback.
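
The userspace side of the flow above can be sketched as follows. This is
only an illustration under stated assumptions, not part of the patch: the
names tx_tracker, tracker_account_send, tracker_save_buffer and
tracker_reclaim, and the MAX_BUFS limit, are hypothetical. The sketch does
not call the proposed sockopt or read the ancillary data; it assumes the
caller has already obtained those 32-bit values. Since the counter is a u32
that can wrap, the comparison is done with signed difference arithmetic,
which is an assumption added here and not something the description spells
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUFS 8 /* hypothetical limit on tracked buffers */

struct tx_buf {
    void *data;         /* user buffer handed to vmsplice() */
    uint32_t compl_seq; /* compl_seq saved when the buffer was queued */
    bool in_flight;     /* true until the buffer is reclaimed */
};

struct tx_tracker {
    uint32_t compl_seq; /* initial sockopt value plus bytes sent so far */
    struct tx_buf bufs[MAX_BUFS];
};

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Start tracking from the initial value read through the getsockopt. */
static void tracker_init(struct tx_tracker *t, uint32_t initial_seq)
{
    assert(t != NULL);
    t->compl_seq = initial_seq;
    for (size_t i = 0; i < MAX_BUFS; i++) {
        t->bufs[i].data = NULL;
        t->bufs[i].compl_seq = 0;
        t->bufs[i].in_flight = false;
    }
}

/* "Whenever a send is done, compl_seq += # bytes sent." */
static void tracker_account_send(struct tx_tracker *t, uint32_t len)
{
    t->compl_seq += len; /* unsigned arithmetic wraps like the counter */
}

/*
 * "buffer_compl_seq = compl_seq" for a vmspliced buffer.
 * Returns the slot index used, or -1 if every slot is still in flight.
 */
static int tracker_save_buffer(struct tx_tracker *t, void *data)
{
    for (int i = 0; i < MAX_BUFS; i++) {
        if (!t->bufs[i].in_flight) {
            t->bufs[i].data = data;
            t->bufs[i].compl_seq = t->compl_seq;
            t->bufs[i].in_flight = true;
            return i;
        }
    }
    return -1;
}

/*
 * Reclaim every buffer with buffer_compl_seq < reported, where reported
 * comes from recvmsg ancillary data or the fallback getsockopt.
 * Returns the number of buffers released for reuse.
 */
static size_t tracker_reclaim(struct tx_tracker *t, uint32_t reported)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_BUFS; i++) {
        if (t->bufs[i].in_flight &&
            seq_before(t->bufs[i].compl_seq, reported)) {
            t->bufs[i].in_flight = false;
            n++;
        }
    }
    return n;
}
```

The strict "<" follows the description literally. Whether "<=" would also
be safe depends on whether the saved value marks the start or the end of
the buffer, which the description leaves open.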

Using recvmsg data in this manner is sort of a cheap way to get a
"callback" for when a vmspliced buffer is consumed.  It will work
well for a client where the response causes recvmsg to return.
On the server side it works well if there are a sufficient
number of requests coming on the connection (resorting to the
timeout if necessary as described above).

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 06edfef..3587082 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -69,6 +69,9 @@
 
 ...
From: Rick Jones
Date: Tuesday, September 21, 2010 - 1:15 pm

If there is a recvmsg(), how often will the data received implicitly/explicitly
tell the application how much of the previously sent data has been ACKed?  A
subsequent request from the client in a persistent (but not pipelined) HTTP
session implicitly says the data of the previous response was ACKed, no?  Is that
simply too rare to rely upon?

On the bulk side, an application filling the (fixed-size, at least) socket buffer
will "know" that the bytes sent a "socket buffer size ago" were ACKed, because
that is what makes room in the socket buffer for data, right?
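
Read as arithmetic, the bulk-side reasoning above gives a lower bound on
the ACKed byte count. A minimal sketch, assuming a fixed socket buffer
size and assuming that every queued byte stays charged to the socket
buffer until it is ACKed; the name acked_lower_bound is hypothetical and
the question of whether this holds for vmspliced pages is left open here:

```c
#include <stdint.h>

/*
 * Lower bound on bytes known to be ACKed, given the total number of bytes
 * the application has successfully queued and the (fixed) socket buffer
 * size. At most sndbuf_bytes can still be sitting unACKed in the buffer,
 * so anything written earlier than that must already have been ACKed.
 */
static uint64_t acked_lower_bound(uint64_t total_written,
                                  uint64_t sndbuf_bytes)
{
    return total_written > sndbuf_bytes ? total_written - sndbuf_bytes : 0;
}
```

The bound says nothing until more than one socket buffer's worth of data
has been written, which is why it only helps a sender that keeps the
buffer full.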

rick jones

ftp://ftp.cup.hp.com/dist/networking/briefs/copyavoid.pdf
--

From: Eric Dumazet
Date: Tuesday, September 21, 2010 - 2:38 pm

I am wondering if this part could be done outside of socket lock,
provided you latch tp->snd_una value right before release_sock();

u32 snd_una;
...
tcp_cleanup_rbuf(sk, copied);
TCP_CHECK_TIMER(sk);
snd_una = tp->snd_una;
release_sock(sk);
tcp_sock_xmit_compl_seq(msg, sk, snd_una);
return copied;



--

From: Eric Dumazet
Date: Tuesday, September 21, 2010 - 2:47 pm

And this check is not necessary or correct?


->

 if (sock_flag(sk, SOCK_XMIT_COMPL_SEQ))
	put_cmsg(msg, SOL_SOCKET, SCM_XMIT_COMPL_SEQ,
		 sizeof(u32), &snd_una);



--
