Choppy TCP send performance

Previous thread: Email Admin Warning Alert !!! by Email Administrator on Friday, May 28, 2010 - 12:20 pm. (1 message)

Next thread: [PATCH] bnx2: Fix IRQ failures during kdump. by Michael Chan on Friday, May 28, 2010 - 8:24 pm. (9 messages)
From: Ivan Novick
Date: Friday, May 28, 2010 - 1:38 pm

Hello,

I am using RHEL5 and have 1 Gigabit NICs.

I am sending 128 KB blocks of data over TCP in a loop and using
SystemTap to debug the performance. I am finding that:

90% of the send calls take about 100 microseconds and 10% of the send
calls take about 10 milliseconds. The average send time is about 1
millisecond.

The 10% of the calls taking about 10 milliseconds seem to be
correlated with "sk_stream_wait_memory" calls in the kernel.

sk_stream_wait_memory seems to be called when the send buffer is full
and the next send call does not complete until the send buffer
utilization goes down from 4,194,304 bytes to 2,814,968 bytes.

This implies that a send that blocks on a full send buffer will not
complete until there is more than 1 MB of free space in the send buffer
(4,194,304 - 2,814,968 = 1,379,336 bytes), even though the send could be
accepted into the OS with only 128 KB of free space.

Do you think I am misinterpreting this data, or is there a way to even
out the send calls so that they are more even in duration, approximately
1 millisecond per call? Is there a parameter to reduce how much space
needs to be free in the send buffer before a blocking send call can
complete from user space?

Cheers,
Ivan
--

From: Eric Dumazet
Date: Friday, May 28, 2010 - 2:16 pm

static void sock_def_write_space(struct sock *sk)
{
    ...
    if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
        ...


Quick answer is: no, this is not tunable (independently of SNDBUF).

SO_SNDLOWAT is not implemented on Linux yet (its value is fixed at 1).
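A quick way to confirm this on a given Linux machine is to probe the option directly. This is a sketch assuming the Linux behaviour documented in socket(7): getsockopt() should report 1 for SO_SNDLOWAT, and setsockopt() should fail with ENOPROTOOPT. The helper probe_sndlowat() and the 128 KB value passed to it are illustrative, not taken from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Probe SO_SNDLOWAT on a fresh TCP socket (hypothetical helper).
 * *current receives the value getsockopt() reports; *set_errno receives the
 * errno from trying to set it to `wanted` (0 if the set succeeded).
 * Returns 0 on success, -1 if the socket could not be created or queried. */
static int probe_sndlowat(int wanted, int *current, int *set_errno)
{
    assert(current != NULL && set_errno != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    socklen_t len = sizeof(*current);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, current, &len) < 0) {
        close(fd);
        return -1;
    }

    /* Try to change the send low-water mark; Linux is expected to refuse. */
    *set_errno = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &wanted, sizeof(wanted)) < 0)
        *set_errno = errno;

    close(fd);
    return 0;
}
```

On a Linux host, calling probe_sndlowat(128 * 1024, &cur, &err) should leave cur == 1 and err == ENOPROTOOPT.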


Why would you want to wake up your thread more than necessary?



--

From: Ivan Novick
Date: Friday, May 28, 2010 - 2:35 pm

Cool.  This helps me understand what is happening.

My user thread wants to wake up as soon as the OS can accept my data
so that it can continue doing work and interact with other components
in the system. This is an application issue; I can work around it now
that I have a better understanding of what the kernel is doing.

Cheers,
Ivan
--

From: Eric Dumazet
Date: Friday, May 28, 2010 - 3:00 pm

If you use poll() or select() before issuing your write(), I believe it
should be OK.



--

From: Ivan Novick
Date: Friday, May 28, 2010 - 3:23 pm

From my tests, select() will not return until the same free-space
threshold is met:

    if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)

I got that from SystemTap output.

Cheers,
Ivan
--

From: Rick Jones
Date: Friday, May 28, 2010 - 3:08 pm

Unless you think your application will run over 10G, or over a WAN, you 
shouldn't need anywhere near the size of socket buffer you are getting via 
autotuning to be able to achieve "link-rate" - link rate with a 1GbE LAN 
connection can be achieved quite easily with a 256KB socket buffer.
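The reasoning behind the 256KB figure is the bandwidth-delay product: to keep a link busy, the sender needs roughly one round trip's worth of data in flight, i.e. link rate in bytes per second times the round-trip time (RTT). The sketch below uses hypothetical helper names and ignores header and ACK overhead. A 262,144-byte buffer at 1 Gbit/s (125,000,000 bytes/s) covers an RTT of about 2.1 ms, which should comfortably exceed the round trip on a typical switched LAN (an assumption; measure it with ping). At 10 Gbit/s the same buffer covers only about 0.21 ms, which is why 10G and WAN paths need larger buffers.

```c
#include <assert.h>

/* Bandwidth-delay product: bytes in flight needed to keep a link busy for
 * one round trip (header and ACK overhead ignored). */
static double bdp_bytes(double link_bits_per_sec, double rtt_sec)
{
    assert(link_bits_per_sec > 0.0 && rtt_sec >= 0.0);
    return link_bits_per_sec / 8.0 * rtt_sec;
}

/* Longest round-trip time a send buffer of buffer_bytes can cover at full
 * link rate; paths with a larger RTT will be limited by the buffer. */
static double max_rtt_sec(double buffer_bytes, double link_bits_per_sec)
{
    assert(buffer_bytes >= 0.0 && link_bits_per_sec > 0.0);
    return buffer_bytes / (link_bits_per_sec / 8.0);
}
```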

The first test here is with autotuning going - disregard what netperf reports 
for the socket buffer sizes here - it is calling getsockopt() before connect() 
and before the end of the connection:

raj@spec-ptd2:~/netperf2_trunk$ src/netperf -H s9 -v 2 -l 30 -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s9.cup.hp.com 
(16.89.132.29) port 0 AF_INET : histogram
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  87380  16384 131072    30.01     911.50

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
     8       8      0       0 3.42e+09  131074.49     26090   11624.79 294176

Maximum
Segment
Size (bytes)
   1448


Histogram of time spent in send() call.
UNIT_USEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_USEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
HUNDRED_USEC  :    0:    3: 21578:  378:   94:   20:    3:    2:    0:    4
UNIT_MSEC     :    0:    4:    2:    0:    0:  780: 3215:    6:    0:    1
TEN_MSEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
HUNDRED_MSEC  :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
UNIT_SEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_SEC       :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
 >100_SECS: 0
HIST_TOTAL:      26090
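One way to read the histogram, assuming netperf's layout is one row per decade of time (UNIT_USEC in 1 µs steps, TEN_USEC in 10 µs steps, and so on) with column c counting calls that took between c and c+1 of that row's unit: 21,578 sends took between 200 and 300 µs, and 3,215 took between 6 and 7 ms. The two non-empty rows sum to 22,082 + 4,008 = 26,090, matching HIST_TOTAL, which is consistent with that reading; 4,008 of the sends (about 15%) took 1 ms or more. The sketch below encodes the rows and counts calls above a threshold; the array and function names are hypothetical.

```c
#include <assert.h>

enum { HIST_ROWS = 8, HIST_COLS = 10 };

/* Counts copied from the netperf output above, one row per decade of time.
 * Assumed layout: row r is in units of 10^r microseconds, column c counts
 * send() calls that took between c and c+1 of that unit. */
static const long hist[HIST_ROWS][HIST_COLS] = {
    /* UNIT_USEC    */ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
    /* TEN_USEC     */ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
    /* HUNDRED_USEC */ {0, 3, 21578, 378, 94, 20, 3, 2, 0, 4},
    /* UNIT_MSEC    */ {0, 4, 2, 0, 0, 780, 3215, 6, 0, 1},
    /* TEN_MSEC     */ {0},
    /* HUNDRED_MSEC */ {0},
    /* UNIT_SEC     */ {0},
    /* TEN_SEC      */ {0},
};

/* Lower edge of bucket (row, col) in microseconds: col * 10^row. */
static long bucket_low_usec(int row, int col)
{
    assert(row >= 0 && row < HIST_ROWS && col >= 0 && col < HIST_COLS);
    long unit = 1;
    for (int r = 0; r < row; r++)
        unit *= 10;
    return col * unit;
}

/* Calls in buckets whose lower edge is at or above threshold_usec.
 * Column 0 of each row above UNIT_USEC is always empty in this output,
 * so its overlapping lower edge of 0 does not distort the count. */
static long calls_at_least(long threshold_usec)
{
    long n = 0;
    for (int r = 0; r < HIST_ROWS; r++)
        for (int c = 0; c < HIST_COLS; c++)
            if (bucket_low_usec(r, c) >= threshold_usec)
                n += hist[r][c];
    return n;
}
```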


Next, we have netperf make an explicit setsockopt() call for 128KB socket 
buffers, which will get us 256K.  Notice that the ...
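An explicit send buffer like the one in Rick's second test can be requested with setsockopt(SO_SNDBUF) before connect(). The sketch below assumes Linux, where the request is clamped to the net.core.wmem_max sysctl and then doubled to allow for bookkeeping overhead, and getsockopt() returns the doubled value; that is why asking for 128KB yields 256K. Setting SO_SNDBUF also locks the size, so send-side autotuning no longer grows it for that socket. The helper name tcp_socket_with_sndbuf() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket with a fixed send buffer (hypothetical helper).
 * Returns the fd, storing the size the kernel reports in *effective,
 * or -1 on failure. On Linux the request is clamped to net.core.wmem_max
 * and then doubled, and getsockopt() reports the doubled value. */
static int tcp_socket_with_sndbuf(int requested, int *effective)
{
    assert(requested > 0 && effective != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Set before connect() so the size is in place for the whole
     * connection; this also turns off send-side autotuning for fd. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) < 0) {
        close(fd);
        return -1;
    }

    socklen_t len = sizeof(*effective);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, effective, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```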
From: Ivan Novick
Date: Friday, May 28, 2010 - 3:28 pm

I am not sure I understand your histogram output, but what I am
getting from your message is that my buffer may be too big. If I
reduce the send buffer down to 256K as you suggest, then the code
that checks whether select or send should block:

    if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)

would only block waiting for 128 KB of free space, compared with
1 MB free in my example.

That would therefore reduce the maximum time for send calls (in theory).

Is this what you are getting at?

Cheers,
Ivan
--
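Ivan's arithmetic can be checked against the quoted condition with a small model. This is a sketch only: plain longs stand in for the kernel's atomic_t and struct sock fields, and the function names are hypothetical. Taken at face value, the check wakes a writer once at most half of sk_sndbuf is in use, so a 256 KB buffer needs 128 KB free, matching Ivan's reading. For a 4,194,304-byte buffer the same check would imply 2,097,152 bytes free rather than the roughly 1.3 MB Ivan measured, so treat this as a model of the quoted line, not of every path that can wake a blocked TCP sender.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the quoted sock_def_write_space() test: a blocked writer is woken
 * once (wmem_alloc << 1) <= sndbuf, i.e. at most half the buffer is in use.
 * Plain longs stand in for atomic_read(&sk->sk_wmem_alloc) and sk->sk_sndbuf. */
static bool writer_would_wake(long wmem_alloc, long sndbuf)
{
    assert(wmem_alloc >= 0 && sndbuf >= 0);
    return (wmem_alloc << 1) <= sndbuf;
}

/* Free space guaranteed at wake-up: for integers, 2a <= s exactly when
 * a <= s / 2 (rounded down), so at least s - s / 2 bytes are free. */
static long min_free_at_wakeup(long sndbuf)
{
    assert(sndbuf >= 0);
    return sndbuf - sndbuf / 2;
}
```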

From: Rick Jones
Date: Friday, May 28, 2010 - 3:57 pm

For example, 21811 of the send() calls were 1 <= time < 2 milliseconds.  2672 of 

Yes.

As for the select/poll stuff, if you have a thread that wants to get to 
something else, I would suggest marking the socket non-blocking and trying the 
send(). If it completes, cool; if not, remember what didn't get sent, do the 
other thing(s) and come back. If you find you have time to sit and wait, go 
ahead and call select/poll/epoll/whatever.
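Rick's non-blocking pattern might look like the sketch below. It assumes Linux (MSG_NOSIGNAL makes send() report EPIPE instead of raising SIGPIPE if the peer has gone away); struct pending_send, set_nonblocking() and try_send() are hypothetical names. try_send() returns 0 when the socket buffer fills, leaving the offset of the unsent remainder in the struct so the caller can do other work and call it again later.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Outgoing data not yet accepted by the kernel (hypothetical type). */
struct pending_send {
    const char *buf;
    size_t len;
    size_t off;   /* bytes already accepted by send() */
};

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hand as much of p to the kernel as it will take without blocking.
 * Returns 1 when everything is sent, 0 when the send buffer is full
 * (p->off records progress; do other work and call again), -1 on error. */
static int try_send(int fd, struct pending_send *p)
{
    assert(p != NULL && p->off <= p->len);

    while (p->off < p->len) {
        /* MSG_NOSIGNAL: report EPIPE instead of raising SIGPIPE. */
        ssize_t n = send(fd, p->buf + p->off, p->len - p->off, MSG_NOSIGNAL);
        if (n > 0)
            p->off += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        else
            return -1;
    }
    return 1;
}
```

A caller would keep the struct pending_send alive across iterations of its main loop, retrying try_send() between other tasks while it returns 0, and only blocking in select/poll when it has nothing else to do.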

Or, if you want to make sure you wait in poll/select/whatnot no more than N 
units of time, and that length of time is within the abilities of the call, use 
the timeout parameter present in those.
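A bounded wait of that kind could be written with poll(), as in this sketch (the helper wait_writable() is hypothetical). Per poll(2), a timeout of 0 returns immediately and -1 waits indefinitely. Keep in mind that POLLOUT is still governed by the kernel's own write-space threshold discussed earlier in the thread, so the timeout caps the wait but does not lower that threshold.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Wait at most timeout_ms milliseconds for fd to become writable
 * (hypothetical helper). Returns 1 if poll() reports it ready (POLLOUT, or
 * POLLERR/POLLHUP, which the next send() will surface), 0 on timeout,
 * -1 on error or an invalid fd. */
static int wait_writable(int fd, int timeout_ms)
{
    assert(timeout_ms >= -1);

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc > 0)
            return (pfd.revents & POLLNVAL) ? -1 : 1;
        if (rc == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: retry. A stricter version would subtract
         * the time already spent from timeout_ms. */
    }
}
```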

rick jones
--
