RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Andy Chittenden
Date: Friday, August 6, 2010 - 2:30 am

> On Thu, 2010-08-05 at 15:55 +0100, Andy Chittenden wrote:

Hi Trond

Thanks for replying. It was debugging info: the printk()s show what the problem is. I was half expecting someone to pipe up "that isn't the correct way to fix this" and suggest another avenue to look at, or even, hopefully, come up with an alternative appropriate patch as I'm not an expert in this code: I don't know whether the sk_shutdown field being left set has implications elsewhere in the sunrpc code. FWIW I left the test case running overnight and have had only 50 such messages logged so it's not a heavy printk() load.


If I knew what sk_shutdown == 0 really corresponded to, I could well add a comment! :-). I just knew that in 2.6.26 we didn't see this problem and that in later kernels the connection abort sequence was being done conditionally and that the sk_shutdown flag being left set was making tcp_sendmsg return an error. So, putting two and two together, I've effectively just added another condition in which to abort the connection.

As nobody has objected to the essence of my patch, I'll attempt a new patch that changes those printk()s into dprintk() and drop in what I think are appropriate comments. So here's a revised patch:

# diff -up /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c net/sunrpc/xprtsock.c 
--- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c     2010-08-02 18:30:51.000000000 +0100
+++ net/sunrpc/xprtsock.c       2010-08-06 08:09:08.000000000 +0100
@@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
        if (!(xprt = xprt_from_sock(sk)))
                goto out;
        dprintk("RPC:       xs_tcp_state_change client %p...\n", xprt);
-       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
+       dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
                        sk->sk_state, xprt_connected(xprt),
                        sock_flag(sk, SOCK_DEAD),
-                       sock_flag(sk, SOCK_ZAPPED));
+                       sock_flag(sk, SOCK_ZAPPED),
+                       sk->sk_shutdown);
 
        switch (sk->sk_state) {
        case TCP_ESTABLISHED:
@@ -1796,10 +1797,25 @@ static void xs_tcp_reuse_connection(stru
 {
        unsigned int state = transport->inet->sk_state;
 
-       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
-               return;
-       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
-               return;
+       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED) {
+               /* we don't need to abort the connection if the socket
+                * hasn't undergone a shutdown
+                */
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               dprintk("RPC:       %s: TCP_CLOSEd and sk_shutdown set to %d\n",
+                       __func__, transport->inet->sk_shutdown);
+       }
+       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
+               /* we don't need to abort the connection if the socket
+                * hasn't undergone a shutdown
+                */
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               dprintk("RPC:       %s: ESTABLISHED/SYN_SENT "
+                               "sk_shutdown set to %d\n",
+                               __func__, transport->inet->sk_shutdown);
+       }
        xs_abort_connection(xprt, transport);
 }

Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>


--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to p ..., Andy Chittenden, (Fri Aug 6, 2:30 am)