Hello, We have an application that uses pthreads and (blocking) sockets. When the application runs single-threaded in separate processes (using fork()), we don't have any problem. However, when it's multithreaded, we sometimes get stuck while poll()ing a socket (with events set to POLLIN). Even after the other side has closed its end of the connection, we are still stuck there. Adding a timeout only makes poll() return 0, so we loop. If we don't loop, the next operation is a recv(), which blocks as well (which is consistent). It seems that nothing is received on the socket any longer, but that is difficult to verify with tcpdump since our server outputs something like 15MB at peak time with 150 hits per second. We have Shorewall installed and enabled, but what seems strange is that the problem depends on multithreading. It also occurs much more often on the 4-core machines than on the 2-core ones (both with Hyperthreading activated). We're using kernel 2.6.20-15-server (#2 SMP) provided by Ubuntu. Any tip on how we could fix this or investigate further would be appreciated. After one month of debugging we're really out of ideas now. Best, Nicolas --
Your usage pattern is a very common one; I highly doubt you are experiencing a kernel bug here, or many people (including myself) would be complaining. Shorewall sounds like it might be suspect: are FINs not coming in when the remote closes? You can look in the output of netstat to see what state the TCP connection is in; is it still ESTABLISHED? Have you tried just disabling the firewall to see if the problem disappears? Regards, Vito Caputo --
Yes, it's still ESTABLISHED, but we can't see the corresponding connection on the other machine when running netstat. I'm not a TCP expert, so I'm not sure in which cases this can occur. I agree with your comment in general, except that we have been running the same application in a single-threaded environment for years without running into this very specific problem. The only logs we get in dmesg are the following: either (a few every day): [10742708.006350] TCP: Treason uncloaked! Peer 213.209.177.218:32924/80 shrinks window 4049064122:4049064123. Repaired. Or (more often): [10755036.856217] Shorewall:net2all:DROP:IN=eth0 OUT= MAC=00:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:00 SRC=60.238.83.204 DST=XX.XX.XX.43 LEN=404 TOS=0x00 PREC=0x00 TTL=114 ID=12366 PROTO=UDP SPT=1057 DPT=1434 LEN=384 Neither the SRC nor the DST IPs correspond to the stalled connections, since those occur on the local network. Best, Nicolas --
If the end that's blocking still has the TCP connection in ESTABLISHED state, and the other end doesn't have the connection at all... you've already identified why the one end is still ESTABLISHED. The ESTABLISHED state won't be left until the FIN is received from the other end, at which point it enters the CLOSE_WAIT state. When the other end of the connection is _gone_, that leads me to believe a FIN will not be coming, hence the indefinite ESTABLISHED state. Why it's gone is a different question; maybe your problem is at the other end? The end initiating a shutdown has to enter FIN_WAIT_1 then FIN_WAIT_2; these transitions require the other side to leave ESTABLISHED (receive a Perhaps when you run multicore/threaded you are stressing the network stacks at both ends more, including everything in between? The threading vs. single process relationship is probably not causal, but just coincidental. What is the protocol? Are there any timeouts to take care of these situations? Do you schedule an alarm or use SO_RCVTIMEO to shut down dead connections and free up consumed threads? TCP, being reliable, can block indefinitely; you can employ TCP keepalive to change "indefinitely" to "quite a long time". Regards, Vito Caputo --
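A minimal sketch of the two socket options mentioned above, assuming a Linux client with a connected TCP socket. The helper name `enable_dead_peer_detection` and its parameters are hypothetical; `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux-specific options. Note that `SO_RCVTIMEO` only bounds blocking calls such as recv(); it has no effect on poll(), which takes its own timeout argument.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: turn on TCP keepalive and a receive timeout.
 *   recv_timeout_s  upper bound for a blocking recv() (SO_RCVTIMEO)
 *   idle_s          idle seconds before the first keepalive probe
 *   interval_s      seconds between unanswered probes
 *   probes          unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on the first failing setsockopt() (errno is set).
 */
int enable_dead_peer_detection(int fd, int recv_timeout_s,
                               int idle_s, int interval_s, int probes)
{
    int on = 1;
    struct timeval tv = { .tv_sec = recv_timeout_s, .tv_usec = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    /* After this, a blocking recv() fails with EAGAIN/EWOULDBLOCK on timeout. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;
    return 0;
}
```

With, say, `enable_dead_peer_detection(fd, 120, 60, 10, 5)`, a peer that has silently vanished should be noticed after roughly 60 + 5 * 10 seconds of silence instead of never; the numbers are illustrative only.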
Not sure why this would happen, since it's the same servers. The only thing that changes is the part of the software that we use to handle our server requests. It's either embedded in Apache 1.3 with fork() or a standalone multithreaded server which acts as an Apache backend. So the only difference for networking is that we have additional Apache<->MT-Server communications, but they should be on 127.0.0.1 so I The protocol is MySQL. Since we had the problem with libmysqlclient, we reimplemented it from scratch to make sure that it was not software-related. What happens at the protocol level is the following: a) we connect to the server b) we make several requests and get answers back c) at some (random+rare) point - always after making a request - we're stuck while waiting for the answer. Sadly, this can happen inside a transaction while we hold the lock on some shared resource. This will lock the whole website until we run out of file descriptors due to accept'ed pending connections. In that case we get an exception and the server (the multithreaded one, not MySQL) restarts, which releases the lock. In other cases, when we don't hold a lock, the thread remains blocked in poll() as I described. After a timeout (I think it's 28800 seconds) the MySQL server closes the connection. The client - which is waiting in poll() - does not have any timeout activated (it's relying on the MySQL server). But it doesn't notice that the socket has been closed either. We investigated signals a lot, since poll() can also be interrupted by Garbage Collector and child process signals, but we correctly handle EINTR everywhere it's needed. So unless there's a possibility that interrupting poll() with a signal might somehow consume the data, this Sure. We could also use a client timeout, but we don't want to hold the lock longer than required, and we can't tell the difference between a given request that would take too much time to complete and a lost ...
Ok, funny thing is that we just found what is occurring... We had a process that was regularly doing the following: conntrack -F This was done in order to prevent the table from growing too big, because we were reaching the maximum size as reported by: /proc/sys/net/ipv4/netfilter/ip_conntrack_max and /proc/sys/net/ipv4/netfilter/ip_conntrack_count It seems that when there are active connections, this breaks netfilter and stops packets from being delivered to the socket. At least I will have a nice sleep tonight. Best, Nicolas --
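As an alternative to flushing blindly, the two counters named above can simply be read and compared. The following is a rough sketch only; the function names `read_counter` and `conntrack_usage_percent` are hypothetical, and the paths are the ones quoted in this message (other kernels may expose these counters elsewhere).

```c
#include <stdio.h>

/* Paths as quoted in the message above; they may differ on other kernels. */
#define CONNTRACK_COUNT_PATH "/proc/sys/net/ipv4/netfilter/ip_conntrack_count"
#define CONNTRACK_MAX_PATH   "/proc/sys/net/ipv4/netfilter/ip_conntrack_max"

/* Parse one decimal integer from an open stream. Returns 0 on success, -1 otherwise. */
int read_counter(FILE *f, long *out)
{
    return fscanf(f, "%ld", out) == 1 ? 0 : -1;
}

/*
 * Returns table usage in percent (count * 100 / max, rounded down),
 * or -1 if either file cannot be opened or parsed, or if max is not positive.
 */
int conntrack_usage_percent(const char *count_path, const char *max_path)
{
    long count = 0, max = 0;
    FILE *fc = fopen(count_path, "r");
    FILE *fm = fopen(max_path, "r");
    int ok = fc && fm
          && read_counter(fc, &count) == 0
          && read_counter(fm, &max) == 0
          && max > 0;

    if (fc) fclose(fc);
    if (fm) fclose(fm);
    return ok ? (int)(count * 100 / max) : -1;
}
```

A monitoring job could call `conntrack_usage_percent(CONNTRACK_COUNT_PATH, CONNTRACK_MAX_PATH)` and raise an alert near the limit, rather than dropping the state of live connections.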
Note that this solved your symptom, not your problem. You actually have two problems: 1) You rely on TCP to detect a lost connection even on a side that will never transmit any data. TCP simply does not do this. If you are not trying to send data, you are not assured that a lost connection will be detected. (You either need a timeout, or you need to send or dribble some data, depending on the protocol.) 2) You hold a lock on a shared resource while you wait for a reply over a network. If this is a low-level "block and wait indefinitely" lock, this will cause many threads to line up behind a slow/stuck thread. The right fix depends on your circumstances, but you need to use a synchronization primitive that is suitable. (You need to be able to use multiple connections or defer operations without holding a thread.) With both of these bugs, you are vulnerable to precisely the scenario you observed. The TCP connection close packets were lost (in this case due to premature expiration of the connection tracking, but other things can do it, such as the server rebooting), TCP could not detect the lost connection because you never sent any data, so one thread blocked forever, and other threads got in line behind it. DS --
I agree with both points, but I can't modify the MySQL protocol to implement that. For (1), I can't add the timeout since I have no way to differentiate between a lost connection and a request that takes time to execute. I'll maybe check whether the protocol allows pings while waiting for the request result, but I'm not sure it does. For (2), the shared resource is on the database side, not on the server side. It's the transaction that has some rows locked. I have no solution for that. Best, Nicolas --
Sure you can. For example, you can run a proxy on both the server and the client, with the two proxies speaking a protocol that carries the MySQL protocol. The protocol between the server and the client can include two types of messages, one being regular data (which the proxies pass to the server and client software) and one being a ping (which the proxies use internally to decide when to drop their connections). Each proxy can 'ping' the other as often as required and drop both connections if the ping fails to go through. This will ensure that your program detects a connection loss rapidly. That doesn't fit your problem description. Presumably the server detected the loss of the connection and so would have released any resources it was holding that were associated with it. The problem in this case was that the Good luck. DS --
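To make the two-message-type idea concrete, here is one possible framing for the link between the two proxies. Everything in it is an assumption made for illustration (the names `FRAME_DATA` and `FRAME_PING`, the 5-byte header, the big-endian length); the message above does not specify any wire format.

```c
#include <stdint.h>

/* Hypothetical frame types for the proxy-to-proxy link. */
enum frame_type { FRAME_DATA = 0, FRAME_PING = 1 };

/* Assumed header layout: 1 byte type, then 4 bytes payload length (big-endian). */
#define FRAME_HEADER_LEN 5

void frame_encode_header(uint8_t out[FRAME_HEADER_LEN],
                         enum frame_type type, uint32_t payload_len)
{
    out[0] = (uint8_t)type;
    out[1] = (uint8_t)(payload_len >> 24);
    out[2] = (uint8_t)(payload_len >> 16);
    out[3] = (uint8_t)(payload_len >> 8);
    out[4] = (uint8_t)payload_len;
}

/*
 * Returns 0 and fills *type and *payload_len on success.
 * Returns -1 for an unknown type byte or a ping that carries a payload.
 */
int frame_decode_header(const uint8_t in[FRAME_HEADER_LEN],
                        enum frame_type *type, uint32_t *payload_len)
{
    uint32_t len = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16)
                 | ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];

    if (in[0] != FRAME_DATA && in[0] != FRAME_PING)
        return -1;
    if (in[0] == FRAME_PING && len != 0)
        return -1;
    *type = (enum frame_type)in[0];
    *payload_len = len;
    return 0;
}
```

A receiving proxy would forward the payload of each `FRAME_DATA` frame unchanged to the local MySQL client or server, and treat the absence of any frame (data or ping) for longer than some chosen interval as a dead link.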
Not only can you, but you *must*. Any service assuming an infinite timeout is doomed to fail. If you know that one request can take as long as one minute, for instance, then use a 2-minute timeout. The day all requests hang because of a failed firewall between client and server, you'll be happy that they are cleaned up automatically and that you don't have to go there and restart the service to flush them. There's a huge difference between using a very large timeout and none at all. Willy --
