Re: NFS regression? Odd delays and lockups accessing an NFS export.

Previous thread: scanner interface proposal was: [TALPA] Intro to a linux interface for on access scanning (fwd) by david on Sunday, August 17, 2008 - 6:25 pm. (10 messages)

Next thread: [PATH -mm] Fix a race condtion of oops_in_progress by Huang Ying on Sunday, August 17, 2008 - 7:03 pm. (6 messages)
From: Grant Coady
Date: Sunday, August 17, 2008 - 7:02 pm

Hi there,

I've been using NFS here for years, lately there's something odd going on 
since about a month or so.  Previously reported last month:
http://www.gossamer-threads.com/lists/linux/kernel/951419?page=last

Now with 2.6.27-rc3 on one of the client boxes I get a complete stall 
at odd times when accessing the server's exported directory, cannot 
see a pattern to it.  Eventually recovers after a Ctrl-C.  Nothing in 
the server or client log files.  Not easy to reproduce either.

The server runs 2.6.26.2 at the moment.

Server config, etc: http://bugsplatter.id.au/kernel/boxen/deltree/
Client config, etc: http://bugsplatter.id.au/kernel/boxen/pooh/

Grant.
--

From: Athanasius
Date: Monday, August 18, 2008 - 11:50 am

--uc35eWnScqDcQrv5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable


  I wonder if this is what I've been seeing.  I've been otherwise too
busy to properly report it, thinking that *someone* else must also be
seeing it and it's being worked on, else it's a subtle configuration
problem my end.

  I first started seeing this with 2.6.26 on the client end, 2.6.25.10 on
the server end.  Things last worked fine with that same server version
and 2.6.25.10 on the client end.  The affected export is:

/home/users     192.168.1.161(rw,no_root_squash,sync,no_subtree_check)

using the in-kernel NFS server, not the userspace one.

With the mount on the client being:

192.168.1.162:/home/users       /home/users nfs defaults,rw,nfsvers=3D3,rsi=
ze=3D8192,wsize=3D8192,nosuid,nodev,soft,intr 0 0

It used to have a 'nolock' in it, but I took that out and saw no
difference.  The one possible solid clue as to what's happening is this,
after I turned on all the lock etc checking in a 2.6.26.2 kernel:

Aug 12 16:10:28 emelia kernel: [  361.851316] INFO: task firefox-bin:5716 b=
locked for more than 120 seconds.
Aug 12 16:10:28 emelia kernel: [  361.851326] "echo 0 > /proc/sys/kernel/hu=
ng_task_timeout_secs" disables this message.
Aug 12 16:10:28 emelia kernel: [  361.851332] firefox-bin   D f7e0a018  555=
6  5716   5696
Aug 12 16:10:28 emelia kernel: [  361.851348]        e24f3ddc 00000046 c2cc=
2e40 f7e0a018 c0640030 c0642e40 c0642e40 c0642e40=20
Aug 12 16:10:28 emelia kernel: [  361.851382]        e2ec1018 e2ec1270 c2cc=
2e40 00000001 c2cc2e40 e24f3db8 00000046 c2c0b680=20
Aug 12 16:10:28 emelia kernel: [  361.851406]        e2e40da0 c2c0b680 e2ec=
1270 fffef1e4 00000001 00000046 c2c0b670 c2c0b670=20
Aug 12 16:10:28 emelia kernel: [  361.851435] Call Trace:
Aug 12 16:10:28 emelia kernel: [  361.851449]  [<c01e9d71>] nfs_wait_bit_ki=
llable+0x2a/0x2e
Aug 12 16:10:28 emelia kernel: [  361.851462]  [<c041ee67>] ...
From: Trond Myklebust
Date: Monday, August 18, 2008 - 12:19 pm

Your lockdep trace basically shows that the rpc layer is blocking for
some reason. Could you please try to reproduce the problem, and then do

	echo 0 >/proc/sys/sunrpc/rpc_debug

...and see what the output from 'dmesg' shows?

Cheers
  Trond

--

From: J. K. Cliburn
Date: Monday, August 18, 2008 - 12:37 pm

I see you're running the atl1 driver on your client.  I'm chasing an 
intermittent bug in that driver that seems to be affected by certain 
offload parameters.  Try disabling TSO and see if things improve.

ethtool -K eth0 tso off

Let me know if it helps.

Jay
--

From: Athanasius
Date: Monday, August 18, 2008 - 4:13 pm

Indeed it does, and indeed tso was on by default.  I booted up once
after first trying this, thinking I had things set to apply it
automatically, but hadn't and ran into the exact same problem.  This 2nd
time around with the setting definitely applied things appear to be
working normally.  At least I can actually start Firefox up and have it
work (lots of /home/users/... access over NFS to the server).

thanks,

-Ath
--=20
- Athanasius =3D Athanasius(at)miggy.org / http://www.miggy.org/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME
From: Frank Filz
Date: Tuesday, May 12, 2009 - 1:27 pm

Was this ever resolved? We're seeing a similar hangup in a 2.6.27
kernel.

Thanks

Frank Filz


--

From: Jay Cliburn
Date: Tuesday, May 12, 2009 - 5:05 pm

On Tue, 12 May 2009 13:27:58 -0700

The TSO problem in the atl1 driver was never resolved, and TSO was turned off by default with commit 82c26a9d117f0178b8c1b33429014b6d99c470f6 in the 2.6.27 cycle.  It was also added to the -stable tree at 2.6.26.4.

http://lkml.org/lkml/2008/9/3/251

Jay
--

From: Trond Myklebust
Date: Monday, August 18, 2008 - 12:20 pm

Please try to reproduce the hang, then do

	echo 0 >/proc/sys/sunrpc/rpc_debug

and send the output from 'dmesg'...

Cheers
  Trond

--

From: Grant Coady
Date: Tuesday, August 19, 2008 - 6:10 pm

Hi Trond,


dmesg and logs from client machine showed nothing, dmesg from server:

-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--

(nothing, nada, zilch after what looks like the header line)

Situation, first use of 'df' after reboot, client machine hung for many 
seconds before the /home/common export showed up, this is not repeatable. 
Mounting another export and doing 'df' again had no delays.

Client is running 2.6.27-rc3-git5 (x86_64) at the moment, configs, etc 
at: http://bugsplatter.id.au/kernel/boxen/pooh64/

Server as above.  Am I supposed to have some debugging turned on in the 
.configs?

Grant.  
--

From: Grant Coady
Date: Wednesday, August 20, 2008 - 4:17 pm

It's not NFS, I just had WinXP box stall completely (no KB NumLock 
response) on a PuTTY (ssh) terminal to a linux box running 2.6.26.3 
for about five or ten seconds, when it came back to operation the 
PuTYY toolbar icon was flashing.  Dunno which terminal as I had four 
open at the time, but they all to a server box running 2.6.26.3 with 
multiple Intel pro/100 (e100) NICs.

I think this may be to do with the e100 network driver instead, as a 
few weeks ago I started compiling it in rather than as a module like 
I've been doing for years (in response to the firmware (microcode) 
being dropped on the floor last month issue).  This box is the server 
(deltree) that all linux boxes mount a /home/common NFS export from.

All machines have the Intel pro/100 NIC, going to recompile the lot 
back to modules, see if that fixes the problem.

Grant.
--

From: Ian Campbell
Date: Friday, August 22, 2008 - 3:23 am

I've also been seeing some NFS related lockups, although I'm not sure if
they are the same as the one in this thread or not. Client is 2.6.26
(Debian's kernel) and Server is 2.6.25 (also a Debian kernel, but from
backports.org).

rpc_debug on the server gives nothing, on the client gives:
[144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
[144741.637997]  3439 0004 0080    -11 f3f48200 100003 f7770000        0 xprt_sending fa0ae88e fa0bddf4
[144741.637997]  3438 0001 00a0      0 f77f2a00 100003 f77700d0    15000 xprt_pending fa0ae88e fa0bddf4

There are no processes running with pid 3439 3438 (I don't think it's
that sort of pid though).

mounts points are:
hopkins:/storage/music /storage/music nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
hopkins:/storage/mythtv/recordings /var/lib/mythtv/recordings nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
hopkins:/var/lib/mythvideo /var/lib/mythvideo nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
hopkins:/storage/home/ijc /home/ijc nfs rw,vers=3,rsize=131072,wsize=131072,namlen=255,hard,nointr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
and all seem to be effected.

It hasn't happened this time (yet) but usually I get a hung task
backtrace like this:
Aug  4 06:27:28 iranon kernel: [137969.382277] INFO: task mythbackend:3161 blocked for more than 120 seconds.
Aug  4 06:27:28 iranon kernel: [137969.382287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  4 06:27:28 iranon kernel: [137969.382291] mythbackend   D 00005dfc     0  3161      1
Aug  4 06:27:28 iranon kernel: ...
From: Trond Myklebust
Date: Friday, August 22, 2008 - 11:08 am

That's probably also a networking device driver issue candidate: your
RPC task is queued up waiting to be sent.


The 'pid' is an internal RPC cookie that just serves to identify and
track specific RPC requests.

Cheers
  Trond

--

From: Ian Campbell
Date: Friday, August 22, 2008 - 11:13 am

# ethtool -i eth0
driver: e1000
version: 7.3.20-k2-NAPI
firmware-version: N/A
bus-info: 0000:01:0a.0

Adding CC's to peeps listed in MAINTAINERS

I have to reboot the system now (or else there will be no TV to watch

Right, thanks.

Ian.

--=20
Ian Campbell

Colors may fade.
From: John Ronciak
Date: Friday, August 22, 2008 - 12:33 pm

There is nothing indicating that the NIC/driver is causing any sort of
problem here, at least not with what has been presented so far.  When
the NFS mount isn't working is the networking still active and
working?

-- 
Cheers,
John
--

From: Ian Campbell
Date: Friday, August 22, 2008 - 1:00 pm

So far as I can tell, yes. I can login via ssh so long as the user
doesn't have NFS $HOME, I haven't tried much else and the box isn't
locked up at the moment, I'd bet it's fine though.

Ian.

--=20
Ian Campbell

"On a normal ascii line, the only safe condition to detect is a 'BREAK'
- everything else having been assigned functions by Gnu EMACS."
(By Tarl Neustaedter)
From: John Ronciak
Date: Friday, August 22, 2008 - 2:15 pm

Then it's highly unlikely that the NIC/driver is causing your problem.
 We'll continue to monitor this thread waiting for any info that this
is somehow NIC/driver related.

Good luck.



-- 
Cheers,
John
--

From: Trond Myklebust
Date: Friday, August 22, 2008 - 2:23 pm

...and the server? Something is preventing that RPC payload from being
delivered...

Trond

--

From: Ian Campbell
Date: Friday, August 22, 2008 - 2:37 pm

I can ssh to the server fine. The same server also serves my NFS home
directory to the box I'm writing this from and I've not seen any trouble
with this box at all, it's a 2.6.18-xen box.

If I downgrade the problematic box to 2.6.24 then the hangs do not
occur. They do occur with 2.6.25. (sorry that's a critical bit of
information which I stupidly forgot to mention up til now).

Ian.

--=20
Ian Campbell

It is much easier to be critical than to be correct.
		-- Benjamin Disraeli
From: Trond Myklebust
Date: Friday, August 22, 2008 - 2:56 pm

OK... Are you able to reproduce the problem reliably?

If so, can you provide me with a binary tcpdump or wireshark dump? If
using tcpdump, then please use something like

   tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049

Please also try to provide a netstat dump of the current TCP connections
as soon as the hang occurs:

  netstat -t

Cheers
  Trond

--

From: Ian Campbell
Date: Friday, August 22, 2008 - 3:41 pm

It usually happens in around a day, but I can't make it happen at will
so that I can arrange to be present at the time. It has usually locked

I'll try leaving this going overnight but using -C and -W to limit the

Will do it ASAP after it happens.

Ian.
--=20
Ian Campbell

revision 1.17.2.7
date: 2001/05/31 21:32:44;  author: branden;  state: Exp;  lines: +1 -1
ARRRRGH!!  GOT THE G** D*** SENSE OF A F******* TEST BACKWARDS!
From: Ian Campbell
Date: Sunday, August 24, 2008 - 11:53 am

--=-6Pq3VpgHjmuNrH70TChL
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable


Aug 24 18:08:59 iranon kernel: [168839.556017] nfs: server hopkins not resp=
onding, still trying
but I wasn't around until 19:38 to spot it.

netstat when I got to it was:

Proto Recv-Q Send-Q Local Address           Foreign Address         State  =
   =20
tcp        0      0 localhost.localdo:50891 localhost.localdom:6543 ESTABLI=
SHED
tcp        1      0 iranon.hellion.org.:ssh azathoth.hellion.:52682 CLOSE_W=
AIT=20
tcp        0      0 localhost.localdom:6543 localhost.localdo:50893 ESTABLI=
SHED
tcp        0      0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAI=
T2 =20
tcp        0      0 localhost.localdom:6543 localhost.localdo:41831 ESTABLI=
SHED
tcp        0      0 localhost.localdo:13666 localhost.localdo:59482 ESTABLI=
SHED
tcp        0      0 localhost.localdo:34288 localhost.localdom:6545 ESTABLI=
SHED
tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLI=
SHED
tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:52683 ESTABLI=
SHED
tcp        0      0 localhost.localdom:6545 localhost.localdo:34288 ESTABLI=
SHED
tcp        0      0 localhost.localdom:6543 localhost.localdo:50891 ESTABLI=
SHED
tcp        0      0 localhost.localdo:50893 localhost.localdom:6543 ESTABLI=
SHED
tcp        0      0 localhost.localdo:41831 localhost.localdom:6543 ESTABLI=
SHED
tcp        0     87 localhost.localdo:59482 localhost.localdo:13666 ESTABLI=
SHED
tcp        1      0 localhost.localdom:6543 localhost.localdo:41830 CLOSE_W=
AIT=20

(iranon is the problematic host .4, azathoth is my desktop machine .5, hopk=
ins is the NFS server .6)

tcpdumps are pretty big. I've attached the last 100 packets captured. If
you need more I can put the full file up somewhere.

-rw-r--r-- 1 root   root   1.3G Aug 24 17:57 dump.out0
-rw-r--r-- 1 root   root   536M Aug 24 19:38 dump.out1

Ian.

--=20
Ian Campbell

Prizes are for ...
From: Trond Myklebust
Date: Sunday, August 24, 2008 - 12:17 pm

socket, when the client closes its side. You therefore end up getting
stuck in the FIN_WAIT2 state (as netstat clearly shows above).

Is the server keeping the client in this state for a very long period?

Cheers
  Trond

--

From: Trond Myklebust
Date: Sunday, August 24, 2008 - 12:19 pm

BTW: the RPC client is closing the socket because it detected no NFS
activity for 5 minutes. Did you expect any NFS activity during this
time?

Trond

--

From: Ian Campbell
Date: Sunday, August 24, 2008 - 3:09 pm

(added some quoting from previous mail to save replying twice)


Well, it had been around an hour and a half on this occasion. Next time
it happens I can wait longer but I'm pretty sure I've come back from
time away and it's been wedged for at least a day. How long would you

It's a mythtv box so at times where no one is watching anything and
there isn't anything to record I expect NFS activity is pretty minimal.

Ian.

--=20
Ian Campbell

Can anyone remember when the times were not hard, and money not scarce?
From: Trond Myklebust
Date: Sunday, August 24, 2008 - 3:15 pm

The server should ideally start to close the socket as soon as it
receives the FIN from the client. I'll have a look at the code.

Trond

--

From: Ian Campbell
Date: Monday, August 25, 2008 - 2:59 am

The server is a 2.6.25 kernel from Debian backports.org. I was planning
to downgrade to the 2.6.24 "Etch and a half" kernel but was holding off
while this thread was in progress. If you think it would be a useful
data point I can do it sooner rather than later.

It's changing the client kernel which seems to make a difference since
running 2.6.24 against this server has given me no trouble. I guess the
newer client could expose a bug in the server though.

Ian.
--=20
Ian Campbell

I can't understand why a person will take a year or two to write a
novel when he can easily buy one for a few dollars.
		-- Fred Allen
From: Tom Tucker
Date: Monday, August 25, 2008 - 9:04 am

I don't think it should matter how long the connection stays in FIN WAIT,
the client should reconnect anyway.

Since the client seems to be the variable, I would think it might be an
issue with the client reconnect logic?

That said, 2.6.25 is when the server side transport switch logic went in.


--

From: Trond Myklebust
Date: Monday, August 25, 2008 - 9:54 am

Which would break the replay cache, since we can't reconnect using the

No.

Trond

--

From: Tom Tucker
Date: Monday, August 25, 2008 - 1:15 pm

Yeah, you're right. I got confused about which host was stuck in FIN_WAIT2.
It's the client as you pointed out earlier.

Thanks,

--

From: J. Bruce Fields
Date: Tuesday, August 26, 2008 - 12:27 pm

Any chance you could help Trond figure out why the server might be doing
this?

If not, I'll get to it, but not as soon as I should.

--b.
--

From: Tom Tucker
Date: Wednesday, August 27, 2008 - 7:43 am

Sure. I've actually tried to reproduce it here unsuccessfully.

As a starter, I would suggest turning on transport debugging:

# echo 256 > /proc/sys/sunrpc/rpc_debug

Here are my thoughts on how it is supposed to work. Trond if this doesn't match
your understanding, please let me know.

For the case where the client closes the connection first
because it's timeout is shorter (5min vs. 6), the sequence of events should be:


client					server

close
sends FIN goes to FIN-WAIT-1
					receives FIN replies with ACK
					and goes to CLOSE-WAIT
receives ACK goes to FIN-WAIT-2
					calls tcp_state_change callback on socket
					svc_tcp_state_change sets XPT_CLOSE on
					the transport and enqueues transport for
					servicing by svc thread.

					thread gets dequeued, calls svc_recv
					svc_recv sees XPT_CLOSE and calls
					svc_xprt_destroy that closes the socket

					TCP sends FIN/ACK after close
receives FIN/ACK goes to TIME-WAIT

since state is TIME-WAIT and
reuse port is set, we can re-connect

There's a couple places we could be getting stuck:

- The state-change callback never gets called.
- The state-change callback gets called but svsk is null and it ignores the event
- svc_tcp_state_change enqueues the transport, but due to a race bug, the transport
   doesn't actually get queued and since there is no activity it never closes
- something else

The message log with RPCDBG_SVCXPRT (256) enabled as shown above would help figure
out where it's happening.

If Ian is willing to create the log (or already has one), I'm certainly willing to
look at it.


--

From: Ian Campbell
Date: Saturday, August 30, 2008 - 8:47 am

It produced only the following (is that what was expected?):

[146866.448112] -pid- proc flgs status -client- -prog- --rqstp- -timeout -r=
pcwait -action- ---ops--
[146866.448112] 30576 0001 00a0      0 f77a1600 100003 f7903340    15000 xp=
rt_pending fa0ba88e fa0c9df4
[146866.448112] 30577 0004 0080    -11 f77a1600 100003 f7903000        0 xp=
rt_sending fa0ba88e fa0c9df4

Ian.
--=20
Ian Campbell

Life is a concentration camp.  You're stuck here and there's no way
out and you can only rage impotently against your persecutors.
		-- Woody Allen
From: J. Bruce Fields
Date: Sunday, August 31, 2008 - 12:30 pm

It's normal to get something like that when you turn it on, yes (unless
someone else spots anything odd about that...) but what's really needed
is to turn this on and then reproduce the problem--it's the debugging
output that goes to the logs during the problem that'll be interesting.

--b.
--

From: Ian Campbell
Date: Sunday, August 31, 2008 - 12:44 pm

That's what I did. The first time I did the echo I just got the header
line, then I waited for the repro and since there had been no further
logging I ran the echo again and got the three lines above.

Sounds like you expected there to be more and ongoing logging?

--=20
Ian Campbell

kernel, n.:
	A part of an operating system that preserves the medieval
	traditions of sorcery and black art.
From: J. Bruce Fields
Date: Sunday, August 31, 2008 - 12:46 pm

Yes.  It's the server side we're interested in, by the way--are you
collecting this deubgging on the client?

--b.
--

From: Ian Campbell
Date: Sunday, August 31, 2008 - 12:49 pm

Er, yeah, since the client appeared to be the problem.

I'll try again on the server!

Ian.
--=20
Ian Campbell

The descent to Hades is the same from every place.
		-- Anaxagoras
From: Tom Tucker
Date: Sunday, August 31, 2008 - 12:51 pm

Thanks Ian,


--

From: Tom Tucker
Date: Sunday, August 31, 2008 - 12:51 pm

Looks like you ran this on the client. Sorry, Ian, I should have been more 
specific. You need to modify the rpc_debug file on the server.


--

From: Ian Campbell
Date: Sunday, August 31, 2008 - 2:18 pm

I tried this on the server but it's pretty crippling (the server is
quite weedy, 300MHz K3 or something).

I'll leave it logging overnight since things should be pretty quiet (and
that's often when the problem occurs) but if there's a less aggressive
setting than 256 but which would still be useful I could leave it on
permanently until the problem happens.

Ian.
--=20
Ian Campbell

Marriages are made in heaven and consummated on earth.
		-- John Lyly
From: Tom Tucker
Date: Monday, September 1, 2008 - 10:20 am

Thanks Ian. Unfortunately, that's as fine grained as it gets. 256 

--

From: Ian Campbell
Date: Monday, September 1, 2008 - 10:46 am

Doh, of course, I was thinking of 255...

No repro today. I'll let you know when I see something.

Ian.

--=20
Ian Campbell

History is nothing but a collection of fables and useless trifles,
cluttered up with a mass of unnecessary figures and proper names.
		-- Leo Tolstoy
From: Ian Campbell
Date: Wednesday, September 10, 2008 - 1:40 am

Still no repro with the debugging enabled I'm afraid.

I'm wondering if the performance hit makes it unlikely to trigger since
everything slows to a crawl. I haven't been able to leave it on 24/7
since it makes all the NFS clients nearly unusable but I've had it on
overnight and when there's no one around etc, which is when the hang
generally seems to occur.

I forgot to re-enable it one evening and, perhaps inevitably, it crashed
overnight. I turned the debugging on and captured the following. It's
not much since syslog had given up for some reason, probably when the
disk filled up with logs. 192.168.1.5 is a non-problematic client so I
don't think the log will be useful since the problematic client
(192.168.1.4) isn't mentioned..

Ian.

vc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
[4009900.094587] svc: TCP record, 140 bytes
[4009900.098604] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
[4009900.102587] svc: TCP complete record (140 bytes)
[4009900.106592] svc: transport ee5fd600 put into queue
[4009900.110613] svc: got len=140
[4009900.114645] svc: transport ee5fd600 busy, not enqueued
[4009900.118752] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
[4009900.122636] svc: socket c1888960 TCP data ready (svsk ee5fd600)
[4009900.126588] svc: transport ee5fd600 busy, not enqueued
[4009900.130601] svc: transport ee5fd600 busy, not enqueued
[4009900.134593] svc: server eefc4000 waiting for data (to = 900000)
[4009900.162711] svc: transport ee5fd600 dequeued, inuse=1
[4009900.166598] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
[4009900.170594] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
[4009900.174611] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
[4009900.178592] svc: TCP record, 136 bytes
[4009900.182606] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
[4009900.186592] svc: TCP complete record (136 bytes)
[4009900.190599] svc: transport ee5fd600 put into queue
[4009900.194605] svc: got ...
From: J. Bruce Fields
Date: Friday, September 12, 2008 - 3:43 pm

OK, thanks for the attempt.  We should figure out how to tone down those
dprintk's.

I'll try to take a closer look at this, but I'm travelling starting
tomorrow, and realistically may not find time to look at this for a
couple weeks.  If someone else gets to it before me, great....

--

From: Tom Tucker
Date: Friday, September 12, 2008 - 4:15 pm

Iain sadly yes. There's a thread stuck holding the BUSY bit or a thread 
failed to clear the bit properly
(maybe an error path). Data continues to arrives, but the transport 
never gets put back on the queue
because it's BUSY bit is set.  In other words, this is a different error 
than the one we've been chasing.

If I sent you a patch, could you rebuild the kernel and give it a whirl? 
Also, can you give me a
kernel.org relative commit-id or tag for the kernel that you're using?

Thanks,

--

From: Ian Campbell
Date: Saturday, September 13, 2008 - 1:57 am

I sure could. I'm using the Debian kernel which is currently at 2.6.26.3
(pkg version 2.6.26-4) although I have an update to 2.6.26.4 (via pkg
2.6.26-5) pending.

If I'm going to build my own I'll start with current git
(a551b98d5f6fce5897d497abd8bfb262efb33d2a) and repro there before trying
your patch.

--=20
Ian Campbell

No rock so hard but that a little wave
May beat admission in a thousand years.
		-- Tennyson
From: Ian Campbell
Date: Monday, September 15, 2008 - 10:48 pm

(dropping the e1000 guys, it seems unnecessary to keep spamming them
with this issue when it's unlikely to be anything to do with them. I've
left their list on for now so they know. I'd suggest dropping it from
any replies.)


FYI I've repro'd with=20
        commit a551b98d5f6fce5897d497abd8bfb262efb33d2a
        Merge: d1c6d2e... 50bed2e...
        Author: Linus Torvalds <torvalds@linux-foundation.org>
        Date:   Thu Sep 11 11:50:15 2008 -0700
       =20
            Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
           =20
            * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
              sg: disable interrupts inside sg_copy_buffer
which was the latest git a few days back.

I'm going to start bisecting between v2.6.25 and v2.6.26. There's 173
commits in fs/nfs* net/sunrpc in that interval so with a day per test I
should have something next week...

Ian.
--=20
Ian Campbell

Vini, vidi, Linux!
		-- Unknown source
From: Tom Tucker
Date: Tuesday, September 16, 2008 - 4:38 am

Iain:

I'm assuming you'll do this in advance of any patch from me? I was 
simply going to add printk to the various shutdown paths and see if we 
could get some finer grained debug output since the generic transport 
debug output was too verbose.

Let me know and I'll help if I can,

--

From: Ian Campbell
Date: Tuesday, September 16, 2008 - 8:03 am

I'm actually Ian. (I don't actually mind that much, the other way round


You mean logging on the client or server side? I'm currently bisecting
client. I'm happy to add some extra prints there if you point me in the
direction of the code paths of interest.

Ian.
-- 
Ian Campbell
Current Noise: Anathema - Leave No Trace

I haven't lost my mind; I know exactly where I left it.

--

From: Tom Tucker
Date: Tuesday, September 16, 2008 - 8:58 am

From: Ian Campbell
Date: Tuesday, September 16, 2008 - 9:24 am

Thanks. It's running 2.6.25.7 although I can massage a more recent patch
if needed. (I'm a bit less keen to run a variety of random kernels on
that machine).

Ian.

-- 
Ian Campbell
Current Noise: The Ocean - Protoerozoic - Rhyacian - Untimely Meditations

We may not return the affection of those who like us, but we always respect
their good judgement.

--

From: Ian Campbell
Date: Tuesday, September 23, 2008 - 12:59 am

I've found that the problem was backported into the stable stream since
I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
is quite useful since there are only 3 relevant looking changesets in
that range. I will bisect between these before confirming the culprit on
mainline.

I've had a private reply from someone who is seeing the same issue with
2.6.26.5 against a server running 2.6.18 (Debian Etch) which I think
largely discounts the server from being at fault.

Ian.
--=20
Ian Campbell

Everything ends badly.  Otherwise it wouldn't end.
From: Ian Campbell
Date: Tuesday, September 23, 2008 - 4:33 am

It reports:

        daedfbe2a67628a40076a6c75fb945c60f608a2e is first bad commit
        commit daedfbe2a67628a40076a6c75fb945c60f608a2e
        Author: Trond Myklebust <Trond.Myklebust@netapp.com>
        Date:   Wed Jun 11 17:39:04 2008 -0400
       =20
            NFS: Ensure we zap only the access and acl caches when setting =
new acls
           =20
            commit f41f741838480aeaa3a189cff6e210503cf9c42d upstream
           =20
            ...and ensure that we obey the NFS_INO_INVALID_ACL flag when re=
trieving the
            acls.
           =20
            Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
            Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

I'm just about to build f41f741838480aeaa3a189cff6e210503cf9c42d and the
one before and try those.

I'm not using ACLs as far as I am aware.

Ian.

--=20
Ian Campbell

If we were meant to get up early, God would have created us with alarm cloc=
ks.
From: J. Bruce Fields
Date: Tuesday, September 23, 2008 - 10:03 am

Could you double-check that this is reproduceable with this commit
applied, and not reproduceable when it's not?

I suppose it's not impossible that this could be triggering the problem
in some very roundabout way, but it seems a bit out of left field--so I
wonder whether one of the bisection points could have gotten marked good

I think commands like "ls" try to get posix acls these days, so it's
possible that the nfs3_proc_getacl code at least might be getting
called.  Why that would matter I can't see.

--b.
--

From: Ian Campbell
Date: Friday, September 26, 2008 - 8:37 am

I've reproduced with exactly commit
f41f741838480aeaa3a189cff6e210503cf9c42d on trunk and am now running
2e96d2867245668dbdb973729288cf69b9fafa66 which is the changeset

It's possible, the good case is naturally quite hard to establish with
100% certainty. I declared v2.6.26 OK after an uptime of 4 days and 19
hours, compared with failure normally within 1-2 days. It's possible I
was premature in doing so. I'll run 2e96d2867 for at least a full week
before reporting back.

Ian.
--=20
Ian Campbell

Unix is mature OS, windows is still in diapers and they smell badly.
		-- Rafael Skodlar=20
From: Ian Campbell
Date: Friday, September 26, 2008 - 11:17 am

Or it'll reproduce after just a few hours :-/

Looks like a false negative on v2.6.26 -- I'll retest from there.

Ian.

--=20
Ian Campbell

That secret you've been guarding, isn't.
From: J. Bruce Fields
Date: Friday, September 26, 2008 - 8:54 pm

OK, so apologies, but this has been a long thread, and maybe we could
use a summary of the symptoms and the results so far.  I think you said
2.6.24 or .25 was the last you're *positive* was good?

--b.
--

From: Ian Campbell
Date: Saturday, September 27, 2008 - 3:16 am

2.6.24.x was good 2.6.25.y was not. The Debian packages don't include
the stable release numbers in their version so I'm unsure of x and y,
Judging from the changelog entries I think 2.6.24.3 and 2.6.25.10.=20

Since I have been building my own kernels I have seen repros with:

a551b98d5f6fce5897d497abd8bfb262efb33d2a Merge branch 'for-linus' of
git://git.kernel.dk/linux-2.6-block (was tip of git at the time I tested
it a while ago)

c02a79dedc7f3c3d4fdbb5eb2000cacea5df4cde v2.6.26.5

daedfbe2a67628a40076a6c75fb945c60f608a2e NFS: Ensure we zap only the
access and acl caches when setting new acls (backport of
f41f741838480aeaa3a189cff6e210503cf9c42d to 2.6.26.y stream, identified
by bisect, possibly/probably incorrectly)

f41f741838480aeaa3a189cff6e210503cf9c42d NFS: Ensure we zap only the
access and acl caches when setting new acls (original trunk commit of
above)

2e96d2867245668dbdb973729288cf69b9fafa66 NFS: Fix a warning in
nfs4_async_handle_error (commit before
f41f741838480aeaa3a189cff6e210503cf9c42d)

I did not see the problem with bce7f793daec3e65ec5c5705d2457b81fe7b5725
(v2.6.26) in 4 days 19 hours. Due to the apparent mis-bisection leading
to daedfbe2a67628a40076a6c75fb945c60f608a2e this result is in doubt, I'm
currently retesting this kernel.

The bisect was restricted to paths under fs/nfs* and net/sunrpc. I've
gone over the range v2.6.26..daedfbe2a67628a40076a6c75fb945c60f608a2e
and nothing jumps out at me. If my retest of 2.6.26 shows it to be OK
again I'll bisect over the non-NFS changesets in that range.

The symptoms are a hang of any access to an NFS mounted path, although
the rest of the system continues to work.=20

Other potentially interesting info cribbed from earlier postings:

While the hang is occurring rpc_debug on the server gives nothing, on
the client gives:
[144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -r=
pcwait -action- ---ops--=20
[144741.637997]  3439 0004 0080    -11 f3f48200 100003 f7770000    0 ...
From: Roger Heflin
Date: Monday, August 25, 2008 - 2:39 pm

Ian,

Do you have a recording group setup on the NFS partition that mythtv is
going to be accessing?

I have seen similar funny stuff happen, it used to happened around 2.6.22*
(on each end), and quit happening around 2.6.24* and now has started happening
again with 2.6.25* on both ends.

Similar to what you have the only thing I see is "NFS server not responding" and 
  restarting the NFS server end (/etc/init.d/nfs restart) appears to get things to
continue on the NFS client.    No other messages appear on either end that 
indicate that anything is wrong, other non-nfs partitions on the client work 
find, the machine is still up, and the NFS server is still up and fine, and 
after a restart things will again work for a while (hours or days).

                             Roger
--

From: Grant Coady
Date: Monday, August 25, 2008 - 1:23 pm

^^^^^^^^-->  typo?

man tcpdump:
       -s     Snarf  snaplen bytes of data from each packet rather than the default of
              68 (with SunOS's NIT, the minimum is actually 96).  68 bytes is adequate


I've reverted the NFS server to 2.6.24.7 -- inconclusive results for me 
'cos NFS stalls seem so random.

Grant.
--

From: Trond Myklebust
Date: Monday, August 25, 2008 - 3:11 pm

No. The intention was to record _all_ the info in the packet for
analysis, not just random header info.

Trond

--

From: Grant Coady
Date: Monday, August 25, 2008 - 5:29 pm

Hi Trond,

My tcpdump seems to have a 16 bit snaplen counter:

~# tcpdump -w /tmp/dump.out -s 65535 host deltree and port 2049
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C0 packets captured
4 packets received by filter
0 packets dropped by kernel

~# tcpdump -w /tmp/dump.out -s 65536 host deltree and port 2049
tcpdump: invalid snaplen 65536

~# tcpdump --version
tcpdump version 3.9.8
libpcap version 0.9.8


So I'm now using:
~# tcpdump -w /tmp/dump.out -s 65535 -C 10 -W 100 host deltree and port 2049
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

to get a 1GB round-robin trace buffer, I can stop the trace when problem 
noticed, as it is so long between delay/stall happenings.  Then I'll try 
to trigger the thing.

Is this the correct style of trace you are expecting?

~$ /usr/sbin/tcpdump -r /tmp/dump.out00
reading from file /tmp/dump.out00, link-type EN10MB (Ethernet)
10:13:49.719781 IP pooh64.mire.mine.nu.2156510591 > deltree.mire.mine.nu.nfs: 116 access fh 0,1/218104576 001f
10:13:49.720215 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.2156510591: reply ok 124 access c 001f
10:13:49.720225 IP pooh64.mire.mine.nu.984 > deltree.mire.mine.nu.nfsd: . ack 1649405551 win 5840
10:13:49.720288 IP pooh64.mire.mine.nu.2173287807 > deltree.mire.mine.nu.nfs: 136 readdirplus fh 0,1/218104576 512 bytes @ 0
10:13:49.742450 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.2173287807: reply ok 1460 readdirplus


Is there some test suite I can use?  Compiling kernels over NFS worked 
fine yesterday, apart from the fastest box' make complaining about clock 
skew.  The kernel booted though, so that was okay.  

Guess it's back to the interactive editing over NFS and see if the thing 
manifest the delay/stalls again, I'm on the .27-rc4-git4 kernel as soon 
as it compiles for the client, NFS server is 2.6.24.7 at the moment.

Grant.
--

From: Muntz, Daniel
Date: Monday, August 25, 2008 - 5:59 pm

Try '-s 0', from tcpdump(8): "Setting snaplen to 0 means use the
required length to catch whole packets."

  -Dan

-----Original Message-----
From: Grant Coady [mailto:grant_lkml@dodo.com.au] 
Sent: Monday, August 25, 2008 5:29 PM
To: Trond Myklebust
Cc: Grant Coady; Ian Campbell; John Ronciak;
linux-kernel@vger.kernel.org; neilb@suse.de; bfields@fieldses.org;
linux-nfs@vger.kernel.org; Jeff Kirsher; Jesse Brandeburg; Bruce Allan;
PJ Waskiewicz; John Ronciak; e1000-devel@lists.sourceforge.net
Subject: Re: NFS regression? Odd delays and lockups accessing an NFS
export.

On Mon, 25 Aug 2008 18:11:12 -0400, Trond Myklebust

Hi Trond,

My tcpdump seems to have a 16 bit snaplen counter:

~# tcpdump -w /tmp/dump.out -s 65535 host deltree and port 2049
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size
65535 bytes ^C0 packets captured
4 packets received by filter
0 packets dropped by kernel

~# tcpdump -w /tmp/dump.out -s 65536 host deltree and port 2049
tcpdump: invalid snaplen 65536

~# tcpdump --version
tcpdump version 3.9.8
libpcap version 0.9.8


So I'm now using:
~# tcpdump -w /tmp/dump.out -s 65535 -C 10 -W 100 host deltree and port
2049
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size
65535 bytes

to get a 1GB round-robin trace buffer, I can stop the trace when problem
noticed, as it is so long between delay/stall happenings.  Then I'll try
to trigger the thing.

Is this the correct style of trace you are expecting?

~$ /usr/sbin/tcpdump -r /tmp/dump.out00
reading from file /tmp/dump.out00, link-type EN10MB (Ethernet)
10:13:49.719781 IP pooh64.mire.mine.nu.2156510591 >
deltree.mire.mine.nu.nfs: 116 access fh 0,1/218104576 001f
10:13:49.720215 IP deltree.mire.mine.nu.nfs >
pooh64.mire.mine.nu.2156510591: reply ok 124 access c 001f
10:13:49.720225 IP pooh64.mire.mine.nu.984 > deltree.mire.mine.nu.nfsd:
. ack 1649405551 win 5840
10:13:49.720288 IP pooh64.mire.mine.nu.2173287807 >
deltree.mire.mine.nu.nfs: 136 ...
From: Grant Coady
Date: Monday, August 25, 2008 - 6:06 pm

I booted 2.6.27-rc4-git4a and started the tcpdump trace as described in 
last email, got an immediate stall which cleared after 10-20 seconds, on 
running a small script exported from the NFS server as user:

~$ cat /home/common/bin/boot-report
#! /bin/bash
#
version=$(uname -r)

dmesg >/home/common/dmesg-$version

# 2006-12-15 no more filter .config
#grep = /boot/config-$version >/home/common/config-$version
cp /boot/config-$version /home/common


The script had not created the file on the server when the command prompt 
returned, during the stall I also opened another client terminal and ran 
'df', it stalled too, and both client terminals came back to prompt after 
the 10-20s delay.  Then I hit Ctrl-C on the tcpdump terminal to prepare 
this report.

The trace is only 27 lines, deltree is the server, pooh64 the client:
grant@pooh64:~$ /usr/sbin/tcpdump -r /tmp/dump.out00
reading from file /tmp/dump.out00, link-type EN10MB (Ethernet)
10:42:14.061410 IP pooh64.mire.mine.nu.0 > deltree.mire.mine.nu.nfs: 0 null
10:42:14.062691 IP deltree.mire.mine.nu.nfsd > pooh64.mire.mine.nu.839: S 2746115414:2746115414(0) ack 3659136038 win 5840 <mss 1460,nop,nop,sackOK>
10:42:14.062708 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 1 win 5840
10:42:14.062750 IP pooh64.mire.mine.nu.1392480046 > deltree.mire.mine.nu.nfs: 112 getattr fh 0,1/218104576
10:42:14.062765 IP pooh64.mire.mine.nu.1409257262 > deltree.mire.mine.nu.nfs: 112 fsstat fh 0,1/218104576
10:42:14.062944 IP deltree.mire.mine.nu.nfsd > pooh64.mire.mine.nu.839: . ack 113 win 5840
10:42:14.062951 IP deltree.mire.mine.nu.nfsd > pooh64.mire.mine.nu.839: . ack 225 win 5840
10:42:14.063873 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1392480046: reply ok 116 getattr DIR 40755 ids 1000/10 sz 448
10:42:14.063881 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 117 win 5840
10:42:14.063890 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1409257262: reply ok 88 fsstat tbytes 8422359040 fbytes ...
Previous thread: scanner interface proposal was: [TALPA] Intro to a linux interface for on access scanning (fwd) by david on Sunday, August 17, 2008 - 6:25 pm. (10 messages)

Next thread: [PATH -mm] Fix a race condtion of oops_in_progress by Huang Ying on Sunday, August 17, 2008 - 7:03 pm. (6 messages)