Hi,
With both CentOS 5.1 (2.6-18.21) and 5.0 NFS clients I am seeing a lock-up during read() calls. From crash I can see that the program is hung at sync_page().
I have two programs. Program-1 monitors Program-2.
Every 10 seconds, Program-1 does an NFS statfs. If no response comes back within 5 seconds after 2 retries, it kills Program-2.
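For anyone trying to reproduce this, here is a rough sketch of how a monitor like Program-1 could bound a statfs call. It is not the original program. The helper names statfs_with_timeout and mount_is_alive are made up for illustration, and running statfs in a forked child is only one assumed way to get a timeout, since a plain statfs() on an unresponsive NFS mount can block the caller.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/vfs.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run statfs() in a child process so that a hung
 * NFS mount cannot block the monitor itself.
 * Returns 0 if statfs succeeded within timeout_sec, -1 otherwise. */
static int statfs_with_timeout(const char *path, int timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the exit status carries the statfs result. */
        struct statfs sb;
        _exit(statfs(path, &sb) == 0 ? 0 : 1);
    }

    /* Parent: poll for the child in 100 ms steps until the deadline. */
    for (int waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
        if (r < 0)
            return -1;
        struct timespec ts = { 0, 100 * 1000 * 1000 };
        nanosleep(&ts, NULL);
    }

    /* Timed out. The child may be stuck in the kernel, so do not wait
     * for it in a blocking way; it may linger until the RPC gives up. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return -1;
}

/* Hypothetical helper: returns 1 if any of `attempts` tries succeeds. */
static int mount_is_alive(const char *path, int timeout_sec, int attempts)
{
    for (int i = 0; i < attempts; i++) {
        if (statfs_with_timeout(path, timeout_sec) == 0)
            return 1;
    }
    return 0;
}
```

A monitor loop would call mount_is_alive(mount_point, 5, 2) every 10 seconds and send SIGKILL to Program-2 when it returns 0. How the original counts its "2 retries" is not stated, so the attempt count here is a guess.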
Program-2 does the following:
fd = open("foo.bar", O_RDWR, 0);
read(fd, buf, 256);    /* <= this locks up */
while (1) {
    /* write the new contents to a temp file, flush, then rename it into place */
    fd_x = open("foo.bar.tmp", O_CREAT|O_RDWR, S_IRWXU);
    write(fd_x, buf, 256);
    fdatasync(fd_x);
    close(fd_x);
    rename("foo.bar.tmp", "foo.bar");
    sleep(5);
}
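A version of the write-then-rename step with error checking may help narrow this down, because on a soft mount a request that times out is reported to the application as an error (typically EIO) rather than retried forever. The sketch below is an assumption about how one might write it, not the original program. The helper names replace_file and read_file are hypothetical, and O_TRUNC is my addition.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to tmp_path, flush them with
 * fdatasync(), then rename tmp_path over final_path.
 * Returns 0 on success, -1 on failure with errno set. */
static int replace_file(const char *tmp_path, const char *final_path,
                        const void *buf, size_t len)
{
    /* O_TRUNC is an addition: it guards against a stale temp file. */
    int fd = open(tmp_path, O_CREAT | O_RDWR | O_TRUNC, S_IRWXU);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    if (fdatasync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    return rename(tmp_path, final_path);
}

/* Hypothetical helper: one read() of up to len bytes from path.
 * Returns the byte count, or -1 with errno set. */
static ssize_t read_file(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    int saved = errno;
    close(fd);
    errno = saved;
    return n;
}
```

Logging strerror(errno) whenever either helper returns -1 would show whether the soft mount is handing back errors before the hang, or whether the read() never returns at all.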
Now, while Program-2 is running, pull the Ethernet cable on the NFS server. Since Program-1 gets no response to statfs, it does a kill -9 of Program-2. Then do a umount -l, and after about a minute bring the NFS server back up, remount the backend volume and restart Program-2. At this point it locks up at sync_page during the one and only read() call.
This is easily reproducible: it takes a couple of tries within an hour or two. With nfs_debug turned on we couldn't reproduce it. My NFS client is a 2 x quad-core 2 GHz x86_64 box with 16 GB of memory. I have other CPU hogs on the box which peg the CPU at 750%. In addition, I have a couple of rsync sessions copying from one directory to another on the same NFS server (rsync does lots of renames). My NFS server is an 800 MHz laptop, to simulate slow NFS server conditions.
Does anyone have a suggestion for how to work around this issue, or some surgical patches? After this lock-up, a subsequent lsof or any other filesystem operation on that directory locks up the system. The only way to recover is to reboot the box.
Thanks in Advance
Nalan
why umount?
Why do you umount -l in between? NFS itself should be able to detect that the server came back online and happily continue. The you-don't-know-how-many half-dead NFS mounts you produce with umount -l may be a problem here. Are there other programs (like your shell) holding references to the fs?
Which NFS mount flags do you use? In particular, do you use soft or hard, intr or nointr, ac or noac? And what are your settings for timeo and retrans?
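In case it helps to answer that from the client itself, below is a small sketch that prints the options of every NFS mount the kernel currently knows about, using the glibc mntent interface. The function name print_nfs_mounts is made up for illustration. I am assuming /proc/mounts as the source; on most kernels it also lists the effective timeo and retrans values.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every NFS mount listed in mounts_file
 * together with its options.
 * Returns the number of NFS mounts found, or -1 if the file cannot
 * be opened. */
static int print_nfs_mounts(const char *mounts_file)
{
    FILE *fp = setmntent(mounts_file, "r");
    if (fp == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        /* Only look at NFS client mounts. */
        if (strcmp(m->mnt_type, "nfs") != 0 &&
            strcmp(m->mnt_type, "nfs4") != 0)
            continue;

        count++;
        printf("%s on %s type %s\n", m->mnt_fsname, m->mnt_dir, m->mnt_type);
        printf("  options: %s\n", m->mnt_opts);
        /* hasmntopt() matches whole option names, so "intr" does not
         * match "nointr". */
        printf("  soft: %s, intr: %s, noac: %s\n",
               hasmntopt(m, "soft") ? "yes" : "no",
               hasmntopt(m, "intr") ? "yes" : "no",
               hasmntopt(m, "noac") ? "yes" : "no");
    }

    endmntent(fp);
    return count;
}
```

Calling print_nfs_mounts("/proc/mounts") before and after the umount -l step might also show whether the lazily unmounted entries are still lingering.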
Mount options are -o rw,soft.
Thanks
Nalan