Hello,

I wonder if the nfs-stuff is considered to be solved, because I still see strange things.

During boot my machine sometimes (approx one out of two times) hangs with the output pasted below on Sysctl-l. The irq I'm not 100% sure it's related, but at least it seems to hang in nfs_readdir. (When the serial irq happened that triggered the sysrq, the program counter was at 0xc014601c, which is fs/nfs/dir.c:647 for me.)

This is on 2.6.37-rc8 plus some patches for machine support on an ARM machine.

Best regards
Uwe

[ 2700.100000] SysRq : Show State
[ 2700.100000] task PC stack pid father
[ 2700.100000] init S c0285d80 0 1 0 0x00000000
[ 2700.100000] Backtrace:
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
[ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
[ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
[ 2700.100000] r8:c0034088 r7:00000072 r6:00000001 r5:0000001b r4:0140b228
[ 2700.100000] kthreadd S c0285d80 0 2 0 0x00000000
[ 2700.100000] Backtrace:
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006a30c>] (kthreadd+0x70/0xfc)
[ 2700.100000] [<c006a29c>] (kthreadd+0x0/0xfc) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000] ksoftirqd/0 S c0285d80 0 3 2 0x00000000
[ 2700.100000] Backtrace:
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c0052714>] (run_ksoftirqd+0x5c/0x110)
[ 2700.100000] [<c00526b8>] (run_ksoftirqd+0x0/0x110) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] r8:00000000 r7:c00526b8 r6:00000000 r5:c7843f1c r4:c7859fac
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000] r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
[ 2700.100000] kworker/0:0 S c0285d80 0 4 2 0x00000000
[ 2700.100000] Backtrace:
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from ...
Please cc the poor hapless NFS people too, who probably otherwise
wouldn't see it. And Arnd just in case it might be locking-related.
Trond, any ideas? The sysrq thing does imply that it's stuck in some
busy-loop in fs/nfs/dir.c, and line 647 is get_cache_page(), which in
turn implies that the endless loop is either the loop in
readdir_search_pagecache() _or_ in a caller. In particular, the
EBADCOOKIE case in the caller (nfs_readdir) looks suspicious. What
protects us from endless streams of EBADCOOKIE and a successful
uncached_readdir?
Linus
--
There is nothing we can do to protect ourselves against an infinite loop if the server (or underlying filesystem) is breaking the rules w.r.t. cookie generation. It should be possible to recover from all other situations.

IOW: if the server generates non-unique cookies, then we're screwed. Fixing that particular problem is impossible since it is basically a variant of the halting problem. That was why I asked which filesystem is being exported in my previous reply.

The point of 'uncached_readdir' is to resolve a cookie that was previously valid, but has since been invalidated; usually that is due to the file having been unlinked. If it succeeds, it should result in a new set of valid entries being posted to the 'filldir' callback, and a new cookie being set in the filp->private (i.e. we should have made progress). If it fails, we exit, as you can see.

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
--
On Thu, Dec 30, 2010 at 10:24 AM, Trond Myklebust
Umm. Sure there is. Just make sure that you return the uncached entry
to user space, rather than loop forever.
Looping forever in kernel space is a bad idea. How about just changing
the "continue" into a "break" for the "uncached readdir returned
success".
No halting problems, no excuses. There is absolutely _no_ excuse for
an endless loop in kernel mode. Certainly not "the other end is
incompetent".
EVERYBODY is incompetent sometimes. That just means that you must
never trust the other end too much. You can't say "we require the
server to be sane in order not to lock up".
Linus
--
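The "continue" versus "break" suggestion can be illustrated with a toy model. This is only a sketch of the control flow as described in this thread, not the code in fs/nfs/dir.c: `readdir_toy`, `search_cache`, `uncached_readdir`, `break_on_success` and the `max_passes` cap are all hypothetical names invented for the illustration, and the cap exists only so the "endless" case can be observed without hanging.

```python
EBADCOOKIE = object()  # stand-in for the error code named in the thread


def readdir_toy(search_cache, uncached_readdir, break_on_success, max_passes=1000):
    """Toy model of the retry loop discussed above (hypothetical, not kernel code).

    Returns the number of loop passes taken, or None if the cap was hit,
    which is how this model represents "would have looped forever".
    """
    for passes in range(1, max_passes + 1):
        if search_cache() is EBADCOOKIE:
            if not uncached_readdir():
                return passes      # uncached readdir failed: exit
            if break_on_success:
                return passes      # hand the entries back to user space
            continue               # retry the cache search
        return passes              # cache search succeeded
    return None


def always_bad():
    """Cache search that never resolves the cookie."""
    return EBADCOOKIE


def always_ok():
    """Uncached readdir that always reports success."""
    return True
```

With a cache search that always reports EBADCOOKIE and an uncached readdir that always succeeds, the "continue" variant runs into the cap, while the "break" variant returns after a single pass.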
uncached_readdir is not really a problem. The real problem is filesystems that generate "infinite directories" by producing looping combinations of cookies.

IOW: I've seen servers that generate cookies in a sequence of a form vaguely resembling

1 2 3 4 5 6 7 8 9 10 11 12 3...

(with possibly a thousand or so entries between the first and second copy of '3')

The kernel won't loop forever with something like that (because eventually filldir() will declare it is out of buffer space), but userland has a halting problem: it needs to detect that every sys_getdents() call it is making is generating another copy of the

We should never get an endless loop in _kernel mode_ with the current

Unfortunately we must. Call it an NFS protocol failure, but it really boils down to the fact that POSIX readdir() generates a data stream with no well-defined concept of an offset. As a result, each and every filesystem has their own interesting ways of generating cookies to represent that 'offset'.

Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
--
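The looping cookie sequence described above can be modelled in a few lines. The sketch below is a toy simulation under stated assumptions, not NFS or libc code: `server_entries`, `getdents` and `list_directory` are hypothetical names, the cycle is the short 1..12 then back to 3 pattern from the example, and the buffer size of 5 is arbitrary. It shows why each individual call terminates (the buffer is finite) while the caller only stops if it remembers which cookies it has already seen.

```python
from itertools import islice


def server_entries():
    """Hypothetical broken server: cookies 1..12, then back to 3, forever."""
    cookie = 1
    while True:
        yield cookie
        cookie = 3 if cookie == 12 else cookie + 1


def getdents(stream, bufsize):
    """Toy stand-in for one sys_getdents() call: at most bufsize entries."""
    return list(islice(stream, bufsize))


def list_directory(stream, bufsize=5, max_calls=100):
    """Userland reader that stops when a cookie repeats."""
    seen = set()
    entries = []
    for _ in range(max_calls):
        batch = getdents(stream, bufsize)
        if not batch:
            return entries, "eof"
        for cookie in batch:
            if cookie in seen:
                return entries, "loop detected"
            seen.add(cookie)
            entries.append(cookie)
    return entries, "gave up"


entries, status = list_directory(server_entries())
```

Here the reader collects cookies 1 through 12 and stops when the second '3' arrives. Without the `seen` set it would keep calling `getdents` until `max_calls` ran out.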
On Thu, Dec 30, 2010 at 11:25 AM, Trond Myklebust

But if we don't have any lseek's, the readdir cache should trivially take care of this by just incrementing the page_index, and we should return to user space the (eventually ending) sequence, even if there are duplicate numbers.

(Also, I suspect that "page_index" should not be a page index, but a position, and then the "search_for_pos()" should use that instead of the file_pos/current_index thing, but that's a detail that would show up only when you have duplicate cookies within one page worth of directory caches)

And if the server really sends us an infinite stream of entries, then that's fine - at least we give to user space the infinite entries that were given to us, instead of _generating_ an infinite stream from what was a finite - but broken - stream.

So it seems wrong that the directory caching code resets page_index to the start when it then does an uncached readdir. That seems wrong. I'm sure there's some reason for it, but wouldn't it be nice if the rule for page_index was that it starts off at zero, and only gets

Ok, so that's obviously broken, but it's then _doubly_ broken to turn that long broken sequence into an _endless_ broken sequence.

And I agree that when user space sees such an endless broken sequence, it's a real stopping problem for user space. But in the absence of lseek, it should _never_ be a problem for the kernel itself, afaik. The kernel should happily return just the broken sequence. No?

So then perhaps the solution is to just remove the resetting of page_index in the uncached_readdir() function? Make sure that the page_index is monotonically increasing for any readdir(), and you protect against turning a bad sequence into an endless sequence.

Of course, lseek() will have to reset page_index to zero, and if somebody does an lseek() on the directory, then the duplicate '3' entry in the cookie sequence will inevitably be ambiguous, but that really is unavoidable. And rare. People ...
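The difference between resuming by cookie and resuming by a monotonically increasing position can be shown with a toy model. This is a sketch under stated assumptions, not the readdir cache itself: the cached directory is a plain list, `read_by_cookie_search` and `read_by_position` are hypothetical names, and `max_entries` is a cap added only so the endless case terminates in the demonstration.

```python
# Finite but broken cached directory: cookie 3 appears twice.
CACHED_COOKIES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 3]


def read_by_cookie_search(cookies, max_entries=100):
    """Resume each step by searching from the start for the last cookie seen.

    Returns (entries, finished); finished is False if the cap was hit.
    """
    out = []
    last = None
    while len(out) < max_entries:
        if last is None:
            index = 0
        else:
            index = cookies.index(last) + 1  # first match wins
        if index >= len(cookies):
            return out, True
        last = cookies[index]
        out.append(last)
    return out, False


def read_by_position(cookies, max_entries=100):
    """Resume each step from a position that only ever increases."""
    out = []
    index = 0
    while index < len(cookies) and len(out) < max_entries:
        out.append(cookies[index])
        index += 1
    return out, index >= len(cookies)
```

Searching by cookie turns the finite 13-entry sequence into an endless one, because the second '3' always resolves to the position of the first. Walking by position returns the same broken sequence once and then stops.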
Ccing linux-nfs@vger.kernel.org

What filesystem are you exporting on the server? What is the NFS version? Is this nfsroot, autofs or an ordinary nfs mount?

In short, how can we reproduce this?

Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
--
Hi Trond,

Yeah, good idea. I had that ~2min after sending my report during

This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a directory on a different partition. I don't know the filesystem of my homedir as it resides on a server I have no access to, but I asked the admin, so I can follow up with this info later (I'd suspect ext3, too). The real root directory is on ext3 (rw,noatime).

The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.

nfs-related kernel parameters are

ip=dhcp root=/dev/nfs nfsroot=192.168.23.2:/home/ukl/nfsroot/tx28,v3,tcp

I hope this answers your questions. If not, please ask.

I tried without the symlink and saw some different errors, e.g.

starting splashutils daemon.../etc/rc.d/S00splashutils: line 50: //sbin/fbsplashd.static: Unknown error 521

(this is the init script that hung before) and

[ 6.160000] NFS: server 192.168.23.2 error: fileid changed
[ 6.160000] fsid 0:c: expected fileid 0x33590a4, got 0x4d11bedc

but no hang as before. So maybe it's related to the symlink?

I don't know if testing that further would help or just waste my time, so please let me know if I can help you and how.

Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
--
If that matters, kernel is linux-image-2.6.32-5-amd64 (2.6.32-29)

This still applies

Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
--
I'm having trouble reproducing this with my own nfsroot setup (which is just a 'fedora 13 live' disk with NetworkManager turned firmly off).

However looking back at your report, you said that when you remove the symlink, you get an error message of the form:

"starting splashutils daemon.../etc/rc.d/S00splashutils: line 50: //sbin/fbsplashd.static: Unknown error 521"

Error 521 is EBADHANDLE, which basically means your client got a corrupted filehandle. The 'fileid changed' thing also indicates some form of corruption.

The question is whether this is something happening on the server or the client. Does an older client kernel boot without any trouble?

--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
--
