On Mon, Sep 17, 2007 at 11:23:46PM +0100, Nix wrote:

Actually, they're nothing to do with malloc failures--the message
printed here is misleading, and isn't even an error; it gets printed
whenever an upcall to mountd is made.

The problem is almost certainly a problem with kernel<->mountd
communication--the kernel depends on mountd to answer questions about
exported filesystems as part of the fh_verify code.

It's just a shot in the dark, but you might try the latest nfs-utils
(get the latest out of git://linux-nfs.org/nfs-utils if you're already
on the most recent your distro will give you). Or just apply the
following--which did fix a problem whose symptoms varied depending on
libc behavior.

If that doesn't work, I'd try

	strace -s0 `pidof rpc.mountd`

and also look at the contents of /proc/net/rpc/nfsd.fh/contents.

--b.

commit dd087896285da9e160e13ee9f7d75381b67895e3
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Thu Jul 26 16:30:46 2007 -0400

    Use __fpurge to ensure single-line writes to cache files

    On a recent Debian/Sid machine, I saw libc retrying stdio writes that
    returned write errors. The result is that if an export downcall
    returns an error (which it can in normal operation, since it currently
    (incorrectly) returns -ENOENT on any negative downcall), then
    subsequent downcalls will write multiple lines (including the original
    line that received the error).

    The result is that the server fails to respond to any rpc call that
    refers to an unexported mount point (such as a readdir of a directory
    containing such a mountpoint), so client commands hang.

    I don't know whether this libc behavior is correct or expected, but it
    seems safest to add the __fpurge() (suggested by Neil) to ensure data
    is thrown away.

    Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
    Signed-off-by: Neil Brown <neilb@suse.de>

diff --git a/support/nfs/cacheio.c b/support/nfs/cacheio.c
index a76915b..9d271cd 100644
--- a/support/nfs/cacheio.c
+++ b/support/nfs/cacheio.c
@@ -17,6 +17,7 @@
 
 #include <nfslib.h>
 #include <stdio.h>
+#include <stdio_ext.h>
 #include <ctype.h>
 #include <unistd.h>
 #include <sys/types.h>
@@ -111,7 +112,18 @@ void qword_printint(FILE *f, int num)
 
 int qword_eol(FILE *f)
 {
+	int err;
+
 	fprintf(f,"\n");
+	err = fflush(f);
+	/*
+	 * We must send one line (and one line only) in a single write
+	 * call. In case of a write error, libc may accumulate the
+	 * unwritten data and try to write it again later, resulting in a
+	 * multi-line write. So we must explicitly ask it to throw away
+	 * any such cached data:
+	 */
+	__fpurge(f);
 	return fflush(f);
 }
 
