Re: NFS (& amd?) dysfunction descending a hierarchy

Previous thread: Why does host and dig show a ; as \;? by Larry Rosenman on Tuesday, December 9, 2008 - 9:36 am. (7 messages)

Next thread: Laptop w/ dark screen & no (working) keyboard by David Wolfskill on Tuesday, December 9, 2008 - 4:01 pm. (8 messages)
From: David Wolfskill
Date: Tuesday, December 9, 2008 - 12:01 pm

I was able to reproduce the external symptoms of the failure running
CURRENT as of yesterday, using "rm -fr" of a copy of a recent
/usr/ports hierachy on an NFS-mounted file system as a test case.
However, I believe the mechanism may be a bit different -- while
still being other than what I would expect.

One aspect in which the externally-observable symptoms were different
(under CURRENT, vs. RELENG_7) is that under CURRENT, once the error
condition occurred, the NFS client machine was in a state where it
merely kept repeating

	nfs server pid848@fbsd-build:/volume: not responding

until I logged in as root & rebooted it.


Here's a cut/paste of the kdump from the ktrace of the amd(8) process
under CURRENT, showing where the master amd(8) process (pid 848)
forks a child (4126) to try the unmount:

   848 amd      1228846258.722953 CALL  gettimeofday(0x8078e48,0)
   848 amd      1228846258.722964 RET   gettimeofday 0
   848 amd      1228846258.722982 CALL  sigprocmask(SIG_BLOCK,0xbfbfeaec,0x=
bfbfeadc)
   848 amd      1228846258.722993 RET   sigprocmask 0
   848 amd      1228846258.723003 CALL  fork
   848 amd      1228846258.730250 RET   fork 4126/0x101e
   848 amd      1228846258.730405 CALL  sigprocmask(SIG_SETMASK,0xbfbfeadc,=
0)
  4126 amd      1228846258.730252 RET   fork 0
  4126 amd      1228846258.730456 CALL  getpid
  4126 amd      1228846258.730467 RET   getpid 4126/0x101e
  4126 amd      1228846258.730493 CALL  unmount(0x2825f340,<invalid>0)
   848 amd      1228846258.730422 RET   sigprocmask 0
   848 amd      1228846258.730595 CALL  gettimeofday(0x8078e48,0)
   848 amd      1228846258.730608 RET   gettimeofday 0
=2E..
   848 amd      1228846258.914814 CALL  sigprocmask(SIG_SETMASK,0xbfbfeba0,=
0)
   848 amd      1228846258.914826 RET   sigprocmask 0
   848 amd      1228846258.914838 CALL  select(0x400,0xbfbfec40,0,0,0xbfbfe=
cd8)
  4126 amd      1228846259.090428 RET   unmount 0
  4126 amd      1228846259.090492 CALL  ...
From: Rick Macklem
Date: Wednesday, December 10, 2008 - 9:30 am

The different behaviour for -CURRENT could be the newer RPC layer that
was recently introduced, but that doesn't explain the basic problem.

All I can think of is to ask the obvious question. "Are you using
interruptible or soft mounts?" If so, switch to hard mounts and see
if the problem goes away. (imho, neither interruptible nor soft mounts
are a good idea. You can use a forced dismount if there is a crashed
NFS server that isn't coming back anytime soon.)

If you are getting this with hard mounts, I'm afraid I have no idea
what the problem is, rick.

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
From: David Wolfskill
Date: Wednesday, December 10, 2008 - 9:50 am

=46rom examination of /etc/amd* -- I don't see how to get mount(8) or
amq(8) to report it -- it appears that we are using interruptible
mounts, as we always have.

The point is that the behavior has changed in an unexpected way.  And
I'm not so sure that the use of a forced dismount is generally
available, as it would require logging in to the NFS client first, which
may be difficult if the NFS server hosting non-root home directories is
failing to respond and direct root login via ssh(1) is not permitted (as

What concerns me is that even if the attempted unmount gets EBUSY, the
user-level process descending the directory hierarchy is getting ENOENT
trying to issue fstatfs() against an open file descriptor.

I'm having trouble figuring out any way that makes any sense.

Peace,
david
--=20
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
From: Kostik Belousov
Date: Wednesday, December 10, 2008 - 10:06 am

Basically, the problem is that NFS uses shared lookup, and this allows
for the bug where several negative namecache entries are created for
non-existent node. Then this node gets created, removing only the first
negative namecache entry. For some reasons, vnode is reclaimed; amd'
tasting of unmount is a good reason for vnode to be reclaimed.

Now, you have existing path and a negative cache entry. This was
reported by Peter Holm first, I listed relevant revisions that
should fix this in previous mail.
From: David Wolfskill
Date: Thursday, December 11, 2008 - 3:53 pm

Well, I messed up the machine I had been using for testing, and needed
to wait for IT to do something to it since I don't have physical or
console access to it.

So after I happened to demonstrate the effect using my desktop  -- which
had been running RELENG_7_1, sources updated as of around 0400 hrs.
US/Pacific -- I decided to go ahead and update the desktop to RELENG_7_1
as of this morning (which had the commit to sys/kern/vfs_cache.c), then
test.

It still failed, apparently in the same way; details below.

First, here's a list of the files that were changed:

U lib/libarchive/archive_read_support_format_iso9660.c
U lib/libarchive/archive_string.c
U lib/libarchive/archive_string.h
U lib/libc/gen/times.3
U lib/libc/i386/sys/pipe.S
U lib/libc/i386/sys/reboot.S
U lib/libc/i386/sys/setlogin.S
U lib/libutil/Makefile
U lib/libutil/kinfo_getfile.c
U lib/libutil/kinfo_getvmmap.c
U lib/libutil/libutil.h
U share/man/man4/bce.4
U share/man/man5/Makefile
U share/man/man5/fstab.5
U share/man/man5/nullfs.5
U sys/amd64/Makefile
U sys/boot/forth/loader.conf.5
U sys/dev/ale/if_ale.c
U sys/dev/bce/if_bce.c
U sys/dev/cxgb/cxgb_main.c
U sys/dev/cxgb/common/cxgb_ael1002.c
U sys/dev/cxgb/common/cxgb_t3_hw.c
U sys/dev/cxgb/common/cxgb_xgmac.c
U sys/dev/re/if_re.c
U sys/fs/nullfs/null_vnops.c
U sys/kern/Make.tags.inc
U sys/kern/kern_descrip.c
U sys/kern/kern_proc.c
U sys/kern/vfs_cache.c
U sys/netinet/in_pcb.h
U sys/pci/if_rlreg.h
U sys/sys/sysctl.h
U sys/sys/user.h
U sys/ufs/ufs/ufs_quota.c
U usr.bin/procstat/Makefile
U usr.bin/procstat/procstat_files.c
U usr.bin/procstat/procstat_vm.c
U usr.bin/tar/util.c
U usr.bin/tar/test/Makefile
U usr.bin/tar/test/test_strip_components.c
U usr.bin/tar/test/test_symlink_dir.c
U usr.bin/xargs/xargs.1
U usr.sbin/mtree/mtree.c

We see that sys/kern/vfs_cache.c is, indeed, among them.  And:

dwolf-bsd(7.1-P)[5] grep '\$FreeBSD' /sys/kern/vfs_cache.c
__FBSDID("$FreeBSD: src/sys/kern/vfs_cache.c,v 1.114.2.3 2008/12/09 16:20:5=
8 kib ...
From: Kostik Belousov
Date: Friday, December 12, 2008 - 6:41 am

=3D20124614, mode=3Ddrwxr-xr-x , nlink=3D3, uid=3D9874, gid=3D929, rdev=3D0=
, atime=3D1228844788, stime=3D1227555769, ctime=3D1228845828.326650000, bir=
=3D20124614, mode=3Ddrwxr-xr-x , nlink=3D3, uid=3D9874, gid=3D929, rdev=3D0=
, atime=3D1228844788, stime=3D1227555769, ctime=3D1228845828.326650000, bir=

But is this error transient or permanent ? I.e., would restart of rm
successful or failing ?

Anyway, this error looks different too.
From: David Wolfskill
Date: Friday, December 12, 2008 - 7:36 am

In a test yesterday, it took 3 attempts (each attempt being an
invocations of "rm -fr ~bspace/ports") to actually complete removal of
the hierarchy.

Please note that:

* Done on a locally-mounted file systen (vs. NFS), a single invocation
  is sufficient and terminates normally.  Each of the above-cited
  attempts but the last terminated with a status code of 1 (as well as
  a whine that one or more subdirectories was not empty -- this, as a
  result of "rm" getting inconsistent information about the status of the
  file system).

* Done on either a locally- or NFS-mounted file system in FreeBSD 6.x, a
  single invocation is sufficient and terminates normally.


?  From the earlier-posted results in 7.x?  Not that I can tell.  In
each case, the amd(8) child process is forked to attempt an unmount(),
tries it, gets EBUSY, and exits.  Meanwhile, rm(1) is descending a
directory tree.  It had performed a readdir(), and had been unlinking
files and performing rmdir() against empty subdirectories.  It
encounters an entry, issues stat(), finds that it's a subdirectory,
open()s it, gets an FD, issues fstat(), gets results that match those of
the earlier stat(), issues fcntl() against the FD (which returns 0),
tries to issue fstatfs() against the FD *that is still open*, and gets
told ENOENT.

It does differ from the behavior in 8-CURRENT, in that the amd(8) child
process in 8-CURRENT does not appear to get EBUSY.  The behavior from
rm(1)'s perspective is very similar, though.

If it would help, I could try getting a ktrace from a 6.x system, but I
expect it will be very boring: the amd(8) child process should get EBUSY
(as it does in 7.x), and nothing else should happen, since the unmount()
attempt failed.  And since it failed, rm(1) doesn't get told
inconsistent information, so things Just Work.

I admit that I'm no expert on VFS or much of the rest of the kernel,
for that matter.  But what I have observed happening in recent 7.x
is both wrong and a ...
Previous thread: Why does host and dig show a ; as \;? by Larry Rosenman on Tuesday, December 9, 2008 - 9:36 am. (7 messages)

Next thread: Laptop w/ dark screen & no (working) keyboard by David Wolfskill on Tuesday, December 9, 2008 - 4:01 pm. (8 messages)