On the FreeBSD hackers mailing list, Jordan Hubbard commented on some serious issues with NFS and posted a tool called 'fsx' - originally developed for the NeXT OS - that was ideal for finding them. Matt Dillon was quite impressed by the tool and immediately started playing with it. In very little time, he posted a number of major fixes...
After submitting a string of these major bug fixes, Matt commented, "What I really love about this program is that the problems are so repeatable. So far the same failure occurs at exactly the same place, every time. It makes it unbelievably easy to track the bugs down." His last message noted that he was going to run the tool overnight: "I give it about a 70% chance of surviving." That estimate followed a bug he found and fixed in the middle of the VM layer: "The fix is easy, but a little scary due to being right smack in the middle of the VM system."
Also included in this thread are some basic tips on better NFS performance. Many of the related emails follow...
From: Jordan Hubbard
To: hackers at FreeBSD.ORG
Subject: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 08:23:56 -0800 (PST)
It came up in a meeting today at Apple just how fragile the BSD NFS
implementation was before significant work was put in to stabilizing it,
and in that discussion came up a little test tool written originally by
Avie Tevanian and subsequently improved by one of the folks here.
This tool basically tries to do everything it can (legally) to confuse an
NFS server. It seeks around, does I/O to and truncates/changes the size
of a test file, all while doing everything it can to detect data corruption
or other signs of misbehavior which might result from out-of-order replies
or any other previously-observed NFS pathology. Very few NFS implementations
apparently survive this test and FreeBSD's is no exception. The sources are
provided below, courtesy of Avie, for the education and enjoyment(?) of
anyone who's motivated to play with (or even pretends to understand) NFS.
Usage:
cc fsx.c -o fsx
./fsx /some/nfs/mounted/scratchfile
[ ** kaboom! ** ]
I'm also trying to determine which of the fixes Apple has made to NFS might
be adapted to FreeBSD, something which is made more difficult by the fact
that much of the code was taken straight from 4.4 Lite some time back and
both operating systems have diverged significantly since then. Anyone
really keen on investigating this further themselves on it can also go to
http://www.opensource.apple.com/projects/darwin and register themselves
online (it's easy) to access the Darwin CVS repository, the module in
question being "xnu" (the Darwin kernel). Thanks.
- Jordan
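To make the idea concrete, here is a minimal sketch of the kind of check Jordan describes: random writes and truncations against a scratch file, with every step verified against an in-memory copy of what the file should contain. This is not the fsx source. The names run_test, do_write, do_truncate and check, the MAXLEN cap, and the mix of operations are all assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700   /* ask for pread/pwrite/random on strict compilers */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAXLEN 65536                  /* hypothetical cap on the test file size */

static unsigned char shadow[MAXLEN];  /* what the file is expected to contain */
static off_t shadow_len;              /* expected file size */

/* Write len pattern bytes at off and mirror the change in the shadow copy. */
static int do_write(int fd, off_t off, size_t len, unsigned char pattern)
{
    unsigned char buf[MAXLEN];
    if (len == 0 || off + (off_t)len > MAXLEN) return -1;
    memset(buf, pattern, len);
    if (pwrite(fd, buf, len, off) != (ssize_t)len) return -1;
    if (off > shadow_len)             /* a hole past EOF must read back as zeros */
        memset(shadow + shadow_len, 0, (size_t)(off - shadow_len));
    memcpy(shadow + off, buf, len);
    if (off + (off_t)len > shadow_len) shadow_len = off + (off_t)len;
    return 0;
}

/* Shrink or grow the file to len and mirror the change in the shadow copy. */
static int do_truncate(int fd, off_t len)
{
    if (len > MAXLEN || ftruncate(fd, len) != 0) return -1;
    if (len > shadow_len)             /* the extended region must be zeros */
        memset(shadow + shadow_len, 0, (size_t)(len - shadow_len));
    shadow_len = len;
    return 0;
}

/* Compare the file's size and contents with the shadow copy. */
static int check(int fd)
{
    unsigned char buf[MAXLEN];
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size != shadow_len) return -1;
    if (pread(fd, buf, (size_t)shadow_len, 0) != (ssize_t)shadow_len) return -1;
    return memcmp(buf, shadow, (size_t)shadow_len) == 0 ? 0 : -1;
}

/* Run nops random operations on path. Returns -1 if every operation verified,
 * -2 if the file could not be opened, else the index of the first failure. */
long run_test(const char *path, unsigned seed, long nops)
{
    long i;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -2;
    srandom(seed);                    /* fixed seed: same operation sequence every run */
    shadow_len = 0;
    for (i = 0; i < nops; i++) {
        off_t off = random() % (MAXLEN / 2);
        size_t len = 1 + (size_t)(random() % (MAXLEN / 2 - 1));
        int rc = (random() % 4 == 0) ? do_truncate(fd, off)
                                     : do_write(fd, off, len, (unsigned char)i);
        if (rc != 0 || check(fd) != 0) { close(fd); return i; }
    }
    close(fd);
    return -1;
}
```

Calling run_test("/some/nfs/mounted/scratchfile", 1, 10000) from a small main() would exercise a mount in the same spirit. Because the seed is fixed, a failure shows up at the same operation index on every run, which is what makes bugs found this way easy to chase. The real tool does considerably more, including the mmap() accesses mentioned later in the thread.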
From: Matthew Dillon
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 11:31:25 -0800 (PST)
Oooh. Very cool! I'll start messing with it (oops, that's going to
make both Paul and Alfred annoyed with me!)
-Matt
From: Mike Smith
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 15:45:53 -0800
I should point out that FSX can be used against any filesystem, and
that there are reports locally (at Apple) that it's great for killing
FreeBSD machines. I wasn't able to reproduce this when I tried, but I
may not have let it run long enough.
> :I'm also trying to determine which of the fixes Apple has made to NFS might
> :be adapted to FreeBSD, something which is made more difficult by the fact
> :that much of the code was taken straight from 4.4 Lite some time back and
> :both operating systems have diverged significantly since then.
Many of the key issues in making OS X NFS work were related to its
interaction with the UBC and the subtly different VFS semantics,
although the same issues probably exist in different form in the
FreeBSD code. I get dragged into some really shocking corridor
discussions every now and then. 8)
= Mike
From: Matthew Dillon
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 15:49:44 -0800 (PST)
:I should point out that FSX can be used against any filesystem, and
:that there are reports locally (at Apple) that it's great for killing
:FreeBSD machines. I wasn't able to reproduce this when I tried, but I
:may not have let it run long enough.
Well, I already found and tracked down a softupdates bug revealed
by this code... definite server panic, especially if an NFSv2 mount
is used.
I shot some mail off to Kirk with a proposed fix for that.
With both NFSv2 and NFSv3 I get user-process bus errors with the
truncate()/mmap() combinations being used. I'm tracking that down
now.
-Matt
From: Poul-Henning Kamp
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 21:39:11 +0100
In message [above] Jordan Hubbard writes:
>Usage:
> cc fsx.c -o fsx
> ./fsx /some/nfs/mounted/scratchfile
> [ ** kaboom! ** ]
The only thing I get is a math exception because "closeprob" is zero
since no -c option was given.
Can you provide some sample parameters please ?
From: Jordan Hubbard
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 14:56:48 -0800
Hmmm, how strange, now that I look at the code it's obvious that a
divide by zero will occur with a zero closeprob and the docs state the
default to be "infinity", which is obviously not the case. The
strange part is that I ran this on freebsd.apple.com, which is running
4.4-stable, with one parameter (the filename) exactly as I pasted in
the usage instructions before. Perhaps all this time spent living
next to the Macintosh in my office has induced that copy of FreeBSD to
be more "friendly" and mask simple math errors. :-)
In any case, -c 1 appears to work just fine.
- Jordan
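The failure Poul-Henning hit is the classic integer divide by zero: in C, dividing or taking a remainder by zero is undefined behavior, and on many platforms it is delivered as a fatal arithmetic signal. A minimal sketch of a guard follows. It is not the code from fsx; the function name should_close and the convention that 0 means "never" (matching the documented default of "infinity") are assumptions for illustration.

```c
/* Decide whether to close and reopen the test file on this operation.
 * closeprob is read as "on average once every closeprob operations".
 * Hypothetical convention for this sketch: 0 (or less) means "never". */
int should_close(long rv, long closeprob)
{
    if (closeprob <= 0)
        return 0;                 /* never reach the % with a zero divisor */
    return rv % closeprob == 0;   /* rv is the random value drawn for this operation */
}
```

With a guard like this, running without -c simply never closes the file, instead of depending on what the platform does with a zero divisor.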
From: Jordan Hubbard
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 16:01:45 -0800
> I should point out that FSX can be used against any filesystem, and
> that there are reports locally (at Apple) that it's great for killing
> FreeBSD machines. I wasn't able to reproduce this when I tried, but I
> may not have let it run long enough.
Oh, it blows freebsd.apple.com right out of the water with a kernel
panic after running for just 3 seconds from an OS X box on the same
LAN segment. :)
- Jordan
From: Peter Wemm
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 17:13:44 -0800
To be clear, what exactly are you doing?
It sounds like you're exporting something from freebsd, mounting it on OSX
and running this tool on OSX against the filesystem exported from freebsd ?
If so, What mount options? NFSv2 or v3?
Cheers,
-Peter
From: Jordan Hubbard
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 20:19:08 -0800
That is correct. As to the NFS options used, I honestly couldn't say
since I'm getting at the filesystem through Netinfo and that's handled
by OS X's automount daemon, that having no relation whatsoever to AMD
and hence no amd.conf file or anything else I can easily look at to
determine how it's being mounted. Maybe Mike knows more about how to
find this out - he's not in management. :)
- Jordan
From: Matthew Dillon
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 17:51:31 -0800 (PST)
I found a second bug... nfs truncation code race.
I've enclosed both patches below. NFS truncation race first, softupdates
bug second. The patches are against -stable.
There are still more bugs... the nfstest code is seeing data corruption
on read. It looks like another truncation bug. I'm tracking it down.
-Matt
From: Geoff Mohler
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 23:08:05 -0800 (PST)
I suppose while we're on the topic..
Are there any hidden secrets to eking out more performance from the BSD
NFS client (other than version types and the normal fstab tweaks)?
I'm the CS Labs manager at NetApp..and I'm always trying to store away a
secret here or there when someone comes to me with a problem in the field.
FreeBSD since v2..rock on!
From: Matthew Dillon
Subject: Re: NFS: How to make FreeBSD fall on its face in one easy step
Date: Wed, 12 Dec 2001 22:59:35 -0800 (PST)
* Make sure you don't have packet loss in your network (test with larger
packets, aka ping -s 8192 rather than just ping, and perhaps test with
a pattern (-p)).
* Run a sufficient number of nfsd's on the server side, depending on
load. 4 or 8 is typical.
* Run nfsiod's on the client side. I usually run 4. This will drastically
improve read-ahead and, for example, can bump linear read speeds on a
100BaseTX network from 7 MBytes/sec to 11 MBytes/sec (full saturation).
* Use NFS version 3 when possible (this is the default)
* Sometimes playing around with the various attribute cache timeouts
(see 'man mount_nfs') helps. Sometimes it doesn't.
For extreme performance there are some zero-copy patches floating around
which have not been integrated into the main tree. Generally, though,
your NFS performance is going to be ultimately limited by your server's
disk performance.
-Matt
Matthew Dillon
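The read-speed figures in Matt's list are easier to judge against the raw ceiling of the link. The arithmetic, as a small sketch (the function names are made up for illustration, and all framing and protocol overhead is ignored):

```c
/* Raw ceiling of a link in MBytes/sec: 8 bits per byte, no overhead counted. */
double link_ceiling_mbytes(double megabits_per_sec)
{
    return megabits_per_sec / 8.0;
}

/* Fraction of that raw ceiling reached by an observed transfer rate. */
double link_utilization(double observed_mbytes, double megabits_per_sec)
{
    return observed_mbytes / link_ceiling_mbytes(megabits_per_sec);
}
```

For 100BaseTX the ceiling is 100 / 8 = 12.5 MBytes/sec, so 7 MBytes/sec is 56% of the wire and 11 MBytes/sec is 88%. The remainder goes to Ethernet framing and protocol headers, which is consistent with Matt calling 11 MBytes/sec full saturation.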
From: Matthew Dillon
Subject: Found NFS data corruption bug...
(was Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Wed, 12 Dec 2001 22:08:09 -0800 (PST)
Ok, here is the latest patch for -stable. Note that Kirk committed a
slightly modified version of the softupdates fix to -current already
(the VOP_FSYNC stuff), which I will be MFCing in 3 days.
This still doesn't fix all the problems the nfstest program that Jordan
posted finds, but it sure runs a hellofalot longer now before reporting
an error. 10,000+ tests now before failing (NFSv2 and NFSv3).
Bugs fixed:
* Possible SMP database corruption due to vm_pager_unmap_page()
not clearing the TLB for the other cpu's.
* When flushing a dirty buffer due to B_CACHE getting cleared,
we were accidentally setting B_CACHE again (that is, bwrite() sets
B_CACHE), when we really want it to stay clear after the write
is complete. This resulted in a corrupt buffer.
* We have to call vtruncbuf() when ftruncate()ing to remove
any buffer cache buffers. This is still tentative, I may
be able to remove it due to the second bug fix.
* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
to set n_size before calling nfs_vinvalbuf or the NFS code
may recursively vnode_pager_setsize() to the original value
before the truncate. This is what was causing the user mmap
bus faults in the nfs tester program.
* Fix to softupdates (old version)
There are some general comments in there too. After I do more tests
and cleanups (maybe remove the vtruncbuf()) I will port it all to
-current, test, and commit. So far the stuff is simple enough that
a 3-day MFC will probably suffice.
All I can say is... holy shit!
-Matt
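The vnode_pager_setsize() race in Matt's list comes down to the order of two updates. The toy model below is not kernel code: struct toy_node and its two fields merely stand in for the NFS layer's idea of the file size (n_size) and the VM pager's idea of it, and toy_invalidate() models only the one side effect Matt describes, namely the invalidation path resizing the pager to whatever n_size currently holds.

```c
/* Toy stand-in for an NFS client node; not the real kernel structures. */
struct toy_node {
    long n_size;       /* file size as the NFS layer believes it */
    long pager_size;   /* file size as the VM pager believes it */
};

/* Models the invalidation path re-syncing the pager size to n_size. */
static void toy_invalidate(struct toy_node *np)
{
    np->pager_size = np->n_size;
}

/* Buggy order: invalidation still sees the old n_size and undoes the resize. */
long truncate_buggy(struct toy_node *np, long nsize)
{
    np->pager_size = nsize;
    toy_invalidate(np);        /* pager snaps back to the pre-truncate size */
    np->n_size = nsize;
    return np->pager_size;
}

/* Fixed order: n_size is updated first, so invalidation agrees with the truncate. */
long truncate_fixed(struct toy_node *np, long nsize)
{
    np->n_size = nsize;
    np->pager_size = nsize;
    toy_invalidate(np);
    return np->pager_size;
}
```

Starting from a node where both sizes are 8192 and truncating to 10, truncate_buggy() leaves the pager at 8192 while truncate_fixed() leaves it at 10. A pager that disagrees with the real file size is the kind of mismatch that shows up as bus faults on mmap()ed pages.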
From: Geoff Mohler
To: Matthew Dillon
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Wed, 12 Dec 2001 22:34:17 -0800 (PST)
Are any of these client-side performance upgrades as well as bug fixes?
From: Matthew Dillon
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Wed, 12 Dec 2001 22:21:19 -0800 (PST)
No, just bug fixes. The softupdates bug fix is server-side. All
the other bug fixes are client side (so far).
-Matt
From: Mike Silbersack
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Thu, 13 Dec 2001 01:37:47 -0500 (EST)
Does the softupdates fix affect normal ffs operations as well?
From: Matthew Dillon
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Wed, 12 Dec 2001 22:51:59 -0800 (PST)
Yes, we believe so. It's a bug in ftruncate()'s interaction
with softupdates.
-Matt
Matthew Dillon
From: David Greenman
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Wed, 12 Dec 2001 22:49:27 -0800
Very cool. Good job!
-DG
From: Matthew Dillon
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Thu, 13 Dec 2001 00:00:37 -0800 (PST)
Thanks! I'm slowly whacking the bugs. I just fixed another one...
vtruncbuf() handles the buffers beyond the file EOF but doesn't handle
the buffer straddling the truncation point, so I had to augment the
NFS client's truncation code to deal with that. With that fixed the
tester program got to 34483 operations before finding a problem.
Hopefully I'm in the home stretch now :-)
What I really love about this program is that the problems are so
repeatable. So far the same failure occurs at exactly the same place,
every time. It makes it unbelievably easy to track the bugs down.
I think I can make it perfect. I'll post another patch tomorrow.
-Matt
Matthew Dillon
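The buffer "straddling the truncation point" is the one block that holds both bytes still inside the file and bytes now beyond EOF. The sketch below shows how to find that block and one plausible way to treat it, by zeroing the stale tail. The thread does not say what Matt's patch actually does with the buffer, so the zeroing, the function names, and the block sizes used are assumptions.

```c
#include <string.h>

/* Index of the block that contains the new EOF, or -1 when the new size is
 * block aligned and no block straddles the truncation point. */
long straddling_block(long nsize, long bsize)
{
    return (nsize % bsize) ? nsize / bsize : -1;
}

/* Zero the part of the straddling block's buffer that now lies beyond EOF.
 * buf points at the start of that block. Returns the number of bytes cleared. */
long clear_straddle_tail(unsigned char *buf, long nsize, long bsize)
{
    long keep = nsize % bsize;    /* bytes of this block still inside the file */
    if (keep == 0)
        return 0;                 /* aligned truncate: nothing straddles */
    memset(buf + keep, 0, (size_t)(bsize - keep));
    return bsize - keep;
}
```

Truncating to 20000 bytes with 8192-byte blocks, for example, leaves block 2 straddling the new EOF: blocks 3 and up lie wholly beyond it and can simply be discarded, while block 2 keeps its first 3616 bytes.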
From: Jordan Hubbard
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Thu, 13 Dec 2001 00:19:53 -0800
That's awesome... I'd hoped this program might help you find a few
things, but I never expected you to find so many bugs in NFS
so... quickly! I certainly didn't expect you to tickle any local
filesystem problems either. :)
> I think I can make it perfect. I'll post another patch tomorrow.
Thanks. With 4.5 imminent these improvements are, to state the
flagrantly obvious, very timely indeed.
- Jordan
From: Matthew Dillon
Subject: Re: Found NFS data corruption bug... (was
Re: NFS: How to make FreeBSD fall on its face in one easy step )
Date: Thu, 13 Dec 2001 02:58:28 -0800 (PST)
@#$@#$ crap. I think I found a dirty-mmap edge case with truncation.
It requires a change to vm_page_set_validclean(), which of course is
one of the core routines in the VM system.
Basically what happens is that ftruncate() calls vnode_pager_setsize()
which eventually calls vm_page_set_validclean().
If you happened to mmap() the truncation point shared R+W and
dirty it, then truncate to something that isn't a multiple of DEV_BSIZE..
for example, if you were to truncate to an offset of '10', and a buffer
has not been instantiated or marked dirty for the block yet, then the
truncate operation will clear the dirty bit on the page and your 10
bytes of dirty data will never get synced and will disappear if the page
is freed.
vm_page_set_validclean() needs to set the valid bits and clear
the dirty bits associated with (base,size) within the page. If base and/or
size is unaligned then the valid and dirty bits encompass the bits
associated with any overlapping DEV_BSIZEd chunks. This is fine for
setting valid, but not correct when clearing dirty. Only dirty bits for
DEV_BSIZE chunks that are fully enclosed in the range can be cleared.
The fix is easy, but a little scary due to being right smack in the
middle of the VM system.
--
In any case, I think I got it licked. I'm going to run this nfs tester
program overnight on a local filesystem, NFSv2, and NFSv3 mount. Cross
your fingers! If it survives I'll start committing to -current tomorrow.
I give it about a 70% chance of surviving.
-Matt
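The rule Matt states for vm_page_set_validclean() can be written as two bit masks over the DEV_BSIZE chunks of a page: one rounded outward for setting valid, one rounded inward for clearing dirty. The sketch below illustrates that rule and is not the committed fix. The function names are made up, and the 4096-byte page and 512-byte chunk sizes are assumptions.

```c
enum {
    PAGE_BYTES = 4096,   /* assumed page size */
    CHUNK      = 512     /* assumed DEV_BSIZE */
};

/* One bit per chunk that overlaps [base, base + size).
 * Rounds outward: the right mask for setting valid bits. */
unsigned overlap_mask(int base, int size)
{
    int first, last;
    if (size <= 0)
        return 0;
    first = base / CHUNK;
    last = (base + size - 1) / CHUNK;
    return ((2u << last) - 1) & ~((1u << first) - 1);
}

/* One bit per chunk that lies entirely inside [base, base + size).
 * Rounds inward: the only dirty bits that are safe to clear. */
unsigned enclosed_mask(int base, int size)
{
    int first = (base + CHUNK - 1) / CHUNK;   /* round the start up */
    int end = (base + size) / CHUNK;          /* round the end down (exclusive) */
    if (end <= first)
        return 0;
    return ((1u << end) - 1) & ~((1u << first) - 1);
}
```

Take Matt's example of truncating to offset 10, and assume the range handed to the routine runs from byte 10 to the end of the page, that is (base, size) = (10, 4086). Then overlap_mask() gives 0xff, every chunk, while enclosed_mask() gives 0xfe. Chunk 0, which holds the 10 bytes of dirty data, keeps its dirty bit and still gets synced.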