Same distro, x86_64, similar servers.
I'm not sure whether the two cases I am seeing are exactly the same
problem. On the log crunching boxes, system time seems proportional to
the number of nfs_inode_cache objects, and that number just keeps
growing. However, if I stop the load and unmount the NFS mount points,
all of the nfs_inode_cache objects do go away (after umount finishes).
It seems the shrinker callback might not be working as intended here.
On the shared server case, the crazy spinlock contention from all of the
flusher processes happens suddenly and overloads the boxes for 10-15
minutes, and then everything recovers. Across 21 of these boxes, each
has about 500k-700k nfs_inode_cache objects. The log cruncher hit
3.3 million nfs_inode_cache objects before I unmounted.
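
For anyone wanting to compare numbers, here is a minimal sketch of how
object counts like these can be read from /proc/slabinfo. It assumes the
slabinfo 2.x column layout (name, active_objs, num_objs, ...); the
function name parse_slabinfo is only illustrative, and reading the file
normally requires root.

```python
def parse_slabinfo(text):
    """Return {cache_name: (active_objs, num_objs)} from /proc/slabinfo text."""
    counts = {}
    for line in text.splitlines():
        # Skip the version banner and the '# name <active_objs> ...' header.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            # Assumed layout: name, active_objs, num_objs, then other columns.
            counts[fields[0]] = (int(fields[1]), int(fields[2]))
        except ValueError:
            # Ignore any line that does not match the assumed layout.
            continue
    return counts


if __name__ == "__main__":
    # Reading /proc/slabinfo usually requires root.
    with open("/proc/slabinfo") as f:
        counts = parse_slabinfo(f.read())
    active, total = counts.get("nfs_inode_cache", (0, 0))
    print(f"nfs_inode_cache: {active} active / {total} total objects")
```

Running this periodically (for example from cron) and logging the
result next to system time would show whether the two really track
each other.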
Are your boxes repeating this behaviour at any predictable interval?