magical mounts

November 30, 2007 - 6:56am
Submitted by Anonymous on November 30, 2007 - 6:56am.
Linux

We are running Linux servers with their root filesystem on NFS (nfsroot), served by a Solaris NFS server. When the NFS server reboots, planned or unplanned ;), the Linux servers start behaving very strangely: stat'ing directories, changing into directories, or accessing files in directories creates NEW mounts.

Here is an example. These are the mounts of a normal system:
root@QL0053:~# mount
rootfs on / type rootfs (rw)
none on /sys type sysfs (rw)
none on /proc type proc (rw)
udev on /dev type tmpfs (rw)
10.100.255.251:/stor/nfsroot/machines/10.100.0.53 on / type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
10.100.255.251:/stor/nfsroot/machines/10.100.0.53 on /dev/.static/dev type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
tmpfs on /var/run type tmpfs (rw)
tmpfs on /var/lock type tmpfs (rw)
devpts on /dev/pts type devpts (rw)
tmpfs on /dev/shm type tmpfs (rw)
tmpfs on /var/run type tmpfs (rw)
tmpfs on /var/lock type tmpfs (rw)
fileserver.qlayer.com:/stor/qinstall/vapps on /mnt/vapps type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=fileserver.qlayer.com)

The system works great until the NFS server reboots... then, once the NFS shares are back up,
accessing any directory creates shadow mountpoints...
These mounts get added just by logging in :-(

10.100.255.251:/stor/nfsroot/machines/10.100.0.53/usr/lib/locale on /usr/lib/locale type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
10.100.255.251:/stor/nfsroot/machines/10.100.0.53/usr/share/locale on /usr/share/locale type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
10.100.255.251:/stor/nfsroot/machines/10.100.0.53/lib/terminfo/x on /lib/terminfo/x type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
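
To list these shadow mounts without reading the whole mount table by eye, something like the following could be used. It is only a sketch: it assumes the `mount` output format shown above ("source on mountpoint type fstype (options)"), and the names `parse_mounts` and `find_shadow_mounts` are made up for this example. It flags NFS mounts whose source is a subdirectory of the nfsroot export and whose mountpoint is that same subdirectory.

```python
import re
from typing import NamedTuple

# One line of `mount` output: "<source> on <mountpoint> type <fstype> (<options>)"
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>.*)\)$"
)


class Mount(NamedTuple):
    source: str
    target: str
    fstype: str


def parse_mounts(text):
    """Parse `mount` output into Mount tuples, skipping lines that do not match."""
    mounts = []
    for line in text.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m:
            mounts.append(Mount(m["source"], m["target"], m["fstype"]))
    return mounts


def find_shadow_mounts(mounts, root_export):
    """Return NFS mounts of a subdirectory of root_export onto the same path under /."""
    prefix = root_export.rstrip("/") + "/"
    shadows = []
    for mnt in mounts:
        if mnt.fstype != "nfs" or not mnt.source.startswith(prefix):
            continue
        # Path of the mounted directory relative to the export, as seen from /
        subpath = "/" + mnt.source[len(prefix):]
        if mnt.target == subpath:
            shadows.append(mnt)
    return shadows
```

Feeding it the output of `mount` together with the export `10.100.255.251:/stor/nfsroot/machines/10.100.0.53` should return only the extra entries such as /usr/lib/locale, not the root mount itself or the /dev/.static/dev mount.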

root@QL0053:~# mkdir test
root@QL0053:~# touch test/test.txt
root@QL0053:~# mount
---SNIP---
10.100.255.251:/stor/nfsroot/machines/10.100.0.53/root/test on /root/test type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
---SNAP---

root@QL0053:~# find .
.
./test
./test/tmp
./test/tmp/test.txt
./test/test.txt
./.Xauthority
./.profile
./.ipython
./.ipython/ipy_user_conf.py
./.ipython/ipythonrc
./.viminfo
./.VirtualBox
./.VirtualBox/xpti.dat
./.VirtualBox/VirtualBox.xml
./.VirtualBox/compreg.dat
./.ssh
./.bash_history
./.bashrc
./.subversion
./.subversion/auth
./.subversion/auth/svn.simple
./.subversion/auth/svn.username
./.subversion/auth/svn.ssl.server
./.subversion/README.txt
./.subversion/servers
./.subversion/config

And then: only SOME dirs get mounted, and only SOMETIMES... here is one more:

10.100.255.251:/stor/nfsroot/machines/10.100.0.53/root/.ipython on /root/.ipython type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)

root@QL0053:~# find /boot/
/boot/
/boot/xen-3.1.gz
/boot/vmlinuz-2.6.22.12-xen-32b
/boot/System.map-2.6.22.12-vbox-32b
/boot/xen-3.gz
/boot/config-2.6.22.12-xen-32b
/boot/vmlinuz-2.6.22.12-vbox-32b
/boot/vmlinuz-2.6.22.12-xen-32b-xenU
/boot/System.map-2.6.22.12-xen-32b
/boot/xen.gz
/boot/grub
/boot/grub/menu.lst
/boot/nfsrootrd-2.6.22.12-vbox-32b
/boot/memtest86+.bin
/boot/config-2.6.22.12-vbox-32b
/boot/xen-3.1.0.gz
/boot/xen-syms-3.1.0
/boot/nfsrootrd-2.6.22.12-xen-32b
/boot/vmlinux-syms-2.6.22.12-xen-32b
----adds---
10.100.255.251:/stor/nfsroot/machines/10.100.0.53/boot/grub on /boot/grub type nfs (rw,vers=3,rsize=1048576,wsize=1048576,hard,nolock,proto=tcp,timeo=70,retrans=10,sec=sys,addr=10.100.255.251)
---this---
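
The before/after comparison done by hand above can also be scripted. This is a rough sketch, assuming the mount table is captured as text (from `mount` or /proc/mounts) before and after running a command such as `find /boot/`; the function names are invented for this example.

```python
def mount_lines(text):
    """Return the non-empty lines of a mount table, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def added_mounts(before, after):
    """Return the lines present in `after` but not in `before`, in original order."""
    seen = set(mount_lines(before))
    return [line for line in mount_lines(after) if line not in seen]


def read_mount_table(path="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

Calling `read_mount_table()` once before and once after the command, and passing both results to `added_mounts`, gives just the new entries, which makes it easier to see which accesses trigger a mount and which do not.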

We tried disabling the automounter in the kernel (built a new one), but nothing helps...
We have to reboot the Linux nfsroot machines to make the problems go away.
But sometimes, on some machines, these strange mount problems reappear suddenly, without warning and without a reboot of the NFS server...
Linux kernels are 2.6.16, 2.6.20 and 2.6.22 (vanilla or Xen, same problem)...
The NFS server is Solaris Express build 68. The NFS shares are on a zpool and are clones of a snapshot.

[root@FileServer-11:42 AM-~]# uname -a
SunOS FileServer 5.11 snv_68 i86pc i386 i86pc Solaris
[root@FileServer-11:42 AM-~]# zfs get all stor/nfsroot/machines/10.100.0.53
NAME                               PROPERTY       VALUE                                                  SOURCE
stor/nfsroot/machines/10.100.0.53  type           filesystem                                             -
stor/nfsroot/machines/10.100.0.53  creation       Fri Nov 30 8:28 2007                                   -
stor/nfsroot/machines/10.100.0.53  used           238M                                                   -
stor/nfsroot/machines/10.100.0.53  available      1.35T                                                  -
stor/nfsroot/machines/10.100.0.53  referenced     1.10G                                                  -
stor/nfsroot/machines/10.100.0.53  compressratio  1.00x                                                  -
stor/nfsroot/machines/10.100.0.53  mounted        yes                                                    -
stor/nfsroot/machines/10.100.0.53  origin         stor/nfsroot/master/q-layer/1.5.0/A07/330/32/qos@SNAP  -
stor/nfsroot/machines/10.100.0.53  quota          none                                                   default
stor/nfsroot/machines/10.100.0.53  reservation    none                                                   default
stor/nfsroot/machines/10.100.0.53  recordsize     128K                                                   default
stor/nfsroot/machines/10.100.0.53  mountpoint     /stor/nfsroot/machines/10.100.0.53                     default
stor/nfsroot/machines/10.100.0.53  sharenfs       rw=@10.100.0.53/32,root=@10.100.0.53/32                local
stor/nfsroot/machines/10.100.0.53  checksum       on                                                     default
stor/nfsroot/machines/10.100.0.53  compression    off                                                    default
stor/nfsroot/machines/10.100.0.53  atime          on                                                     default
stor/nfsroot/machines/10.100.0.53  devices        on                                                     default
stor/nfsroot/machines/10.100.0.53  exec           on                                                     default
stor/nfsroot/machines/10.100.0.53  setuid         on                                                     default
stor/nfsroot/machines/10.100.0.53  readonly       off                                                    default
stor/nfsroot/machines/10.100.0.53  zoned          off                                                    default
stor/nfsroot/machines/10.100.0.53  snapdir        hidden                                                 default
stor/nfsroot/machines/10.100.0.53  aclmode        groupmask                                              default
stor/nfsroot/machines/10.100.0.53  aclinherit     secure                                                 default
stor/nfsroot/machines/10.100.0.53  canmount       on                                                     default
stor/nfsroot/machines/10.100.0.53  shareiscsi     off                                                    default
stor/nfsroot/machines/10.100.0.53  xattr          on                                                     default
stor/nfsroot/machines/10.100.0.53  copies         1                                                      default

So: it looks to me as if, when state is lost after a reboot of the NFS server, the NFS clients start doing this (and only on nfsroot mountpoints). We've been searching through the kernel sources, but are not knowledgeable enough to figure it out ourselves... Does anyone have an idea? Can I provide more info?
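
One more piece of information that might be useful: which directories the client currently treats as separate filesystems. From userspace, a mount boundary shows up as a directory whose `st_dev` differs from its parent's. The sketch below walks a tree and reports those directories. The function name is invented for this example, and note that the walk itself accesses every directory, so on an affected machine it may trigger new mounts just like `find` does above.

```python
import os


def find_device_boundaries(top):
    """Return directories under `top` whose st_dev differs from their parent's."""
    top = os.path.abspath(top)
    dev_of = {}        # directory path -> st_dev recorded when it was visited
    boundaries = []
    # os.walk is top-down by default, so a parent is always visited before its children
    for dirpath, _dirnames, _filenames in os.walk(top):
        try:
            dev = os.stat(dirpath).st_dev
        except OSError:
            continue   # directory vanished or is unreadable; skip it
        dev_of[dirpath] = dev
        parent = os.path.dirname(dirpath)
        if parent in dev_of and dev_of[parent] != dev:
            boundaries.append(dirpath)
    return boundaries
```

Running `find_device_boundaries("/root")` on a healthy machine and on an affected one, and comparing the results, would show whether directories like /root/test really appear to the client as a different filesystem after the server reboot.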

We have been searching like mad to see whether anyone else has had the same issue, but to no avail... we seem to be the only ones :-(

anything interesting in syslog / dmesg on any of the machines?

July 6, 2008 - 7:59am
sileNT (not verified)

