> On Tue, 29 Apr 2008, J. Bruce Fields wrote:
>
> > On Tue, Apr 29, 2008 at 01:54:54PM -0700, Andrew Morton wrote:
> > > On Tue, 29 Apr 2008 11:42:48 +0800
> > > "Bryan Wu" <cooloney@kernel.org> wrote:
> > >
> > > > Hi folk,
> > > >
> > > > This days I am digging into this LTP bug reported on our Blackfin test
> > > > machine, but I think it is general for other system.
> > > >
https://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&t...
> > > >
> > > > And I also found Kumar Gala reported this similar bug before.
> > > >
http://lkml.org/lkml/2007/11/14/388
> > > >
> > > > 1, when opening and creating a new on ramfs/tmpfs, the dentry->d_count
> > > > will be added one as below:
> > > > --
> > > > ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> > > > {
> > > > |_______struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
> > > > |_______int error = -ENOSPC;
> > > >
> > > > |_______if (inode) {
> > > > |_______|_______if (dir->i_mode & S_ISGID) {
> > > > |_______|_______|_______inode->i_gid = dir->i_gid;
> > > > |_______|_______|_______if (S_ISDIR(mode))
> > > > |_______|_______|_______|_______inode->i_mode |= S_ISGID;
> > > > |_______|_______}
> > > > |_______|_______d_instantiate(dentry, inode);
> > > > |_______|_______dget(dentry);|__/* Extra count - pin the dentry in core */
> > > > |_______|_______error = 0;
> > > > |_______|_______dir->i_mtime = dir->i_ctime = CURRENT_TIME;
> > > > |_______}
> > > > |_______return error;
> > > > }
> > > > --
> > > > The dget(dentry) call introduces an extra count, why?
> > > > it is the same in tmpfs.
> > >
> > > Because those dentries have no backing store. Their sole existance is in
> > > the dentry cache which is normally reclaimable. But we can't reclaim these
> > > dentries because there is nowhere from where they can be reestablished.
> > >
> > > > 2, when calling fcntl(fd, F_SETLEASE,F_WRLCK), it will return -EAGAIN
> > > > --
> > > > |_______if ((arg == F_WRLCK)
> > > > |_______ && ((atomic_read(&dentry->d_count) > 1)
> > > > |_______|_______|| (atomic_read(&inode->i_count) > 1)))
> > > > |_______|_______goto out;
> > > > --
> > >
> > > Sucky heuristic.
> >
> > Yes.
> >
> > >
> > > > because the dentry->d_count will be 2 not 1. I tested ext2 on Blackfin, it is 1.
> > > >
> > > > 3, so I guess maybe the dget(dentry) of ramfs_mknod is useless. But
> > > > after remove this dget(),
> > > > the ramfs can not be mounted as rootfs at all.
> > >
> > > Interesting. Presumably it got reclaimed synchronously somehow.
> > >
> > > > Is the bug in generic_setlease() or in the ramfs/tmpfs inode create function?
> > > >
> > > > Of course, simply remove the test '((atomic_read(&dentry->d_count) >
> > > > 1)' can workaround this issue.
> > >
> > > I guess we should make the generic_setlease() heuristic smarter.
> > >
> > > Of course the _reason_ for that heuristic is uncommented and lost in time.
> > > And one wonders what locking prevents it from being totally racy, and if
> > > "none", what happens when the race hits. Sigh.
>
> i'm not sure which particular kernel version we're talking about,
> but i think the intent was to rely on the BKL. i noticed a couple cases
> where this didn't actually hold -- e.g., bruce has a patch queued to move
> the kmalloc in generic_setlease() so that it precedes the
> d_count/i_writecount checks and covers the race he mentions below. an
> earlier patch closed a similar thing with get_write_access() when handling
> a truncate, etc.
>
> fyi (well, not you, bruce): relatedly, i have a set of patches to
> introduce per-inode lease enabling/disabling (so we can fully implement
> NFSv4.0 file delegations and v4.1 directory delegations) which expand the
> cases where leases are broken (e.g. unlink) and which make it somewhat
> more explicit when it's safe to lease/break. perhaps the next merge
> window for the first set of them.
>
>
> > Yes, I think the race is:
> >
> > 1. generic_setlease(., F_WRLCK, .) checks d_count and i_count,
> > both are 1.
> >
> > 2. a read open comes in, calls break_lease which finds no lease
> > and continues happily on.
> >
> > 3. generic_setlease() sets the write lease.
> >
> > The most likely consequences are that a local reader gets out-of-date
> > data for a file that a Samba client has modified.
> >
> > I suppose that re-checking the d_count and i_count after step 3 might
> > close the race.
>
> as things currently stand, i believe that race can only happen if
> the leaser is blocking on the kmalloc.