login
Header Space

 
 

Re: nfs: infinite loop in fcntl(F_SETLKW)

Previous thread: Re: file offset corruption on 32-bit machines? by Michal Hocko on Thursday, April 10, 2008 - 9:55 am. (22 messages)

Next thread: iowait stability regressed from kernel 2.6.22 under heavy load of multi-thread aio writing/reading by rae l on Friday, April 11, 2008 - 5:17 am. (4 messages)
To: <neilb@...>, <Trond.Myklebust@...>
Cc: <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, April 10, 2008 - 3:51 pm

Another infinite loop, this one involving both client and server.

Basically what happens is that on the server nlm_fopen() calls
nfsd_open() which returns -EACCES, to which nlm_fopen() returns
NLM_LCK_DENIED.

On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
which in will cause fcntl_setlk() to retry forever.

I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for
blocking locks, as NLM_LCK_BLOCKED is for the contended case.  For
testing the lock leave NLM_LCK_DENIED as EAGAIN.  That still could be
misleading, but at least there's no infinite loop in that case.

I've minimally tested this patch to verify that it cures the lockup,
and that simple blocking locks keep working.

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
---
 fs/lockd/clntproc.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux/fs/lockd/clntproc.c
===================================================================
--- linux.orig/fs/lockd/clntproc.c	2008-04-02 13:34:57.000000000 +0200
+++ linux/fs/lockd/clntproc.c	2008-04-10 21:23:46.000000000 +0200
@@ -536,6 +536,9 @@ again:
 		up_read(&amp;host-&gt;h_rwsem);
 	}
 	status = nlm_stat_to_errno(resp-&gt;status);
+	/* Don't return EAGAIN, as that would make fcntl_setlk() loop */
+	if (status == -EAGAIN)
+		status = -ENOLCK;
 out_unblock:
 	nlmclnt_finish_block(block);
 	/* Cancel the blocked request if it is still pending */

--
To: Miklos Szeredi <miklos@...>
Cc: <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, April 10, 2008 - 5:02 pm

Wait. There is something really weird going on here.

According to the spec, LCK_DENIED means 'the request failed' (i.e.
ENOLCK is definitely correct)

OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary
failures, the first because the server had a resource problem, and the
second because the server rebooted and is in the grace period (i.e.
EAGAIN would appear to be more appropriate). See

http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02

AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by
swapping the return values for NLM_LCK_DENIED and
NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD.

The problem is that there appears to be a similar confusion on the Linux
server side in nlmsvc_lock(). :-(

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
--
To: Miklos Szeredi <miklos@...>
Cc: <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, April 10, 2008 - 5:07 pm

Duh... Sorry, EAGAIN is indeed the correct return value for fcntl() when
the lock attempt failed. I should have reread the manpage/posix spec
before replying.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
--
To: Miklos Szeredi <miklos@...>, Marc Eshel <eshel@...>, Dr. J. Bruce Fields <bfields@...>
Cc: <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, April 10, 2008 - 5:20 pm

OK. So the correct fix here should really be applied to fcntl_setlk().
There is absolutely no reason why we should be looping at all if the
filesystem has a -&gt;lock() method.

In fact, this looping behaviour was introduced recently in commit
7723ec9777d9832849b76475b1a21a2872a40d20. Marc, Bruce, why was this
done, and how are filesystems now supposed to behave?

--
To: Trond Myklebust <trond.myklebust@...>
Cc: Miklos Szeredi <miklos@...>, Marc Eshel <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, April 10, 2008 - 5:54 pm

Apologies, that was indeed a behavioral change introduced in a commit

The assumption must have been that -EAGAIN could only mean that we
needed to keep blocking, and hence was a nonsensical return from a
filesystem lock method that waited itself for the lock to become
available--such a method would return 0, -EINTR (or some more exotic
error), or continue waiting.

If we agree that EAGAIN is actually a legimate error to return from a
blocking lock, then, yes, we need take -&gt;lock() back out of this loop.

And I don't think there's any real reason we need the new behavior.  So
we should probably revert that--I'll take a closer look tomorrow....

--b.
--
To: <bfields@...>
Cc: <trond.myklebust@...>, <miklos@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Sunday, April 13, 2008 - 4:28 am

Another complaint about this series: using EINPROGRESS to signal
asynchronous locking looks really fishy.  How does the filesystem
know, that the caller wants to do async locking?  How do we make sure,
that the filesystem (like fuse or 9p, which "blindly" return the error
from the server) doesn't return EINPROGRESS even when it's _not_ doing
an asynchronous lock?

I think it would have been much cleaner to have a completely separate
interface for async locking, instead of trying to cram that into
f_op-&gt;lock().

Would that be possible to fix now?  Or at least make EINPROGRESS a
kernel-internal error value (&gt;512), to make it that it has a special
meaning for the _kernel only_?

Thanks,
Miklos
--
To: Miklos Szeredi <miklos@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 14, 2008 - 1:19 pm

The caller sets a fl_grant callback.  But I guess the interesting
question is how the caller knows that the filesystem is really going to

Right, there's no safeguard there--if fuse returns EINPROGRESS, then
we'll wait for a grant callback that's not going to come.  It should
time out, so that's not a total disaster, but still.



Perhaps so.

The behavior of lockd will still depend to some degree on the exact
error returned from the filesystem--e.g. if you return -EAGAIN from
-&gt;lock() without later calling -&gt;fl_grant() it will cause some
unexpected delays.  (Though again clients will eventually give up and
poll for the lock.)

--b.
--
To: <bfields@...>
Cc: <miklos@...>, <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 14, 2008 - 5:15 pm

OK, so the semantics of vfs_lock_file() are now:

1) if !FL_SLEEP, then return 0 if granted, -EAGAIN on contention
2) if FL_SLEEP and fl_grant == NULL, then return 0 if granted, block on
   contention
3) if FL_SLEEP and fl_grant != NULL, then return 0 if granted, on
   contention:
   a) either return -EINPROGRESS and call fl_grant when granted
   b) or return -EAGAIN  and call fl_notify when the lock needs retrying

As a first step, how about doing the following:

For 3) use LOCK_FILE_ASYNC as a return value.  AFAICS there's no
reason to distinguish between a) and b).  For 1) leave the -EAGAIN
error, since in that case no lock was queued.

Here's an untested patch.  Comments?

Miklos


---
 fs/gfs2/locking/dlm/plock.c |    2 +-
 fs/lockd/svclock.c          |   13 ++++---------
 fs/locks.c                  |   20 ++++++++++----------
 include/linux/fs.h          |   15 +++++++++++++++
 4 files changed, 30 insertions(+), 20 deletions(-)

Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c	2008-04-14 22:19:57.000000000 +0200
+++ linux-2.6/fs/locks.c	2008-04-14 22:52:06.000000000 +0200
@@ -784,8 +784,10 @@ find_conflict:
 		if (!flock_locks_conflict(request, fl))
 			continue;
 		error = -EAGAIN;
-		if (request-&gt;fl_flags &amp; FL_SLEEP)
-			locks_insert_block(fl, request);
+		if (!(request-&gt;fl_flags &amp; FL_SLEEP))
+			goto out;
+		error = FILE_LOCK_ASYNC;
+		locks_insert_block(fl, request);
 		goto out;
 	}
 	if (request-&gt;fl_flags &amp; FL_ACCESS)
@@ -841,7 +843,7 @@ static int __posix_lock_file(struct inod
 			error = -EDEADLK;
 			if (posix_locks_deadlock(request, fl))
 				goto out;
-			error = -EAGAIN;
+			error = FILE_LOCK_ASYNC;
 			locks_insert_block(fl, request);
 			goto out;
   		}
@@ -1040,7 +1042,7 @@ int posix_lock_file_wait(struct file *fi
 	might_sleep ();
 	for (;;) {
 		error = posix_lock_file(filp, fl, NULL);
-		if ((error != -EAGAIN) || !(fl-&gt;fl_flags...
To: Miklos Szeredi <miklos@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Tuesday, April 15, 2008 - 2:58 pm

Not quite; the original idea was that we didn't care about the blocking
lock case, since there's already a lock manager callback for that.
(It's arguably a minor abuse for us to return NLM_LCK_BLOCKED and then
do a grant later even in the case where there's no conflict, but it
works.)  So we only changed the nonblocking case.

Which may be sacrificing elegance for expediency, and I'm not opposed to
fixing that in whatever way seems best.  As a start, we could document

I'd put it this way (after a quick check of the code to convince myself
I'm remembering this right...):

  1) If FL_SLEEP, then return 0 if granted, and on contention either:
     a) block, or
     b) return -EAGAIN, and call fl_notify when the lock should be
        retried.
  2) If !FL_SLEEP, then either
     a) return a result now, or
     b) return -EINPROGRESS, then call fl_grant with the result.

In each case, b) is only of course allowed if the corresponding callback
is defined.  If you want to support NFS exports, then you should also
take option b) when possible.  (Though note only 1b is *required* to
avoid deadlocks; 2b is basically just a performance enhancement.  We
could perhaps even ditch 2b if we fixed lockd's BKL reliance and made it
multithreaded.)

The names can be confusing, since fl_notify generally results in
granting a previously blocked lock, whereas fl_grant just returns the
result (positive or negative) of a single lock request.  They have the
names they do because of the difference in their semantics:

	- With fl_grant the filesystem says: I'm giving you the final
	  result of the lock operation.  In particular, if I'm telling
	  you it succeeded, that means I've already granted you the
	  lock; don't ask me about it again.

	- With fl_notify the filesystem says: something just happened to
	  this lock.  I'm not guaranteeing anything, I'm just telling
	  you this would be a good time to try the lock again.  Do it
	  and maybe you'll get lucky!

This difference is why the fl_grant...
To: <bfields@...>
Cc: <miklos@...>, <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Wednesday, April 16, 2008 - 12:28 pm

Gfs2 seems to return -EINPROGRESS regardless of the FL_SLEEP flag:




No problem.  Here it is again with some cosmetic fixes and testing.

Miklos
--


Use a special error value FILE_LOCK_DEFERRED to mean that a locking
operation returned asynchronously.  This is returned by

  posix_lock_file() for sleeping locks to mean that the lock has been
  queued on the block list, and will be woken up when it might become
  available and needs to be retried (either fl_lmops-&gt;fl_notify() is
  called or fl_wait is woken up).

  f_op-&gt;lock() to mean either the above, or that the filesystem will
  call back with fl_lmops-&gt;fl_grant() when the result of the locking
  operation is known.  The filesystem can do this for sleeping as well
  as non-sleeping locks.

This is to make sure, that return values of -EAGAIN and -EINPROGRESS
by filesystems are not mistaken to mean an asynchronous locking.

This also makes error handling in fs/locks.c and lockd/svclock.c
slightly cleaner.

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
---
 fs/gfs2/locking/dlm/plock.c |    2 +-
 fs/lockd/svclock.c          |   13 ++++---------
 fs/locks.c                  |   28 ++++++++++++++--------------
 include/linux/fs.h          |    6 ++++++
 4 files changed, 25 insertions(+), 24 deletions(-)

Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c	2008-04-16 17:33:49.000000000 +0200
+++ linux-2.6/fs/locks.c	2008-04-16 17:38:01.000000000 +0200
@@ -784,8 +784,10 @@ find_conflict:
 		if (!flock_locks_conflict(request, fl))
 			continue;
 		error = -EAGAIN;
-		if (request-&gt;fl_flags &amp; FL_SLEEP)
-			locks_insert_block(fl, request);
+		if (!(request-&gt;fl_flags &amp; FL_SLEEP))
+			goto out;
+		error = FILE_LOCK_DEFERRED;
+		locks_insert_block(fl, request);
 		goto out;
 	}
 	if (request-&gt;fl_flags &amp; FL_ACCESS)
@@ -841,7 +843,7 @@ static int __posix_lock_file(struct inod
 			error = -EDEADLK;
 			if ...
To: Miklos Szeredi <miklos@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Thursday, April 17, 2008 - 6:26 pm

Oops, you're right; in FL_SLEEP case fs/lockd/svclock.c:nlmsvc_lock() 
returns NLM_LCK_BLOCKED.  I believe it'll get an fl_grant() callback
after that and do a grant call back to the client, but I haven't
checked....

Note, as has been pointed out by Mark Snitzer
(http://article.gmane.org/gmane.linux.nfs/17801/), this limits the kind
of error reporting the filesystem can do--if it needs to block on the
initial lock call, it has to commit to queueing up, and eventually

Thanks!  Ping me in a couple days if I haven't made any comments.  From
a quick skim the GFS2 change and the error return change both seem
reasonable.

I need to a real GFS2 testing setup....  (Did you test GFS2 locking
specifically?)

--
To: <bfields@...>
Cc: <miklos@...>, <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Friday, April 18, 2008 - 8:47 am

OK, but that AFAICS is a lock manager implementation issue, not an API

No gfs2, only ext3.

Thanks,
Miklos
--
To: <bfields@...>
Cc: <trond.myklebust@...>, <miklos@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Friday, April 11, 2008 - 3:12 pm

Yeah, that patch looks totally wrong.  It's not generally a good idea
to do a loop where the exit condition depends on something you don't
control.  And error values from filesystem methods are typically like
that.  For example with fuse, the error code could come from an
unprivileged userspace process.

I didn't realize this aspect of the bug previously, because I
concentrated on the lockd inconsistency.

Btw, why hasn't this work been posted on -fsdevel prior to merging

EAGAIN for a blocking lock is nonsensical, so my original patch could
still make sense.  But that's no longer a regression, and not all that
important.

Miklos
--
To: Miklos Szeredi <miklos@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Saturday, April 12, 2008 - 8:08 pm

So, does this patch on its own fix the problem you saw?

Any extra eyes welcome....

--b.

commit e56100676b9ea3b2d5f3e937c3ce8a5149cffb84
Author: J. Bruce Fields &lt;bfields@citi.umich.edu&gt;
Date:   Sat Apr 12 18:12:15 2008 -0400

    locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs
    
    Miklos Szeredi found the bug:
    
    	"Basically what happens is that on the server nlm_fopen() calls
    	nfsd_open() which returns -EACCES, to which nlm_fopen() returns
    	NLM_LCK_DENIED.
    
    	"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
    	which in will cause fcntl_setlk() to retry forever."
    
    So, for example, opening a file on an nfs filesystem, changing
    permissions to forbid further access, then trying to lock the file,
    could result in an infinite loop.
    
    And Trond Myklebust identified the culprit, from Marc Eshel and I:
    
    	7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
    	generic/filesystem switch from setlock code"
    
    That commit claimed to just be reshuffling code, but actually introduced
    a behavioral change by calling the lock method repeatedly as long as it
    returned -EAGAIN.
    
    We assumed this would be safe, since we assumed a lock of type SETLKW
    would only return with either success or an error other than -EAGAIN.
    However, nfs does can in fact return -EAGAIN in this situation, and
    independently of whether that behavior is correct or not, we don't
    actually need this change, and it seems far safer not to depend on such
    assumptions about the filesystem's -&gt;lock method.
    
    Therefore, revert the problematic part of the original commit.  This
    leaves vfs_lock_file() and its other callers unchanged, while returning
    fcntl_setlk and fcntl_setlk64 to their former behavior.
    
    Signed-off-by: J. Bruce Fields &lt;bfields@citi.umich.edu&gt;
    Cc: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
    Cc: Trond Myklebust &lt;trond.mykleb...
To: <bfields@...>
Cc: <miklos@...>, <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Sunday, April 13, 2008 - 4:13 am

Yes.  With the patch applied, the test program returns "lockf:
Resource temporarily unavailable" instead of hanging.

Thanks,
Miklos
--
To: Linus Torvalds <torvalds@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>, Miklos Szeredi <mszeredi@...>, <stable@...>
Date: Monday, April 14, 2008 - 3:03 pm

From: J. Bruce Fields &lt;bfields@citi.umich.edu&gt;

Miklos Szeredi found the bug:

	"Basically what happens is that on the server nlm_fopen() calls
	nfsd_open() which returns -EACCES, to which nlm_fopen() returns
	NLM_LCK_DENIED.

	"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
	which in will cause fcntl_setlk() to retry forever."

So, for example, opening a file on an nfs filesystem, changing
permissions to forbid further access, then trying to lock the file,
could result in an infinite loop.

And Trond Myklebust identified the culprit, from Marc Eshel and I:

	7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
	generic/filesystem switch from setlock code"

That commit claimed to just be reshuffling code, but actually introduced
a behavioral change by calling the lock method repeatedly as long as it
returned -EAGAIN.

We assumed this would be safe, since we assumed a lock of type SETLKW
would only return with either success or an error other than -EAGAIN.
However, nfs does can in fact return -EAGAIN in this situation, and
independently of whether that behavior is correct or not, we don't
actually need this change, and it seems far safer not to depend on such
assumptions about the filesystem's -&gt;lock method.

Therefore, revert the problematic part of the original commit.  This
leaves vfs_lock_file() and its other callers unchanged, while returning
fcntl_setlk and fcntl_setlk64 to their former behavior.

Signed-off-by: J. Bruce Fields &lt;bfields@citi.umich.edu&gt;
Tested-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: Marc Eshel &lt;eshel@almaden.ibm.com&gt;
---
 fs/locks.c |   48 ++++++++++++++++++++++++++++--------------------
 1 files changed, 28 insertions(+), 20 deletions(-)

This is appropriate for 2.6.25 (as well as older stable kernels--the bug
was introduced in 2.6.22).

--b.

diff --git a/fs/locks.c b/fs/locks.c
index d83fab1..43c0af2 100644
--- a/fs/locks.c
+++ b/...
To: Miklos Szeredi <miklos@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 14, 2008 - 1:07 pm

OK, thanks!

--b.
--
To: Miklos Szeredi <miklos@...>
Cc: <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Friday, April 11, 2008 - 3:19 pm

To: <bfields@...>
Cc: <miklos@...>, <trond.myklebust@...>, <eshel@...>, <neilb@...>, <akpm@...>, <linux-nfs@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Friday, April 11, 2008 - 3:22 pm

Ah, sorry.  I was only looking at the last half year of archives :)

Miklos
--
Previous thread: Re: file offset corruption on 32-bit machines? by Michal Hocko on Thursday, April 10, 2008 - 9:55 am. (22 messages)

Next thread: iowait stability regressed from kernel 2.6.22 under heavy load of multi-thread aio writing/reading by rae l on Friday, April 11, 2008 - 5:17 am. (4 messages)
speck-geostationary