Re: [NFSD OOPS] 2.6.23-rc1-git10

To: Andrew Clayton <andrew@...>
Cc: <linux-kernel@...>, <nfs@...>
Date: Thursday, August 2, 2007 - 11:53 pm

On Thursday August 2, andrew@digital-domain.net wrote:

It seems you have found a race between shutting down and starting up
nfsd.
Your md array was probably running very slowly because you were
experimenting with stripe_cache_size, and that stretched things out
long enough for you to lose the race.

NFSD is shut down by something similar to
   killall nfsd

which sends signals to all the threads, but doesn't wait for them to
exit.
The first line in the log is mountd exiting.  All the nfsd threads
should have been signalled at the same time.
The second line comes from the nfsd program trying to start up the
nfsd threads again, and getting an error because there were some
already running.
The third is a similar issue with lockd.

The 4th and 5th lines show the last thread exiting.  Only it isn't
really the last thread: by this stage some more nfsd threads have
started up.  There was probably a backlog of requests because things
were running slowly, so as soon as a new nfsd thread was started, it
accepted a connection and created a new temporary socket.  That
happened after the exiting thread, believing it was the last one, had
closed what it thought was the last temp socket.
This caused the BUG.

As part of shutting down, the thread that thought it was last cleared
out the read-ahead info cache.  When the new threads tried to use it,
they all suffered general protection faults.

So clearly we need some proper locking around thread start-up and
shutdown.  We had previously relied on lock_kernel, but that isn't
really good enough for this.  I'll try to figure out the best way to
fix it.  Meanwhile, I doubt the problem will recur.

Thanks for the report.

NeilBrown
