On Thursday August 2, andrew@digital-domain.net wrote:

It seems you have found a race between shutting down and starting up nfsd. Your md array was probably going very slowly due to playing with stripe_cache_size, and that stretched things out long enough to lose the race.

NFSD is shut down by something similar to killall nfsd, which sends signals to all the threads but doesn't wait for them to exit.

The first line in the log is mountd exiting. The nfsd threads should all have been signalled at the same time. The second line comes from the nfsd program trying to start up the nfsd threads again, and getting an error because there were some already running. The third is a similar issue with lockd.

The 4th and 5th show the last thread exiting. Only it isn't really the last thread. By this stage some more nfsd threads have started up. There was probably a backlog of requests as things were running slowly, so as soon as a new nfsd thread was started, it accepted a connection and created a new temporary socket. This would have been after the thread that thought it was the last thread thought it had closed the last temp socket. This caused the BUG.

As part of shutting down, the thread that thought it was last cleared out the read-ahead info cache. When the new threads tried to use it, they all suffered general protection faults.

So clearly we need some proper locking around thread start-up and shutdown. We had previously relied on lock_kernel, but that isn't really good enough for this.

I'll try to figure out the best way to fix it. Meanwhile, I doubt the problem will recur.

Thanks for the report.

NeilBrown
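
To illustrate the kind of locking described above, here is a minimal user-space sketch in Python. It is not kernel code and not the eventual nfsd fix; the names `ServicePool`, `thread_start`, `thread_exit`, `handle_request` and `run_workers` are all hypothetical. The idea it shows is that the thread count and the shared cache are only changed under one lock, so a thread can only conclude it is the last one, and tear the cache down, when no other thread has started in the meantime.

```python
import threading


class ServicePool:
    """Toy stand-in for a pool of service threads (hypothetical, not nfsd)."""

    def __init__(self):
        self._lock = threading.Lock()  # serialises start-up and shutdown
        self.nthreads = 0              # threads currently registered
        self.cache = None              # stands in for the read-ahead info cache

    def thread_start(self):
        with self._lock:
            if self.nthreads == 0:
                self.cache = {}        # first thread sets up the shared state
            self.nthreads += 1

    def thread_exit(self):
        with self._lock:
            self.nthreads -= 1
            if self.nthreads == 0:
                self.cache = None      # genuinely the last thread: tear down

    def handle_request(self, key):
        with self._lock:
            # Teardown happens under the same lock and only at count zero,
            # so a registered thread never sees a cleared cache here.
            self.cache[key] = self.cache.get(key, 0) + 1


def run_workers(pool, count=50, requests=100):
    """Start and stop many workers concurrently; return any exceptions seen."""
    errors = []

    def worker():
        try:
            pool.thread_start()
            for n in range(requests):
                pool.handle_request(n % 4)
            pool.thread_exit()
        except Exception as exc:  # would catch use of a torn-down cache
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Holding a single lock across every request is far too coarse for a real server; it is done here only to keep the sketch short. The point is the ordering guarantee: "am I the last thread?" and "free the shared state" are decided atomically with respect to new threads starting.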
