Re: [2.6.34-rc2 NFS4 oops] open error path failure...

Previous thread: [PATCH] dma-mapping: Remove WARN_ON in dma_free_coherent by Stefan Bader on Monday, March 29, 2010 - 11:09 am. (4 messages)

Next thread: [git pull] m68k updates for 2.6.34 by Geert Uytterhoeven on Monday, March 29, 2010 - 11:52 am. (2 messages)
From: Daniel J Blueman
Date: Monday, March 29, 2010 - 11:36 am

Hi Trond,

When open fails and should return EPERM [1], instead we see an oops
[2]. I see this on 2.6.34-rc1 and -rc2 mainline; NFS4 server is
mainline 2.6.33.1.

Let me know if you can't reproduce it and I'll provide some analysis
from this end.

Thanks,
  Daniel

--- [1]

$ touch /tmp/foo
$ sudo chmod 0 /tmp/foo
$ cat /tmp/foo
cat: /tmp/foo: Permission denied

$ touch /net/users/daniel/foo
$ sudo chmod 0 /net/users/daniel/foo
$ cat /net/users/daniel/foo
Killed

--- [2]

BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
IP: [<ffffffff8115036e>] nameidata_to_filp+0x1e/0x70
PGD 840d3067 PUD 841eb067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 4
Modules linked in: usb_storage nls_iso8859_1 nls_cp437 vfat fat
binfmt_misc rfcomm sco bnep l2cap kvm_intel kvm microcode btusb
bluetooth arc4 ecb uvcvideo iwlagn videodev iwlcore v4l1_compat
v4l2_compat_ioctl32 mac80211

Pid: 20644, comm: perf Tainted: G        W  2.6.34-rc2-317cd #2
0KM426/Studio 1557
RIP: 0010:[<ffffffff8115036e>]  [<ffffffff8115036e>] nameidata_to_filp+0x1e/0x70
RSP: 0018:ffff88008fd31d38  EFLAGS: 00010292
RAX: ffff88012c1825a0 RBX: fffffffffffffff3 RCX: 0000000000000041
RDX: 0000000000008000 RSI: 0000000000000024 RDI: ffff88008fd31e18
RBP: ffff88008fd31d48 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000008000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88012f8c3180
FS:  00007fd781585700(0000) GS:ffff880010800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000000b CR3: 00000000842a3000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process perf (pid: 20644, threadinfo ffff88008fd30000, task ffff88012c1825a0)
Stack:
 ffff88008fd31e18 ffff88008fd31e18 ffff88008fd31da8 ...
From: Al Viro
Date: Monday, March 29, 2010 - 12:03 pm

Joy...  ERR_PTR(-EPERM) in nd.intent.file, and whoever had called
lookup_instantiate_filp() hadn't bothered to check the return value.

OK, I think I see what's going on.  Replace
                                lookup_instantiate_filp(nd, (struct dentry *)state, NULL);
                                return 1;
with
                                lookup_instantiate_filp(nd, (struct dentry *)state, NULL);
                                return state;
in fs/nfs/nfs4proc.c:nfs4_open_revalidate() and see if everything works
properly (or just lose the lookup_instantiate_filp() in there and simply
return state).

--

From: Daniel J Blueman
Date: Monday, March 29, 2010 - 12:21 pm

Hi Al,


That did the trick!

Looks like I should have reported this when I first encountered it,
but I guess it goes to show there needs to be some NFS validation
(LTP?). I'm still trying to chase down a rare NFSv4 dentry "Stale NFS
file handle" issue I've seen.

Thanks,
  Daniel
-- 
Daniel J Blueman
--

From: Trond Myklebust
Date: Monday, March 29, 2010 - 2:22 pm

So this raises a point. Originally, the d_revalidate() call was required
to return a boolean 0 or 1. Nowadays it allows the filesystem to return
an error value instead.

Should we therefore rewrite the NFS implementation to propagate errors
like ESTALE (when it means the parent directory is gone), EACCES, EPERM
and EIO instead of the current behaviour of just dropping the dentry and
hence forcing a lookup?

Trond

--

From: Daniel J Blueman
Date: Wednesday, March 31, 2010 - 4:20 am

On Mon, Mar 29, 2010 at 10:22 PM, Trond Myklebust

Passing the error back without forcing a lookup sounds like a good
win, if it can avoid a comparatively expensive roundtrip to the server
(iff the dentry is fresh enough). Is this possible?

Talking of expensive, I see latencytop show >16000ms latency for
writing pages when I have a workload that does large buffered I/O to
an otherwise uncongested server. The gigabit network is saturated, and
reads often stall for 1000-4000ms (!). Client has the default 16 TCP
request slots, and server has 8 nfsds - the server is far from disk or
processor-saturated. I'll see if there is any useful debugging I can
get about this.

Thanks,
  Daniel
-- 
Daniel J Blueman
--

From: Chuck Lever
Date: Wednesday, March 31, 2010 - 7:21 am

That latency is pretty much guaranteed to be due to a long RPC backlog 
queue on the client.  Bumping the size of the slot table to 128 and 
increasing the number of NFSD threads may help.

-- 
chuck[dot]lever[at]oracle[dot]com
--

From: Daniel J Blueman
Date: Friday, April 2, 2010 - 3:34 am

Increasing these values did help quite a bit, though I was still
seeing 5000-8000ms at nfs_wait_bit_uninterruptible()  [1] and close().
Then again, I also was seeing 'Scheduler waiting for cpu' taking up to
3000ms! I suspect processor throttling, due to exceeding thermal
limits.

Thanks,
  Daniel

--- [1]

nfs_wait_bit_uninterruptible
nfs_wait_on_request
nfs_wait_in_requests_locked
nfs_sync_mapping_wait
nfs_write_mapping
nfs_wb_nocommit
nfs_getattr
vfs_getattr
vfs_fstat
sys_newfstat
system_call_fastpath
-- 
Daniel J Blueman
--

Previous thread: [PATCH] dma-mapping: Remove WARN_ON in dma_free_coherent by Stefan Bader on Monday, March 29, 2010 - 11:09 am. (4 messages)

Next thread: [git pull] m68k updates for 2.6.34 by Geert Uytterhoeven on Monday, March 29, 2010 - 11:52 am. (2 messages)