On Fri, 1 Dec 2006, Trond Myklebust wrote:

'ls -al' cares about the stat() results, but does not care about the relative timing accuracy wrt the preceding readdir(). I'm not sure why 'ls --color' still calls stat() when it can get that from the readdir() results, but either way it's asking more from the kernel/filesystem than it needs.

It sounds like you're talking about a single (asynchronous) client in a directory. In that case, the client need only flush if someone calls readdirplus() instead of readdir(), and since readdirplus() is effectively also a stat(), the situation isn't actually any different.

The more interesting case is multiple clients in the same directory. In order to provide strong consistency, both stat() and readdir() have to talk to the server (or more complicated leasing mechanisms are needed). In that scenario, readdirplus() is asking for _less_ synchronization/consistency of results than readdir()+stat(), not more. That is, both the readdir() and the stat() would require a server request in order to achieve the standard POSIX semantics, while a readdirplus() would allow a single request.

The NFS client already provides weak consistency of stat() results for clients. Extending the interface doesn't suddenly require the NFS client to provide strong consistency; it just makes life easier for the implementation if it (or some other filesystem) chooses to do so.

Consider two use cases. Process A is 'ls -al', which doesn't really care exactly when the size/mtime are from (i.e. any time after opendir() will do). Process B waits for a process on another host to write to a file, and then calls stat() locally to check the result. In order for B to get the correct result, stat() _must_ return a value for size/mtime from _after_ the stat() was initiated. That requirement makes 'ls -al' slow, because it probably has to talk to the server to make sure files haven't been modified between the readdir() and stat().
In reality, 'ls -al' doesn't care, but the filesystem has no way to know that without the presence of readdirplus(). Alternatively, an NFS (or other distributed filesystem) client can cache file attributes to make 'ls -al' fast, and simply break process B (as NFS currently does).

readdirplus() makes it clear what 'ls -al' doesn't need, allowing the client (if it so chooses) to avoid breaking B in the general case. That simply can't be communicated explicitly through the existing interface. How is that not a win?

I imagine that most of the time readdirplus() will hit something in the VFS that simply calls readdir() and stat(). But a smart NFS (or other network filesystem) client can opt to send a readdirplus over the wire for readdirplus() without sacrificing stat() consistency in the general case.

sage
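The generic fallback described above, where readdirplus() simply calls readdir() and stat() for each entry, could look roughly like the userspace sketch below. The names readdirplus_fallback() and struct dirent_plus are hypothetical and exist in neither libc nor the kernel; this only illustrates the readdir()+stat() pairing, not any proposed interface or an actual VFS implementation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for dirfd() and fstatat() */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record: one directory entry together with its attributes. */
struct dirent_plus {
    char name[256];
    struct stat st;
    int stat_ok;   /* 1 if st is valid, 0 if the stat failed (e.g. entry removed) */
};

/*
 * Hypothetical helper, not an existing API: emulate a readdirplus() by
 * pairing each readdir() result with an fstatat() on the same directory.
 * Fills at most max entries; returns the number filled, or -1 if the
 * directory cannot be opened.
 */
int readdirplus_fallback(const char *path, struct dirent_plus *out, int max)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *de;
    while (n < max && (de = readdir(dir)) != NULL) {
        snprintf(out[n].name, sizeof out[n].name, "%s", de->d_name);
        /* The attributes only need to date from some time after opendir(),
         * which is all that an 'ls -al' style caller asks for. */
        out[n].stat_ok = (fstatat(dirfd(dir), de->d_name, &out[n].st,
                                  AT_SYMLINK_NOFOLLOW) == 0);
        n++;
    }
    closedir(dir);
    return n;
}
```

A caller would pass a directory path and an array of struct dirent_plus, then print name, st_size and st_mtime for each entry with stat_ok set. On a local filesystem this buys nothing over the existing calls; the point is that a network filesystem could replace the loop with a single request, because the caller has stated up front that per-entry stat() freshness is not required.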
