On Dec 05, 2006 10:23 -0500, Trond Myklebust wrote:

Actually, wouldn't readdirplus() (with a valid flag) be useful for NFS, if only to indicate that it does not need to flush the cache because the caller doesn't need ctime/mtime?

It does in any but the most simplistic invocations, like "find -mtime" or "find -mode" or "find -uid", etc.

I guess I just don't understand how fadvise() on a directory file handle (used for readdir()) can be used to affect later stat operations, which definitely will NOT be using that file handle. If you mean that the application should actually open() each file and then call fadvise(), fstat() and close() instead of a single stat() call, then we are WAY into negative improvements here due to the overhead of doing open+close.

Most clustered filesystems have strong cache semantics, so that isn't a problem. IMHO, the mechanism to pass the hint to the filesystem IS the readdirplus_lite() call that tells the filesystem exactly which data is needed for each directory entry.

Because in many cases it is desirable to limit the number of DLM locks on a given client (e.g. the GFS2 thread with AKPM about clients holding millions of DLM locks due to lack of memory pressure on large-memory systems). That means a finite-size lock LRU on the client, which risks being wiped out by a few thousand files in a directory doing "readdir() + 5000*stat()".

Consider a system like BlueGene/L with 128k compute cores. Jobs that run on that system will periodically (e.g. every hour) create up to 128K checkpoint+restart files to avoid losing a lot of computation if a node crashes. Even if each one of the checkpoints is in a separate directory (I wish all users were so nice :-), it means 128K inodes+DLM locks for doing an "ls" in the directory.

But it would still need 128K RPCs to get that information, and 128K new inodes on that client. And what is the chance that I can get a multi-threaded "ls" into the upstream GNU ls code?
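To make the idea of a call that "tells the filesystem exactly which data is needed" more concrete, here is a minimal userspace C sketch of the simplest possible fallback: readdir() plus one stat() per entry, where the stat() is skipped entirely when the caller only asks for fields readdir() already returns. Everything named here is an assumption for illustration: readdirplus_lite_fallback(), the RDPL_* mask bits, struct rdpl_entry and the tally callback are hypothetical and are not the interface discussed on the list or any real kernel ABI. Only POSIX.1-2008 calls are used (opendir(), dirfd(), readdir(), fstatat()).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* HYPOTHETICAL request/valid mask bits; not a real kernel or libc ABI. */
#define RDPL_INO   0x01u /* inode number: readdir() already returns it */
#define RDPL_MODE  0x02u /* the remaining fields need a stat() here */
#define RDPL_UID   0x04u
#define RDPL_SIZE  0x08u
#define RDPL_MTIME 0x10u
#define RDPL_STAT_FIELDS (RDPL_MODE | RDPL_UID | RDPL_SIZE | RDPL_MTIME)

/* HYPOTHETICAL per-entry result; only fields flagged in 'valid' are set. */
struct rdpl_entry {
    const char *name; /* points into the DIR buffer; valid during callback */
    unsigned valid;
    ino_t ino;
    mode_t mode;
    uid_t uid;
    off_t size;
    time_t mtime;
};

typedef int (*rdpl_cb)(const struct rdpl_entry *e, void *arg);

/* Fallback "readdirplus_lite": readdir() plus one fstatat() per entry, but
 * only if the caller asked for a field that readdir() cannot supply.
 * Returns 0 on success, -1 with errno set on error, or the first nonzero
 * value returned by cb (which stops the scan). */
int readdirplus_lite_fallback(const char *path, unsigned want,
                              rdpl_cb cb, void *arg)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    const int dfd = dirfd(dir);
    const unsigned need_stat = want & RDPL_STAT_FIELDS;
    int ret = 0;

    for (;;) {
        errno = 0;
        struct dirent *de = readdir(dir);
        if (de == NULL) {
            if (errno != 0)
                ret = -1; /* a real error, not end of directory */
            break;
        }
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        struct rdpl_entry e = { .name = de->d_name };
        if (want & RDPL_INO) {
            e.ino = de->d_ino;
            e.valid |= RDPL_INO;
        }
        if (need_stat != 0) {
            struct stat st;
            /* Describe the entry itself (like lstat()), relative to dfd. */
            if (dfd >= 0 &&
                fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
                e.mode = st.st_mode;
                e.uid = st.st_uid;
                e.size = st.st_size;
                e.mtime = st.st_mtime;
                e.valid |= need_stat;
            }
            /* If fstatat() fails (e.g. the file was just unlinked), the
             * stat fields simply stay unflagged in e.valid. */
        }
        ret = cb(&e, arg);
        if (ret != 0)
            break;
    }

    const int saved_errno = errno;
    closedir(dir);
    errno = saved_errno;
    return ret;
}

/* Example callback: counts entries, ANDs their valid masks together and
 * sums the sizes that were actually filled in. */
struct rdpl_tally {
    long count;
    long long bytes;
    unsigned valid_all; /* caller initializes this to ~0u */
};

static int rdpl_tally_cb(const struct rdpl_entry *e, void *arg)
{
    struct rdpl_tally *t = arg;
    t->count++;
    t->valid_all &= e->valid;
    if (e->valid & RDPL_SIZE)
        t->bytes += (long long)e->size;
    return 0;
}
```

With want = RDPL_INO the loop never calls fstatat(), which is the kind of avoided work the readdirplus_lite() proposal is after; with RDPL_SIZE | RDPL_MODE it degrades to exactly the "readdir() + N*stat()" pattern. A filesystem-side implementation could instead fill the requested fields from data it already has in cache or on disk, which this userspace sketch cannot do.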
In the case of local filesystems, a multi-threaded ls would be a net loss due to seeking. But even for local filesystems, readdirplus_lite() would allow them to fill in stat information they already have (either in cache or on disk), and may avoid doing extra work that isn't needed. For filesystems that don't care, readdirplus_lite() can just be readdir()+stat() internally.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
