Hi all,

The use model for openg() and openfh() (renamed sutoc()) is n processes spread across a large cluster simultaneously opening a file. The challenge is to avoid, to the greatest extent possible, incurring O(n) FS interactions. To do that we need to allow actions of one process to be reused by other processes on other OS instances.

The openg() call allows one process to perform name resolution, which is often the most expensive part of this use model. Because permission checking is also performed as part of the openg(), some file systems do not require additional communication between OS and FS at openfh(). External communication channels (e.g. MPI_Bcast) are used to pass the handle resulting from the openg() call out to processes on other nodes.

dup(), openat(), and UNIX sockets are not viable options in this model, because there are many OS instances, not just one. All the calls that are being discussed as part of the HEC extensions are being discussed in this context of multiple OS instances and cluster file systems.

Regarding the lifetime of the handle, there has been quite a bit of discussion about this. I believe that we most recently were thinking that the handle would have an undefined lifetime, allowing servers to "forget" these values (as in the case where a server is restarted). Clients that try to use an outdated handle would need to perform the openg() again, or simply fall back to a regular open(). This is not a problem in our use model.

I've attached a graph showing the time to use individual open() calls vs. the openg()/MPI_Bcast()/openfh() combination; it's a clear win for any significant number of processes. These results are from our colleagues at Sandia (Ruth Klundt et al.) with PVFS underneath, but I expect the trend to be similar for many cluster file systems.

Regarding trying to "force APIs using standardization" on you (Christoph's 11/29/2006 message), you've got us all wrong. The standardization process is going to take some time, so we're starting on it at the same time that we're working with prototypes, so that we don't have to wait any longer than necessary to have these things be part of POSIX. The whole reason we're presenting this on this list is to try to describe why we think these calls are important and get feedback on how we can make these calls work well in the context of Linux.

I'm glad to see so many people taking interest. I look forward to further constructive discussion.

Thanks,

Rob
---
Rob Ross
Mathematics and Computer Science Division
Argonne National Laboratory

Christoph Hellwig wrote:
