I currently have the dentry to carry the name of the file* class it is linked to. I'd prefer to keep it that way, unless there are huge factors against. - Davide -
[ Al Viro added to Cc: as the arbiter of good taste in the VFS layer. He has veto powers even over my proposals ;^] I assume that the *only* reason for having multiple dentries is really just the output in /proc/<pid>/fd/, right? Or is there any other reason to have separate dentries for these pseudo-files? It's a bit sad to waste that much memory (and time) on something like that. I bet that the dentry setup is a noticeable part of the whole sigfd()/timerfd() setup. It's likely also a big part of any memory footprint if you have lots of them. So how about just doing: - do a single dentry - make a "struct file_operations" member function that prints out the name of the thing in /proc/<pid>/fd/, and which *defaults* to just doing the d_path() on the dentry, but special filesystems like this could do something else (like print out a fake inode number from the "file->f_private_data" information) There seems to really be no downsides to that approach. No existing filesystem will even notice (they'll all have NULL in the new f_op member), and it would allow pipes etc to be sped up and use less memory. Linus -
I'm OK with everything that avoid code duplication due to those fake inodes. The change can be localized inside the existing API, so it doesn't really affect me externally. - Davide -
Can you try with the first patch version that doesn't do anything special at all, and just uses a single dentry. Yeah, the dentry name will be identical, and so you would see something like "7 -> signalfd:signalfd" when you do "ls -l /proc/<pid>/fd/" on a task that has such a special file descriptor (with no way to tell different timerfd's and signalfd's apart), but I think it's better to start off simple than to overdesign things. And trying to tell them apart sounds a bit like overdesign, if only because I really don't see why anybody would really *care*. So it's a timer for poll/select/epoll - why care about anything else? If it really really turns out that people care, we know how we can do it. We'd hook into "proc_fd_link()" and we'd allow a per-file mntget/dget that we could use to let special filesystems create fake entries on demand. So it's not impossible, it's just likely simply not needed. Linus -
The code would not change/shrink much with the single dentry. We'd save memory, and we'd lose the capability of seeing aino:[CLASS]. Both ways are fine with me. - Davide -
I would definitly *love* saving dentries for pipes (and sockets too), but how are you going to get the inode ? pipes()/sockets() can use read()/write()/rw_verify_area() and thus need file->f_path.dentry->d_inode (so each pipe needs a separate dentry) Are you suggesting adding a new "struct file_operations" member to get the inode ? Or re-intoducing an inode pointer in struct file ? -
I was not planning to touch anything but epoll, signalfd and timerfd Currently, they use a single inode, and multiple dentries (to give the name of the class). But this could be changed to a single dentry like Linus was suggesting. I'll wait for Al's reply before doing anything. Memory saving can be something, on top of the already big one of avoiding code duplication. - Davide -
Crazy ideas : (some readers are going to kill me) 1) Use the low order bit of f_path.dentry to say : this pointer is not a pointer to a dentry but the inode pointer (with the low order bit set to 1) OR 2) file->f_path.dentry set to NULL for this special files (so that we dont need to dput() and cache line ping pong the common dentry each time we __fput() a pipe/socket. Same trick could be used for file->f_path.mnt, because there is a big SMP cache line ping/pong to maintain a mnt_count on pipe/sockets mountpoint while these file systems cannot be un-mounted) If dentry is NULL, we get the inode pointer from an overlay of struct file_ra_state f_ra; (because for this special files readahead is unused) This adds some conditional branches of course, but being able to save ram and better use cpu caches might be worth them. -
No way on either one. f_path.dentry always beeing there is an assumption we make all over the place, and changing that would be a big regression for code qualityand reliability all over the place. Face it folks, memory is generally cheap, and we're not going to uglify huge amounts of code to shave of a little bit. [and that is only in reply to this one, the single dentry optimizations Same thing as above. We might do a hack to not refcount these vfsmounts, but we definitively want to keep the invariant of f_path.mnt never beeing NULL. -
First off, as noted earlier, you don't need crazy ideas. No, we don't want to do that. There's a lot of code that wants to just follow the dentry/inode chain unconditionally, and we want to have people able to just do i_size_read(file->f_path.dentry->d_inode); kind of thing, without having to check anything in between. It's costly enough that we have a chain of pointers, it's much *worse* if we then have I agree abot the ping pong, but I suspect it's a lot less than the current pingpong on dcache_lock, which is a lot hotter than a pipe-dentry counter would be. If it becomes a big issue, we could just have a fixed number of dentries, and hash them out some way (to dilute the ping-ponging). So making things NULL would just be horrible for everybody, since we have lots of generic code that just looks up the dentry and inode, and we don't want to make that worse. (I'm sure using "filp->f_private_data" has some downsides too, but it seems fairly simple at least for pipes. The biggest problem is likely that we currently use the mutex in the inode in "filp->f_dentry->d_inode" as a way to protect the inode->i_pipe pointer itself, and there is no equivalent locking at the "struct file *" level at all. We might have to make the "f_ep_lock" spinlock be unconditional) Linus -
No, at least pipes could easily just use "file->f_private_data" instead. Now, sockets really do want the inode (or it would be really really big changes), but pipes really just want a "struct pipe_inode_info" pointer, which we could hide away directly in the file descriptor itself. That's what Davide already did (on my suggestion) for signalfd - there's a *single* inode, and the real data is in the per-fd f_private_data. Linus -
sockets already uses file->private_data. But calls to read()/write() (not send()/recv()) still need to go through the Please find enclosed the following patch, to prepare this path. [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to be able to provide a dentry name for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen) 2) Use this new method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... We also avoid mntput()/mntget() on sock_mnt 3) Use this new method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... We also avoid mntput()/mntget() on pipe_mnt A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> fs/dcache.c | 3 +++ fs/pipe.c | 16 +++++++++++----- include/linux/dcache.h | 1 + net/socket.c | 15 +++++++++++---- 4 files changed, 26 insertions(+), 9 deletions(-)
(resending with a more convenient attachment) Please find enclosed the following patch, to prepare this path. [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method calle= d=20 d_dname() might be called from d_path() to be able to provide a dentry name= =20 for special filesystems. It is called without locks. =46uture patches (if we succeed in having one common dentry for all pipes) = may=20 need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen) 2) Use this new method for sockets : No more sprintf() at socket creation.= =20 This is delayed up to the moment someone does an access to /proc/pid/fd/... We also avoid mntput()/mntget() on sock_mnt 3) Use this new method for pipes : No more sprintf() at pipe creation. This= is=20 delayed up to the moment someone does an access to /proc/pid/fd/... We also avoid mntput()/mntget() on pipe_mnt A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a= =20 *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> =C2=A0fs/dcache.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 =C2=A03= +++ =C2=A0fs/pipe.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 16= +++++++++++----- =C2=A0include/linux/dcache.h | =C2=A0 =C2=A01 + =C2=A0net/socket.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 15 +++++++++= ++---- =C2=A04 files changed, 26 insertions(+), 9 deletions(-)
Sure. The dentry and the inode need to *exist*, but they can be one single static dentry/inode per "file descriptor type". We always pass in the "struct file *" to read/write too, since we need it anyway for things like file control information (eg "is it a nonblocking read or write" kinds of things). So I'm not suggesting a NULL dentry/inode, I'm suggesting a single static one per type. And yeah, it may be harder than it looks. Some things "know" that all the relevant info is in the inode, so they just pass in the inode. In the pipe layer, for example, you'd need to change free_pipe_info() and alloc_pipe_info() to pass in the file descriptor instead, same goes for pipe_release(). But the "struct file *" is always available, it's just that since the code was originally written to have all the info in the inode, some of the code isn't set up to use it or pass it on.. But your patch is independent of that, and looks fine. Except I don't like this part: - file->f_path.mnt = mntget(sock_mnt); + file->f_path.mnt = NULL; since I'd be much happer with always having f_path.mnt available, the same way we should always have f_path.dentry there. (Btw, your patch is *not* going to work with the file->f_private_data approach, because d_path() is not passed down the "file *" thing. So we'd need to do that, and that's more intrusive (it can be NULL, since for things like cwd/pwd we don't have a "struct file"). But I like your patch as a totally independent thing. "It just makes sense". (Apart from the f_path.mnt thing, which I think was something else ;) Linus -
Yes, but mntget()/mntput() are protected against NULL. I was quite happy to remove two locked operations :) I tried this path today and failed... Too many changes to do (nameidata) to propagate a 'struct file *' OK no problem here is the patch without messing f_path.mnt (benchmark results not really different on my little machine, SMP kernel but one CPU only... maybe because lock suffix is changed by a nop) [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to be able to provide a dentry name for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen) 2) Use this new method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 3) Use this new method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> fs/dcache.c | 3 +++ fs/pipe.c | 12 +++++++++--- include/linux/dcache.h | 1 + net/socket.c | 13 ++++++++++--- 4 files changed, 23 insertions(+), 6 deletions(-)
All right. I like it. Definitely worth putting into -mm, or just re-sending to me after 2.6.21 is out (I'll forget all about it otherwise). I have one more worry, namely this:: - char name[32]; - this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); - this.name = name; + this.len = 0; + this.name = NULL; I think that's fine, and it *happens* to work, but it happens to work just because then d_alloc() will do: memcpy(dname, name->name, name->len); dname[name->len] = 0; and passing in NULL to memcpy() is generally ok when len is 0. HOWEVER. Not only might memcpy() do a "prefetch for read" on the source for some architectures (which in turn may end up being slow for an address that isn't in the TLB, like NULL), but you depend on a very much internal detail, since it *could* have been using something like memcpy(dname, name->name, name->len+1); instead, and expected to get the final '\0' character from the source string. So I would actually much prefer it to be written as this.len = 0; this.name = ""; just because it's safer. But other than that small detail, I think this is not only an optimization, it's an actual cleanup, and we migth some day want to use something like this for some other things too (eg maybe this kind of approach is usable for /proc/<pid> entries too, to avoid instantiating them). As to avoiding the mntget(), I'm not violently opposed to it, but I do think that it's a totally unrelated matter, so even if it's decided it's worth it, it should be done as a separate patch regardless. Linus -
Well, I hope a prefetch(NULL) is OK because we are doing millions of them (see
Yes very true, I will change that and push to Andrew for mm testing
I was thinking about being able to cache the name into the dentry, do you
think it's worth the pain ? (its not SMP safe for example...)
static char * pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
{
if (!dentry->d_iname[0])
snprintf(dentry->d_iname,
DNAME_INLINE_LEN_MIN,
"pipe:[%lu]",
dentry->d_inode->i_ino);
strlcpy(buffer, dentry->d_iname, buflen);
return buffer;
}
-
Actually, it *can* be SMP-safe, if you do it right. Something like
len = dentry->d_name.len;
if (!len) {
len = snprintf(dentry->d_iname,
DNAME_INLINE_LEN_MIN,
"pipe:[%lu]",
dentry->d_inode->i_ino);
smp_wmb();
dentry->d_name.len = len;
}
if (len >= buflen)
len = buflen-1;
memcpy(buffer, dentry->d_iname, len);
buffer[len] = 0;
return buffer;
should work, although it depends on the fact that our snprintf()
implementation should be "stable" (ie if snprintf() modifies the buffer
temporarily as it goes along, that would break, but I think our
vsnprintf is good in that respect).
So you could have two different CPU's doing the snprintf() on the same
buffer at the same time (and assigning the length at the same time), but
since they'll write the same thing, you don't really care.
It's a bit subtle, though. And probably not really worth it.
Linus
-
Funny you mention this. We found some noticeable ppc64 regressions when
moving the dcache to standard list macros and had to do this to fix it
up:
static inline void prefetch(const void *x)
{
if (unlikely(!x))
return;
__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
}
Urgh :)
Anton
-
Yeah, I'm not at all surprised. Any implementation of "prefetch" that doesn't just turn into a no-op if the TLB entry doesn't exist (which makes them weaker for *actual* prefetching) will generally have a hard time with a NULL pointer. Exactly because it will try to do a totally unnecessary TLB fill - and since most CPU's will not cache negative TLB entries, that unnecessary TLB fill will be done over and over and over again.. In general, using software prefetching is just a stupid idea, unless - the prefetch really is very strict (ie for a linked list you do exactly the above kinds of things to make sure that you don't try to prefetch the non-existent end entry) AND - the CPU is stupid (in-order in particular). I think Intel even suggests in their optimization manuals to *not* do software prefetching, because hw can usually simply do better without it. Linus -
Data prefetch instructions should indeed avoid page table walks. (Instruction prefetch mechanisms often do induce table walks on ITLB miss.) Not just because of the null pointer case, but because it's quite normal to run off the end of an array in a loop with an embedded prefetch instruction. If you have an extra instruction issue unit that shares the same DTLB, and you know you will really want that data, you can sometimes use it to force DTLB preloads by doing an actual data fetch from the foreseeable page. This is potentially one of the best uses of chip multi-threading on an architecture like Sun's Niagara. (I don't think Intel's hyper-threading works for this purpose; the DTLB is shared but the entries are marked as owned by one thread or the other. HT can be used for L2 cache prefetching, although the results so far seem to be mixed: Not the XScale -- it performs quite poorly without prefetch, as people who have run ARMv5-optimized binaries on it can testify. From the XScale Core Developer's Manual: <quote> The Intel XScale(r) core has a true prefetch load instruction (PLD). The purpose of this instruction is to preload data into the data and mini-data caches. Data prefetching allows hiding of memory transfer latency while the processor continues to execute instructions. The prefetch is important to compiler and assembly code because judicious use of the prefetch instruction can enormously improve throughput performance of the core. Data prefetch can be applied not only to loops but also to any data references within a block of code. Prefetch also applies to data writing when the memory type is enabled as write allocate The Intel XScale(r) core prefetch load instruction is a true prefetch instruction because the load destination is the data or mini-data cache and not a register. Compilers for processors which have data caches, but do not support prefetch, sometimes use a load instruction to preload the data cache. This technique has the disadvantages ...
Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-byte cacheline and has no serializing effects whatsoever, making it ideal for use in a linked-list-traversal inner loop. Cheers, Kyle Moffett -
OK, 200 cycles... But what is the cost of the conditional branch you added in prefetch(x) ? if (!x) return; (correctly predicted or not, but do powerPC have a BTB ?) About the NULL 'potential problem', maybe we could use a dummy nil (but mapped) object, and use its address in lists, ie compare for &nil instead of NULL. This would avoid : - The conditional test in some prefetch() implementations - The potential TLB problem with the NULL value. -
You avoid those two, but add the very high likelyhood that a statement of= the form 'if (=21x) =7B....=7D' will creep back in and bug out. I doubt that= changing the form of a basic C idiom to save a few cycles is worth it, especially = when the (=21x) form can be tested without a memory fetch, but (x =21=3D nil) = may require fetching 'nil' (remember - the x86 is a very popular chipset, but registe= r starved - if the loop is any size at all, 'nil' may require a reload each= time around, costing you the win you thought you had....
Well, PowerPC "dcbt" does prefetch() correctly, it doesn't ever raise exceptions, doesn't have any side effects, takes only enough CPU to decode the address, and is ignored if it would have to do anything other than load the cacheline from RAM. Prefetch streams are halted when they reach the end of a page boundary (no trapping to the MMU) and if the TLB entry isn't present then they would asynchronously abort. This is a suitable prefetch implementation for G5: #define prefetch(x) __asm__ __volatile__ ("dcbt 0, %0"::"r"(x)); On GCC4+ the "__builtin_prefetch" built-in functions is probably mildly better tuned for most architectures: #define PREFETCH_RD 0 #define PREFETCH_WR 1 #define PREFETCH_LOCALITY_NONE 0 #define PREFETCH_LOCALITY_LOW 1 #define PREFETCH_LOCALITY_MED 2 #define PREFETCH_LOCALITY_HIGH 3 #define prefetch(x, args...) __builtin_prefetch(x ,##args) Cheers, Kyle Moffett -
It depends on the implementation and the HID bit settings. Some do walk the MMU hashtable if it isnt in the TLB. Anton -
Much less than the tablewalk. On ppc64 a tablewalk of an address that is not populated in the hashtable will incur 2 cacheline lookups (primary and secondary buckets). This plus the MMU state machine overhead adds up. Cue Linus rant about PowerPC MMU :) Anton -
No, I just checked, and Intel's own optimization manual makes it clear that you should be careful. They talk about performance penalties due to resource constraints - which makes tons of sense with a core that is good at handling its own resources and could quite possibly use those resources better to actually execute the loads and stores deeper down the instruction pipeline. So it's not just 3DNow! making AMD look bad, or Intel would obviously Blah. XScale isn't even an OoO CPU, *of*course* it needs prefetching. Calling that "getting it right" is ludicrous. If anything, it gets things so wrong that prefetching is *required* for good performance. I'm talking about real CPU's with real memory pipelines that already do prefetching in hardware. The better the core is, the less the prefetch helps (and often the more it hurts in comparison to how much it helps). But if you mean "doesn't try to fill the TLB on data prefetches", then I just suspect that the upside for Core 2 Due is likely fairly low. The L2 cache is good, the memory re-ordering is working.. I doubt "prefetch" helps in generic code that much for things like linked list following, you should probably limit it to code that has *known* access patterns and you know it's not going to be in the cache. (In other words, I bet prefetching can help a lot with MMX/media kind of code, I doubt it's a huge win for "for_each_entry()") Linus -
Certainly you should be careful -- and that usually means leaving it up to the compiler. But hinting to the compiler can help; there may be an analogue of the (un)likely macros waiting to be implemented for loop prefetch. And while out-of-order execution and fancy hardware prefetch streams greatly reduce the need for explicit prefetch in general code, there's no substitute for the cache _bypassing_ instructions when trying to avoid excessive cache eviction in DDoS situations. For instance, if I wind up working on a splay-tree variant of Robert Olsson's trie/hash work, I'll try to measure the effect of using SSE2 non-temporal stores to write half-open connections to the leaves of the tree. That may give some additional improvement in the ability to Intel puts a lot of effort into educating compiler writers about the optimal prefetch insertion strategies for particular cache architectures. At the same time, they put out the orange cones to warn people off of hand-tuning prefetch placement using micro-benchmarks. People did that when 3DNow! first came out, with That's not an accident. Hardware prefetch units cost a lot in power consumption. Omitting the hardware prefetch unit and drastically simplifying the pipeline is how they got a design whose clock they could crank into the stratosphere and still run on battery power. And in the network processor space, they can bolt a myriad of on-chip microengines and still have some prayer of accurately simulating the patterns of internal bus cycles. Errors in simulation can still be fixed up with prefetch instruction placement to put memory accesses from the XScale core into phases where the data path processors aren't working so hard. Moreover, because they're embedded targets and rarely have to run third-party binaries originally compiled for older cores, it didn't really cost them anything to say, "Sorry, this chip's performance is going to suck if your compiler's prefetch insertion isn't properly tuned." The _only_ ...
Yeah this is exactly what we were seeing :) Anton -
This patch needs a lot more documentation. It needs some really big comments on why this should never ever be used for a real filesystem (real as in user mountable), and probably add an assert for that invariant somewhere. Please also update Documentation/filesystems/Locking and Documentation/filesystems/vfs.txt for it. -
Thank you for your feedback Christoph Here is the version I was about to push to -mm, so if nobody complains, I ask Andrew to push it to mm so that it can reach 2.6.22 target [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen) 2) Use this new method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 3) Use this new method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Christoph Hellwig <hch@infradead.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Davide Libenzi <davidel@xmailserver.org> CC: Al Viro <viro@zeniv.linux.org.uk> Documentation/filesystems/Locking | 2 ++ Documentation/filesystems/vfs.txt | 12 +++++++++++- fs/dcache.c | 3 +++ fs/pipe.c | 12 +++++++++--- include/linux/dcache.h | 1 + net/socket.c | 13 ++++++++++--- 6 files changed, 36 insertions(+), 7 deletions(-)
Please don't put braces around the function pointer call. Also I normally this would be: Can you add a helper to init this one? -
Thanks again for your feedback Christoph :) I think I addressed all your remarks. Thank you [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen) 2) Use this new method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 3) Use this new method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Christoph Hellwig <hch@infradead.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Davide Libenzi <davidel@xmailserver.org> CC: Al Viro <viro@zeniv.linux.org.uk> Documentation/filesystems/Locking | 2 ++ Documentation/filesystems/vfs.txt | 12 +++++++++++- fs/dcache.c | 7 +++++++ fs/pipe.c | 15 +++++++++------ include/linux/dcache.h | 1 + net/socket.c | 17 ++++++++++------- 6 files changed, 40 insertions(+), 14 deletions(-)
It's not _some_ filesystems. If real filesystem did this we'd be in horrible trouble. It's "synthetic, non-mountable" filesystem. Except for this the patch looks fine to me, thanks Eric! -
Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Except you should fix the subject line when you send it out to Andrew ;) Linus -
Hi Andrew Could you please put this final version in mm for testing ? Thank's to all contributors. [PATCH] VFS : Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes/sockets) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen); 2) Use this new method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 3) Use this new method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Documentation/filesystems/Locking | 2 ++ Documentation/filesystems/vfs.txt | 12 +++++++++++- fs/dcache.c | 10 ++++++++++ fs/pipe.c | 15 +++++++++------ include/linux/dcache.h | 1 + net/socket.c | 17 ++++++++++------- 6 files changed, 43 insertions(+), 14 deletions(-)
Hi Andrew I am sorry, my previous patch had a /proc/*/fd/ in a comment, so the */ clo= sed=20 the comment and fs/dcache.c could not compile. Could you please put this 'final-final' version in mm for testing ? Thank's to all contributors, sorry for the noise. [PATCH] VFS : Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method calle= d=20 d_dname() might be called from d_path() to build a pathname=20 for special filesystems. It is called without locks. =46uture patches (if we succeed in having one common dentry for all=20 pipes/sockets) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen); 2) Use this new method for sockets : No more sprintf() at socket creation.= =20 This is delayed up to the moment someone does an access to /proc/pid/fd/... 3) Use this new method for pipes : No more sprintf() at pipe creation. This= is=20 delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a= =20 *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> =C2=A0Documentation/filesystems/Locking | =C2=A0 =C2=A02 ++ =C2=A0Documentation/filesystems/vfs.txt | =C2=A0 12 +++++++++++- =C2=A0fs/dcache.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 | =C2=A0 10 ++++++++++ =C2=A0fs/pipe.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 15 +++++++++------ =C2=A0include/linux/dcache.h =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2= =A0 =C2=A01 + =C2=A0net/socket.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0| =C2=A0 17 ++++++++++------- =C2=A06 files changed, 43 ...
Hi Andrew
Please find a new version of this patch : I realized d_path() has very
uncommon semantic (it seems nobody caught the point in previous patches), and
had to change the documentation and pipefs_dname() / sockfs_dname()
accordingly.
Now, readlink("/proc/pid/fd/xx", buffer, 4096) returns the exact number of
bytes, not the whole 4095 bytes :)
Extract of new Documentation:
CAUTION : d_path() logic is quite tricky.
The correct way to return for example "Hello" is to put it
at the end of the buffer, and returns a pointer to the first char.
Example :
static char *somefs_dname(struct dentry *dent, char *buffer, int buflen)
{
char *string = "Hello";
int sz = strlen(string) + 1;
if (sz > buflen)
return ERR_PTR(-ENAMETOOLONG);
buffer += (buflen - sz);
return memcpy(buffer, string, sz);
}
Thank you
[PATCH] VFS : Delay the dentry name generation on sockets and pipes.
1) Introduces a new method in 'struct dentry_operations'. This method called
d_dname() might be called from d_path() to build a pathname
for special filesystems. It is called without locks.
Future patches (if we succeed in having one common dentry for all
pipes/sockets) may need to change prototype of this method, but we now use :
char *d_dname(struct dentry *dentry, char *buffer, int buflen);
2) Use this new method for sockets : No more sprintf() at socket creation.
This is delayed up to the moment someone does an access to /proc/pid/fd/...
3) Use this new method for pipes : No more sprintf() at pipe creation. This is
delayed up to the moment someone does an access to /proc/pid/fd/...
A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :
3.090 s instead of 3.450 s
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
...Yeah, it's subtle, since it wants to use a single buffer, and not copy
things around too much.
But can I ask you to do a take3, and simply have a helper function like
char *dynamic_dname(struct dentry *dentry, char *buffer, int len,
const char *fmt, ...)
{
va_list args;
char temp[64];
int i;
va_start(args, fmt);
i = vsnprintf(tmp,sizeof(tmp),fmt,args) + 1;
va_end(args);
if (i > len)
return ERR_PTR(-ENAMETOOLONG);
buffer += len - i;
memcpy(buffer, tmp, i);
return buffer;
}
and just require that everybody use that function.
Then the pipe code would just become
static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
{
return dynamic_dname(dentry, buffer, buflen, ""pipe:[%lu]", dentry->d_inode->i_ino);
}
and you're done, and you have only *one* place in the VFS layer
(preferably right next to d_path() itself) that cares about the
subtle issues that we have.
Linus
-
Hi Andrew Please find take3 of this patch : Linus suggested to introduce a helper function to factorize work done by most d_dname() implementations. Thank you [PATCH] VFS : Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes/sockets) may need to change prototype of this method, but we now use : char *d_dname(struct dentry *dentry, char *buffer, int buflen); 2) Adds a dynamic_dname() helper function that eases d_dname() implementations 3) Defines d_dname method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 4) Defines d_dname method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a *nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
