Re: [RFC] readdir mess

From: Linus Torvalds
To: Al Viro <viro@...>
Cc: OGAWA Hirofumi <hirofumi@...>, <linux-fsdevel@...>, <linux-kernel@...>
Date: Tuesday, August 12, 2008 - 8:28 pm

On Wed, 13 Aug 2008, Al Viro wrote:

The really sad part is that readdir() really is also the thing that should 
make us change locking. That i_mutex thing is fine and dandy for 
everything else, but for readdir() we really would be much better off with 
a rwsem - and only take it for reading.

Right now, readdir() is one of the most serialized parts of the whole 
kernel. Sad. And while it's a per-directory lock, there are directories 
that get much more reading than others, and this has been a small 
scalability issue (for samba and apache) for years.


The thing is, generic_file_llseek() takes i_mutex, exactly because of 
issues like this. Of course, you have to ask for it (the _default_ llseek 
does not do it), and you're right that 9p does not.
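
The "you have to ask for it" part can be pictured with a small user-space 
sketch. Everything named toy_* is hypothetical; a pthread mutex stands in 
for i_mutex and a plain long for f_pos. The unlocked variant is a 
simplification and does not model whatever other locking the real default 
path does. The point is only that a filesystem picks, through its operations 
table, whether seeking serializes against the readdir-like path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct toy_file;

/* Hypothetical operations table; the filesystem chooses the llseek. */
struct toy_file_ops {
	long (*llseek)(struct toy_file *f, long offset);
};

struct toy_file {
	pthread_mutex_t *dir_lock;	/* stands in for the directory's i_mutex */
	long pos;			/* stands in for f_pos */
	const struct toy_file_ops *ops;
};

/* What you get if you do not ask: position is updated without dir_lock. */
long toy_default_llseek(struct toy_file *f, long offset)
{
	f->pos = offset;
	return f->pos;
}

/* Opt-in variant: same update, but under the lock the reader holds. */
long toy_locked_llseek(struct toy_file *f, long offset)
{
	long ret;

	assert(f->dir_lock != NULL);
	pthread_mutex_lock(f->dir_lock);
	f->pos = offset;
	ret = f->pos;
	pthread_mutex_unlock(f->dir_lock);
	return ret;
}

/*
 * readdir-like step: reads and advances the position under dir_lock.
 * A seek through toy_locked_llseek cannot land in the middle of this;
 * one through toy_default_llseek can.
 */
long toy_readdir_step(struct toy_file *f, long nentries)
{
	long ret;

	assert(f->dir_lock != NULL);
	pthread_mutex_lock(f->dir_lock);
	if (f->pos < nentries)
		f->pos++;
	ret = f->pos;
	pthread_mutex_unlock(f->dir_lock);
	return ret;
}

/* Two hypothetical directory operation tables a filesystem could use. */
const struct toy_file_ops toy_locked_dir_ops   = { .llseek = toy_locked_llseek };
const struct toy_file_ops toy_unlocked_dir_ops = { .llseek = toy_default_llseek };
```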

Strangely enough, at least 9p _does_ use it for regular files. I'm not 
sure how come it decided to do that, but whatever.


The reason ext2 is ok is that you long long ago fixed it to use the page 
cache. That got rid of a _lot_ of the crap, and made all the IO look like 
regular files (including read-ahead etc). Ext2 _used_ to be the same crap 
that ext3 is.

I so wish that ext3 could do the same thing, but no. I still think it 
should be possible, but the whole journalling is designed for buffer 
heads.


I don't dispute at all that the readdir() thing is one of the weakest 
points of the whole VFS layer. And part of it is that there is no good 
caching helper for it at the VFS level, so we always end up having to do 
everything at the low-level filesystem level, and that invariably ends up 
being sh*t compared to the shared VFS routines.

I'm convinced that the reason we do well on most other filesystem accesses 
is exactly the fact that a filesystem basically has to be crazy to try to 
do its own version, and in many cases cannot really do it at all (e.g. you 
can't really even avoid using the dcache or the page cache and actually 
get any valid semantics).

But readdir() is the _one_ operation where the low-level filesystem still 
basically does it all itself. Which is why we can't fix locking, and why 
even simple changes are hard because it's not just complex code, it's 
complex code in 50+ filesystems with almost zero shared code!
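
To make the "shared code" point concrete, here is a sketch of what a common 
helper could look like in miniature. This is not an existing VFS interface; 
all toy_* names and both callback types are made up for illustration. The 
filesystem only answers "what is entry number idx?", and the shared helper 
owns the loop and the position handling, so it is also the one place where 
a locking change would have to be made.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-filesystem hook: store the name of entry idx in *name
 * and return 0, or return -1 if idx is past the end of the directory.
 */
typedef int (*toy_get_entry_t)(void *fs_private, size_t idx,
			       const char **name);

/* Hypothetical consumer hook: nonzero return means "no more room, stop". */
typedef int (*toy_fill_t)(void *ctx, const char *name);

/*
 * Shared helper: every filesystem would go through this one function.
 * Returns the position to resume from on the next call.
 */
size_t toy_generic_readdir(toy_get_entry_t get_entry, void *fs_private,
			   size_t pos, toy_fill_t fill, void *ctx)
{
	const char *name;

	assert(get_entry != NULL && fill != NULL);

	/* The directory lock would be taken here, once, for everyone... */
	while (get_entry(fs_private, pos, &name) == 0) {
		if (fill(ctx, name) != 0)
			break;	/* entry not consumed: do not advance */
		pos++;
	}
	/* ...and dropped here. */
	return pos;
}

/* Example "filesystem": a NULL-terminated array of names. */
int toy_array_get_entry(void *fs_private, size_t idx, const char **name)
{
	const char **names = fs_private;
	size_t i;

	/* Walk up to idx so we never index past the terminator. */
	for (i = 0; i < idx; i++)
		if (names[i] == NULL)
			return -1;
	if (names[idx] == NULL)
		return -1;
	*name = names[idx];
	return 0;
}

/* Example consumer: accepts entries while the size_t at ctx is nonzero. */
int toy_limited_fill(void *ctx, const char *name)
{
	size_t *room = ctx;

	(void)name;
	if (*room == 0)
		return 1;
	(*room)--;
	return 0;
}
```

With this shape, a second "filesystem" is just another toy_get_entry_t 
implementation, and none of the loop or resume logic gets duplicated.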

			Linus
