On Monday 21 July 2008 19:56, Theodore Tso wrote:I didn't realize it would be so difficult to detect telldir coming from NFSD. NFS is the entire reason for all this posturing, so if the strategy fails for NFS then it isn't worth spending any time on it. I wrote generic btree code for ddsnap with arbitrary depth and merge on delete, but was never able to justify the time to port it to HTree during my day job. Props to clusterfs (Andreas!) for stepping up to do this. No doubt a commercial imperative helped achieve focus there. But I will not use HTree for my own filesystem work, I will use a new thing called PHTree, which is a (P)hysically stable variant of HTree, obtained by inserting a new layer of index blocks between the index nodes and the dirent blocks, the "terminal" index blocks. Each terminal index block is populated with { hash : block } pairs, each of which indicates that there is a dirent in <block> with hash <hash>. This requires one extra lookup per operation which is regretable, but it solves the physical stability problem. Directory entries will just stay in the leaf blocks in which they are first created, and only index nodes will ever be split. Because leaf blocks will typically be full instead of 75% full in HTree, the space usage ends up about the same. A more efficient hash algorithm can be used, and deletion and other operations will tend to be faster due to less thrashing of the inode table. Free space management becomes an issue: when dirents are deleted we need to be able to re-use them for new entries. To support efficient detection of available dirents of appropriate size, I will use a lazy method of recording the maximum sized dirent available in any leaf block. The actual largest free direct may be smaller than that, which will be detected when a search fails, causing the lazy limit to be updated. We can safely skip searching for free space in any block for which the lazy max is less than the size we require. One byte is sufficient for the lazy max, so one 4K block is sufficient to keep track of 2^12 * 2^12 bytes worth of directory blocks, a 16 meg directory with about half a million entries. For larger directories this structure becomes a radix tree, with lazy max recorded also at each index pointer for quick free space searching. Mind you, I haven't coded this yet and it will be a good few months before I do, so I don't have performance numbers. I just thought I should mention it now, because I expect it to meet or beat HTree overall in performance while supporting brain damaged NFS semantics in a much nicer way. Hopefully we will find out for sure before Ext4 indexing becomes cast in stone. Regards, Daniel --
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
| David Chinner | Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md. |
| Andrew Morton | -mm merge plans for 2.6.23 |
| Trent Piepho | Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code |
git: | |
| David Miller | Re: iptables very slow after commit784544739a25c30637397ace5489eeb6e15d7d49 |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
