On Fri, Nov 16, 2007 at 04:25:38PM -0800, Abhishek Rai wrote:

Well, I also suggested that if the metacluster region is full, the
allocator should attempt to find a block starting at the end of the
metacluster region and then wrap around, instead of starting at the
beginning of the block group. That way it's more likely that
subsequent metadata blocks are nearer to the previous metadata blocks.

The practice of starting the search at the next block in the metadata
area only makes a difference for one indirect block, yes, but it's the
right thing to do. And if you fold ext3_new_blocks() and
ext3_new_indirect_blocks() together, it's really not that hard. You
can basically do something like this:

	/* One hex digit per region, tried from the low digit upwards. */
	if (alloc_for_metadata)
		strategy = 0x132;
	else
		strategy = 0x231;

	for (; strategy; strategy = strategy >> 4) {
		switch (strategy & 0xF) {
		case 1:
			start = block_group_start;
			end = mc_start - 1;
			break;
		case 2:
			start = mc_start;
			end = mc_end;
			break;
		case 3:
			start = mc_end + 1;
			end = block_group_end;
			break;
		}
		/* search the region between start and end */
	}

Allocating a superblock field is no big deal.

I'll note further that metaclustering is not necessarily mutually
exclusive with ext4 extents. Allocating the extent tree data blocks
out of the metacluster blocks can be a good idea, depending on the
average size of the blocks and how fragmented the filesystem gets (and
hence how many contiguous extents can be expected). If the filesystem
is storing lots of really big files where being contiguous across
multiple block groups is productive, then the metacluster area would
actually be counterproductive. And if files are all small so the
extents fit in the inode, the metadata cluster area wouldn't be
necessary at all. But if there are multiple external extent blocks in
a block group, it would be useful for them to be allocated together.

Yes, it doesn't make sense to retune the filesystem. I was assuming
that this would only be done at mke2fs time. I'm not sure I understand
your concern.
The reality is that 99% of the time users will never change it from
the defaults, but making it tunable makes it much, much easier for us
to try various experiments to determine the best initial value for
different workloads. What might get used for a Usenet news spool or a
Squid cache might be quite different from a series of DVD image files.

That is clever.

Oh, one other thing. You didn't mention what happened when the
metacluster field was placed at the end of the block group. I assume
you tried that in your experiments; what were the results?

The obvious thing to do to avoid further fragmentation of the block
group would be to put level 1 at the end of the block group, level 2
just before it, and level 3 before that, and then allocate the data
blocks starting at the beginning of the block group, i.e.:

+----------------------------------+---------------+---------+-------+
|               data               |    level 3    | level 2 | lvl 1 |
+----------------------------------+---------------+---------+-------+

Ideally, true, but this was a defect with the original metacluster
scheme as well. We could steal some bits in the block_group descriptor
structure to indicate whether a particular level is full. That would
be another data format change requiring e2fsprogs support, though.

Regards,

						- Ted
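
For anyone who wants to play with the search-order idea from the
message above outside the kernel, here is a minimal standalone sketch.
It is not ext3 code: struct group_layout, struct region and
search_order() are hypothetical names made up for illustration, and
region ends are assumed to be inclusive. Each hex digit of the
strategy word names one region and the digits are consumed from the
low end, so the sketch shifts by four bits per iteration. Empty
regions (for example when the metacluster sits at the very end of the
group) are skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one block group; names are illustrative only. */
struct group_layout {
	unsigned long group_start;	/* first block of the block group */
	unsigned long mc_start;		/* first block of the metacluster region */
	unsigned long mc_end;		/* last block of the metacluster region */
	unsigned long group_end;	/* last block of the block group */
};

/* A range of blocks to search; both ends are inclusive. */
struct region {
	unsigned long start;
	unsigned long end;
};

/*
 * Fill out[] with the regions to search, in the order they should be
 * tried, and return how many were written (at most 3).
 *
 * Region codes: 1 = before the metacluster, 2 = the metacluster,
 * 3 = after the metacluster. Metadata tries 2, 3, 1; data tries 1, 3, 2.
 */
static size_t search_order(const struct group_layout *g,
			   int alloc_for_metadata, struct region out[3])
{
	unsigned int strategy = alloc_for_metadata ? 0x132 : 0x231;
	size_t n = 0;

	/* Sanity-check the assumed layout. */
	assert(g->group_start <= g->mc_start);
	assert(g->mc_start <= g->mc_end);
	assert(g->mc_end <= g->group_end);

	for (; strategy; strategy >>= 4) {
		struct region r;

		switch (strategy & 0xF) {
		case 1:
			if (g->mc_start == g->group_start)
				continue;	/* nothing before the metacluster */
			r.start = g->group_start;
			r.end = g->mc_start - 1;
			break;
		case 2:
			r.start = g->mc_start;
			r.end = g->mc_end;
			break;
		case 3:
			if (g->mc_end == g->group_end)
				continue;	/* nothing after the metacluster */
			r.start = g->mc_end + 1;
			r.end = g->group_end;
			break;
		default:
			continue;		/* unknown region code */
		}
		out[n++] = r;
	}
	return n;
}
```

A caller would walk out[0] through out[n - 1] and stop at the first
region that yields a free block. Keeping the order in a single word
means a different policy is only a different constant.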
