On Nov 20, 2007 12:22 -0800, Mingming Cao wrote:

My original thoughts on the design for this were slightly different:
- that the per-directory reserved window would scale with the size of the
  directory, so that even (or especially) with htree directories the inodes
  would be kept in hash-ordered sets to speed up stat/unlink
- it would be possible/desirable to re-use the existing block bitmap
  reservation code to handle inode bitmap reservation for directories while
  those directories are in-core. We already have the mechanisms for this;
  "all" that would need to change is to have the reservation code point at
  the inode bitmaps, but I don't know how easy that is
- after an unmount/remount it would be desirable to re-use the same blocks
  for getting good hash->inode mappings, wherein lies the problem of
  compatibility

One possible solution for the restart problem is searching the directory
leaf block in which an inode is being added for the inode numbers, and trying
to use those as a goal for the inode allocation... This has a minor problem
with ordering, because usually the inode is allocated before the dirent is
created, but it isn't impossible to fix (e.g. find the dirent slot first, keep
a pointer to it, check for inode goals, and then fill in the dirent inum after
allocating the inode).

One likely reason that the create dirs step is slower is that this is doing
a lot more IO than in the past. Only a single inode in each inode table
block is being used, so a lot of empty bytes are being read and written
(maybe 16x as much data in this case).

Also, in what order are you creating files in the directories? If you are
creating them in directory order like:

    for (f = 0; f < 15; f++)
        for (i = 0; i < 50000; i++)
            touch dir$i/f$f

then it is completely unsurprising that directory reservation is faster at
file create/unlink, because those inodes are now contiguous at the expense of
having gaps in the inode sequence. Creating 15 files per directory is of
course the optimum test case also.

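To make the leaf-block goal idea above concrete, here is a minimal userspace
sketch. It is not ext4 code: the function names, the flat array of inode
numbers standing in for the dirents of one leaf block, and the simple
"bit set = inode in use" bitmap are all assumptions made for illustration.
It takes the highest inode number already referenced in the leaf block as
the goal and then searches the bitmap starting just after it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive an allocation goal from the inode numbers
 * already referenced by the dirents of one directory leaf block.
 * An entry of 0 is treated as an unused dirent slot.
 * Returns 0 when the leaf block gives no hint.
 */
static uint32_t pick_inode_goal(const uint32_t *leaf_inums, size_t n)
{
    uint32_t goal = 0;

    for (size_t i = 0; i < n; i++)
        if (leaf_inums[i] > goal)
            goal = leaf_inums[i];
    return goal;
}

/* Bit (ino - 1) of the bitmap is set when inode number ino is in use. */
static int bit_is_set(const uint8_t *bitmap, uint32_t bit)
{
    return (bitmap[bit >> 3] >> (bit & 7)) & 1;
}

/*
 * Hypothetical allocator: find the first free inode after the goal,
 * wrapping around once, and mark it in use.
 * Returns the 1-based inode number, or 0 if the bitmap is full.
 */
static uint32_t alloc_near_goal(uint8_t *bitmap, uint32_t ninodes,
                                uint32_t goal)
{
    /* Inode goal + 1 lives at bit index goal; no goal means start at 0. */
    uint32_t start = (goal >= ninodes) ? 0 : goal;

    for (uint32_t i = 0; i < ninodes; i++) {
        uint32_t bit = (start + i) % ninodes;

        if (!bit_is_set(bitmap, bit)) {
            bitmap[bit >> 3] |= (uint8_t)(1u << (bit & 7));
            return bit + 1;
        }
    }
    return 0;
}
```

In a real implementation the goal would be computed after the dirent slot
has been found but before the inode is allocated, which is the reordering
described above.
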
How does this patch behave with benchmarks like dbench, mongo, postmark?

Note that mballoc already creates an in-memory struct for each group. I
think the initialization of this should be moved outside of mballoc so that
it can be used for other purposes as you propose. Eric had a benchmark
where creating many files/subdirs would cause a huge slowdown because of
bitmap searching, and having a per-group pointer to the first free inode
(or the last allocated inode, which might be less work to track) would speed
this up a lot.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
