Re: [PATCH] ext4: dir inode reservation V3

To: Mingming Cao <cmm@...>
Cc: <coyli@...>, <linux-ext4@...>, <linux-kernel@...>
Date: Tuesday, November 20, 2007 - 10:09 pm

On Nov 20, 2007  12:22 -0800, Mingming Cao wrote:

My original thoughts on the design for this were slightly different:
- that the per-directory reserved window would scale with the size of
  the directory, so that even (or especially) with htree directories the 
  inodes would be kept in hash-ordered sets to speed up stat/unlink
- it would be possible/desirable to re-use the existing block bitmap
  reservation code to handle inode bitmap reservation for directories
  while those directories are in-core.  We already have the mechanisms
  for this, "all" that would need to change is to have the reservation code
  point at the inode bitmaps, but I don't know how easy that is
- after an unmount/remount it would be desirable to re-use the same blocks
  for getting good hash->inode mappings, wherein lies the problem of
  compatibility

One possible solution for the restart problem is to search the directory
leaf block in which an inode is being added for the inode numbers and try
to use those as a goal for the inode allocation.  This has a minor problem
with ordering, because usually the inode is allocated before the dirent
is created, but that isn't impossible to fix (e.g. find the dirent slot
first, keep a pointer to it, check for inode goals, and then fill in the
dirent inum after allocating the inode).
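
A rough userspace sketch of that ordering, not actual ext4 code.  The
leaf block is modelled as a plain array of inode numbers (0 means a free
slot) and the inode bitmap as a byte array; leaf_inode_goal(),
alloc_inode_near() and create_in_leaf() are all made-up names for
illustration, and "largest inum in the leaf plus one" is only one
possible goal heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical goal heuristic: aim just past the largest inode number
 * already referenced from this leaf.  Returns 0 if the leaf is empty,
 * meaning "no goal, use the default policy". */
static uint32_t leaf_inode_goal(const uint32_t *slots, size_t nslots)
{
	uint32_t max = 0;

	for (size_t i = 0; i < nslots; i++)
		if (slots[i] > max)
			max = slots[i];
	return max ? max + 1 : 0;
}

/* Toy allocator: first free inode at or after the goal, wrapping once.
 * Inode numbers start at 1; bit (ino - 1) in the bitmap marks it used. */
static uint32_t alloc_inode_near(uint8_t *bitmap, uint32_t ninodes,
				 uint32_t goal)
{
	uint32_t start = (goal >= 1 && goal <= ninodes) ? goal - 1 : 0;

	assert(ninodes > 0);
	for (uint32_t n = 0; n < ninodes; n++) {
		uint32_t bit = (start + n) % ninodes;

		if (!(bitmap[bit / 8] & (1u << (bit % 8)))) {
			bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
			return bit + 1;
		}
	}
	return 0;	/* no free inode */
}

/* Find the dirent slot first, keep a pointer to it, derive the goal
 * from the leaf, allocate the inode, then fill in the slot. */
static uint32_t create_in_leaf(uint32_t *slots, size_t nslots,
			       uint8_t *bitmap, uint32_t ninodes)
{
	uint32_t *slot = NULL;
	uint32_t ino;

	for (size_t i = 0; i < nslots; i++) {
		if (slots[i] == 0) {
			slot = &slots[i];
			break;
		}
	}
	if (!slot)
		return 0;	/* leaf full; real code would split it */

	ino = alloc_inode_near(bitmap, ninodes,
			       leaf_inode_goal(slots, nslots));
	if (ino)
		*slot = ino;
	return ino;
}
```

With a leaf that already references inodes 10 and 11, the next two
creates in this toy model land on 12 and 13, so the leaf's inodes stay
clustered even though no reservation state survived a remount.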


One likely reason that the create dirs step is slower is that this is
doing a lot more IO than in the past.  Only a single inode in each
inode table block is being used, so that means that a lot of empty
bytes are being read and written (maybe 16x as much data in this case).
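
To make the arithmetic explicit: the 16x guess would correspond to, for
example, 4096-byte blocks and 256-byte inodes, which is my assumption
here and not something measured.  The helper names below are made up
for the illustration.

```c
#include <assert.h>

/* Number of inode slots in one inode table block. */
static unsigned inodes_per_block(unsigned block_size, unsigned inode_size)
{
	assert(inode_size > 0 && block_size % inode_size == 0);
	return block_size / inode_size;
}

/* Rough I/O amplification: inode slots transferred per inode actually
 * in use in that block.  Integer ratio, rounds down. */
static unsigned io_amplification(unsigned block_size, unsigned inode_size,
				 unsigned used_per_block)
{
	assert(used_per_block > 0);
	return inodes_per_block(block_size, inode_size) / used_per_block;
}
```

Under those assumed sizes, io_amplification(4096, 256, 1) gives 16 for
the one-inode-per-block case, and io_amplification(4096, 256, 16) gives
1 when the block is fully packed.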

Also, in what order are you creating files in the directories?  If you
are creating them in directory order like:

	for ((f = 0; f < 15; f++)); do
		for ((i = 0; i < 50000; i++)); do
			touch dir$i/f$f
		done
	done

then it is completely unsurprising that directory reservation is faster
at file create/unlink because those inodes are now contiguous at the
expense of having gaps in the inode sequence.  Creating 15 files per
directory is of course the optimum test case also.

How does this patch behave with benchmarks like dbench, mongo, postmark?


Note that mballoc already creates an in-memory struct for each group.
I think the initialization of this should be moved outside of mballoc
so that it can be used for other purposes as you propose.

Eric had a benchmark where creating many files/subdirs would cause
a huge slowdown because of bitmap searching, and having a per-group
pointer to the first free inode (or to the last allocated inode, which
might be less work to track) would speed this up a lot.
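
A minimal sketch of the "last allocated inode" variant, as a userspace
toy and not the mballoc group structure.  struct group_info here, its
next_hint field, INODES_PER_GROUP and group_alloc_inode() are all
hypothetical names, and the probe counter exists only to show how much
bitmap searching the hint saves.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_GROUP 64u	/* toy value for the example */

/* Hypothetical per-group in-memory state. */
struct group_info {
	uint8_t bitmap[INODES_PER_GROUP / 8];
	uint32_t next_hint;	/* bit just past the last allocation */
};

/* Allocate a free inode bit in the group, starting the search at the
 * hint instead of bit 0 and wrapping around once.  Returns the bit
 * index, or -1 if the group is full.  *probes counts bits examined. */
static int group_alloc_inode(struct group_info *g, uint32_t *probes)
{
	for (uint32_t n = 0; n < INODES_PER_GROUP; n++) {
		uint32_t bit = (g->next_hint + n) % INODES_PER_GROUP;

		(*probes)++;
		if (!(g->bitmap[bit / 8] & (1u << (bit % 8)))) {
			g->bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
			g->next_hint = (bit + 1) % INODES_PER_GROUP;
			return (int)bit;
		}
	}
	return -1;
}
```

Filling an empty group this way examines one bit per allocation, where
a scan from bit 0 each time would examine 1 + 2 + ... + 64 bits in
total.  The tradeoff is that inodes freed behind the hint are only found
again after the search wraps, which is why tracking the true first free
inode is more work.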

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
