[PATCH] Ext4: Uninitialized Block Groups

Previous thread: Bogus high interrupt load? by Aggelos Manousarides on Tuesday, September 18, 2007 - 4:56 pm. (2 messages)

Next thread: [PATCH 1/2] UML - Fix registers.c build by Jeff Dike on Tuesday, September 18, 2007 - 6:31 pm. (1 message)
From: Avantika Mathur
Date: Tuesday, September 18, 2007 - 5:25 pm

from: Andreas Dilger <adilger@clusterfs.com>

In pass 1 of e2fsck, every inode table in the filesystem is scanned and checked, 
regardless of whether it is in use.  This is the most time-consuming part 
of the filesystem check.  The uninitialized block group feature can greatly 
reduce e2fsck time by skipping the check of uninitialized inodes.  

With this feature, there is a high-water mark of used inodes for each block 
group.  Block and inode bitmaps can be left uninitialized on disk, as indicated
by a flag in the group descriptor, so that e2fsck does not need to read or scan
them.  A checksum of each group descriptor is used to ensure that corruption in
the descriptor's flag bits does not cause incorrect operation.
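To make the checksum idea concrete, here is a minimal sketch of a per-descriptor CRC16.  It assumes the checksum covers the filesystem UUID, the group number, and every byte of the descriptor except the checksum field itself; that is our reading of how ext4 does it, not a quote of the patch.  The struct gd_sketch layout is hypothetical and simplified (host byte order, no real on-disk offsets), and crc16_bitwise() is a bit-at-a-time equivalent of the table-driven CRC-16 (polynomial 0x8005, reflected) whose table appears in the super.c hunk of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-16, reflected form (0xA001) of polynomial 0x8005.
 * Equivalent to the table-driven crc16(): one byte i fed with crc = 0
 * yields crc16_table[i] (e.g. byte 0x01 -> 0xC0C1). */
static uint16_t crc16_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Hypothetical, simplified group descriptor (20 bytes, no padding). */
struct gd_sketch {
    uint32_t bg_block_bitmap;
    uint32_t bg_inode_bitmap;
    uint32_t bg_inode_table;
    uint16_t bg_flags;
    uint16_t bg_itable_unused;
    uint16_t bg_checksum;
    uint16_t bg_pad;
};

/* Checksum over UUID, little-endian group number, and the descriptor
 * bytes before and after bg_checksum (assumed inputs, see lead-in). */
static uint16_t gd_checksum(const uint8_t uuid[16], uint32_t group,
                            const struct gd_sketch *gd)
{
    const uint8_t *raw = (const uint8_t *)gd;
    const size_t off = offsetof(struct gd_sketch, bg_checksum);
    const size_t after = off + sizeof(gd->bg_checksum);
    uint8_t le_group[4] = {
        (uint8_t)(group & 0xff),         (uint8_t)((group >> 8) & 0xff),
        (uint8_t)((group >> 16) & 0xff), (uint8_t)((group >> 24) & 0xff),
    };
    uint16_t crc = crc16_bitwise(0xFFFF, uuid, 16);

    crc = crc16_bitwise(crc, le_group, sizeof(le_group));
    crc = crc16_bitwise(crc, raw, off);                         /* before bg_checksum */
    crc = crc16_bitwise(crc, raw + after, sizeof(*gd) - after); /* after bg_checksum */
    return crc;
}

/* A descriptor whose stored checksum disagrees should not be trusted,
 * so its UNINIT flags must not be used to skip any checking. */
static int gd_checksum_ok(const uint8_t uuid[16], uint32_t group,
                          const struct gd_sketch *gd)
{
    return gd_checksum(uuid, group, gd) == gd->bg_checksum;
}
```

Because a CRC detects every single-bit error, flipping any one flag bit in a checksummed descriptor makes gd_checksum_ok() fail, which is exactly the protection the paragraph above describes.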

The feature is enabled through a mkfs option:

	mke2fs /dev/ -O uninit_groups

A patch adding support for uninitialized block groups to e2fsprogs tools has 
been posted to the linux-ext4 mailing list.

The patches have been stress tested with fsstress and fsx.  In performance 
tests measuring e2fsck time, we have seen that e2fsck time on ext3 grows 
linearly with the total number of inodes in the filesystem.  In ext4 with the 
uninitialized block groups feature, e2fsck time no longer depends on the total 
inode count; it is determined by the number of used inodes.  
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users, with improvements of 2-20 times 
depending on how full the filesystem is.

The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.  

In each group descriptor, the following flags may be set:

EXT4_BG_INODE_UNINIT set in bg_flags:
        Inode table is not initialized/used in this group. So we can skip
        the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
        No block in the group is used. So we can skip the block bitmap
        verification for this group.

We also add two new fields ...
From: Andrew Morton
Date: Tuesday, September 18, 2007 - 8:03 pm

That's rather sad.  A plain old "depends on" would be better.
-

From: Andreas Dilger
Date: Tuesday, September 18, 2007 - 11:30 pm

My bad.  We wrote this patch and it had to run on older kernels that might
not even have lib/crc16.c (it was added around 2.6.15 or so, so e.g. RHEL4
doesn't have it).  I forgot to remove it from the upstream submission,
and since it didn't cause problems nobody else complained...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-

From: Avantika Mathur
Date: Wednesday, September 19, 2007 - 3:54 pm

The incremental patch below removes the local crc16 code, and also resolves an
issue with properly updating bg_itable_unused when an inode is allocated in an 
unused block group.
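The bg_itable_unused update in the ialloc.c hunk can be sketched outside the kernel like this.  struct gd_alloc_view and note_inode_allocated() are hypothetical names; the real code works on little-endian fields and only runs when the GDT_CSUM feature is enabled.  With the previous value (free = EXT4_INODES_PER_GROUP(sb)), the test ino > free could never be true, so bg_itable_unused was left unchanged after the first allocation in an uninitialized group; starting from 0 moves the high-water mark up to the newly allocated inode.

```c
#include <assert.h>
#include <stdint.h>

#define EXT4_BG_INODE_UNINIT 0x0001

/* Hypothetical subset of a group descriptor, host byte order. */
struct gd_alloc_view {
    uint16_t bg_flags;
    uint16_t bg_itable_unused;
};

/* ino is the 1-based inode number relative to the start of the group. */
static void note_inode_allocated(struct gd_alloc_view *gd, uint32_t ino,
                                 uint32_t inodes_per_group)
{
    uint32_t used_hwm;   /* last relative inode inside the "used" part of the table */

    assert(ino >= 1 && ino <= inodes_per_group);
    if (gd->bg_flags & EXT4_BG_INODE_UNINIT) {
        gd->bg_flags &= (uint16_t)~EXT4_BG_INODE_UNINIT;
        used_hwm = 0;    /* the fix: no inode in this group was in use yet */
    } else {
        used_hwm = inodes_per_group - gd->bg_itable_unused;
    }
    /* Past the high-water mark: shrink the unused tail of the inode table. */
    if (ino > used_hwm)
        gd->bg_itable_unused = (uint16_t)(inodes_per_group - ino);
}
```

For example, allocating relative inode 5 in a fresh group of 8192 inodes clears the flag and leaves bg_itable_unused at 8187; a later allocation of inode 3 leaves it unchanged, while inode 100 lowers it to 8092.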

Thanks
Avantika

---
 fs/Kconfig       |    1 +
 fs/ext4/ialloc.c |    8 +++++++-
 fs/ext4/super.c  |   51 +--------------------------------------------------
 3 files changed, 9 insertions(+), 51 deletions(-)

Index: linux-2.6.23-rc6/fs/ext4/ialloc.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/ext4/ialloc.c	2007-09-19 15:38:21.000000000 -0700
+++ linux-2.6.23-rc6/fs/ext4/ialloc.c	2007-09-19 15:41:11.000000000 -0700
@@ -635,12 +635,18 @@
 	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
 		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-			free = EXT4_INODES_PER_GROUP(sb);
+			free = 0;
 		} else {
 			free = EXT4_INODES_PER_GROUP(sb) -
 				le16_to_cpu(gdp->bg_itable_unused);
 		}
 
+		/*
+		 * Check the relative inode number against the last used
+		 * relative inode number in this group. If it is greater,
+		 * we need to update the bg_itable_unused count.
+		 *
+		 */
 		if (ino > free)
 			gdp->bg_itable_unused =
 				cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
Index: linux-2.6.23-rc6/fs/ext4/super.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/ext4/super.c	2007-09-19 15:38:21.000000000 -0700
+++ linux-2.6.23-rc6/fs/ext4/super.c	2007-09-19 15:38:51.000000000 -0700
@@ -37,6 +37,7 @@
 #include <linux/quotaops.h>
 #include <linux/seq_file.h>
 #include <linux/log2.h>
+#include <linux/crc16.h>
 
 #include <asm/uaccess.h>
 
@@ -1248,56 +1249,6 @@
 	return res;
 }
 
-#if !defined(CONFIG_CRC16)
-/** CRC table for the CRC-16. The poly is 0x8005 (x16 + x15 + x2 + 1) */
-__u16 const crc16_table[256] = {
-	0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
-	0xC601, 0x06C0, ...
From: Andrew Morton
Date: Tuesday, September 18, 2007 - 8:05 pm

And if we really, really have to do this, then the ext4-private crc16() 
should have static scope.
-

From: Andrew Morton
Date: Thursday, September 20, 2007 - 4:22 pm

On Tue, 18 Sep 2007 17:25:31 -0700

This needed a few fixups due to conflicts with
ext2-ext3-ext4-add-block-bitmap-validation.patch but they were pretty
straightforward.  Please check that the result is OK.


-