There are patches original worked by Takashi Sato to support large block size (up to 64k) in ext2/3/4, which addressed the directory issue as well. I just forward ported. Will posted it in a separate thread. Haven't get a chance to integrated with your patch yet (next step). thanks, -
The next 4 patches support large block size (up to PAGESIZE, max 64KB) for ext2/3/4, originally from Takashi Sato. http://marc.info/?l=linux-ext4&m=115768873518400&w=2 It's quite simple to support large block size in ext2/3/4, mostly just enlarge the block size limit. But it is NOT possible to have 64kB blocksize on ext2/3/4 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is to put 2 empty records in such a directory, or to special-case an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 4 patches. [1/4] ext2/3/4: enlarge blocksize - Allow blocksize up to pagesize [2/4] ext2: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize [3/4] ext3: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize [4/4] ext4: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Just rebase to 2.6.23-rc4 and against the ext4 patch queue. Compile tested only. Next steps: Need a e2fsprogs changes to able test this feature. As mkfs needs to be educated not assuming rec_len to be blocksize all the time. Will try it with Christoph Lameter's large block patch next. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- --- fs/ext2/super.c | 2 +- fs/ext3/super.c | 5 ++++- fs/ext4/super.c | 5 +++++ include/linux/ext2_fs.h | 4 ++-- include/linux/ext3_fs.h | 4 ++-- include/linux/ext4_fs.h | 4 ++-- 6 files changed, 16 insertions(+), 8 deletions(-) Index: linux-2.6.23-rc3/fs/ext2/super.c =================================================================== --- linux-2.6.23-rc3.orig/fs/ext2/super.c 2007-08-12 21:25:24.000000000 -0700 +++ ...
Two problems were found when testing largeblock on ext3. Patches to follow. Good news is, with your changes, plus all these extN changes, I am able to run ext2/3/4 with 64k block size, tested on x86 and ppc64 with 4k page size. fsx test runs fine for an hour on ext3 with 16k blocksize on x86:-) Mingming
From clameter:
Teach jbd/jbd2 slab management to support >8k block size. Without this, it refused to mount on >8k ext3.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Index: my2.6/fs/jbd/journal.c
===================================================================
--- my2.6.orig/fs/jbd/journal.c 2007-08-30 18:40:02.000000000 -0700
+++ my2.6/fs/jbd/journal.c 2007-08-31 11:01:18.000000000 -0700
@@ -1627,16 +1627,17 @@ void * __jbd_kmalloc (const char *where,
* jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
* and allocate frozen and commit buffers from these slabs.
*
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the corresponding kmalloc
+ * cache and create an alias. --clameter)
*/
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size) (size >> 11)
+#define JBD_MAX_SLABS 7
+#define JBD_SLAB_INDEX(size) get_order((size) << (PAGE_SHIFT - 10))
static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
static const char *jbd_slab_names[JBD_MAX_SLABS] = {
- "jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
+ "jbd_1k", "jbd_2k", "jbd_4k", "jbd_8k",
+ "jbd_16k", "jbd_32k", "jbd_64k"
};
static void journal_destroy_jbd_slabs(void)
Index: my2.6/fs/jbd2/journal.c
===================================================================
--- my2.6.orig/fs/jbd2/journal.c 2007-08-30 18:40:02.000000000 -0700
+++ my2.6/fs/jbd2/journal.c 2007-08-31 11:04:37.000000000 -0700
@@ -1639,16 +1639,18 @@ void * __jbd2_kmalloc (const char *where
* jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
* and allocate frozen and commit buffers from these slabs.
*
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the ...But the real fix is to kill this code. We can't send down slab pages down the block layer without breaking iscsi or aoe. And this code is only used in so rare cases that all the normal testing won't hit it. Very bad combination. -
We are doing what you describe right now. So the current code is broken? -
How about getting rid of the slabs there and use kmalloc? Kmalloc in mm (and therfore hopefully 2.6.24) will convert kmallocs > PAGE_SIZE to page allocator calls. Not sure what to do about the 1k and 2k requests though.
The problem is that we must never use kmalloc pages, so we always need to request a page or more for these. Better to use get_free_page directly, that's how I fixed it in XFS a while ago.
So you'd be fine with replacing the allocs with get_free_pages(GFP_xxx, get_order(size)) ?
Yes. And rip out all that code related to setting up the slabs. I plan to add WARN_ONs to bio_add_page and friends to detect further usage of slab pages if there is any.
jbd/jbd2: Replace slab allocations with page cache allocations
From: Christoph Lameter <clameter@sgi.com>
JBD should not pass slab pages down to the block layer.
Use page allocator pages instead. This will also prepare
JBD for the large blocksize patchset.
Tested on 2.6.23-rc6 with fsx runs fine.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
---
fs/jbd/checkpoint.c | 2
fs/jbd/commit.c | 6 +-
fs/jbd/journal.c | 107 ++++---------------------------------------------
fs/jbd/transaction.c | 10 ++--
fs/jbd2/checkpoint.c | 2
fs/jbd2/commit.c | 6 +-
fs/jbd2/journal.c | 109 ++++----------------------------------------------
fs/jbd2/transaction.c | 18 ++++----
include/linux/jbd.h | 23 +++++++++-
include/linux/jbd2.h | 28 ++++++++++--
10 files changed, 83 insertions(+), 228 deletions(-)
Index: linux-2.6.23-rc5/fs/jbd/journal.c
===================================================================
--- linux-2.6.23-rc5.orig/fs/jbd/journal.c 2007-09-13 13:37:57.000000000 -0700
+++ linux-2.6.23-rc5/fs/jbd/journal.c 2007-09-13 13:45:39.000000000 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
/*
* Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
jbd_unlock_bh_state(bh_in);
- tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+ tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
- jbd_slab_free(tmp, bh_in->b_size);
+ jbd_free(tmp, bh_in->b_size);
goto repeat;
}
@@ -679,7 +678,7 @@ static journal_t * journal_init_common (
/* Set up a default-sized revoke table for the new mount. */
err = journal_init_revoke(journal, ...Currently memory allocation for committed_data(and frozen_buffer) for bufferhead is done through jbd slab management, as Christoph Hellwig pointed out that this is broken as jbd should not pass slab pages down to IO layer. and suggested to use get_free_pages() directly. The problem with this patch, as Andreas Dilger pointed today in ext4 interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste 1/3-1/2 page space. What was the originally intention to set up slabs for committed_data(and frozen_buffer) in JBD? Why not using kmalloc? -
Looks good. Small suggestion is to get rid of all kmalloc() usages and consistently use jbd_kmalloc() or jbd2_kmalloc(). Thanks, Badari
Here is the incremental small cleanup patch.
Remove kamlloc usages in jbd/jbd2 and consistently use jbd_kmalloc/jbd2_malloc.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
---
fs/jbd/journal.c | 8 +++++---
fs/jbd/revoke.c | 12 ++++++------
fs/jbd2/journal.c | 8 +++++---
fs/jbd2/revoke.c | 12 ++++++------
4 files changed, 22 insertions(+), 18 deletions(-)
Index: linux-2.6.23-rc6/fs/jbd/journal.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-17 14:32:16.000000000 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-17 14:33:59.000000000 -0700
@@ -723,7 +723,8 @@ journal_t * journal_init_dev(struct bloc
journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
- journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+ journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+ GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -777,7 +778,8 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
- journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+ journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+ GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -1157,7 +1159,7 @@ void journal_destroy(journal_t *journal)
iput(journal->j_inode);
if (journal->j_revoke)
journal_destroy_revoke(journal);
- kfree(journal->j_wbuf);
+ jbd_kfree(journal->j_wbuf);
jbd_kfree(journal);
}
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c 2007-09-17 14:32:22.000000000 -0700
+++ ...Shouldn't we kill jbd_kmalloc instead? -
It seems useful to me to keep jbd_kmalloc/jbd_free. They are central places to handle memory (de)allocation(<page size) via kmalloc/kfree, so in the future if we need to change memory allocation in jbd(e.g. not using kmalloc or using different flag), we don't need to touch every place in the jbd code calling jbd_kmalloc. Regards, Mingming -
I disagree. Why would jbd need to globally change the way it allocates memory? It currently uses kmalloc (and jbd_kmalloc) for allocating a variety of structures. Having to change one particular instance won't necessarily mean we want to change all of them. Adding unnecessary wrappers only obfuscates the code making it harder to understand. You wouldn't want every subsystem to have it's own *_kmalloc() that took different arguments. Besides, there aren't that many calls to kmalloc and kfree in the jbd code, so there wouldn't be much pain in changing GFP flags or whatever, if it ever needed to be done. Shaggy -- David Kleikamp IBM Linux Technology Center
Okay, Points taken, Here is the updated patch to get rid of slab management and jbd_kmalloc from jbd totally. This patch is intend to replace the patch in mm tree, Andrew, could you pick up this one instead? Thanks, Mingming jbd/jbd2: JBD memory allocation cleanups From: Christoph Lameter <clameter@sgi.com> JBD: Replace slab allocations with page cache allocations JBD allocate memory for committed_data and frozen_data from slab. However JBD should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/jbd/commit.c | 6 +-- fs/jbd/journal.c | 99 ++------------------------------------------------ fs/jbd/transaction.c | 12 +++--- fs/jbd2/commit.c | 6 +-- fs/jbd2/journal.c | 99 ++------------------------------------------------ fs/jbd2/transaction.c | 18 ++++----- include/linux/jbd.h | 18 +++++---- include/linux/jbd2.h | 21 +++++----- 8 files changed, 52 insertions(+), 227 deletions(-) Index: linux-2.6.23-rc6/fs/jbd/journal.c =================================================================== --- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-18 17:19:01.000000000 -0700 +++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-18 17:51:21.000000000 -0700 @@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit); static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); static void __journal_abort_soft (journal_t *journal, int errno); -static int journal_create_jbd_slab(size_t slab_size); /* * Helper function used to manage commit timeouts @@ -334,10 +333,10 @@ repeat: char *tmp; jbd_unlock_bh_state(bh_in); - tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS); + tmp = jbd_alloc(bh_in->b_size, GFP_NOFS); ...
__GFP_NOFAIL should only be used when we have no way of recovering from failure. The allocation in journal_init_common() (at least) _can_ recover and hence really shouldn't be using __GFP_NOFAIL. (Actually, nothing in the kernel should be using __GFP_NOFAIL. It is there as a marker which says "we really shouldn't be doing this but we don't know how to fix it"). So sometime it'd be good if you could review all the __GFP_NOFAILs in there and see if we can remove some, thanks. -
Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
cases except one handles memory allocation failure so I get rid of those
GFP_NOFAIL flags.
Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
in jbd/jbd2? I will send a separate patch to cleanup that.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
---
fs/jbd/journal.c | 2 +-
fs/jbd/transaction.c | 3 +--
fs/jbd2/journal.c | 2 +-
fs/jbd2/transaction.c | 3 +--
4 files changed, 4 insertions(+), 6 deletions(-)
Index: linux-2.6.23-rc6/fs/jbd/journal.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-19 11:47:58.000000000 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-19 11:48:40.000000000 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
- journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+ journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd/transaction.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd/transaction.c 2007-09-19 11:48:05.000000000 -0700
+++ linux-2.6.23-rc6/fs/jbd/transaction.c 2007-09-19 11:49:10.000000000 -0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
alloc_transaction:
if (!journal->j_running_transaction) {
- new_transaction = kmalloc(sizeof(*new_transaction),
- GFP_NOFS|__GFP_NOFAIL);
+ new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.000000000 -0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c 2007-09-19 11:49:46.000000000 -0700
@@ -654,7 +654,7 @@ static journal_t * ...Convert the GFP_KERNEL flag used in JBD/JBD2 to GFP_NOFS, consistent
with the rest of kmalloc flag used in the JBD/JBD2 layer.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
---
fs/jbd/journal.c | 6 +++---
fs/jbd/revoke.c | 8 ++++----
fs/jbd2/journal.c | 6 +++---
fs/jbd2/revoke.c | 8 ++++----
4 files changed, 14 insertions(+), 14 deletions(-)
Index: linux-2.6.23-rc6/fs/jbd/journal.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-19 11:51:10.000000000 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-19 11:51:57.000000000 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
- journal = kmalloc(sizeof(*journal), GFP_KERNEL);
+ journal = kmalloc(sizeof(*journal), GFP_NOFS);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -723,7 +723,7 @@ journal_t * journal_init_dev(struct bloc
journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
- journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+ journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -777,7 +777,7 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
- journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+ journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===================================================================
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c 2007-09-19 11:51:30.000000000 -0700
+++ ...On Wed, 19 Sep 2007 12:22:09 -0700 These were all OK using GFP_KERNEL. GFP_NOFS should only be used when the caller is holding some fs locks which might cause a deadlock if that caller reentered the fs in ->writepage (and maybe put_inode and such). That isn't the case in any of the above code, which is all mount time stuff (I think). ext3/4 should be using GFP_NOFS when the caller has a transaction open, has a page locked, is holding i_mutex, etc.
Thanks for your feedback. Mingming -
Is there a reason for this change except "it's in a filesystem, so it should be GFP_NOFS"? We are only doing journal setup during mount so there shouldn't be any problem using GFP_KERNEL. I don't think it will inject any defect into the code, but I don't think it is needed either. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
No. GFP_NOFS avoids deadlock. It prevents the allocation from making recursive calls back into the file system that could end up blocking on jbd code. Shaggy -- David Kleikamp IBM Linux Technology Center -
Oh, I see your patch now. You mean use GFP_NOFS instead of -- David Kleikamp IBM Linux Technology Center -
This should probably be a __GFP_NOFAIL if we are trying to start a new handle in truncate, as there is no way to propagate an error to the caller. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Thanks, updated version. Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2, most cases they are not needed. Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/jbd/journal.c | 2 +- fs/jbd2/journal.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.23-rc6/fs/jbd/journal.c =================================================================== --- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-19 11:47:58.000000000 -0700 +++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-19 14:23:45.000000000 -0700 @@ -653,7 +653,7 @@ static journal_t * journal_init_common ( journal_t *journal; int err; - journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); + journal = kmalloc(sizeof(*journal), GFP_KERNEL); if (!journal) goto fail; memset(journal, 0, sizeof(*journal)); Index: linux-2.6.23-rc6/fs/jbd2/journal.c =================================================================== --- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.000000000 -0700 +++ linux-2.6.23-rc6/fs/jbd2/journal.c 2007-09-19 14:23:45.000000000 -0700 @@ -654,7 +654,7 @@ static journal_t * journal_init_common ( journal_t *journal; int err; - journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); + journal = kmalloc(sizeof(*journal), GFP_KERNEL); if (!journal) goto fail; memset(journal, 0, sizeof(*journal));
kmalloc is using slabs :) The intent was to avoid the wasted memory, but as we've repeated a gazillion times wasted memory on a rather rare codepath doesn't really matter when you just crash random storage drivers otherwise.
Thanks Mingming.
The blocks per page could be less or quals to 1 with the large block support in VM.
The patch fixed the way to calculate the number of blocks to reserve in journal in the
case blocksize > pagesize.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Index: my2.6/fs/jbd/journal.c
===================================================================
--- my2.6.orig/fs/jbd/journal.c 2007-08-31 13:27:16.000000000 -0700
+++ my2.6/fs/jbd/journal.c 2007-08-31 13:28:18.000000000 -0700
@@ -1611,7 +1611,12 @@ void journal_ack_err(journal_t *journal)
int journal_blocks_per_page(struct inode *inode)
{
- return 1 << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+ int bits = PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+ if (bits > 0)
+ return 1 << bits;
+ else
+ return 1;
}
/*
Index: my2.6/fs/jbd2/journal.c
===================================================================
--- my2.6.orig/fs/jbd2/journal.c 2007-08-31 13:32:21.000000000 -0700
+++ my2.6/fs/jbd2/journal.c 2007-08-31 13:32:30.000000000 -0700
@@ -1612,7 +1612,12 @@ void jbd2_journal_ack_err(journal_t *jou
int jbd2_journal_blocks_per_page(struct inode *inode)
{
- return 1 << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+ int bits = PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+ if (bits > 0)
+ return 1 << bits;
+ else
+ return 1;
}
/*
Ahh. Good. I could add the path to the large blocksize patchset?
[2/4] ext2: fix rec_len overflow
- prevent rec_len from overflow with 64KB blocksize
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
---
fs/ext2/dir.c | 46 ++++++++++++++++++++++++++++++++++++----------
include/linux/ext2_fs.h | 13 +++++++++++++
2 files changed, 49 insertions(+), 10 deletions(-)
Index: linux-2.6.23-rc3/fs/ext2/dir.c
===================================================================
--- linux-2.6.23-rc3.orig/fs/ext2/dir.c 2007-08-12 21:25:24.000000000 -0700
+++ linux-2.6.23-rc3/fs/ext2/dir.c 2007-08-29 15:29:51.000000000 -0700
@@ -94,9 +94,9 @@ static void ext2_check_page(struct page
goto out;
}
for (offs = 0; offs <= limit - EXT2_DIR_REC_LEN(1); offs += rec_len) {
+ offs = EXT2_DIR_ADJUST_TAIL_OFFS(offs, chunk_size);
p = (ext2_dirent *)(kaddr + offs);
rec_len = le16_to_cpu(p->rec_len);
-
if (rec_len < EXT2_DIR_REC_LEN(1))
goto Eshort;
if (rec_len & 3)
@@ -108,6 +108,7 @@ static void ext2_check_page(struct page
if (le32_to_cpu(p->inode) > max_inumber)
goto Einumber;
}
+ offs = EXT2_DIR_ADJUST_TAIL_OFFS(offs, chunk_size);
if (offs != limit)
goto Eend;
out:
@@ -283,6 +284,7 @@ ext2_readdir (struct file * filp, void *
de = (ext2_dirent *)(kaddr+offset);
limit = kaddr + ext2_last_byte(inode, n) - EXT2_DIR_REC_LEN(1);
for ( ;(char*)de <= limit; de = ext2_next_entry(de)) {
+ de = EXT2_DIR_ADJUST_TAIL_ADDR(kaddr, de, sb->s_blocksize);
if (de->rec_len == 0) {
ext2_error(sb, __FUNCTION__,
"zero-length directory entry");
@@ -305,8 +307,10 @@ ext2_readdir (struct file * filp, void *
return 0;
}
}
+ filp->f_pos = EXT2_DIR_ADJUST_TAIL_OFFS(filp->f_pos, sb->s_blocksize);
filp->f_pos += le16_to_cpu(de->rec_len);
}
+ filp->f_pos = EXT2_DIR_ADJUST_TAIL_OFFS(filp->f_pos, sb->s_blocksize);
ext2_put_page(page);
}
return 0;
@@ -343,13 +347,14 @@ struct ext2_dir_entry_2 * ...