This fixes a bug where readdir() would return a directory entry twice if there was a hash collision in an hash tree indexed directory. Signed-off-by: Eugene Dashevsky <eugene@ibrix.com> Signed-off-by: Mike Snitzer <msnitzer@ibrix.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org --- fs/ext3/dir.c | 22 ++++++++++++++++------ 1 files changed, 16 insertions(+), 6 deletions(-) diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c index 2eea96e..42c5391 100644 --- a/fs/ext3/dir.c +++ b/fs/ext3/dir.c @@ -410,7 +410,7 @@ static int call_filldir(struct file * filp, void * dirent, get_dtype(sb, fname->file_type)); if (error) { filp->f_pos = curr_pos; - info->extra_fname = fname->next; + info->extra_fname = fname; return error; } fname = fname->next; @@ -449,11 +449,21 @@ static int ext3_dx_readdir(struct file * filp, * If there are any leftover names on the hash collision * chain, return them first. */ - if (info->extra_fname && - call_filldir(filp, dirent, filldir, info->extra_fname)) - goto finished; - - if (!info->curr_node) + if (info->extra_fname) { + if (call_filldir(filp, dirent, filldir, info->extra_fname)) + goto finished; + + info->extra_fname = NULL; + info->curr_node = rb_next(info->curr_node); + if (!info->curr_node) { + if (info->next_hash == ~0) { + filp->f_pos = EXT3_HTREE_EOF; + goto finished; + } + info->curr_hash = info->next_hash; + info->curr_minor_hash = 0; + } + } else if (!info->curr_node) info->curr_node = rb_first(&info->root); while (1) { -- 1.5.6.1.205.ge2c7.dirty --
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> For blocksize < pagesize we need to remove blocks that got allocated in block_write_begin() if we fail with ENOSPC for later blocks. block_write_begin() internally does this if it allocated page locally. This makes sure we don't have blocks outside inode.i_size during ENOSPC. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org --- fs/ext3/inode.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 507d868..bff22b9 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1178,6 +1178,13 @@ write_begin_failed: ext3_journal_stop(handle); unlock_page(page); page_cache_release(page); + /* + * block_write_begin may have instantiated a few blocks + * outside i_size. Trim these off again. Don't need + * i_size_read because we hold i_mutex. + */ + if (pos + len > inode->i_size) + vmtruncate(inode, inode->i_size); } if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries)) goto retry; -- 1.5.6.1.205.ge2c7.dirty --
Note: some people thinks this represents a security bug, since it might make the system go away while it is printing a large number of console messages, especially if a serial console is involved. Hence, it has been assigned CVE-2008-3528, but it requires that the attacker either has physical access to your machine to insert a USB disk with a corrupted filesystem image (at which point why not just hit the power button), or is otherwise able to convince the system administrator to mount an arbitrary filesystem image (at which point why not just include a setuid shell or world-writable hard disk device file or some such). Me, I think they're just being silly. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: Eugene Teo <eugeneteo@kernel.sg> --- fs/ext2/dir.c | 60 +++++++++++++++++++++++++++++++++----------------------- 1 files changed, 35 insertions(+), 25 deletions(-) diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c index a78c6b4..c53c790 100644 --- a/fs/ext2/dir.c +++ b/fs/ext2/dir.c @@ -103,7 +103,7 @@ static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len) return err; } -static void ext2_check_page(struct page *page) +static void ext2_check_page(struct page *page, int quiet) { struct inode *dir = page->mapping->host; struct super_block *sb = dir->i_sb; @@ -146,10 +146,10 @@ out: /* Too bad, we had an error */ Ebadsize: - ext2_error(sb, "ext2_check_page", - "size of directory #%lu is not a multiple of chunk size", - dir->i_ino - ); + if (!quiet) + ext2_error(sb, __func__, + "size of directory #%lu is not a multiple " + "of chunk size", dir->i_ino); goto fail; Eshort: error = "rec_len is smaller than minimal"; @@ -166,32 +166,36 @@ Espan: Einumber: error = "inode out of bounds"; bad_entry: - ext2_error (sb, "ext2_check_page", "bad entry in directory #%lu: %s - " - "offset=%lu, inode=%lu, rec_len=%d, ...
Note: some people thinks this represents a security bug, since it might make the system go away while it is printing a large number of console messages, especially if a serial console is involved. Hence, it has been assigned CVE-2008-3528, but it requires that the attacker either has physical access to your machine to insert a USB disk with a corrupted filesystem image (at which point why not just hit the power button), or is otherwise able to convince the system administrator to mount an arbitrary filesystem image (at which point why not just include a setuid shell or world-writable hard disk device file or some such). Me, I think they're just being silly. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: Eugene Teo <eugeneteo@kernel.sg> --- fs/ext3/dir.c | 10 +++++++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c index 42c5391..283938a 100644 --- a/fs/ext3/dir.c +++ b/fs/ext3/dir.c @@ -102,6 +102,7 @@ static int ext3_readdir(struct file * filp, int err; struct inode *inode = filp->f_path.dentry->d_inode; int ret = 0; + int dir_has_error = 0; sb = inode->i_sb; @@ -148,9 +149,12 @@ static int ext3_readdir(struct file * filp, * of recovering data when there's a bad sector */ if (!bh) { - ext3_error (sb, "ext3_readdir", - "directory #%lu contains a hole at offset %lu", - inode->i_ino, (unsigned long)filp->f_pos); + if (!dir_has_error) { + ext3_error(sb, __func__, "directory #%lu " + "contains a hole at offset %lld", + inode->i_ino, filp->f_pos); + dir_has_error = 1; + } /* corrupt size? Maybe no more blocks to read */ if (filp->f_pos > inode->i_blocks << 9) break; -- 1.5.6.1.205.ge2c7.dirty --
The description should explain what the problem is. And the last sentence is a little ambiguous. This is a user-triggerable DoS. The administrator who mounted the filesystem image or partition might not know that the dir->i_size and dir->i_blocks are corrupted. A remote user just need to perform either a read or write operation to the mounted image or partition, and this could trigger the problem, resulting in a denial of service. Take note that another problem the test image shows is that, the ext2/3 (and possibly ext4) filesystem does not honour the read-only mode when the revision level is too high. That is, when le32_to_cpu(es->s_rev_level) > EXT3_MAX_SUPP_REV. Eric replied me in a private email that this is a different, and unrelated bug that will be fixed. Thanks, Eugene --
On Sat, 13 Sep 2008 11:32:50 -0400 This patch was purportedly authored by yourself, but I'm going to assume that it was authored by Eric. --
It was, after some discussion w/ Ted & Andreas. Also just FWIW I'm also in the "as a security issue this is a bit silly" camp ;) Thanks, -Eric --
On Sat, Sep 13, 2008 at 11:32 PM, Theodore Ts'o <tytso@mit.edu> wrote: Minor issue. Since you are already changing "ext2_check_page" to __func__, you might as well change it here too. Thanks, Eugene --
On Sat, 13 Sep 2008 11:32:49 -0400 Well we used to do this trimming in core VFS, but Nick broke it. We still do it if the fs doesn't implement ->write_begin(). Should we do this trimming in pagecache_write_begin() in both cases? --
We still do it in block_write_begin if the pages are allocated by pagecache_write_begin is not used in the write_begin call path for ext3/ext4. generic_file_buffered_write generic_perform_write ext3_write_begin block_write_begin -aneesh --
On Sat, 13 Sep 2008 11:32:48 -0400 That sounds like a serious problem, but given the amount of time it took to turn up, I guess it's pretty rare. What are your thoughts regarding a 2.6.27 merge for this? 2.6.26.x? 2.6.25.x? ... --
It's not a regression, so per Linus's request that at this point we're too late for anything other than regression fixes, I had assumed that it should be pushed for 2.6.28, and then go into the various stable trees. It's true that it took quite a while for people to notice, probably because many programs won't notice if readdir() returns a directory entry twice. BTW, this hasn't hit -mm yet, and I've got a number of ext3 patches that don't appear to have hit -mm, including one that was authored by Linus. Should I create a git tree or a quilt series if that would make things easier for you? - Ted --
On Wed, 1 Oct 2008 18:37:55 -0400 I currently have: jbd-abort-when-failed-to-log-metadata-buffers.patch #jbd-fix-error-handling-for-checkpoint-io.patch: double-check this jbd-fix-error-handling-for-checkpoint-io.patch ext3-add-checks-for-errors-from-jbd.patch jbd-dont-dirty-original-metadata-buffer-on-abort.patch ext3-dont-try-to-resize-if-there-are-no-reserved-gdt-blocks-left.patch ext3-fix-ext3-block-reservation-early-enospc-issue.patch jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch ext3-add-an-option-to-control-error-handling-on-file-data.patch jbd-ordered-data-integrity-fix.patch ext3-fix-ext3_dx_readdir-hash-collision-handling.patch ext3-fix-ext3_dx_readdir-hash-collision-handling-checkpatch-fixes.patch ext3-truncate-block-allocated-on-a-failed-ext3_write_begin.patch ext3-avoid-printk-floods-in-the-face-of-directory-corruption.patch #jbd-abort-instead-of-waiting-for-nonexistent-transactions.patch: sct probs jbd-abort-instead-of-waiting-for-nonexistent-transactions.patch and I don't see anything from yourself in the backlog? There are some ext3 patches from other people I need to go through - I've been offlineish for nearly a week and have spent a fun day staring in horror at the mess which people are shovelling in Stephen's direction for linux-next. --
