Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes

To: Andrew Morton <akpm@...>
Cc: Eric Sandeen <sandeen@...>, Theodore Tso <tytso@...>, Andi Kleen <andi@...>, <linux-ext4@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, May 19, 2008 - 1:16 pm

On Monday 19 May 2008, Andrew Morton wrote:

I think one mistake we (myself included) have made all along with the barrier 
code is intermixing discussions about the cost of the solution with 
discussions about needing barriers at all.  Everyone thinks the barriers are 
slow because we also think running without barriers is mostly safe.

Barriers are actually really fast, at least when you compare them to running 
with the writecache off.  Making them faster in general may be possible, but 
they are somewhat pushed off to the side right now because so few people are 
running them.

Here's a test workload that corrupts ext3 50% of the time in my power-fail 
testing.  The machine in this test is my poor Dell desktop (3GHz, dual 
core, 2GB of RAM), and the power controller is me walking over and ripping 
the plug out of the back.

In other words, this is not a big automated setup doing randomized power fails 
on 64 nodes over 16 hours and many TB of data.  The data working set for this 
script is 32MB, and it takes about 10 minutes per run.

The workload has 4 parts:

1) A directory tree full of empty files with very long names (160 chars)
2) A process hogging a significant percentage of system RAM.  This must be
    enough to force constant metadata writeback due to memory pressure, and is
    controlled with -p size_in_mb
3) A process constantly writing, fsyncing and truncating to zero a single 64k
    file
4) A process constantly renaming the files with very long names from (1)
    between long-named-file.0 and long-named-file.1
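
For readers who want to see the shape of parts (3) and (4), here is a minimal
Python sketch of one iteration of each loop.  It is not the barrier-test
source; the function names and the scratch file path are hypothetical, and a
real run would call these in endless loops from separate processes.

```python
import os

def fsync_truncate_once(path, size=64 * 1024):
    """Part (3): write `size` bytes, fsync, then truncate back to zero."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"a" * size)
        os.fsync(fd)          # force the data (and a journal commit) out
        os.ftruncate(fd, 0)   # shrink the file back to zero length
    finally:
        os.close(fd)

def toggle_name(path):
    """Part (4): rename name.0 to name.1 (or back); return the new path."""
    base, suffix = path.rsplit(".", 1)
    new_path = base + (".1" if suffix == "0" else ".0")
    os.rename(path, new_path)
    return new_path
```

Each rename dirties directory metadata and each fsync forces a commit, which
is the mix the workload description above relies on.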

The idea was to simulate a loaded mailserver, and to find the corruptions by 
reading through the directory tree and finding files long-named-file.0 and 
long-named-file.1 existing at the same time.  In practice, it is faster to 
just run fsck -f on the FS after a crash.
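
The directory scan described above could look roughly like this.  It is a
sketch under the assumption that every test file ends in ".0" or ".1"; the
function name is made up for illustration, and fsck -f remains the faster
check in practice.

```python
import os

def find_double_names(root):
    """Return base paths for which both base.0 and base.1 exist."""
    doubles = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in names:
            # A rename is atomic, so only one of the pair should ever exist.
            if name.endswith(".0") and name[:-2] + ".1" in names:
                doubles.append(os.path.join(dirpath, name[:-2]))
    return sorted(doubles)
```

Any path returned here means a rename was left half-applied after the crash.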

In order to consistently cause corruptions, the size of the directory from
(1) needs to be at least as large as the ext3 log.  This is controlled with
the -s command line option.  Smaller sizes may work for the impatient, but 
corruption is more likely with larger ones.
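
As a rough sizing aid, the arithmetic can be sketched as below.  This assumes
the directory size is dominated by the 160-character names and ignores
per-entry overhead; how barrier-test itself accounts for size is not shown
here, so treat the helper names and the formula as assumptions.

```python
NAME_LEN = 160  # characters per file name, from part (1)

def names_mb(nfiles, name_len=NAME_LEN):
    """Approximate whole MB of file-name data held by nfiles entries."""
    return nfiles * name_len // (1024 * 1024)

def files_needed(target_mb, name_len=NAME_LEN):
    """Smallest file count whose names add up to at least target_mb."""
    return -(-target_mb * 1024 * 1024 // name_len)  # ceiling division

print(files_needed(32))   # 209716
print(names_mb(200000))   # 30
```

Under this assumption, matching a 32MB log takes a little over 200,000 files.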

The program first creates the files in a directory called barrier-test,
then it starts procs to pin RAM and run the constant fsyncs.  After
each phase has run long enough, it prints out a statement about
being ready, along with some other debugging output:

Memory pin ready
fsyncs ready
Renames ready

Example run:

# make 500,000 inodes on a 2GB partition.  This results in a 32MB log
mkfs.ext3 -N 500000 /dev/sda2
mount /dev/sda2 /mnt
cd /mnt

# my machine has 2GB of ram, -p 1500 will pin ~1.5GB
barrier-test -s 32 -p 1500

Run init, don't cut the power yet
10000 files 1 MB total
 ... these lines repeat for a bit
200000 files 30 MB total
Starting metadata operations now
r:1000
Memory pin ready
f:100 r:2000 f:200 r:3000 f:300
fsyncs ready
r:4000 f:400 r:5000 f:500 r:6000 f:600 r:7000 f:700 r:8000 f:800 r:9000 f:900 
r:10000
Renames ready

# I pulled the plug here
# After boot:

root@opti:~# fsck -f /dev/sda2
fsck 1.40.8 (13-Mar-2008)
e2fsck 1.40.8 (13-Mar-2008)
/dev/sda2: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Problem in HTREE directory inode 281377 (/barrier-test): bad block number 13543.
Clear HTree index<y>?

< 246 other errors are here >

-chris
