| From | Subject | Date |
|---|---|---|
| Joel Becker | Re: [Ocfs2-devel] [PATCH] OCFS2: Allow huge (> 16 TiB) v ...
[Added jbd2 Ccs. Sorry about the whole-patch-quote, but I want jbd2
folks to see what we're doing.]
This is completely unsafe. Two reasons. First, you're checking
the journal features after ocfs2_journal_load() has done recovery. This
may or may not be safe; recovering a 32bit journal probably works even
on a 64bit filesystem, and we shouldn't see that combination in the
wild anyway. That's not so bad.
Far worse is that you might recover a 64bit journal before
you've checked the ...
| Jul 6, 1:04 pm 2010 |
| David Howells | Re: [PATCH] Rearrange i_flags to be consistent with FS_I ...
That occurred to me after I sent the patch. I can add some preprocessor guards
for this.
David
--
| Jul 6, 6:40 am 2010 |
| David Howells | Re: [PATCH] Rearrange i_flags to be consistent with FS_I ...
They're not so dependent. They're based on the FS_IOC_[GS]ETFLAGS ioctl which
even XFS translates its flags for. These ioctl flags must now remain
invariant. Whilst they might have originated as Ext2/3/4 flags, they're now
This can be argued one way or another, however aligning i_flags with something
would probably be an improvement somewhere. Most of what I deal with is Ext3/4
based, and BTRFS-based is likely to become important too.
David
--
| Jul 6, 4:45 pm 2010 |
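For readers unfamiliar with the distinction David is drawing, here is a minimal sketch of the kind of translation a filesystem performs between the user-visible `FS_IOC_[GS]ETFLAGS` bits and the in-kernel `i_flags` bits. The numeric values are recalled from `include/linux/fs.h` of roughly that era and should be treated as assumptions; the function name `ioctl_flags_to_i_flags` is hypothetical and only models the idea, it is not kernel code.

```python
# User-visible flag bits (FS_*_FL), assumed values from include/linux/fs.h.
FS_SYNC_FL = 0x00000008
FS_IMMUTABLE_FL = 0x00000010
FS_APPEND_FL = 0x00000020
FS_NOATIME_FL = 0x00000080
FS_DIRSYNC_FL = 0x00010000

# In-kernel inode flag bits (S_*), assumed values from the same header.
S_SYNC = 1
S_NOATIME = 2
S_APPEND = 4
S_IMMUTABLE = 8
S_DIRSYNC = 64

# Because the two numbering schemes differ, each bit is mapped individually.
_TRANSLATION = [
    (FS_SYNC_FL, S_SYNC),
    (FS_NOATIME_FL, S_NOATIME),
    (FS_APPEND_FL, S_APPEND),
    (FS_IMMUTABLE_FL, S_IMMUTABLE),
    (FS_DIRSYNC_FL, S_DIRSYNC),
]


def ioctl_flags_to_i_flags(fs_flags):
    """Translate FS_*_FL bits into the corresponding S_* bits (hypothetical helper)."""
    i_flags = 0
    for fs_bit, s_bit in _TRANSLATION:
        if fs_flags & fs_bit:
            i_flags |= s_bit
    return i_flags
```

If the two sets of bits shared one numbering, as the patch subject suggests, a per-bit table like this could in principle collapse into a single mask operation.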
| Eric Sandeen | Re: inconsistent file placement
Using a recent e2fsprogs, and the "filefrag -v" command, will
give you much more interesting layout information:
# filefrag -v testfile
Filesystem type is: ef53
File size of testfile is 1073741824 (262144 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 1865728 32768
1 32768 1898496 32768
2 65536 1931264 32768
3 98304 1964032 32768
4 131072 1996800 2048
5 133120 2000896 ...
| Jul 5, 7:38 pm 2010 |
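As a rough illustration of how the `filefrag -v` columns can be read, the sketch below checks whether each extent starts physically where the previous one ended. It uses only the complete rows shown in the excerpt plus the logical and physical start of the truncated row 5; the helper name `physical_gaps` is made up for this example.

```python
# (logical, physical, length) in 4096-byte blocks, copied from the complete
# rows of the filefrag output quoted above.
extents = [
    (0, 1865728, 32768),
    (32768, 1898496, 32768),
    (65536, 1931264, 32768),
    (98304, 1964032, 32768),
    (131072, 1996800, 2048),
]

# Row 5 is truncated in the excerpt; only its physical start is known.
row5_physical = 2000896


def physical_gaps(rows):
    """For each adjacent pair, return next physical start minus previous physical end."""
    gaps = []
    for (_, phys, length), (_, next_phys, _) in zip(rows, rows[1:]):
        gaps.append(next_phys - (phys + length))
    return gaps


gaps = physical_gaps(extents)

# Gap between the end of extent 4 and the start of extent 5.
last_logical, last_phys, last_len = extents[-1]
gap_before_row5 = row5_physical - (last_phys + last_len)

print(gaps, gap_before_row5)
```

A gap of zero means the two extents are physically adjacent on disk even though they are reported separately; a non-zero gap means something else occupies the blocks in between.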
| Amir G. | Re: inconsistent file placement
The ext[23] allocator (and I suppose ext4's as well) uses the process pid % 16 to
define a 'color' for the process.
A new file's first-block goal depends on that 'color' - the goal is one
of 16 different offsets in the block group
where the new file's inode was allocated (usually the block group of
its parent directory).
The logic behind this allocator is that multiple files created
concurrently in the same directory would
have less chance of stepping over each other's allocations.
I am not sure what you ...
| Jul 5, 11:52 pm 2010 |
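The pid-based 'color' Amir describes can be sketched as follows. This is a simplified model under stated assumptions, not the kernel source: the function name `first_block_goal` and its parameters are hypothetical, and the filesystem's first data block is assumed to be 0 by default.

```python
def first_block_goal(pid, block_group, blocks_per_group,
                     first_data_block=0, colours=16):
    """Model of a pid-coloured allocation goal (hypothetical helper).

    The goal is the start of the inode's block group plus one of
    `colours` evenly spaced offsets selected by pid % colours.
    """
    group_start = first_data_block + block_group * blocks_per_group
    colour_offset = (pid % colours) * (blocks_per_group // colours)
    return group_start + colour_offset


# Two processes with different pids creating a file in the same group
# get different starting goals.
goal_a = first_block_goal(pid=1000, block_group=0, blocks_per_group=32768)
goal_b = first_block_goal(pid=1001, block_group=0, blocks_per_group=32768)
print(goal_a, goal_b)
```

Under this model, a test that creates the same single file from processes with different pids would see the file start at different offsets within the group, which is consistent with the run-to-run variation being discussed in the thread.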
| Daniel Taylor | inconsistent file placement
I realize that it is generally not a good idea to tune
an operating system, or subsystem, for benchmarking, but
there's something that I don't understand about ext[234]
that is badly affecting our product. File placement on
newly-created file systems is inconsistent. I can't,
yet, call it a bug, but I really need to understand what
is happening, and I cannot find, in the source code, the
source of the randomization (related to "goal"???).
Disk drive performance for writing/reading large ...
| Jul 5, 6:49 pm 2010 |
| Daniel Taylor | RE: inconsistent file placement
In all of my recent tests, there has only been one file created, in
the root directory of the freshly created and mounted file system.
mkfs.ext[234] -b 65536 /dev/sda4
mount <some options tested> /dev/sda4 /DataVolume
touch /DataVolume/hex.txt
"for i in 1 2 3 4 5; do dd if=/hex.txt bs=64K; \
done >>/DataVolume/hex.txt"
umount /DataVolume
dumpe2fs /dev/sda4 >/<log file>
where /hex.txt is a 1G file on the NFS root.
I tried with, and without, orlov on ext3 (-o orlov and -o oldalloc)
and ...
| Jul 6, 3:15 pm 2010 |
| tytso | Re: inconsistent file placement
In ext3, it really is random. The randomness you're looking for can
be found in fs/ext3/ialloc.c:find_group_orlov(), when it calls
get_random_bytes(). This is responsible for "spreading" directories
so they are spread across the block groups, to try to prevent
fragmented files. Yes, if all you care about is benchmarks which only
use 10% of the entire file system, and for which the benchmarks don't
adequately simulate file system aging, the algorithms in ext3 will
cause a lot of ...
| Jul 6, 11:55 am 2010 |
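A simplified model of the directory "spreading" tytso mentions is sketched below. It assumes, from memory of `find_group_orlov()`, that for top-level directories the scan starts at a random block group and prefers groups with few directories and at least average free inodes and blocks. The data layout (a list of dicts) and the name `pick_top_level_dir_group` are hypothetical, and `rng.randrange()` stands in for `get_random_bytes()`.

```python
import random


def pick_top_level_dir_group(groups, rng=random):
    """Pick a block group for a new top-level directory (simplified model).

    `groups` is a list of dicts with keys free_inodes, free_blocks, used_dirs.
    Returns the chosen group index, or None if no group qualifies.
    """
    ngroups = len(groups)
    avg_free_inodes = sum(g["free_inodes"] for g in groups) // ngroups
    avg_free_blocks = sum(g["free_blocks"] for g in groups) // ngroups

    start = rng.randrange(ngroups)  # random starting point for the scan
    best_group, best_ndir = None, None
    for i in range(ngroups):
        idx = (start + i) % ngroups
        g = groups[idx]
        if g["free_inodes"] == 0:
            continue
        # Only accept a group with strictly fewer directories than the best so far.
        if best_ndir is not None and g["used_dirs"] >= best_ndir:
            continue
        if g["free_inodes"] < avg_free_inodes:
            continue
        if g["free_blocks"] < avg_free_blocks:
            continue
        best_group, best_ndir = idx, g["used_dirs"]
    return best_group
```

On a freshly made filesystem where every group looks alike, the first group scanned wins in this model, so the result is simply the random starting index.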
| Eric Sandeen | Re: inconsistent file placement
However, from the test description it looks like it is writing
a file to the root dir, so there should be no parent-dir random spreading,
right?
-Eric
--
| Jul 6, 11:59 am 2010 |
| tytso | Re: inconsistent file placement
Hmm, yes, I missed that part of Daniel's e-mail. He's just writing a
single file. In that case, Amir is right, the only thing which would
be causing this is the colour offset, at least for ext2 and ext3.
This is to avoid fragmented files caused by two or more processes running
on different CPUs all writing into the same block group.
In the case of ext4, we don't use a pid-determined colour algorithm if
delayed allocation is used, and the randomness is caused by the
writeback system deciding ...
| Jul 6, 3:01 pm 2010 |
| tytso | Re: inconsistent file placement
Out of curiosity, what *are* the "common NAS benchmarks" in use today,
and who chooses them?
There have been times in the past when "common benchmarks" promulgated
by reviewers have done active harm in the industry, driving disk drive
manufacturers to choose unsafe defaults, all because the only thing
people paid attention to was crappy benchmarks.
Sometimes the right answer is to put a spotlight on deficient
Delayed allocation is the default for ext4. If you are seeing random
behaviour ...
| Jul 6, 4:14 pm 2010 |
| Eric Sandeen | Re: inconsistent file placement
orlov is an inode allocator for directory inodes; since you
are creating 1 file in the root dir, they won't matter.
It affects file placement because files prefer to be close to their
parent dir, more or less, but in your case you are never allocating
a directory so the point is moot.
delalloc is the default as well.
filefrag -v output would be much more enlightening than what you've
shown so far...
--
| Jul 6, 4:34 pm 2010 |
| Eric Sandeen | Re: inconsistent file placement
that patch is rather simplistic, FWIW; at least for XFS it -hurt- perf
due to the unwritten->written conversion and the relatively small, frequent
preallocations.
More smarts to merge up multiple 1-byte-writes into a large preallocation
might help, as the bug mentions.
But ... is something like it already in samba? that'd be nifty, but I wasn't
aware of that. There is a preallocation-sounding switch but I think it doesn't
do what you think it does. I'd have to go look up details, ...
| Jul 6, 4:39 pm 2010 |
| Eric Sandeen | Re: extent counting fun
Well, I didn't actually look but I'm 98% sure it's just because it's
not reporting the interspersed metadata blocks.
Sorry, above was on ext3, that wasn't clear, just a stock dd-streamed
Hm don't we have that already?
Hmm... just xattr I guess.
In any case it's still a question of whether ext3 extent count should
be "fudged" to make blocks separated by metadata look contiguous
or not ...
--
| Jul 5, 5:50 pm 2010 |
| Eric Sandeen | Re: [PATCH] fix for consistency errors after crash
Hi Amir, I really do appreciate the effort, the patch, and the ping. :)
I'll have to set aside some time to give it a hard look, but linking
it back to existing bugs of mine raises that priority, thanks. :)
--
| Jul 6, 9:00 am 2010 |
| Amir G. | Re: [PATCH] fix for consistency errors after crash
Hi Eric,
I've seen that you guys had some open RH bugs on ext3, which all share
the "bit already free" error.
This bug I reported can explain many different problems in ext[34].
Essentially, every time there is a kernel crash (or hard reboot)
during delete/truncate of a large file,
it may result in "bit already clear" error after reboot.
The problem is very simple and so is the fix.
I proved the problem with 100% recreation chances using a small patch,
instead of running statistical ...
| Jul 6, 6:00 am 2010 |
| bugzilla-daemon | [Bug 15827] ext4_get_blocks may be called while ext4_tr ...
https://bugzilla.kernel.org/show_bug.cgi?id=15827
Dmitry Monakhov <dmonakhov@openvz.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |CODE_FIX
--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are ...
| Jul 6, 12:15 am 2010 |
| bugzilla-daemon | [Bug 15792] ext4_inode_info->i_flags modification is racy
https://bugzilla.kernel.org/show_bug.cgi?id=15792
Dmitry Monakhov <dmonakhov@openvz.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |CODE_FIX
| Jul 6, 12:15 am 2010 |
| bugzilla-daemon | [Bug 15742] Fallocated extents handled incorrectly if be ...
https://bugzilla.kernel.org/show_bug.cgi?id=15742
Dmitry Monakhov <dmonakhov@openvz.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |CODE_FIX
| Jul 6, 12:16 am 2010 |
| previous day | today | next day |
|---|---|---|
| July 5, 2010 | July 6, 2010 | July 7, 2010 |
