"I wasn't planning on releasing v0.12 yet, and it was supposed to have some initial support for multiple devices. But, I have made a number of performance fixes and small bug fixes, and I wanted to get them out there before the (destabilizing) work on multiple-devices took over," explained Chris Mason regarding the 0.12 release of his new btrfs filesystem. Btrfs was first announced in June of 2007 as an alpha-quality filesystem offering checksumming of all files and metadata, extent-based file storage, efficient packing of small files, dynamic inode allocation, writable snapshots, object-level mirroring and striping, and fast offline filesystem checks, among other features. The project's website explains, "Linux has a wealth of filesystems to choose from, but we are facing a number of challenges with scaling to the large storage subsystems that are becoming common in today's data centers. Filesystems need to scale in their ability to address and manage large storage, and also in their ability to detect, repair and tolerate errors in the data stored on disk." Regarding the latest release, Chris offered:
"So, here's v0.12. It comes with a shiny new disk format (sorry), but the gain is dramatically better random writes to existing files. In testing here, the random write phase of tiobench went from 1MB/s to 30MB/s. The fix was to change the way back references for file extents were hashed."
From: Chris Mason <chris.mason@...>
Subject: [ANNOUNCE] Btrfs v0.12 released
Date: Feb 6, 1:00 pm 2008
Hello everyone,
I wasn't planning on releasing v0.12 yet, and it was supposed to have some
initial support for multiple devices. But, I have made a number of
performance fixes and small bug fixes, and I wanted to get them out there
before the (destabilizing) work on multiple-devices took over.
So, here's v0.12. It comes with a shiny new disk format (sorry), but the gain
is dramatically better random writes to existing files. In testing here, the
random write phase of tiobench went from 1MB/s to 30MB/s. The fix was to
change the way back references for file extents were hashed.
Other changes:
* Insert and delete multiple items at once in the btree where possible. Back
  references added more tree balances, and it showed up in a few benchmarks.
  With v0.12, backrefs have no real impact on performance.
* Optimize bio end_io routines. Btrfs was spending way too much CPU time in the
  bio end_io routines, leading to lock contention and other problems.
* Optimize read ahead during transaction commit. The old code was trying to
  read far too much at once, which made the end_io problems really stand out.
* mount -o ssd option, which clusters file data writes together regardless of
  the directory the files belong to. There are a number of other performance
  tweaks for SSD, aimed at clustering metadata and data writes to better take
  advantage of the hardware.
* mount -o max_inline=size option, to override the default max inline file data
  size (default is 8k). Any value up to the leaf size is allowed (default
  16k).
* Simple -ENOSPC handling. Emphasis on simple, but it prevents accidentally
  filling the disk most of the time. With enough threads/procs banging on
  things, you can still easily crash the box.
-chris
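As an editorial aside, the max_inline rule in the announcement (default 8k, any value up to the leaf size, which defaults to 16k) can be modeled in a few lines of Python. This is only a sketch of the stated limits, not btrfs code: the function name check_max_inline and its parameters are hypothetical, and how the kernel actually parses or rejects the option is not described in the announcement.

```python
# Illustrative model of the limits stated in the v0.12 announcement.
# Nothing here is taken from the btrfs source; all names are hypothetical.
DEFAULT_MAX_INLINE = 8 * 1024   # "default is 8k"
DEFAULT_LEAF_SIZE = 16 * 1024   # leaf size "default 16k"


def check_max_inline(requested=None, leaf_size=DEFAULT_LEAF_SIZE):
    """Return the max inline size in bytes, or raise ValueError.

    requested: value given to max_inline=, in bytes, or None if the
    option was not passed (the default then applies).
    """
    if requested is None:
        return DEFAULT_MAX_INLINE
    if requested < 0 or requested > leaf_size:
        raise ValueError(
            "max_inline=%d is outside 0..%d (the leaf size)"
            % (requested, leaf_size)
        )
    return requested


print(check_max_inline())           # 8192
print(check_max_inline(12 * 1024))  # 12288
```

Under this model, a request larger than the leaf size (for example 32k with the default 16k leaf) is rejected rather than silently clamped; whether the real implementation rejects or clamps is an assumption made here for illustration.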
just stabilize it already
Just having an extent-based file system with fast fscks, tail packing and dynamic inodes is more than I could wish for. I'd love to have this subset of btrfs stable enough for everyday use. I am a bit worried that Chris Mason is too ambitious when he wants to add multiple devices with striping etc. to it. What additional benefits would it give compared to running it over the device mapper?
I agree completely
Release 1.0 before any crazy pseudo-raid/lvm support. That could be 2.0.
After all, most people interested in lvm/raid already have it set up, and they don't mind the current implementation. Is there something wrong with dm? Then let's fix dm.
Many of us desperately need btrfs so much, prio nr1 is 1.0. Just to get rid of ext.
Please Chris, reconsider this.
Disagree, Chris is right
I'm eagerly awaiting Btrfs too but file systems are not applications or even kernel modules. If an application or kernel module has a problem or limitation, 9999 times out of 10000 you can silently replace it without affecting things too much. However, if a file system has a problem, you're stuck with it. Worse, if there is potential for data corruption, you can't trust any of your data. That's why ext4 is a relatively tame upgrade of ext3. Ext3 is a good workhorse that's good for most users, but even the authors know there are some inherent limitations to what can be done without serious breakage. And since no-one wants serious breakage, the problems have to be just accepted.
If Btrfs were stabilized today and released to production within 6 months, the fundamental problems in Btrfs (which haven't been found) will be mostly set in stone and people will start complaining about Btrfs (especially if the competition has the features).
The important thing is to have a clear and defined set of requirements so that feature creep doesn't keep Btrfs in the HURD stage for the next 20 years. The specs of ZFS should be a goal of what is currently realistically possible, so that might be a good goal. If Btrfs reaches the point where "it's obvious how to implement all of ZFS's features given enough time without breaking the disk format", then it's ready for beta and then final release. But not before.
Is there something wrong
Is there something wrong with dm? Then let's fix dm.
Many of us desperately need btrfs so much, prio nr1 is 1.0. Just to get rid of ext.
Is there something wrong with ext? Then let's fix ext.
;-)
faster and easier integrity checking, rebuild, self-repair
Hi,
Well, if it's the filesystem that handles device replication/mirroring, it can make a rebuild much faster, because it doesn't need to resilver/resync the whole span of the device volume, only the data and metadata (that can make a huge difference).
(That can be observed with an md RAID 5 versus a ZFS raidz: fail one disk, replace it with a new one, and watch the speed of the reconstruction.)
Any kind of reconstruction, be it extending/growing with new devices or replacing devices, can be faster with this kind of tight coupling between filesystem and volume management.
Another thing that becomes easier is detecting silent data corruption on one drive: use checksums, go to the next device holding that data, retrieve it, check it, and if it is good, replace the corrupted data on the first drive with the good data (something that marketing guys would call "self-healing").
Notice that without control of the mirrored volumes, the FS would have more difficulty discovering that one device had silently corrupted data and the other had not. (When the md RAID returned the block, which device did it come from? Which device silently corrupted that block?)
Basically, having "inside" information about disk layout and state provides a faster/easier way to implement this kind of "health-checking", "self-healing" and so on.
It would still be possible to have these features with the traditional approach, but it's harder and more labour intensive (you have to add new logic and new information channels between virtual block devices and the filesystem layer).
That's what I can think of.
Miguel Sousa Filipe
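The checksum-driven repair described in the comment above can be sketched in Python. This is a toy model under stated assumptions, not how btrfs or ZFS implement it: each mirror is a plain dict mapping block numbers to bytes, zlib.crc32 stands in for whatever checksum the filesystem stores, and the function name read_with_repair is hypothetical.

```python
import zlib


def read_with_repair(mirrors, block_no, expected_crc):
    """Read a block from the first mirror whose copy matches the checksum.

    mirrors: list of dicts (block number -> bytes), hypothetical stand-ins
    for devices. Copies that failed the check before a good copy was found
    are overwritten with the good data.
    """
    bad = []
    good = None
    for idx, dev in enumerate(mirrors):
        data = dev[block_no]
        if zlib.crc32(data) == expected_crc:
            good = data
            break
        bad.append(idx)  # remember which device returned bad data
    if good is None:
        raise IOError("no valid copy of block %d" % block_no)
    for idx in bad:
        mirrors[idx][block_no] = good  # rewrite the corrupted copy
    return good


# One mirror silently corrupted, the other intact.
payload = b"important data"
crc = zlib.crc32(payload)
disk_a = {7: b"important dat\x00"}
disk_b = {7: payload}
result = read_with_repair([disk_a, disk_b], 7, crc)
```

The point of the sketch is the one the comment makes: because the caller chooses which device to read and knows which copy failed the check, it can both return good data and fix the bad copy. A filesystem sitting on top of an opaque mirror only sees one returned block and cannot tell which device it came from.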
The focus on Btrfs
The focus on Btrfs development has been getting the code production ready as quickly as possible. I've made a lot of tradeoffs that favor development speed over perfection...
I really appreciate that people are anxious to start using Btrfs, but a big part of the Btrfs story is being robust in the face of metadata corruption.
A key component of that is metadata mirroring, even in single disk configurations. For multiple spindles, MD and DM based mirroring don't make it easy to read an alternate copy of the block from the mirror set, and they make it very difficult for the FS to understand the underlying storage topology.
The Btrfs chunking design aims to solve that, and I hope to push it down a layer so that other filesystems can take advantage of it. It will also be a key component in taking advantage of the SSD combo drives that are coming out, saving power and improving performance.
Most importantly, adding these features after the disk format is frozen would greatly complicate life, and I think lower the quality of the FS as a whole. A few months spent hammering it out will give us a much better long term code base.
thanks
Thank you for explaining your thoughts around this. We mere mortals are anxiously awaiting the time when we can sink our teeth into this... Just please keep the discussion with the other powers that be alive so we don't risk issues such as the ones that plagued Reiser from the start. What I mean is that if you reinvent too much stuff then integrating this important piece of code will be hard, for both technical and social reasons. A lot of people actually want to use this. (Yesterday, if possible :) )
A few months spent hammering
A few months spent hammering it out [...]
Pun intended ? ;-)
Btrfs
I agree with the first two comments. We need an extendable filesystem now. Even managing multiple OSes on desktop/laptop systems with large HDs is a problem without an _easily extendable_ filesystem.
How does this compare to
How does this compare to Reiser4?
Re: How does this compare to
Btrfs still has an active developer.
Reiser 4 also has an active,
Reiser 4 also has an active, unpaid developer.
Please check your facts before trolling.
http://chichkin_i.zelnet.ru/namesys/
compression, anyone?
Will Linux ever have a filesystem with transparent compression? Currently, only JFFS2 has it, but it's for flash-media only.