Published on KernelTrap (http://kerneltrap.org)

BSDCan 2008: ZFS Internals

By Jeremy
Created May 16 2008 - 21:14

Pawel Dawidek first ported ZFS to FreeBSD from OpenSolaris in April of 2007. He continues to actively port new ZFS features from OpenSolaris, and focuses on improving overall ZFS stability. During the introduction to his talk at BSDCan [1], he explained that his goal was to offer an accessible view of ZFS internals. His discussion was broken into three sections: a review of the layers ZFS is built from and how they work together, a look at unique features found in ZFS and how they work internally, and a report on the current status of ZFS in FreeBSD.

The BSDCan website notes that Pawel is a FreeBSD committer, adding:

"In the FreeBSD project, he works mostly in the storage subsystems area (GEOM, file systems), security (disk encryption, opencrypto framework, IPsec, jails), but his code is also in many other parts of the system. Pawel currently lives in Warsaw, Poland, running his small company."


Derived from notes taken at a one-hour BSDCan talk by Pawel Dawidek titled "A closer look at the ZFS file system. Simple administration, transactional semantics, end-to-end data integrity" [2].

ZFS Layers

In a series of slides titled "ZFS, the internals", Pawel started with a diagram illustrating the many layers of ZFS, offering a quick overview of how it all fits together, and how it fits into FreeBSD. He then quickly moved from layer to layer.

ZFS Features

ZFS Status in FreeBSD
Pawel explained that he has already ported the most recent version of ZFS from OpenSolaris, and that it currently lives in his private Perforce source code repository. He noted that this port is complete code-wise and everything works, but that he's still writing regression tests. He's already written 2,000 tests, but these only cover half of ZFS functionality -- an illustration of just how many features ZFS has. The new code will not be committed until he finishes writing his regression tests, so his advice was to "be patient".

Cool New Features in the Latest Port

When Will ZFS Be Production Ready?
Pawel noted that he's heard this question a lot. "The experimental status is very inconvenient," he commented to lots of laughter from the crowded room. He explained that he's currently the only maintainer, and that he won't mark the code as production ready until someone comes along to co-maintain it and help debug problems as the filesystem gains more users. Nobody has stepped up to co-maintain the code yet, so he expects it will be a while.

He went on to note that he's personally used ZFS on FreeBSD in production for 2 years, and on his laptop for more than 1 year: "it just works, and it doesn't lose data. It doesn't corrupt data, and you don't have to wait for fsck."

Questions and Answers
With this, Pawel opened the floor to questions.

Q: Is the latest port available?
A: Not yet. The regression tests are being written first, then the patch will be published, then it will go into CVS.

Q: Will the new version of ZFS be able to talk to partitions created with the old version of ZFS?
A: Yes, but you will need to use a command to update the volume if you want to access the new ZFS features.

Q: How does ZFS handle bad sectors on the disk?
A: This can be handled by mirroring disks or by using RAID-Z. In addition, ZFS always replicates its metadata, and it's possible to configure it to also replicate data on a single disk.
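
To illustrate why redundant copies help with bad sectors, here is a minimal Python sketch of a checksum-verified read that falls back to a second copy and repairs the damaged one. It is a toy model, not ZFS code: the function names are hypothetical, SHA-256 stands in for whatever checksum the pool is configured to use, and the "copies" are plain in-memory byte strings rather than disks.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Checksum stored apart from the data it protects (SHA-256 as a stand-in)."""
    return hashlib.sha256(data).digest()


def self_healing_read(copies: list, expected: bytes) -> bytes:
    """Return the first copy whose checksum matches, repairing any bad copies.

    `copies` is a mutable list of byte strings, one per redundant replica
    (hypothetical stand-ins for mirror sides or extra on-disk copies).
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies failed checksum verification")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # overwrite the damaged replica with good data
    return good
```

With a single copy, the same check can only detect the damage; a second copy is what makes the repair possible.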

Q: Does it support ACLs?
A: The new version does. In OpenSolaris they use filesystem attributes. In FreeBSD we use extended attributes. In the new version the two can be translated. It's also possible to implement POSIX ACLs, but this isn't likely to happen as it would make ZFS on FreeBSD incompatible with ZFS on OpenSolaris. There's also a Google Summer of Code project related to this.

Q: How does ZFS work with 64-bit architectures?
A: Another nice ZFS feature is that it has no endian dependencies. ZFS always writes in the architecture's endianness, and doesn't slow down writes by translating. When reading, it simply checks the order in which data was stored, then feeds bytes appropriately.
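
The write-native, swap-on-read idea described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `MAGIC` marker value, and both function names are hypothetical and do not reflect the actual ZFS on-disk format.

```python
import struct

MAGIC = 0x1234ABCD  # hypothetical marker value, not a real ZFS constant


def write_record(values, byteorder):
    """Serialize in the writer's own byte order; nothing is translated on write."""
    prefix = "<" if byteorder == "little" else ">"
    return struct.pack(f"{prefix}I{len(values)}Q", MAGIC, *values)


def read_record(blob):
    """Detect the writer's byte order from the marker, then decode accordingly."""
    count = (len(blob) - 4) // 8  # 4-byte marker followed by 8-byte integers
    for prefix in ("<", ">"):
        (magic,) = struct.unpack_from(f"{prefix}I", blob, 0)
        if magic == MAGIC:
            return list(struct.unpack_from(f"{prefix}{count}Q", blob, 4))
    raise ValueError("unrecognized byte order marker")
```

A reader on the same architecture as the writer matches the marker on the first try and pays no conversion cost; only a reader of the opposite endianness has to swap bytes.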

Q: Can you dynamically expand filesystems?
A: Yes.
Pawel then popped up a terminal and offered a live demonstration of how it works.

Q: How much space is allocated for snapshots?
A: No space is allocated for a snapshot when it is created. A snapshot only starts to consume space once the filesystem it was taken from is modified, and it grows as the filesystem changes.
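
The space behaviour described in this answer follows from copy-on-write, and can be sketched with a toy model in Python. Everything here is hypothetical (the `ToyDataset` class, its methods, and the block numbering); it only models which blocks a snapshot keeps alive, not how ZFS actually tracks them.

```python
class ToyDataset:
    """Toy copy-on-write dataset; names and structure are hypothetical."""

    def __init__(self):
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, blkno):
        # Copy-on-write: never overwrite in place, always allocate a new block.
        self.live[blkno] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only block pointers are recorded here; no data blocks are copied.
        self.snapshots[name] = dict(self.live)

    def blocks_held_by(self, name):
        # Blocks the live filesystem no longer references but the snapshot does.
        live_ids = set(self.live.values())
        return sum(1 for b in self.snapshots[name].values() if b not in live_ids)
```

Right after `snapshot()` the count is zero, because every block is still shared with the live filesystem. Each block that is later overwritten leaves its old version referenced only by the snapshot, which is where the snapshot's space usage comes from.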


Source URL:
http://kerneltrap.org/FreeBSD/BSDCan_2008_ZFS_Internals