"Fake" Write Support

June 10, 2008 - 9:02am
Submitted by Jeremy on June 10, 2008 - 9:02am.
Linux news

In a series of seven patches, Arnd Bergmann proposed adding in-memory write support to mounted cramfs file systems. He explained, "the intention is to use it for instance on read-only root file systems like CD-ROM, or on compressed initrd images. In either case, no data is written back to the medium, but remains in the page/inode/dentry cache, like ramfs does." Reactions were mixed. When Arnd suggested this as an alternative to using the more complex unionfs to overlay a temporary filesystem over a read-only file system, and that similar support could be added to other file systems, it was pointed out that there was ultimately more gained by focusing on a single solution that worked with all filesystems. David Newall stressed, "multiple implementations is a recipe for bugs and feature mismatch." Erez Zadok suggested, "I favor a more generic approach, one that will work with the vast majority of file systems that people use w/ unioning, preferably all of them." He went on to add that more gains would be had from modifying the union destination filesystem rather than multiple source filesystems. Arnd agreed in principle, but noted it would add complexity. He indicated that he'd explore the idea further, then explained:

"My idea was to have it in cramfs, squashfs and iso9660 at most, I agree that doing it in even a single writable file system would add far too much complexity. I did not mean to start a fundamental discussion about how to do it the right way, just noticed that there are half a dozen implementations that have been around for years without getting close to inclusion in the mainline kernel, while a much simpler approach gives you sane semantics for a subset of users."


From: <arnd@...>
Subject: [RFC 0/7] [RFC] cramfs: fake write support
Date: May 31, 11:37 am 2008

Inspired by a discussion with Christoph Hellwig, I tried to
recreate a patch that he did a few years ago to add support
for writing to a mounted cramfs file system. It still has
known problems (and likely unknown ones), but should be
good enough for practical use. I've been able to boot
a full Ubuntu installation from a cramfs image and work with
it normally.

The intention is to use it for instance on read-only root
file systems like CD-ROM, or on compressed initrd images.
In either case, no data is written back to the medium, but
remains in the page/inode/dentry cache, like ramfs does.

Many existing systems currently use unionfs or aufs for this
purpose, by overlaying a tmpfs over a read-only file
system like cramfs, squashfs or iso9660. IMHO, it would
be a much nicer solution to not require unionfs for a simple
case like this, but rather have support for it in the file
system. If people find this useful, we can do the same in
other read-only file system.

Writing to existing files is broken in at least two corner
cases, and I'm still looking for a solution here:

When you truncate an on-disk file to make it larger, reading
beyond the old end of the file will make cramfs try to
read from disk instead of filling with zeroes. I'm not sure
if this can be solved without adding additional members to
the inode structure (using a private inode cache) to remember
the end of the on-disk file.

Deleting a preexisting file currently does not free the inode
and page cache for that file, which I assume is easy to fix.

Also, the i_nlink field of directories is always 1, and
has always been on cramfs. Getting the count right should
simplify the code a bit and make it more correct according
to posix, but will cost a bit of performance on 'stat'.

The patch series also lives on
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground.git cramfs

Comments?

	Arnd <><
--
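
The truncate corner case described in the message above can be illustrated with a toy model. This is only a sketch in Python, not the patch's kernel code: the class name `FakeWriteFile`, the `disk_size` field and the 4096-byte page size are all assumptions made for illustration. It keeps written pages in memory only, and remembers the end of the on-disk file so that reads past it are zero-filled rather than fetched from the medium.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class FakeWriteFile:
    """Toy model: a read-only backing file plus pages dirtied in memory."""

    def __init__(self, disk_data: bytes):
        self.disk_data = disk_data       # contents on the medium, never modified
        self.disk_size = len(disk_data)  # remembered end of the on-disk file
        self.size = len(disk_data)       # current file size as seen by readers
        self.dirty = {}                  # page index -> bytearray(PAGE_SIZE)

    def _read_disk_page(self, index):
        start = index * PAGE_SIZE
        # Only bytes below the remembered on-disk end come from the medium;
        # everything past it is zero-filled.
        end = min(start + PAGE_SIZE, self.disk_size)
        chunk = self.disk_data[start:end] if start < end else b""
        return bytearray(chunk.ljust(PAGE_SIZE, b"\0"))

    def _page(self, index):
        if index in self.dirty:
            return self.dirty[index]
        return self._read_disk_page(index)

    def read(self, offset, length):
        length = max(0, min(length, self.size - offset))
        out = bytearray()
        while length > 0:
            index, within = divmod(offset, PAGE_SIZE)
            n = min(length, PAGE_SIZE - within)
            out += self._page(index)[within:within + n]
            offset += n
            length -= n
        return bytes(out)

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, PAGE_SIZE)
            n = min(len(data) - pos, PAGE_SIZE - within)
            # Copy only the touched page into memory on first write.
            page = self.dirty.setdefault(index, self._read_disk_page(index))
            page[within:within + n] = data[pos:pos + n]
            pos += n
        self.size = max(self.size, offset + len(data))

    def truncate(self, new_size):
        if new_size < self.size:
            # Bytes cut off here must never reappear after a later grow.
            self.disk_size = min(self.disk_size, new_size)
            last, within = divmod(new_size, PAGE_SIZE)
            for index in [i for i in self.dirty
                          if i > last or (i == last and within == 0)]:
                del self.dirty[index]
            if last in self.dirty:
                self.dirty[last][within:] = bytes(PAGE_SIZE - within)
        self.size = new_size


# Shrink, then grow again: the old on-disk bytes must not come back.
f = FakeWriteFile(b"A" * 6000)
f.truncate(100)
f.truncate(8192)
assert f.read(0, 8192) == b"A" * 100 + bytes(8092)
```

Without the `disk_size` clamp, the final read would return the old `A` bytes from the medium, which corresponds to the behaviour the message reports as broken. The extra per-file field plays the role of the "additional members to the inode structure" that the message mentions as a possible fix.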

From: David Newall <davidn@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: May 31, 2:56 pm 2008

arnd@arndb.de wrote:
> Many existing systems currently use unionfs or aufs for this
> purpose, by overlaying a tmpfs over a read-only file
> system like cramfs, squashfs or iso9660. IMHO, it would
> be a much nicer solution to not require unionfs for a simple
> case like this, but rather have support for it in the file
> system. If people find this useful, we can do the same in
> other read-only file system.
>

I don't agree that it is nicer to do this in cramfs.  I prefer the
technique of union of a tmpfs over some other fs because a single
solution that works with all filesystems is better than re-implementing
the same idea in multiple filesystems.  Multiple implementations is a
recipe for bugs and feature mismatch.
--

From: Arnd Bergmann <arnd@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: May 31, 4:40 pm 2008

On Saturday 31 May 2008, David Newall wrote:
> I don't agree that it is nicer to do this in cramfs.  I prefer the
> technique of union of a tmpfs over some other fs because a single
> solution that works with all filesystems is better than re-implementing
> the same idea in multiple filesystems.  Multiple implementations is a
> recipe for bugs and feature mismatch.

You're right in principle, but unfortunately there is to date no working
implementation of union mounts. Giving users the option of using an
existing file system with a few tweaks can only be better than than
forcing them to use hacks like unionfs.

	Arnd <><
--

From: David Newall <davidn@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 1, 2:02 am 2008

Arnd Bergmann wrote:
> On Saturday 31 May 2008, David Newall wrote:
>
>> I prefer the technique of union of a tmpfs over some other fs
>>
>
> You're right in principle, but unfortunately there is to date no working
> implementation of union mounts. Giving users the option of using an
> existing file system with a few tweaks can only be better than than
> forcing them to use hacks like unionfs.

I've not used unionfs (nor aufs) so I'm not aware of its foibles, but I
can say that it's the right kind of solution.  Rather than spend effort
implementing write support for read-only filesystems, why not put your
time into fixing whatever you see wrong with one or both of those?
--

From: Jörn <joern@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 1, 12:25 pm 2008

On Sun, 1 June 2008 15:32:50 +0930, David Newall wrote:
>
> I've not used unionfs (nor aufs) so I'm not aware of its foibles, but I
> can say that it's the right kind of solution.  Rather than spend effort
> implementing write support for read-only filesystems, why not put your
> time into fixing whatever you see wrong with one or both of those?

There is a strong argument to be made for fixing some problem once
instead of N times.  But when that solution is M times more complicated,
with M being significantly larger than N, said argument becomes rather
weak.  And having looked at unionfs, I claim that your argument is
paper-thin.

Jörn

-- 
/* Keep these two variables together */
int bar;
--

From: Jan Engelhardt <jengelh@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 1, 5:11 am 2008

On Sunday 2008-06-01 08:02, David Newall wrote:
>>
>>> I prefer the technique of union of a tmpfs over some other fs
>>
>> You're right in principle, but unfortunately there is to date no working
>> implementation of union mounts. Giving users the option of using an
>> existing file system with a few tweaks can only be better than than
>> forcing them to use hacks like unionfs.
>
>I've not used unionfs (nor aufs) so I'm not aware of its foibles, but I
>can say that it's the right kind of solution.  Rather than spend effort
>implementing write support for read-only filesystems, why not put your
>time into fixing whatever you see wrong with one or both of those?

I have to join in. Unionfs and AUFS may be bigger in bytes than the
embedded developer wants to sacrifice, but that is what it takes for
a solid implementation that has to deal with things like NFS and
mmap. Even so, there is a fs called mini_fo you can try using if
you disagree with the size of unionfs/aufs, at the cost of not having
support for all corner cases.
--

From: Phillip Lougher <phillip@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: May 31, 11:54 pm 2008

Arnd Bergmann wrote:
> On Saturday 31 May 2008, David Newall wrote:
>> I don't agree that it is nicer to do this in cramfs.  I prefer the
>> technique of union of a tmpfs over some other fs because a single
>> solution that works with all filesystems is better than re-implementing
>> the same idea in multiple filesystems.  Multiple implementations is a
>> recipe for bugs and feature mismatch.
>
> You're right in principle, but unfortunately there is to date no working
> implementation of union mounts. Giving users the option of using an
> existing file system with a few tweaks can only be better than than
> forcing them to use hacks like unionfs.
>

I tend to agree with Arnd Bergmann.  While I prefer the aesthetic
cleanliness of stackable filesystems, the lack of proper stacking
support in the Linux VFS makes other techniques necessary.

Unionfs is complex and for many embedded systems with constrained
resources Unionfs adds a lot of extra overhead.

If I read the patches correctly, when a file page is written to, only
that page gets copied into the page cache and locked, the other pages
continue to be read off disk from cramfs?  With Unionfs a page write
causes the entire file to be copied up to the r/w tmpfs and locked into
the page cache causing unnecessary RAM overhead.

Phillip
--
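
The RAM difference Phillip describes can be put into numbers with a small back-of-the-envelope sketch. The function names and the 4096-byte page size are assumptions for illustration, and the model simply takes the two behaviours as he describes them (per-page copy versus whole-file copy-up); it does not measure either implementation.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


def pages_pinned_page_cow(file_size, written_offsets):
    """Pages held in RAM when only the written pages are copied."""
    return len({off // PAGE_SIZE for off in written_offsets if off < file_size})


def pages_pinned_copy_up(file_size, written_offsets):
    """Pages held in RAM when the first write copies up the whole file."""
    if not written_offsets:
        return 0
    return -(-file_size // PAGE_SIZE)  # ceiling division


# One single-byte write into a 10 MiB file.
size = 10 * 1024 * 1024
print(pages_pinned_page_cow(size, [0]), pages_pinned_copy_up(size, [0]))
```

Under these assumptions the per-page approach pins one page, while a whole-file copy-up pins all 2560 pages of the file, which is the overhead a RAM-constrained embedded system would care about.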

From: Jamie Lokier <jamie@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 1, 8:28 am 2008

Phillip Lougher wrote:
> If I read the patches correctly, when a file page is written to, only
> that page gets copied into the page cache and locked, the other pages
> continue to be read off disk from cramfs?  With Unionfs a page write
> causes the entire file to be copied up to the r/w tmpfs and locked into
> the page cache causing unnecessary RAM overhead.

Ok, so why not fix that in unionfs?  An option so that holes in the
overlay file let through data from the underlying file sounds like it
would be generally useful, and quite easy to implement.

If not unionfs, a "union-tmpfs" combination would be good.  Many
filesystems aren't well suited to being the overlay filesystem -
adding to the implementation's complexity - but a modified tmpfs could
be very well suited.

-- Jamie
--

From: Arnd Bergmann <arnd@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 1, 5:49 pm 2008

On Sunday 01 June 2008, Jamie Lokier wrote:
> Ok, so why not fix that in unionfs?  An option so that holes in the
> overlay file let through data from the underlying file sounds like it
> would be generally useful, and quite easy to implement.

I can imagine a lot of unexpected effects with that. Think of e.g.
someone replacing the underlying file with a new one. Then enlarge
the file using truncate() and read from it -- suddenly you see the
old contents instead of zeroes. Probably fixable as well, but certainly
not in a nice way.

Besides, there are a many more problems with unionfs, which have
all been mentioned in the previous review cycles. Aufs doesn't
address those either AFAIK, with the exception of at least
not making additional copies in the page cache when writing to
a file.

The real solution of course are VFS based union mounts (think
'mount --union -t tmpfs none /'), but the patches for that are not
stable enough for inclusion in mainline yet.

> If not unionfs, a "union-tmpfs" combination would be good.  Many
> filesystems aren't well suited to being the overlay filesystem -
> adding to the implementation's complexity - but a modified tmpfs could
> be very well suited.

Yes, that is similar to one of my earlier ideas as well. Christoph
managed to convince me that it's not as easy as I thought, though
I can't remember the exact arguments any more. I'll try to think
about that some more. One of the problems is certainly the complexity
involved in tmpfs to start with, which is the reason I based the code
on ramfs instead.

	Arnd <><
--
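
The "unexpected effect" Arnd raises can be shown with a toy model of Jamie's proposal. This is one possible reading of the scenario, not code from unionfs or the patches: the class `HolePassthroughFile`, the `passthrough` flag and the 4-byte page size are hypothetical, chosen to keep the example small. Holes in the overlay file either read as zeroes (ordinary sparse-file semantics) or let the lower file's data show through.

```python
PAGE = 4  # tiny page size so the example stays readable


class HolePassthroughFile:
    """A new overlay file that replaces a lower file of the same name."""

    def __init__(self, lower: bytes):
        self.lower = lower  # contents of the replaced file on the lower layer
        self.pages = {}     # overlay pages that were actually written
        self.size = 0       # the replacement file starts out empty

    def write(self, offset, data):
        for i, byte in enumerate(data):
            index, within = divmod(offset + i, PAGE)
            self.pages.setdefault(index, bytearray(PAGE))[within] = byte
        self.size = max(self.size, offset + len(data))

    def truncate(self, size):
        # Shrinking drops pages past the new end and zeroes the partial page;
        # growing allocates nothing, it only leaves a hole.
        self.pages = {i: p for i, p in self.pages.items() if i * PAGE < size}
        last, within = divmod(size, PAGE)
        if within and last in self.pages:
            self.pages[last][within:] = bytes(PAGE - within)
        self.size = size

    def read(self, offset, length, passthrough):
        out = bytearray()
        for pos in range(offset, min(offset + length, self.size)):
            index, within = divmod(pos, PAGE)
            if index in self.pages:
                out.append(self.pages[index][within])
            elif passthrough and pos < len(self.lower):
                out.append(self.lower[pos])  # the hole lets old data through
            else:
                out.append(0)                # ordinary sparse-file semantics
        return bytes(out)


f = HolePassthroughFile(lower=b"old-old-old-")
f.write(0, b"new!")
f.truncate(12)  # enlarge the replacement file
assert f.read(0, 12, passthrough=False) == b"new!" + bytes(8)
assert f.read(0, 12, passthrough=True) == b"new!old-old-"
```

With pass-through enabled, the enlarged file exposes the tail of the file it replaced, where a reader would expect zeroes.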

From: Erez Zadok <ezk@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support 
Date: Jun 2, 12:37 am 2008

> Jan Engelhardt wrote:
> > On Sunday 2008-06-01 08:02, David Newall wrote:
> >>   
> >>> I prefer the technique of union of a tmpfs over some other fs
> >>
> >> You're right in principle, but unfortunately there is to date no working
> >> implementation of union mounts. Giving users the option of using an
> >> existing file system with a few tweaks can only be better than than
> >> forcing them to use hacks like unionfs.
> >
> >I've not used unionfs (nor aufs) so I'm not aware of its foibles, but I
> >can say that it's the right kind of solution.  Rather than spend effort
> >implementing write support for read-only filesystems, why not put your
> >time into fixing whatever you see wrong with one or both of those?
> 
> I have to join in. Unionfs and AUFS may be bigger in bytes than the
> embedded developer wants to sacrifice, but that is what it takes for
> a solid implementation that has to deal with things like NFS and
> mmap. Even so, there is a fs called mini_fo you can try using if
> you disagree with the size of unionfs/aufs, at the cost of not having
> support for all corner cases.

I agree w/ Jan E.

Folks, I've said it before: unioning is a deceptively simple idea in
principle, and &^@%*$&^@ hard in practice.  And anyone who thinks otherwise
is welcome to write a *versatile* unioning implementation on their own.  Once
you get through all corner cases and satisfy all the features which users
want, you have a complex large file system.

I believe that implementing unioning inside actual filesystems is totally the
wrong direction: going to lower layers is wrong, instead of going up to a
VFS-based solution.  Unioning is a namespace operation that should not be
done deep inside a lower f/s.

People often wonder why FScache is (reportedly) so complex and big.  It's
b/c in some part it has to deal with similar issues: unioning is
copy-on-write, whereas caching is copy-on-read.

Nevertheless, I can understand if the embedded community wants lightweight
unioning.  Union Mounts initially may not support everything that unionfs
does, but it should be smaller, and it should be enough I believe for the
basic unioning uses --- perhaps even for the embedded community.  If so,
then I suggest people offer to help Bharata and Jan Blunk's efforts, rather
than [sic] cramming unioning into a single file system.

Erez.
--

From: Jan Engelhardt <jengelh@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 2, 3:17 am 2008

On Monday 2008-06-02 06:37, Erez Zadok wrote:
>> Jan Engelhardt wrote:
>> > On Sunday 2008-06-01 08:02, David Newall wrote:
>> >>
>> >>> I prefer the technique of union of a tmpfs over some other fs
>> >>
>> >> You're right in principle, but unfortunately there is to date no working
>> >> implementation of union mounts. Giving users the option of using an
>> >> existing file system with a few tweaks can only be better than than
>> >> forcing them to use hacks like unionfs.

>Folks, I've said it before: unioning is a deceptively simple idea in
>principle, and &^@%*$&^@ hard in practice.  And anyone who thinks otherwise
>is welcome to write a *versatile* unioning implementation on their own.  Once
>you get through all corner cases and satisfy all the features which users
>want, you have a complex large file system.
>[...]

To the original posters: I urge those who do believe {au,union}fs is
too fat to go and build their unioning into their on-disk filesystems,
then let users run it (remark: iff you can convince (or force) them
why they should not be using existing fs), let users report issues
and iron it out for perhaps 2-3 years, and then see how much your
implementation has grown. That is, if you actually added code
(see remark 1).

About last year (June 2007), SLAX sought a solution that enhances
VFAT with UNIX permissions -- much like the old umsdosfs. A kernel
solution was initially preferred by Tomas (SLAX developer), yet I
(who got to write posixovl then) went for FUSE. It was about 20 KB
when it was moderately usable. The end result? Posixovl is a 46 KB
C file today. For userspace code. I bet it would be much more if it
was in-kernel. Take that as a hint when developing your fs-specific
unioning.
--

From: Bharata B Rao <bharata.rao@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 2, 2:07 am 2008

On Mon, Jun 2, 2008 at 10:07 AM, Erez Zadok <ezk@cs.sunysb.edu> wrote:
>
> Nevertheless, I can understand if the embedded community wants lightweight
> unioning.  Union Mounts initially may not support everything that unionfs
> does, but it should be smaller, and it should be enough I believe for the
> basic unioning uses --- perhaps even for the embedded community.  If so,
> then I suggest people offer to help Bharata and Jan Blunk's efforts, rather
> than [sic] cramming unioning into a single file system.
>

Though Union Mount effort has become slow and silent lately, some of
us are still working on it. While I worked on readdir support lately,
Jan Blunck and David Woodhouse are working on having a generic
whiteout support for linux.

Talking about help, Union Mount effort could take a generous help in
getting directory listing implementation right. We first tried to
handle duplicate elimination (during readdir) inside the kernel
entirely. The outcome was neither clean nor efficient.
(http://lkml.org/lkml/2007/12/5/147). Then there was a suggestion to
push the duplicate elimination to userspace. When that was tried out
(http://lkml.org/lkml/2008/4/29/248), we were told that NFS support is
going to be an issue. (BTW NFS support is going to be an issue
irrespective of where directory listing is implemented: kernel or
userspace). Some insights into feasibility of supporting NFS with
Union Mount from people who understand NFS better would be very
helpful.

Regards,
Bharata.
-- 
http://bharata.sulekha.com/blog/posts.htm
--
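
The directory listing problem Bharata mentions, duplicate elimination plus whiteouts during readdir, can be sketched as follows. This is a simplified illustration and not the Union Mount implementation: the function `union_readdir` and the representation of each layer as a `(names, whiteouts)` pair are assumptions.

```python
def union_readdir(layers):
    """Merge directory listings of stacked layers, topmost layer first.

    Each layer is a (names, whiteouts) pair: 'names' are the entries present
    in that layer, 'whiteouts' are names that the layer marks as deleted and
    that must therefore be hidden in every layer below it.
    """
    seen = set()    # names already returned (duplicate elimination)
    hidden = set()  # names whited out by a higher layer
    result = []
    for names, whiteouts in layers:
        for name in names:
            if name in seen or name in hidden:
                continue
            seen.add(name)
            result.append(name)
        # A whiteout only affects the layers underneath this one.
        hidden |= set(whiteouts)
    return result


top = (["etc", "passwd.new"], {"motd"})  # writable layer, "motd" was deleted
lower = (["etc", "motd", "passwd"], set())  # read-only layer
print(union_readdir([top, lower]))
```

Even in this small form, the `seen` set has to hold every name returned so far, so its size grows with the directory. That gives a feel for why doing the same bookkeeping entirely inside the kernel was reported as neither clean nor efficient.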

From: Erez Zadok <ezk@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support 
Date: Jun 1, 11:25 pm 2008

Arnd Bergmann:
> Besides, there are a many more problems with unionfs, which have
> all been mentioned in the previous review cycles. Aufs doesn't
> address those either AFAIK, with the exception of at least
> not making additional copies in the page cache when writing to
> a file.

Correction: Unionfs doesn't make additional copies in the page cache.

Arnd, I favor a more generic approach, one that will work with the vast
majority of file systems that people use w/ unioning, preferably all of
them.  Supporting copy-on-write in cramfs will only help a small subset of
users.  Yes, it might be simple, but I fear it won't be useful enough to
convince existing users of unioning to switch over.  And I don't think we
should add CoW support in every file system -- the complexity will be much
more than using unionfs or some other VFS-based solution.

I can see some advantages (re: cache coherency) by hacking CoW support
directly into a f/s.  If you want to use a filesystem-specific solution,
then I suggest you don't modify a file system used as a source in a union,
but one used as a destination.  You'll have better overage that way.  The
vast majority of times, unionfs users will either write to tmpfs or ext2;
but the source readonly f/s can be a lot of different ones (most popular are
ext*, nfs*, isofs, and cramfs/squashfs).

I find it somewhat ironic to hear the argument that "union mounts isn't
stable yet, so lets come up with a new solution inside cramfs."  Why should
your solution become stable much faster than union mounts (which also had
patches floating around for a long time already).

If you have cycles to spare, why not help Bharata and Jan?

Cheers,
Erez.
--

From: Arnd Bergmann <arnd@...>
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Jun 2, 3:51 am 2008

On Monday 02 June 2008, Erez Zadok wrote:
> Correction: Unionfs doesn't make additional copies in the page cache.

Ok, I must have misunderstood something there. Sorry about that.

> Arnd, I favor a more generic approach, one that will work with the vast
> majority of file systems that people use w/ unioning, preferably all of
> them.  Supporting copy-on-write in cramfs will only help a small subset of
> users.  Yes, it might be simple, but I fear it won't be useful enough to
> convince existing users of unioning to switch over.  And I don't think we
> should add CoW support in every file system -- the complexity will be much
> more than using unionfs or some other VFS-based solution.

My idea was to have it in cramfs, squashfs and iso9660 at most, I agree
that doing it in even a single writable file system would add far too
much complexity. I did not mean to start a fundamental discussion about
how to do it the right way, just noticed that there are half a dozen
implementations that have been around for years without getting close
to inclusion in the mainline kernel, while a much simpler approach gives
you sane semantics for a subset of users.

> I can see some advantages (re: cache coherency) by hacking CoW support
> directly into a f/s.  If you want to use a filesystem-specific solution,
> then I suggest you don't modify a file system used as a source in a union,
> but one used as a destination.  You'll have better overage that way.  The
> vast majority of times, unionfs users will either write to tmpfs or ext2;
> but the source readonly f/s can be a lot of different ones (most popular are
> ext*, nfs*, isofs, and cramfs/squashfs).

Yes, that absolutely makes sense. I don't care much about a persistant
storage for the overlay, so tmpfs (if not ramfs) should be the only
place to do it in.

It does introduce some of the same old problems though, because
you could still write to a bind mounted copy of the underlying file
system (unlike cramfs, which is guaranteed to be read-only), which
forces you to either to a full copy-up, or can result in inconsistent
file contents. Also, stacking multiple union-tmpfs copies on top of
each other would be hard to do without the potential to overflow the
kernel stack.

I'll probably try implementing a '-o union' option tmpfs anyway, just
to see how hard it is and what the problems are.

> I find it somewhat ironic to hear the argument that "union mounts isn't
> stable yet, so lets come up with a new solution inside cramfs."  Why should
> your solution become stable much faster than union mounts (which also had
> patches floating around for a long time already).

Because the patches are not trying to solve any of the hard problems
at all: Persistent storage of overlays, readdir traversal through more
than two layers, stable inode numbers, opening a file through two
different overlays, copyup, and so on. I'm sure you know more about
these problems that I do, but as long as I don't have to care about
them, I don't see a problem with my patches (other than the bugs
I already described).

> If you have cycles to spare, why not help Bharata and Jan?

I spent a lot of time on discussing the initial implementation with
Jan years ago, and will keep reviewing their patches, but I have
neither the time nor the brains to really contribute much to them.
As you mentioned in your reply to Jan E., it's on an entirely different
scale than doing a small hack to cramfs or tmpfs.

	Arnd <><
--


What is it good for?

June 10, 2008 - 10:46am
Anonymous (not verified)

In layman's terms, what would this be good for?

Embedded systems

June 10, 2008 - 12:07pm

For an embedded file system you may want a configured file system with sensible defaults which can nevertheless be changed should the user tweak a setting. The alternative is to mount a RAM disk in the "volatile" part of your file system and copy templates across from your read-only fs. This will waste RAM if most of those templates are never touched during the course of running the system.

--
Alex

just read the first two sentences!

June 10, 2008 - 3:07pm
sileNT (not verified)

Just read the first two sentences, i.e.: "the intention is to use it for instance on read-only root file systems like CD-ROM, or on compressed initrd images."

Essentially it is as it

June 10, 2008 - 7:00pm
Nony Mouse (not verified)

Essentially it is as it says, write support for read-only filesystems.

It works sort of like a union mount of tmpfs over a read-only filesystem, except it is implemented in the underlying read-only filesystem code such as iso9660 rather than providing a union mount.

The discussion is largely about whether or not this is a better idea than union mounts or union filesystems, the main argument being that both union mounts and union filesystems are not very robust, with some saying it would be better just to fix union mounts. Implementing it in the filesystem is slightly more lightweight than stacking a filesystem, possibly more robust but probably not, and it isn't modular. I dread to think what would happen if you union mounted a read-only filesystem which already had write support.

Why not union mounts?

June 11, 2008 - 7:58am
Lawrence D'Oliveiro (not verified)

What exactly is it about union mounts that is not very "robust"? It seems to me that the union of a writable tmpfs with any read-only file system would give you exactly the same capability.

Yes, it is exactly the same

June 12, 2008 - 7:52pm
Nony Mouse (not verified)

Yes, it is exactly the same functionality as a union mounted tmpfs over a read-only filesystem, but the implementation differs considerably.

If only you could read!

June 11, 2008 - 8:41am
Anonymous (not verified)

The intention is to use it for instance on read-only root file systems like CD-ROM, or on compressed initrd images.

RTFM!

I don't understand

June 11, 2008 - 1:28pm
Anonymous (not verified)

Okay, but I still do not understand what it is good for.

Useful for embedded apps

June 22, 2008 - 9:37am

In the big picture it's not that useful, which is why we've been able to get away with it for so long. But imagine an embedded application - say, a cellphone with a web browser. You don't want the browser to write its cache to flash or else you'll wear it out and the phone becomes a brick. Instead, you want to allocate a fixed amount of RAM as a filesystem and layer it on top of /home. Now, /home already has a bunch of configuration files that you don't want to be hidden - you just want all writes to be non-persistent. This way, the important data stays on flash, apps can write to disk unmodified and not wear out the flash, and you didn't have to go through each program and tune it for this environment.

Union of SSD and HD

June 11, 2008 - 5:20am
Anonymous (not verified)

Is there some project for a union between an SSD device and a simple HD, using the SSD for its very low latency and the HD for the huge space available? (The SSD could receive all metadata and the HD everything else.)

Bingo, this seems rather a

June 11, 2008 - 10:06am
F (not verified)

Bingo, this seems rather a nice question/idea. Hope someone more knowledgeable or informed answers. Meanwhile, I have the feeling that this kind of decision (on where to put files, etc.) should be made at the block level rather than the filesystem level, though ... probably with autotuning and based on scheduling decisions ... whatever, it seems a nice territory to explore.
