Yeah, I understand that, and I've also been thinking for some time about
whether I can avoid implementing block reservation, but I haven't come up
with anything really acceptable. Moreover, unless we write via mmap to a
sparse file, the code paths taken change only a little (only in when and
how we account for allocated blocks)...
Well, mmap beyond EOF is still undefined AFAIK (although Linux
traditionally supports it), but mmap of sparse files was always supposed
to work. My favorite user of sparse-file mmap is Berkeley DB; some torrent
clients do that as well, and I believe there are others. So it's not the
most common thing, but it happens often enough.
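
To make the sparse-file case concrete, here is a minimal userspace sketch of
the access pattern I mean: the file is extended without writing any data, and
then a page inside the resulting hole is dirtied through a shared mapping.
The function names, sizes and payload are made up for illustration, and
whether the file really ends up sparse depends on the filesystem.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def write_into_hole(path, file_pages=16, target_page=5, payload=b"hello"):
    """Create a (hopefully sparse) file and dirty one page of it via mmap."""
    with open(path, "wb") as f:
        # Only the file size is set here; no data blocks are written.
        f.truncate(file_pages * PAGE)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), file_pages * PAGE,
                       access=mmap.ACCESS_WRITE) as m:
            off = target_page * PAGE
            # First store to this page: a write fault on a page in the hole.
            m[off:off + len(payload)] = payload
            # Push the dirty page out to the file.
            m.flush()
    with open(path, "rb") as f:
        f.seek(target_page * PAGE)
        return f.read(len(payload))


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return write_into_hole(path)
    finally:
        os.unlink(path)
```

The store into the mapping is the point where the filesystem first learns
that the page needs a block, which is exactly where the accounting question
comes up.
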
This is a no-go IMHO. We would surely get lots of users complaining...
Doing allocation at mmap time does not really work: on each mmap we
would have to map blocks for the whole file, which would make mmap a
really expensive operation. Doing it at page-fault time, as you suggest
in (2a), works (that's the second plausible option IMO), but the increased
fragmentation and the resulting loss of performance are rather noticeable.
I don't have current numbers, but when I tried that last year Berkeley DB
was roughly two or three times slower.
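
A toy model of why allocating at page-fault time fragments the file: pages
of a sparse file tend to be touched in a more or less random order, so
handing out the next free block on each fault scatters logically adjacent
blocks on disk. This is only a sketch under simplifying assumptions (a single
file, a trivial "next free block" allocator, and a writeback that places
blocks in file order); it is not how the real allocator behaves, and all
names are hypothetical.

```python
import random


def count_extents(mapping):
    """Count runs of blocks that are contiguous both logically and physically."""
    extents = 0
    prev = None
    for logical in sorted(mapping):
        phys = mapping[logical]
        if prev is None or logical != prev[0] + 1 or phys != prev[1] + 1:
            extents += 1
        prev = (logical, phys)
    return extents


def allocate_at_fault(fault_order):
    """Hand out the next free physical block when each page is first touched."""
    return {logical: phys for phys, logical in enumerate(fault_order)}


def allocate_at_writeback(fault_order):
    """Only account for blocks at fault time; place them later, in file order."""
    return {logical: phys for phys, logical in enumerate(sorted(fault_order))}


def demo(n_blocks=64, seed=0):
    order = list(range(n_blocks))
    random.Random(seed).shuffle(order)  # pages get dirtied in random order
    return (count_extents(allocate_at_fault(order)),
            count_extents(allocate_at_writeback(order)))
```

In this model the writeback variant always ends up with a single extent,
while the fault-time variant produces many small ones for a random access
pattern.
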
In your (2b) suggestion, I don't see how we would avoid leaking allocated
blocks when we crash before writing the allocation to the indirect block.
Also, the fragmentation problem, which seems to be the main source of
performance issues, would stay the same.
Here again I see the problem that mapping all file blocks at mmap time
is rather expensive, so it does not seem viable to me. Also, the
overestimation of needed blocks could be rather huge.
I'm aware of this. Actually, the user-observable differences should be
rather minimal. The only one I'm aware of is that you can get SIGBUS at
page-fault time because the filesystem runs out of disk space (or out of
disk quota), which seems better than throwing away the data later. Also, I
don't think anybody serious runs systems close to ENOSPC regularly, and if
that happens accidentally, manual intervention is usually needed anyway...
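
The difference in behaviour can be sketched with a toy model: with
reservation, running out of space is noticed at the write fault, so the
process gets a signal right away; without it, the page is dirtied happily
and the data is dropped at writeback. The class and its counters are purely
hypothetical and ignore quota, metadata blocks and everything else a real
filesystem has to account for.

```python
import errno


class ToyFs:
    """Hypothetical model: a free-block counter plus a reservation counter."""

    def __init__(self, free_blocks):
        self.free = free_blocks   # blocks not yet allocated on disk
        self.reserved = 0         # blocks promised to dirty mmap pages

    def write_fault(self):
        """Reserve a block at fault time; fail immediately if none is left."""
        if self.reserved >= self.free:
            raise OSError(errno.ENOSPC, "no block left to reserve")
        self.reserved += 1

    def writeback(self, dirty_pages):
        """Allocate blocks for dirty pages; return the number of pages lost."""
        written = min(dirty_pages, self.free)
        self.free -= written
        self.reserved = max(0, self.reserved - written)
        return dirty_pages - written


def demo():
    # Without reservation: three pages are dirtied unchecked, one is lost.
    lost_without = ToyFs(free_blocks=2).writeback(dirty_pages=3)

    # With reservation: the third write fault fails instead.
    fs = ToyFs(free_blocks=2)
    dirtied, failed_at_fault = 0, False
    for _ in range(3):
        try:
            fs.write_fault()
            dirtied += 1
        except OSError as e:
            failed_at_fault = (e.errno == errno.ENOSPC)
    lost_with = fs.writeback(dirty_pages=dirtied)
    return lost_without, failed_at_fault, lost_with
```

Here demo() reports one lost page for the unreserved case, and a failure at
fault time with nothing lost for the reserved case.
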
Thanks for your ideas!
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR