In message <Pine.LNX.4.64.0710222019020.23513@blonde.wat.veritas.com>, Hugh Dickins writes:
What's the precise semantics of AOP_WRITEPAGE_ACTIVATE? Is it considered an
error or not? If it's an error, then I usually feel that it's important for
a stacked f/s to return that error indication upwards.
The unionfs page and the lower page are somewhat tied together, at least
logically. For unionfs's page to be considered to have been written
successfully, the lower page has to be written successfully. So again, if
the lower f/s returns AOP_WRITEPAGE_ACTIVATE, should I consider my unionfs
page to have been written successfully or not? If I don't return
AOP_WRITEPAGE_ACTIVATE up, can there be any chance that some vital data may
never get flushed out?
Anyway, now that unionfs has a ->writepages that won't bother calling
->writepage for file systems with BDI_CAP_NO_WRITEBACK, the issue of
AOP_WRITEPAGE_ACTIVATE in ->writepage may be less important.
Based on vfs.txt (which perhaps should be revised :-), I was trying to do
the best I can to ensure that no data is lost if the current page cannot be
written out to the lower f/s.
I used to do grab_cache_page() before, but that caused problems: writepage
is not the right place to _increase_ memory pressure by allocating a new
page...
One solution I thought of is to do what ecryptfs does: keep an open struct
file in my inode and call vfs_write(), but I don't see that as a
significantly cleaner/better solution. (BTW, ecryptfs kinda had to go for
vfs_write b/c it changes the data size and content of what it writes below;
unionfs is simpler in that manner b/c it needs to write the same data to the
lower file at the same offset.)
Another idea we've experimented with before is "page pointer flipping." In
writepage, we temporarily set the page->mapping->host to the lower_inode;
then we call the lower writepage with OUR page; then fix back the
page->mapping->host to the upper inode. This had two benefits: first we can
guarantee that we always have a page to write below, and second we don't
need to keep both upper and lower pages (reduces memory pressure). Before
we did this page pointer flipping, we verified that the page is locked so no
other user could see the page->mapping->host in this transient state,
and we ensured that no lower f/s was somehow caching the temporarily changed
value of page->mapping->host for later use. But, mucking with the pointers
in this manner is kinda ugly, to say the least. Still, I'd love to find a
clean and simple way that two layers can share the same struct page and
cleanly pass the upper page to a lower f/s.
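
To make the flipping sequence above concrete, here is a minimal userspace
sketch. It is not the unionfs code: the structs are simplified stand-ins for
the kernel's struct page, struct address_space and struct inode, and
flip_and_write() and mock_lower_writepage() are hypothetical names made up
for illustration. It only shows the order of operations: check the lock,
point host at the lower inode, call the lower writepage with the same page,
then restore the upper inode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, NOT the real kernel definitions. */
struct inode { int id; };
struct address_space { struct inode *host; };
struct page { struct address_space *mapping; bool locked; };

typedef int (*writepage_fn)(struct page *page);

/* Hypothetical mock of a lower f/s writepage: records the host it saw. */
static struct inode *host_seen_by_lower;

static int mock_lower_writepage(struct page *page)
{
	host_seen_by_lower = page->mapping->host;
	return 0;
}

/*
 * Hypothetical helper: write OUR page through the lower f/s by
 * temporarily pointing page->mapping->host at the lower inode.
 */
static int flip_and_write(struct page *page, struct inode *lower_inode,
			  writepage_fn lower_writepage)
{
	struct inode *upper_inode;
	int err;

	assert(page && page->mapping && lower_inode && lower_writepage);

	/* The page must be locked so nobody sees the transient host. */
	if (!page->locked)
		return -EINVAL;

	upper_inode = page->mapping->host;
	page->mapping->host = lower_inode;	/* flip */
	err = lower_writepage(page);		/* lower f/s sees our page */
	page->mapping->host = upper_inode;	/* flip back, even on error */

	return err;
}
```

The sketch leaves out what makes the real thing ugly: it assumes the lower
writepage neither caches the flipped host nor changes the page's lock state,
and it ignores that a real mapping is shared by every page of the inode.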
If you've got suggestions for how I can handle unionfs_write more cleanly, or
comments on the above possibilities, I'd love to hear them.
Yup. ecryptfs no longer does that: it recently changed things and now it
stores an open struct file in its inode, so it can always pass the file to
vfs_write. This nicely avoids calling the lower writepage, but one has to
keep an open file for every inode. Neither of the solutions currently
employed by unionfs and ecryptfs seems really satisfactory (clean and
efficient).
Thanks,
Erez.