I've created an updated quota design document here:

https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4

No major changes from last time. One new thing is a proposed (optional) change to the quota format, to use the 32-bit dqpb_pad field in the v2r1 on-disk quota structure as a 32-bit CRC of the quota entry. This would allow the quota system to detect corrupted quota entries.

Jan, what do you think?

- Ted
--
Hi,

It might be reasonable to checksum dquots so that we get closer to an all-metadata-is-checksummed state. I'm just wondering whether checksumming each dquot is that useful. For example, OCFS2 checksums each quota block. That has the advantage that quota file tree blocks and headers are also protected. It's also possible to use the generic block checksumming framework in JBD2 for this case. OTOH, ext4 seems to have chosen to checksum each group descriptor individually, so checksumming each dquot structure would seem more consistent. So I don't have a strong opinion on which checksumming scheme should be chosen; I just wanted to point out that there's another reasonable option. The generic quota code can easily handle both (including leaving some bytes at the end of each block for checksums, as it does for OCFS2 now).

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
Well, the reason why I suggested just checksumming each quota entry is that it was the simplest thing to do, and it wouldn't require making huge changes to the rest of the quota_tree code. It also means we don't need any special locking to make sure that no other process is modifying another quota entry in the same block while we are calculating the per-block checksum.

I assume OCFS2 is just using dqdh_pad2 or dqdh_pad1 for its checksum?

- Ted
--
With metadata that gets journaled it should be quite easy. JBD must already be told before you modify buffer contents; that's why journal_get_write_access and friends exist. It also makes sure that your data cannot be modified from the moment the buffer enters commit up to the moment the commit is finished. So you can use the buffer commit hook to compute the checksum.

No, ocfs2 doesn't use dqdh_pad1/dqdh_pad2. The quota_tree code sets info->dqi_usable_bs to something smaller than 1 << info->dqi_blocksize_bits. Thus the quota code leaves a few bytes in each block unused, and ocfs2 stores the checksum there.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
True, but we're also interested in making sure this feature can be
used in the non-journal case as well....
- Ted
--
Ah, I forgot about that... Doing per-block checksums in that case would indeed be more complicated. So computing a checksum for each dquot is probably simpler.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
Q: What is the use case for non-journal quota?
A: AFAIU, the answer will be "GFS chunkservers".

Is there any chance that quota will be consistent with real space usage after a failure? Currently the difference may be huge.

BTW, AFAIU it is not safe to use an unclean fs in nojournal mode without an explicit e2fsck. And AFAIU that is the reason why nojournal users rely on replication or some other redundancy mechanism to protect data, and simply throw away broken data after any failure on a single node.
--
