On Sep 15, 2007, at 13:24:46, Andreas Dilger wrote:I really think that to get proper non-block-device-level filesystem redundancy you need to base it on something similar to the GIT model. Data replication is done in specific-sized chunks indexed by SHA-1 sum and you actually have a sort of "merge algorithm" for when local and remote changes differ. The OS would only implement a very limited list of merge algorithms, IE one of: (A) Don't merge, each client gets its own branch and merges are manual (B) Most recent changed version is made the master every X-seconds/ open/close/write/other-event. (C) The tree at X (usually a particular client/server) is always used as the master when there are conflicts. This lets you implement whatever replication policy you want: You can require that some files are replicated (cached) on *EVERY* system, you can require that other files are cached on at least X systems. You can say "this needs to be replicated on at least X% of the online systems, or at most Y". Moreover, the replication could be done pretty easily from userspace via a couple syscalls. You also automatically keep track of history with some default purge policy. The main point is that for efficiency and speed things are *not* always replicated; this also allows for offline operation. You would of course have "userspace" merge drivers which notice that the tree on your laptop is not a subset/superset of the tree on your desktop and do various merges based on per-file metadata. My address-book, for example, would have a custom little merge program which knows about how to merge changes between two address book files, asking me useful questions along the way. Since a lot of this merging is mechanical, some of the code from GIT could easily be made into a "merge library" which knows how to do such things. Moreover, this would allow me to have a "shared" root filesystem on my laptop and desktop. It would have 'sub-project'-type trees, so that "/" would be an independent branch on each system. "/etc" would be separate branches but manually merged git-style as I make changes. "/home/*" folders would be auto-created as separate subtrees so each user can version their own individually. Specific subfolders (like address-book, email, etc) would be adjusted by the GUI programs that manage them to be separate subtrees with manual- merging controlled by that GUI program. Backups/dumps/archival of such a system would be easy. You would just need to clone the significant commits/trees/etc to a DVD and replace the old SHA-1-indexed objects to tiny "object-deleted" stubs; to rollback to an archived version you insert the DVD, "mount" it into the existing kernel SHA-1 index, and then mount the appropriate commit as a read-only volume somewhere to access. The same procedure would also work for wide-area-network backups and such. The effective result would be the ability to do things like the following: (A) Have my homedir synced between both systems mostly- automatically as I make changes to different files on both systems (B) Easily have 2 copies of all my files, so if one system's disk goes kaput I can just re-clone from the other. (C) Keep archived copies of the last 5 years worth of work, including change history, on a stack of DVDs. (D) Synchronize work between locations over a relatively slow link without much work. As long as files were indirectly indexed by sub-block SHA1 (with the index depth based on the size of the file), and each individually- SHA1-ed object could have references, you could trivially have a 4TB- sized file where you modify 4 bytes at a thousand random locations throughout the file and only have to update about 5MB worth of on- disk data. The actual overhead for that kind of operation under any existing filesystem would be 100% seek-dominated regardless whereas with this mechanism you would not directly be overwriting data and so you could append all the updates as a single 5MB chunk. Data reads would be much more seek-y, but you could trivially have an on-line defragmenter tool which notices fragmented commonly-accessed inode objects and creates non-fragmented copies before deleting the old ones. There's a lot of other technical details which would need resolution in an actual implementation, but this is enough of a summary to give you the gist of the concept. Most likely there will be some major flaw which makes it impossible to produce reliably, but the concept contains the things I would be interested in for a real "networked filesystem". Cheers, Kyle Moffett -
| Linus Torvalds | Linux 2.6.27-rc8 |
| Rafael J. Wysocki | 2.6.26-rc9-git12: Reported regressions from 2.6.25 |
| Jarod Wilson | [PATCH 05/18] lirc driver for i2c-based IR receivers |
| Vladislav Bolkhovitin | Re: Integration of SCST in the mainstream Linux kernel |
git: | |
| Andrew Morton | email address handling |
| Pierre Habouzit | Re: [RFC] Re: Convert 'git blame' to parse_options() |
| Linus Torvalds | Re: What's in git.git (stable), and Announcing GIT 1.4.4.3 |
| Brian Gernhardt | Re: [FIXED PATCH] Make rebase save ORIG_HEAD if changing current branch |
| Leon Dippenaar | New tcp stack attack |
| Marco Peereboom | Re: Real men don't attack straw men |
| Richard Storm | MAXDSIZ 1GB memory limit for process |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Patrick McHardy | [NET_SCHED 00/04]: External SFQ classifiers/flow classifier |
| Arjan van de Ven | Re: [GIT]: Networking |
| Ingo Oeser | Re: [PATCH]: Third (final?) release of Sun Neptune driver |
| Linus Torvalds | Re: [GIT]: Networking |
