On Dec 11, 2006, at 22:45:25, Daniel Barkalow wrote:

Hmm, ok. It would seem to be a reasonable requirement that if you want to change any of the "preserve_*_attributes" config options you need to blow away and recreate your index, no? I would probably change the underlying index format pretty completely and stick a new version tag inside it.

Ahh, I hadn't thought of it that way before but that makes a lot of sense. Thanks!

Ok, seems straightforward enough.

One other thing that crossed my mind was figuring out how to handle hardlinks. The simplest solution would be to add an extra layer of indirection between the "file inode" and the "file data". Instead of your directory pointing to a "file-data" blob and a "file-attributes" object, it would point to a "file-inode" object with embedded attribute data and a pointer to the file contents blob. I remember reading some discussions from the early days of GIT about how that was considered and discarded because the extra overhead wouldn't give any real tangible benefit. On the other hand, for something like /etc the added benefits of tracking extended attributes and hardlinks might outweigh the cost of a bunch of extra objects in the database. A bit of care with the construction of the index file should make it sufficiently efficient for day-to-day usage.

If you're interested in some random musings about using GIT concepts to version whole filesystems (think checkpointing your disk drive and instantly restoring when you screw up), read on below, otherwise don't bother.

Cheers,
Kyle Moffett

<Random Tangential Off-the-Wall Thought Experiment>

NOTE: This probably belongs in its own thread but it's such a random, undeveloped, and off-the-wall concept that I threw it in here just for kicks.

The idea: combine extensions like those described above with something like the Ext3 block-allocation, inode-management and journalling code to produce a "versioned filesystem".
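As a rough illustration of the "file-inode" indirection mentioned above, here is a minimal Python sketch. It is not how GIT stores anything today: the object layout, the JSON encoding, the "inode" object type and the helper names `store`, `make_inode` and `make_tree` are all assumptions invented for this example. The only point is that two directory entries naming the same inode object would represent a hardlink.

```python
import hashlib
import json

def store(objects, kind, payload):
    """Store payload under the SHA-1 of a "<kind> <size>\\0" header plus body."""
    data = kind.encode() + b" " + str(len(payload)).encode() + b"\0" + payload
    oid = hashlib.sha1(data).hexdigest()
    objects[oid] = data
    return oid

def make_inode(objects, content, attrs):
    """Hypothetical "file-inode" object: embedded attributes plus a blob pointer."""
    blob_id = store(objects, "blob", content)
    body = json.dumps({"attrs": attrs, "blob": blob_id}, sort_keys=True).encode()
    return store(objects, "inode", body)

def make_tree(objects, entries):
    """Hypothetical tree object: maps names to inode object IDs."""
    body = json.dumps(entries, sort_keys=True).encode()
    return store(objects, "tree", body)

objects = {}  # in-memory stand-in for the object database
attrs = {"mode": 0o644, "uid": 0, "gid": 0, "xattrs": {}}
inode_id = make_inode(objects, b"root:x:0:0:root:/root:/bin/sh\n", attrs)

# Two names pointing at the same inode object model a hardlink.
entries = {"passwd": inode_id, "passwd.hardlink": inode_id}
tree_id = make_tree(objects, entries)
```

One caveat with this sketch: because the inode object is content-addressed, two unrelated files with identical contents and attributes would hash to the same inode and look like hardlinks, so a real design would need some extra identity to tell them apart.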
With the exponential growth of storage density over the last several years we've gotten to the point where we can store many, many hours of extremely realistic video and audio on your average small-computer drive. Versioning your home directory, or even your entire computer, even with fairly steady modifications to multimedia files, installation of software programs, etc, doesn't seem like such an impossible undertaking anymore.

One predefined inode would contain a list of tags/heads and their current hashes. Mount the filesystem with a "tag=$TAG" option to specify the initial tree object used for the root directory (with syscalls to navigate the history). Allocate an inode per-mount to represent any changes from the last commit.

For efficiency purposes (no need to revision the entire system when I commit a change in my home directory) add a "subtree" object type which can specify either a particular hash or a symbolic tag/head name as a pseudo sub-mountpoint. Trap traversal of the sub-mountpoint node to mount the filesystem with "tag=$SUBTAG" on the sub-mountpoint, expiring it some time after the last traversal.

The only remaining issue would be properly navigating through the history, preserving or discarding changes. Since the kernel could easily manage copy-on-write semantics for underlying disk blocks you wouldn't need a separate "working copy" except where it's modified from the original, and discarding changes is as simple as unlinking any files referenced by the per-mount delta inode.

Committing changes would get tricky: you would need to hot-remap memory-mapped pages read-only while you checksum and store them. The next write attempt would then separate the page from the freshly-committed on-disk version. This would need a mechanism for applications to "trap" the commit so they could make databases consistent, with the ability for root or the mountpoint owner to commit without waiting for synchronization. It only needs to synchronize files belonging to the new commit.
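To make the "subtree" idea a bit more concrete, here is a small Python sketch of how a sub-mountpoint entry might be resolved against the table of tags/heads. Everything in it is made up for illustration: the dictionary layout, the entry fields, the `resolve` helper, the tag names and the placeholder hashes are assumptions, not an existing format.

```python
# Stand-in for the predefined inode listing tags/heads and their current hashes.
# The hash strings are short placeholders, not real object IDs.
refs = {"root": "aaa111", "home-kyle": "bbb222"}

# A root tree whose entries are ordinary trees or "subtree" pseudo-mountpoints.
root_tree = {
    "etc": {"type": "tree", "hash": "ccc333"},
    "home": {"type": "subtree", "tag": "home-kyle"},  # symbolic: follows the head
    "opt": {"type": "subtree", "hash": "ddd444"},     # pinned to one exact tree
}

def resolve(entry, refs):
    """Return the tree hash to mount for a directory entry."""
    if entry["type"] != "subtree":
        return entry["hash"]
    if "hash" in entry:
        return entry["hash"]    # pinned subtree
    return refs[entry["tag"]]   # symbolic subtree: look up the current head

before = resolve(root_tree["home"], refs)

# A commit inside the home directory only moves the "home-kyle" head;
# the root tree object and the "root" head are left untouched.
refs["home-kyle"] = "eee555"
after = resolve(root_tree["home"], refs)
```

With a symbolic entry the root tree never has to be rewritten when the subtree gets a new commit, which is the efficiency argument above; a pinned entry trades that away for an exact, reproducible snapshot.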
Merges would be managed from userspace, as long as there is a way to browse through objects by hash given sufficient permissions.

Make sure it's really easy to make a new atomic commit and/or reset to a known state every time the computer is rebooted (whether soft-rebooted or via crash/powerkill). With journalling and the write-once nature of GIT it would be trivial to never require an fsck run.

It also needs a way to move data between filesystems. This makes LVM largely irrelevant; it doesn't matter how many disks you have if they're all treated as a shared storage pool for your GITfs data. Make sure it's possible to archive data onto slower disks/media and purge older commits from the archive (missing parent commit references are tolerable in many situations).

It needs a way to notice hash collisions and take action to avoid them.

</Random Tangential Off-the-Wall Thought Experiment>

Cheers,
Kyle Moffett
