I've recently become somewhat interested in the idea of using GIT to store the contents of various folders in /etc. However, after a bit of playing with this, I discovered that GIT doesn't actually preserve all permission bits, since that would cause problems with the more traditional software development model. I'm curious if anyone has done this before and, if so, how they went about handling the permissions and ownership issues. I spent a little time looking over how GIT stores and compares permission bits, trying to figure out if it's possible to patch in a new configuration variable or two, say "preserve_all_perms" and "preserve_owner", or maybe even "save_acls". It looks like standard permission preservation is fairly basic; you would just need to patch a few routines which alter the permissions read in from disk or compare them with ones from the database. On the other hand, it would appear that preserving ownership or full POSIX ACLs might be a bit of a challenge. Thanks for your insight and advice! Cheers, Kyle Moffett -
It's a great idea, something I would like to do, and something I've suggested before. You could dig through the mailing list archives, if you're motivated. I actively use git to version, store and distribute an exim mail configuration across six servers. So far my solution has been a 'fix perms' script, or using the file perm checking capabilities of cfengine. But it would be a lot better if git natively cared about ownership and permissions (presumably via an option). Jeff -
I have not used it, but you could try: http://www.isisetup.ch/ that uses git as a backend. Santi -
I want to have a tripwire-like system checking the files to make sure that they haven't changed unexpectedly. The program I'm looking at notices inode changes as well as timestamp and content changes. When you check out a file from git, will it rewrite/overwrite a file that hasn't changed, or will it realize there is no change and leave it as-is? Does this answer change if there is a trigger on checkout (to change permissions or otherwise manipulate the file)? David Lang -
If the stat data is current it will leave it as-is. You can force the index to refresh with `git update-index --refresh` or by running ... The answer changes only if the trigger does something in addition, like force-overwriting files. But we don't have a checkout trigger. -- Shawn. -
I was looking at checkout, not checkin, so I'm not understanding how the index is involved. We don't have a checkout trigger? I thought that what Linus had suggested for permissions was to have a script triggered on checkin that stored the permissions of the files, and a script triggered on checkout that set the permissions from the stored file. If there isn't a checkout trigger, how would the permissions ever get set? In my particular case I'd like to have the checkin run a script that produces a 'generic' version of each file, and the checkout run a script that converts the generic version into the host-specific version. I already have a script that does this work (and (ab)uses ssh to propagate the generic version to other hosts and create the host-specific versions there), but I was interested in using git to add better version control to the generic versions of the files (I currently use RCS on each box to version control the host-specific versions). David Lang -
During checkout we use the index to help us decide if a file needs to be updated with new content or can be left as-is. It's a cache of what version each file is at, and it's based on the file stat data (dev, inode, modification date, etc.) to tell us if the file has been modified or was last created by Git. If Git was the one that last modified the file and the version stored in the index matches the version needed during the checkout, the file is left alone. Someone needs to implement support for a post-checkout trigger. _Then_ ... You may be able to do that in the pre-commit hook by updating the index. -- Shawn. -
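The stat-based check described above can be illustrated with a short sketch. This is not Git's code: the helper names `snapshot` and `looks_unchanged` are hypothetical, and only a few stat fields are cached here, whereas the real index records more.

```python
import os
import tempfile

def snapshot(path):
    """Record the stat fields used as a cheap 'has this changed?' heuristic."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime_ns, st.st_size)

def looks_unchanged(path, cached):
    """True if the file's current stat data still matches the cached entry."""
    return snapshot(path) == cached

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "exim.conf")  # hypothetical file name
        with open(path, "w") as f:
            f.write("v1\n")
        cached = snapshot(path)
        untouched = looks_unchanged(path, cached)
        # Rewrite with content of a different length, so the size field differs.
        with open(path, "w") as f:
            f.write("version 2\n")
        modified = looks_unchanged(path, cached)
        return untouched, modified
```

A file whose cached stat data still matches can be skipped without reading its content, which is why an unchanged file would be left alone on checkout.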
I keep the files I want to track in a separate folder that I track with Git and use a Makefile for updating /etc. I basically have a rule for checking for differences between the tracked folder and /etc and a rule for installing changed files (with the correct permissions). It works, but it does require some "Makefile magic" to work right (or the way /I/ want it anyway). nikolai -
The first thing you'd want to do is correct the fact that the index doesn't keep full permissions. We decided long ago that we don't want to track more than 0100, but we're discarding the rest between the filesystem and the index, rather than between the index and the tree. (This is weird of us, since we keep gid and uid in the index, as changedness heuristics, but don't keep permissions; of course, we'd have to apply umask to the index when we check it out to sync what we expect to be there with what has actually been created.) I think that would be the only change needed to the index and index/working directory connection, although it might be necessary to support longer values for uid/gid/etc, since they'd be important data now. Note that git only stores content, not incidental information. But a lot of information which is incidental in a source tree is content in /etc. This implies that /etc and working/linux-2.6 are fundamentally different sorts of things, because different aspects of them are content. I'd suggest a new object type for a directory with permissions, ACLs, and so forth. It should probably use symbolic owner and group, too. My guess is that you'll want to use "commit"s, the new object type, and "blob"s. Everything that uses trees would need to have a version that uses the new type. But I think that you generally want different behavior anyway, so that's not a major issue. -Daniel *This .sig left intentionally blank* -
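To make the "no more than 0100" point concrete, here is a small sketch of how a regular file's mode collapses when only the owner-execute bit is kept. It assumes the recorded mode is either 100644 or 100755; the function names are hypothetical and this is an illustration, not Git's implementation.

```python
import stat

def git_style_mode(st_mode):
    """Collapse a regular file's mode to 0o100755 or 0o100644."""
    if not stat.S_ISREG(st_mode):
        raise ValueError("this sketch only handles regular files")
    # Only the owner-execute bit (0o100) survives the round trip.
    return 0o100755 if st_mode & 0o100 else 0o100644

def lost_bits(st_mode):
    """Permission bits that differ between the on-disk and recorded mode."""
    return stat.S_IMODE(st_mode) ^ stat.S_IMODE(git_style_mode(st_mode))
```

For a file in /etc with mode 0600, `lost_bits` reports 0o044: the recorded mode would grant group and world read access that the original withheld, which is exactly the kind of information that counts as content for /etc.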
Hmm, ok. It would seem to be a reasonable requirement that if you want to change any of the "preserve_*_attributes" config options you need to blow away and recreate your index, no? I would probably change the underlying index format pretty completely and stick a new ... Ahh, I hadn't thought of it that way before, but that makes a lot of sense. Ok, seems straightforward enough. One other thing that crossed my mind was figuring out how to handle hardlinks. The simplest solution would be to add an extra layer of indirection between the "file inode" and the "file data". Instead of your directory pointing to a "file-data" blob and "file-attributes" object, it would point to a "file-inode" object with embedded attribute data and a pointer to the file contents blob. I remember reading some discussions from the early days of GIT about how that was considered and discarded because the extra overhead wouldn't give any real tangible benefit. On the other hand, for something like /etc the added benefits of tracking extended attributes and hardlinks might outweigh the cost of a bunch of extra objects in the database. A bit of care with the construction of the index file should make it sufficiently efficient for day-to-day usage. If you're interested in some random musings about using GIT concepts to version whole filesystems (think checkpointing your disk drive and instantly restoring when you screw up), read on below, otherwise don't bother. Cheers, Kyle Moffett <Random Tangential Off-the-Wall Thought Experiment> NOTE: This probably belongs in its own thread but it's such a random, undeveloped, and off-the-wall concept that I threw it in here just for kicks. The idea is to combine extensions like those described above with something like the Ext3 block-allocation, inode-management and journalling code to produce a "versioned filesystem". With the exponential growth of storage density over the last several years we've gotten to the point ...
I wonder if git's skill at managing content is the answer? Rather than mess around with git's internals, the index, or the object database, how about simply having a pre-commit script that writes out a file that looks like:
  -rw-r--r-- andyp andyp CHANGES
  -rw-r--r-- andyp andyp COPYING
  -rw-rw-r-- andyp andyp CREDITS
  -rw-r--r-- andyp andyp Configure
  -rw-rw-r-- andyp andyp Makefile
  -rw-r--r-- andyp andyp README
If /that/ file were stored in the repository and you had a script that could read that file and apply the permissions after a checkout you'd have what you want. If the permissions of a file changed but the content didn't, then this ".gitpermissions" file would have changed content but the file itself would remain the same. If the content changed but not the permissions then ".gitpermissions" would be untouched. Assuming that you're allowed to mess with the index in pre-commit (I haven't checked), one half of it can be automatic. I suppose you could also plead for a post-checkout hook to apply those permissions and the whole lot would be transparent. Andy -- Dr Andy Parkins, M Eng (hons), MIEE andyparkins@gmail.com -
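A minimal sketch of the two scripts proposed above, one to write the manifest and one to apply it after a checkout. The line format used here (octal mode, numeric uid and gid, relative path) is an assumption chosen for easy parsing rather than the `ls`-style listing in the message, and the function names are hypothetical. Hooking these into pre-commit or a post-checkout step is left out.

```python
import os
import stat

MANIFEST = ".gitpermissions"  # file name taken from the message above

def save_permissions(root):
    """Write 'mode uid gid path' for every regular file under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the repository metadata and keep the output order stable.
        dirnames[:] = sorted(n for n in dirnames if n != ".git")
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if name == MANIFEST or os.path.islink(full):
                continue
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            mode = stat.S_IMODE(st.st_mode)
            lines.append(f"{mode:04o} {st.st_uid} {st.st_gid} {rel}\n")
    with open(os.path.join(root, MANIFEST), "w") as f:
        f.writelines(lines)

def restore_permissions(root, restore_owner=False):
    """Re-apply the modes (and optionally the owners) recorded in the manifest."""
    with open(os.path.join(root, MANIFEST)) as f:
        for line in f:
            mode, uid, gid, rel = line.rstrip("\n").split(" ", 3)
            full = os.path.join(root, rel)
            if restore_owner:
                # Changing ownership usually requires root; done before chmod
                # because chown may clear setuid/setgid bits.
                os.chown(full, int(uid), int(gid))
            os.chmod(full, int(mode, 8))
```

Because the manifest is ordinary tracked content, a permission-only change shows up as a diff in `.gitpermissions` while the file's own blob stays the same, as the message describes. File names containing newlines are not handled by this sketch.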
This discussion reminds me of a use of git I've had in the back of my head to try out for a while. Right now I'm doing my local snapshot backups using the rsync-with-hard-links scheme (http://www.mikerubel.org/computers/rsync_snapshots/ if you're not familiar with it). This is nice in that the contents of files that don't change are only stored once on the backup disk. But it is less than optimal in that a file that changes even a little bit is stored from scratch. What would be great for this would be to store each day's backup as a git revision; with a periodic repack, this would be much more space-efficient than the rsync hard links. The problem is that while that would give me a very efficient backup scheme, the repository would still grow over time. In rsync land, I solve the disk space issue by keeping two weeks' worth of daily snapshots, then six months' worth of weekly snapshots, then two years' worth of monthly snapshots; files that change daily have a constant number of revisions stored in my backups, and older files drop off the backup disk as they age. Given that there's no way (or is there?) to delete revisions from the *beginning* of a git revision history, right now it seems like the only approach that comes close is to give up on the "daily then weekly then monthly" thing -- probably fine given the space savings of delta compression -- and periodically make shallow clones of the backup repository that fetch all but the first N revisions; once a shallow clone is made, the original gets deleted and the clone is the new backup repo. But it would sure be more efficient to be able to "shallow-ize" an existing repository. That would be useful for things other than backups, too, e.g. the recent request for some way to track just the current version of the kernel code rather than its revision history. If there were a shallowize command, you could do something like "git pull; git shallowize --depth 1" to track the latest revision without ...
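The retention schedule described above (two weeks of dailies, six months of weeklies, two years of monthlies) can be written down as a selection function. This is a sketch under stated assumptions: the cut-offs are approximated as 14, 183 and 730 days, "weekly" is taken to mean the first snapshot in each ISO week, and "monthly" the first snapshot in each calendar month. The function name is hypothetical.

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, today):
    """Return the subset of snapshot dates the schedule would retain."""
    keep = set()
    seen_weeks, seen_months = set(), set()
    for d in sorted(snapshot_dates):
        age = (today - d).days
        week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        month = (d.year, d.month)
        if age <= 14:
            keep.add(d)  # every daily snapshot from the last two weeks
        if age <= 183 and week not in seen_weeks:
            keep.add(d)  # first snapshot of each week, about six months
        if age <= 730 and month not in seen_months:
            keep.add(d)  # first snapshot of each month, about two years
        seen_weeks.add(week)
        seen_months.add(month)
    return keep
```

Whatever storage backs the snapshots, anything outside the returned set is a candidate for expiry.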
Hi, Almost! $ git pull --depth 1 Though it needs a server _and_ a client supporting shallow clones, which support is brewed in "next" right now. Ciao, Dscho -
Will that actually discard old revisions that are already stored locally? -Steve -
Hi, No. A pull should _never_ lose anything from the repository. However, if some objects become no longer reachable (and at the moment it looks like we cut off history, even if we should not need to), they can be pruned from the repo. Hth, Dscho -
Steven, I've been thinking myself of writing a pdumpfs lookalike that uses git internally. Sounds like you've got one already ;-) In terms of getting rid of old history, have you considered moving a graft point "forward" in time, and running git-repack -a -d? With your history being (mostly?) linear this could be a workable scheme, but I don't have much practice with using grafts. cheers, martin -
Actually - what I was considering was mixing the "daily commit" with GITFS ;-) http://www.sfgoth.com/~mitch/linux/gitfs/ are your scripts published anywhere? cheers, martin -
Why not use N independent branches? I'd illustrate only with
two levels below, but you could:
(0) make a full tree snapshot. Store the commit in 'daily'
branch as its tip.
(1) A new day comes. Create an empty branch 'daily' if you
do not already have one. Make a full tree snapshot, and
create a parentless commit for the day if the 'daily'
branch did not exist, or make it a child of the 'daily'
commit from the previous day if the branch existed.
(2) End of week comes. Create an empty branch 'weekly' if you
do not already have one. Make a full tree snapshot, and
create a parentless commit for the week if the 'weekly'
branch did not exist, or make it a child of the 'weekly'
commit from the last week. Discard 'lastweek' branch if
you have one, and rename 'daily' branch to 'lastweek'.
At the end of month, you can rename 'weekly' to 'lastmonth'; if
you discard previous 'lastmonth' at this point, you essentially
made files older than two months drop off the backup disk. You
can add more hierarchy with longer period to extend the scheme
ad infinitum.
-
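The rotation of independent branches described above can be simulated without touching a repository. In this sketch each branch is a plain list standing in for a chain of commits, and the function names and snapshot labels are hypothetical; no git commands are run.

```python
def daily(branches, snapshot):
    """Step (1): append today's snapshot to 'daily', creating it if needed."""
    branches.setdefault("daily", []).append(snapshot)

def end_of_week(branches, snapshot):
    """Step (2): extend 'weekly', drop 'lastweek', rename 'daily' to it."""
    branches.setdefault("weekly", []).append(snapshot)
    branches.pop("lastweek", None)
    if "daily" in branches:
        branches["lastweek"] = branches.pop("daily")

def end_of_month(branches):
    """Drop the previous 'lastmonth' and rename 'weekly' to 'lastmonth'."""
    branches.pop("lastmonth", None)
    if "weekly" in branches:
        branches["lastmonth"] = branches.pop("weekly")

def simulate(days):
    """Run the daily and weekly steps for the given number of days."""
    branches = {}
    for day in range(1, days + 1):
        daily(branches, f"day{day}")
        if day % 7 == 0:
            end_of_week(branches, f"week{day // 7}")
    return branches
```

After `simulate(21)` the only daily history left is the most recent week, held under 'lastweek', while 'weekly' carries one entry per completed week.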
That sounds like it'd work, but doesn't it imply that the history of a
given file in the backups is not continuous? That is, an old copy of a
file on the "weekly" branch doesn't have any kind of ancestor
relationship with the same file on the "daily" branch? While that's
obviously no different than the current git-less situation where there's
no notion of ancestry at all, it'd be neat if this backup scheme could
actually track long-term changes to individual files.
I wonder if rebasing can get me what I want. Something like:
(1) Make a new branch from the latest daily. Commit a full tree
snapshot to the new branch. (Each branch has exactly one commit.)
(2) To expire a daily backup, rebase the second-oldest daily branch,
which will initially be a child of the oldest daily branch, under
the latest weekly branch instead. Delete the oldest daily branch.
I believe the right commands here would be:
git-rebase -s recursive -s ours --onto latest-weekly \
oldest-daily second-oldest-daily
git-branch -D oldest-daily
(Not sure about the double "-s", but I want it to detect renames
where possible and never flag any conflicts.)
(3) At the end of the week, instead of expiring the oldest daily
branch, rename it to indicate that it's now a weekly snapshot.
(That will implicitly do the first part of step 2, since the
next daily branch in line will already be a descendant of the
newly renamed branch.)
Repeat step 2, rebasing against the latest monthly branch,
to expire the oldest weekly.
(4) To expire an old monthly, rebase the second-oldest monthly branch
under the initial empty revision, then delete the oldest monthly.
This is basically step 2 again, but rebasing under a fixed starting
point.
(5) Run git-prune to expire the objects in the deleted branches, then
git-repack -a -d to delta-compress everything.
That's a bit convoluted, admittedly, and probably a perversion of ...
-
You can keep them connected by rewriting the history of a bounded
number of commits. When you start a new week, you would make
the Monday commit a child of the tip of the weekly branch that
represents the latest weekly snapshot. Then on Friday, the
history would show the 5 commits during the week and behind that
would be a sequence of commits with one-per-week granularity.
When you rotate the week's daily log out and the commit for
Monday is based on the weekly history you are going to toss out,
you may need to rebase that week's daily log branch.
Let's say your policy is to keep the daily log for at least one week
and a sufficient number of end-of-week weekly logs. Let's say it is
week #2 right now.
                   Aooo...          (week #2 daily)
                  /|
           ooooooB |                (week #1 daily)
          /        |
o--------o---------C                (end-of-week weekly log)
The first commit in this week's daily log (A) would have two
parents: last commit from daily log of week #1 (B), and the
latest commit on the end-of-week weekly log (C). Most likely, B
and C would have exactly the same tree. That way, you would
have at least 7 days of daily log; at the end of this week you
would have close to 14 days but "keeping at least one week" is
satisfied.
When starting the 3rd week, you will discard 1st week's log; you
would need to rewrite 7 days worth of commits from week #2,
because the first commit of week #2 should now only have one
parent (C), and you would forget the commit on the last day of
week #1 as its parent (B). Which cascades through 7 commits you
made during week #2. You are not changing any trees, so this
should be quite efficient.
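Why dropping one parent cascades through the whole week can be shown with a toy model. Here a commit id is just a hash of its tree id and parent ids; real commit objects also include author, date and message, so this is an illustration only. The function names and the tree and parent labels are hypothetical.

```python
import hashlib

def commit_id(tree, parents):
    """Toy commit id: a hash over the tree id and the parent ids."""
    data = tree + "|" + ",".join(parents)
    return hashlib.sha1(data.encode()).hexdigest()

def build_chain(trees, first_parents):
    """Commit ids for a linear chain whose first commit has first_parents."""
    ids, parents = [], list(first_parents)
    for tree in trees:
        cid = commit_id(tree, parents)
        ids.append(cid)
        parents = [cid]  # each later commit has the previous one as parent
    return ids

def rewrite_week(trees, old_parents, new_parents):
    """Return (old ids, new ids) for the same trees under different parents."""
    return build_chain(trees, old_parents), build_chain(trees, new_parents)
```

Calling `rewrite_week` with seven tree labels, old parents `["B", "C"]` and new parents `["C"]` yields seven new ids, since each commit's id depends on its parent's id. The tree labels passed in are reused untouched, which matches the observation that no trees change.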
Then the first daily commit of 3rd week would have two parents,
the commit at the end of week #2 daily branch (D), and a new
commit (E) at the tip of the end-of-week log. Again, D and E
would have the identical trees.
o...... (week #3 daily)
/|
...
-
You should be able to promote an insufficient-version index to a new-version index that needs to be refreshed for every entry. (And then update-index would take care of the necessary rewrite-everything in the normal way). But I suspect that the right thing is to require that the repository be created with a "commits-include-directories-not-trees" flag, and this means that you always use the extra-detailed index, and the options only affect what information is filtered out in transit between the directory object and the index. Having more information in the index is merely a potential waste of space, not a correctness issue (we have extra information for trees in the index now, remember); it just means that there are more things that will cause git to reread the file, rather than declaring it unchanged with a stat(). For that matter, it may be best for the directory objects to record what information in them is real, and keep the "what's content" mask in the index as well. If it changes over the history of a repository, you want to ... I was thinking this could be internal to the directory object, but you probably want to support hardlinks shared between dentries in different directory objects, so you're probably right that this makes sense. Alternatively, you could use a single "directory" object for the whole state (including subdirectories), making hardlinks out of the object clearly impossible, or you could use some scheme for sharing sub-"directory" objects that would imply that hardlinks are within an object (the hard part here is finding things when their locations aren't predictable by name). -Daniel *This .sig left intentionally blank* -
So, I've been making little repositories for appropriately related stuff. For example, I have a repository for my ~/.bashrc, ~/.bash_profile, ~/.bash_completions/*, and such. I recall Linus's post in the "VCS Comparison Table" thread, and after thinking about it, I decided the best thing to do would be to have a couple extra files tracked in the repository, alongside other data. I use a backup shell script to copy things from my system to the repository, and then I run getfacl on it all to write out all the details to a 'facl' file in my repository. Then I can make a commit. Then there's a restore shell script to copy things back to my system, and restore ownership and permissions with setfacl. I store the backup and restore scripts in the repository. Paths are currently hard-coded. I'm sure there's a more flexible way to do this, though I'd need some means of representing the correspondence between content in the repository and files in my filesystem. -- epistemological humility Chris Riddoch -
