>>>>> "bill" == bill lam <cbill.lam@gmail.com> writes:
bill> Hello, this should be a simple question. How do I back up a git
bill> repository while excluding files that are not under version
bill> control? If I cp, tar, or rsync the directory, all non-versioned
bill> files are included.
I'd rsync just the .git directory.
--
It sounds churlish to bash a film that's as bereft of bad
thoughts as a tiny puppy. But when the puppy licks your face for 108
minutes, enough's enough.
-- Matt Zoller Seitz, NY Times
--
Hi,

Note that this is necessary if you want to keep the reflogs (clone would not copy them).

Ciao,
Dscho
--
Johannes Schindelin wrote:
>> I'd rsync just the .git directory.

Thanks to all responders for the quick replies. I still have a related question. svn has a hotcopy command that ensures integrity, so it is possible to back up without shutting down the svn server. What if someone updates the .git directory while I am performing a backup using tar or rsync? Will the atomicity of that commit still be preserved in my backup copy?

regards,
--
There's the risk that the backup will start, copy all of the objects, then a git commit happens, which adds more objects (after rsync has passed) and updates a "refs" entry to refer to one of them, and then rsync copies the "refs" directory. It's likewise possible to have part of the information for a commit copied and part of it not. This commit will be clearly broken, however (one or more objects not found).

So, essentially, every commit goes through the stages of not written at all, partially written but invalid, and valid and correct. Independently, which commit is the latest is updated atomically. It's possible for an ill-timed backup to get a branch updated to a commit that's not yet valid in the backup. If you restored from this, you'd need to use one of several methods (mainly reflogs) to get back to the last valid commit that got backed up. On the other hand, git will never, even in this sort of backup, end up with a commit that's valid but not completely correct.

-Daniel
*This .sig left intentionally blank*
--
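The "clearly broken" state described above can be detected by running `git fsck` on the copy before trusting it. The following is only a minimal sketch, assuming a reasonably recent git (for the `-C` option). The throwaway repository, the paths, and the use of `cp -R` as a stand-in for rsync or tar are illustrative assumptions, not something from the thread. The missing object is simulated by deleting one loose object from the copy.

```shell
#!/bin/sh
set -e
WORK=$(mktemp -d)
SRC="$WORK/repo"            # hypothetical path, stands in for the live repository

# Throwaway repository with a single commit.
git init -q "$SRC"
git -C "$SRC" config user.name demo
git -C "$SRC" config user.email demo@example.com
echo hello > "$SRC/file.txt"
git -C "$SRC" add file.txt
git -C "$SRC" commit -q -m "first"

# Plain file-level copy of .git (stand-in for rsync/tar).
cp -R "$SRC/.git" "$WORK/copy.git"

# Simulate an ill-timed copy: the ref was copied, one object was not.
blob=$(git -C "$SRC" rev-parse HEAD:file.txt)
rm -f "$WORK/copy.git/objects/$(printf %s "$blob" | cut -c1-2)/$(printf %s "$blob" | cut -c3-)"

# git fsck exits non-zero when an object reachable from a ref is missing.
if git --git-dir="$WORK/copy.git" fsck --full >/dev/null 2>&1; then
    COPY_STATUS=ok
else
    COPY_STATUS=broken
fi
echo "copy is $COPY_STATUS"
```

In a real backup job the same `git fsck` call would run against the freshly copied directory, and a non-zero exit would be a signal to retry the copy or fall back to the reflogs as described above.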
I think the suggestions from old timers on this thread to first "git fetch" are meant to handle that concern. It may not get the commit that is being created simultaneously while such a fetch to the backup repository is running (but that will be backed up during the next round), but at least the contents of the backup repository would be self-contained and correct.

So a nightly fetch (perhaps with --mirror) into a backup repository, and then, after the fetch finishes, copying the backup repository to tape, would give you one copy a night. Copying out from the central repository to the backup repository would be incremental, and until you repack the backup repository, the tape backup of that backup repository could also be made incremental, as fetch will be append-only into its objects/ part, with updates to its refs/ part.
--
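The nightly-fetch scheme above might look roughly like the following sketch. It assumes a git recent enough to have `git clone --mirror` and `-C`; the paths and the throwaway "central" repository are hypothetical and exist only so the script runs on its own. In practice the one-time setup and the nightly step would live in separate scripts, with the nightly step run from cron.

```shell
#!/bin/sh
set -e
WORK=$(mktemp -d)
SRC="$WORK/central"         # hypothetical: the repository people commit to
BACKUP="$WORK/backup.git"   # hypothetical: the backup repository

# Stand-in for the real central repository.
git init -q "$SRC"
git -C "$SRC" config user.name demo
git -C "$SRC" config user.email demo@example.com
git -C "$SRC" commit -q --allow-empty -m "first"

# One-time setup: a bare mirror whose refs track the source one-to-one.
git clone -q --mirror "$SRC" "$BACKUP"

# Work continues in the central repository.
git -C "$SRC" commit -q --allow-empty -m "second"

# Nightly step: fetch only what is new; --prune drops refs deleted upstream.
git --git-dir="$BACKUP" fetch -q --prune origin

# After the fetch finishes, $BACKUP is what would be copied to tape.
echo "backup is at $(git --git-dir="$BACKUP" rev-parse HEAD)"
```

Because the fetch only accepts complete objects, the backup repository stays self-contained even if a commit is in progress on the source side. As noted elsewhere in the thread, reflogs are not transferred this way.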
Yeah, "git fetch" is the right solution (although it's a pain to do a backup of "every repo under <path>" or "every repo anywhere under <path>" that way, which I suspect is the real issue). I just wanted to get a note into this thread of what problems using rsync can and cannot have, since it's different (both more and less reliable) from what the original

And a simultaneous commit isn't really an issue; nothing will back up work you do right after the backup runs, and users can't tell whether they did the commit before or after the backup if it's close.

-Daniel
*This .sig left intentionally blank*
--
Hi,

[please do not cull the Cc: list]

No, rsync is particularly dumb in that respect. The safest thing would be to back up the reflogs first (e.g. with rsync), then repack and then clone (the clone will transmit the objects referenced by the reflogs, too).

Note: the same does _not_ hold true for a simple fetch.

But then, you usually do not want to back up reflogs anyway, since they are purely local and not visible to anybody else.

Ciao,
Dscho
--
Is there a simple and efficient mechanism for incremental backups? It should be safe with respect to simultaneous repository access. Incrementals should be efficient even when a user runs "git gc".

Preferably I would like to have one file per day: myrepo.YYYY-MM-DD.increment (and occasionally myrepo.YYYY-MM-DD.full)

I believe this is a common administration problem; something like a "git-backup" script/tool would be nice so that not every admin needs to consider these issues.

Currently, I'm doing daily clones of repos, and I preserve those cloned directories.

-- Heikki Orsila heikki.orsila@iki.fi http://www.iki.fi/shd
--
Hi,

Umm. "git fetch"?

Like I said, it does not get the reflogs, but if you want to back up a repository, the safest is to clone once, and fetch later. Or you could set up a remote with the --mirror option, if you want to preserve the refs' namespaces.

Ciao,
Dscho
--
Preferably some solution that does not require too much understanding of Git internals, so that admins will actually use it instead of hacking their own inefficient backup scripts. Could someone please write a "git-backup" script?-)

-- Heikki Orsila heikki.orsila@iki.fi http://www.iki.fi/shd
--
Hi,

Heikki, why don't you just go with the "git fetch" approach I described? We do not need "git backup" when "git fetch" already does what you want.

Ciao,
Dscho
--
I think that bundles (see git-bundle) would be what you want (please read GitFaq/GitTips/"Git in Nutshell" for explanation and use cases).

"git fetch" (perhaps using a bundle) would save the state of refs (heads, remote branches, tags) and the object repository. To save the state of the working area and index, I think it would be best to use 'git-stash' before creating the backup, and unstash after it; see the documentation for git-stash.

What is left is: reflogs, configuration, hooks, grafts and shallow, the repository-local excludes file, and the repository-local attributes file[*1*].

[*1*] Which is not mentioned in Documentation/repository-layout.txt

-- Jakub Narebski Poland ShadeHawk on #git
--
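As a rough illustration of the bundle suggestion above, a full single-file backup and a restore from it could be sketched as follows. The paths, the file name repo.bundle, and the throwaway repository are hypothetical; a git recent enough to support `-C` is assumed. As the message says, this captures refs and objects only, not reflogs, configuration, or hooks.

```shell
#!/bin/sh
set -e
WORK=$(mktemp -d)
SRC="$WORK/repo"            # hypothetical path to the repository being backed up

# Throwaway repository with a single commit.
git init -q "$SRC"
git -C "$SRC" config user.name demo
git -C "$SRC" config user.email demo@example.com
echo hello > "$SRC/file.txt"
git -C "$SRC" add file.txt
git -C "$SRC" commit -q -m "first"

# Full backup: every ref plus all objects reachable from them, in one file.
git -C "$SRC" bundle create "$WORK/repo.bundle" --all 2>/dev/null

# Sanity-check the bundle before storing it anywhere.
git -C "$SRC" bundle verify "$WORK/repo.bundle" >/dev/null 2>&1

# Restore: a bundle can be cloned from like any other remote.
git clone -q "$WORK/repo.bundle" "$WORK/restored"
echo "restored HEAD: $(git -C "$WORK/restored" rev-parse HEAD)"
```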
I need efficient (small) daily increment files. What I could do is something like:

1. rm -rf yesterday ; mv today yesterday
2. git clone yesterday today
3. cd today && git fetch /path/to/repo
4. create an increment file (from yesterday to today) or a full one

The above should just be something like (why not make this easy?):

1. git backup /path/to/repo /backup/location/

And restoration should be something like:

1. git backup --restore /backup/location/foo /path/to/repo

Am I missing something?

-- Heikki Orsila heikki.orsila@iki.fi http://www.iki.fi/shd
--
That was a mistake, a bare fetch is not enough.

-- Heikki Orsila heikki.orsila@iki.fi http://www.iki.fi/shd
--
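The one-file-per-day idea requested above can be approximated with bundles, as suggested earlier in the thread. This is only a sketch under stated assumptions: the marker tag last-backup, the file names myrepo.day1.full and myrepo.day2.increment, and all paths are hypothetical, and only a single branch is handled. Each increment covers the commits made since the marker tag was last moved, so its size depends on the new commits rather than on how the source repository happens to be packed.

```shell
#!/bin/sh
set -e
WORK=$(mktemp -d)
SRC="$WORK/repo"            # hypothetical path to the repository being backed up

git init -q "$SRC"
git -C "$SRC" config user.name demo
git -C "$SRC" config user.email demo@example.com
echo one > "$SRC/file.txt"
git -C "$SRC" add file.txt
git -C "$SRC" commit -q -m "day 1"
BRANCH=$(git -C "$SRC" symbolic-ref --short HEAD)

# Day 1: full bundle of the branch, then remember how far the backup reached.
git -C "$SRC" bundle create "$WORK/myrepo.day1.full" "$BRANCH" 2>/dev/null
git -C "$SRC" tag -f last-backup "$BRANCH" >/dev/null

# Day 2: new work, then a bundle holding only commits after the marker.
echo two >> "$SRC/file.txt"
git -C "$SRC" commit -q -am "day 2"
git -C "$SRC" bundle create "$WORK/myrepo.day2.increment" "last-backup..$BRANCH" 2>/dev/null
git -C "$SRC" tag -f last-backup "$BRANCH" >/dev/null

# Restore: clone the full bundle, then apply the increments in order.
git clone -q -b "$BRANCH" "$WORK/myrepo.day1.full" "$WORK/restored"
git -C "$WORK/restored" fetch -q "$WORK/myrepo.day2.increment" "$BRANCH"
git -C "$WORK/restored" merge -q --ff-only FETCH_HEAD
echo "restored HEAD: $(git -C "$WORK/restored" rev-parse HEAD)"
```

An increment can only be applied to a repository that already contains the commits it builds on, so the increments have to be replayed in the order they were created.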
Yeah, Heikki - I wonder if you're missing the point. In our case, we don't bother with repository backups here. Every developer has a full copy of the repository, and any one of them could be used to create a new "central" git repository.

If our central git repository goes down, we've got 9 others floating around on different laptops and computers. You can't beat that kind of redundancy. You'd have to nuke Utah to take us out (btw: please don't nuke Utah).
--
So you assume everyone syncs with everyone else often enough. I don't think many organizations want to rely on that assumption.

The point is to have a simple, efficient and manageable backup system that does _not_ require knowledge of Git internals.

-- Heikki Orsila heikki.orsila@iki.fi http://www.iki.fi/shd
--
Isn't that what Dscho and others have mentioned a few times now? Initially you git clone the repo, every few hours you have cron do a git pull, and daily you do 'rm yesterday.tar.gz && mv today.tar.gz yesterday.tar.gz && tar czvf today.tar.gz .git'. Why would git need a 'backup script' for something so trivial? I reckon everybody wants a different type of backup too, so creating a 'git backup' would probably not be very useful to most.

-- Cheers, Sverre Rabbelier
--
On Mon, May 12, 2008 at 10:13:27PM +0800, bill lam <cbill.lam@gmail.com> wrote:

The only problem can be when someone runs git-gc (as it runs git-prune) while you are doing the backup. Other commands never remove objects.
