Re: how to backup git

Previous thread: Why repository grows after "git gc"? / Purpose of *.keep files? by Teemu Likonen on Monday, May 12, 2008 - 8:29 am. (35 messages)

Next thread: [user confusion] git-config error messages by Ingo Molnar on Monday, May 12, 2008 - 10:16 am. (3 messages)
To: <git@...>
Date: Monday, May 12, 2008 - 9:11 am

&gt;&gt;&gt;&gt;&gt; "bill" == bill lam &lt;cbill.lam@gmail.com&gt; writes:

    bill&gt; Hello, this should be a simple question.  How to backup a git
    bill&gt; repository but excluding files that not under versioned?  If I
    bill&gt; cp or tar or rsync the directory.  All non-versioned files are
    bill&gt; added.

I'd rsync just the .git directory.

-- 
It sounds churlish to bash a film that's as bereft of bad
thoughts as a tiny puppy.  But when the puppy licks your face for 108
minutes, enough's enough.
        -- Matt Zoller Seitz, NY Times

--
To: Eric Hanchrow <offby1@...>
Cc: <git@...>
Date: Monday, May 12, 2008 - 9:28 am

Hi,


Note that this is necessary if you want to keep the reflogs (clone would 
not copy them).

Ciao,
Dscho

--
To: <git@...>
Date: Monday, May 12, 2008 - 10:13 am

Johannes Schindelin wrote:
 >> I'd rsync just the .git directory.

Thanks to all responders for the quick replies. I still have a related question. svn 
has a hotcopy command to ensure integrity, so that it is possible to back up 
without shutting down the svn server. What if someone updates .git while I am 
performing a backup using tar or rsync? Will the atomicity of that commit still 
be preserved in my backup copy?

regards,
--
To: bill lam <cbill.lam@...>
Cc: <git@...>
Date: Monday, May 12, 2008 - 5:19 pm

There's the risk that the backup will start, it will copy all of the 
objects, then a git commit happens, which adds more objects (after rsync 
has passed) and updates a "refs" entry to refer to one of them, and then 
rsync copies the "refs" directory.

It's likewise possible to have part of the information for a commit copied 
and part of it not. This commit will be clearly broken, however (one or 
more objects not found). 

So, essentially, every commit goes through the stages of not at all 
written, partially written but invalid, and valid and correct. 
Independently, which commit is the latest is updated atomically. It's 
possible for an ill-timed backup to get a branch updated to a commit 
that's not yet valid in the backup. If you restored from this, you'd need 
to use one of several methods (mainly reflogs) to get back to the last 
valid commit that got backed up.

On the other hand, git will never, even in this sort of backup, end up 
with a commit that's valid but not completely correct.
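
One way to detect the "partially written but invalid" state in a copied repository is to run git fsck on it, which reports refs that point at missing objects. A minimal sketch, assuming the backup is a plain copy of a .git directory; check_backup is a hypothetical helper name, not a git command:

```shell
# check_backup GIT_DIR
# Succeeds only if every object reachable from the refs (and reflogs)
# in GIT_DIR is present and well-formed.
check_backup() {
    git --git-dir="$1" fsck --full --no-progress
}
```

If the check fails and the reflogs were copied too, `git --git-dir=<backup> reflog show <branch>` lists the earlier values of the branch, and `git update-ref` can point the branch back at the newest one that is still intact.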

	-Daniel
*This .sig left intentionally blank*
--
To: Daniel Barkalow <barkalow@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 6:26 pm

I think suggestions from old timers on this thread to first "git fetch" is
to handle that concern.  It may not get the commit that is being created
simultaneously when such a fetch to backup repository is running (but that
will be backed up during the next round), but at least the contents of the
backup repository would be self contained and correct.  So a nightly fetch
(perhaps with --mirror) into a backup repository, and then after the fetch
finishes, copying the backup repository to tape, would give you one copy a
night.  Copying out from the central repository to backup repository would
be incremental, and until you repack the backup repository, the tape
backup of that backup repository could also be made incremental, as fetch
will be append-only into its objects/ part with updates to refs/ part.
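
The nightly routine described above might be sketched as follows. The function name backup_repo, the paths, and the archive naming are assumptions made for illustration; the git commands used are `git clone --mirror` and `git fetch`.

```shell
# backup_repo SRC MIRROR ARCHIVE_DIR   (hypothetical helper)
# Keeps a bare mirror of SRC in MIRROR, then archives the mirror.
backup_repo() {
    src=$1 mirror=$2 archive_dir=$3

    if [ ! -d "$mirror" ]; then
        # First run: a mirror clone copies every ref into a bare repository.
        git clone --quiet --mirror "$src" "$mirror" || return 1
    else
        # Later runs: fetch only what is new since the previous run.
        git --git-dir="$mirror" fetch --quiet origin || return 1
    fi

    # Archive the mirror only after the fetch has finished, so the
    # tarball never sees a repository that is being written to.
    mkdir -p "$archive_dir" || return 1
    tar -czf "$archive_dir/$(basename "$mirror" .git).$(date +%Y-%m-%d).tar.gz" \
        -C "$(dirname "$mirror")" "$(basename "$mirror")"
}
```

For example, `backup_repo /path/to/repo /backup/repo.git /backup/archives` could be run from cron once a night. As noted elsewhere in the thread, a fetch does not carry reflogs, so they are not part of this backup.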


--
To: Junio C Hamano <gitster@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 7:43 pm

Yeah, "git fetch" is the right solution (although it's a pain to do a 
backup of "every repo under <path>" or "every repo anywhere under <path>" 
that way, which I suspect of being the real issue). I just wanted to get a 
note into this thread of what problems using rsync can and cannot have, 
since it's different (both more and less reliable) from what the original 

And simultaneous commit isn't really an issue; nothing will back up work 
you do right after the backup runs, and users can't tell whether they did 
the commit before or after the backup if it's close.

	-Daniel
*This .sig left intentionally blank*
--
To: bill lam <cbill.lam@...>
Cc: <git@...>
Date: Monday, May 12, 2008 - 11:08 am

Hi,

[please do not cull the Cc: list]


No, rsync is particularly dumb in that respect.  The safest thing would be 
to back up the reflogs first (e.g. with rsync), then repack and then clone 
(the clone will transmit the objects referenced by the reflogs, too).  
Note: the same does _not_ hold true for a simple fetch.

But then, you usually do not want to back up reflogs anyway, since they 
are purely local and not visible to anybody else.

Ciao,
Dscho

--
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 11:27 am

Is there a simple and efficient mechanism for incremental backups?
It should be safe with respect to simultaneous repository access.
Incrementals should be efficient even when a user runs "git gc". 
Preferably I would like to have one file per day: 
myrepo.YYYY-MM-DD.increment (and occasionally myrepo.YYYY-MM-DD.full)
I believe this is a common administration problem; 
something like "git-backup" script/tool would be nice so that not all 
the admins need consider these issues.

Currently, I'm doing daily clones of repos, and I preserve those cloned 
directories.

-- 
Heikki Orsila
heikki.orsila@iki.fi
http://www.iki.fi/shd
--
To: Heikki Orsila <shdl@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 1:07 pm

Hi,


Umm.  "git fetch"?

Like I said, it does not get the reflogs, but if you want to back up a 
repository, the safest is to clone once, and fetch later.  Or you could 
set up a remote with the --mirror option, if you want to preserve the 
refs' namespaces.

Ciao,
Dscho
--
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 2:07 pm

Preferably some solution that does not require too much understanding of 
Git internals so that admins will actually use it, instead of hacking 
their own inefficient backup scripts.

Could someone please write a "git-backup" script?-)

-- 
Heikki Orsila
heikki.orsila@iki.fi
http://www.iki.fi/shd
--
To: Heikki Orsila <shdl@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 2:21 pm

Hi,


Heikki, why don't you just go with the "git fetch" approach I described?  
We do not need "git backup" when "git fetch" does already what you want.

Ciao,
Dscho

--
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Heikki Orsila <shdl@...>, bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 2:54 pm

I think that bundles (see git-bundle) would be what you want (please
read GitFaq/GitTips/"Git in Nutshell" for explanation and use cases).

"git fetch" (perhaps using bundle) would save state of refs (heads,
remote branches, tags) and object repository.  To save state of
working area and index I think it would be best to use 'git-stash'
before creating backup, and unstash after it; see documentation for
git-stash.  What is left is: reflogs, configuration, hooks, grafts and
shallow, repository local excludes file, repository local attributes
file[*1*].

[*1*] Which is not mentioned in Documentation/repository-layout.txt
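
A rough sketch of how git-bundle could produce single-file full and incremental backups. The helper names are hypothetical, and FILE is assumed to be an absolute path, because `git -C` changes directory before the bundle is written:

```shell
# full_bundle REPO FILE
# Packs every ref and all objects reachable from them into FILE.
full_bundle() {
    git -C "$1" bundle create "$2" --all
}

# incremental_bundle REPO FILE SINCE_REV BRANCH
# Packs only the commits on BRANCH that are not reachable from SINCE_REV.
# git refuses to create the bundle if there is nothing new to pack.
incremental_bundle() {
    git -C "$1" bundle create "$2" "$3..$4"
}
```

To restore, clone the full bundle (`git clone /backup/myrepo.full.bundle restored`) and then fetch each incremental bundle into it in order. Bundles carry only refs and objects, so the items listed above as "what is left" (reflogs, configuration, hooks and so on) would still need a separate copy.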
-- 
Jakub Narebski
Poland
ShadeHawk on #git
--
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 2:36 pm

I need efficient (small) daily increment files. What I could do is 
something like:

1. rm -rf yesterday ; mv today yesterday

2. git clone yesterday today

3. cd today && git fetch /path/to/repo

4. create an increment file (from yesterday to today) or a full

The above should just be something like (why not make this easy?):

1. git backup /path/to/repo /backup/location/

And restoration should be something like:

1. git backup --restore /backup/location/foo /path/to/repo

Am I missing something?

-- 
Heikki Orsila
heikki.orsila@iki.fi
http://www.iki.fi/shd
--
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 2:38 pm

That was a mistake, a bare fetch is not enough.

-- 
Heikki Orsila
heikki.orsila@iki.fi
http://www.iki.fi/shd
--
To: Heikki Orsila <shdl@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 2:58 pm

Yeah, Heikki - I wonder if you're missing the point.  In our case, we  
don't bother with repository backups here.  Every developer has a  
full copy of the repository, and any one of them could be used to  
create a new "central" git repository.  If our central git repository  
goes down - we've got 9 others floating around on different laptops  
and computers.  You can't beat that kind of redundancy.  You'd have to  
nuke Utah to take us out (btw: please don't nuke Utah).


--
To: Tim Harper <timcharper@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 3:10 pm

So you assume everyone syncs everyone else often enough. I don't think 
many organizations want to rely on that assumption. The point is to 
have a simple, efficient and manageable backup system that 
does _not_ require knowledge of Git internals.

-- 
Heikki Orsila
heikki.orsila@iki.fi
http://www.iki.fi/shd
--
To: Heikki Orsila <shdl@...>
Cc: Tim Harper <timcharper@...>, Johannes Schindelin <Johannes.Schindelin@...>, bill lam <cbill.lam@...>, <git@...>
Date: Monday, May 12, 2008 - 3:49 pm

Isn't that what Dscho and others have mentioned a few times now?
Initially you git clone the repo, every few hours you have cron do a
git pull and daily you do 'rm yesterday.tar.gz && mv today.tar.gz
yesterday.tar.gz && tar czvf today.tar.gz .git '. Why would git need a
'backup script' for something so trivial? I reckon everybody wants a
different type of backup too, so creating a 'git backup' would
probably not be very useful to most.
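
The daily rotation above could be wrapped up roughly like this. The function name and file layout are assumptions. Unlike the one-liner, this variant does not stop on the first run, when there is no yesterday.tar.gz for rm to remove, and it writes the new archive under a temporary name so that a failed tar never replaces a good copy:

```shell
# rotate_archive REPO DEST   (hypothetical helper)
# Keeps today's and yesterday's tarballs of REPO/.git in DEST.
rotate_archive() {
    repo=$1 dest=$2
    mkdir -p "$dest" || return 1

    # Build the new archive first, under a temporary name.
    tar -czf "$dest/new.tar.gz" -C "$repo" .git || return 1

    # Only then shift the previous archive to "yesterday".
    if [ -f "$dest/today.tar.gz" ]; then
        mv -f "$dest/today.tar.gz" "$dest/yesterday.tar.gz" || return 1
    fi
    mv -f "$dest/new.tar.gz" "$dest/today.tar.gz"
}
```

Pointing REPO at a clone that is updated by cron, rather than at the live repository, avoids archiving a .git directory while someone is committing to it.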

-- 
Cheers,

Sverre Rabbelier
--
To: bill lam <cbill.lam@...>
Cc: <git@...>
Date: Monday, May 12, 2008 - 10:54 am

On Mon, May 12, 2008 at 10:13:27PM +0800, bill lam <cbill.lam@gmail.com> wrote:

the only problem can be when someone runs git-gc (as it runs git-prune)
while you are doing the backup. other commands never remove objects.