Re: git gc & deleted branches

Previous thread: [PATCH] doc: clarify definition of "update" for git-add -u by Jeff King on Thursday, May 8, 2008 - 1:25 pm. (1 message)

Next thread: [PATCH] cvsexportcommit: Create config option for CVS dir by Trent Piepho on Thursday, May 8, 2008 - 5:26 pm. (3 messages)
To: <git@...>
Date: Thursday, May 8, 2008 - 1:45 pm

Hello,

I'm trying to reclaim space from an abandoned branch (never involved in 
any merge) using 'git gc', but it doesn't appear to work:

   mkdir testrepo
   cd testrepo
   git init
   dd if=/dev/urandom bs=1024k count=10 of=file
   git add file
   git commit -a -m 'initial checkin'
   git checkout -b test
   dd if=/dev/urandom bs=1024k count=10 of=file
   git commit -a -m 'branch checkin'
   git checkout master
   du -s .    # returns 30960
   git branch -D test
   git gc
   du -s .    # returns 30916

Here I had expected ~20000 since the branch uses ~10000.

My config is

[gc]
         reflogExpire = 0
         reflogExpireUnreachable = 0
         rerereresolved = 0
         rerereunresolved = 0
         packrefs = 1

I also tried 'git-pack-refs --all' or 'git-pack-refs --prune' but to no 
avail.

What am I doing wrong?

Thanks for any hints.

Regards

Guido
--
To: Guido Ostkamp <git@...>
Cc: <git@...>
Date: Thursday, May 8, 2008 - 2:39 pm

git-gc uses a "safe" pruning mode, where it only prunes unreferenced
objects that are older than a certain period (this makes it safe to run
git-gc, even if other processes are creating objects at the same time).

So try

[gc]
        pruneExpire = now

Alternatively, you can just run 'git prune' manually instead of 'git
gc'.

The 'git-pack-refs' commands won't help at all; they are purely about
moving refs from individual files into the 'packed-refs' file.

-Peff
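
Before changing the grace period it can help to see which value a
repository is actually using. A minimal sketch: the helper name
effective_prune_expire is made up for illustration, and the fallback
string is Git's documented default of two weeks, printed when nothing
is configured.

```shell
# Print the configured gc.pruneExpire value, or the documented
# default ("2.weeks.ago") if the option is not set anywhere.
effective_prune_expire() {
    git config --get gc.pruneExpire || echo "2.weeks.ago"
}
```

Run inside a repository, e.g. `effective_prune_expire`, before and
after `git config gc.pruneExpire now`.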
--
To: Jeff King <peff@...>
Cc: <git@...>
Date: Thursday, May 8, 2008 - 2:55 pm

Jeff, I tried it, but it has no effect (see below). There is only the 
master branch left, with only one commit in it, yet the repository still 
uses the space formerly occupied by the branch. I'm using git version 
1.5.5.1.147.g867f.

Any further ideas?


$ git config -l
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
gc.reflogexpire=0
gc.reflogexpireunreachable=0
gc.rerereresolved=0
gc.rerereunresolved=0
gc.packrefs=1
gc.pruneexpire=now

$ git gc
Counting objects: 6, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)

$ git prune

$ git branch
* master

$ du -s .
30820   .

$ git log
commit 9717437cdcb2a4457f28f41db5f6fad9ca55b54e
Author: Testuser <testuser@bianca.dialin.t-online.de>
Date:   Thu May 8 19:40:06 2008 +0200

     initial checkin

$ ls -l
total 10240
-rw-r--r-- 1 testuser users 10485760 May  8 19:40 file

Regards

Guido
--
To: Guido Ostkamp <git@...>
Cc: <git@...>
Date: Thursday, May 8, 2008 - 4:51 pm

It worked fine for me; it's possible, as Brandon mentioned, that it is
in a pack already, and only a "repack -a" would get rid of it. FWIW, my
steps were:

  # ...same as you for repo and branch creation
  git branch -D test
  git config gc.reflogexpire 0
  git config gc.reflogexpireunreachable 0
  git config gc.pruneexpire now
  git gc
  du -s .git ;# shows 10396

-Peff
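
For readers who want to try this end to end, here is a sketch of the
whole experiment as one throwaway script. It is not the exact sequence
from the messages above: it assumes a reasonably recent Git, uses
1 MiB files instead of 10 MB so it runs quickly, expires the reflogs
with an explicit 'git reflog expire' instead of config options, and
uses 'git gc --prune=now'. The identity values are placeholders. All
of this deliberately bypasses Git's safety period, so it is only
appropriate for a scratch repository.

```shell
#!/bin/sh
# Sketch: reclaim space from a deleted branch in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"            # placeholder identity
git config user.email "test@example.invalid"

head -c 1048576 /dev/urandom > file         # 1 MiB of incompressible data
git add file
git commit -q -m 'initial checkin'
main=$(git symbolic-ref --short HEAD)       # default branch name varies

git checkout -q -b test
head -c 1048576 /dev/urandom > file
git commit -q -a -m 'branch checkin'
git checkout -q "$main"
git branch -q -D test

before=$(du -sk .git | cut -f1)
# HEAD's reflog still records the branch commit, so expire it first.
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now
after=$(du -sk .git | cut -f1)
echo "before: ${before} KiB, after: ${after} KiB"
```

Measuring .git rather than the whole working directory keeps the
checked-out copy of the file out of the numbers.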
--
To: Guido Ostkamp <git@...>
Cc: Jeff King <peff@...>, <git@...>
Date: Thursday, May 8, 2008 - 4:07 pm

Possibly that object got packed? git-prune only removes loose objects.
Try 'git gc --prune' which will call git-repack with the -a option.

btw, this is _really_ a non-issue. It seems to keep coming up on the list.

Just know that each one of the config options that you set to zero, including
the one Jeff suggested setting to "now", is a safety mechanism that is there
to ensure that you never ever lose data and that mistakes are recoverable.

And be assured that the objects referenced by a deleted branch will be removed
from the repository eventually as long as 'git gc --prune' is run periodically.

-brandon
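
Whether the leftover objects are loose or packed can be checked with
'git count-objects -v'. The small wrapper below is only a sketch (the
function name object_summary is invented here); it condenses that
output to the two numbers that matter for this discussion.

```shell
# Print loose vs. packed object counts for the repository in the
# current directory, based on 'git count-objects -v'.
object_summary() {
    git count-objects -v | awk -F': ' '
        $1 == "count"   { loose = $2 }
        $1 == "in-pack" { packed = $2 }
        END { printf "loose=%s packed=%s\n", loose, packed }'
}
```

If the loose count is zero but the packed count has not gone down,
the unreferenced objects are sitting in a pack, where 'git prune'
does not touch them.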
--
To: Brandon Casey <casey@...>
Cc: Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 4:56 pm

Yes, I want to chime in since I have been giving advice in such threads:
Please don't construe my help as any sort of endorsement of this
behavior. Git tries hard not to lose your data, and it is almost always
a bad idea to try to override these safety checks unless you really know
what you are doing.

And even then, try to consider balancing a bit of freed disk space (and
generally _no_ performance gain, because git is very good about not
looking at objects that aren't necessary to the current operation)
versus thinking "oops, I wish I still had that data" in a few days.

I can think offhand of only one time when it was truly useful for me to
prune aggressively, and it was a very special case: a pathologically
large repo for which I was doing a one-shot conversion from another
format (and I wanted to prune failed attempts).

-Peff
--
To: Brandon Casey <casey@...>
Cc: Jeff King <peff@...>, <git@...>
Date: Thursday, May 8, 2008 - 4:52 pm

I am aware of this. However, at work I am unfortunately bound to a very 
restrictive filesystem quota on central development servers, so every 
single byte counts in (our official versioning control system is ClearCase 
where less space is required due to working tree and history being 

Ok. I did not know about the 'prune' option yet, as it is mentioned 
neither in the "Git Tutorial" nor in "Everyday Git"; both use only 
'git gc' with no options.

Regards

Guido
--
To: Guido Ostkamp <git@...>
Cc: Brandon Casey <casey@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:01 pm

It is deprecated; see 25ee9731.

According to that commit message, prune is now a no-op. However, it
looks like it is still used to trigger a "repack -a" rather than
"repack -A". I don't know if it is worth making that behavior available
through some more sane command line option (I would think people who
really know that they want "repack -a" would just call it).

-Peff
--
To: Jeff King <peff@...>
Cc: Brandon Casey <casey@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:33 pm

I tried to look at this but found this option '-A' to be undocumented in 
the manpage (git/Documentation/git-repack.txt and what is generated from 
it).

Regards

Guido
--
To: Jeff King <peff@...>
Cc: Guido Ostkamp <git@...>, Brandon Casey <casey@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:15 pm

Well, actually this is a problem.

I think it is a good thing to deprecate gc --prune. But if that means 
that repack -a is never used, then unreferenced and expired objects will 
never be pruned once they're packed, if one is always using 'git gc' as 
we are advocating.


Nicolas
--
To: Nicolas Pitre <nico@...>
Cc: Guido Ostkamp <git@...>, Brandon Casey <casey@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:17 pm

I thought that -A would eventually put them all into a single pack,
killing off the old packs.

-Peff
--
To: Jeff King <peff@...>
Cc: Nicolas Pitre <nico@...>, Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:23 pm

'-a' puts everything in a single pack and kills off old packs. Anything that
was unreachable is not repacked in the new pack.

'-A' does the same thing but it also repacks the unreachable objects that were
previously packed.

So if something gets packed that subsequently becomes unreachable it will never
be removed unless 'repack -a' is used.

Possibly --keep-unreachable should instead unpack the unreachable items which would
allow them to eventually be pruned based on pruneExpire. Then we could indeed
get rid of the --prune option to git-gc.

-brandon
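
The packed case described above can be reproduced in a scratch
repository. This is a sketch under assumptions: it relies on a
reasonably recent Git, the file contents and identity values are
arbitrary placeholders, and the reflogs are expired explicitly so
that the branch commit really is unreachable. It packs the branch
objects first, deletes the branch, and then compares what 'git prune'
and 'git repack -a -d' each leave in the pack.

```shell
#!/bin/sh
# Sketch: 'git prune' leaves packed unreachable objects alone,
# while 'git repack -a -d' drops them from the new pack.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"            # placeholder identity
git config user.email "test@example.invalid"

echo one > file
git add file
git commit -q -m 'initial checkin'
main=$(git symbolic-ref --short HEAD)

git checkout -q -b test
echo two > file
git commit -q -a -m 'branch checkin'
git repack -a -d -q                         # branch objects are now packed

git checkout -q "$main"
git branch -q -D test
git reflog expire --expire=now --expire-unreachable=now --all

git prune --expire=now                      # only considers loose objects
packed_before=$(git count-objects -v | sed -n 's/^in-pack: //p')

git repack -a -d -q                         # repacks reachable objects only
packed_after=$(git count-objects -v | sed -n 's/^in-pack: //p')

echo "in-pack after prune: $packed_before, after repack -a -d: $packed_after"
```

The first number still includes the commit, tree, and blob of the
deleted branch; the second does not.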
--
To: Brandon Casey <casey@...>
Cc: Nicolas Pitre <nico@...>, Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:31 pm

Ah, indeed. I hadn't looked closely at the -A behavior before. So yes,
we are never killing off prunable packed objects. Probably we could use
the same solution as "git prune --expire"; perhaps a
"--keep-unreachable=2.weeks.ago"?

-Peff
--
To: Jeff King <peff@...>
Cc: Nicolas Pitre <nico@...>, Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:40 pm

The 'prune --expire' behavior is based on object mtime (i.e. file modification time).
That is lost once something is packed, right?

I was thinking that either repack or pack-objects could be modified to unpack those
unreachable objects and leave them loose, and also give them the timestamp of the
pack file they came from. Then the --expire behavior of git-prune could work normally
and remove them. This seems like it would work nicely since prune follows repack in
git-gc.

-brandon
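
To make the mtime point concrete: loose objects are plain files
under .git/objects/, so their age can be inspected with ordinary
tools. The helper below is an illustration only (the name
stale_loose_objects is invented); it lists loose object files whose
modification time is older than a given number of days. It does not
check reachability, which 'git prune' does in addition to the age
test.

```shell
# List loose object files older than the given number of days.
# Must be run from the top level of a non-bare repository.
stale_loose_objects() {
    find .git/objects/[0-9a-f][0-9a-f] -type f -mtime +"$1" 2>/dev/null
}
```

For example, `stale_loose_objects 14` roughly corresponds to the age
cutoff of a two-week expiry. Objects inside a pack have no such
per-object timestamp, which is the problem raised above.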

--
To: Brandon Casey <casey@...>
Cc: Nicolas Pitre <nico@...>, Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:44 pm

Yes. You would have to use the pack mtime. But of course you would have
to actually _leave_ them in a pack, or they would just keep getting

That is sensible, I think.

-Peff
--
To: Jeff King <peff@...>
Cc: Nicolas Pitre <nico@...>, Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 5:53 pm

I had the impression that unreachable objects would not be packed. Maybe it
was more of an assumption.

-brandon

--
To: Brandon Casey <casey@...>
Cc: Nicolas Pitre <nico@...>, Guido Ostkamp <git@...>, <git@...>
Date: Thursday, May 8, 2008 - 6:48 pm

Look in builtin-pack-objects.c:1981-1982. We basically just say "if it's
in a pack now, then it should go into the new pack."

-Peff
--