Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior

Previous thread: [PATCH v2 07/11] gitweb: add 'remotes' action by Giuseppe Bilotta on Thursday, November 13, 2008 - 6:49 pm. (15 messages)

Next thread: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior by Brandon Casey on Thursday, November 13, 2008 - 7:20 pm. (1 message)
To: <git@...>
Date: Thursday, November 13, 2008 - 7:22 pm

Once upon a time, repack had only a single option which began with the first
letter of the alphabet. Then, a second was created which would repack
unreachable objects into the newly created pack so that git-gc --auto could
be invented. But, the -a option was still necessary so that it could be
called every now and then to discard the unreachable objects that were being
repacked over and over and over into newly generated packs. Later, -A was
changed so that instead of repacking the unreachable objects, it ejected
them from the pack so that they resided in the object store in loose form,
to be garbage collected by prune-packed according to normal expiry rules.

And so, -a lost its raison d'etre.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
---

This is on top of bc/maint-keep-pack

-brandon

Documentation/git-repack.txt | 25 ++++++++++++-------------
git-repack.sh | 8 ++++----
2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index aaa8852..d04d5c2 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -32,21 +32,20 @@ OPTIONS
pack everything referenced into a single pack.
Especially useful when packing a repository that is used
for private development and there is no need to worry
- about people fetching via dumb protocols from it. Use
- with '-d'. This will clean up the objects that `git prune`
- leaves behind, but `git fsck --full` shows as
- dangling.
+ about people fetching via dumb protocols from it. If used
+ with '-d' , then any unreachable objects in a previous pack will
+ become loose, unpacked objects, instead of being left in the
+ old pack. Unreachable objects are never intentionally added to
+ a pack, even when repacking. This option prevents unreachable
+ objects from being immediately deleted by way of being left in
+ the old pack and then removed. Instead, the loose unreachable
+ objects will be pruned ...

To: Brandon Casey <casey@...>
Cc: <git@...>
Date: Thursday, November 13, 2008 - 8:02 pm

I didn't check all the (proposed) commits for that branch, so just let
me know if I'm missing anything, but doesn't this change mean that you
just lose what "-ad" did?

We have:
-a   Create a new pack, containing all reachable objects
-A   Same as -a
-ad  Same as -a, and drop all old packs and loose objects
-Ad  Same as -ad, but keep unreachable objects loose

-Ad is nice regarding its safety-net value, but e.g. after a large
filter-branch run, when refs/original and the reflogs have been cleaned,
you just want to get rid of all those old unreachable objects,
immediately. For example after importing and massaging some large
history from SVN, the -Ad behaviour is definitely _not_ what I want
there. Writing a few thousand loose objects just to prune them is just a
waste of time.
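
The two variants being compared can be sketched as shell functions. This is only an illustration, assuming it is run inside a repository whose refs/original and reflogs have already been cleaned; the function names are made up for this example and are not part of git.

```shell
# Hypothetical helper names; only the git commands inside them are real.

# -a -d: write one new pack of reachable objects and delete the old packs.
# Unreachable objects that lived only in the old packs disappear with them.
repack_discard() {
    git repack -a -d
}

# -A -d: same new pack, but unreachable objects from the old packs are
# written out as loose files first, so a separate prune has to remove them.
repack_eject_then_prune() {
    git repack -A -d &&
    git prune
}
```

The second function does strictly more work on a repository with many unreachable objects, which is the cost described above.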

Björn
--

To: Björn Steinbrink <B.Steinbrink@...>
Cc: <git@...>
Date: Thursday, November 13, 2008 - 8:53 pm

hmm. That's a good point. Even though I think it is likely that the thousand
loose objects that are written will be small commit objects and not blobs,
this use case may be enough to trump the safety benefit provided by the
proposed change.

-brandon

--

To: Brandon Casey <casey@...>
Cc: Björn <B.Steinbrink@...>, <git@...>
Date: Thursday, November 13, 2008 - 10:22 pm

The problem is even small commit objects take a full 4k (or whatever
your filesystem block size is) when they are ejected as loose objects.
As a result, the current "git gc" defaults can end up requiring far
*more* disk space than before, certainly while it is running, and
sometimes even after the "git gc" completes. (I then end up running
"git prune" to complete deletion of the ejected objects.)
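
As a rough illustration of the block-size effect, the arithmetic can be sketched in shell. The object count and average size below are made-up numbers, the function name is hypothetical, and the estimate ignores inode and directory overhead.

```shell
# Estimate disk space used by loose objects when every file occupies
# whole filesystem blocks.
# Arguments: object count, average object size in bytes, block size in bytes.
loose_disk_usage() {
    count=$1
    size=$2
    block=$3
    # Round each object up to a whole number of blocks.
    blocks_per_object=$(( (size + block - 1) / block ))
    echo $(( count * blocks_per_object * block ))
}

# 5000 objects of about 300 bytes each on a 4096-byte block filesystem.
loose_disk_usage 5000 300 4096   # prints 20480000
```

Under these assumed numbers, about 1.5 MB of object data occupies roughly 20 MB once ejected as loose files.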

Sometimes this gets so annoying that I'll run the individual commands
run by git-gc by hand, except I use git repack -ad instead of git
repack -A. If we are going to get rid of the distinction between git
repack -a and git repack -A, perhaps there can be a config option to
force the immediate ejection of the unreachable objects, instead of
creating loose objects?
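
The by-hand sequence described above might look roughly like this. It is a sketch that assumes git gc runs pack-refs, reflog expire, repack, prune and rerere gc in that order; the exact steps and options can differ between git versions, and the function name manual_gc is made up for illustration.

```shell
# Hypothetical wrapper; run inside the repository to be cleaned up.
manual_gc() {
    git pack-refs --all --prune &&
    git reflog expire --all &&
    # -a -d here instead of -A: unreachable packed objects are dropped with
    # the old packs rather than being ejected as loose files.
    git repack -a -d -l &&
    git prune &&
    git rerere gc
}
```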

If the goal is safety, it would be nice if git repack could create a
separate pack that only contained unreachable objects, and then have
git prune be able to remove a pack if it only contains unreachable
objects.

- Ted
--

To: Brandon Casey <casey@...>
Cc: <git@...>
Date: Thursday, November 13, 2008 - 9:25 pm

No, actually I just totally ignored the fact that -a of course already
deletes the loose objects. The packed unreachable objects are in the old

When you only fix up merge commits, author information and such things,
then yes, most objects will be commits. And then it's not even that bad.

But a more interesting case is when in your old SCM you had multiple
projects in one repo, and you can't sanely separate them before the
import. So you might end up using the subdirectory filter a few times,
or even just drop a bunch of branches in each copy of your import.

And another one is when you had accidentally committed some huge, useless
files, and as you're switching to git now anyway, you want to get rid of
them, so you use an index-filter to drop them.

For those two cases, -Ad vs -ad can make a huge difference. I remember
someone on #git using a subdirectory filter on some project and trying
to get the repo to a sane size afterwards. -Ad took basically forever,

IMHO, "git gc" already provides enough safety. I tend to see "gc" as the
regular "just use it" tool, while repack gives me more control over how
I want things to be done, without forcing me to use the real plumbing or
to fumble around with the configuration for gc. And when I want control,
I'm generally prepared to shoot myself in the foot.

Björn
--

To: Björn Steinbrink <B.Steinbrink@...>
Cc: <git@...>
Date: Thursday, November 13, 2008 - 9:36 pm

Actually, I had forgotten that repack deletes any loose objects at all.

I think you're right. Thanks for providing an example of a real use case.

-brandon

--

To: Brandon Casey <casey@...>
Cc: <git@...>
Date: Thursday, November 13, 2008 - 9:48 pm

Ugh, right. -a does not delete loose objects without -d. So, ignoring
the .keep stuff, my initial description was even right and I just
confused myself afterwards :-/

Thanks,
Björn
--
