As you know, repo.or.cz uses alternates in order to reduce the space that the
repositories of forked projects require.

Recently, it happened that a fork (4msysgit.git) became broken because it was
using an object that was pruned away from the repository that it was
borrowing from (mingw.git). This happened even though 4msysgit did not use
the branch of mingw.git that was rebased and whose objects were pruned. The
reason is that a merge in 4msysgit.git resulted in a blob that was also in
the rebased branch.

To avoid such situations I propose to introduce "attic" packs. They contain
objects that are unreachable by the local set of refs. Otherwise they are
used like regular packs.

git-repack produces "attic" packs like this:

- Places objects of the local object store that are unreachable in an "attic"
  pack.
- Copies objects that are reachable but borrowed from an alternate and are
  only in the alternates' "attic" packs into the local regular pack.

git-prune removes "attic" packs.

Then the strategy of garbage collection can be arranged in the following way:

- Repack by starting at the "most complete" repo and work towards the "most
  borrowing" ones. During this phase "attic" packs are created. Borrowing repos
  get a chance to salvage objects before the alternates prune them away.
- Prune by starting at the "most borrowing" repo and work towards the "most
  complete" ones. During this phase the "attic" packs are cleaned up.

What do you think? Is this a way for a solution?
-- Hannes
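
The two-phase ordering proposed above can be sketched in Python. This is only an illustration, assuming the alternates relationships are available as a mapping from each repository to the repositories it borrows from; the gc_orders helper and the mapping layout are hypothetical and not part of git or repo.or.cz. It relies on graphlib from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def gc_orders(alternates):
    """Return (repack_order, prune_order) for a set of repositories.

    `alternates` maps each repository name to the repositories it
    borrows objects from (hypothetical layout, for illustration only).
    """
    # static_order() lists every lender before the repos that borrow
    # from it, i.e. "most complete" first, "most borrowing" last.
    repack_order = list(TopologicalSorter(alternates).static_order())
    # Pruning walks the same chain in the opposite direction.
    prune_order = list(reversed(repack_order))
    return repack_order, prune_order


alternates = {
    "mingw.git": [],
    "4msysgit.git": ["mingw.git"],
}
repack_order, prune_order = gc_orders(alternates)
print("repack:", repack_order)
print("prune: ", prune_order)
```

With this ordering a borrowing repository is repacked after its alternate has created its "attic" packs, and it is pruned before the alternate discards them.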
-
I would imagine that would work as long as it can be controlled
when all the involved repositories are repacked and pruned, such
as on repo.or.cz case (but on the other hand it is not really
controlled well there and that is the reason you wrote the
message X-<).
-
Well, I think in many situations pack and prune can be controlled. To be
precise, if alternates are used pack and prune *must* be controlled.
Currently, the control is very simple: "don't prune" (and I don't recall ATM
what you must not do when you repack).

Anyway, judging from the responses so far it seems that people can live
with "don't prune" (or not using alternates) ;-) Repositories getting broken
this way isn't exactly my itch, either, so... I spelled out a possible
solution if someone wants to pick up the topic.

-- Hannes
-
Because my point was not "don't prune is good enough", I think
you are judging from too small a number of responses (in fact,
zero).

My point was that even the existing setup that is well known to
the public (i.e. repo.or.cz) does not seem to be controlled, and
adding a nicer mechanism (e.g. I do not think there currently is
a canned way to prepare a pack that contains only unreachable
objects --- you need to script it anew) for a better control may
not help the situation much, unless it is actually used.
-
When specifying --attic=<prefix>, the objects that would be lost when
calling repack with -d will be put into a packfile (or multiple
packfiles), using the file name prefix <prefix>.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

This implements the idea of Hannes.

The plan for repo.or.cz is now to invoke repack with
"--attic=attic", copy attic-*.{idx,pack} to all the forks'
object stores, then delete the original attic-*.{idx,pack}.

The beauty of that approach is that the order in which the
repositories are repacked is no longer important.

This patch is marked RFC since there is a severe bottleneck
here: the new pack's index is sorted and made unique and every
SHA-1 displayed twice, then the old pack's index is sorted and
made unique. Then the combined result is sorted and only the
now-unique SHA-1s are actually packed.

(The sort is not necessary if there is only _one_ pack.
However, we cannot guarantee that.)

Of course, this is quick 'n dirty, and the price to be paid
is a substantial performance hit: in my tests, linux-2.6.git
needed half a second to show its pack's index, but that
sed 's/^.* //' | sort | uniq | sed p mantra needs 19 seconds.

The obvious thing is to exploit the fact that the pack indices
are already sorted:

I started to patch git-show-index so it takes an argument
--missing-objects, followed by the new pack index file names,
followed by --, followed by the old pack index file names.

Then it would traverse all of them simultaneously, outputting
only the SHA-1s of objects that are in an old pack, but not
in any of the new packs.

Two issues: there might be a whole lot of pack files (Pasky
told me today that in one instance there were 416 pack files!)
and that might well exceed the maximum number of open files.

Second issue: there are two different pack index formats, and
the code is not easily refactored AFAICT.

Probably a better method would be not to read the f...
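
The simultaneous traversal described above can be sketched in Python as a merge over already-sorted SHA-1 lists. This is not the git-show-index patch itself, only an illustration; the missing_objects name is hypothetical, and each index is assumed to be an iterable of hex SHA-1 strings in ascending order.

```python
import heapq


def missing_objects(new_indices, old_indices):
    """Yield SHA-1s found in some old index but in none of the new ones.

    Every index must already be sorted ascending, so a single pass
    over the merged streams is enough and no global sort is needed.
    """
    new_iter = heapq.merge(*new_indices)
    old_iter = heapq.merge(*old_indices)
    new_sha = next(new_iter, None)
    last = None
    for sha in old_iter:
        if sha == last:
            continue  # the same object sits in more than one old pack
        last = sha
        # Advance the new stream until it catches up with `sha`.
        while new_sha is not None and new_sha < sha:
            new_sha = next(new_iter, None)
        if new_sha != sha:
            yield sha


old = [["aa", "cc"], ["bb", "cc", "dd"]]
new = [["bb"], ["dd", "ee"]]
print(list(missing_objects(new, old)))
```

Note that this sketch keeps one iterator per index alive at the same time, so with real index files it would run into the same open-file limit mentioned above.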
Your 1.5 hours was spent wisely to come up with that idea ;-).
To make sure I understand your idea correctly, the procedure to repack a
repository in a fork-friendly way is:

(1) find the project directly forked from you;
(2) hardlink all packs under your object store to their object store;
(3) repack -a -l and prune.

I think that would work as long as you do the above as a unit and handle
one repository at a time. Otherwise I think you risk losing necessary
objects when hierarchical forks are involved. E.g. if you have a
project X that has a fork Y which in turn has fork Z.

* Step 1 is run for X, Y and Z.
* Step 2 is run for Y and Z.
* Step 3 is run for Z.

At this point, Z is still borrowing objects from Y and X through Y, and
it will not keep objects it is borrowing from X through Y. Then if the
procedure is intermixed like this, a bad thing happens.

* Step 2 is run for X.
* Step 3 is run for Y.
* Step 3 is run for X.

Step 3 for Y would lose objects Y was borrowing from X that were not
used by Y itself. At this point, Z is still usable as the objects it is
borrowing from X through Y have not been pruned from X. But Step 3 for X
will lose them, rendering Z unusable.
-
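
The X/Y/Z interleaving from the preceding message can be replayed with a toy model in Python. This is a sketch under simplifying assumptions, not how git stores objects: object stores are plain sets, "hardlink all packs" is a set union, and "repack -a -l and prune" keeps only locally stored objects that the repository's own refs reach. The Repo class and the object names are hypothetical.

```python
class Repo:
    def __init__(self, name, store, needed, alternate=None):
        self.name = name
        self.store = set(store)      # objects physically in this store
        self.needed = set(needed)    # objects reachable from local refs
        self.alternate = alternate   # repo this one borrows from, if any
        self.forks = []
        if alternate is not None:
            alternate.forks.append(self)

    def visible(self):
        """Objects available locally or through the alternates chain."""
        objs = set(self.store)
        if self.alternate is not None:
            objs |= self.alternate.visible()
        return objs

    def usable(self):
        return self.needed <= self.visible()


def step2(repo):
    """Hardlink all of repo's packs into its direct forks' stores."""
    for fork in repo.forks:
        fork.store |= repo.store


def step3(repo):
    """Model of 'repack -a -l' plus prune: drop unreachable objects."""
    repo.store &= repo.needed


def make_repos():
    # "old" is unreachable in X (e.g. rebased away) but still used by Z.
    x = Repo("X", {"a", "b", "old"}, {"a", "b"})
    y = Repo("Y", {"c"}, {"a", "c"}, alternate=x)
    z = Repo("Z", {"d"}, {"a", "c", "d", "old"}, alternate=y)
    return x, y, z


def run_as_unit():
    x, y, z = make_repos()
    for repo in (x, y, z):          # forkees first, one repo at a time
        step2(repo)
        step3(repo)
    return z.usable()


def run_interleaved():
    x, y, z = make_repos()
    step2(y); step2(z); step3(z)    # Z repacked before X's packs reach it
    step2(x); step3(y); step3(x)    # Y, then X, drop "old"
    return z.usable()


print("as a unit:  ", run_as_unit())
print("interleaved:", run_interleaved())
```

In this model Z only ends up with a local copy of "old" when each repository finishes steps 2 and 3 before the next one starts.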
Hi,
Exactly. See
http://repo.or.cz/w/repo.git?a=commitdiff;h=fba501deabd349afbe3b8bf89f38...
for a tired proposal.
Note that "prune" is not (yet) an option, since it could possibly destroy
objects which are needed in an ongoing push operation.

However, we could do exactly the same as with reflogs: introduce a grace
period (with loose objects, we can use the ctime...)

Well, in theory you could also iterate over all projects and hard link the
packs/objects of their alternates, and _then_ iterate and repack. But it
is simpler and more obvious in the case of repo.or.cz to do all in one
iteration, because we can order the repository names easily so that
forkees come first, _and_ we have an easy way to find out what are the
forks of a project.

Ciao,
Dscho
-
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

On Thu, 29 Nov 2007, Johannes Schindelin wrote:
> Note that "prune" is not (yet) an option [for repo.or.cz], since
> it could possibly destroy objects which are needed in an ongoing
> push operation.
>
> However, we could do exactly the same as with reflogs: introduce
> a grace period (with loose objects, we can use the ctime...)

and this patch does that (except using mtime instead of ctime, for
reasons explained in the commit message).

Obviously, this patch is asking for a cousin, changing
git-gc to use this option, and maybe introduce a config
variable gc.pruneAge.

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   18 ++++++++++++++++++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ OPTIONS
 \--::
 	Do not interpret any more argument...
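
The grace-period idea in the commit message above can be sketched in Python. This is not the builtin-prune.c code, only an illustration of the mtime comparison; the expired_loose_objects helper is hypothetical, and it assumes the caller has already worked out which loose objects are unreachable. The demo sets the mtime with os.utime, which plays a role similar to backdating a file in the test suite.

```python
import os
import tempfile
import time

DAY = 24 * 60 * 60


def expired_loose_objects(paths, max_age_seconds, now=None):
    """Return the paths whose st_mtime is older than the grace period.

    `paths` are assumed to be loose, unreachable object files; deciding
    reachability is outside the scope of this sketch.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    return [p for p in paths if os.stat(p).st_mtime < cutoff]


with tempfile.TemporaryDirectory() as tmp:
    stale = os.path.join(tmp, "stale-object")
    fresh = os.path.join(tmp, "fresh-object")
    for path in (stale, fresh):
        open(path, "w").close()
    now = time.time()
    # Backdate one file by 15 days; the other keeps its current mtime.
    os.utime(stale, (now - 15 * DAY, now - 15 * DAY))
    expired = expired_loose_objects([stale, fresh], 14 * DAY, now=now)
    print([os.path.basename(p) for p in expired])
```

Only the backdated file falls outside the 14-day window, so a freshly written object from an ongoing operation would be left alone.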
But I think you can use more portable -t for setting mtime to
1970/01/01, but I had a feeling that earlier we were bitten by
non-portability of "touch" and introduced test-chmtime.
-
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

On Thu, 29 Nov 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > The implementation uses st.st_mtime rather than st.st_ctime,
> > because it can be tested better, using 'touch -d <time>' (and
> > omitting the test when the platform does not support that
> > command line switch).
>
> But I think you can use more portable -t for setting mtime to
> 1970/01/01, but I had a feeling that earlier we were bitten by
> non-portability of "touch" and introduced test-chmtime.

Somehow that slipped by me. This patch uses test-chmtime.

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   17 +++++++++++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ OPTIONS
 \--::
...
Does this now make git-prune safe for automatic running?
I suppose you could still be actively manipulating refs that point to
very old objects.

-Peff
-
Hi,
That's why I want to have it configurable from git-gc.
Ciao,
Dscho
-
Here you could throw in:

	git prune --expire=1.hour.ago &&
	test 20 = $(git count-objects | sed "s/ .*//") &&
	test -f $BLOB_FILE &&

-- Hannes
-
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

On Thu, 29 Nov 2007, Johannes Sixt wrote:
> Johannes Schindelin schrieb:
> > +test_expect_success 'prune --expire' '
> > +
> > + BLOB=$(echo aleph | git hash-object -w --stdin) &&
> > + BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
> > + test 20 = $(git count-objects | sed "s/ .*//") &&
> > + test -f $BLOB_FILE &&
> > + git reset --hard &&
>
> Here you could throw in:
>
> git prune --expire=1.hour.ago &&
> test 20 = $(git count-objects | sed "s/ .*//") &&
> test -f $BLOB_FILE &&
>
> to test that the object is not pruned (and the alternate
> --expire syntax).

Good idea!

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   21 +++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-p...
