Re: [PATCH 5/6] Teach "fsck" not to follow subproject links

From: Dana How
Date: Thursday, April 12, 2007 - 11:32 am

On 4/11/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:

These arguments all seem pretty convincing to me --
maybe the problem is that I'm not a "*developer*" right now.
Instead I'm part of a multi-developer *site*.
Below I talk about a possible way we could use git
without changing it (since I recognize this would be a minority usage pattern).

We use perforce to manage a mixed hardware/software project
(I'm the 55GB check-out guy, remember?).  We have at least 3 different
kinds of data with different usage patterns, and using perforce for
everything in one centralized server was not the best solution.

Each user ("client") has their own worktree and the perforce
repository is on a shared central server.  You can consider perforce
to have the equivalent of git's index, but it is stored on the server,
in one file ("db.have") covering all clients.  Obviously that becomes a
bottleneck -- and recently db.have got larger than the total cache RAM on
the server, which really slowed things down until we moved to a larger
server.  But repository architecture aside, the real problem has been
perforce's usability.  Frequently one contributor, having gotten ahead
of the team, needs to share this more recent work with only a few
people.  This could be done with p4 branching, but that is really clunky.
So instead the work is pushed out (submitted) to everyone, causing
instability; this is partially remedied by doing it in smaller chunks.
Another perforce problem is that tagging consumes a lot of server
space (and may slow things down as well).

Some of this data will stay in perforce, some will move into revision
control built into some of our other tools, and I'd like to try to move some
of it into git.  The main attraction for the last group is the lightweight
branching that would allow early/tentative work to be easily shared.
I think the subproject work currently being discussed is going to
be very helpful as well -- the perforce equivalent is chaotic.

We could give each user a work tree and an object repository,
and then have a "release" repository.  Unfortunately, this would be
slower to use than the current perforce "solution": users would check
in to their local repository, at the speed of gzip, anyone checking
it out would do so at the speed of gzip, and all work would need
to be resubmitted (using perforce jargon here) to the central repo,
again at the speed of gzip.  Currently, people either submit or
check out from the central repo, and it's all done at the speed of
a network copy.  This speed issue is important because of
the size of a commit we'd like to share (but not yet release):
about 40 files, half of them control files of several KB each, 1/4 of them
design files of several MB each, and the last 1/4 detailed design
files 100X larger.  These 40 files will reference (include) 50 others
of several KB each sprinkled throughout the hierarchy, a few of which
might have changed.  And yes, almost all of these are generated files,
but the generation time, and the instability of the tool and script environment,
preclude forcing the other users to regenerate them, like you would
with a .o file.

So, there are 2 alternative set-ups. In one, everyone uses a shared
object repository (everyone's .git/objects is a symlink to it). In this
repository, objects/. , objects/?? , objects/pack , and objects/info all
have "sticky" set, and we do the appropriate machinations to make
all files read-only. There would be an additional phantom user "git"
who owns the shared object repository (the only user whose .git/objects
is not a symlink).  Users would commit to their own repositories,
which would write data to the shared object repository and
update their refs (e.g. HEAD). To "release", push to the ~git repository.
This push would be like a current push -- fast-forward only, figure out the list
of objects that need to be transmitted -- but instead of transmitting the
objects, change their ownership to ~git and then update ~git's refs.
Since users can share local commits, maybe the ~git ownership
change should happen at commit time.  This all seems doable
without changes to git; instead I'd add a few bash wrapper scripts
(and see below for fsck and pack/prune).
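
A minimal sketch of the first set-up, assuming that "sticky" here means
mode 1777 (world-writable plus sticky, as on /tmp).  The function names
init_shared_objects and link_user_objects are hypothetical, as is the
choice to move an existing objects directory aside rather than delete it.
The ownership change on release is left out, since chown to another user
normally needs privileges.

```shell
#!/bin/sh
# Hypothetical helpers; all paths are supplied by the caller.

# Create the shared object store.  The 256 fan-out directories are
# created up front so that each one carries the sticky bit: anyone may
# add objects, but only a file's owner (or the directory's owner) may
# remove them.  Mode 1777 is an assumption, not something git requires.
init_shared_objects() {
    shared=$1
    mkdir -p "$shared/pack" "$shared/info" || return 1
    i=0
    while [ "$i" -lt 256 ]; do
        mkdir -p "$shared/$(printf '%02x' "$i")" || return 1
        i=$((i + 1))
    done
    chmod 1777 "$shared" "$shared"/?? "$shared/pack" "$shared/info"
}

# Point a user's .git/objects at the shared store.  An existing private
# objects directory is kept as objects.local rather than removed.
link_user_objects() {
    gitdir=$1
    shared=$2
    [ -d "$gitdir" ] || return 1
    if [ -L "$gitdir/objects" ]; then
        return 0    # already a symlink; leave it alone
    fi
    if [ -e "$gitdir/objects" ]; then
        mv "$gitdir/objects" "$gitdir/objects.local" || return 1
    fi
    ln -s "$shared" "$gitdir/objects"
}
```

The ~git user would run init_shared_objects once on its own
.git/objects, and each user would then run link_user_objects with their
own GIT_DIR and that path.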

Another setup is like the previous, but make the central repo have
its own hidden object repository. You would push to it using the
standard git command.

Finally, users could run git-fsck [with misleading output];
they could run git-prune{,-packed}, but these commands wouldn't
be able to delete anything.  If we don't want users to pack,
then ~git/.git/objects/pack would be writable only by ~git.
So basically, normal people wouldn't do the things in this paragraph.

To do meaningful and safe fsck/prune on the shared repository
as ~git, I'd add some scripting.  If you require all users'
GIT_DIRs to look like /home/USER/*/.git , then you can get all
their refs and do a meaningful fsck.  If not, you could do an fsck
--unreachable as ~git and filter the result by date and/or type.
(This sort of corresponds to abandoned changesets in perforce.)
Once you have an fsck method you like, its filtered output (i.e.,
--unreachable objects you want to keep) can be fed to git-prune.
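
A rough sketch of the first variant, assuming every GIT_DIR really does
match HOME_ROOT/USER/*/.git and that ~git's own repository sits under the
same root so its refs are picked up too.  The function names are
hypothetical.  Only refs under refs/ are collected; a detached HEAD, the
index and reflogs are not covered here.

```shell
#!/bin/sh
# Hypothetical helpers for a shared-object-store fsck, run as ~git.

# List every GIT_DIR of the form <home_root>/USER/*/.git, one per line.
list_user_git_dirs() {
    home_root=$1
    for gitdir in "$home_root"/*/*/.git; do
        [ -d "$gitdir" ] && printf '%s\n' "$gitdir"
    done
    return 0
}

# Print the unique object names that any user's refs point at.
collect_user_heads() {
    list_user_git_dirs "$1" | while IFS= read -r gitdir; do
        git --git-dir="$gitdir" for-each-ref --format='%(objectname)'
    done | sort -u
}

# Report objects in the shared store that no collected ref can reach.
fsck_shared() {
    shared_gitdir=$1
    home_root=$2
    # Unquoted on purpose: each object name becomes one argument.
    git --git-dir="$shared_gitdir" fsck --unreachable \
        $(collect_user_heads "$home_root")
}
```

The same list of heads could be passed to "git prune -n" first, which
only reports what it would remove.  With very many refs the argument
list may get too long, so this is a starting point rather than a
finished tool.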

Care would also be required with git-repack/git-prune-packed,
but it seems mostly addressable with scheduling.

If I proceed down this path, I'd like to implement this procedure
without any change in git's .c or .sh files.  It's clear this is a
minority use and should not depend on anything being maintained
for it inside git.  I would write a few bash scripts and a README/HOWTO
for possible inclusion in contrib.

BTW,
has anyone ever thought of writing an "Administrator's Manual" for git?

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell