Re: [PATCH/v2] git-basis, a script to manage bases for git-bundle

Previous thread: [PATCH] bisect: error out when given any good rev that is not an ancestor of the bad rev by Christian Couder on Monday, June 30, 2008 - 3:42 pm. (11 messages)

Next thread: [PATCH] Teach "git apply" to prepend a prefix with "--root=<root>" by Johannes Schindelin on Monday, June 30, 2008 - 4:44 pm. (8 messages)
From: Adam Brewster
Date: Monday, June 30, 2008 - 3:49 pm

Git-basis is a perl script that remembers bases for use by git-bundle.
Code from rev-parse was borrowed to allow git-bundle to handle --stdin.

Signed-off-by: Adam Brewster &lt;adambrewster@gmail.com&gt;
---
As promised, here's another patch with documentation.  The code is
identical to the previous version.

I know this is a minor patch, but I think the result is a more usable
git-bundle feature, and I'd like to see it included in future releases
if there are no objections.

 Documentation/git-basis.txt |   82 +++++++++++++++++++++++++++++++++++++++++++
 bundle.c                    |   22 ++++++++++-
 git-basis                   |   71 +++++++++++++++++++++++++++++++++++++
 3 files changed, 173 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/git-basis.txt
 create mode 100755 git-basis

diff --git a/Documentation/git-basis.txt b/Documentation/git-basis.txt
new file mode 100644
index 0000000..3624890
--- /dev/null
+++ b/Documentation/git-basis.txt
@@ -0,0 +1,82 @@
+git-basis(1)
+============
+
+NAME
+----
+git-basis - Track sets of references available on remote systems (bases)
+
+SYNOPSIS
+--------
+[verse]
+'git-basis' &lt;basis&gt; [&lt;basis&gt;...]
+'git-basis' --update &lt;basis&gt; [&lt;basis&gt;...] &lt; &lt;object list or bundle&gt;
+
+DESCRIPTION
+-----------
+Maintains lists of objects that are known to be accessible on remote
+computer systems that are not accessible by network.
+
+OPTIONS
+-------
+
+basis::
+       List of bases to operate on.  Any valid filename can be
+       the name of a basis.  Bases that do not exist are taken
+       to be empty.
+
+--update::
+       Tells git-basis to read a list of objects from stdin and
+       add them to each of the given bases.  git-basis produces
+       no output when this option is given.  Bases will be created
+       if necessary.
+
+object list or bundle::
+       Git-basis --update reads object names, one per line from stdin.
+       Leading caret (&quot;^&quot;) characters are ignored, as is anything
+       after ...
From: Jeff King
Date: Tuesday, July 1, 2008 - 2:51 am

I don't use bundles myself, so I can't comment on how useful this is for
a bundle-based workflow. But it seems like a sensible idea in general.


When a new feature depends on other, more generic improvements
to existing code, it is usually split into two patches. E.g.,

  1/2: add --stdin to git-bundle
  2/2: add git-basis

with the advantages that:

 - it is slightly easier to review each change individually
 - it is easier for other features to build on the generic improvement
   without requiring part 2, especially if part 2 is questionable

As it happens in this case, I think in this case the change was already
easy to read, being logically separated by file, so I am nitpicking



Yikes. This fails if $d contains an apostrophe. You'd want to use
quotemeta to properly shell out. But there's no need at all to shell out


Why make a hash when the only thing we ever do with it is &quot;keys %new&quot;?

Style: I know we are not consistent within git, but it is usually better
to use local variables for filehandles these days. I.e.,


So the basis just grows forever? That is, each time we do a bundle and
basis update, we add a line for every changed ref, and we never delete
any lines. But having a commit implies having all of its ancestors, so
in the normal case (i.e., no rewind or rebase) we can simply replace old
objects if we know they are a subset of the new ones (which you can
discover with git-merge-base). For the rewind/rebase case, probably
these lists should get pruned eventually for non-existent objects.

But maybe it is not worth worrying about this optimization at first, and
we can see if people complain. In that case, it is perhaps worth a note

I don't think there are any portability issues with 'date' (especially
since it appears to be just a comment here, so we don't really care
about the format), but in general I think it is nicer to use perl's date

Notably absent: any tests.

-Peff
--

From: Adam Brewster
Date: Tuesday, July 1, 2008 - 6:36 pm

Hi Jeff,

Thank you for your feedback.  I have made most of the code changes you
suggested, and am in the process of writing tests, but it looks like
some others on the list have more serious objections, so I'll hold of
on that until I think it might actually be accepted.


Makes sense,  I thought it was small enough for one commit, but I'll

Okay.  I can also include a third patch for the code I cut-and-pasted.

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 11a7eae..73fe334 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -582,7 +582,7 @@ static void read_revisions_from_stdin(struct rev_info *revs)
        while (fgets(line, sizeof(line), stdin) != NULL) {
                int len = strlen(line);
                if (len &amp;&amp; line[len - 1] == '\n')
-                       line[--len] = 0;
+                       line[--len] = '\0';
                if (!len)
                        break;





If all goes well then you're right, but I thought old objects should
be kept around  in case the user has some reason to manually delete
them.  As it is, you can go into the basis file and delete everything
past a given date line and be back where you were.  If I delete the
redundant objects, then that's not always possible.

It'd be nice if it could prune old objects (maybe older than 6 months,
or settable by git-config) that are redundant, but I currently have no
need for such functionality.

I also hadn't thought about rebasing.  Objects that don't exist
shouldn't hurt anything though.  Just a waste of a little disk space.


Maybe I'm a idiot, but I can't find any built-in date to string
functions that do nice things like print the date the way the user
says he likes to look at dates.

I updated the comment line to be &quot;# &lt;git-date&gt; // `date`&quot; where
git-date is as per git-fast-import (seconds since 1969 +/-TZ).  If
automatic pruning ever happens, the git-date will be used, so `date`

Working on those.  I'll also include tests for ...
From: Jay Soffian
Date: Tuesday, July 1, 2008 - 7:10 pm

perldoc -f localtime

j.
--

From: Adam Brewster
Date: Tuesday, July 1, 2008 - 7:16 pm

But of course one function returns two very different things depending
on what's on the left side of the equals sign.  That makes perfect

Thank you.
Adam
--

From: Jay Soffian
Date: Tuesday, July 1, 2008 - 7:21 pm

Context should be the very first thing taught in any Perl tutorial,
lest ye end up in jail:


j.
--

From: Jeff King
Date: Tuesday, July 1, 2008 - 8:21 pm

I think in this instance it is not too big a deal either way.  I am just

Heh. I said &quot;usually&quot;, but I guess even Junio makes mistakes (unless I
am dreaming such a style directive, but ISTR it being mentioned before
on the list). I wouldn't bother with the style cleanup in rev-list
(usually for such small things, we just wait until touching that part of

Ah, true. And there will be duplicates here, if you have multiple refs
at the same spot in your bundle list. So it should remain as you have

Hmm, and that might be useful. Probably the best thing would be to leave
it as-is for now, then, with a note. Then we can decide the best pruning


Great. Glancing over Junio's comments, though, it might make sense to
integrate this more tightly with git-bundle, in which case the perl
stuff would go away. So I'll let you work out with him which is the best
route.

-Peff
--

From: Jakub Narebski
Date: Wednesday, July 2, 2008 - 2:44 am

Well, there is one situation where either separate git-bases program
(which is a good start; it can be named git-bundle--bases; there are
some precedents for that ;-)), or allowing to create 'bases' file
without creating bundle would be good to have.  Namely situation
where two computers are _sometimes off-line (disconnected)_.  If you
want to transfer new commits from machine B to machine A, you would
generate 'bases' file on machine A, then transfer this file using some
off-line medium, then generate bundle on machine B using those bases,
etc.

-- 
Jakub Narebski
Poland
--

From: Jeff King
Date: Thursday, July 3, 2008 - 12:59 pm

Yes, certainly it is more flexible to have them split. I find Adam's
argument the most compelling, though. Think about moving commits as a
multi-step protocol:

  1. Local -&gt; Remote: Here are some new commits, basis..current
  2. Remote -&gt; Local: OK, I am now at current.
  3. Local: update basis to current

git-push has the luxury of asking for &quot;basis&quot; each time, so we know it
is correct. But with bundles, we can't do that. And failing to update
&quot;basis&quot; means we will send some extra commits next time. But updating
&quot;basis&quot; when we shouldn't means that the next bundle will be broken.

So I think even if people _do_ want to update &quot;basis&quot; when they create
the bundle (because it is more convenient, and they are willing to
accept the possibility of losing sync), it is trivial to create that
workflow on top of the separate components. But I can see why somebody
might prefer the separate components, and it is hard to create them if
the feature is lumped into &quot;git-bundle&quot; (meaning in such a way that you
cannot perform the steps separately; obviously git-bundle --basis would
be equivalent).

But I am not a bundle user, so that is just my outsider perspective.

-Peff
--

From: Adam Brewster
Date: Thursday, July 3, 2008 - 4:38 pm

How does everybody feel about the following:

- Leave git-basis as a small perl script.

- Add a -b/--basis option in git-bundle that calls git-basis.  Any
objects mentioned in the output would be excluded from the bundle.
Multiple --basis options will call git-basis once with several
arguments to generate the intersection of specified bases.

- (maybe) Add an option &quot;--update-bases&quot; to automatically call
git-basis --update after the bundle is created successfully.

- Change the syntax a bit so git-basis --show does what git-basis
alone does now (because the user will no longer need to interact with
that command).

There's still plenty of potential for improvements, like a --gc mode
to clean up basis files, a --rewind option to undo an incorrect
--update, or improvements in the way it calculates intersections, but
I think that with these changes the system is as simple as possible
while maximizing flexibility, utility, and usability.

Adam
--

From: Johannes Schindelin
Date: Thursday, July 3, 2008 - 5:44 pm

Hi,



So the only function of -b would be to fork() &amp;&amp; exec() a _shell_ script?  

Rather, have it as a feature to auto-detect if there is a &quot;.basis&quot; file of 
the same basename (or, rather &quot;.state&quot;, a I find &quot;basis&quot; less than 
descriptive), and rewrite it if it was there.



Rather hard, would you not think?  The information is either not there, or 


I am not convinced.  This sort of feature belongs into git-bundle.  It 
certainly does not deserve being blessed by yet-another git-* command, 
when we are constantly being bashed for having _way_ too many _already_.

Ciao,
Dscho

--

From: Adam Brewster
Date: Thursday, July 3, 2008 - 7:04 pm

I'm not exactly sure what your objection is here, but I guess it's
either you don't want a separate tool called git-basis that does all
of the things I've described, or you don't like my choice of perl to
do the work.

If the former, I suggest that my approach is consistent with the
philosophy of doing one thing and doing it well.  I acknowledge that
it could do it's one job better (see potential improvements in my last
email), but that doesn't seem to be your complaint.

If the latter, I wonder what practical advantage comes from redoing
what I've already done in C.  It would be slightly faster, but I'm not
worried about saving a few milliseconds in a process that takes at
least a couple of minutes (considering of course the time it takes to

Not quite.  It would be more like a shortcut for

git-basis --show basis | git-bundle create bundle -all --stdin or
git-bundle create bundle --all &lt;( git-basis --show basis )

(the latter of which of course wouldn't work because git-basis doesn't
take filenames.

The bundle creation is still done by git-bundle.  git-basis is just
deciding what should (not) be included in the bundle.

It seems like similar architectures have been accepted to support the

Just because I'm creating a bundle, doesn't guarantee that the bundle
will be installed on any particular remote system, so I think that
updating the basis without being told to do so by the user is a bad
idea.  For example, when creating a bundle, I find it's best to
exclude objects that I know exist on ALL of my systems, so that I can
pull from it anywhere, but usually I only end up sending it to one
system.

With regard to the use of the word basis, it comes from the
documentation of git-bundle.  It's been a while since I took linear
algebra, but if I remember correctly, a basis is a set of vectors that
describe a vector space, such that a combination of those vectors
yields any point in the space.  The analogy isn't perfect, but I think
it's pretty close.  The word state ...
From: Mark Levedahl
Date: Friday, July 4, 2008 - 9:47 am

Actually, I would like to see &quot;normal&quot; interface for git-bundle handled 
by git-push, git-fetch, and git-remote. We fixed the retrieve side to 
use git-fetch before first integration, but didn't understand the 
semantics for creation well-enough to put into git-push. Right now, we 
can do &quot;git remote add bundle-nick &lt;path-to-bundle&gt;&quot; and set up the remote.

We should have &quot;git push bundle-nick&quot;  create the new bundle, updating 
the basis refs kept somewhere in refs/* (possibly refs/remotes, possibly 
refs/bundles?).

However, we need two helpers to maintain the basis refs, both I believe 
should be sub-commands of git-remote:

- a &quot;rewind&quot; function to roll the refs back to a previous state because 
the bundle didn't get applied, whatever. This is well supported by 
reflogs, is &quot;expire anything *newer* than time&quot;, and for convenience 
should apply to all refs for the given remote so the user doesn't have 
to invoke per branch on the remote. e.g., &quot;git remote rewind bundle-nick 
3.days.ago&quot;.

- a &quot;prune&quot; function to remove any branch for the remote that is not 
known to the local refs/heads/* hierarchy. This is needed to support 
cleaning up pruned topic branches. Could be a special behavior of &quot;git 
remote prune&quot; triggered by the remote being a bundle, but that might be 
confusing a perhaps need a new sub-command name. Perhaps, &quot;git remote 
prune-non-local bundle-nick&quot;

If we did the above, then git-bundle can be relegated to plumbing and 
bundles become better integrated to the porcelain.

Mark

--

From: Jakub Narebski
Date: Friday, July 4, 2008 - 1:55 pm

I think that because &quot;git push directory&quot; (local push) with error in
directory name, which otherwise would result in error, can be mistaken
for &quot;git push bundle&quot;, that we would want to either use pseudo-protocol
(&quot;git push bundle://path/to/bundle&quot;) or some extension...

-- 
Jakub Narebski
Poland
--

From: Jeff King
Date: Friday, July 4, 2008 - 12:51 pm

I was thinking about Mark's approach, and I think there are two distinct
differences from yours:

  1. he updates the basis upon bundle creation, rather than as a
     separate step (and I have already commented on this)

  2. he stores the basis in the refs hierarchy

I actually think '2' makes a lot of sense. Storing the basis as refs
gets you:

  - an easy implementation; you use existing git tools

  - correct reachability analysis, since the refs will be taken into
    account by git-fsck, meaning you won't ever accidentally prune
    your basis objects

  - free logging of your basis history, in the form of reflogs

  - free gc in the usual reflog way

IIRC, Mark suggested putting them under refs/remotes/&lt;bundle&gt;, and you
objected that you didn't want to clutter that hierarchy. If that is a
problem, you can always use refs/basis/&lt;bundle&gt;, which will be ignored
by gitk and &quot;git branch -a&quot;, but will be correctly handled by other
tools.

And then suddenly your perl script gets a lot simpler, and is either a
short shell script, or even better, can be written in C as part of
git-bundle. So you would have something like &quot;git bundle --update-basis
&lt;basis&gt;&quot; instead of &quot;git-basis&quot;, and a config option like
&quot;bundle.autoUpdateBasis&quot; to update the basis whenever you create a
bundle.

-Peff
--

Previous thread: [PATCH] bisect: error out when given any good rev that is not an ancestor of the bad rev by Christian Couder on Monday, June 30, 2008 - 3:42 pm. (11 messages)

Next thread: [PATCH] Teach "git apply" to prepend a prefix with "--root=<root>" by Johannes Schindelin on Monday, June 30, 2008 - 4:44 pm. (8 messages)