Git-basis is a perl script that remembers bases for use by git-bundle. Code from rev-parse was borrowed to allow git-bundle to handle --stdin. Signed-off-by: Adam Brewster <adambrewster@gmail.com> --- As promised, here's another patch with documentation. The code is identical to the previous version. I know this is a minor patch, but I think the result is a more usable git-bundle feature, and I'd like to see it included in future releases if there are no objections. Documentation/git-basis.txt | 82 +++++++++++++++++++++++++++++++++++++++++++ bundle.c | 22 ++++++++++- git-basis | 71 +++++++++++++++++++++++++++++++++++++ 3 files changed, 173 insertions(+), 2 deletions(-) create mode 100644 Documentation/git-basis.txt create mode 100755 git-basis diff --git a/Documentation/git-basis.txt b/Documentation/git-basis.txt new file mode 100644 index 0000000..3624890 --- /dev/null +++ b/Documentation/git-basis.txt @@ -0,0 +1,82 @@ +git-basis(1) +============ + +NAME +---- +git-basis - Track sets of references available on remote systems (bases) + +SYNOPSIS +-------- +[verse] +'git-basis' <basis> [<basis>...] +'git-basis' --update <basis> [<basis>...] < <object list or bundle> + +DESCRIPTION +----------- +Maintains lists of objects that are known to be accessible on remote +computer systems that are not accessible by network. + +OPTIONS +------- + +basis:: + List of bases to operate on. Any valid filename can be + the name of a basis. Bases that do not exist are taken + to be empty. + +--update:: + Tells git-basis to read a list of objects from stdin and + add them to each of the given bases. git-basis produces + no output when this option is given. Bases will be created + if necessary. + +object list or bundle:: + Git-basis --update reads object names, one per line from stdin. + Leading caret ("^") characters are ignored, as is anything + after ...
I don't use bundles myself, so I can't comment on how useful this is for a bundle-based workflow. But it seems like a sensible idea in general. When a new feature depends on other, more generic improvements to existing code, it is usually split into two patches. E.g., 1/2: add --stdin to git-bundle 2/2: add git-basis with the advantages that: - it is slightly easier to review each change individually - it is easier for other features to build on the generic improvement without requiring part 2, especially if part 2 is questionable As it happens in this case, I think in this case the change was already easy to read, being logically separated by file, so I am nitpicking Yikes. This fails if $d contains an apostrophe. You'd want to use quotemeta to properly shell out. But there's no need at all to shell out Why make a hash when the only thing we ever do with it is "keys %new"? Style: I know we are not consistent within git, but it is usually better to use local variables for filehandles these days. I.e., So the basis just grows forever? That is, each time we do a bundle and basis update, we add a line for every changed ref, and we never delete any lines. But having a commit implies having all of its ancestors, so in the normal case (i.e., no rewind or rebase) we can simply replace old objects if we know they are a subset of the new ones (which you can discover with git-merge-base). For the rewind/rebase case, probably these lists should get pruned eventually for non-existent objects. But maybe it is not worth worrying about this optimization at first, and we can see if people complain. In that case, it is perhaps worth a note I don't think there are any portability issues with 'date' (especially since it appears to be just a comment here, so we don't really care about the format), but in general I think it is nicer to use perl's date Notably absent: any tests. -Peff --
Hi Jeff,
Thank you for your feedback. I have made most of the code changes you
suggested, and am in the process of writing tests, but it looks like
some others on the list have more serious objections, so I'll hold of
on that until I think it might actually be accepted.
Makes sense, I thought it was small enough for one commit, but I'll
Okay. I can also include a third patch for the code I cut-and-pasted.
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 11a7eae..73fe334 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -582,7 +582,7 @@ static void read_revisions_from_stdin(struct rev_info *revs)
while (fgets(line, sizeof(line), stdin) != NULL) {
int len = strlen(line);
if (len && line[len - 1] == '\n')
- line[--len] = 0;
+ line[--len] = '\0';
if (!len)
break;
If all goes well then you're right, but I thought old objects should
be kept around in case the user has some reason to manually delete
them. As it is, you can go into the basis file and delete everything
past a given date line and be back where you were. If I delete the
redundant objects, then that's not always possible.
It'd be nice if it could prune old objects (maybe older than 6 months,
or settable by git-config) that are redundant, but I currently have no
need for such functionality.
I also hadn't thought about rebasing. Objects that don't exist
shouldn't hurt anything though. Just a waste of a little disk space.
Maybe I'm a idiot, but I can't find any built-in date to string
functions that do nice things like print the date the way the user
says he likes to look at dates.
I updated the comment line to be "# <git-date> // `date`" where
git-date is as per git-fast-import (seconds since 1969 +/-TZ). If
automatic pruning ever happens, the git-date will be used, so `date`
Working on those. I'll also include tests for ...But of course one function returns two very different things depending on what's on the left side of the equals sign. That makes perfect Thank you. Adam --
Context should be the very first thing taught in any Perl tutorial, lest ye end up in jail: j. --
I think in this instance it is not too big a deal either way. I am just Heh. I said "usually", but I guess even Junio makes mistakes (unless I am dreaming such a style directive, but ISTR it being mentioned before on the list). I wouldn't bother with the style cleanup in rev-list (usually for such small things, we just wait until touching that part of Ah, true. And there will be duplicates here, if you have multiple refs at the same spot in your bundle list. So it should remain as you have Hmm, and that might be useful. Probably the best thing would be to leave it as-is for now, then, with a note. Then we can decide the best pruning Great. Glancing over Junio's comments, though, it might make sense to integrate this more tightly with git-bundle, in which case the perl stuff would go away. So I'll let you work out with him which is the best route. -Peff --
Well, there is one situation where either separate git-bases program (which is a good start; it can be named git-bundle--bases; there are some precedents for that ;-)), or allowing to create 'bases' file without creating bundle would be good to have. Namely situation where two computers are _sometimes off-line (disconnected)_. If you want to transfer new commits from machine B to machine A, you would generate 'bases' file on machine A, then transfer this file using some off-line medium, then generate bundle on machine B using those bases, etc. -- Jakub Narebski Poland --
Yes, certainly it is more flexible to have them split. I find Adam's argument the most compelling, though. Think about moving commits as a multi-step protocol: 1. Local -> Remote: Here are some new commits, basis..current 2. Remote -> Local: OK, I am now at current. 3. Local: update basis to current git-push has the luxury of asking for "basis" each time, so we know it is correct. But with bundles, we can't do that. And failing to update "basis" means we will send some extra commits next time. But updating "basis" when we shouldn't means that the next bundle will be broken. So I think even if people _do_ want to update "basis" when they create the bundle (because it is more convenient, and they are willing to accept the possibility of losing sync), it is trivial to create that workflow on top of the separate components. But I can see why somebody might prefer the separate components, and it is hard to create them if the feature is lumped into "git-bundle" (meaning in such a way that you cannot perform the steps separately; obviously git-bundle --basis would be equivalent). But I am not a bundle user, so that is just my outsider perspective. -Peff --
How does everybody feel about the following: - Leave git-basis as a small perl script. - Add a -b/--basis option in git-bundle that calls git-basis. Any objects mentioned in the output would be excluded from the bundle. Multiple --basis options will call git-basis once with several arguments to generate the intersection of specified bases. - (maybe) Add an option "--update-bases" to automatically call git-basis --update after the bundle is created successfully. - Change the syntax a bit so git-basis --show does what git-basis alone does now (because the user will no longer need to interact with that command). There's still plenty of potential for improvements, like a --gc mode to clean up basis files, a --rewind option to undo an incorrect --update, or improvements in the way it calculates intersections, but I think that with these changes the system is as simple as possible while maximizing flexibility, utility, and usability. Adam --
Hi, So the only function of -b would be to fork() && exec() a _shell_ script? Rather, have it as a feature to auto-detect if there is a ".basis" file of the same basename (or, rather ".state", a I find "basis" less than descriptive), and rewrite it if it was there. Rather hard, would you not think? The information is either not there, or I am not convinced. This sort of feature belongs into git-bundle. It certainly does not deserve being blessed by yet-another git-* command, when we are constantly being bashed for having _way_ too many _already_. Ciao, Dscho --
I'm not exactly sure what your objection is here, but I guess it's either you don't want a separate tool called git-basis that does all of the things I've described, or you don't like my choice of perl to do the work. If the former, I suggest that my approach is consistent with the philosophy of doing one thing and doing it well. I acknowledge that it could do it's one job better (see potential improvements in my last email), but that doesn't seem to be your complaint. If the latter, I wonder what practical advantage comes from redoing what I've already done in C. It would be slightly faster, but I'm not worried about saving a few milliseconds in a process that takes at least a couple of minutes (considering of course the time it takes to Not quite. It would be more like a shortcut for git-basis --show basis | git-bundle create bundle -all --stdin or git-bundle create bundle --all <( git-basis --show basis ) (the latter of which of course wouldn't work because git-basis doesn't take filenames. The bundle creation is still done by git-bundle. git-basis is just deciding what should (not) be included in the bundle. It seems like similar architectures have been accepted to support the Just because I'm creating a bundle, doesn't guarantee that the bundle will be installed on any particular remote system, so I think that updating the basis without being told to do so by the user is a bad idea. For example, when creating a bundle, I find it's best to exclude objects that I know exist on ALL of my systems, so that I can pull from it anywhere, but usually I only end up sending it to one system. With regard to the use of the word basis, it comes from the documentation of git-bundle. It's been a while since I took linear algebra, but if I remember correctly, a basis is a set of vectors that describe a vector space, such that a combination of those vectors yields any point in the space. The analogy isn't perfect, but I think it's pretty close. The word state ...
Actually, I would like to see "normal" interface for git-bundle handled by git-push, git-fetch, and git-remote. We fixed the retrieve side to use git-fetch before first integration, but didn't understand the semantics for creation well-enough to put into git-push. Right now, we can do "git remote add bundle-nick <path-to-bundle>" and set up the remote. We should have "git push bundle-nick" create the new bundle, updating the basis refs kept somewhere in refs/* (possibly refs/remotes, possibly refs/bundles?). However, we need two helpers to maintain the basis refs, both I believe should be sub-commands of git-remote: - a "rewind" function to roll the refs back to a previous state because the bundle didn't get applied, whatever. This is well supported by reflogs, is "expire anything *newer* than time", and for convenience should apply to all refs for the given remote so the user doesn't have to invoke per branch on the remote. e.g., "git remote rewind bundle-nick 3.days.ago". - a "prune" function to remove any branch for the remote that is not known to the local refs/heads/* hierarchy. This is needed to support cleaning up pruned topic branches. Could be a special behavior of "git remote prune" triggered by the remote being a bundle, but that might be confusing a perhaps need a new sub-command name. Perhaps, "git remote prune-non-local bundle-nick" If we did the above, then git-bundle can be relegated to plumbing and bundles become better integrated to the porcelain. Mark --
I think that because "git push directory" (local push) with error in directory name, which otherwise would result in error, can be mistaken for "git push bundle", that we would want to either use pseudo-protocol ("git push bundle://path/to/bundle") or some extension... -- Jakub Narebski Poland --
I was thinking about Mark's approach, and I think there are two distinct
differences from yours:
1. he updates the basis upon bundle creation, rather than as a
separate step (and I have already commented on this)
2. he stores the basis in the refs hierarchy
I actually think '2' makes a lot of sense. Storing the basis as refs
gets you:
- an easy implementation; you use existing git tools
- correct reachability analysis, since the refs will be taken into
account by git-fsck, meaning you won't ever accidentally prune
your basis objects
- free logging of your basis history, in the form of reflogs
- free gc in the usual reflog way
IIRC, Mark suggested putting them under refs/remotes/<bundle>, and you
objected that you didn't want to clutter that hierarchy. If that is a
problem, you can always use refs/basis/<bundle>, which will be ignored
by gitk and "git branch -a", but will be correctly handled by other
tools.
And then suddenly your perl script gets a lot simpler, and is either a
short shell script, or even better, can be written in C as part of
git-bundle. So you would have something like "git bundle --update-basis
<basis>" instead of "git-basis", and a config option like
"bundle.autoUpdateBasis" to update the basis whenever you create a
bundle.
-Peff
--
