Linux: Continued git Development

Submitted by Jeremy
on April 14, 2005 - 3:59am

Rapid git development [story] continues, though most discussions have moved from the lkml to a new git mailing list. Petr Baudis maintains a git-pasky branch of the content manager which he describes as, "my set of scripts upon Linus Torvald's git, which aims to provide a humanly usable interface, to a degree similar to a SCM tool." Junio Hamano has also been working on a "quick and dirty" Perl script to handle automatic merging between multiple git trees when possible.

During the development discussions, someone linked to an article in The Register that invented a quote by Linus to make a point about the recent BitKeeper excitement [story]. Linus calmly noted that he's a fairly regular reader of the site, "and hey, being opinionated and a bit over the top is what makes the site worthwhile. It's obviously what motivates the people." He continues, "and then, occasionally, when they bite you, hey, that's the price of having a high profile. I worry more about sometimes not listening to critics than I do about the critics themselves." Finally, he reflects, "thick skin is the name of the game. I'd not get any work done otherwise."


From: Linus Torvalds [email blocked]
To: Petr Baudis [email blocked]
Subject: git mailing list (Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3)
Date: 	Wed, 13 Apr 2005 11:22:02 -0700 (PDT)



On Wed, 13 Apr 2005, Petr Baudis wrote:
>
> Dear diary, on Wed, Apr 13, 2005 at 07:01:34PM CEST, I got a letter
> where Daniel Barkalow [email blocked] told me that...
> > For future reference, git is unhappy if you actually do this, because your
> > HEAD won't match the (empty) contents of the new directory. The easiest
> > thing is to cp -r your original, replace the shared stuff with links, and
> > go from there.
> 
> How is it unhappy?

I think it's just Daniel being unhappy because he didn't do the read-tree
+ checkout-cache + update-cache steps ;)

Btw, I'm going to stop cc'ing linux-kernel on git issues (after this
email, which also acts as an announcement for people who haven't noticed
already), since anybody who is interested in git can just use the
"git@vger.kernel.org" mailing list:

	echo 'subscribe git' | mail [email blocked]

to get you subscribed (and you'll get a message back asking you to
authorize it to avoid spam - if you don't get anything back, it failed).

		Linus


From: Petr Baudis [email blocked]
Subject: [ANNOUNCE] git-pasky-0.4
Date: Thu, 14 Apr 2005 02:19:38 +0200

  Hello,

  I'm happy to announce git-pasky-0.4, my set of scripts upon Linus
Torvald's git, which aims to provide a humanly usable interface, to a
degree similar to a SCM tool. You can get it at

	http://pasky.or.cz/~pasky/dev/git/

  See the READMEs etc for some introduction.

  It is difficult to sum up the changes, since so much has changed,
including almost the complete tree history, which was cleaned up and
sanitized. Things should be faster, better, less buggy and generally
smoother.

  My immediate plans are to support several working trees connected to
a single object database. The scenes are set, prepared, and it should
be easy. Then, supporting merges in a separate temporary tree will be a
breeze. ;-)

  Have fun,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

From: Petr Baudis [email blocked]
Subject: Merge with git-pasky II.
Date: Thu, 14 Apr 2005 02:29:02 +0200

  Hello Linus,

  I think my tree should be ready for merging with you. It is the final
tree and I've already switched my main branch for it, so it's what
people doing git pull are getting for some time already.

  Its main contents are all of my shell scripts. Apart of that, some
tiny fixes scattered all around can be found there, as well as some
patches which went through the mailing list.

  My last merge with you concerned your commit
39021759c903a943a33a28cfbd5070d36d851581.

  It's again

	rsync://pasky.or.cz/git/

this time my HEAD is fba83970090ef54c6eb86dcc2c2d5087af5ac637.

  Note that my rsync tree still contains even my old branch; I thought
I'd leave it around in the public objects database for some time, shall
anyone want to have a look at the history of some of the scripts. But if
you want it gone, tell me and I will prune it (and perhaps offer it in
/git-old/ or whatever). I'm using the following:

	fsck-cache --unreachable $(commit-id) | grep unreachable \
		| cut -d ' ' -f 2 | sed 's/^\(..\)/.git\/objects\/\1\//' \
		| xargs rm

  Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

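The sed expression in Petr's pruning pipeline turns each unreachable object ID into a file path by splitting off the first two hex digits as a directory name. As a minimal sketch of that mapping only, in Python (the function name `object_path` and its `git_dir` parameter are illustrative and not part of git or git-pasky):

```python
HEX_DIGITS = set("0123456789abcdef")


def object_path(sha1: str, git_dir: str = ".git") -> str:
    """Map a 40-hex-digit object ID to the path of its object file.

    Mirrors the sed expression s/^\\(..\\)/.git\\/objects\\/\\1\\//:
    the first two hex digits become a directory, the rest the file name.
    """
    if len(sha1) != 40 or not set(sha1) <= HEX_DIGITS:
        raise ValueError(f"not a SHA-1 object ID: {sha1!r}")
    return f"{git_dir}/objects/{sha1[:2]}/{sha1[2:]}"


# The HEAD commit quoted in the email above.
print(object_path("fba83970090ef54c6eb86dcc2c2d5087af5ac637"))
# prints .git/objects/fb/a83970090ef54c6eb86dcc2c2d5087af5ac637
```

Validating the ID before building a path is a design choice of this sketch; the shell pipeline trusts whatever fsck-cache prints and hands it straight to rm.
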
From: Junio C Hamano [email blocked]
Subject: Re: Merge with git-pasky II.
Date: Thu, 14 Apr 2005 00:05:42 -0700

>>>>> "LT" == Linus Torvalds [email blocked] writes:

LT> On that note - I've been avoiding doing the merge-tree thing, in the hope
LT> that somebody else does what I've described.

I now have a Perl script that uses rev-tree, cat-file,
diff-tree, show-files (with one modification so that it can deal
with pathnames with embedded newlines), update-cache (with one
modification so that I can add an entry for a file that does not
exist to the dircache) and merge (from RCS). Quick and dirty.

The changes to show-files is to give it an optional '-z' flag,
which chanegs record terminator to NUL character instead of LF.

The script git-merge.perl takes two head commits. It basically
follows what you described as I remember ;-):

 1. runs rev-tree with --edges to find the common anscestor.

 2. creates a temporary directory "./,,merge-temp"; create a
    symlink ./,,merge-temp/.git/objects that points at
    .git/objects.

 3. sets up dircache there, initially populated with this
    common ancestor tree. No files are checked out. Just set up
    .git/index and that's it.

 4. runs diff-tree to find what has been changed in each head.

 5. for each path involved:

   5.0 if neither heads change it, leave it as is;
   5.1 if only one head changes a path and the other does not, just
       get the changed version;
   5.2 if both heads change it, check all three out and run merge.

It does not currently commit. You can go to ./,,merge-temp/ and
see show-diff to see the result of the merge. Files added in
one head has already been run "update-cache" when the script
ends, but changed and merged files are not---dircache still has
the common ancestor view. So show-diff you will be seeing may
be enormous and not very useful if two forks were done in the
distant past.

After reviewing the merge result, you can update-cache,
write-tree and commit-tree as usual, but with one caveat: do not
run "show-files | xargs update-cache" if you are running
git-merge.perl without -f flag!

By default, git-merge.perl creates absolute minimum number of
files in ./,,merge-temp---only the merged files are left there
so that you can inspect them. You will not see unmodified
files nor files changed only by one side of the merge.

If you give '-o' (oneside checkout) flag to git-merge.perl, then
the files only one side of the merge changed are also checked
out in ./,,merge-temp.

If you give '-f' (full checkout) flag to git-merge.perl, then in
addition to what '-o' checks out, unchanged files are checked
out in ./,,merge-temp.

This default is geared towards a huge tree with small merges
(favorite case of Linus, if I understand correctly).

Running 'show-diff' in such a sparsely populated merge result
tree gives you huge results because recent show-diff shows diffs
with empty files. I added a '-r' flag to show-diff, which
squelches diffs with empty files.

Also to implement 'changed only by one-side' without actually
checking the file out, I needed to add one option to
'update-cache'. --cacheinfo flag is used this way:

    $ update-cache --cacheinfo mode sha1 path

and adds the pathname with mode and sha1 to the .git/index
without actually requiring you to have such a file there.

Signed-off-by: Junio C Hamano [email blocked]

[patch]

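Junio's rules 5.0 to 5.2 and the checkout flags can be sketched as follows. This is an illustration in Python, not the git-merge.perl code: `classify` and `checked_out` are hypothetical names, and each path is assumed to be represented by its blob SHA-1 in the common ancestor and in the two heads. Only the three listed cases are modeled, so two heads that make the identical change still count as 5.2 here.

```python
def classify(base: str, head_a: str, head_b: str) -> str:
    """Return which of the listed rules applies to one path.

    Arguments are the blob IDs of the path in the common ancestor
    and in each of the two heads being merged.
    """
    a_changed = head_a != base
    b_changed = head_b != base
    if not a_changed and not b_changed:
        return "5.0 keep"            # neither head changed it
    if a_changed != b_changed:
        return "5.1 take-changed"    # exactly one head changed it
    return "5.2 merge"               # both changed: run a 3-way merge


def checked_out(rule: str, oneside: bool = False, full: bool = False) -> bool:
    """Whether the file is left in the merge temporary directory.

    oneside models the '-o' flag and full models the '-f' flag.
    """
    if rule == "5.2 merge":
        return True                  # merged files are always left to inspect
    if rule == "5.1 take-changed":
        return oneside or full       # '-f' includes everything '-o' does
    return full                      # unchanged files need '-f'


rule = classify("aaa", "bbb", "aaa")
print(rule, checked_out(rule), checked_out(rule, oneside=True))
# prints 5.1 take-changed False True
```
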
From: Linus Torvalds [email blocked]
Subject: Re: Merge with git-pasky II.
Date: Thu, 14 Apr 2005 01:06:56 -0700 (PDT)

On Thu, 14 Apr 2005, Junio C Hamano wrote:
>
> I now have a Perl script that uses rev-tree, cat-file,
> diff-tree, show-files (with one modification so that it can deal
> with pathnames with embedded newlines), update-cache (with one
> modification so that I can add an entry for a file that does not
> exist to the dircache) and merge (from RCS). Quick and dirty.

That's exactly what I wanted. Q'n'D is how the ball gets rolling.

In the meantime I wrote a very stupid "merge-tree" which does things
slightly differently, but I really think your approach (aka my original
approach) is actually a lot faster. I was just starting to worry that the
ball didn't start, so I wrote an even hackier one.

My really hacky one is called "merge-tree", and it really only merges one
directory. For each entry in the directory it says either

	select <mode> <sha1> path

or

	merge <mode>-><mode>,<mode> <sha1>-><sha1>,<sha1> path

depending on whether it could directly select the right object or not.

It's actually exactly the same algorithm as the first one, but I was
afraid the first one would be so abstract that it (a) might not work and
(b) wouldn't get people to work it out. This "one directory at a time with
very explicit output" thing is much more down-to-earth, but it's also
likely slower because it will need script help more often.

That said, I don't know. MOST of the time there will be just a single
"directory" entry that needs merging, and then the script would just need
to recurse into that directory with the new "tree" objects. So it might
not be too horrible.

But I'm really happy that you seem to have implemented my first
suggestion and I seem to have been wasting my time.

>  5. for each path involved:
>
>    5.0 if neither heads change it, leave it as is;
>    5.1 if only one head changes a path and the other does not, just
>        get the changed version;
>    5.2 if both heads change it, check all three out and run merge.

You missed one case:

     5.0.1 if both heads change it to the same thing, take the new thing

but maybe you counted that as 5.0 (it _should_ fall out automatically from
the fact that "diff-tree" between the two destination trees shows no
difference for such a file).

Now, arguably, your 5.2 will do things right, but the thing is, it's
actually fairly _common_ that both heads have changed something to the
same thing. Namely if there was a previous merge that already handled that
case, but that previous merge may not be a proper parent of the new
commits. So from a performance standpoint you really don't want to
consider that to be a merge - you just pick up the new contents directly.

See?

(My stupid "merge-tree" should show the algorithm in painful obviousity.
Of course, my stipid merge-tree may also be painfully buggy. You be the
judge).

> It does not currently commit. You can go to ./,,merge-temp/ and
> see show-diff to see the result of the merge. Files added in
> one head has already been run "update-cache" when the script
> ends, but changed and merged files are not---dircache still has
> the common ancestor view.

That sounds good.

> Also to implement 'changed only by one-side' without actually
> checking the file out, I needed to add one option to
> 'update-cache'. --cacheinfo flag is used this way:
>
>     $ update-cache --cacheinfo mode sha1 path

Yes. My "merge-tree" needs the exact same thing.

Looks good from your explanation, but I'm too tired to look at the code.
It's 1AM, and the kids get up at 7. I'm not much of a hacker, I usually
crash by 10PM these days ;^)

		Linus

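The two line formats Linus describes can be illustrated with a small Python sketch. It is not the merge-tree source: `merge_line` is a hypothetical name, every entry is assumed to exist in all three trees as a (mode, sha1) pair, and reading `<mode>-><mode>,<mode>` as "ancestor first, then the two heads" is an assumption. The first test is the 5.0.1 shortcut: when both heads agree, the entry is selected directly and no merge is needed.

```python
from typing import Tuple

Entry = Tuple[str, str]  # (mode, sha1) of one directory entry


def merge_line(path: str, base: Entry, a: Entry, b: Entry) -> str:
    """Produce one 'select' or 'merge' line for a single directory entry."""
    if a == b:
        # Covers both "nobody changed it" and "both changed it to the
        # same thing" (Linus's case 5.0.1): just pick the shared result.
        chosen = a
    elif a == base:
        chosen = b       # only head B changed the entry
    elif b == base:
        chosen = a       # only head A changed the entry
    else:
        # Both heads changed it, differently: leave it to a script.
        return (f"merge {base[0]}->{a[0]},{b[0]} "
                f"{base[1]}->{a[1]},{b[1]} {path}")
    return f"select {chosen[0]} {chosen[1]} {path}"


# Shortened, made-up object IDs purely for illustration.
print(merge_line("Makefile", ("100644", "aaa"), ("100644", "bbb"), ("100644", "bbb")))
# prints select 100644 bbb Makefile
print(merge_line("README", ("100644", "aaa"), ("100644", "bbb"), ("100755", "ccc")))
# prints merge 100644->100644,100755 aaa->bbb,ccc README
```
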
From: Junio C Hamano [email blocked]
Subject: Re: Merge with git-pasky II.
Date: Thu, 14 Apr 2005 01:39:36 -0700

>>>>> "LT" == Linus Torvalds [email blocked] writes:

LT> But I'm really happy that you seem to have implemented my first
LT> suggestion and I seem to have been wasting my time.

Thanks for the kind words.

>>  5. for each path involved:
>>
>>    5.0 if neither heads change it, leave it as is;
>>    5.1 if only one head changes a path and the other does not, just
>>        get the changed version;
>>    5.2 if both heads change it, check all three out and run merge.

LT> You missed one case:

LT>      5.0.1 if both heads change it to the same thing, take the new thing

LT> but maybe you counted that as 5.0 (it _should_ fall out automatically from
LT> the fact that "diff-tree" between the two destination trees shows no
LT> difference for such a file).

Actually I am not handling that. It really is 5.1a---the exact
same code path as 5.1 can be used for this case, and as you
point out it is really a quite important optimization.

I have to handle the following cases. I think I currently do
wrong things to them:

    5.1a both head modify to the same thing.
    5.1b one head removes, the other does not do anything.
    5.1c both head remove.
    5.3  one head removes, the other head modifies.

Handling of 5.1a, 5.1b and 5.1c are obvious.

    5.1a Update dircache to the same new thing. Without -f or
         -o flag do not touch ,,merge-temp/. directory; with
         -f or -o, leave the new file in ,,merge-temp/.

    5.1b Remove the path from dircache and do not have the file
         in ,,merge-temp/. directory regardless of -f or -o
         flags.

    5.1c Same as 5.1b

I am not sure what to do with 5.3. My knee-jerk reaction is to
leave the modified result in ,,merge-temp/$path~ without
touching dircache. If the merger wants to pick it up, he can
rename $path~ to $path temporarily, run show-diff on it (I think
giving an option to show-diff to specify paths would be helpful
for this workflow), to decide if he wants to keep the file or
not. Suggestions?

From: Linus Torvalds [email blocked]
Subject: Re: Merge with git-pasky II.
Date: Thu, 14 Apr 2005 02:10:22 -0700 (PDT)

On Thu, 14 Apr 2005, Junio C Hamano wrote:
>
> I have to handle the following cases. I think I currently do
> wrong things to them:
>
>     5.1a both head modify to the same thing.
>     5.1b one head removes, the other does not do anything.
>     5.1c both head remove.
>     5.3  one head removes, the other head modifies.

There's another interesting set of cases: one side creates a file, and the
other one creates a directory.

> I am not sure what to do with 5.3.

My very _strong_ preference is to just inform the user about a merge that
cannot be performed, and not let it be automated. BIG warning, with some
way for the user to specify the end result.

The thing is, these are pretty rare cases. But in order to make people
feel good about the _common_ case, it's important that they feel safe
about the rare one.

Put another way: if git tells me when it can't do something (with some
specificity), I can then fix the situation up and try again. I might curse
a while, and maybe it ends up being so common that I might even automate
it, but at least I'll be able to trust the end result.

In contrast, if git does something that _may_ be nonsensical, then I'll
worry all the time, and not trust git. That's much worse than an
occasional curse.

So the rule should be: only merge when it's "obviously the right thing".
If it's not obvious, the merge should _not_ try to guess what the right
thing is. It's much better to fail loudly.

(That's especially true early on. There may be cases that end up being
obvious after some usage. But I'd rather find them by having git be too
stupid, than find out the hard way that git lost some data because it
thought it was ok to remove a file that had been modified)

		Linus

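Linus's rule, merge only when the result is obvious and fail loudly otherwise, might look like this when applied to the removal cases Junio lists. This is a Python sketch under stated assumptions, not code from either script: `resolve` and `MergeRefused` are hypothetical names, a path is represented by its blob SHA-1 or by `None` when it does not exist, and the file-versus-directory conflict Linus mentions is not modeled. Refusing when two heads add different content with no common ancestor is this sketch's own conservative choice, in the spirit of the rule.

```python
from typing import Optional, Tuple


class MergeRefused(Exception):
    """The right result is not obvious; the user has to decide."""


def resolve(path: str, base: Optional[str], a: Optional[str],
            b: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide the merge result for one path.

    Returns ("take", blob) when the result is obvious (blob is None if
    the path should be removed) or ("content-merge", None) when both
    heads modified an existing file and a 3-way merge is needed.
    """
    if a == b:
        return ("take", a)       # unchanged, 5.1a (same change), 5.1c (both removed)
    if a == base:
        return ("take", b)       # only B changed or removed it (5.1, 5.1b)
    if b == base:
        return ("take", a)       # only A changed or removed it (5.1, 5.1b)
    if base is None or a is None or b is None:
        # 5.3 (one head removed, the other modified), or two different
        # additions without an ancestor: do not guess.
        raise MergeRefused(f"{path}: cannot merge automatically, "
                           "please resolve by hand")
    return ("content-merge", None)   # 5.2


try:
    resolve("fs/inode.c", base="aaa", a=None, b="bbb")
except MergeRefused as err:
    print("WARNING:", err)
```

Raising an exception rather than returning a "conflict" value means a caller cannot silently ignore the case, which is one way to make the failure loud.
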
From: Junio C Hamano [email blocked]
To: Linus Torvalds [email blocked]
Subject: Re: Merge with git-pasky II.
Date: Thu, 14 Apr 2005 04:14:13 -0700

Here is a diff to update the git-merge.perl script I showed you
earlier today ;-). It contains the following updates against
your HEAD (bb95843a5a0f397270819462812735ee29796fb4).

 * git-merge.perl command we talked about on the git list. I've
   covered the changed-to-the-same case etc. I still haven't done
   anything about file-vs-directory case yet.

   It does warn when it needed to run merge to automerge and let
   merge give a warning message about conflicts if any. In
   modify/remove cases, modified in one but removed in the other
   files are left in either $path~A~ or $path~B~ in the merge
   temporary directory, and the script issues a warning at the
   end.

 * show-files and ls-tree updates to add -z flag to NUL terminate records;
   this is needed for git-merge.perl to work.

 * show-diff updates to add -r flag to squelch diffs for files not in
   the working directory. This is mainly useful when verifying the
   result of an automated merge.

 * update-cache updates to add "--cacheinfo mode sha1" flag to register
   a file that is not in the current working directory. Needed for
   minimum-checkout merging by git-merge.perl.

[patch]

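Why the -z flag matters for a script like git-merge.perl can be shown with a short Python sketch. It assumes only what the emails state, namely that records are terminated by LF by default and by NUL with -z; the function name `split_records` and the sample paths are made up for illustration, and the exact fields of real show-files or ls-tree output are not modeled.

```python
from typing import List


def split_records(data: bytes, nul_terminated: bool) -> List[bytes]:
    """Split tool output into records using the chosen terminator."""
    terminator = b"\0" if nul_terminated else b"\n"
    records = data.split(terminator)
    if records and records[-1] == b"":
        records.pop()    # drop the empty piece after the final terminator
    return records


# Two paths, the first of which contains an embedded newline.
paths = [b"odd\nname.txt", b"Makefile"]

lf_output = b"".join(p + b"\n" for p in paths)
nul_output = b"".join(p + b"\0" for p in paths)

print(split_records(lf_output, nul_terminated=False))
# prints [b'odd', b'name.txt', b'Makefile']  (the first path is split in two)
print(split_records(nul_output, nul_terminated=True))
# prints [b'odd\nname.txt', b'Makefile']
```

NUL is a safe terminator because it cannot appear inside a Unix pathname, whereas a newline can.
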
From: Linus Torvalds [email blocked]
Subject: Re: Re: Merge with git-pasky II.
Date: Wed, 13 Apr 2005 20:51:50 -0700 (PDT)

On Thu, 14 Apr 2005, Petr Baudis wrote:
>
> http://www.theregister.co.uk/2005/04/11/torvalds_attack/ ... I'm nothing
> like a regular reader of (R), but I thought the guys have at least a bit
> of sense. Duh. :/ Or is April 11 now yet another joke day after April 1?

I actually _am_ a fairly regular reader, and hey, being opinionated and a
bit over the top is what makes the site worthwhile. It's obviously what
motivates the people.

And then, occasionally, when they bite you, hey, that's the price of
having a high profile. I worry more about sometimes not listening to
critics than I do about the critics themselves.

Thick skin is the name of the game. I'd not get any work done otherwise.

On that note - I've been avoiding doing the merge-tree thing, in the hope
that somebody else does what I've described. I really do suck at scripting
things, yet this is clearly something where using C to do a lot of the
stuff is pointless. Almost all the parts do seem to be there, ie Daniel
did the "common parent" part, and the rest really does seem to be more
about scripting than writing more C plumbing stuff..

		Linus

Related Links:
Attachment              Size
get-merge.perl.diff     8.02 KB
get-merge.perl-2.diff   10.77 KB

Darcs can fill this void

Charles Goodwin (not verified)
on
April 14, 2005 - 11:19am

There has been quite a lot of excitement on the Darcs lists since the search for a new SCM began for Linux. One of Darcs' primary problems was performance and this opportunity has motivated people, especially David Roundy, to seriously address the performance failings of Darcs. Indeed, he even speculated that git could be used as a component of Darcs.

If a project like Darcs were to overcome people's reservations, are the prospects of git such that its accelerated development has made it the primary candidate for the job of the Linux SCM even if there were better candidates around? Wouldn't it be better to use a long-standing, well-tested, actively maintained SCM than an ad-hoc program that's bound to suffer severe growing pains as features are tacked on without its design being suited to them?

Re: Darcs can fill this void

SecMF
on
April 14, 2005 - 1:55pm

Wouldn't it be better to use a long-standing, well-tested, actively maintained SCM than an ad-hoc program that's bound to suffer severe growing pains as features are tacked on without its design being suited to them?

I think it's the other way around. All the possible candidates show that their performance isn't up to the task. Git seems to be (or is becoming) an engine that enables the SCM to be scripted without performance loss. I think that should tell you that the design is sound.

It would be interesting to see how darcs or monotone perform when they use git as their engine.

What is git doing?

Anonymous (not verified)
on
April 15, 2005 - 1:44am

It would be nice if you would explain to me what git does with respect to a system like Darcs or monotone. I know what Darcs and so on do, and I know that some of those systems have an API to extend them (e.g. Subversion has an API and you can use Ruby to extend it).

So: Torvalds et al. coded git. Git is looking for changes and how to "commit" them in a local filesystem. _AFTER_ that, would Darcs, for example, be called to really commit into a public repository? If this is true, git seems to me like a preprocessor or something like that.

Please enlighten me :-)

Re: What is git doing?

SecMF
on
April 15, 2005 - 6:27am

git cannot be used as an extension to an SCM, but rather as the low-level method an SCM uses to store files etc. So darcs would not be called after the commit, but darcs would call git in order to DO the commit.

Isn't Linus forbidden to work upon a Source Control System?

Marco Menardi (not verified)
on
April 21, 2005 - 4:52am

Hi, forgive my question (and my ignorance on the subject), but I remember something in the BitKeeper license agreement that prevents you from working on a competing product for at least 1 year after your license has expired...
So how can Linus work on this project, since he had to sign the BitKeeper license (even if free of charge)?

re: forbidden?

Superstoned (not verified)
on
April 21, 2005 - 6:22am

maybe, but they terminated the license. I'd say shit to them...

I doubt that Linus' license agreement read quite that way.

ken (not verified)
on
April 27, 2005 - 3:56pm

As I seem to recall, the BitKeeper license -- at least the generally used one -- prevented most any sort of parallel development; for example, there was a while there where Larry was making noise about anyone using BitKeeper for ReiserFS4 violating the license (due to 4's database-esque/revision mechanisms). Linus, realizing that he might *have* to work on stuff like that, probably made Larry word his license a bit differently before signing on ye olde dotted line. I may well be wrong, but I'd be surprised...
