login
Header Space

 
 

Re: Comments on recursive merge..

Previous thread: Problem in update from the linux-2.6 repository by Marcel Holtmann on Monday, November 7, 2005 - 12:18 pm. (5 messages)

Next thread: [PATCH] Small bugfixes for http-push.c by Johannes Schindelin on Monday, November 7, 2005 - 2:23 pm. (1 message)
To: Junio C Hamano <junkio@...>, Fredrik Kuivinen <freku045@...>
Cc: Git Mailing List <git@...>
Date: Monday, November 7, 2005 - 12:48 pm

Guys,

  I just hit my first real rename conflict, and very timidly tried the 
"recursive" strategy in the hopes that I wouldn't need to do things by 
hand.

It resolved things beautifully. Good job. 

My only worry is that I don't read python, so I don't really know how it 
does what it does, which makes me nervous. Can somebody (Fredrik?) add 
some documentation about the merge strategy and how it works.

Considering that the stupid resolve strategy really requires you to know 
how git works when rename conflicts happen (things left in unmerged state 
are really quite hard to handle by hand unless you know exactly what 
you're doing), I'd almost suggest making "recursive" the default. I'm a 
bit nervous about it, but knowing how it works would probably put most of 
that to rest.

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Junio C Hamano <junkio@...>, Fredrik Kuivinen <freku045@...>, Git Mailing List <git@...>
Date: Monday, November 7, 2005 - 6:58 pm

It would be great if the recursive strategy could get some more
testing. I have tested it on a thousand commits or so in a few kernel
repositories and haven't found any bugs, but it could be due to errors
in the test setup, testing the wrong repositories or just being lucky. Some
real-world testing would be great.

- Fredrik

-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>, Fredrik Kuivinen <freku045@...>
Date: Monday, November 7, 2005 - 8:13 pm

This is the first time I see you pleased by something in git
that was done without very close supervision from you.  All the
credits for this one goes to Fredrik, of course, but it is a
small victory for me as the maintainer as well, and I am very

Another thing to consider is if it is fast enough for everyday
trivial merges.

In any case, I've been thinking about teaching git-merge to look
into .git/config to make it overridable which strategy to use by
default.  This would eliminate the hardcoded 'resolve for
two-head, octopus for more' rule from git-pull.  Then we could
ship git-merge with the default rule of 'recursive for two-head,
octopus for more', and if it turns out to be premature, you can
update your config file to use resolve for two-head case while
we sort things out.  That way, recursive would get wider test
coverage, and people who really need a working merge this minute
can choose to run resolve in emergency without specifying '-s
resolve' on the command line of 'git pull' every time.


-
To: Junio C Hamano <junkio@...>
Cc: <git@...>, Fredrik Kuivinen <freku045@...>
Date: Monday, November 7, 2005 - 8:33 pm

That sounds like a backhanded way of saying that I'm micromanagering, 

Hmm. True. The _really_ trivial in-index case triggers for me pretty 
often, but I haven't done any statistics. It might be only 50% of the 
time.

Is the recursive thing noticeably slower for the "easy" cases (ie things 
that the old regular resolve strategy does well)?

It's certainly an option to just do what I just did, namely use the 
default one until it breaks, and then just do "git reset --hard" and re-do 
the pull with "-s recursive". A bit sad, and it would be good to have 
coverage on the recursive strategy..

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Friday, November 11, 2005 - 6:25 pm

Just for fun, I randomly picked two heads/master commits from
linux-2.6 repository (one was when I happened to have pulled the
last time, and the other was when I thought this might be an
interesting exercise and pulled again), and fed the commits
between the two to a little script that looks at commits and
tries to stat what they did (the script ignores renames so they
appear as deletes and adds).

Here is what the script spitted out:

        Total commit objects: 3957
        Trivial Merges: 72 (1.82%)
        Merges: 225 (5.69%)
        Number of paths touched by non-merge commits:
                average 4.50, median 2, min 2, max 199
        Number of merge parents:
                average 2.00, median 2, min 2, max 2
        Number of merge bases:
                average 1.00, median 1, min 1, max 1
        File level merges:
                average 37.61, median 8, min 0, max 555
        Number of changed paths from the first parent:
                average 379.09, median 66, min 1, max 7553
        File level 3-ways:
                average 1.96, median 1, min 0, max 37
        Paths deleted:
                average 47.56, median 15, min 0, max 554

This counts what happened in individual devleoper's trees,
subsystem maintainer trees and your tree, not just what you saw
yourself.

Some observations.

 - Trivial Merges count is surprisingly high.  About 1/3 of
   merges are pure in-index merges.

 - Most of the commits (developer commits, not merges) are
   small and touches only a couple of paths.

 - Nobody does octopus ;-).

 - We did not have multi-base merge case during the period
   looked at (but the sample count is very low).

 - merge-one-file was called for only a handful (median 8)
   files, which is negligibly small compared to the total 17K
   files in the kernel tree, and fairly small compared to the
   number of changed paths from the first parent (meaning,
   read-tree trivial collapsing helped majorly).  Among them,
   the number of path...
To: Junio C Hamano <junkio@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Saturday, November 12, 2005 - 2:35 am

Mind sharing the script?

It'be nice to know if these stats are typical, or unusual when you get
numbers from a variety of other trees.
To: Ryan Anderson <ryan@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Saturday, November 12, 2005 - 3:44 am

Very unpolished but here they are.

I misread the trivial count in my original message.  Trivial and
Merge are counted separately, so among 3957 commits, merges were
297 (72 trivials and 225 others).

-- &gt;8 -- cut here -- &gt;8 --
Subject: [PATCH] GIT commit statistics

A set of scripts that read the existing commit history, and
show various stats.

Sample usage:

    # Arguments are given to git-rev-list; defaults to ORIG..
    # if not given, to retrace what was just pulled.
    $ ./contrib/jc-git-stat-1.sh v0.99.9g..maint |
      ./contrib/jc-git-stat-1-log.perl
    Total commit objects: 43
    Trivial Merges: 1 (2.33%)
    Merges: 1 (2.33%)
    Number of paths touched by non-merge commits:
	    average 3.00, median 2, min 2, max 18
    Number of merge parents:
	    average 2.50, median 3, min 2, max 3
    Number of merge bases:
	    average 1.00, median 1, min 1, max 1
    File level merges:
	    average 0.50, median 1, min 0, max 1
    Number of changed paths from the first parent:
	    average 28.00, median 52, min 4, max 52
    File level 3-ways:
	    average 1.00, median 1, min 1, max 1

 * "Trivial Merges" are the ones done by read-tree --trivial;
 * "Merges" are other merges;
 * "File level merges" are paths not collapsed by read-tree 3-way (i.e.
   given to merge-one-file);
 * "File level 3-ways" are paths merge-one-file would have run 'merge';

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;

---

 contrib/jc-git-stat-1-log.perl |   87 ++++++++++++++++++++++++++++++++++++++++
 contrib/jc-git-stat-1-mof.sh   |   59 +++++++++++++++++++++++++++
 contrib/jc-git-stat-1.sh       |   74 ++++++++++++++++++++++++++++++++++
 3 files changed, 220 insertions(+), 0 deletions(-)
 create mode 100755 contrib/jc-git-stat-1-log.perl
 create mode 100755 contrib/jc-git-stat-1-mof.sh
 create mode 100755 contrib/jc-git-stat-1.sh

applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
2cb3da4b260ed82dc379a11d91f55fe774a2ea49
diff --git a/contrib/jc-git-stat-1-log...
To: Junio C Hamano <junkio@...>
Cc: Ryan Anderson <ryan@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Saturday, November 12, 2005 - 8:19 am

Related to this, I've been wondering whether it'd be possible to teach
git to rebase local patches, even if that means rewriting local
history. When you are dealing with team shared repo, the sequences of
pull/push end up being quite messy, full of little meaningless merges.
Similarly, when dealing with an upstream, my tree gets slowly out of
sync and slightly messy. Eventually I get a new checkout, and rebase
any pending patches with git-format-patch and git-am.

The same process would be much easier if I could just cg-update from
the repo and get it to try and actually rebase my local commits --
rewriting history as if I had committed them after the update. Of
course, it'd be cheating... but we cheat all the time anyway, we only
sweat harder at it ;-)

cheers,


martin
-
To: Martin Langhoff <martin.langhoff@...>
Cc: Junio C Hamano <junkio@...>, Ryan Anderson <ryan@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Sunday, November 13, 2005 - 7:11 am

Dear diary, on Sat, Nov 12, 2005 at 01:19:45PM CET, I got a letter

I've been replying only to this, but missed the team shared repo case,

Well, only pulls make up for merges. But what's wrong with that? This is
just what you get when using distributed VCS, and what's the point in
skewing the history to look different than how did it happen? I'd just
get used to it. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Sunday, November 13, 2005 - 6:59 am

The key is not to let your tree go "slowly" out of sync, from my
experience.  When Linus was the maintainer, I used to do the
equivalent of the following all the time to keep up with his
tree while keeping my history clean [*1*].

Frequently [*2*], I tried to see if Linus made something new and
interesting.  My "origin" branch was always copy of Linus head.

	$ git fetch origin

The command would say "Fast forward".  So what did he do?

	$ git show-branch master origin
        ! [origin] Separate LDFLAGS and CFLAGS.
         * [master] Rename lost+found to lost-found.
        --
         + [master] Rename lost+found to lost-found.
         + [master^] Fix compilation warnings in pack-redundant.c
         + [master~2] Debian: build-depend on libexpat-dev.
        +  [origin] Separate LDFLAGS and CFLAGS.
        ++ [master~3] Split gitk into seperate RPM package

Ah, the commit master~3 was what he had the last time I pulled
from him, and since then he made a commit while I did three.  I
could do "git pull . origin" at this point, but that would
result in a useless mini-merge.  My tree is not public so I can
freely rebase to clean things up.

	$ git rebase origin
	$ git show-branch
        ! [origin] Separate LDFLAGS and CFLAGS.
         * [master] Rename lost+found to lost-found.
        --
         + [master] Rename lost+found to lost-found.
         + [master^] Fix compilation warnings in pack-redundant.c
         + [master~2] Debian: build-depend on libexpat-dev.
        ++ [origin] Separate LDFLAGS and CFLAGS.

Now I am fast-forward, so I could ask him to pull from me [*3*].

I think each of your developers can do the same, treating the
"project shared repository" as "Linus repository" and pull that
into the "origin" branch, and when the "master" is ready, push
it back into the shared repository (which is equivalent of Linus
pulling everything from me while doing nothing else in his
repository).

For a sizable change that deserves a topic branch with a long
seq...
To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Sunday, November 13, 2005 - 4:42 pm

What happens if there are conflicts during git-rebase? I'm thinking of
adding an '-r' option to cg-update that will rebase instead of
merging, if the rebase is clean.

Is there a cheap way to ask from a shell script whether the merge is
truly trivial? I thought git-diff-tree would help me here, but it
doesn't...


martin
-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Sunday, November 13, 2005 - 11:33 pm

Well, obviously you could resolve them ;-).  But if you are
rebasing just to reduce trivial mini-merges, it might make more
sense to honestly record the merge if the rebase involves
conflict resolution.  After all, the reason rebase got conflicts
is because the development trail by somebody else that has been
already committed to the shared "master" branch overlapped what
you were doing in your "master" branch, isn't it?

In your message you indicated that you use "format-patch" piped
to "am".  I think that is a better approach than "rebase" these
days; the conflict can be handled easier with that approach, and
if you use "--3way" flag you do not even have to worry about
patches in your branch that is already there in the shared
"master" (your "origin") branch.

So instead of running "git rebase origin" at this point, I may
do something like this [*1*]:

	$ git-reset --hard origin
        $ git-format-patch -k --stdout origin ORIG_HEAD | git am -3 -k

The first step rewinds my "master" (the original is stored in
ORIG_HEAD), and the second step extracts the commits that were
in my master but not in origin in a patch form, an replay them
on top of the "master" (which was rewound to "origin").

"git-am" would stop at the first unapplicable patch if there is
a conflict, leaving the conflicting patch in .dotest/patch.
I have to fix it up before going further.  Here is how.

1. "git am" 3-way fallback would have kicked in, because I have
   all the blobs the patch is supposed to apply to, and my
   working tree and index is in a state just like when I am
   resolving a conflicting merge after a pull.  Clean up the
   conflict in the working tree, build-test and all as usual.

2. Run "git diff HEAD &gt;.dotest/patch" to record what the patch
   should have been if it were to apply cleanly on top of the
   previous state.  If I did a noteworthy adjustment to the
   patch, I might also edit .dotest/final-commit to update the
   commit log message.

3. Then reset the working tree...
To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Monday, November 14, 2005 - 12:01 am

Hmmm. But doesn't deal well with binary changes. We deal with a large
set of projects, and while we don't manage that many binary files, it

Cool! Abusing that, perhaps I could teach git-rebase to take a
'--trivial-only' flag, and then cg-update --rebase could try
git-rebase --trivial-only, and fall back on cg-merge...

cheers,


martin
-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Monday, November 14, 2005 - 2:06 am

It shouldn't be too tricky to enhance "git am" (git-apply called
at around line 49 in it) to grok binary differences for this
purpose, because you would have both pre- and post-image blob in
your object database, because the patch is being used only to
replay what you have in your reository, and it records their
abbreviated SHA1 name.

I've never felt need to "merge" the binary files myself and had
never got around doing this, but if you are interested, it would
go something like this:

. Make "index %.7s..%.7s" that abbreviates pre- and post- image
  blob SHA1s in diff.c configurable to spit out full 40 bytes.
  Call that option --full-index-sha1.

. The updated "git rebase" that uses the "format-patch | am" I
  outlined would pass the --full-index-sha1 to format-patch
  (which is pased onto underlying diff-tree -p).

. In apply.c, check if all of the following holds:

    * we have both the full 40-byte old_sha1_prefix[] and
      new_sha1_prefix[]; and

    * what the index records matches old_sha1_prefix[]; and

    * the new blob is found in the object database;

  and for such a path:

    * change parse_chunk() not to barf even on a binary patch.

    * change apply_data() to just declare the patch application
      result is the new blob recorded in the patch.

Hmm.

-
To: Junio C Hamano <junkio@...>, Petr Baudis <pasky@...>
Cc: <git@...>
Date: Monday, November 14, 2005 - 4:51 am

I'm curious. What would be the advantages of this over git-read-tree
-m for use within a single repo? I keep thinking that I need an
intra-repo way of doing it (arguably faster and more reliable),
instead of git-format-patch|git-am, which is less reliable and slower.

OTOH,  if this is heading towards teaching git-am how to apply changes
to binary files based on known SHA1s, this will give birth to a type
of patch that applies only if you have the objects beforehand. Is that
enough to get by? Perhaps we need a format to fully describe binary

That's quite a bit of C hacking... I'm game for all the Perl and shell
scripts in git, but I know better than start learning C in _this_
project with you, Linus and the whole list watching me make the fool
;-)

So noone hacks this bit of C you are proposing I'll eventually get
something done with cg-update and git-rebase.

In the meantime, the user experience of working with a small team and
a shared repo has improved significantly by switching from cg-update
to cg-fetch &amp;&amp; git-rebase origin. So much so that I suspect that it'd
be a big win for cg-update to default to rebase on merges from
'origin'.

cheers,


martin
-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Monday, November 14, 2005 - 11:00 pm

I'll be pushing out something along the lines I said in the
proposed updates branch tonight.  I've even run 1 (one) test
;-).  The relevant commits are these:

+  [pu~5^2] rebase: make it usable for binary files as well.
+  [pu~5^2^] diff: --full-index
+  [pu~5^2~2] apply: allow-binary-replacement.
+  [pu~5^2~3] Rewrite rebase to use git-format-patch piped to git-am.

Note that this does not make generated "binary patch" usable
across repositories yet.  Since our patches are reversible out
of principle, making it usable across repositories involves
recording both the pre- and post-image of binary blobs in the
patch output, which may be a useful option in some cases but not
necessary for the immediate application of intra-repo rebasing.

-
To: Martin Langhoff <martin.langhoff@...>
Cc: Petr Baudis <pasky@...>, <git@...>
Date: Monday, November 14, 2005 - 5:27 am

On a large tree I had an impression that applying patch and then
falling back on 3-way is faster, but other than that, nothing,
really.  Just being able to use a single mechanism, which does

I'd rather not see our "patch" go in the direction of recording
both pre- and post-image of blob for binary files, which is what
we would end up doing if we really want to do binary flexibly.

Well, that may be nice as an option, but not by default.

An option halfway in between would be to record the pre-image
SHA1 and post- blob, perhaps compressed-uuencoded.  This would
limit us to the case that the recipient has not touched the
binary file and replacing it, but in practice that might be
enough.

I'm willing to do the C hackery myself if there is enough
interest in it.


-
To: Martin Langhoff <martin.langhoff@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Monday, November 14, 2005 - 5:25 am

Dear diary, on Mon, Nov 14, 2005 at 09:51:29AM CET, I got a letter

I still don't understand what's the point. And it is confusing for
the user, since the history suddenly doesn't reflect how the GIT
development happened (which is what the history is about), and it is
downright deadly as soon as more than one branch gets around, which
makes it even more confusing for the user (you can do cg-update, yeah
- oh wait, unless you do X, then you need to remember to always do
cg-update -R or whatever).

All the woes just to get rid of the merge commits. What's wrong on merge
commits? If they irritate you so much, cg-log -M. ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Monday, November 14, 2005 - 5:25 pm

Well, if you have a team of 5 working closely, doing commit/update
several times a day, soon the following things happen:

 - everyone has sightly different histories, even if they all have the same tree
 - in the repo, there are 'update' merges galore.
 - as soon as you _actually_ have branches, it's really hard to
distinguish branch merges from 'same branch update before commit'
merges.

by replacing the daily/hourly intra-team, self-branch `cg-update` with
`cg-fetch &amp;&amp; git-rebase` our history makes much more sense _and_
looking at gitk I can see clearly again the interesting merges (ie:
merges with other branches).

In practice, it's exactly what happens with git's history -- the
merges are actually from rebased patches, that's why the history is
"readable".

does that make sense?

cheers,


martin
-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Saturday, November 12, 2005 - 3:04 pm

Hi,


I thought about this problem for a while. You described a typical use case 
(in a small team with one central repo), where pushes are done only from a 
cleaned up branch, but the work branches stay.

If the tree corresponding to the central HEAD matches the tree 
corresponding of your private merge branch at a given stage, a simple 
graft should be your solution.

Example: You pull and push from/to origin on a central server. Your 
(dirty) work branch is master. Now, at a given time, 
origin^{tree}==master~15^{tree}. You could then generate a graft which 
tells git that all parents of origin are also parents of master~15.

If at a given stage, origin^{tree}==master^{tree}, that graft would make 
your next merge a fast forward, effectively cleaning up your branch 
without loosing your history.

The graft could be generated by a simple script.

Would this help you?

Ciao,
Dscho

-
To: Martin Langhoff <martin.langhoff@...>
Cc: Junio C Hamano <junkio@...>, Ryan Anderson <ryan@...>, Linus Torvalds <torvalds@...>, <git@...>, <catalin.marinas@...>
Date: Saturday, November 12, 2005 - 8:53 am

Dear diary, on Sat, Nov 12, 2005 at 01:19:45PM CET, I got a letter

I'm a bit reluctant about this functionality available in cg-update, but
then people will start to want commit stack and stuff, while they should
be just already long using StGIT for tracking their patches.

Actually, I wanted to also implement e-mail functionality to cg-mkpatch,
but I'm not sure now - perhaps people wanting that should really just
use StGIT. Cogito or GIT core is not very suitable for keeping your
patches against someone else's tree if he is not going to GIT-merge with
you, exactly because it's not really very convenient to update your
patches.

On the same note, I would like StGIT to drop functionality not really
belonging to patch stack manager (stg add, stg rm, stg status, ...) so
that its commandset gets smaller and more focused - but before I would
suggest dropping stg status, cg-status must be able to do conflicts
tracking, so I will dedicate another mail to this sometime in the
future, with a more detailed proposal.

So, is there any reason why you want this in GIT/Cogito and don't want
to use StGIT?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: Martin Langhoff <martin.langhoff@...>, Junio C Hamano <junkio@...>, Ryan Anderson <ryan@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Tuesday, November 15, 2005 - 6:04 am

This was the case with the first StGIT implementations but I slowly
began to want to only use StGIT and not switch to something else for
trivial SCM operations. I eventually added 'stg commit' which stores
the patches permanently into the base of the stack to enable some kind
of maintainer mode for StGIT. My main use for this was to import
patches directly into the main branch and not keep a separate one and

The gitmergeonefile.py script in StGIT adds every conflict to the
.git/conflicts file which is read by 'stg status'. My goal is not to
leave any unmerged entried in the index even if there are conflicts.
Maybe this could be changed and .git/conflicts file avoided entirely.

Anyway, while I'll try not to add more SCM functionality to StGIT, I
don't think I should remove the existing add/rm/status functionality.
It's just handy not to use a different command when you want a new
file added to a patch.

--
Catalin
-
To: Petr Baudis <pasky@...>
Cc: Catalin Marinas <catalin.marinas@...>, Martin Langhoff <martin.langhoff@...>, Junio C Hamano <junkio@...>, Ryan Anderson <ryan@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Tuesday, November 15, 2005 - 11:29 am

petr,

currently it isn't recommended to use StGIT with other porcelains.

so either: make it completely safe to use StGIT with other porcelains, 
or add a minimal amount of SCM-like functionality so users don't miss it 
and try to use other porcelains and trash their repositories.

overall i agree with catalin-- it's just easier to use a single tool.
To: Junio C Hamano <junkio@...>, Paul Mackerras <paulus@...>
Cc: Git Mailing List <git@...>
Date: Friday, November 11, 2005 - 6:53 pm

I actually don't think that is surprisingly high, and would actually have 
expected it to be closer to 50%.

On the other hand, the merges that end up being pure fast-forwards aren't 
counted as merges at all (since they don't show up as commits), so maybe 
that's what skews my preception of a big percentage of merges as being 

This is something where I think the kernel is perhaps unusual, especially 
for a big project. We really do encourage people to make lots of small and 
well-defined changes, and the whole flow of development has been geared 

I do think octopus is really cool, and still think seeing that five-way 
octopus-merge in gitk in the git history was really really cool.

It doesn't look as good any more, btw: do "gitk" on the current git tree, 
and search for "Octopus merge", and you'll see some of the history lines 
crossing each other. Paul?

But yeah, it's a pretty special thing. I think its coolness factor way 

Again, this is possibly because the kernel has already had a few years of 
distributed SCM usage under its belt, and we've tried to not only merge 
in a timely manner, but also try to keep history reasonably clean and not 
have a lot of cross-merging back and forth. That cuts down on multi-base 

I definitely think this is true for any big project.

Small projects will inevitably have changes that modify large portions of 
the source base. But with small projects, it doesn't really matter _what_ 
you do, you can do it fast.

Big projects (at least the sane kind) will never have lots of changes that 
modify a very big percentage of the source-tree. It's just too painful 

Well, it's self-re-inforcing. It was designed for the kernel usage 
patterns, so using the kernel to confirm that it's the "right design" is a 
bit self-serving. Sure, it's a good sign that my mental model of what the 
usage patters are does actually match reality, but at the same time it 
might be more interesting to see if other projects that use git end up 
using it the sa...
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Friday, November 11, 2005 - 8:42 pm

That is what I meant by "it does not show just what you saw".
It shows the whole community experience by everybody who has
commit there.  Being _the_ integration point, I imagine that
most of the pure fast-forwards you saw were real merges for
somebody else, and that merge being fast/safe/convenient

That is true.  My point was that it's not like git was done
right only for _you_, sacrificing subsystem people.  The sample
4k commits show that the assumption of the usage pattern git is
optimized for actually holds (commits being small, merges being
mostly trivial) for kernel people other than you.

It is yet to be seen if the same assumption holds for other
projects.

-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Tuesday, November 8, 2005 - 12:21 pm

This does two things:

 - It changes the hardcoded default merge strategy for two-head
   git-pull from resolve to recursive.

 - .git/config file acquires two configuration items.
   pull.twohead names the strategy for two-head case, and
   pull.octopus names the strategy for octopus merge.

IOW you are paranoid, you can have the following lines in your
.git/config file and keep using git-merge-resolve when pulling
one remote:

	[pull]
		twohead = resolve

OTOH, you can say this:

	[pull]
		twohead = resolve
		twohead = recursive

to try quicker resolve first, and when it fails, fall back to
recursive.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;

---

  Linus Torvalds &lt;torvalds@osdl.org&gt; writes:

  &gt; Hmm. True. The _really_ trivial in-index case triggers for me pretty 
  &gt; often, but I haven't done any statistics. It might be only 50% of the 
  &gt; time.
  &gt;...
  &gt; It's certainly an option to just do what I just did, namely use the 
  &gt; default one until it breaks, and then just do "git reset --hard" and re-do 
  &gt; the pull with "-s recursive". A bit sad, and it would be good to have 
  &gt; coverage on the recursive strategy..

  Hopefully something like this would make people aware of
  recursive and give it a wider coverage and chance to mature.

 git-pull.sh |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

applies-to: 75922cf23cc070e2d5220d961a8f645f1bc8bb60
3acc20beaf0df9ce11a1b7aabf8c9dc7507a9b44
diff --git a/git-pull.sh b/git-pull.sh
index 2358af6..3b875ad 100755
--- a/git-pull.sh
+++ b/git-pull.sh
@@ -79,10 +79,22 @@ case "$merge_head" in
 	exit 0
 	;;
 ?*' '?*)
-	strategy_default_args='-s octopus'
+	var=`git-var -l | sed -ne 's/^pull\.octopus=/-s /p'`
+	if test '' = "$var"
+	then
+		strategy_default_args='-s octopus'
+	else
+		strategy_default_args=$var
+	fi
 	;;
 *)
-	strategy_default_args='-s resolve'
+	var=`git-var -l | sed -ne 's/^pull\.twohead=/-s /p'`
+	if test '' = "$v...
To: Linus Torvalds <torvalds@...>
Cc: Junio C Hamano <junkio@...>, <git@...>, Fredrik Kuivinen <freku045@...>
Date: Tuesday, November 8, 2005 - 7:58 am

Hi,


IIRC recursive does nothing else than recursively merging the merge-bases 
(granted, in a clever way). So if there is only one merge-base, the only 
slow-down would be the startup of python (which is probably worth it, 

We already have a fallback list: after really-trivial, try automatic, ...,
try resolve. Why not just add recursive? So, if even resolve failed, just 
try once more, with recursive.

Ciao,
Dscho

-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Linus Torvalds <torvalds@...>, Junio C Hamano <junkio@...>, <git@...>, Fredrik Kuivinen <freku045@...>
Date: Tuesday, November 8, 2005 - 5:02 pm

I haven't done any real measurements but my feeling is that the
recursive strategy is at least not very much slower than the resolve
strategy.

In the single-common-ancestor case I can think of the following things
which may make a difference speed wise:

* The recursive strategy is written in Python
* The code for finding common ancestors is also written in Python and
  is probably a bit slower than git-merge-base.
* git-diff-tree -M --diff-filter=R &lt;common ancestor&gt; &lt;branch&gt; is
  executed twice, once for each branch.

On the positive side the code which corresponds to git-merge-one-file
in the git-resolve case is also written in python, we can therefore

I don't think this is a very good idea for two reasons. The first one
is that there are some merge scenarios involving renames which should
be conflicts but are cleanly merged by git-resolve.

The second reason is that with the fall back list the recursive
strategy will only be used in the strange corner cases and will thus
not get nearly the same amount of testing it would get if it was the
first choice (or directly after the really-trivial merge).

- Fredrik
-
To: Fredrik Kuivinen <freku045@...>
Cc: Linus Torvalds <torvalds@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 7:04 pm

Hi,


Two very valid points. You convinced me.

Ciao,
Dscho

-
To: Fredrik Kuivinen <freku045@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 5:52 pm

Btw, what part of git-merge-bases is it that makes it not be practical?

			Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Fredrik Kuivinen <freku045@...>, Johannes Schindelin <Johannes.Schindelin@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 6:36 pm

The problem is in the multiple-common-ancestors case. If we have three
common ancestors, A, B and C, we will start with merging A with B. The
result is a new 'virtual' commit object (not stored in the object
database), lets call it V. We are then going to merge V with C. To do
that we need to get the common ancestor(s) of V and C, and as V
doesn't exist in the database we can't use git-merge-base.

I haven't given it a lot of thought though, it might be possible to
use git-merge-base in some way and get the same results as we get now.

It would certainly be possible to use git-merge-base in the first
iteration and use the python code only when we actually have any
'virtual' commit objects.

- Fredrik
-
To: Fredrik Kuivinen <freku045@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 7:05 pm

Hmm. That's really the same as the merge-base of "C _and_ (A _or_ B)", 
isn't it?

So we should be able to do that even without ever seeing the virtual 
merge. In fact, it really is pretty trivial from a technical standpoint: 
the "A _or_ B" part is really just inserting both A and B with the same 
"flags" value (see the "merge_base()" function in merge-base.c).

So in general, "merge-base" could trivially be extended to have any number 
of "OR commits" on either side, as long as there is just one "and".

It could also be extended to have multiple "and" cases, but that has 
actually already been done by "git-show-branch", I think. It's all the 
same logic, except it uses more than just two bits.

So _technically_ it should be easy to do, it would just need some sane 

Just handling the first case specially would be sufficient for 99% of all 
uses.  And if the multi-parent cases are slightly slower, I don't think 
anybody cares.

In fact, we _always_ do the first git-merge-base in git-merge.sh anyway 
(actually, not git-merge-base, but "git-show-branch --merge-base"). So 
that's really done already for the "test if it's trivial" case..

Junio, that points out that "git-merge-base" is another program that could 
just be removed, since it's really supreceded by git-show-branch. Or did I 
miss something?


			Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Fredrik Kuivinen <freku045@...>, Johannes Schindelin <Johannes.Schindelin@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 8:32 pm

Dear diary, on Wed, Nov 09, 2005 at 12:05:43AM CET, I got a letter

Wow, I didn't know git-show-branch could do that (even though it's a bit
unnatural to expect this from command named this way; then again,
there's git-rev-parse...).

BTW, git-show-branch is also by orders of magnitude faster (not that
this would be any major timesaver). Median 0.006s vs. median 0.124s.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: Fredrik Kuivinen <freku045@...>, Johannes Schindelin <Johannes.Schindelin@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 8:51 pm

Ouch. That makes me suspicious. One reason git-merge-base is slow is 
because it's being pretty careful about some pathological examples of 
dates being just the wrong way around, and it might just be that the 
reason git-show-branch is faster is because it isn't doing that part 
right.

So yes, git-merge-base does extra work, but it does so because I think it 
needs to.

Junio? You even wrote the comment about the case in git-merge-base, I'm 
wondering whether it's a bug that we use the fast-and-cheap algorithm in 
git-show-branch..

Of course, arguably you can first try the fast-and-cheap thing, and if 
that gives a merge parent that is acceptable, why not? So maybe it's the 
right thing for the "let's see if this is trivial" case, but I think it 
might _think_ some cases are trivial that really shouldn't, because they 
actually have two merge parents.

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Tuesday, November 8, 2005 - 8:59 pm

I did show-branch soon after we worked on those pathlogical
merge-base fix, so I would be a bit surprised if I did it
without using all the knowledge from that exercise, but I do not
remember offhand.  The core logic should be simple
generalization of two-head merge-base to N heads.

-
To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Tuesday, November 8, 2005 - 9:22 pm

Hmm.

Look at the "join_revs()" logic, and tell me I'm crazy.

It does:

	struct commit *commit = pop_one_commit(list_p);
	int still_interesting = !!interesting(*list_p);

in that order: it looks whether there are any interesting commits left 
_after_ it has popped the top-of-stack.

Which means that "still_interesting" can go down to zero if we just popped 
the last interesting thing off the stack.

Which seems wrong, because the thing we just popped off the stack could 
easily itself be interesting (in fact, it should be so, 99% of the time), 
and can cause other interesting commits to be populated back onto the 
list. So the "still_interesting" flag seems to be wrongly computed: the 
way it is computed now, it's meaningless.

In contrast, the "merge_base()" thing does

	while (interesting(list)) {
		..
	}

which means that we really will walk the list until there is nothing 
interesting left. Which is admittedly expensive, but it was how we got rid 
of the pathological case.

But maybe I'm just missing something really subtle. Maybe git-show-branch 
does some really clever optimization that is valid.

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Tuesday, November 8, 2005 - 9:42 pm

Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.  You are right.

The problem is most of the time hidden, because we usually do
one extra round (extra usually starts from 0 and we break out
after we say "not interesting anymore" and extra &lt; 0).

Obviously, I was not thinking clearly.

-
To: Linus Torvalds <torvalds@...>
Cc: Fredrik Kuivinen <freku045@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 7:18 pm

Hi,


IIRC, git-show-branch has a limit on the number of refs it can take.

Ciao,
Dscho

-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Fredrik Kuivinen <freku045@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Tuesday, November 8, 2005 - 8:18 pm

Well, git-merge-base does too. git-merge-base only takes two refs ;)

In general, you need to keep track of one bit per ref, and since we have 
a 32-bit "flags" word and need a couple of bits for other maintenance 
info, pretty much anything that figures out common heads will be limited 
some way. 

This is only a limit for the "and" logic - the "or" logic (if we implement 
it) will just share the same status bit for all the refs that are "ored 
together" and thus has no limits. 

Oh, and the "and" logic can be extended by running the program multiple 
times, so it's not a "hard" limit, it's just an issue of convenience.

That said, anybody who ever does an octopus of more than just a few heads 
deserves to be shot, so I don't think the limit should matter. The 
recursive strategy should only add the "or" kind of refs, and it 
shouldn't be a problem (apart from just how to describe them).

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Wednesday, November 9, 2005 - 2:10 am

Come to think of it, git-merge-octopus does AND.  If I am
merging topic branches 1, 2, 3,... N into my master, internally
it does an equivalent of merging 1 into master, then 2 into the
result, then C into that result,..., and it uses merge-base of
all the heads merged so far and the original master to pivot on.

And I think this is *wrong*.  The merge base of each step when
merging head N does not have to be older than merge base of the
original master and head N, but currently that is not what it
does.  I should be ORing them ideally, but even if I do not, I
should be able to just use the merge base of head N and original
master.



-
To: Fredrik Kuivinen <freku045@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Tuesday, November 8, 2005 - 5:47 pm

There are two reasons to avoid git-merge choose from more than
one strategy.

1. The whole idea that git-merge implements "goodness" metric is
   bogus.  It does not know what merge strategy is good and that
   is the reason it punts and has the user choose his preferred
   strategy.

2. When it is going to loop over more than one strategy, it
   stashes away the current working tree state, so that the
   second and subsequent strategies can begin from a clean slate
   (including local modifications since the current head).  If
   we try only one, there is no such cost involved.

I think the patch I sent out last night to change the recursive
as the default strategy and make it overridable from the
configuration mechanism would be a better way to give people
more exposure to the greatness of recursive while protecting
them from potential glitches if any.


-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>, Fredrik Kuivinen <freku045@...>
Date: Monday, November 7, 2005 - 8:59 pm

Sorry, that is not what I meant to say at all.

You used to micromanage, but it was _very_ good for git back
then.  I admit that only once I found you too picky and
difficult to work with while I was fixing a bad premature-free
bug in the diffcore-rename code, but overall your attention to
detail well paid off.

Since I inherited the project, we added quite a lot of stuff,
but I was still unsure if we are making good progress, or just
stagnating with only small enhancements and obvious fixes.

But Fredrik merge turns out to be a spectacular success as you
found out, which is a triumph for Fredrik, but at the same time
it means I was not doing too bad myself ;-).

-
To: Junio C Hamano <junkio@...>, Fredrik Kuivinen <freku045@...>
Cc: Git Mailing List <git@...>
Date: Monday, November 7, 2005 - 12:56 pm

Btw, one thing that it does is print out too much information.

In particular, I had renames on both sides of the merge (in case anybody 
wants to see which one I'm talking about: it's the current top-of-head 
commit in the kernel archives: 333c47c847c90aaefde8b593054d9344106333b5).

Now, renames that you've done yourself you really don't want to hear 
about, at least if the other side didn't change anything in that file.

Renames that the _other_ side has done (the one you're merging) you may or 
may not want to know about, regardless of whether they happened to files 
that are changed. But since "git pull" will do a "git-apply --stat" at the 
end and show the renames there, I'd argue that the merge strategy itself 
should be quiet about any renames that are trivial.

So how about talking about renames only if you end up also doing a 
file-level merge? As it is, doing the merge talked about renames that I 
had merged earlier in my own branch, which is just confusing.

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Junio C Hamano <junkio@...>, Fredrik Kuivinen <freku045@...>, Git Mailing List <git@...>
Date: Monday, November 7, 2005 - 7:19 pm

Sounds like a good idea. How about something like the following?

--

It isn't really interesting to know about the renames that have
already been committed to the branch you are working on. Furthermore,
the 'git-apply --stat' at the end of git-(merge|pull) will tell us
about any renames in the other branch.

With this commit only renames which require a file-level merge will
be printed.

Signed-off-by: Fredrik Kuivinen &lt;freku045@student.liu.se&gt;


---

 git-merge-recursive.py |   22 +++++++++++++++-------
 1 files changed, 15 insertions(+), 7 deletions(-)

applies-to: 5af1b5b93257ecfe993bb24975bf596faa342758
89c029b439603630a53ee4e4d0cb7931111afd2a
diff --git a/git-merge-recursive.py b/git-merge-recursive.py
index 626d854..9983cd9 100755
--- a/git-merge-recursive.py
+++ b/git-merge-recursive.py
@@ -162,10 +162,13 @@ def mergeTrees(head, merge, common, bran
 # Low level file merging, update and removal
 # ------------------------------------------
 
+MERGE_NONE = 0
+MERGE_TRIVIAL = 1
+MERGE_3WAY = 2
 def mergeFile(oPath, oSha, oMode, aPath, aSha, aMode, bPath, bSha, bMode,
               branch1Name, branch2Name):
 
-    merge = False
+    merge = MERGE_NONE
     clean = True
 
     if stat.S_IFMT(aMode) != stat.S_IFMT(bMode):
@@ -178,7 +181,7 @@ def mergeFile(oPath, oSha, oMode, aPath,
             sha = bSha
     else:
         if aSha != oSha and bSha != oSha:
-            merge = True
+            merge = MERGE_TRIVIAL
 
         if aMode == oMode:
             mode = bMode
@@ -207,7 +210,8 @@ def mergeFile(oPath, oSha, oMode, aPath,
             os.unlink(orig)
             os.unlink(src1)
             os.unlink(src2)
-            
+
+            merge = MERGE_3WAY
             clean = (code == 0)
         else:
             assert(stat.S_ISLNK(aMode) and stat.S_ISLNK(bMode))
@@ -577,14 +581,16 @@ def processRenames(renamesA, renamesB, b
                 updateFile(False, ren1.dstSha, ren1.dstMode, dstName1)
                 updateFile(Fals...
To: Fredrik Kuivinen <freku045@...>
Cc: Linus Torvalds <torvalds@...>, Git Mailing List <git@...>
Date: Monday, November 7, 2005 - 7:54 pm

The rest looks good to me, but are you sure about this part?  I
have a feeling that the above "and" should be "or", meaning, we
check to see if there is _any_ change, and default to TRIVIAL,
but later we would find that we need a real merge and then
promote it to MERGE_3WAY.

-
To: Junio C Hamano <junkio@...>
Cc: Fredrik Kuivinen <freku045@...>, Linus Torvalds <torvalds@...>, Git Mailing List <git@...>
Date: Wednesday, November 9, 2005 - 6:36 am

You are right. The code actually do the right thing, but it does it by
accident. Please apply the following patch.

---

merge-recursive: Fix limited output of rename messages

The previous code did the right thing, but it did it by accident.

Signed-off-by: Fredrik Kuivinen &lt;freku045@student.liu.se&gt;


---

 git-merge-recursive.py |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

applies-to: bb7dd65e1d945edbe0137a761ebc388c7394067a
f56613498cd7fb7013f532a04e63b580314ed957
diff --git a/git-merge-recursive.py b/git-merge-recursive.py
index 9983cd9..3657875 100755
--- a/git-merge-recursive.py
+++ b/git-merge-recursive.py
@@ -162,13 +162,10 @@ def mergeTrees(head, merge, common, bran
 # Low level file merging, update and removal
 # ------------------------------------------
 
-MERGE_NONE = 0
-MERGE_TRIVIAL = 1
-MERGE_3WAY = 2
 def mergeFile(oPath, oSha, oMode, aPath, aSha, aMode, bPath, bSha, bMode,
               branch1Name, branch2Name):
 
-    merge = MERGE_NONE
+    merge = False
     clean = True
 
     if stat.S_IFMT(aMode) != stat.S_IFMT(bMode):
@@ -181,7 +178,7 @@ def mergeFile(oPath, oSha, oMode, aPath,
             sha = bSha
     else:
         if aSha != oSha and bSha != oSha:
-            merge = MERGE_TRIVIAL
+            merge = True
 
         if aMode == oMode:
             mode = bMode
@@ -211,7 +208,6 @@ def mergeFile(oPath, oSha, oMode, aPath,
             os.unlink(src1)
             os.unlink(src2)
 
-            merge = MERGE_3WAY
             clean = (code == 0)
         else:
             assert(stat.S_ISLNK(aMode) and stat.S_ISLNK(bMode))
@@ -590,7 +586,7 @@ def processRenames(renamesA, renamesB, b
                 if merge or not clean:
                     print 'Renaming', fmtRename(path, ren1.dstName)
 
-                if merge == MERGE_3WAY:
+                if merge:
                     print 'Auto-merging', ren1.dstName
 
                 if not clean:
@@ -668,7 +664,7 @@ def processRenames...
Previous thread: Problem in update from the linux-2.6 repository by Marcel Holtmann on Monday, November 7, 2005 - 12:18 pm. (5 messages)

Next thread: [PATCH] Small bugfixes for http-push.c by Johannes Schindelin on Monday, November 7, 2005 - 2:23 pm. (1 message)
speck-geostationary