Guys, I just hit my first real rename conflict, and very timidly tried the "recursive" strategy in the hopes that I wouldn't need to do things by hand. It resolved things beautifully. Good job. My only worry is that I don't read python, so I don't really know how it does what it does, which makes me nervous. Can somebody (Fredrik?) add some documentation about the merge strategy and how it works. Considering that the stupid resolve strategy really requires you to know how git works when rename conflicts happen (things left in unmerged state are really quite hard to handle by hand unless you know exactly what you're doing), I'd almost suggest making "recursive" the default. I'm a bit nervous about it, but knowing how it works would probably put most of that to rest. Linus -
It would be great if the recursive strategy could get some more testing. I have tested it on a thousand commits or so in a few kernel repositories and haven't found any bugs, but it could be due to errors in the test setup, testing the wrong repositories or just being lucky. Some real-world testing would be great. - Fredrik -
This is the first time I see you pleased by something in git that was done without very close supervision from you. All the credits for this one goes to Fredrik, of course, but it is a small victory for me as the maintainer as well, and I am very Another thing to consider is if it is fast enough for everyday trivial merges. In any case, I've been thinking about teaching git-merge to look into .git/config to make it overridable which strategy to use by default. This would eliminate the hardcoded 'resolve for two-head, octopus for more' rule from git-pull. Then we could ship git-merge with the default rule of 'recursive for two-head, octopus for more', and if it turns out to be premature, you can update your config file to use resolve for two-head case while we sort things out. That way, recursive would get wider test coverage, and people who really need a working merge this minute can choose to run resolve in emergency without specifying '-s resolve' on the command line of 'git pull' every time. -
That sounds like a backhanded way of saying that I'm micromanagering, Hmm. True. The _really_ trivial in-index case triggers for me pretty often, but I haven't done any statistics. It might be only 50% of the time. Is the recursive thing noticeably slower for the "easy" cases (ie things that the old regular resolve strategy does well)? It's certainly an option to just do what I just did, namely use the default one until it breaks, and then just do "git reset --hard" and re-do the pull with "-s recursive". A bit sad, and it would be good to have coverage on the recursive strategy.. Linus -
Just for fun, I randomly picked two heads/master commits from
linux-2.6 repository (one was when I happened to have pulled the
last time, and the other was when I thought this might be an
interesting exercise and pulled again), and fed the commits
between the two to a little script that looks at commits and
tries to stat what they did (the script ignores renames so they
appear as deletes and adds).
Here is what the script spitted out:
Total commit objects: 3957
Trivial Merges: 72 (1.82%)
Merges: 225 (5.69%)
Number of paths touched by non-merge commits:
average 4.50, median 2, min 2, max 199
Number of merge parents:
average 2.00, median 2, min 2, max 2
Number of merge bases:
average 1.00, median 1, min 1, max 1
File level merges:
average 37.61, median 8, min 0, max 555
Number of changed paths from the first parent:
average 379.09, median 66, min 1, max 7553
File level 3-ways:
average 1.96, median 1, min 0, max 37
Paths deleted:
average 47.56, median 15, min 0, max 554
This counts what happened in individual devleoper's trees,
subsystem maintainer trees and your tree, not just what you saw
yourself.
Some observations.
- Trivial Merges count is surprisingly high. About 1/3 of
merges are pure in-index merges.
- Most of the commits (developer commits, not merges) are
small and touches only a couple of paths.
- Nobody does octopus ;-).
- We did not have multi-base merge case during the period
looked at (but the sample count is very low).
- merge-one-file was called for only a handful (median 8)
files, which is negligibly small compared to the total 17K
files in the kernel tree, and fairly small compared to the
number of changed paths from the first parent (meaning,
read-tree trivial collapsing helped majorly). Among them,
the number of path...Mind sharing the script? It'be nice to know if these stats are typical, or unusual when you get numbers from a variety of other trees.
Very unpolished but here they are.
I misread the trivial count in my original message. Trivial and
Merge are counted separately, so among 3957 commits, merges were
297 (72 trivials and 225 others).
-- >8 -- cut here -- >8 --
Subject: [PATCH] GIT commit statistics
A set of scripts that read the existing commit history, and
show various stats.
Sample usage:
# Arguments are given to git-rev-list; defaults to ORIG..
# if not given, to retrace what was just pulled.
$ ./contrib/jc-git-stat-1.sh v0.99.9g..maint |
./contrib/jc-git-stat-1-log.perl
Total commit objects: 43
Trivial Merges: 1 (2.33%)
Merges: 1 (2.33%)
Number of paths touched by non-merge commits:
average 3.00, median 2, min 2, max 18
Number of merge parents:
average 2.50, median 3, min 2, max 3
Number of merge bases:
average 1.00, median 1, min 1, max 1
File level merges:
average 0.50, median 1, min 0, max 1
Number of changed paths from the first parent:
average 28.00, median 52, min 4, max 52
File level 3-ways:
average 1.00, median 1, min 1, max 1
* "Trivial Merges" are the ones done by read-tree --trivial;
* "Merges" are other merges;
* "File level merges" are paths not collapsed by read-tree 3-way (i.e.
given to merge-one-file);
* "File level 3-ways" are paths merge-one-file would have run 'merge';
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
contrib/jc-git-stat-1-log.perl | 87 ++++++++++++++++++++++++++++++++++++++++
contrib/jc-git-stat-1-mof.sh | 59 +++++++++++++++++++++++++++
contrib/jc-git-stat-1.sh | 74 ++++++++++++++++++++++++++++++++++
3 files changed, 220 insertions(+), 0 deletions(-)
create mode 100755 contrib/jc-git-stat-1-log.perl
create mode 100755 contrib/jc-git-stat-1-mof.sh
create mode 100755 contrib/jc-git-stat-1.sh
applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
2cb3da4b260ed82dc379a11d91f55fe774a2ea49
diff --git a/contrib/jc-git-stat-1-log...Related to this, I've been wondering whether it'd be possible to teach git to rebase local patches, even if that means rewriting local history. When you are dealing with team shared repo, the sequences of pull/push end up being quite messy, full of little meaningless merges. Similarly, when dealing with an upstream, my tree gets slowly out of sync and slightly messy. Eventually I get a new checkout, and rebase any pending patches with git-format-patch and git-am. The same process would be much easier if I could just cg-update from the repo and get it to try and actually rebase my local commits -- rewriting history as if I had committed them after the update. Of course, it'd be cheating... but we cheat all the time anyway, we only sweat harder at it ;-) cheers, martin -
Dear diary, on Sat, Nov 12, 2005 at 01:19:45PM CET, I got a letter I've been replying only to this, but missed the team shared repo case, Well, only pulls make up for merges. But what's wrong with that? This is just what you get when using distributed VCS, and what's the point in skewing the history to look different than how did it happen? I'd just get used to it. :-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. -
The key is not to let your tree go "slowly" out of sync, from my
experience. When Linus was the maintainer, I used to do the
equivalent of the following all the time to keep up with his
tree while keeping my history clean [*1*].
Frequently [*2*], I tried to see if Linus made something new and
interesting. My "origin" branch was always copy of Linus head.
$ git fetch origin
The command would say "Fast forward". So what did he do?
$ git show-branch master origin
! [origin] Separate LDFLAGS and CFLAGS.
* [master] Rename lost+found to lost-found.
--
+ [master] Rename lost+found to lost-found.
+ [master^] Fix compilation warnings in pack-redundant.c
+ [master~2] Debian: build-depend on libexpat-dev.
+ [origin] Separate LDFLAGS and CFLAGS.
++ [master~3] Split gitk into seperate RPM package
Ah, the commit master~3 was what he had the last time I pulled
from him, and since then he made a commit while I did three. I
could do "git pull . origin" at this point, but that would
result in a useless mini-merge. My tree is not public so I can
freely rebase to clean things up.
$ git rebase origin
$ git show-branch
! [origin] Separate LDFLAGS and CFLAGS.
* [master] Rename lost+found to lost-found.
--
+ [master] Rename lost+found to lost-found.
+ [master^] Fix compilation warnings in pack-redundant.c
+ [master~2] Debian: build-depend on libexpat-dev.
++ [origin] Separate LDFLAGS and CFLAGS.
Now I am fast-forward, so I could ask him to pull from me [*3*].
I think each of your developers can do the same, treating the
"project shared repository" as "Linus repository" and pull that
into the "origin" branch, and when the "master" is ready, push
it back into the shared repository (which is equivalent of Linus
pulling everything from me while doing nothing else in his
repository).
For a sizable change that deserves a topic branch with a long
seq...What happens if there are conflicts during git-rebase? I'm thinking of adding an '-r' option to cg-update that will rebase instead of merging, if the rebase is clean. Is there a cheap way to ask from a shell script whether the merge is truly trivial? I thought git-diff-tree would help me here, but it doesn't... martin -
Well, obviously you could resolve them ;-). But if you are
rebasing just to reduce trivial mini-merges, it might make more
sense to honestly record the merge if the rebase involves
conflict resolution. After all, the reason rebase got conflicts
is because the development trail by somebody else that has been
already committed to the shared "master" branch overlapped what
you were doing in your "master" branch, isn't it?
In your message you indicated that you use "format-patch" piped
to "am". I think that is a better approach than "rebase" these
days; the conflict can be handled easier with that approach, and
if you use "--3way" flag you do not even have to worry about
patches in your branch that is already there in the shared
"master" (your "origin") branch.
So instead of running "git rebase origin" at this point, I may
do something like this [*1*]:
$ git-reset --hard origin
$ git-format-patch -k --stdout origin ORIG_HEAD | git am -3 -k
The first step rewinds my "master" (the original is stored in
ORIG_HEAD), and the second step extracts the commits that were
in my master but not in origin in a patch form, an replay them
on top of the "master" (which was rewound to "origin").
"git-am" would stop at the first unapplicable patch if there is
a conflict, leaving the conflicting patch in .dotest/patch.
I have to fix it up before going further. Here is how.
1. "git am" 3-way fallback would have kicked in, because I have
all the blobs the patch is supposed to apply to, and my
working tree and index is in a state just like when I am
resolving a conflicting merge after a pull. Clean up the
conflict in the working tree, build-test and all as usual.
2. Run "git diff HEAD >.dotest/patch" to record what the patch
should have been if it were to apply cleanly on top of the
previous state. If I did a noteworthy adjustment to the
patch, I might also edit .dotest/final-commit to update the
commit log message.
3. Then reset the working tree...Hmmm. But doesn't deal well with binary changes. We deal with a large set of projects, and while we don't manage that many binary files, it Cool! Abusing that, perhaps I could teach git-rebase to take a '--trivial-only' flag, and then cg-update --rebase could try git-rebase --trivial-only, and fall back on cg-merge... cheers, martin -
It shouldn't be too tricky to enhance "git am" (git-apply called
at around line 49 in it) to grok binary differences for this
purpose, because you would have both pre- and post-image blob in
your object database, because the patch is being used only to
replay what you have in your reository, and it records their
abbreviated SHA1 name.
I've never felt need to "merge" the binary files myself and had
never got around doing this, but if you are interested, it would
go something like this:
. Make "index %.7s..%.7s" that abbreviates pre- and post- image
blob SHA1s in diff.c configurable to spit out full 40 bytes.
Call that option --full-index-sha1.
. The updated "git rebase" that uses the "format-patch | am" I
outlined would pass the --full-index-sha1 to format-patch
(which is pased onto underlying diff-tree -p).
. In apply.c, check if all of the following holds:
* we have both the full 40-byte old_sha1_prefix[] and
new_sha1_prefix[]; and
* what the index records matches old_sha1_prefix[]; and
* the new blob is found in the object database;
and for such a path:
* change parse_chunk() not to barf even on a binary patch.
* change apply_data() to just declare the patch application
result is the new blob recorded in the patch.
Hmm.
-I'm curious. What would be the advantages of this over git-read-tree -m for use within a single repo? I keep thinking that I need an intra-repo way of doing it (arguably faster and more reliable), instead of git-format-patch|git-am, which is less reliable and slower. OTOH, if this is heading towards teaching git-am how to apply changes to binary files based on known SHA1s, this will give birth to a type of patch that applies only if you have the objects beforehand. Is that enough to get by? Perhaps we need a format to fully describe binary That's quite a bit of C hacking... I'm game for all the Perl and shell scripts in git, but I know better than start learning C in _this_ project with you, Linus and the whole list watching me make the fool ;-) So noone hacks this bit of C you are proposing I'll eventually get something done with cg-update and git-rebase. In the meantime, the user experience of working with a small team and a shared repo has improved significantly by switching from cg-update to cg-fetch && git-rebase origin. So much so that I suspect that it'd be a big win for cg-update to default to rebase on merges from 'origin'. cheers, martin -
I'll be pushing out something along the lines I said in the proposed updates branch tonight. I've even run 1 (one) test ;-). The relevant commits are these: + [pu~5^2] rebase: make it usable for binary files as well. + [pu~5^2^] diff: --full-index + [pu~5^2~2] apply: allow-binary-replacement. + [pu~5^2~3] Rewrite rebase to use git-format-patch piped to git-am. Note that this does not make generated "binary patch" usable across repositories yet. Since our patches are reversible out of principle, making it usable across repositories involves recording both the pre- and post-image of binary blobs in the patch output, which may be a useful option in some cases but not necessary for the immediate application of intra-repo rebasing. -
On a large tree I had an impression that applying patch and then falling back on 3-way is faster, but other than that, nothing, really. Just being able to use a single mechanism, which does I'd rather not see our "patch" go in the direction of recording both pre- and post-image of blob for binary files, which is what we would end up doing if we really want to do binary flexibly. Well, that may be nice as an option, but not by default. An option halfway in between would be to record the pre-image SHA1 and post- blob, perhaps compressed-uuencoded. This would limit us to the case that the recipient has not touched the binary file and replacing it, but in practice that might be enough. I'm willing to do the C hackery myself if there is enough interest in it. -
Dear diary, on Mon, Nov 14, 2005 at 09:51:29AM CET, I got a letter I still don't understand what's the point. And it is confusing for the user, since the history suddenly doesn't reflect how the GIT development happened (which is what the history is about), and it is downright deadly as soon as more than one branch gets around, which makes it even more confusing for the user (you can do cg-update, yeah - oh wait, unless you do X, then you need to remember to always do cg-update -R or whatever). All the woes just to get rid of the merge commits. What's wrong on merge commits? If they irritate you so much, cg-log -M. ;-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. -
Well, if you have a team of 5 working closely, doing commit/update several times a day, soon the following things happen: - everyone has sightly different histories, even if they all have the same tree - in the repo, there are 'update' merges galore. - as soon as you _actually_ have branches, it's really hard to distinguish branch merges from 'same branch update before commit' merges. by replacing the daily/hourly intra-team, self-branch `cg-update` with `cg-fetch && git-rebase` our history makes much more sense _and_ looking at gitk I can see clearly again the interesting merges (ie: merges with other branches). In practice, it's exactly what happens with git's history -- the merges are actually from rebased patches, that's why the history is "readable". does that make sense? cheers, martin -
Hi,
I thought about this problem for a while. You described a typical use case
(in a small team with one central repo), where pushes are done only from a
cleaned up branch, but the work branches stay.
If the tree corresponding to the central HEAD matches the tree
corresponding of your private merge branch at a given stage, a simple
graft should be your solution.
Example: You pull and push from/to origin on a central server. Your
(dirty) work branch is master. Now, at a given time,
origin^{tree}==master~15^{tree}. You could then generate a graft which
tells git that all parents of origin are also parents of master~15.
If at a given stage, origin^{tree}==master^{tree}, that graft would make
your next merge a fast forward, effectively cleaning up your branch
without loosing your history.
The graft could be generated by a simple script.
Would this help you?
Ciao,
Dscho
-Dear diary, on Sat, Nov 12, 2005 at 01:19:45PM CET, I got a letter I'm a bit reluctant about this functionality available in cg-update, but then people will start to want commit stack and stuff, while they should be just already long using StGIT for tracking their patches. Actually, I wanted to also implement e-mail functionality to cg-mkpatch, but I'm not sure now - perhaps people wanting that should really just use StGIT. Cogito or GIT core is not very suitable for keeping your patches against someone else's tree if he is not going to GIT-merge with you, exactly because it's not really very convenient to update your patches. On the same note, I would like StGIT to drop functionality not really belonging to patch stack manager (stg add, stg rm, stg status, ...) so that its commandset gets smaller and more focused - but before I would suggest dropping stg status, cg-status must be able to do conflicts tracking, so I will dedicate another mail to this sometime in the future, with a more detailed proposal. So, is there any reason why you want this in GIT/Cogito and don't want to use StGIT? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. -
This was the case with the first StGIT implementations but I slowly began to want to only use StGIT and not switch to something else for trivial SCM operations. I eventually added 'stg commit' which stores the patches permanently into the base of the stack to enable some kind of maintainer mode for StGIT. My main use for this was to import patches directly into the main branch and not keep a separate one and The gitmergeonefile.py script in StGIT adds every conflict to the .git/conflicts file which is read by 'stg status'. My goal is not to leave any unmerged entried in the index even if there are conflicts. Maybe this could be changed and .git/conflicts file avoided entirely. Anyway, while I'll try not to add more SCM functionality to StGIT, I don't think I should remove the existing add/rm/status functionality. It's just handy not to use a different command when you want a new file added to a patch. -- Catalin -
petr, currently it isn't recommended to use StGIT with other porcelains. so either: make it completely safe to use StGIT with other porcelains, or add a minimal amount of SCM-like functionality so users don't miss it and try to use other porcelains and trash their repositories. overall i agree with catalin-- it's just easier to use a single tool.
I actually don't think that is surprisingly high, and would actually have expected it to be closer to 50%. On the other hand, the merges that end up being pure fast-forwards aren't counted as merges at all (since they don't show up as commits), so maybe that's what skews my preception of a big percentage of merges as being This is something where I think the kernel is perhaps unusual, especially for a big project. We really do encourage people to make lots of small and well-defined changes, and the whole flow of development has been geared I do think octopus is really cool, and still think seeing that five-way octopus-merge in gitk in the git history was really really cool. It doesn't look as good any more, btw: do "gitk" on the current git tree, and search for "Octopus merge", and you'll see some of the history lines crossing each other. Paul? But yeah, it's a pretty special thing. I think its coolness factor way Again, this is possibly because the kernel has already had a few years of distributed SCM usage under its belt, and we've tried to not only merge in a timely manner, but also try to keep history reasonably clean and not have a lot of cross-merging back and forth. That cuts down on multi-base I definitely think this is true for any big project. Small projects will inevitably have changes that modify large portions of the source base. But with small projects, it doesn't really matter _what_ you do, you can do it fast. Big projects (at least the sane kind) will never have lots of changes that modify a very big percentage of the source-tree. It's just too painful Well, it's self-re-inforcing. It was designed for the kernel usage patterns, so using the kernel to confirm that it's the "right design" is a bit self-serving. Sure, it's a good sign that my mental model of what the usage patters are does actually match reality, but at the same time it might be more interesting to see if other projects that use git end up using it the sa...
That is what I meant by "it does not show just what you saw". It shows the whole community experience by everybody who has commit there. Being _the_ integration point, I imagine that most of the pure fast-forwards you saw were real merges for somebody else, and that merge being fast/safe/convenient That is true. My point was that it's not like git was done right only for _you_, sacrificing subsystem people. The sample 4k commits show that the assumption of the usage pattern git is optimized for actually holds (commits being small, merges being mostly trivial) for kernel people other than you. It is yet to be seen if the same assumption holds for other projects. -
This does two things: - It changes the hardcoded default merge strategy for two-head git-pull from resolve to recursive. - .git/config file acquires two configuration items. pull.twohead names the strategy for two-head case, and pull.octopus names the strategy for octopus merge. IOW you are paranoid, you can have the following lines in your .git/config file and keep using git-merge-resolve when pulling one remote: [pull] twohead = resolve OTOH, you can say this: [pull] twohead = resolve twohead = recursive to try quicker resolve first, and when it fails, fall back to recursive. Signed-off-by: Junio C Hamano <junkio@cox.net> --- Linus Torvalds <torvalds@osdl.org> writes: > Hmm. True. The _really_ trivial in-index case triggers for me pretty > often, but I haven't done any statistics. It might be only 50% of the > time. >... > It's certainly an option to just do what I just did, namely use the > default one until it breaks, and then just do "git reset --hard" and re-do > the pull with "-s recursive". A bit sad, and it would be good to have > coverage on the recursive strategy.. Hopefully something like this would make people aware of recursive and give it a wider coverage and chance to mature. git-pull.sh | 16 ++++++++++++++-- 1 files changed, 14 insertions(+), 2 deletions(-) applies-to: 75922cf23cc070e2d5220d961a8f645f1bc8bb60 3acc20beaf0df9ce11a1b7aabf8c9dc7507a9b44 diff --git a/git-pull.sh b/git-pull.sh index 2358af6..3b875ad 100755 --- a/git-pull.sh +++ b/git-pull.sh @@ -79,10 +79,22 @@ case "$merge_head" in exit 0 ;; ?*' '?*) - strategy_default_args='-s octopus' + var=`git-var -l | sed -ne 's/^pull\.octopus=/-s /p'` + if test '' = "$var" + then + strategy_default_args='-s octopus' + else + strategy_default_args=$var + fi ;; *) - strategy_default_args='-s resolve' + var=`git-var -l | sed -ne 's/^pull\.twohead=/-s /p'` + if test '' = "$v...
Hi, IIRC recursive does nothing else than recursively merging the merge-bases (granted, in a clever way). So if there is only one merge-base, the only slow-down would be the startup of python (which is probably worth it, We already have a fallback list: after really-trivial, try automatic, ..., try resolve. Why not just add recursive? So, if even resolve failed, just try once more, with recursive. Ciao, Dscho -
I haven't done any real measurements but my feeling is that the recursive strategy is at least not very much slower than the resolve strategy. In the single-common-ancestor case I can think of the following things which may make a difference speed wise: * The recursive strategy is written in Python * The code for finding common ancestors is also written in Python and is probably a bit slower than git-merge-base. * git-diff-tree -M --diff-filter=R <common ancestor> <branch> is executed twice, once for each branch. On the positive side the code which corresponds to git-merge-one-file in the git-resolve case is also written in python, we can therefore I don't think this is a very good idea for two reasons. The first one is that there are some merge scenarios involving renames which should be conflicts but are cleanly merged by git-resolve. The second reason is that with the fall back list the recursive strategy will only be used in the strange corner cases and will thus not get nearly the same amount of testing it would get if it was the first choice (or directly after the really-trivial merge). - Fredrik -
Hi, Two very valid points. You convinced me. Ciao, Dscho -
Btw, what part of git-merge-bases is it that makes it not be practical? Linus -
The problem is in the multiple-common-ancestors case. If we have three common ancestors, A, B and C, we will start with merging A with B. The result is a new 'virtual' commit object (not stored in the object database), lets call it V. We are then going to merge V with C. To do that we need to get the common ancestor(s) of V and C, and as V doesn't exist in the database we can't use git-merge-base. I haven't given it a lot of thought though, it might be possible to use git-merge-base in some way and get the same results as we get now. It would certainly be possible to use git-merge-base in the first iteration and use the python code only when we actually have any 'virtual' commit objects. - Fredrik -
Hmm. That's really the same as the merge-base of "C _and_ (A _or_ B)", isn't it? So we should be able to do that even without ever seeing the virtual merge. In fact, it really is pretty trivial from a technical standpoint: the "A _or_ B" part is really just inserting both A and B with the same "flags" value (see the "merge_base()" function in merge-base.c). So in general, "merge-base" could trivially be extended to have any number of "OR commits" on either side, as long as there is just one "and". It could also be extended to have multiple "and" cases, but that has actually already been done by "git-show-branch", I think. It's all the same logic, except it uses more than just two bits. So _technically_ it should be easy to do, it would just need some sane Just handling the first case specially would be sufficient for 99% of all uses. And if the multi-parent cases are slightly slower, I don't think anybody cares. In fact, we _always_ do the first git-merge-base in git-merge.sh anyway (actually, not git-merge-base, but "git-show-branch --merge-base"). So that's really done already for the "test if it's trivial" case.. Junio, that points out that "git-merge-base" is another program that could just be removed, since it's really supreceded by git-show-branch. Or did I miss something? Linus -
Dear diary, on Wed, Nov 09, 2005 at 12:05:43AM CET, I got a letter Wow, I didn't know git-show-branch could do that (even though it's a bit unnatural to expect this from command named this way; then again, there's git-rev-parse...). BTW, git-show-branch is also by orders of magnitude faster (not that this would be any major timesaver). Median 0.006s vs. median 0.124s. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. -
Ouch. That makes me suspicious. One reason git-merge-base is slow is because it's being pretty careful about some pathological examples of dates being just the wrong way around, and it might just be that the reason git-show-branch is faster is because it isn't doing that part right. So yes, git-merge-base does extra work, but it does so because I think it needs to. Junio? You even wrote the comment about the case in git-merge-base, I'm wondering whether it's a bug that we use the fast-and-cheap algorithm in git-show-branch.. Of course, arguably you can first try the fast-and-cheap thing, and if that gives a merge parent that is acceptable, why not? So maybe it's the right thing for the "let's see if this is trivial" case, but I think it might _think_ some cases are trivial that really shouldn't, because they actually have two merge parents. Linus -
I did show-branch soon after we worked on those pathlogical merge-base fix, so I would be a bit surprised if I did it without using all the knowledge from that exercise, but I do not remember offhand. The core logic should be simple generalization of two-head merge-base to N heads. -
Hmm.
Look at the "join_revs()" logic, and tell me I'm crazy.
It does:
struct commit *commit = pop_one_commit(list_p);
int still_interesting = !!interesting(*list_p);
in that order: it looks whether there are any interesting commits left
_after_ it has popped the top-of-stack.
Which means that "still_interesting" can go down to zero if we just popped
the last interesting thing off the stack.
Which seems wrong, because the thing we just popped off the stack could
easily itself be interesting (in fact, it should be so, 99% of the time),
and can cause other interesting commits to be populated back onto the
list. So the "still_interesting" flag seems to be wrongly computed: the
way it is computed now, it's meaningless.
In contrast, the "merge_base()" thing does
while (interesting(list)) {
..
}
which means that we really will walk the list until there is nothing
interesting left. Which is admittedly expensive, but it was how we got rid
of the pathological case.
But maybe I'm just missing something really subtle. Maybe git-show-branch
does some really clever optimization that is valid.
Linus
-Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh. You are right. The problem is most of the time hidden, because we usually do one extra round (extra usually starts from 0 and we break out after we say "not interesting anymore" and extra < 0). Obviously, I was not thinking clearly. -
Hi, IIRC, git-show-branch has a limit on the number of refs it can take. Ciao, Dscho -
Well, git-merge-base does too. git-merge-base only takes two refs ;) In general, you need to keep track of one bit per ref, and since we have a 32-bit "flags" word and need a couple of bits for other maintenance info, pretty much anything that figures out common heads will be limited some way. This is only a limit for the "and" logic - the "or" logic (if we implement it) will just share the same status bit for all the refs that are "ored together" and thus has no limits. Oh, and the "and" logic can be extended by running the program multiple times, so it's not a "hard" limit, it's just an issue of convenience. That said, anybody who ever does an octopus of more than just a few heads deserves to be shot, so I don't think the limit should matter. The recursive strategy should only add the "or" kind of refs, and it shouldn't be a problem (apart from just how to describe them). Linus -
Come to think of it, git-merge-octopus does AND. If I am merging topic branches 1, 2, 3,... N into my master, internally it does an equivalent of merging 1 into master, then 2 into the result, then C into that result,..., and it uses merge-base of all the heads merged so far and the original master to pivot on. And I think this is *wrong*. The merge base of each step when merging head N does not have to be older than merge base of the original master and head N, but currently that is not what it does. I should be ORing them ideally, but even if I do not, I should be able to just use the merge base of head N and original master. -
There are two reasons to avoid git-merge choose from more than one strategy. 1. The whole idea that git-merge implements "goodness" metric is bogus. It does not know what merge strategy is good and that is the reason it punts and has the user choose his preferred strategy. 2. When it is going to loop over more than one strategy, it stashes away the current working tree state, so that the second and subsequent strategies can begin from a clean slate (including local modifications since the current head). If we try only one, there is no such cost involved. I think the patch I sent out last night to change the recursive as the default strategy and make it overridable from the configuration mechanism would be a better way to give people more exposure to the greatness of recursive while protecting them from potential glitches if any. -
Sorry, that is not what I meant to say at all. You used to micromanage, but it was _very_ good for git back then. I admit that only once I found you too picky and difficult to work with while I was fixing a bad premature-free bug in the diffcore-rename code, but overall your attention to detail well paid off. Since I inherited the project, we added quite a lot of stuff, but I was still unsure if we are making good progress, or just stagnating with only small enhancements and obvious fixes. But Fredrik merge turns out to be a spectacular success as you found out, which is a triumph for Fredrik, but at the same time it means I was not doing too bad myself ;-). -
Btw, one thing that it does is print out too much information. In particular, I had renames on both sides of the merge (in case anybody wants to see which one I'm talking about: it's the current top-of-head commit in the kernel archives: 333c47c847c90aaefde8b593054d9344106333b5). Now, renames that you've done yourself you really don't want to hear about, at least if the other side didn't change anything in that file. Renames that the _other_ side has done (the one you're merging) you may or may not want to know about, regardless of whether they happened to files that are changed. But since "git pull" will do a "git-apply --stat" at the end and show the renames there, I'd argue that the merge strategy itself should be quiet about any renames that are trivial. So how about talking about renames only if you end up also doing a file-level merge? As it is, doing the merge talked about renames that I had merged earlier in my own branch, which is just confusing. Linus -
Sounds like a good idea. How about something like the following?
--
It isn't really interesting to know about the renames that have
already been committed to the branch you are working on. Furthermore,
the 'git-apply --stat' at the end of git-(merge|pull) will tell us
about any renames in the other branch.
With this commit only renames which require a file-level merge will
be printed.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>
---
git-merge-recursive.py | 22 +++++++++++++++-------
1 files changed, 15 insertions(+), 7 deletions(-)
applies-to: 5af1b5b93257ecfe993bb24975bf596faa342758
89c029b439603630a53ee4e4d0cb7931111afd2a
diff --git a/git-merge-recursive.py b/git-merge-recursive.py
index 626d854..9983cd9 100755
--- a/git-merge-recursive.py
+++ b/git-merge-recursive.py
@@ -162,10 +162,13 @@ def mergeTrees(head, merge, common, bran
# Low level file merging, update and removal
# ------------------------------------------
+MERGE_NONE = 0
+MERGE_TRIVIAL = 1
+MERGE_3WAY = 2
def mergeFile(oPath, oSha, oMode, aPath, aSha, aMode, bPath, bSha, bMode,
branch1Name, branch2Name):
- merge = False
+ merge = MERGE_NONE
clean = True
if stat.S_IFMT(aMode) != stat.S_IFMT(bMode):
@@ -178,7 +181,7 @@ def mergeFile(oPath, oSha, oMode, aPath,
sha = bSha
else:
if aSha != oSha and bSha != oSha:
- merge = True
+ merge = MERGE_TRIVIAL
if aMode == oMode:
mode = bMode
@@ -207,7 +210,8 @@ def mergeFile(oPath, oSha, oMode, aPath,
os.unlink(orig)
os.unlink(src1)
os.unlink(src2)
-
+
+ merge = MERGE_3WAY
clean = (code == 0)
else:
assert(stat.S_ISLNK(aMode) and stat.S_ISLNK(bMode))
@@ -577,14 +581,16 @@ def processRenames(renamesA, renamesB, b
updateFile(False, ren1.dstSha, ren1.dstMode, dstName1)
updateFile(Fals...The rest looks good to me, but are you sure about this part? I have a feeling that the above "and" should be "or", meaning, we check to see if there is _any_ change, and default to TRIVIAL, but later we would find that we need a real merge and then promote it to MERGE_3WAY. -
You are right. The code actually do the right thing, but it does it by
accident. Please apply the following patch.
---
merge-recursive: Fix limited output of rename messages
The previous code did the right thing, but it did it by accident.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>
---
git-merge-recursive.py | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
applies-to: bb7dd65e1d945edbe0137a761ebc388c7394067a
f56613498cd7fb7013f532a04e63b580314ed957
diff --git a/git-merge-recursive.py b/git-merge-recursive.py
index 9983cd9..3657875 100755
--- a/git-merge-recursive.py
+++ b/git-merge-recursive.py
@@ -162,13 +162,10 @@ def mergeTrees(head, merge, common, bran
# Low level file merging, update and removal
# ------------------------------------------
-MERGE_NONE = 0
-MERGE_TRIVIAL = 1
-MERGE_3WAY = 2
def mergeFile(oPath, oSha, oMode, aPath, aSha, aMode, bPath, bSha, bMode,
branch1Name, branch2Name):
- merge = MERGE_NONE
+ merge = False
clean = True
if stat.S_IFMT(aMode) != stat.S_IFMT(bMode):
@@ -181,7 +178,7 @@ def mergeFile(oPath, oSha, oMode, aPath,
sha = bSha
else:
if aSha != oSha and bSha != oSha:
- merge = MERGE_TRIVIAL
+ merge = True
if aMode == oMode:
mode = bMode
@@ -211,7 +208,6 @@ def mergeFile(oPath, oSha, oMode, aPath,
os.unlink(src1)
os.unlink(src2)
- merge = MERGE_3WAY
clean = (code == 0)
else:
assert(stat.S_ISLNK(aMode) and stat.S_ISLNK(bMode))
@@ -590,7 +586,7 @@ def processRenames(renamesA, renamesB, b
if merge or not clean:
print 'Renaming', fmtRename(path, ren1.dstName)
- if merge == MERGE_3WAY:
+ if merge:
print 'Auto-merging', ren1.dstName
if not clean:
@@ -668,7 +664,7 @@ def processRenames...
| kernel module to intercept socket creation | 10 minutes ago | Linux kernel |
| Image size changing during each build | 34 minutes ago | Linux kernel |
| Creating a device from a kernel module (mknod style) | 1 hour ago | Linux kernel |
| Soft lock bug | 5 hours ago | Linux kernel |
| sysctl - dynamic registration problem | 11 hours ago | Linux kernel |
| Question on swap as ramdisk partition | 14 hours ago | Linux kernel |
| serial driver xmit problem | 18 hours ago | Linux kernel |
| Generic Netlink subsytem | 19 hours ago | Linux kernel |
| 'Report spam filter error' page broken | 21 hours ago | KernelTrap Suggestions and Feedback |
| Netfilter kernel module | 1 day ago | Linux kernel |
