Re: inexplicable failure to merge recursively across cherry-picks

Previous thread: [PATCH 4/6] manual: add some markup. by Ralf Wildenhues on Tuesday, October 9, 2007 - 2:03 pm. (10 messages)

Next thread: stg branch command failures. by Aneesh Kumar on Tuesday, October 9, 2007 - 8:31 pm. (3 messages)
From: martin f krafft
Date: Tuesday, October 9, 2007 - 6:55 pm

Hi folks,

I hope this is not a daily series of mine, being confused about Git
merging, but I've run my head against a wall again and before
I crush my skull, I'd prefer to reach out to you to help me regain
an understanding.

This is about the new mdadm for Debian packaging effort. You can
clone from git://git.debian.org/git/pkg-mdadm/mdadm-new.git and see
the repo at
http://git.debian.org/?p=3Dpkg-mdadm/mdadm-new.git;a=3Dsummary. Do not
track as this repo is subject to change.

My master branch was last merged with upstream's mdadm-2.6.2 tag
(commit 263a535). Since then, I've committed a couple of changes to
master including three cherry-picks from upstream since mdadm-2.6.2.

I tagged upstream at the point which I want to merge into master:
mdadm-2.6.3+200709292116+4450e59. When I merge that tag into master,
I get a merge conflict on Monitor.c:

  <<<<<<< HEAD:Monitor.c
                                  if (mse->devnum !=3D MAXINT &&
  =3D=3D=3D=3D=3D=3D=3D
                                  if (mse->devnum !=3D INT_MAX &&
  >>>>>>> upstream:Monitor.c

There are five commits between mdadm-2.6.2 and
mdadm-2.6.3+200709292116+4450e59 that affect Monitor.c:

  01d9299
  e4dc510
* 66f8bbb
  98127a6
  4450e59

The third commit, the one with the asterisk is the one that
I cherry-picked (as 845eef9); the other two cherries I picked do not
touch Monitor.c.

The fifth/last commit (4450e59) is the one responsible for the
change which seems to cause the conflict. It is the *only* commit
since the common ancestor of *both* branches that touches the
conflicting lines.

The fourth commit (98127a6) inserts a single line at the top of the
file, so that's nothing that would cause a conflict.

To be honest, I can't explain it. But I didn't give up.

I branched master2 off 845eef9b~1, cherry-picked the first two
commits that touch Monitor.c, cherry-picked all the commits
845eef9b..master into master2 and merge upstream...

=2E.. to get exactly the same conflict in exactly ...
From: Linus Torvalds
Date: Tuesday, October 9, 2007 - 7:54 pm

Side note - run

	gitk --merge

when you have a merge conflict, and it will basically show you the thing 
graphically (ie history as it is relevant to the merge, and only to the 
files that get conflicts).

But basically, both sides have modified the code *around* that line, and 
they have modified it differently.

Do this in your partial merge tree on 'master':

	git diff ...mdadm-2.6.3+200709292116+4450e59 Monitor.c
	git diff mdadm-2.6.3+200709292116+4450e59... Monitor.c

which will show you the diff from the common base ancestor. And in 
particular, it will show how one branch did this:

	@@ -399,9 +401,8 @@ int Monitor(mddev_dev_t devlist,
	                        struct mdstat_ent *mse;
	                        for (mse=mdstat; mse; mse=mse->next)
	                                if (mse->devnum != MAXINT &&
	-                                   (strcmp(mse->level, "raid1")==0 ||
	-                                    strcmp(mse->level, "raid5")==0 ||
	-                                    strcmp(mse->level, "multipath")==0)
	+                                   (strcmp(mse->level, "raid0")!=0 &&
	+                                    strcmp(mse->level, "linear")!=0)
	                                        ) {
	                                        struct state *st = malloc(sizeof *st);
	                                        mdu_array_info_t array;

and the other one did

	@@ -398,10 +402,9 @@ int Monitor(mddev_dev_t devlist,
	                if (scan) {
	                        struct mdstat_ent *mse;
	                        for (mse=mdstat; mse; mse=mse->next)
	-                               if (mse->devnum != MAXINT &&
	-                                   (strcmp(mse->level, "raid1")==0 ||
	-                                    strcmp(mse->level, "raid5")==0 ||
	-                                    strcmp(mse->level, "multipath")==0)
	+                               if (mse->devnum != INT_MAX &&
	+                                   (strcmp(mse->level, ...
From: martin f krafft
Date: Wednesday, October 10, 2007 - 3:25 am

also sprach Linus Torvalds <torvalds@linux-foundation.org> [2007.10.10.0354=

This is the part I over-estimated. I thought that Git would figure
out that commits 1-3 had been merged into the target and thus apply,
in sequence, only the commits from the source which had not been
merged.

Many thanks (again), Linus! Looking forward to your next content
manager; you know, the one with artificial intelligence built in!
You could call it "wit" :)

--=20
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
=20
dies ist eine manuell generierte email. sie beinhaltet
tippfehler und ist auch ohne gro=DFbuchstaben g=FCltig.
=20
spamtraps: madduck.bogus@madduck.net
From: David Kastrup
Date: Wednesday, October 10, 2007 - 3:33 am

Well, there is also an obvious name choice when the distinguishing
innovation is a well-rounded feature set, but it would cause a name
collision for the equivalent of "tig".

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Linus Torvalds
Date: Wednesday, October 10, 2007 - 8:25 am

Yes, *some* SCM's have tried to do that. In particular, the ones that are 
"patch-based" tend to think that patches are "identical" regardless of 
where they are, and while re-ordering of them is a special event, it's not 
somethign that changes the fundamental 'ID' of the patch.

For example, I think the darcs "patch algebra" works that way.

It's a really horrible model. Not only doesn't it scale, but it leads to 
various very strange linkages between patches, and it fails the most 
important part: it means that merges get different results just because 

Well, the git model is really largely the reverse: the system is supposed 
to be as *stupid* as humanly possible, but:

 - make it predictable exactly because it's stupid and doesn't do anything 
   even half-ways smart.

   This is part of the "it doesn't matter *how* you got to a particular 
   state, git will always do the same thing regardless of whether you 
   moved an existing patch around or whether you re-did the changes as 
   (possibly more than one) new and unrelated commits".

 - conflicts aren't bad - they're *good*. Trying to aggressively resolve 
   them automatically when two branches have done slightly different 
   things in the same area is stupid and just results in more problems.

   Instead, git tries to do what I don't think *anybody* else has done: 
   make the conflicts easy to resolve, by allowing you to work with them 
   in your normal working tree, and still giving you a lot of tools to 
   help you see what's going on.

So git doesn't try to avoid conflicts per se: the merge strategies are 
fundamentally pretty simple (rename detection and the whole "recursive 
merge" thing may not be simple code, but the concepts are pretty 
straightforward), and they handle all the really *obvious* cases, but at 
the same time, I feel strongly that anything even half-way subtle should 
not be left to the SCM - the SCM should show it and make it really easy 
for the user to then fix it up.

Side ...
From: David Brown
Date: Wednesday, October 10, 2007 - 8:48 am

Actually, specifically darcs, different merges _always_ result in the same
data.  It's a fundamental part of is patch algebra.  No matter what order
you apply a given set of patches, even with conflicts and reordering, you
always get the same result, or no result.  Conflicts are "resolved" by
inserting conflict markers in the file, ordered by the patch ID.  It
doesn't matter which order you apply them in, you get the same markers.
Then there will be a merge patch which fixes the markers that someone could
apply, no matter what order the applied the previous patches.

Darcs breaks down in a few places, though.

   - The no result.  Sometimes, it just can't figure out how to reorder
     patches.  Even worse, occasionally, the implementation will fail to
     terminate try to figure this out.  There isn't much to do at this
     point, except manually apply the patch, hence generating a new patch
     ID.

   - It doesn't scale well.

The strange linkages between patches could be thought of as a feature,
since it is basically constraining the order that the patches can be
applied in.

There is a darcs-git project that tries to do the darcs things on top of
git.

Dave
-

From: Miklos Vajna
Date: Wednesday, October 10, 2007 - 12:07 pm

thanks,
- VMiklos
From: Linus Torvalds
Date: Wednesday, October 10, 2007 - 12:35 pm

No they don't. You don't understand the problem.

Yes, different merges WITH THE SAME PATCHES always result in the same 
data.

But that's not a realistic - or even very interesting - schenario.

What's much more common is that the same problem gets solved slightly 
differently in two different branches. For example, maybe somebody does it 
as two different patches - where the second one fixes a bug in the first 
fix. And another person does the same fix, but without the bug in the 
first place.

See? A patch-based system gets confused by those kinds of issues (or they 
turn into various special cases). And that is fundamentally why you MUST 
NOT take history into account (where "history" is some series of 
individual patches).

Yes, history is interesting for historical reasons, and to explain what 
the context was, but in many ways, history is exactly the *wrong* thing to 
use when it comes to merging. You should look at the end result, since 
people can - and do - come to the same result through different ways.

			Linus
-

From: Miklos Vajna
Date: Wednesday, October 10, 2007 - 5:08 pm

[ ehh, sorry for my previous mail, i must be doing something wrong.. ]


actually it's broken, according to its author:

http://www.mail-archive.com/darcs-users@darcs.net/msg03161.html

though i loved darcs, but yes - it scales horribly

- VMiklos
From: Sam Vilain
Date: Thursday, October 11, 2007 - 2:51 pm

This is true.  However I think there are some obvious places for
improvement that does look at the file history, when the regular
algorithm fails;

1. do a --cherry-pick rev-list on just the file being merged and see if
all the changes on one side disappear, in which case just take the result.

2. see if the files were identical at some point, in which case use a
new merge base for that file based on the changes since that revision.

I actually thought #2 was already the way recursive worked!

Sam.
-

From: Linus Torvalds
Date: Thursday, October 11, 2007 - 3:33 pm

Actually, I think both of these are fundamentally wrong.

The reason is that you talk about "the file".

Anything that is based on per-file heuristics is going to mean that you 
use a history that is not necessarily compatible with the _other_ files in 
the project.

I agree 100% that per-file history information is going to find more 
things to merge automatically. But the point I was trying to make was that 
"automatic merges" aren't always *good*. 

I realize that pretty much every single theoretical merge algorithm out 
there tries to make merges happen automatically as much as possible, but 
they all en dup having strange issues.

For example, take a patch that cherry-picked into mainline from a 
development branch, but that partly depended on some support-feature that 
wasn't in mainline yet. So then there is another patch that removes part 
of that patch from mainline. So mainline is fine.

Now, three months later, the development branch is stable, and is fully 
merged. What happens?

Git will largely get this right. Git will look at the last *global* common 
base, and will just look at the contents, and do a reasonable job. Yes, 
there will probably be conflicts (because both the development branch and 
the mainline ended up touching the same parts of the files thanks to the 
cherry-pick, and yet mainline has some added hacks on top to disable it), 
but on the whole that's exactly what you want!

(Alternatively, maybe the "remove the part that wasn't supported yet" 
ended up meaning that that particular part of the patch was excised 
entirely from mainline, and there was no conflict at all, and git just 
merged the new stuff from the development branch cleanly! So I'm not 
saying that it *has* to conflict, I'm just saying that it might have).

In other words: git always "does the right thing". Assuming both branches 
are stable and working, git does a very reasonable thing. It's obviously 
not always the thing people may *want*, but it's guaranteed to be a ...
Previous thread: [PATCH 4/6] manual: add some markup. by Ralf Wildenhues on Tuesday, October 9, 2007 - 2:03 pm. (10 messages)

Next thread: stg branch command failures. by Aneesh Kumar on Tuesday, October 9, 2007 - 8:31 pm. (3 messages)