Re: [IRC/patches] Failed octopus merge does not clean up

Previous thread: Management of opendocument (openoffice.org) files in git by Sergio Callegari on Monday, September 15, 2008 - 3:40 pm. (10 messages)

Next thread: Re: [PATCH] Documentation: warn against merging in a dirty tree by Junio C Hamano on Monday, September 15, 2008 - 4:42 pm. (5 messages)
From: Thomas Rast
Date: Monday, September 15, 2008 - 3:48 pm

Hi *

James "jammyd" Mulcahy pointed out on IRC that the octopus merge
strategy doesn't properly clean up behind itself.  To wit:

  git init
  echo initial > foo
  git add foo
  git commit -m initial
  echo a > foo
  git commit -m a foo
  git checkout -b b HEAD^
  echo b > foo
  git commit -m b foo
  git checkout -b c HEAD^
  echo c > foo
  git commit -m c foo
  git checkout master
  git merge b c

The merge says

  Trying simple merge with 5b3e4bb1c2d88d6967fb575729fbfc86df5eaec9
  Simple merge did not work, trying automatic merge.
  Auto-merging foo
  ERROR: Merge conflict in foo
  fatal: merge program failed
  Automated merge did not work.
  Should not be doing an Octopus.
  Merge with strategy octopus failed.

So far so good.  However, 'git status' claims

  #       unmerged:   foo

and indeed the contents of 'foo' are the conflicted merge between
'master' and 'b', yet there is no .git/MERGE_HEAD.  This behaviour is
identical for 1.5.6 and 1.6.0.2, so it is not caused by the merge
rewrite as a builtin.  Shouldn't it either really clean up, or really
leave the repo in a conflicted merge state?  (I'm following up with a
patch that turns the above into a test.  Octopus doesn't really have
many tests, does it?)

On the code path to the "Merge with strategy %s failed" message,
builtin-merge.c:1134 runs restore_state() which runs reset_hard().
But (as Miklos pointed out) that cannot actually do 'git reset --hard'
because it is possible (though not recommended, see below) to start a
merge with a dirty index.

Jakub mentioned that there are only three index stages for a three-way
merge, so a conflicted n-way (simultaneous) merge is not really
possible.

The merge manpage should warn about merging with uncommitted changes.
It recommends 'git rebase --hard' to abort during conflicts, but does
not mention that this throws away said changes.  I'm following up with
a patch for this.

=2D Thomas

=2D-=20
Thomas Rast
trast@student.ethz.ch


From: Thomas Rast
Date: Monday, September 15, 2008 - 3:49 pm

Currently, a conflicted octopus merge leaves the repository in a
"halfway into the merge state", where .git/MERGE_HEAD is missing but
the files are listed as conflicted in the index and have conflict
markers in them.  It should either clean up everything, or add a
MERGE_HEAD.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---
 t/t7607-merge-octopus-cleanup.sh |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/t/t7607-merge-octopus-cleanup.sh b/t/t7607-merge-octopus-cleanup.sh
new file mode 100755
index 0000000..4d82867
--- /dev/null
+++ b/t/t7607-merge-octopus-cleanup.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description='git merge
+
+Testing that conflicting octopus merge fails cleanly.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo initial > foo &&
+	git add foo &&
+	git commit -m initial &&
+	echo a > foo &&
+	git commit -m a foo &&
+	git checkout -b b HEAD^ &&
+	echo b > foo &&
+	git commit -m b foo &&
+	git checkout -b c HEAD^ &&
+	echo c > foo &&
+	git commit -m c foo &&
+	git checkout master
+'
+
+test_expect_failure 'conflicted octopus merge' '
+	test_must_fail git merge b c &&
+	test -z "$(git ls-files --unmerged)" &&
+	test "$(cat foo)" == a &&
+	test ! -f .git/MERGE_HEAD
+'
+
+test_done
-- 
tg: (1293c95..) t/test-octopus-cleanup (depends on: origin/master)
--

From: Thomas Rast
Date: Monday, September 15, 2008 - 3:49 pm

Merging in a dirty tree is usually a bad idea because you need to
reset --hard to abort; but the docs didn't say anything about it.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---
 Documentation/git-merge.txt |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index 685e1fe..3798e16 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -22,6 +22,11 @@ The second syntax (<msg> `HEAD` <remote>) is supported for
 historical reasons.  Do not use it from the command line or in
 new scripts.  It is the same as `git merge -m <msg> <remote>`.
 
+NOTE: Usually it is a bad idea to merge with a dirty tree or index.
+      If you get conflicts and want to abort (instead of resolving),
+      you need to `git reset \--hard` which loses the uncommitted
+      changes.
+
 
 OPTIONS
 -------
-- 
tg: (1293c95..) t/doc-merge-reset (depends on: origin/master)
--

From: Junio C Hamano
Date: Monday, September 15, 2008 - 4:36 pm

Your analysis is correct --- this has been the correct/established
behaviour of Octopus from day one.

Read the second from the last line of what you were told by git.  Run "git
reset --hard" after that, perhaps.
--

From: Thomas Rast
Date: Monday, September 15, 2008 - 4:47 pm

Including not cleaning up the half-merged state?  If the answer is
"yes", then so be it, merge-octopus has been on this project longer
than I have ;-)


wasn't immediately obvious to the end-user (jammyd in this case),
which started the discussion.

=2D-=20
Thomas Rast
trast@student.ethz.ch

From: Junio C Hamano
Date: Monday, September 15, 2008 - 5:20 pm

By the way, there are three failure modes in Octopus.

 (0) your index (not work tree) is dirty; this is not limited to octopus
     merge but any merge will be refused;

 (1) while it merges other heads one-by-one, it gets conflicts on an
     earlier one, before it even attempts to merge all of them.  Recording
     the heads that it so far attempted to merge in MERGE_HEAD is wrong
     (the result won't be an Octopus the end user wanted to carete), and
     recording all the heads the user gave in MERGE_HEAD is even more
     wrong (it hasn't even looked at the later heads yet, so there is no
     way for the index or work tree to contain anything from them).

     The above is hitting this case.

 (2) it gets conflicts while merging the _last_ head.  It records
     MERGE_HEAD and allows you to record the resolved result.

I think originally we treated (1) and (2) the same way, because an Octopus
is to record the most trivial merge with more than 2 legs, and a rough
definition of "the most trivial" is "tracks of totally independent works;
you could merge them one by one and _in any order_, and the result won't
matter because they are logically independent.  However if they are _so_
independent, why not record them as merged in one step (i.e. octopus),
instead of insisting on recording in what order you merged them".

Obviously, if you get a conflict during Octopus creation, they were not
tracks of totally independent works, and that is where the "Should not" in
the message comes from.

We later relaxed it to allow a conflict at the last step, not because
recording an Octopus with nontrivial merge is particuarly a good idea we
should encourage, but because there simply weren't reason not to.

--

From: Jakub Narebski
Date: Tuesday, September 16, 2008 - 3:53 pm

The problem, as far as I understand it, is not that octopus merge fails.
The problem is that it fails halfway, and it leaves working are in
inconsistent state: git-status output with its "unmerged" suggests that

IIRC the idea of stashing away changes, doing merge, and then unstashing
was shot down as encouraging bad workflows, and more often than not

IMVHO the correct solution would be to rollback changes to the state
before attempting a merge. I'd rather 'octopus' did its thing as
transaction; contrary to ordinary merges if merge fails it doesn't
mean necessary that it is in the state of resolvable conflict, state
we can stop at. Perhaps (if it is still a shell script, doing git-stash
at beginning, and either dropping or popping the stash at the end;
or just saving old index if it is built-in).


BTW. does it mean that "git merge a b" might be not the same as

Well, octopus merge could simply fail when merge is nontrivial (not
limited to being tree-level merge only).

-- 
Jakub Narebski
Poland
--

From: Andreas Ericsson
Date: Tuesday, September 16, 2008 - 11:45 pm

No. Git merges all the sub-things together and then merges the result
of that jumble into the branch you're on.

Someone might have to correct me on that, but that's as far as I've
understood it.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
--

From: Jakub Narebski
Date: Wednesday, September 17, 2008 - 1:11 am

From what I understand from above explanation, and from thread on git
mailing list about better implementation of and documenting finding
merge bases for multiple heads, I think octopus merge is done by merging
[reduced] heads one by one into given branch.

This means that "git merge a b" does internally "git merge a; git merge b"
as I understand it.
-- 
Jakub Narebski
Poland
--

From: Miklos Vajna
Date: Wednesday, September 17, 2008 - 8:59 am

On Wed, Sep 17, 2008 at 10:11:02AM +0200, Jakub Narebski <jnareb@gmail.com>=

Sure, but given that both 'a' and 'b' merged (so none of them is subset
of the other, for example so that reduce_heads() would drop one of them)
the order of the parents will be different so the resulting commit will
differ. The resulting tree will no, I think.
From: Andreas Ericsson
Date: Wednesday, September 17, 2008 - 9:40 am

I got it wrong (not wrt reduced heads, but still). My apologies.

If octopus (the program/strategy/whatever) continues to merge after a
branch conflicting against the currently checked out branch (let's call
it "master"), the resulting tree may not differ, but then again, it might.
OTOH, if octopus quits the merge after having encountered a conflict, the
order the branches to merge were passed will always have an impact.

Let's say you have two branches, "clean" and "conflict". Which one is
which should be obvious here.

	git merge clean conflict

will produce a tree with 'master', 'clean' and a conflicted merge of
'conflict', while

	git merge conflict clean

will produce a tree with 'master' and a conflicted merge of 'conflict'.

In short, backing out the entire merge in case of a conflict is almost
certainly the only sane thing to do. If one branch depends on another
to merge cleanly, a different approach should have been taken (depending
branch should have been rebased onto the dependent one prior to merging),
but the merge *might* succeed depending on in which order the other
branches are given as arguments. It's not a very clever idea to merge
something like that though, as bisection will invariably have to mark the
entire depending branch as BAD, although the merge itself could obviously
be a good one.

Clearly, an octopus merge should not be undertaken without knowing very
well what it is one is merging in.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
--

Previous thread: Management of opendocument (openoffice.org) files in git by Sergio Callegari on Monday, September 15, 2008 - 3:40 pm. (10 messages)

Next thread: Re: [PATCH] Documentation: warn against merging in a dirty tree by Junio C Hamano on Monday, September 15, 2008 - 4:42 pm. (5 messages)