Re: git-rerere observations and feature suggestions

From: Ingo Molnar
Date: Monday, June 16, 2008 - 4:01 am

We are running a rather complex Git tree with heavy use of git-rerere 
(the -tip kernel tree, with more than 80 topic branches). git-rerere is 
really nice in that it caches conflict resolutions, but there are a few 
areas where it would be nice to have improvements:

 - Fixing resolutions: currently, when i do an incorrect conflict
   resolution, and fix it on the next run, git-rerere does not pick up
   the new resolution but uses the old (buggy) one on the next run. To
   fix it up i have to find the right entries in .git/rr-cache/* and
   manually erase them. Would be nice to have "git-rerere gc <pathspec>"
   to flush out a single bad resolution.

 - File deletion: would be nice if git-rerere picked up git-rm
   resolutions. We hit this every now and then and right now i know 
   which ones need an extra git-rm pass.

 - Automation: would be nice to have a git-rerere modus operandi where
   it would auto-commit things if and only if all conflicting files were 
   resolved.

 - Sharing .git/rr-cache. It's quite a PITA to share the .git/rr-cache
   amongst -tip maintainers right now. It seems to have dependencies on 
   the index file, so if we want to share the conflict resolution data, 
   we have to copy our index file (which is dangerous anyway and assumes 
   very similar repositories).

   It would be much nicer if we could share conflict resolutions with 
   each other - and with others as well. For example linux-next could 
   re-use our conflict resolution data as well - often Stephen Rothwell 
   has to re-do the same conflict resolution as well, creating 
   duplicated work.

   ( Also, it's a GPL nitpicky issue: the conflict resolution database 
     can be argued to be part of "source code" and as such it should be 
     shared with everyone who asks. With trivial merges the data is
     probably not copyrightable hence probably falls outside the scope 
     of the GPL, but with a complex topic tree like -tip with dozens of 
     conflict resolutions, the ...
From: Mike Hommey
Date: Monday, June 16, 2008 - 4:09 am

- At least, compress the data in the rr-cache. It can grow big quite
  easily. Also, I wonder if keeping the entire files is not overkill...

Mike
--

From: Pierre Habouzit
Date: Monday, June 16, 2008 - 8:48 am

Actually it would be rather straightforward to put it in the usual git
store, and represent the current rr-cache with a flat file that points
to the in-git preimage/postimages, and make git-gc aware of those.

  This would deal with the huge number of files + compression quite
easily. I'm quite sure it's pretty straightforward actually :)

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
From: Pierre Habouzit
Date: Monday, June 16, 2008 - 8:57 am

Actually, this is probably a required step in the direction of sharing
such things btw.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
From: Sverre Rabbelier
Date: Monday, June 16, 2008 - 9:18 am

Perhaps an approach similar to the 'notes' implementation can be used,
in which a separate branch is created to contain the notes. This way
the rerere information (being the 'rerere' branch) can be shared
easily (by just pulling the branch), and as said we get free
compression. Another advantage would be that you automagically get the
ability to unlearn a bad rerere by simply (partially) reverting a
commit on the rerere branch!

-- 
Cheers,

Sverre Rabbelier
--

From: Karl Hasselström
Date: Tuesday, June 17, 2008 - 12:37 am

FWIW, StGit is well on its way to store its patch metadata in a git
branch, for much the same reasons.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
--

From: Theodore Tso
Date: Monday, June 16, 2008 - 4:27 am

For a more complex merge resolution, granted that it rises to the
level of being "copyrightable", but I think it would be a huge stretch
to call the rr-cache the "preferred form for modifications"!  :-)

   	    	     	 	    	     - Ted
--

From: Ingo Molnar
Date: Monday, June 16, 2008 - 12:52 pm

yeah - i'm not really arguing any detail of the GPL here. I'm arguing 
the principle: there should be no technical asymmetry between maintainer 
and contributor. So if i am able to run an effort-free integration of 85 
topic branches, i'd like contributors (who will eventually grow up into 
co-maintainer roles in the future) to be able to do the same, if they 
want to do so.

right now that is simply not possible technically - it's even very hard 
to share a .git/rr-cache with a co-maintainer whom i can trust with my 
index file. (which is an otherwise unsafe private binary cache that i'd 
not put into a public repository as it could in theory contain lots of 
unrelated data and is not endian-safe, etc.)

	Ingo
--

From: Junio C Hamano
Date: Monday, June 16, 2008 - 1:25 pm

Where did you get the idea that .git/index is involved in any way, I
wonder...
--

From: Ingo Molnar
Date: Monday, June 16, 2008 - 1:46 pm

so it's only the rr-cache metadata that is involved? We had a few cases 
where git-rerere sessions were not repeatable by copying the 
.git/rr-cache, so i just assumed that there's some extra metadata in the 
index file. When that happened i took a look at git/builtin-rerere.c:

 static int find_conflict(struct path_list *conflict)
 {
        int i;
        if (read_cache() < 0)
                return error("Could not read index");

and (mistakenly) assumed that git-rerere depends on having something in 
the index file - but on a second look it just checks out the conflicting 
file(s) from the index file, right?

	Ingo
--

From: Junio C Hamano
Date: Monday, June 16, 2008 - 2:37 pm

The binary part of the index should be in network byte order and endian
safe.  But it is not necessary to share the index.  Well, if you think
about it, it would be mighty silly if index had any long term effect on
the operation of rerere, which is all about "I've done many conflict
resolutions in the past.  My work tree state (including the index) came
back to a state similar to the conflicted state I saw some time ago.
Let's reuse the previous resolution if we can."  You might have switched
branches, run "reset --hard" and done 47 thousand different things to your
index since you resolved the conflict you are about to re-resolve ;-).

The replay and conflict recording codepath of rerere goes like this:

 * read the index, list the paths that have conflicts;

 * inspect the conflicted blob to compute the conflict signature $sig and
   store the sig and path in MERGE_RR;

 * look into rr-cache/$sig; does it already have a conflict resolution
   recorded?

   - If so, modify the file in the working tree the same way to bring
     rr-cache/$sig/preimage to rr-cache/$sig/postimage by 3-way merge.

   - if not, record the file in the working tree as rr-cache/$sig/preimage

The resolution recording codepath goes like:

 * see if any paths listed in MERGE_RR are resolved in the index;

 * look into rr-cache/$sig for such resolved path.  Does it already record
   a resolution?

   - If not, we have a new resolution we can use.  Record it as
     rr-cache/$sig/postimage for later use.

So rerere _does_ look at the index to decide what entries in rr-cache are
relevant and applicable.  But other than that, it is not used.  I do not
think there is any reason to copy the index to be able to reuse rr-cache.
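
The "look into rr-cache/$sig" step described above can be sketched in shell. This is only an illustration: rr_state is a hypothetical helper, not a git command, and it assumes the conflict signature has already been computed and is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): report what an rr-cache directory
# holds for one conflict signature.
#   $1 = path to the rr-cache directory, $2 = conflict signature
rr_state () {
    dir="$1/$2"
    if [ -f "$dir/postimage" ]; then
        echo "resolved"      # a resolution is recorded and can be replayed
    elif [ -f "$dir/preimage" ]; then
        echo "recorded"      # conflict was seen, no resolution stored yet
    else
        echo "unknown"       # first time this conflict shows up
    fi
}
```

Something like `rr_state .git/rr-cache "$sig"` would then tell the three cases apart. Only the two file names, preimage and postimage, are taken from the description above.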

--

From: Junio C Hamano
Date: Monday, June 16, 2008 - 11:46 am

I agree this is a real issue (I sometimes know that the resolution is iffy
and say "rerere clear" to choose not to record it, but that is working
around the issue with a perfect foresight and is not a solution).

I think (and I think you would agree) "gc" is not the right word but
rather you would want to more actively discard the wrong one.

I agree that it is the right UI to do this to specify paths right after
you found that a bad resolution that was recorded previously was used by
rerere (I think that is what you are suggesting).  Upon such a request, we
should undo the bad resolution and bring the working tree copy to the

I originally did not have need for anything other than three-way conflict
resolving to a result.  I do not know how safe reapplying a removal to

I am not sure how safe this is.  rerere as originally designed does not
even update the index with merge results so that the application of
earlier resolution can be manually inspected, and this is exactly because
I consider a blind textual reapplication of previous resolution always
iffy, even though I invented the whole mechanism.
--

From: Ingo Molnar
Date: Monday, June 16, 2008 - 12:09 pm

We use a 'safe, lazy integration' method in -tip, that basically has 
external checks against any integration bugs.

Basically, we integrate only about once a day, and we advance the topic 
branches but do not reintegrate on every topic merge. We merge commits 
_both_ to their target topic branches, and to the (previous) integration 
branch.

Then once a day (or every second day) we 'reintegrate': we propagate the 
topic branches to the linux-next auto-*-next branches [recreating them 
from scratch] and flush out the messy criss-cross merges from the 
integration tree.

But that is always an identity transformation as far as the integration 
result is concerned: the result of the integration run must be exactly 
the same content (obviously it results in a very different tree 
structure) as the previous one. We only run it on a perfectly tested 
tree so we know none of our previous merges were wrong, and we want the 
git-rerere result to be the same. We repeat the integration until the 
end result matches.

In fact sometimes git-rerere is able to pick up a conflict resolution 
from our 'messy' delta-merge into the integration tree, which is an 
added bonus. (this doesnt always work if the merge order differs from 
integration order)

Anyway, the gist is that in this workflow it does not hurt at all if 
git-rerere is "unsafe", and we'd love to have the integration as fast as 
possible. Right now most of my manual overhead is in making sure that 
git-rerere has not missed some file.

At a ~100 conflicting files tracked, that is rather error-prone, and i'd 
love to have further automation here besides a rather lame method of 
grepping for:

  "Resolved 'kernel/Makefile' using previous resolution."

type of patterns in git-merge output.
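
A minimal sketch of that grepping approach, assuming the merge output has been captured to a file or pipe. rerere_resolved is a hypothetical helper name, and the only message format it relies on is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read git-merge output on stdin and print the paths
# that rerere reports as resolved, one per line.
rerere_resolved () {
    sed -n "s/^Resolved '\(.*\)' using previous resolution\.\$/\1/p"
}
```

Comparing this list against the list of conflicted files would show which files rerere missed, for example with `git-merge topic 2>&1 | rerere_resolved | sort`.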

So i'd not mind if git-rerere was safe by default, but it would be nice 
to have some knob to turn it into something fast and automatic. For us 
it would be much _safer_, because right now most of our manual energy is 
spent on ...
From: Junio C Hamano
Date: Monday, June 16, 2008 - 1:50 pm

Oh, "unsafe switch" that is off by default will not hurt anybody, and I do
not mind it as a new feature.  We are in agreement in that sense.

Perhaps the way forward would be (and this is independent of the issue of
recording removal as a possible form of resolution):

 (1) Introduce a new configuration rerere.autoupdate that is off by
     default, but when it is on, paths cleanly resolved by rerere will
     also be updated in the index (if we have capability to record
     removal, this may remove such a path from the index as the result).

 (2) The callers of rerere that expect rerere to resolve need to be
     changed to see if the resulting index after rerere is fully merged,
     and continue.  Currently the callers are "merge", "rebase" and "am",
     I think.  This step might be a bit more involved than you might
     think, as rerere currently happens in the codepath that knows the
     caller does _not_ go further than leaving the failed conflict to be
     sorted out by the user (rerere is designed as merely a way to help).

     Also you _might_ want a separate configuration rerere.autocommit to
     control this --- the user (but not you) might be willing to allow
     autoupdate but you may still want to eyeball the result.
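
The "is the resulting index fully merged" test in step (2) could be sketched as follows. Both function names are hypothetical, and the sketch assumes the usual `git ls-files -u` output in which the path follows a tab, which is also what the cut -f2 in the tip-mergetool script elsewhere in this thread relies on.

```shell
#!/bin/sh
# Hypothetical helper: read `git ls-files -u` output on stdin and print each
# path that still has unmerged stages, once.
unmerged_paths () {
    cut -f2 | sort -u
}

# Succeeds only when no unmerged entries remain on stdin,
# i.e. the index is fully merged.
fully_merged () {
    [ -z "$(unmerged_paths)" ]
}
```

A caller could then run `git ls-files -u | fully_merged` after rerere and continue only if it succeeds.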

Independent of the above, we have two potential new features:

 * Introduce "git rerere revert paths..."  that brings the index and
   working tree back to the conflicted state after a previous resolution
   is applied, because that resolution is incorrect.  The old resolution
   cached in rr-cache is also removed.

   This however will become much less useful if you allow autoresolution
   to be committed automatically, as the caller will move ahead without
   giving you a chance to say "oh, that one is bad -- do not proceed".

 * Somehow record the fact that the resolution for a particular conflict
   signature is to remove the resulting path.

--

From: Ingo Molnar
Date: Wednesday, June 18, 2008 - 3:57 am

just to demonstrate it, i tried today to do an octopus merge of 87 topic 
branches:

git-merge build checkme core/checkme core/debugobjects core/futex-64bit 
core/iter-div core/kill-the-BKL core/locking core/misc core/percpu 
core/printk core/rcu core/rodata core/softirq core/softlockup 
core/stacktrace core/topology core/urgent cpus4096 genirq kmemcheck 
kmemcheck2 mm/xen out-of-tree pci-for-jesse safe-poison-pointers sched 
sched-devel scratch stackprotector timers/clockevents timers/hpet 
timers/hrtimers timers/nohz timers/posixtimers tip tracing/ftrace 
tracing/ftrace-mergefixups tracing/immediates tracing/markers 
tracing/mmiotrace tracing/mmiotrace-mergefixups tracing/nmisafe 
tracing/sched_markers tracing/stopmachine-allcpus tracing/sysprof 
tracing/textedit x86/apic x86/apm x86/bitops x86/build x86/checkme 
x86/cleanups x86/cpa x86/cpu x86/defconfig x86/delay x86/gart x86/i8259 
x86/idle x86/intel x86/irq x86/irqstats x86/kconfig x86/ldt x86/mce 
x86/memtest x86/mmio x86/mpparse x86/nmi x86/numa x86/numa-fixes x86/pat 
x86/pebs x86/ptemask x86/resumetrace x86/scratch x86/setup x86/smpboot 
x86/threadinfo x86/timers x86/urgent x86/urgent-undo-ioapic x86/uv 
x86/vdso x86/xen x86/xsave

it failed miserably:

 warning: ignoring 066519068ad2fbe98c7f45552b1f592903a9c8c8; cannot 
 handle more than 25 refs
 [...]
 fatal: merge program failed
 Automated merge did not work.
 Should not be doing an Octopus.
 Merge with strategy octopus failed.

this wasnt even for purposes of an integration run: all i wanted to do 
was to pick up 2-3 new commits i have queued into 2-3 topic branches, 
into the (throw-away) integration branch. All the other branches were 
unmodified and already merged into the integration branch.

Hence i believe that the suggestions above by Git that i'm doing 
something wrong are ... wrong :-)

My scripting around this would be a lot faster (less than 10 seconds 
runtime versus a minute currently) and more robust if we could do such 
higher-order ...
From: Miklos Vajna
Date: Wednesday, June 18, 2008 - 4:29 am

The upcoming builtin-merge won't have this problem. I have added a
testcase for this in my working branch:

http://repo.or.cz/w/git/vmiklos.git?a=commit;h=7eef40b3cd772692c6eb7520686300533f35f10c
From: Ingo Molnar
Date: Wednesday, June 18, 2008 - 11:43 am

cool, thanks a ton!

stupid question: does this mean that if i install the latest Git devel 
snapshot (v1.5.6-rc3-21-g8c6b578 or later), i'll be able to experiment 
around with it right now?

	Ingo
--

From: Miklos Vajna
Date: Wednesday, June 18, 2008 - 12:53 pm

Nope. It is currently in the 'builtin-merge' branch of
git://repo.or.cz/git/vmiklos.git, and I'm working on getting it merged
after 1.5.6 is out.
From: Ingo Molnar
Date: Wednesday, June 18, 2008 - 4:36 am

some hard numbers. Doing a scripted loop of 80 git-merges is 16.2 
seconds:

 earth4:~/tip> time ( for N in $(cat 11 12 13 14); do git-merge $N; done )
 [...]
 Already up-to-date.

 real    0m16.211s
 user    0m10.719s
 sys     0m5.604s

doing the octopus merge of 4x 20 branch octopus merges is 11.6 seconds:

 earth4:~/tip> time ( for N in 1 2 3 4; do git-merge $(cat 1$N); done )
 Already up-to-date. Yeeah!
 Already up-to-date. Yeeah!
 Already up-to-date. Yeeah!
 Already up-to-date. Yeeah!

 real    0m11.580s
 user    0m8.617s
 sys     0m2.895s

a 40% speedup - and would be another 10% faster with an order-of-80 
merge as well i think. Not to be sniffed at.
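
For reference, the 40% figure follows from the two "real" times above. A small sketch of the arithmetic, where speedup_percent is just an illustrative name:

```shell
#!/bin/sh
# Relative speedup of the batched run over the one-by-one loop, computed
# from the two wall-clock ("real") times quoted above.
speedup_percent () {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", (slow / fast - 1) * 100 }'
}

speedup_percent 16.211 11.580   # prints 40
```

Put the other way round, the batched run takes roughly 29% less wall-clock time.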

	Ingo
--

From: Jakub Narebski
Date: Wednesday, June 18, 2008 - 3:01 pm

As part of the patch series introducing new fast-forward strategies
(--ff=never, --ff=only) there was a patch by Sverre Hvammen Johansen
that did merge reduction before selecting the merge strategy:
  "[PATCH 4/5] Head reduction before selecting merge strategy"
  http://thread.gmane.org/gmane.comp.version-control.git/80288/focus=80335
(I'm not sure if the link above is to the newest version of the patch series.)

It is now part of the 'pu' branch, as commit 59171adb9c.  It didn't make
it into 'next' as it conflicts with the builtin merge by Miklos Vajna, which
(as he wrote) also includes head reduction.

So you either would have to compile git from builtin-merge repository,
compile git from 'pu' or just use git-merge.sh from 'pu' branch, or
apply or cherry pick appropriate commit and compile git.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Miklos Vajna
Date: Wednesday, June 18, 2008 - 3:38 pm

Side note: builtin-merge does not have problem with merging 25+ refs
even in case every ref contains "new" commits.

The patch by Sverre Hvammen Johansen is useful if some of the refs have
no "new" commits, so it will help here, but I think it does not help in
all cases.
From: Karl Hasselström
Date: Thursday, June 19, 2008 - 12:23 am

So how many parents can a commit have, exactly? Is there a hard limit
somewhere, or just a point beyond which some git tools will start
behaving strangely?

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
--

From: Miklos Vajna
Date: Thursday, June 19, 2008 - 12:29 am

On Thu, Jun 19, 2008 at 09:23:08AM +0200, Karl Hasselström <kha@treskal.com> wrote:

AFAIK there is no limit at a core level. git-show-branch has a limit of
25 refs (it can't show more than 25 refs at one time) and git-merge.sh
uses show-branch, while builtin-merge does not.
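
Until then, one way to stay under the 25-ref limit is to feed the refs to git-merge in smaller batches, much like the 4x20 loop earlier in the thread. A minimal sketch, where batch_refs is a hypothetical helper and the ref names are assumed to contain no quote characters (xargs would interpret them):

```shell
#!/bin/sh
# Hypothetical helper: read ref names on stdin (one per line) and print
# them in batches of at most $1 names, one batch per output line.
batch_refs () {
    xargs -n "$1" echo
}
```

Each output line can then be handed to one merge, e.g. `batch_refs 20 < branches | while read -r refs; do git-merge $refs; done`, with `branches` standing for a file holding one branch name per line.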
From: Junio C Hamano
Date: Thursday, June 19, 2008 - 12:30 am

There is no hard limit at the data structure level.

git-commit-tree has a hard limit of accepting 16 parents.  git-blame has
the same 16-parent limit while following the history (but the one in
'next' has lifted the latter limitation).

But that is purely academic.  Anybody who does an octopus with more than 8
legs should get his head examined ;-).

--

From: Karl Hasselström
Date: Thursday, June 19, 2008 - 1:21 am

Catalin and I are tossing ideas around for how to represent the
history of an StGit patch stack (using a git commit for each log
entry). One complication is that we have to keep references to all
unapplied patches so that gc will leave them alone (and so that they
will get carried along during a pull, in the future). And the number
of unapplied patches is potentially large, so I thought we'd be going
to have to make a tree of "merge" commits to connect them all up.

(What we'd really like, of course, is a way to refer to a set of
commits such that they are guaranteed to be reachable (in the gc and
pull sense), but not considered "parents".)

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
--

From: Miklos Vajna
Date: Thursday, June 19, 2008 - 1:33 am

On Thu, Jun 19, 2008 at 10:21:56AM +0200, Karl Hasselström <kha@treskal.com> wrote:

I had a similar problem in git/vmiklos.git on repo.or.cz, while working
on builtin-rebase: I squash several patches using rebase -i before
sending a series, but it's nice to have the old long list of small
patches in case I would need them later.

What I did is to have a rebase-history branch: each commit in it is an
octopus merge:

- The first parent is the previous rebase-history ref

- The second is the old HEAD

- The third is the new HEAD

This way I can use git rebase -i without worrying about losing history,
even if reflogs are not shared among machines.

(It may or may not be a good idea to do something like this in StGit, I
just thought I'd share this idea here.)
From: Karl Hasselström
Date: Thursday, June 19, 2008 - 2:19 am

What you're describing is pretty much what we're thinking about doing
-- have a log branch where each commit contains enough metadata to
recreate the complete patch stack state at that point in time, and has
all the parents it needs to be safe from gc.

The particular problem I'm asking about here is that due to StGit's
concept of "unapplied" patches that are by definition not reachable
from the current branch head, a given log entry might have to keep an
unbounded number of commits from being gc'ed. Thus my question about
what would blow up if we were to make a commit with 50 parents. Or
100. Or 1000, if our users are crazy enough. (The alternative being,
of course, to make a tree of octopuses with a fixed maximum fan-out.)
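
To get a feel for the cost of that alternative, here is a small sketch that counts the merge commits such a tree would need. octopus_tree_size is an illustrative name, and the fan-out is assumed to be at least 2 (the loop would not terminate with a fan-out of 1).

```shell
#!/bin/sh
# Count the merge commits needed to tie $1 commits together when no commit
# may have more than $2 parents (a tree of octopuses with fixed fan-out).
octopus_tree_size () {
    n=$1 fanout=$2 total=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + fanout - 1) / fanout ))   # merges needed at this level
        total=$(( total + n ))
    done
    echo "$total"
}

octopus_tree_size 1000 16   # prints 68 (63 + 4 + 1)
```

With the 16-parent limit mentioned for git-commit-tree, even 1000 unapplied patches would need only 68 extra commits in a tree three levels deep.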

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
--

From: Miklos Vajna
Date: Thursday, June 19, 2008 - 3:06 am

On Thu, Jun 19, 2008 at 11:19:03AM +0200, Karl Hasselström <kha@treskal.com> wrote:

I may be missing something, but you have (at least) two options to store
"patches".

You can store them as a blob, make a tree of them and make a commit in
the log branch point to the tree. This one has the advantage of being
able to do a 'git log' on a particular patch of the patch set.

The other one is to create n+1 trees (and commits, where the first
commit has no parent) for n patches, and point to the last commit from
the log branch.
From: Karl Hasselström
Date: Thursday, June 19, 2008 - 3:35 am

If I don't store the pre or post tree in its entirety, I lose the
ability to do patch application by three-way merge. (The current StGit
design assumes that we can always make a three-way merge as a last
resort when applying patches. Basically, StGit is just a fancy way to
rebase.)

But yes, this is a viable idea. (Though once I have to store one of
the trees, I believe it's actually simpler and cheaper to just store
the other tree as well, instead of having to compute the diff and

There's actually no point in making more than one commit. A tree can
easily hold a lot of sub-trees.

I have an existing implementation that stores the pre and post tree
for each patch, plus some metadata (message, author). The issue with
this format is that every time we write a new log entry (that is, for
every StGit command), we have to call git multiple times in order to
write several new trees and blobs.

StGit normally represents each patch by a commit object, so it should
be faster to simply write a single new commit to the log that has some
metadata in its commit message and just refers to all the patches'
commit objects (by having them as parents). Which is why I was
inquiring about the maximum number of parents of a commit object.

( Some background: At a given point in time, your StGit stack consists
  of a few applied patches, and a few unapplied patches. The applied
  patches are just a linear sequence of commits at the top of your
  current branch, so we can trivially save them all from the garbage
  collector by making the stack top a parent of our log commit. The
  unapplied patches, however, are commits that are not reachable from
  the stack top -- they can be "pushed" onto the stack by rebasing, at
  which point they become applied, but until then we can't make any
  assumptions about them being ancestors of anything. So a log commit
  potentially has to have _every_ unapplied patch as a parent. (If we
  know that the commit of an unapplied patch used to be applied, we
  ...
From: Junio C Hamano
Date: Monday, June 16, 2008 - 12:10 pm

By the way, this safety is not a theoretical issue but has been a real
one.  I had two topics that changed the calling convention of the same
function in different ways, and when they were merged to 'pu', the
declaration, definition, and call sites that existed on both of these
branches were handled beautifully by rerere.

Recording autoresolution would have been a wrong thing to do.  One of the
branches added a new call site to a file that was not among the ones that
conflicted in the merge between the two branches.  That call site, that
uses the calling convention of one branch, needed to be adjusted to
accommodate the change of calling convention from the other branch (from
textual merge's point of view, this has to be an evil merge).  I had to
make and keep a mental note about that new call site until both topics
graduated to 'master' (similar to your need to remember a particular merge
is resolved to removal right now).

To safely automate reapplication of such a merge, rerere needs to become
much more clever.

The conflicts rerere notices and records are strictly per blob.  A
conflicted merge to a blob is inspected and a "conflict signature", which
becomes the directory name under rr-cache, is computed.  We record the
conflicted blob as a whole as the preimage, and your hand resolution as a
whole as the postimage.  Next time when you have a conflicted merge to a
blob, and the conflict has the exact same conflict signature, we run
three-way merge between the recorded preimage, postimage and the new
conflicted result.
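
The three-way step can be imitated outside git with diff3 from GNU diffutils, which is used here only as a stand-in for rerere's internal merge. replay_resolution is a hypothetical helper; it treats the recorded preimage as the common ancestor, so the hand resolution (preimage to postimage) is carried over to the newly conflicted file.

```shell
#!/bin/sh
# Replay a recorded resolution onto a newly conflicted file.
#   $1 = newly conflicted file, $2 = recorded preimage, $3 = recorded postimage
# diff3 -m takes MYFILE OLDFILE YOURFILE and prints the merged result;
# the preimage plays the role of the common ancestor (OLDFILE).
replay_resolution () {
    diff3 -m "$1" "$2" "$3"
}
```

If the new file differs from the preimage only outside the conflicted region, the output is the new file with the old resolution applied; where the changes overlap, diff3 reports a conflict and exits non-zero.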

If we want to handle new call sites added only on a single side, you
should be able to express something like "when a merge has a conflicted
blob with this conflict signature, look in the whole tree, even outside
the set of conflicted paths, and change this text to that".  This is too
much automation and I somehow think the potential for errors (both from
the tool and from the user) is too high.



--

From: Ingo Molnar
Date: Monday, June 16, 2008 - 12:44 pm

in our workflow, we dont ever do any semantic things during the 
integration run. I.e. we dont put more complex merge changes into the 
integration merge commits.

Such integration effects do come up occasionally (especially when a 
topic changes some widely used infrastructure), and we handle them via 
separate merge branches. The current ones in -tip are 
tip/tracing/ftrace-mergefixups and tip/tracing/mmiotrace-mergefixups.

They are one or two orders of magnitude more rare than regular 
conflicts, and they show up immediately during testing. (or we 
anticipate them beforehand)

i.e. we'd like to have a 'dumb' phase of integration, as much cached and 
automated as possible. Things that need more thought need to go into 
separate branches anyway, for better reviewability - merge commits are 
rather hard to debug as they hide their true contents, so we try to keep 
them simple and contextual only.

	Ingo
--

From: Ingo Molnar
Date: Monday, June 23, 2008 - 2:49 am

another git-rerere observation: occasionally it happens that i 
accidentally commit a merge marker into the source code.

That's obviously stupid, and it normally gets found by testing quickly, 
but still it would be a really useful avoid-shoot-self-in-foot feature 
if git-commit could warn about such stupidities of mine.

( and if i could configure git-commit to outright reject a commit like 
  that - i never want to commit lines with <<<<<<< or >>>>>>> markers)

Another merge conflict observation is that Git is much worse at figuring 
out the right merge resolution than our previous Quilt based workflow 
was. I eventually found it to be mainly due to the following detail: 
sometimes it's more useful to first apply the merged branch and then 
attempt to merge HEAD, as a patch.

I've got a script for that which also combines it with the "rej" tool, 
and in about 70%-80% of the cases where Git is unable to resolve a merge 
automatically it figures things out. ('rej' is obviously a more relaxed 
merge utility, but it's fairly robust in my experience, with a very low 
false positive rate.)

The ad-hoc "tip-mergetool" script we are using is attached below. It's 
really just for demonstration purposes - it doesnt work when there's a 
rename related conflict, etc.

Peter Zijlstra also wrote a git-mergetool extension for the 'rej' tool 
btw., he might want to post that patch. I've attached Chris Mason's rej 
tool too.

	Ingo


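# Pick the file to work on: the first unmerged path, or the first one
# whose ls-files entry matches $1.
if [ "$#" = 0 ]; then
  SRC=`git-ls-files -u | cut -f2 | head -1`
else
  SRC=`git-ls-files -u | grep -- "$1" | cut -f2 | head -1`
fi

if [ -z "$SRC" ] || [ ! -f "$SRC" ]; then
  echo "$1 has no conflicts!"; exit 1
fi

# Escape slashes so the path can be used inside a sed expression.
SRC_SED=`echo "$SRC" | sed 's/\//\\\\\//g'`

# Blob ids of the three index stages: 1 = base, 2 = ours, 3 = theirs.
SHA_1=`git-ls-files -u | grep -- "$SRC" | grep '^.* .* 1\>' | cut -d' ' -f2`
SHA_2=`git-ls-files -u | grep -- "$SRC" | grep '^.* .* 2\>' | cut -d' ' -f2`
SHA_3=`git-ls-files -u | grep -- "$SRC" | grep '^.* .* 3\>' | cut -d' ' -f2`

# Set the conflicted automerge result aside (-b backs up an existing target).
mv -b "$SRC" "$SRC.automerge"             || { echo error1; exit 1; }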

git-diff $SHA_1 $SHA_2 ...
From: Peter Zijlstra
Date: Monday, June 23, 2008 - 7:19 am

This is what I run with.

I added the cp to the 3-way merge tools because I think it's stupid to
see the messed up merge markers instead of the original file.

The rej target basically takes the local version and takes the diff
between base and remote and applies that as a patch, upon failure it
invokes rej to fix up the mess.

--- /usr/bin/git-mergetool	2008-04-08 19:01:37.000000000 +0200
+++ git-mergetool	2008-06-02 19:00:55.000000000 +0200
@@ -214,12 +214,14 @@ merge_file () {
 	    ;;
 	meld|vimdiff)
 	    touch "$BACKUP"
+	    cp -- "$BASE" "$path"
 	    "$merge_tool_path" -- "$LOCAL" "$path" "$REMOTE"
 	    check_unchanged
 	    save_backup
 	    ;;
 	gvimdiff)
 		touch "$BACKUP"
+		cp -- "$BASE" "$path"
 		"$merge_tool_path" -f -- "$LOCAL" "$path" "$REMOTE"
 		check_unchanged
 		save_backup
@@ -271,6 +273,13 @@ merge_file () {
 	    status=$?
 	    save_backup
 	    ;;
+        rej)
+	    touch "$BACKUP"
+	    cp -- "$LOCAL" "$path"
+	    diff -up "$BASE" "$REMOTE" | patch "$path" || rej "$path"
+	    check_unchanged
+	    save_backup
+	    ;;
     esac
     if test "$status" -ne 0; then
 	echo "merge of $path failed" 1>&2
@@ -311,7 +320,7 @@ done
 
 valid_tool() {
 	case "$1" in
-		kdiff3 | tkdiff | xxdiff | meld | opendiff | emerge | vimdiff | gvimdiff | ecmerge)
+		kdiff3 | tkdiff | xxdiff | meld | opendiff | emerge | vimdiff | gvimdiff | ecmerge | rej)
 			;; # happy
 		*)
 			return 1


--

From: Peter Zijlstra
Date: Monday, June 23, 2008 - 7:26 am

While we're on the subject, I only found one tool that 'digs' these
merge markers and that is xxdiff --unmerge.

One would think more tools understand these merge markers, but I
couldn't find any.


--

From: Jeff King
Date: Monday, June 23, 2008 - 8:12 am

The right place for this is in a pre-commit hook, which can look at what
you are about to commit and decide if it is OK. In fact, the default
pre-commit hook that ships with git performs this exact check. You just
need to turn it on with:

  chmod +x .git/hooks/pre-commit
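
For illustration, a check of that kind could look like the sketch below. It is not the hook that ships with git: has_conflict_markers is a hypothetical helper, and it assumes the standard 7-character markers at the start of a line, so a legitimate line consisting of exactly seven '=' characters would also trigger it.

```shell
#!/bin/sh
# Hypothetical check: succeed if the file named by $1 contains a line that
# starts with a 7-character conflict marker.
has_conflict_markers () {
    grep -q -E '^(<<<<<<<|=======|>>>>>>>)( |$)' "$1"
}
```

A hook script could call this for every file about to be committed and exit non-zero on the first hit, which makes git-commit refuse the commit.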

-Peff
--

From: Ingo Molnar
Date: Monday, June 23, 2008 - 8:22 am

cool, thanks :-)

	Ingo
--

From: Jakub Narebski
Date: Monday, June 16, 2008 - 1:11 pm

From what I remember, some time ago there was an idea on the git mailing
list for git-rerere2, which would record resolutions at the tree level,
i.e. record file renames.  It could probably record file deletion as
well, if someone were to implement it and it did not remain a loose idea.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Johannes Schindelin
Date: Tuesday, June 17, 2008 - 3:24 am

Hi,


I was dreaming about having "git rerere infer-from <merge-commit>".  This 
would be

- more versatile, as you do not have to ask the guy to share the cache,

- would avoid transmitting lots of data that can be inferred from the 
  data,

- would avoid relying on the honesty of the person sharing the cache, and

- it would put all license wieners^Wissues at rest.

FWIW this is in my TODO list, but I am unlikely to get to it, least of all 
before 1.5.6 comes out.

Ciao,
Dscho

--
