Re: [PATCH] transplant: move a series of commits to a different parent

Previous thread: help with cvsimport by Raimund Bauer on Saturday, June 23, 2007 - 2:11 am. (3 messages)

Next thread: [PATCH] Add git-save script by Nanako Shiraishi on Saturday, June 23, 2007 - 6:02 am. (8 messages)
From: Steffen Prohaska
Date: Saturday, June 23, 2007 - 5:51 am

The result of importing branches using git-cvsimport depends
on the time of the first commit to the cvs branch. If the first
commit is done after other commits to the cvs trunk the result
of cvsimport may be wrong. git-cvsimport creates a wrong history.

The problem is quite severe because merging such a wrongly imported
branch by git may be successful without reporting any problem but the
results are wrong. The result may differ from what a simple cvs merge
(cvs up -j) yields.

This test script creates two cvs repositories and imports both to
git. The first cvs repository has the 'wrong' order of commits and
yields an error. The result of a merge in cvs and a merge in git
differs. The second cvs repository has the 'right' order and the
import to git runs as expected and merging yields the same results
in cvs and git.

The conclusion is you must not rely on the existing cvsimport for
tracking cvs branches. The history of such branches may be plain wrong.
Git may display different patches than cvs would do. Merging cvs
topic branches may yield completely wrong results. One obvious thing
that may happen is that a merge reverts changes committed to the cvs
trunk before the first commit to the cvs branch. And this would happen
without any indication by git. Everything would seem to run smoothly.

This conclusion should be stated in bold at appropriate places in
the documentation.

It's a pity because merging is what git is especially good at. But
as long as git-cvsimport may create the wrong history you can't use
git to merge cvs topic branches without double checking every detail,
which makes this approach infeasible.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t9600-cvsimport.sh |  184 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 184 insertions(+), 0 deletions(-)
 create mode 100755 t/t9600-cvsimport.sh

diff --git a/t/t9600-cvsimport.sh b/t/t9600-cvsimport.sh
new file mode 100755
index 0000000..180cd8a
--- /dev/null
+++ ...
From: Steffen Prohaska
Date: Saturday, June 23, 2007 - 6:26 am

This is not only a theoretical problem but I am experiencing it on a
real-world repository right now. I have a topic branch that branches
off from the wrong commit in git.

Is there an easy way to fix this? I know the right commit the branch
should have as its parent. How can I move it there? git-rebase is not
the right command because the patches derived from my branch are already
wrong. I would only need to attach the first commit to a different parent.

	Steffen

-

From: Steffen Prohaska
Date: Saturday, June 23, 2007 - 12:27 pm

git-transplant.sh <onto> <from> <to>

transplant starts with the contents of <onto> and puts on top of
it the contents of files if they are touched by the series of
commits <from>..<to>.  If a commit touches a file the content of
this file is taken as it is in the commit. No merging is
performed. Original authors, committers, and commit messages are
preserved.

Warning: this is just a quick hack to solve _my_ problem.
- No error checking is performed.
- Removal of files is not handled.
- Whitespace in filename is not handled.
- The index is left in dirty state.
- No branch is created for the result.
- The script is not integrated with git's shell utilities.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 git-transplant.sh |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 60 insertions(+), 0 deletions(-)
 create mode 100755 git-transplant.sh

This script seems to have solved the problem for me. I can put
the topic branch imported from cvs in the right place.

What do you think? Is this a sane way to handle the situation?

    Steffen

diff --git a/git-transplant.sh b/git-transplant.sh
new file mode 100755
index 0000000..3320071
--- /dev/null
+++ b/git-transplant.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+
+[ $# -eq 3 ] || { echo "$0 <onto> <from> <to>"; exit 1; }
+onto=$(git-rev-parse "$1")
+from=$(git-rev-parse "$2")
+to=$(git-rev-parse "$3")
+
+# copied from git-filter-branch.sh 
+set_ident () {
+    lid="$(echo "$1" | tr "A-Z" "a-z")"
+    uid="$(echo "$1" | tr "a-z" "A-Z")"
+    pick_id_script='
+        /^'$lid' /{
+            s/'\''/'\''\\'\'\''/g
+            h
+            s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
+            s/'\''/'\''\'\'\''/g
+            s/.*/export GIT_'$uid'_NAME='\''&'\''/p
+
+            g
+            s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
+            s/'\''/'\''\'\'\''/g
+            s/.*/export GIT_'$uid'_EMAIL='\''&'\''/p
+
+            g
+            s/^'$lid' [^<]* <[^>]*> \(.*\)$/\1/
+            ...
From: Johannes Schindelin
Date: Saturday, June 23, 2007 - 1:54 pm

Hi,


This reeks of rebase.

IOW, I suspect that it does almost the same as

	git checkout <to>
	git rebase -s ours --onto <onto> <from>^

Ciao,
Dscho

-

From: Steffen Prohaska
Date: Saturday, June 23, 2007 - 11:55 pm

It doesn't do anything useful for me. In fact it seems as if it
did nothing.

I tried your proposal:
  - rebase says 'Changes from <onto> to <onto>',
  - then it rewinds to <onto>,
  - next it says several times 'Already applied: ...' with increasing
    patch numbers,
  - then 'All done',
  - The result is the same as if I executed 'git reset --hard <onto>'.

I thought about something similar before I wrote transplant.
Honestly, I didn't understand what rebase would do combined with ours.

	Steffen
-

From: Alex Riesen
Date: Saturday, June 23, 2007 - 2:04 pm

# detached head
git checkout $(git rev-parse onto) &&
git format-patch --stdout --full-index from..to | git am -3
-

From: Steffen Prohaska
Date: Sunday, June 24, 2007 - 12:08 am

No. This one tries to apply the _changes_ between from..to. What I
need is the resulting _content_ of files modified between from..to.

The _changes_ are already wrong because they are relative to the
history. But the history was messed up by git-cvsimport, as I tried to
explain in my first mail in this thread. So the changes derived
from the wrong history are useless.

transplant only checks if a file is modified by a commit. If it is
it takes the _content_ of the file in that commit. The changes from
the parent commit, which you can find by format-patch, do not matter.
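
A minimal sketch of that distinction, not taken from the posted script. The helper name `touched_blobs` is hypothetical, and it assumes a non-merge commit and paths without unusual characters. It lists each file a commit touches together with the blob holding that file's full _content_ in the commit, which is the only thing transplant looks at:

```shell
#!/bin/sh
# touched_blobs <commit>   (hypothetical helper, for illustration only)
# Prints "<blob-id> <path>" for every file the commit touches relative to
# its parent. The blob is the file's full content in that commit, not a patch.
touched_blobs () {
    git diff-tree -r --no-commit-id --name-only "$1" |
    while read -r path
    do
        printf '%s %s\n' "$(git rev-parse "$1:$path")" "$path"
    done
}
```

By contrast, `git format-patch` emits the difference to the parent commit, which is exactly the part that a wrong parent makes meaningless.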

I believe it's more like git-filter-branch, but I wasn't yet able to
tell git-filter-branch how to do the job.

	Steffen

-

From: Alex Riesen
Date: Sunday, June 24, 2007 - 1:20 am

Ach, yes. I should have read your message a bit more closely. There is

I suspect git-filter-branch can be both.

-

From: Johannes Schindelin
Date: Sunday, June 24, 2007 - 3:26 am

Hi,


Oh! But the commit messages no longer correspond to their patches, do
they?

Example:

In "onto", you have a sorely needed bugfix in main.c. In "from", you have 
not. Then you do your transplant, and all of a sudden, the 
first transplanted commit _undoes_ that bugfix (because you take the 
contents at face value), but the commit message _cannot_ say so, or even 
why.

IMHO this makes no sense (and that is why I misunderstood it as being a 
rebase).
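
One way to check this concern for a given pair of commits is to compare their patch IDs. The sketch below assumes a hypothetical helper `same_patch` (not part of git) and non-merge commits. It succeeds only if both commits introduce the same change relative to their respective parents:

```shell
#!/bin/sh
# same_patch <commit-a> <commit-b>   (hypothetical helper, for illustration)
# Exit status 0 if both commits have the same patch ID, non-zero otherwise.
same_patch () {
    a=$(git diff-tree -p "$1" | git patch-id | cut -d' ' -f1)
    b=$(git diff-tree -p "$2" | git patch-id | cut -d' ' -f1)
    # An empty ID means the commit produced no diff at all; treat as "differs".
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

If an original commit and its transplanted counterpart fail this check, the transplanted commit carries a different change than the one its message was written for.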

Ciao,
Dscho

-

From: Steffen Prohaska
Date: Sunday, June 24, 2007 - 3:45 am

It doesn't make sense on a sane repository.

I need the script to fix an insane, broken repository that was generated
by git-cvsimport [1]. cvsimport created commits attached to the wrong
parent in the first place. So the patches derived from this history are
wrong. They are different from the patches that you'd expect from the cvs
repository. The commit messages and the patches never corresponded. My
script fixes this relationship. Only after I transplant the branch do the
commits and their messages match.

The following is a different illustration of the same problem. Suppose
you copy a file and modify its contents somewhere else. Now you check out
the _wrong_ branch without noticing, copy the file back, and commit it.
You'd be writing a message as if the right branch had been checked out.
This message describes what you believe you did. But because of the wrong
branch you may have done something completely different to the repository.
You may, for example, have reverted earlier changes. Your commit doesn't
make any sense until you transplant it to the right branch. You don't want
to apply a patch to the right branch but transplant the content of your
file.

	Steffen

[1] http://article.gmane.org/gmane.comp.version-control.git/50736

-

From: Alex Riesen
Date: Sunday, June 24, 2007 - 1:29 am

Why not just read-tree for every commit? It is not like you're
modifying the repository in any way, just changing parenthood. That'd
solve the problem with deletions.
So it should be enough to read-tree the repo state for each and every
source commit into the index (and you can just use a temporary index
file for that, see GIT_INDEX_FILE). Then just commit the index.

-

From: Steffen Prohaska
Date: Sunday, June 24, 2007 - 2:05 am

I am changing the repository.

I only modify the index for files that have changes in $commit. Their
content gets replaced by the content from the commit. I'm leaving
all other files untouched.

This creates a new series of commits that starts from the repository
state of <onto> and has mixed in files only if they are changed in
the series of commits from..to. These files are just replaced. I'm not
trying to merge changes but just replace the whole file.

In contrast, read-tree would modify the content of _all_ files.


Here's the situation before transplant

           o--Y--3
          /
   1--X--2--o--o--o

Say at X the file x.txt got modified. At Y the file y.txt got modified.
3 has both modifications.

Now I do transplant 1 2 3, which yields

     o--Y--4
    /
   1--X--2--o--o--o

y.txt is identical in 3 and 4 but x.txt is identical in 1 (!) and 4.
Hence 3 and 4 are different. The changes to x.txt in commit X got
eliminated from the history. 4 is a mixture of 1 and the repository
state of files at 3 that got modified between 2 and 3. Changes between
1 and 2 got eliminated from the history.

This is exactly what I want to achieve. The content of the files on
branch 3 is correct for all files that were committed after 2. But
because 2 is the wrong branching point all the content originating
from commits between 1 and 2 is wrong. Files committed between
2 and 3 have the right content but the branch needs to be attached
at 1.
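
The behaviour described above can be sketched roughly as follows. This is not the posted git-transplant.sh: the function name `transplant_sketch` is hypothetical, and it assumes linear history, simple path names, and being run from the top of the work tree. It uses a scratch index (GIT_INDEX_FILE) and, unlike the posted script, also drops files that a commit deletes:

```shell
#!/bin/sh
# transplant_sketch <onto> <from> <to>   (hypothetical name; linear history only)
# Prints the id of the last commit created. No branch or ref is updated.
transplant_sketch () (
    onto=$(git rev-parse --verify "$1^{commit}") || exit 1
    from=$(git rev-parse --verify "$2^{commit}") || exit 1
    to=$(git rev-parse --verify "$3^{commit}") || exit 1

    # Work in a scratch index so the user's real index stays untouched.
    GIT_INDEX_FILE="$(git rev-parse --git-dir)/transplant.idx"
    export GIT_INDEX_FILE
    rm -f "$GIT_INDEX_FILE"
    git read-tree "$onto" || exit 1      # start from the content of <onto>

    parent=$onto
    for c in $(git rev-list --reverse "$from..$to")
    do
        # Raw diff-tree lines: ":<oldmode> <newmode> <oldid> <newid> <status>TAB<path>"
        git diff-tree -r --no-commit-id "$c" |
        while read -r omode nmode oid nid status path
        do
            if [ "$status" = D ]
            then
                git update-index --force-remove -- "$path"
            else
                # Take the file's content exactly as it is in $c; no merging.
                git update-index --add --cacheinfo "$nmode" "$nid" "$path"
            fi
        done
        tree=$(git write-tree) || exit 1
        parent=$(
            # Preserve the original author and the original commit message.
            GIT_AUTHOR_NAME=$(git log -1 --format=%an "$c")
            GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$c")
            GIT_AUTHOR_DATE=$(git log -1 --date=raw --format=%ad "$c")
            export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
            git cat-file commit "$c" | sed -e '1,/^$/d' |
            git commit-tree "$tree" -p "$parent"
        ) || exit 1
    done
    rm -f "$GIT_INDEX_FILE"
    echo "$parent"
)
```

Because the function body runs in a subshell, the exported GIT_INDEX_FILE does not leak into the calling shell. The result could then be inspected with, for example, `git log $(transplant_sketch 1 2 3)` before pointing a branch at it.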

	Steffen


-

From: Alex Riesen
Date: Sunday, June 24, 2007 - 2:30 am

No, you don't modify anything. Ever tried to run git-status after your

No, it wouldn't (unless you run git-read-tree -u, and I fail to see
why would you want that). You probably confuse git-read-tree with

This misses merges (see git-rev-list --parents), but does the job for
linear history:

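    # Use a scratch index so the user's own index stays untouched.
    export GIT_INDEX_FILE="$(git rev-parse --git-dir)/tr.idx"
    parent=$(git rev-parse "$onto")
    git rev-list --reverse "$from..$to" | while read -r c
    do
        # Start from an empty index, then load the whole tree of $c.
        rm -f "$GIT_INDEX_FILE"
        git read-tree "$c" || break
        # Authorship information here
        # Reuse the original message (everything after the header block).
        parent=$(git cat-file commit "$c" |
            sed -e '1,/^$/d' |
            git commit-tree "$(git write-tree)" -p "$parent")
        echo "Commit $parent"
    done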


-

From: Steffen Prohaska
Date: Sunday, June 24, 2007 - 10:13 am

Magically, the script solved my problem by creating a new, corrected
branch that is different from the original one. I didn't run any


Not really. I instead reset the index to a controlled state.

	Steffen

-

From: Alex Riesen
Date: Sunday, June 24, 2007 - 11:35 am

I meant: "it does not change the working directory". It is irrelevant,
was a bit of confusion on my part.

Your script works, but it can be made simpler: there is no need for diff,
which only hurts performance and complicates things. You don't have
to care about additions/deletions, it is trivially extensible to
support merges, and the current index is untouched, so your user can
continue working in a predictable environment.

As to performance: read-tree doesn't actually _read_ the blobs to
populate the index, just the trees. And diff-tree has to do the same, but
also _compare_ two trees recursively: more work, more memory needed.

BTW, Johannes moved that ident code you copied from git-filter-branch
into its own shell file, so it can be sourced and trivially reused.

-

From: Steffen Prohaska
Date: Sunday, June 24, 2007 - 1:54 pm

I understand that I can leave the default index untouched by using
a different index. I knew that this must be possible somehow but was
too lazy to find out how. Thanks for the details.

I don't see how I can avoid tree diffs. As I pointed out earlier I need
to mix the tree of the base commit of the newly built branch with
files that were changed in the series of commits that I'm transplanting.

Just taking the whole tree from the commits I'm transplanting is _wrong_.
I need to only take files that were touched by a commit. The tree of
the tip of the resulting branch can be quite different from the tree


I see, thanks.

Anyway, the script worked for me and I still think it may be useful for
fixing broken repositories resulting from a wrong cvsimport. I would
probably improve many details if someone else considered my work useful.
But up to now it seems as if I failed to explain, why the script would
be needed in the first place.

However, the best way would be to fix git-cvsimport to handle branches
correctly independently of the time of the first commit to a branch;
and avoid insane, broken repositories altogether.

	Steffen

-

From: Alex Riesen
Date: Sunday, June 24, 2007 - 3:20 pm

"git-read-tree --reset" does an in-index merge (just discards unmerged
entries), so it still is better than git-diff-tree. But remove that
unlink, so that the previous tree is not discarded and do a

You still better make it work properly wrt deleted files.
And you have to be careful not to hit a real content conflict.

-
