Re: RFC: grafts generalised

To: <git@...>
Date: Wednesday, July 2, 2008 - 10:35 am

I'm in the process of converting and stitching and patching vast amounts
of initially disjunct CVS and SVN repositories into larger complete
histories inside a single git repository. Recreating history as
accurately as possible.

The problem I encounter is that any number of times I have to "edit"
history in a non-parameterisable fashion, in any of the following ways:
- Change parents.
- Add merges.
- Change author, committer, commitdate, authordate.
- Change the tree (because of conversion errors in the automated
conversion process) belonging to a single commit.
- Retrofit a patch which has to ripple through all of history until
the present.

The only things which are easily done at the moment are:
Change parents and add merges. This can be accomplished fairly easily
using the grafts file.
The other changes are messy at best and need to be parameterised into the
form of a shell script so that git filter-branch can have a go at it.
This parameterisation is doable for author/committer/dates in most cases
(but not pretty), but is rather (too) convoluted for ripple-through
patches.

You have to imagine that the whole tree has lots of interconnects
already (merges), and changing the tree at a point in history which has
to ripple through is a mess, because all references and interconnects
need to be rewritten as well.

I propose the following:
- Extend git fsck to do more sanity checks on the content of the grafts
file (to make it more difficult to shoot yourself in the foot with
that file; my feet will be grateful).
- Extend the grafts file format to support something like the following syntax:

commit eb03813cdb999f25628784bb4f07b3f4c8bfe3f6
Parent: 7bc72e647d54c2f713160b22e2e08c39d86c7c28
Merge: 3b3da24960a82a479b9ad64affab50226df02abe 13b8f53e8ccec3b08eeb6515e6a10a2a
Merge: ac719ed37270558f21d89676fce97eab4469b0f1
Tree: 32fc99814b97322174dbe97ec320cf32314959e2
Author: Foo Bar (FooBar) <foo@bar>
AuthorDate: Sat Jun 6 13:50:44 1998 +0000
Commit: Foo Bar (FooBar...

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 8:13 pm

Please, don't. It adds completely unnecessary complexity and it is
_not_ grafting anymore - look the word up in a dictionary. :-)

Have a look at what you wrote above - now, Git already has a way to
store all this information, right? In the commit objects!

So, the real solution is to take the commit objects you want to
modify, create new commit objects, then graft the new commit on all the
old commit children. It fits neatly in the Git philosophy, there is no
need at all to tweak the current infrastructure for this and it should
be trivial to automate, too.

--
Petr "Pasky" Baudis
The last good thing written in C++ was the Pachelbel Canon. -- J. Olson
--

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 8:16 pm

Oops, sorry; I stopped reading the branch of the thread I thought was
going off on a different tangent one post too early. :-)

--
Petr "Pasky" Baudis
The last good thing written in C++ was the Pachelbel Canon. -- J. Olson
--

To: Petr Baudis <pasky@...>
Cc: Stephen R. van den Berg <srb@...>, <git@...>
Date: Wednesday, July 2, 2008 - 8:28 pm

What you wrote was a very good summary of what Dmitry suggested earlier
;-)
--

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 1:19 pm

I don't think that the grafts file is the right place for this kind of
information. Perhaps, it would be better to have a separate file or
even a directory with files where commit-id identifies a text file with
a new commit object, which should be placed instead of an old one. So,
it will be easy to tell git filter-branch to use this new information.

However, if you want more than just the ability to edit commits in a text
file but also to inspect changes using normal git commands and gitk (as is
possible with grafts), it will require changes to the git core, which is
perhaps not difficult to implement using pretend_sha1_file(), but I am
not sure that everyone will welcome that...

Dmitry
--

To: Dmitry Potapov <dpotapov@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 1:59 pm

Yet the grafts file is exactly the place where this type of

Not quite sure why this makes it easier. The point is that there
is not supposed to be a grafts file in a proper repository. Thus,
having a lot of these files means a larger disruption to the core, and
I'd like the core to be as efficient and lean as possible given an empty

I'd want to avoid a plethora of files, and the changes that can be
specified are supposed to be partial overrides, not complete rewrites.
So using pretend_sha1_file() is a bit overkill and more than I was
aiming for.

The point is, that the changes in grafts (as they are now) are *not*
used when cloning. I.e. the only thing you mess up is your *own*
repository, not someone else's. I.e. you can't make someone remote
think that the repository has been altered. That would require git
filter-branch, which immediately changes all the historical SHA1s, and
makes the changes in history blatantly visible.
--
Sincerely,
Stephen R. van den Berg.

You are confused; but this is your normal state.
--

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 1:58 pm

On second thought, it may not be necessary. You can extract an old commit
object, edit it, put it into Git with a new SHA1, and then use the grafts file to
replace all references to the old commit with the new one. And you will be able to see
the changes immediately in gitk.
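
A minimal sketch of the last step, rewriting the references in the grafts
file. It assumes the file already holds one record per child commit (for
example, after dumping "git rev-list --parents --all" into it). The name
replace_graft_ref is made up for illustration. It only touches parent
positions, so the record of the old commit itself is left alone, and it
prints the result instead of editing the file in place.

```sh
# replace_graft_ref OLD NEW FILE  (hypothetical helper, not a git command)
# Print FILE with every parent reference to OLD replaced by NEW.
# Field 1 of a record is the commit being described; fields 2.. are its parents.
replace_graft_ref() {
    awk -v old="$1" -v new="$2" '
        /^#/ { print; next }            # pass comment lines through unchanged
        {
            for (i = 2; i <= NF; i++)   # skip field 1: only parents are references
                if ($i == old) $i = new
            print
        }
    ' "$3"
}

# Usage (paths are assumptions):
#   replace_graft_ref "$C" "$C2" .git/info/grafts > .git/info/grafts.tmp &&
#       mv .git/info/grafts.tmp .git/info/grafts
```

Comparing whole fields avoids depending on the word-boundary syntax of any
particular sed. Refs that point directly at the old commit are not covered
by this step.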

Dmitry
--

To: Dmitry Potapov <dpotapov@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 2:10 pm

Hmmmm, interesting thought. That just might solve my problem.
In that case, I will stick to extending git fsck to check grafts more
rigorously and fix git clone to *refrain* from looking at grafts.
If anyone still wants the extended format, I'd be willing to implement
it, but my immediate itch for it is gone.
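
As a rough idea of what a stricter check could look for, here is a sketch
of a format check that runs outside git. It assumes each record is a commit
name followed by zero or more parent names, all 40 lowercase hex digits
separated by single spaces, and that lines starting with '#' are comments.
check_grafts is a made-up name; a real fsck extension would also have to
confirm that every named object exists and is a commit.

```sh
# check_grafts FILE  (hypothetical helper, not a git command)
# Print every malformed line of FILE, prefixed with its line number.
# Returns 0 if the file is clean, 1 if any line is malformed, 2 if unreadable.
check_grafts() {
    [ -r "$1" ] || return 2
    # grep -v selects the lines that are NOT a valid record, comment or blank line
    if grep -vnE '^(#.*|[0-9a-f]{40}( [0-9a-f]{40})*)?$' "$1"; then
        return 1
    fi
    return 0
}

# Usage (path is an assumption):
#   check_grafts .git/info/grafts || echo "grafts file needs attention"
```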
--
Sincerely,
Stephen R. van den Berg.

You are confused; but this is your normal state.
--

To: Stephen R. van den Berg <srb@...>
Cc: Dmitry Potapov <dpotapov@...>, <git@...>
Date: Thursday, July 3, 2008 - 2:02 am

I don't think it would.

You want to apply a patch through a part of the history. To do that, it is
not sufficient to apply the patch to only one commit/tree and then fake
parenthood of its child commits. You still need to apply the patch to all
children.

-- Hannes

--

To: Johannes Sixt <j.sixt@...>
Cc: Dmitry Potapov <dpotapov@...>, <git@...>
Date: Thursday, July 3, 2008 - 3:30 am

I am aware of that.
There are actually two common cases:
- Historical changes which are confined and don't ripple through. The
above solution works just fine for that.
- Ripple-through changes. They indeed need to be applied to every tree
in the first-parent chain. Even though this is going to take a
considerable amount of time, there still are certain advantages to
doing this using the method described above:
+ You can apply the patch to every commit/tree "interactively" if you want.
(Yes, I know, git-sequencer supports this one as well, but not the
next point).
+ You can view the change at any point in time (including in relation to the
tree that follows it), right after making the amendments (without letting
it ripple through to the end).
+ The ripple-through does not need to be performed in topological order,
i.e. eventually you'll have to touch everything, but you can do it
in the order you see fit (whatever is most efficient to work on).
+ If, at some point during the ripple-through process, you find out
that you forgot some change(s), you can abort or restart the
ripple-through without having spent all that time waiting for a
full-ripple-through.

Actually, ripple-through changes are rare. In the current project it
seems I need exactly one, but it's buried deep in the past (sadly).
The reason why I need it, is to make sure that git-bisect will work for
any revision in the past (i.e. the tree contained/contains some
too-clever-for-their-own-good $Revision$-expansion dependencies)
--
Sincerely,
Stephen R. van den Berg.

This is a day for firm decisions! Or is it?
--

To: Stephen R. van den Berg <srb@...>
Cc: Dmitry Potapov <dpotapov@...>, <git@...>
Date: Thursday, July 3, 2008 - 3:42 am

But you do know that you don't need to apply the change *now*; you can
apply it at bisect-time? Unless you expect you or your mere mortal
coworkers are going to do dozens of bisects into that part of the history,
I wouldn't change history *like*this*. But of course, I don't understand
the circumstances enough, so... just my 2 cents.

-- Hannes

--

To: Johannes Sixt <j.sixt@...>
Cc: Dmitry Potapov <dpotapov@...>, <git@...>
Date: Thursday, July 3, 2008 - 5:37 am

That is exactly the case, I do expect dozens of bisects.
--
Sincerely,
Stephen R. van den Berg.

This is a day for firm decisions! Or is it?
--

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 4:39 pm

Linus suggested that "git-fsck and repacking should just consider
it[grafts] to be an _additional_ source of parenthood rather than
a _replacement_ source."

http://article.gmane.org/gmane.comp.version-control.git/84686

Dmitry
--

To: Dmitry Potapov <dpotapov@...>
Cc: Stephen R. van den Berg <srb@...>, <git@...>
Date: Wednesday, July 2, 2008 - 5:27 pm

Yeah, thanks for the reminder.

http://thread.gmane.org/gmane.comp.version-control.git/37744/focus=37866

is still on my "things to look at" list.

--

To: Dmitry Potapov <dpotapov@...>
Cc: Stephen R. van den Berg <srb@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Wednesday, July 2, 2008 - 5:49 pm

This shows what the "object transfer ignores grafts" side of the earlier
suggestion by Linus would look like, to get people started. Totally
untested.

I threw in for_each_commit_graft() in the patch so that updates to the
reachability walker can add otherwise hidden objects, but otherwise it is
not used yet.

builtin-pack-objects.c | 5 +++++
builtin-send-pack.c | 3 ++-
cache.h | 1 +
commit.c | 10 ++++++++++
commit.h | 2 ++
environment.c | 1 +
upload-pack.c | 1 +
7 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 28207d9..53b0b33 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -30,6 +30,7 @@ git-pack-objects [{ -q | --progress | --all-progress }] \n\
[--threads=N] [--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
[--stdout | base-name] [--include-tag] \n\
[--keep-unreachable | --unpack-unreachable] \n\
+ [--ignore-graft] \n\
[<ref-list | <object-list]";

struct object_entry {
@@ -2160,6 +2161,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
die("bad %s", arg);
continue;
}
+ if (!strcmp(arg, "--ignore-graft")) {
+ honor_graft = 0;
+ continue;
+ }
usage(pack_usage);
}

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index d76260c..d932352 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -27,6 +27,7 @@ static int pack_objects(int fd, struct ref *refs)
*/
const char *argv[] = {
"pack-objects",
+ "--ignore-graft",
"--all-progress",
"--revs",
"--stdout",
@@ -36,7 +37,7 @@ static int pack_objects(int fd, struct ref *refs)
struct child_process po;

if (args.use_thin_pack)
- argv[4] = "--thin";
+ argv[5] = "--thin";
memset(&po, 0, sizeof(po));
po.argv = argv;
po.in = -1;
diff --git a/cache.h b/cache.h
index 188428d..00858f9 100644
--- a/cache.h
+++ b/ca...

To: Dmitry Potapov <dpotapov@...>
Cc: Stephen R. van den Berg <srb@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Wednesday, July 2, 2008 - 8:03 pm

This updates the earlier patch to teach the object transfer side to ignore
grafts, which makes things consistent between dumb commit walkers and
native transport. It is not meant for application as I haven't thought
about[*1*] nor looked into how this may interact with the "shallow clone"
stuff (which is graft in disguise but implemented separately).

Footnote. *1* I also suspect Linus did not think about interactions with
"shallow" when he made the suggestion referenced above, as "shallow" was
still a relatively new curiosity back then.

I am not sure if the addition of --ignore-graft to revision.c should be
there when this becomes real. I added it primarily for debugging
purposes, as it is something the end users should never trigger in the
normal workflow.

--
builtin-pack-objects.c | 5 +++
builtin-send-pack.c | 3 +-
cache.h | 1 +
commit.c | 10 +++++++
commit.h | 2 +
environment.c | 1 +
revision.c | 4 +++
t/t6500-graft.sh | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
upload-pack.c | 2 +
9 files changed, 97 insertions(+), 1 deletions(-)
create mode 100755 t/t6500-graft.sh

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 28207d9..53b0b33 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -30,6 +30,7 @@ git-pack-objects [{ -q | --progress | --all-progress }] \n\
[--threads=N] [--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
[--stdout | base-name] [--include-tag] \n\
[--keep-unreachable | --unpack-unreachable] \n\
+ [--ignore-graft] \n\
[<ref-list | <object-list]";

struct object_entry {
@@ -2160,6 +2161,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
die("bad %s", arg);
continue;
}
+ if (!strcmp(arg, "--ignore-graft")) {
+ honor_graft = 0;
+ continue;
+ }
usage(pack_usage);
}

diff --git a/builtin-send-pack.c b/builtin-sen...

To: Dmitry Potapov <dpotapov@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 5:18 pm

Yes, I know that's what he suggested, the way it should be implemented
IMO though is by checking once without and once with regard to grafts.
And still it should be such that git clone disregards grafts completely.
I'll fix both, eventually, since I need this functionality to verify
correctness for the projects I'm working on at the moment.

As for repack, it should probably ignore grafts, except for reference.
I.e. repack/gc should consider all mentioned SHA1s in the grafts file
to be referenced and undeletable.
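
The set of names that repack/gc would have to treat as referenced can be
pulled out of the file with ordinary text tools. A sketch, assuming the
plain grafts format of 40-hex-digit names separated by spaces;
graft_objects is a hypothetical helper, not an existing git command.

```sh
# graft_objects FILE  (hypothetical helper)
# Print every distinct object name mentioned in FILE, one per line, sorted.
graft_objects() {
    grep -v '^#' "$1" |                 # drop comment lines
        tr -s ' ' '\n' |                # one name per line
        grep -E '^[0-9a-f]{40}$' |      # keep only well-formed names
        sort -u
}

# Usage (path is an assumption): report type and size, or "missing", per object
#   graft_objects .git/info/grafts | git cat-file --batch-check
```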
--
Sincerely,
Stephen R. van den Berg.

You are confused; but this is your normal state.
--

To: Stephen R. van den Berg <srb@...>
Cc: Dmitry Potapov <dpotapov@...>, <git@...>
Date: Wednesday, July 2, 2008 - 5:28 pm

I could see an argument that the only modes you really need are a) use
grafts as replacements, and b) use grafts as additions. There is
perhaps no need for c) ignore grafts.

For example, say I wanted to give someone a copy of my repo that
includes grafts (ignoring the fact that this is probably bad to do in
general). He could git-clone it and then install a copy of my grafts
file, as long as git-clone does (a) or (b) but not (c). On the other
hand, if he just wants a copy of the "real" (graft-free) repo, then
git-clone needs to do (b) or (c) but not (a). git-fsck needs (b), and
most normal git operations want (a) (since that was the original
purpose of grafts).

Based on that, (c) is redundant, unless you're really concerned about
not sending redundant objects to people who clone your repo that has
grafts installed. But I think you probably shouldn't have people
cloning your grafted repository anyway unless you know what you're
doing, and if you know what you're doing, you probably want (b). If
you see what I mean.
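
To make the three modes concrete, here is a small sketch of how the parent
list of a single commit would come out under each of them. The function
name and the mode names replace/add/ignore are invented for this example
and are not git options; parents are passed in as plain strings.

```sh
# effective_parents MODE REAL GRAFT  (hypothetical helper)
#   MODE  : replace (a), add (b) or ignore (c)
#   REAL  : space-separated parents recorded in the commit object
#   GRAFT : space-separated parents from the grafts record, "" if there is none
# Simplification: a graft record with no parents cannot be told apart from
# "no record" here.
effective_parents() {
    mode=$1 real=$2 graft=$3
    case $mode in
        replace) if [ -n "$graft" ]; then echo "$graft"; else echo "$real"; fi ;;
        # union of both lists, keeping first-seen order and dropping duplicates
        add)     printf '%s\n' $real $graft | awk 'NF && !seen[$0]++' | paste -sd' ' - ;;
        ignore)  echo "$real" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Usage:
#   effective_parents add "p1 p2" "p2 g1"
```

Under "add" nothing reachable through either parent list is lost, which is
why that mode suits fsck and transfer while "replace" suits everyday
history viewing.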

Have fun,

Avery
--

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 2:33 pm

This script is just a proof of concept. It seems to work for me, but
I haven't really tested it.

===========================================
#!/bin/bash

set -e

# create a throwaway repository
git init
# create some history: ten commits, each adding one file
for ((i=0; i<10; i++))
do
    echo "foo$i" > "foo$i"
    git add "foo$i"
    git commit -m "add foo$i"
done

# run gitk to see it
gitk --all &

# dump the parent information of every commit into the grafts file
git rev-list --parents --all > .git/info/grafts.tmp
mv .git/info/grafts.tmp .git/info/grafts

# please choose what commit you want to edit
echo
while read -p 'Edit commit: ' C
do
    C=$(git rev-parse "$C") || continue
    # edit commit C
    git cat-file commit "$C" > .git/COMMIT_OBJ
    vim .git/COMMIT_OBJ

    # store the edited text as a new commit object
    C2=$(git hash-object -w -t commit .git/COMMIT_OBJ)

    # replace all references from C to C2 (\< and \> are GNU sed word boundaries)
    sed -e 's/\<'"$C"'\>/'"$C2"'/g' < .git/info/grafts > .git/info/grafts.tmp
    mv .git/info/grafts.tmp .git/info/grafts
done
===========================================

Dmitry
--

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 12:35 pm

[...]

First, if I remember correctly (from KernelTrap and now defunct Kernel
Traffic and one issue of Git Traffic) the 'graft' mechanism was
created so it would be possible to "graft" (join) historical
conversion repository with the "current work" git repository (started
from zero when git was deemed good enough for Linux kernel
development). The same mechanism is used for shallow clone, where one
goes in the opposite direction, shortening history instead of joining
two repositories (two histories).

The fact that git-filter-branch (and earlier cg-admin-rewrite-hist)
respects grafts, and rewrites history so that grafts are no-op and are
not needed further is a bit of side-effect. So I think that it would
be better to provide generic git-filter-branch filter which can
understand this "generalized grafts" file format, or rather
'description of changes' file. Put it in contrib/, and here you
go...

--
Jakub Narebski
Poland
ShadeHawk on #git
--

To: Jakub Narebski <jnareb@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 1:32 pm

Quite. Which is exactly the spirit I'm extending here.
I need it to stitch together history, but it needs to be more perfect
than mere connecting parents.

Also, the graft mechanism specifically is intended as a temporary
solution until one uses filter-branch to "finalise" the result into a

I beg to differ. It's not a side effect, it's the proper way to get
rid of the grafts file. Grafts are temporary and ugly. In proper
repositories they are a sign of transition to a proper state.

The problem is that the process of fixing history is an iterative one,
which can take many months, and every time you make a change, the
correctness needs to be viewed using gitk.

For argument's sake, consider the repository at hand which I'm trying to
"fix": it has 33000 commits, distributed over eight branches with
roughly 3500 merges over a time period of 13 years.
The eight branches were eight separate CVS repositories which have
intersecting histories, and 3500 merges between CVS repositories (i.e.
branches).

If I need to backpatch a certain patch into history, it is likely that
in order to let the change ripple through, it will take 20000 commits to
be rewritten every time I make a slight change to history.
It's not really workable to ripple through 20000 commits every time I
make a historical change, yet I need to view the change in gitk.

Using git filter-branch, or git sequencer basically has the same
problem, I need to ripple through most of history to get to a state
which is viewable using gitk again. That is too long a turnaround
cycle.

Using the proposed grafts format, I can make changes incrementally, and
immediately viewable (though not cloneable) on the local repository using gitk.
Then after making all the necessary changes, one git filter-branch run
will "burn" the changes into the repository proper in one go
(renumbering all tags, branches and merges along the way).
--
Sincerely,
Stephen R. van den Berg.

You are confused; but this is your normal state.
--...

To: Stephen R. van den Berg <srb@...>
Cc: <git@...>
Date: Thursday, July 3, 2008 - 8:43 pm

[...]

I wanted to propose that a generic git-filter-branch filter based on a
"generalized grafts" file should be accompanied by extending gitk so that
it understands this format too...

...but after reading the wonderful suggestion to create new commits with
corrected contents and insert them (replacing the older versions)
using grafts, brought up independently by Dmitry Potapov and
Petr Baudis, I think that you would be best off extending gitk to
support this way instead.

You would have to extend gitk to maintain reverse revision mapping
(from revision to its children), and then you would be able to edit
history interactively from within gitk, with gitk correcting its
internal structures to redisplay changed commits, and creating commits
and doing grafting behind the scenes for later git-filter-branch
run.

--
Jakub Narebski
Poland
--

To: Stephen R. van den Berg <srb@...>
Cc: Jakub Narebski <jnareb@...>, <git@...>
Date: Wednesday, July 2, 2008 - 8:21 pm

Grafts are _much_ older than filter-branch and I'm not sure where did

There's nothing ugly or necessarily temporary about grafts. One example
of completely valid usage is adding previous history of a project to it
later.

First, you don't need to carry around all the archived baggage you are
probably rarely going to access anyway if you don't need to; changing a
VCS is an ideal cutoff point.

Second, you don't need to worry about doing perfect conversion at the
moment of the switch.

Third, even if you think you have done it perfectly, it will turn out
later that something is wrong anyway.

Fourth, it may not be actually _clear_ what the canonical history should
be. Consider linux-kernel, you can graft the BitKeeper history (or one
of possible candidates for the ideal conversion, though one is AFAIK
clearly favoured), or you could also graft commit-per-tarball history
even from the times before BitKeeper; you certainly don't want either in
the current main history DAG.

--
Petr "Pasky" Baudis
The last good thing written in C++ was the Pachelbel Canon. -- J. Olson
--

To: Petr Baudis <pasky@...>
Cc: Jakub Narebski <jnareb@...>, <git@...>
Date: Thursday, July 3, 2008 - 3:11 am

Not in direct documentation, but it is what comes across in posts on
the mailing list like:

http://kerneltrap.org/mailarchive/git/2008/6/10/2085624

That depends on the project, of course, and is not a valid statement in
general. Part of the charm of full history is that git-blame and

Not necessarily. I have automated the checkout-verification-process which
basically checks out every revision from the respective old repository
and binary-compares it with the corresponding revision in the git
repository. This ensures a full binary match across the board.
With respect to historical merges, I agree, those might not be
completely correctly grafted, but the level of correctness can be
determined at will, and once we achieve somewhere around 99% accuracy,

That depends on the project. In my project it *is* clear, so this point
doesn't make any difference.

--
Sincerely,
Stephen R. van den Berg.

This is a day for firm decisions! Or is it?
--

To: <git@...>
Date: Wednesday, July 2, 2008 - 12:43 pm

Maybe the upcoming git-sequencer could be the appropriate place? It
tries to achieve just that: edit history by specifying a list of
commands. The currently planned set of commands would need to be
amended, but the framework should be in place.

Michael

--

To: Michael J Gruber <michaeljgruber+gmane@...>
Cc: <git@...>
Date: Wednesday, July 2, 2008 - 1:42 pm

That's the problem. Like git filter-branch, git sequencer needs you to
parameterise the changes, which, in my case, is hardly possible, since
the changes are randomlike.
Also, having to run the sequencer to dig 20000 commits into the past,
then change something, then come back up and rewrite all following
history and relations (parents/tags/merges) will take a sizeable amount
of time. I need something that can be changed at will, then viewed with
gitk a second later.

These edits are numerous and spread over many months, so the typical
history fixup-sessions involve periods where you make 30 random
historical edits per hour (which need to be viewed and checked every time
immediately after making the change). And say once every 4 months, you
run it through git filter-branch to cast everything into stone. A
typical git filter-branch run takes 15 minutes on a repository this
size.
--
Sincerely,
Stephen R. van den Berg.

You are confused; but this is your normal state.
--

To: Stephen R. van den Berg <srb@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, <git@...>
Date: Monday, July 7, 2008 - 2:28 am

A second later might be too much, but for the case where you need to
add a patch in the middle (which I suspect is the most timeconsuming
and tricky part at the moment), you might want to use a temporary
branch checked out where you need to apply the patch, apply the patch
and then rebase the rest of the history onto that new commit. Rebase
is fairly quick (although not a one-second thing for 20k commits), so
you'll get the time down quite a bit, I imagine.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--

To: Andreas Ericsson <ae@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, <git@...>
Date: Monday, July 7, 2008 - 2:59 am

Not really.
Rebase does two things:
a. Apply every patch/commit again, which takes too long for 20k commits.
b. Mess up carefully grafted parent/merge relationships.

Rebase is only suitable for short linear strands of commits.
The history I'm dealing with is neither short, nor linear.
--
Sincerely,
Stephen R. van den Berg.

A truly wise man never plays leapfrog with a unicorn.
--

To: Stephen R. van den Berg <srb@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, <git@...>
Date: Wednesday, July 2, 2008 - 2:25 pm

I think the point was more about making a tool to do exactly what you
want, based on the new git sequencer. Note that git filter-branch could
also be rewritten to use the sequencer.

Mike
--

To: Mike Hommey <mh@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, <git@...>
Date: Wednesday, July 2, 2008 - 2:37 pm

As far as I understood it, the new git sequencer rewrites history
proper. That is timeconsuming by definition, and thus it is *not*
possible to make a tool based on the sequencer that supports the desired
iterative-history-rewrite workflow.
--
Sincerely,
Stephen R. van den Berg.

You are confused; but this is your normal state.
--

To: Stephen R. van den Berg <srb@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, Mike Hommey <mh@...>, <git@...>
Date: Wednesday, July 2, 2008 - 3:31 pm

Hi,

I'm somewhat confused about the desired workflow, but I'll try to
answer.

If I got the problem right, it is possible.
But you have to rewrite and cannot just fake history, of course.

...for example, a "pick <commit>" that just picks the _tree_ of the
commit and not the _introduced changes_. (I've never used info/grafts,
but if I get the principle right, such tree-picks could realize a
linear list of info/grafts history fakes.)

sequencer doesn't allow changing committer data, but this could
be an easy change if you really need that.
The same with the author timestamp, that could only be reused from

"pause" instruction, and then do manual changes, then
git sequencer --continue

I wonder if grafts can be used in combination with sequencer in such a
way that you rewrite foo~20000..foo~19950 and then fake the parents of

You can run gitk whenever you did "pause" in the sequencer file.
[Btw, an integration of sequencer into gitk is also on the TODO list,
but that's OT here.]

Regards,
Stephan

--
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F
--

To: Stephan Beyer <s-beyer@...>
Cc: Stephen R. van den Berg <srb@...>, Michael J Gruber <michaeljgruber+gmane@...>, Mike Hommey <mh@...>, <git@...>
Date: Wednesday, July 2, 2008 - 4:42 pm

I don't think we speak about any normal workflow but about importing
"initially disjunct CVS and SVN repositories into larger complete
histories inside a single git repository." This is one-time work, not

Using grafts allows you to fake history, which is very useful during
import, because it allows you to edit history without running any
filter-branch, which is very timeconsuming. Of course, at the end
you have to run git filter-branch to have the "true" history, otherwise
anyone who clones from you will end up with a broken repo.

The purpose of rebase (and I believe the sequencer too) is rather
different -- to allow you to keep your changes as patches to the

I don't think it is a good idea. During the normal work you should never
use grafts. Well, you can use grafts to add old history, but using it for
anything else is really dangerous, because it *fakes* history. git rebase
(and AFAIK sequencer too) just rewrites the history of some branch. IOW, it
creates another branch from a different starting point using patches from
some existing branch and then reassigns the branch name to it.

Dmitry
--

To: Dmitry Potapov <dpotapov@...>
Cc: Stephen R. van den Berg <srb@...>, Michael J Gruber <michaeljgruber+gmane@...>, Mike Hommey <mh@...>, <git@...>
Date: Wednesday, July 2, 2008 - 7:46 pm

Hi,

I have written this in the context that Stephen only changes some commits
from a long time ago (foo~20000) and then I showed a way how to avoid that
sequencer rewrites the rest which takes so long.
This is not related to "normal work", but to Stephen's use case (if I
got it right).

What I meant was:
Instead of faking a lot of parents, changes and even merges using an
extended grafts file, he could rewrite some patches - which can be fast -
and then use _only one_ graft to change the parent to the changed and
rewritten commit.
This can be done iteratively and seems to be a good compromise between speed
and reliability.

Regards,
Stephan

--
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F
--

To: Stephan Beyer <s-beyer@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, Mike Hommey <mh@...>, Dmitry Potapov <dpotapov@...>, <git@...>
Date: Thursday, July 3, 2008 - 2:05 am

Indeed.
--
Sincerely,
Stephen R. van den Berg.

This is a day for firm decisions! Or is it?
--

To: Stephen R. van den Berg <srb@...>
Cc: Michael J Gruber <michaeljgruber+gmane@...>, Mike Hommey <mh@...>, <git@...>
Date: Wednesday, July 2, 2008 - 3:36 pm

s/once/ones/

To give it some sense. Sorry ;)

--
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F
--

To: <git@...>
Date: Wednesday, July 2, 2008 - 2:34 pm

Yes, that was at least my point. As I understand, git filter-branch -i
is a candidate for that rewrite.

But I understand now that OP wants to do lots of history edits and see
them immediately before doing the actual (time consuming) rewrite; and
then do the rewrite occasionally. Rewriting is surprisingly slow even on
tmpfs.

Michael

--
