Re: Recording merges after repo conversion

From: Peter Karlsson
Date: Tuesday, October 9, 2007 - 12:09 am

Hi!

I have a couple of repositories converted from CVS to Git using
parsecvs. Some are just converted, some I've continued to develop after
the conversion (and cloned a couple of times).

Since parsecvs gave me all the CVS branches, I would like to record the
merge points in the Git history, if possible. I have committed merges
with comments like "merged <branchname>", so I can probably find them
quite easily, and I do have the imported CVS branches available. Can I
record the merge information so git knows about them?

Is it safe to do so on a repository that has already been cloned (i.e.,
will a later push/pull work)?

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Benoit SIGOURE
Date: Tuesday, October 9, 2007 - 12:19 am

I think you can use grafts to achieve this.

From Documentation/repository-layout.txt:
info/grafts::
         This file records fake commit ancestry information, to
         pretend the set of parents a commit has is different
         from how the commit was actually created.  One record
         per line describes a commit and its fake parents by
         listing their 40-byte hexadecimal object names separated
         by a space and terminated by a newline.
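
To make the record format above concrete, here is a minimal Python sketch
that reads grafts-style text into a mapping from commit to fake parents.
The function name parse_grafts and the sample object names are made up for
illustration, and skipping blank lines is an assumption; this is not how
git itself reads the file.

```python
import re

# A 40-character hexadecimal object name, as the quoted text describes.
OBJECT_NAME = re.compile(r"[0-9a-f]{40}")


def parse_grafts(text):
    """Return {commit: [fake parents]} for grafts-style text."""
    grafts = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        names = line.split()
        for name in names:
            if not OBJECT_NAME.fullmatch(name):
                raise ValueError(f"line {number}: bad object name {name!r}")
        # First name is the commit, the rest are its pretended parents.
        grafts[names[0]] = names[1:]
    return grafts


# Hypothetical object names, chosen only so the example is readable.
MERGE, TRUNK, BRANCH, ROOT = "a" * 40, "b" * 40, "c" * 40, "d" * 40

example = f"{MERGE} {TRUNK} {BRANCH}\n{ROOT}\n"
grafts = parse_grafts(example)
```

In this sample the first record pretends that one commit is a merge of two
parents, and the second record lists no parents at all, which pretends the
commit is a root.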

Cheers,

-- 
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory


From: Peter Karlsson
Date: Tuesday, October 30, 2007 - 6:34 am

That seems to work, but the grafts list doesn't seem to propagate when I 
push/pull/clone. Is it possible to get that to work?

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Lars Hjemli
Date: Tuesday, October 30, 2007 - 7:29 am

No, the grafts file is purely local. To achieve your goal, you'd have
to 'git filter-branch' before pushing/cloning. But beware: this _will_
rewrite your current branch(es).

--
larsh
-

From: Peter Karlsson
Date: Tuesday, October 30, 2007 - 2:06 pm

Ouch. I'll have to think about whether I want to do that, then...

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Lars Hjemli
Date: Tuesday, October 30, 2007 - 2:46 pm

Well, it isn't dangerous, but if someone has already cloned your repo
_and_ committed local changes they'll need to rebase their work onto
the new branch(es). Basically, you'll want to inform these people that
you're going to rewrite the branches.

-- 
larsh
-

From: Johannes Schindelin
Date: Tuesday, October 30, 2007 - 7:28 pm

Hi,


Why should it?  This would contradict the whole "a commit sha1 hashes the 
commit, and by inference the _whole_ history" principle.
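
The principle can be illustrated with a small Python sketch that computes
commit object names the way git does: SHA-1 over a "commit <size>" header,
a NUL byte, and the commit text, which lists the parents. The helper name
commit_id, the identity string and the two parent ids are made up for
illustration; only the empty-tree id is a real git object name.

```python
import hashlib


def commit_id(tree, parents, who, message):
    """SHA-1 object name of a commit with the given tree, parents and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    body = "\n".join(lines).encode()
    # Git hashes "commit <size>\0" followed by the commit text.
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
WHO = "A U Thor <author@example.com> 1193700000 +0000"   # hypothetical identity
TRUNK, BRANCH_TIP = "1" * 40, "2" * 40                   # hypothetical parent ids

# The same commit recorded with one parent, and as a real two-parent merge.
plain = commit_id(EMPTY_TREE, [TRUNK], WHO, "merged somebranch\n")
merge = commit_id(EMPTY_TREE, [TRUNK, BRANCH_TIP], WHO, "merged somebranch\n")

# A later commit on top of each: its name changes as well.
child_of_plain = commit_id(EMPTY_TREE, [plain], WHO, "next change\n")
child_of_merge = commit_id(EMPTY_TREE, [merge], WHO, "next change\n")
```

Adding a parent changes the name of that commit and, through the parent
line, the name of every commit built on top of it. That is why recording a
merge after the fact cannot leave existing commit names untouched.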

Ciao,
Dscho

-

From: Peter Karlsson
Date: Wednesday, October 31, 2007 - 2:50 am

Does it? Why can't the grafts file itself be committed to the repository and 
live in the history?

Well, yeah, the SHA1 hashing is one of Git's main strengths, but it also 
opens up some weaknesses.

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Johannes Schindelin
Date: Wednesday, October 31, 2007 - 4:01 am

Hi,


Yes!  Of course!  If what you want becomes possible, I could make an evil 
change in history long gone, and slip it by you.  You could not even see 

You can do that already.  But you have to ask the people at the other end 

If you really think that, I doubt you understood the issues at hand.

Ciao,
Dscho

-

From: Peter Karlsson
Date: Wednesday, October 31, 2007 - 5:07 am

I would see the grafts file being changed, which would alert me (the
problem I have with graft is that it *replaces* history information for
an element, not just *add* to it, which threw me off at my first

Last time I tried, git would not add files that were in the ".git"
subdirectory to version control. I might have done something

I have, I'm just thinking of the issues that are created by solving the
issues it does solve.

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Johannes Schindelin
Date: Wednesday, October 31, 2007 - 5:32 am

Hi,


The thing is: it is too easy to overlook a tiny change like this.  And it 
is very, very difficult to see what it _really_ changed.


Well, I was not explicit enough.  You can check in the grafts file _under 
a different name_.  Outside of .git/.

Hth,
Dscho

-

From: Johan Herland
Date: Wednesday, October 31, 2007 - 5:43 am

Well, technically, if the grafts file was part of the repo, you wouldn't be
able to change the (in-tree) grafts file without affecting the SHA1 of HEAD.
In other words, given a commit SHA1 sum, you can be sure that someone else
who checks out the same commit (and has no local modification to their
grafts file) will see exactly the same history as you do.

To a certain degree, this is actually "safer" than today's (out-of-tree)
solution, where one can change the grafts file _without_ affecting the
current HEAD (SHA1 sum), and thus will not see the same history as someone
else who checks out the same HEAD. This is of course _intended_ to a certain
degree by the current implementation, but can easily cause confusion if
people lose track of what's in their respective grafts files.

Of course, this is both a blessing and a curse: Say, for example, we have
three commits:

... --> A --> B --> C

and commit B changes the (in-tree) grafts file. Now if I have HEAD @ A, I
will see a different history than if I have HEAD @ C. Worse: If one person
has HEAD @ A, and another person has HEAD @ C, and neither is aware of the
grafts file change in B, there is _plenty_ of room for getting confused if
the two persons start discussing the repo history. Note, however, that
similar confusion can be achieved today if one of the persons forgets having
changed his out-of-tree grafts file.
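
A toy Python model of the A/B/C scenario may help. It is only a sketch of
the hypothetical in-tree grafts idea, which git does not implement: the
commit names R and X, the tables PARENTS and GRAFTS_AT, and the function
ancestry are all invented for illustration. Commit B adds a graft saying
that A also has the old branch tip X as a parent.

```python
# Real parent links: R --> A --> B --> C, plus an unrelated old branch tip X.
PARENTS = {"R": [], "X": [], "A": ["R"], "B": ["A"], "C": ["B"]}

# Hypothetical in-tree grafts file contents as stored in each commit.
# B introduces the graft "A has parents R and X"; C inherits it.
GRAFTS_AT = {
    "R": {},
    "X": {},
    "A": {},
    "B": {"A": ["R", "X"]},
    "C": {"A": ["R", "X"]},
}


def ancestry(start, head):
    """Commits reachable from start, using the grafts stored at head."""
    grafts = GRAFTS_AT[head]
    seen, todo = set(), [start]
    while todo:
        commit = todo.pop()
        if commit in seen:
            continue
        seen.add(commit)
        # A graft replaces the recorded parent list, it does not add to it.
        todo.extend(grafts.get(commit, PARENTS[commit]))
    return seen


view_from_a = ancestry("A", head="A")
view_from_c = ancestry("A", head="C")
```

With HEAD @ A, the history of A contains only A and R. With HEAD @ C, the
very same commit A also has X in its history, so two people looking at A
from different checkouts would describe different ancestry.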


The grafts file concept is very powerful, but can also be extremely
confusing. Adding in-tree versioning of the grafts file will make it more
powerful (since we can now easily share and update "errata" to the repo
history), but it might also make things _orders_of_magnitude_ more confusing
(as demonstrated in the above example, although to be fair, similar
confusion can be had in today's out-of-tree solution). At some point things
may become so confusing that we'd rather drop the feature ...

From: Johannes Schindelin
Date: Wednesday, October 31, 2007 - 6:43 am

Hi,


All this does not change the fact that installing a graft and 'git gc 
--prune'ing gets rid of the old history.  D'oh.

Automatically installing grafts is wrong.

Ciao,
Dscho

-

From: Johan Herland
Date: Wednesday, October 31, 2007 - 7:37 am

So will rebasing and --prune'ing, or pulling a rebased branch and
--prune'ing. Git already gives you _plenty_ of different ropes to hang
yourself with. The

I tend to agree with you here, because the possibility for massive confusion
is huge, but that doesn't deny the fact that, if used properly (and that's a
_big_ 'if'), this is a very powerful feature.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

From: Johannes Schindelin
Date: Wednesday, October 31, 2007 - 8:03 am

Hi,


But that is not the question here.  The question here is: are users 
allowed to hang _others_?  I say: no.

Ciao,
Dscho

-

From: Johan Herland
Date: Wednesday, October 31, 2007 - 8:21 am

Well, to a certain degree (and depending on your level of paranoia), you're
always responsible for the code entering your own repo, and you could always
set up a hook disallowing ".gitgrafts" (or whatever it would be called) from
entering your repo.

But taking this (and everything else that's been said) into account, I
totally agree with you that adding this feature would open up a _massive_
can of worms.


EOD

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

From: Johannes Schindelin
Date: Wednesday, October 31, 2007 - 8:57 am

Hi,


Yeah, right.  And you could also stay in an oxygen tent the whole time to 
avoid being infected with some virus.

Seriously, your proposal does not make any sense.  If you have to set up a 
hook to get the _sane_ behaviour, something is really wrong.  So I do not 
really understand why you brought up this idea here and now.

I understand that you wanted to end this discussion, but I could _not_ let 
your statement stand uncorrected.

Ciao,
Dscho

-

From: Linus Torvalds
Date: Wednesday, October 31, 2007 - 9:43 am

Well, I think this does kind of have some commonality with another issue 
that has come up before: git clone only clones the really core repository 
data.

That's generally a big feature, and I think it's absolutely the correct 
thing to do.

But I can also see that sometimes, you might want to clone more than the 
actual repository, and get things like SVN metadata, branch reflogs, 
various hooks and all the config options too.

Of course, in practice, at least right now, the right thing to do for that 
is to just do a recursive filesystem copy and then a "git status", but I 
think the background here is that some people simply do end up wanting to 
transfer more infrastructure than just the actual repository data.

One thing to note: one reason for *not* allowing that is that incremental 
upgrades of non-repo data is obviously not possible. You might be able to 
*clone* a repo with config info and other metadata (if nothing else, then 
by just doing that raw filesystem copy), but you will never ever be able 
to _fetch_ the updates, because they aren't part of the core repository, 
and aren't versioned.

So I think I can understand why some people would want to do things like 
this, but I do think it's broken. Yes, you can make the grafts file (or 
the config file) be part of the repo, and even just add a symlink to your 
.git/ directory, but it's simply not a very good model.

So I think it always does end up breaking (other people might rebase, and 
break your grafts, or just not want them in the first place, or they don't 
care about the same things, and mess up "your" configuration etc etc). So 
the git repo layout is designed to have the minimally required shared 
state, and not anything else.

		Linus
-

From: Johan Herland
Date: Wednesday, October 31, 2007 - 10:08 am

I agree that sharing the "metainfo" (i.e. config, grafts, hooks, reflogs,
rerere magic, etc.) of the repo is not something git should do in the
general case.

But in some specific workflows (e.g. in-house, centralized workflows), I
think it makes sense to coordinate/share some of this info between repos.
But in that case, I guess such coordination/sharing can be done by
special-purpose tools built on top of git (e.g. in-house admin scripts).


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

From: Johannes Schindelin
Date: Tuesday, October 30, 2007 - 8:05 am

Hi,


No.  Use filter-branch, and publish the cleaned up history (possibly as a 
new branch/repo).

Ciao,
Dscho

-

From: Peter Karlsson
Date: Wednesday, October 31, 2007 - 5:17 am

I'm considering doing this, and just replace the published repository
with the "fixed" one (and fix-up all my clonings of it). I'm having
some problems digesting the git-filter-branch manual page though--is
there an easy way of automating the process, given that I now have a
"grafts" file that expresses what I would like git-filter-branch to do
(I guess it would have to work backwards changing the merge points, to
be able to find all the revisions under the names I've used in the
grafts file)?
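
One possible way to automate this, sketched in Python: as far as the
git-filter-branch manual page describes it, the command honors
.git/info/grafts, so running it with no filters at all should rewrite the
selected branches with the grafted parents baked in. The helper names
filter_branch_command and make_grafts_permanent are made up for this
sketch, and it should be tried on a throwaway clone first.

```python
import subprocess


def filter_branch_command(refs=("--all",)):
    """Build a filter-branch call with no filters; the rewrite itself
    is what records the grafted parents in the new commits."""
    return ["git", "filter-branch", "--", *refs]


def make_grafts_permanent(repo_dir, refs=("--all",)):
    """Run the rewrite inside repo_dir (assumes git is installed)."""
    subprocess.run(filter_branch_command(refs), cwd=repo_dir, check=True)


command = filter_branch_command()
```

Calling make_grafts_permanent("/path/to/clone") on a scratch copy would run
the rewrite. Afterwards the grafts file refers to the old commit names, so
it would presumably need to be removed before publishing the result.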

-- 
\\// Peter - http://www.softwolves.pp.se/
-
