Hi! I have a couple of repositories converted from CVS to Git using parsecvs. Some are just converted, some I've continued to develop after the conversion (and cloned a couple of times). Since parsecvs gave me all the CVS branches, I would like to record the merge points in the Git history, if possible. I have committed merges with comments like "merged <branchname>", so I can probably find them quite easily, and I do have the imported CVS branches available. Can I record the merge information so Git knows about them? Is it safe to do so on a repository that has already been cloned (i.e., will a later push/pull work)? -- \\// Peter - http://www.softwolves.pp.se/ -
I think you can use grafts to achieve this.
From Documentation/repository-layout.txt:
info/grafts::
This file records fake commit ancestry information, to
pretend the set of parents a commit has is different
from how the commit was actually created. One record
per line describes a commit and its fake parents by
listing their 40-byte hexadecimal object names separated
by a space and terminated by a newline.
Cheers,
--
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory
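As a rough illustration of the record format quoted above, here is a small Python sketch that builds and parses grafts records. The helper names `format_graft` and `parse_graft` are invented for this example and are not part of git, and the object names in the usage section are placeholders, not real commits.

```python
import re

# A full SHA-1 object name: exactly 40 lowercase hexadecimal characters.
HEX40 = re.compile(r"[0-9a-f]{40}")


def format_graft(commit, parents):
    """Return one grafts record: the commit followed by its fake parents."""
    names = [commit, *parents]
    for name in names:
        if not HEX40.fullmatch(name):
            raise ValueError(f"not a 40-character hex object name: {name!r}")
    # Names are separated by a space and the record ends with a newline.
    return " ".join(names) + "\n"


def parse_graft(line):
    """Split one grafts record into (commit, [fake parents])."""
    names = line.rstrip("\n").split(" ")
    for name in names:
        if not HEX40.fullmatch(name):
            raise ValueError(f"malformed grafts record: {line!r}")
    return names[0], names[1:]


if __name__ == "__main__":
    merge = "a" * 40          # placeholder: a "merged <branchname>" commit
    old_parent = "b" * 40     # placeholder: the parent it already has
    branch_commit = "c" * 40  # placeholder: the imported CVS branch commit
    print(format_graft(merge, [old_parent, branch_commit]), end="")
```

The record lists every parent the commit should appear to have, including the one it already has, since the file describes the whole pretended parent set rather than an addition to it.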
That seems to work, but the grafts list doesn't seem to propagate when I push/pull/clone. Is it possible to get that to work? -- \\// Peter - http://www.softwolves.pp.se/ -
No, the grafts file is purely local. To achieve your goal, you'd have to 'git filter-branch' before pushing/cloning. But beware: this _will_ rewrite your current branch(es). -- larsh -
Ouch. I'll have to think about whether I want to do that, then... -- \\// Peter - http://www.softwolves.pp.se/ -
Well, it isn't dangerous, but if someone has already cloned your repo _and_ committed local changes, they'll need to rebase their work onto the new branch(es). Basically, you'll want to inform these people that you're going to rewrite the branches. -- larsh -
Hi, Why should it? This would contradict the whole "a commit sha1 hashes the commit, and by inference the _whole_ history" principle. Ciao, Dscho -
Does it? Why can't the grafts file itself be committed to the repository and live in the history? Well, yeah, the SHA1 hashing is one of Git's main strengths, but it also opens up some weaknesses. -- \\// Peter - http://www.softwolves.pp.se/ -
Hi, Yes! Of course! If what you want becomes possible, I could make an evil change in history long gone, and slip it by you. You could not even see You can do that already. But you have to ask the people at the other end If you really think that, I doubt you understood the issues at hand. Ciao, Dscho -
I would see the grafts file being changed, which would alert me (the problem I have with graft is that it *replaces* history information for an element, not just *adds* to it, which threw me off at my first Last time I tried, git would not add files that were in the ".git" subdirectory to version control. I might have done something I have, I'm just thinking of the issues that are created by solving the issues it does solve. -- \\// Peter - http://www.softwolves.pp.se/ -
Hi, The thing is: it is too easy to overlook a tiny change like this. And it is very, very difficult to see what it _really_ changed. Well, I was not explicit enough. You can check in the grafts file _under a different name_. Outside of .git/. Hth, Dscho -
Well, technically, if the grafts file was part of the repo, you wouldn't be able to change the (in-tree) grafts file without affecting the SHA1 of HEAD. In other words, given a commit SHA1 sum, you can be sure that someone else who checks out the same commit (and has no local modification to their grafts file) will see exactly the same history as you do. To a certain degree, this is actually "safer" than today's (out-of-tree) solution, where one can change the grafts file _without_ affecting the current HEAD (SHA1 sum), and thus will not see the same history as someone else who checks out the same HEAD. This is of course _intended_ to a certain degree by the current implementation, but can easily cause confusion if people lose track of what's in their respective grafts files. Of course, this is both a blessing and a curse: Say, for example, we have three commits: ... --> A --> B --> C and commit B changes the (in-tree) grafts file. Now if I have HEAD @ A, I will see a different history than if I have HEAD @ C. Worse: If one person has HEAD @ A, and another person has HEAD @ C, and neither is aware of the grafts file change in B, there is _plenty_ of room for getting confused if the two persons start discussing the repo history. Note, however, that similar confusement can be achieved today if one of the persons forgets having changed his out-of-tree grafts file. The grafts file concept is very powerful, but can also be extremely confusing. Adding in-tree versioning of the grafts file will make it more powerful (since we can now easily share and update "errata" to the repo history), but it might also make things _orders_of_magnitude_ more confusing (as demonstrated in the above example, although to be fair, similar confusement can be had in today's out-of-tree solution). At some point things may become so confusing that we'd rather drop the feature ...
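The A --> B --> C scenario above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the function `visible_history`, the commit labels `root` and `X`, and the two grafts dictionaries are all hypothetical, and the walk is not how git itself traverses history.

```python
def visible_history(head, parents, grafts):
    """List the commits reachable from head when grafts override parents."""
    seen, order, stack = set(), [], [head]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        # A graft replaces the recorded parent list; it does not add to it.
        effective = grafts.get(commit, parents.get(commit, []))
        stack.extend(reversed(effective))
    return order


if __name__ == "__main__":
    # Recorded ancestry: root <-- A <-- B <-- C, plus an unrelated commit X
    # standing in for an imported side branch (hypothetical labels).
    parents = {"C": ["B"], "B": ["A"], "A": ["root"], "root": [], "X": []}

    grafts_before_b = {}                   # grafts file as of commit A
    grafts_after_b = {"A": ["root", "X"]}  # grafts file as changed in commit B

    print(visible_history("A", parents, grafts_before_b))
    print(visible_history("A", parents, grafts_after_b))
```

In this model the two calls report different ancestry for the same commit A, which is the kind of disagreement described above when two people apply different versions of the grafts file.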
Hi, All this does not change the fact that installing a graft and 'git gc --prune'ing gets rid of the old history. D'oh. Automatically installing grafts is wrong. Ciao, Dscho -
So will rebasing and --prune'ing, or pulling a rebased branch and --prune'ing. Git already gives you _plenty_ of different ropes to hang yourself with. The I tend to agree with you here, because the possibility for massive confusion is huge, but that doesn't deny the fact that, if used properly (and that's a _big_ 'if'), this is a very powerful feature. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net
Hi, But that is not the question here. The question here is: are users allowed to hang _others_? I say: no. Ciao, Dscho -
Well, to a certain degree (and depending on your level of paranoia), you're always responsible for the code entering your own repo, and you could always set up a hook disallowing ".gitgrafts" (or whatever it would be called) from entering your repo. But taking this (and everything else that's been said) into account, I totally agree with you that adding this feature would open up a _massive_ can of worms. EOD ...Johan -- Johan Herland, <johan@herland.net> www.herland.net
Hi, Yeah, right. And you could also stay in an oxygen tent the whole time to avoid being infected with some virus. Seriously, your proposal does not make any sense. If you have to set up a hook to get the _sane_ behaviour, something is really wrong. So I do not really understand why you brought up this idea here and now. I understand that you wanted to end this discussion, but I could _not_ let your statement stand uncorrected. Ciao, Dscho -
Well, I think this does kind of have some commonality with another issue that has come up before: git clone only clones the really core repository data. That's generally a big feature, and I think it's absolutely the correct thing to do. But I can also see that sometimes, you might want to clone more than the actual repository, and get things like SVN metadata, branch reflogs, various hooks and all the config options too. Of course, in practice, at least right now, the right thing to do for that is to just do a recursive filesystem copy and then a "git status", but I think the background here is that some people simply do end up wanting to transfer more infrastructure than just the actual repository data. One thing to note: one reason for *not* allowing that is that incremental upgrades of non-repo data are obviously not possible. You might be able to *clone* a repo with config info and other metadata (if nothing else, then by just doing that raw filesystem copy), but you will never ever be able to _fetch_ the updates, because they aren't part of the core repository, and aren't versioned. So I think I can understand why some people would want to do things like this, but I do think it's broken. Yes, you can make the grafts file (or the config file) be part of the repo, and even just add a symlink to your .git/ directory, but it's simply not a very good model. So I think it always does end up breaking (other people might rebase, and break your grafts, or just not want them in the first place, or they don't care about the same things, and mess up "your" configuration etc etc). So the git repo layout is designed to have the minimally required shared state, and not anything else. Linus -
I agree that sharing the "metainfo" (i.e. config, grafts, hooks, reflogs, rerere magic, etc.) of the repo is not something git should do in the general case. But in some specific workflows (e.g. in-house, centralized workflows), I think it makes sense to coordinate/share some of this info between repos. But in that case, I guess such coordination/sharing can be done by special-purpose tools built on top of git (e.g. in-house admin scripts). ...Johan -- Johan Herland, <johan@herland.net> www.herland.net
Hi, No. Use filter-branch, and publish the cleaned up history (possibly as a new branch/repo). Ciao, Dscho -
I'm considering doing this, and just replacing the published repository with the "fixed" one (and fixing up all my clones of it). I'm having some problems digesting the git-filter-branch manual page though. Is there an easy way of automating the process, given that I now have a "grafts" file that expresses what I would like git-filter-branch to do (I guess it would have to work backwards changing the merge points, to be able to find all the revisions under the names I've used in the grafts file)? -- \\// Peter - http://www.softwolves.pp.se/ -
