To be precise: svn-fe creates commits where git diff-tree treeA treeB I have 32 SVN revs in my history that touch multiple Git commit objects. The simplest example is svn mv svn://svnrepo/branches/badBranchName svn://svnrepo/branches/goodBranchName I'm glad it's stimulating conversation. I'm beginning to wonder if there might be competing design goals for one-way vs. two-way compatibility... Performance is one place where opinions probably greatly differ (I didn't mind taking an extra 30 minutes to mirror my SVN repo because it probably saved more than that in communication overhead later in the process, but that mirror operation is very taxing on your timeline); my exhaustive search of all SVN copies is another (I wanted to be *extremely* certain I knew about all the misplaced branches/tags, but it's inefficient for a casual developer who just wants to interact with an SVN server). It's all just food for thought, and I'm happy to carry on the conversation from my different point-of-view :) Thanks, Stephen --
Hi Stephen, Yep, they're certainly two different ways to approach the problem: I'd be interested in investigating why it will produce different results. Since we both agree that it's easier (and faster) to do it in Git-land, I'm looking into the areas where it falls short. Right, that IS expected behavior. Don't they correspond to separate SVN revisions anyway? Why would you want to squash them? Ouch! Thanks for the illustrative example; I understand now. We have to bend over backwards to perform a one-to-one mapping. It's finally struck me: one-to-one mapping is nearly impossible to achieve, and I don't know if it makes sense to strive for it anymore. Looks like Jonathan Um, there's just one commit that deviates from the branch it's based on (but you don't know that, and I should have been clearer): look at contrib/svn-fe/svn-filter-root.py It's just a minimalistic mapper, but it's fast and done nicely. You When I made this comment, I was thinking of the one-to-one mapping. It Ok, I still don't get this part: why mirror at all? Can't all the information be mined out of the in-memory tree that svn-fe builds while parsing the dumpfile? From the SVN-side, all that's required is a streaming dumpfile like the one that `svnrdump dump` produces. -- Ram --
It's been a while since I was involved in this discussion, so maybe the design has changed by now, but I was under the impression that there would be one "one-to-one" mapping branch (which would never be checked out), containing the history of /, and that the "real" git branches, tags, etc, would be based on the trees originally referenced by the root checkout, with git-notes (or similar) being used to track the weirdness in mappings. How does the "multiple branches touched in a single commit" complicate anything other than the heuristics for automatic branch detection (which I assume nobody is at the stage of talking about yet). I suppose we wouldn't be talking, technically, about a one-to-one mapping in that case, as we would be turning "one" svn revision into "many" git branches, but in the conceptual sense of "one svn repository equals one git repository", I don't see this as being impossible, or so difficult that it shouldn't be striven-for. Something else which is at least semi-common in svn is to treat a folder both as a "directory" and a "branch", which the "checking out /" example would just be an extreme example of. Think in terms of git branches being a "view" of the history, with some mapper sitting between each view and "root" checkout. --
I think there might be a problem in that in git a commit is defined by its parents and its final state, while a revision in Subversion is, IIRC, defined by a change. Isn't it? -- Jakub Narebski Poland ShadeHawk on #git --
A "change" is a delta between one state and another, so each revision is dependent on those which came before it just as much as a git commit is. An svn "revision" is a snapshot, regardless of how it is stored, i.e., the "svn stores changes, git stores snapshots" is an implementation detail. It's a detail which makes a lot of things easier/faster in git than they would be in svn, but a mere detail nonetheless. The difference of course is that the "name" of an svn revision stays the same even if aspects of that revision (for example, the commit message) are changed, while the "name" of a git commit is dependent on everything that makes up a commit. In git terms, changing a commit message is considered to be history rewriting, whereas in svn terms it is merely something which happens occasionally as part of a regularly maintained repository. The git philosophy is ingrained in its object model: If you change something which led to a state, you change the state itself. I don't think there should be an attempt to work around that philosophy when talking to external repositories. That is to say: if a commit message (or other revprop) in history changes, we want to treat it as if we were recovering from an upstream rebase. Of course, a problem in that could very well be "how would we know about it?", which is a good question, but one not directly related to [revision+directory]<->[commit] mappings, afaik ;) --
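A minimal sketch of the point about commit "names", assuming nothing beyond Git's documented object naming scheme (SHA-1 over a "<type> <size>\0" header followed by the object body). The identity string, timestamps and messages are made up for illustration. It shows that fixing one letter in a commit message changes that commit's name, and therefore the name of every commit built on top of it:

```python
import hashlib

def git_object_id(kind, body):
    """Name an object the way Git does: SHA-1 of '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, who, message):
    """Serialize a minimal commit object (no encoding or signature headers)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()

EMPTY_TREE = git_object_id("tree", b"")
WHO = "A U Thor <author@example.com> 1280000000 +0000"  # hypothetical identity

# Same tree, same author, same date; only the message differs.
old = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, "Fix tpyo"))
new = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, "Fix typo"))

# A child commit names its parent, so the change propagates downstream.
child_of_old = git_object_id("commit", commit_body(EMPTY_TREE, [old], WHO, "Next"))
child_of_new = git_object_id("commit", commit_body(EMPTY_TREE, [new], WHO, "Next"))

print(old != new, child_of_old != child_of_new)
```

This is why an edited svn:log on the Subversion side looks, from the Git side, like an upstream rebase of everything after that revision.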
Thanks for the correction, and for the explanation. The problem with a one-to-one [SVN revision]<->[Git commit] mapping in the Subversion mishandling situation described by Stephen Bash persists, though; the cause is not that "svn stores changes, git stores snapshots", but the widely different models of branches. Subversion uses the inter-file branching model (Wikipedia says it was "borrowed" from Perforce) to handle branches and tags. It uses the "branches are copies (folders)" paradigm, and technically it doesn't have a separate namespace for branches but has projects, branches, and the projects' filesystem hierarchy mixed together; which part of a path is the branch name is defined by convention only. This model makes it easy to mess up a repository (because there are no technological barriers to going against conventions, as with the all-branches change mentioned earlier, or changing tags, or a reversed hierarchy of branches and projects). Because (from what I understand) revisions in Subversion are whole-project, all-branches snapshots, and because revision identifiers are monotonically increasing numbers, there is no inherent notion of the _parent_ of a commit, like there is in Git. (I think that was the reason why merge tracking was absent from Subversion until version 1.5, and why mergeinfo is a per-file rather than per-commit/per-revision property). In Git, commits store a snapshot of the top level of a project (in contrast to revisions in Subversion, which are snapshots of the top level of the repository tree, with all branches and tags in it). Each commit in Git also stores its parent or parents. Those commit-to-parent links make up the DAG (Directed Acyclic Graph) of revisions. Branches in Git reside in a separate namespace, and are live pointers (like e.g. the top pointer in stack implementations) to commits; the commit that a branch points to (the tip of the branch) marks out a subset of the DAG of revisions: all ancestors of the given commit. This forms a line of development, i.e. a branch. What is important here is that commit is ...
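To make the DAG picture concrete, here is a small sketch with a hypothetical five-commit history. Commit names, branch names and the `reachable` helper are all invented for illustration. A branch is only a pointer to a tip, and the line of development it stands for is whatever can be reached from that tip by following parent links:

```python
# Hypothetical history: commit id -> list of parent ids.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],  # merge commit with two parents
}

# Branches live in their own namespace and merely point at a tip commit.
BRANCHES = {"master": "E", "topic": "D"}

def reachable(tip):
    """Return every commit reachable from tip via commit-to-parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(PARENTS[commit])
    return seen

for name, tip in sorted(BRANCHES.items()):
    print(name, sorted(reachable(tip)))
```

Nothing comparable exists for a bare Subversion revision number: r42 follows r41 in time, but that says nothing about which path in the tree it continues.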
I agree. The repository that I'm interested in converting has branches all over the place: /sandbox/, /sandbox/<username>/*, /stable/MAIN/*, /stable/Features/*, /features/*, /branches/*, etc... Because Subversion didn't enforce the convention, it was all too easy to ignore when our questionable branching strategy was created. Instead of expecting sub-folders of a particular path to be a branch, is there something that we can key off of in the dumpfile? Are copy operations --
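On the question of what can be keyed off in the dumpfile: a copied node carries Node-copyfrom-rev and Node-copyfrom-path headers in the Subversion dump format, so copies can be listed without knowing the branch layout in advance. The following is only a rough sketch. The `find_copies` helper and the sample revision numbers are hypothetical, and it scans header lines naively; a real parser would honour Content-length so that file contents resembling headers are not misread.

```python
import io

def find_copies(dump_lines):
    """Yield (revision, dest_path, src_rev, src_path) for each copied node."""
    rev = None
    node = {}
    for raw in dump_lines:
        line = raw.rstrip("\n")
        if line.startswith("Revision-number: "):
            rev = int(line.split(": ", 1)[1])
        elif line.startswith("Node-path: "):
            node = {"path": line.split(": ", 1)[1]}  # a new node record starts
        elif line.startswith("Node-copyfrom-rev: "):
            node["src_rev"] = int(line.split(": ", 1)[1])
        elif line.startswith("Node-copyfrom-path: "):
            node["src_path"] = line.split(": ", 1)[1]
        if {"path", "src_rev", "src_path"} <= node.keys():
            yield (rev, node["path"], node["src_rev"], node["src_path"])
            node = {}

# Hand-written fragment modelled on the "svn mv" of a misnamed branch:
# the move is recorded as a copy of the old path plus a delete.
SAMPLE = """\
Revision-number: 7
Prop-content-length: 10
Content-length: 10

PROPS-END

Node-path: branches/goodBranchName
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 6
Node-copyfrom-path: branches/badBranchName

Node-path: branches/badBranchName
Node-action: delete

"""

for copy in find_copies(io.StringIO(SAMPLE)):
    print(copy)
```

Whether a given copy is a branch, a tag, or just a directory that happened to be copied still has to be decided by some policy on top of this list.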
Actually it shouldn't be that hard to implement, if it isn't already implemented in svn-fe. We don't need to have copy operations notated in some fashion; it should be enough to tell svn-fe where the top directory of the project is in the repository tree hierarchy (e.g. that it is at /stable/MAIN/* at revision 1). git-fe can/could then use the 'tree' movement detection that the 'subtree' merge strategy uses. -- Jakub Narebski Poland --
To clarify, I was saying that there is a "parent" of each SVN commit, in the top-level sense. This can be easily converted into a "whole repository" ("svnroot") tree in git. Of course, this isn't useful for actual work, but it's a good middle-layer, from which other more-useful things can be derived. In terms of converting the svnroot git history into actual branches, there are several options for mapping things. Ignoring merges for a moment, we could (for example) notice when two trees (as in tree objects) are very similar at some point in history, and decide that those are probably branches. It's tedious, but still fairly simple, to walk the history and build a new history consisting only of edits to a subtree (even if the commit messages don't always make sense out of context). It really doesn't matter one lick whether a single svn commit touched multiple generated git commits. Of course, "ignoring merges" is temporary and a total cop-out, but I wouldn't for a moment pretend that converting svn branches into git Also correct. One SVN commit would logically map to several git commits. It's best to think in terms of: ([svn commit] + [svn path]) -> [git commit] (or git tag, if we can get I'm not entirely familiar with the git replace mechanism, but wouldn't that mean that repository git-A (cloned from SVN before the property change) and repository git-B (cloned from SVN after the property change) would be unable to merge with each-other? In my mind, if it would be a rebase when it happens in git-land, it should be a rebase when it happens in Any sufficiently large SVN-tracked project will use all of SVN's features, whether the maintainer remembers or not ;) Certainly it could be a "few and far between" thing, which doesn't need to be handled to get going / usable (especially since creating a fresh clone is so much faster than with git-svn). I don't know the internals of SVN beyond what was mentioned in the manual 5 or so years ago, but I assume you'd need to ...
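The ([svn commit] + [svn path]) -> [git commit] idea can be sketched as a plain dictionary keyed by (revision, branch root). Everything here is hypothetical: the branch roots, the revision data and the helper names are invented, and a real mapper would work on tree objects rather than path lists. The point is only that one revision touching three branches naturally becomes three keys:

```python
from collections import defaultdict

# Hypothetical layout; in practice this would come from a mapper or heuristics.
BRANCH_ROOTS = ["trunk", "branches/foo", "branches/bar"]

# Hypothetical svnroot history: (revision, paths changed in that revision).
REVISIONS = [
    (1, ["trunk/README"]),
    (2, ["branches/foo/README"]),
    (3, ["trunk/Makefile", "branches/foo/Makefile", "branches/bar/Makefile"]),
]

def branch_of(path):
    """Return the branch root containing path, or None if outside all roots."""
    for root in BRANCH_ROOTS:
        if path == root or path.startswith(root + "/"):
            return root
    return None

def split_revisions(revisions):
    """Map (svn revision, branch root) -> paths relative to that root."""
    commits = defaultdict(list)
    for rev, paths in revisions:
        for path in paths:
            root = branch_of(path)
            if root is not None:
                commits[(rev, root)].append(path[len(root):].lstrip("/"))
    return dict(commits)

for key, paths in sorted(split_revisions(REVISIONS).items()):
    print(key, paths)
```

Revision 3 shows up once per branch it touched, which is the "one SVN commit maps to several git commits" case.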
"Whole repository hierarchy (svnroot) snapshots" are useless without extra work; Git needs "whole project" snapshots for its commits. But the whole long description of the "branching" model in Subversion was meant as an intro to explaining why there can be mishandled commits in Subversion, which make it impossible to have 1-to-1 SVN revision to Actually, as Stephen Bash wrote in his response, creating branches in Subversion generates 'copy' operations in svndump... we have to filter We would have to ensure that commits in Git in branch 'foo' are the same as the history of the 'project/branches/foo' subtree in svnroot in Subversion. Otherwise we would either have different history in Git and in Subversion, I don't think the most common "sane" Subversion merge case would be difficult to translate into a merge commit in Git: the svn:mergeinfo property would have common revisions for all affected files/directories. The problem is that, just as it is possible to mishandle a commit as described by Stephen Bash by creating an all-branches revision, it is also possible to mishandle a merge in Subversion, creating a revision where different files are merged from different branches: such a thing does not translate easily to Git's commit-level rather than file-level merge tracking. If I remember correctly, some of the discussion was about whether there can truly be an irrecoverable situation where a single SVN revision *must* be mapped into Note that the problem with possibly changing svn:log, svn:author and svn:date revision properties exists only when there is ongoing interaction between a Subversion repository (or mirror) and a Git repository (or mirror). There is no problem with this issue when doing a one-shot conversion. The major problem is that svn:log etc. are _unversioned_ properties (see http://svnbook.red-bean.com/en/1.5/svn.ref.properties.html), so I am not sure if there is a way for the Subversion server to tell that some svn:log properties changed. Perhaps there is a log, even if properties ...
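A rough sketch of the "sane merge" test described above, under a strong simplifying assumption: a merge is treated as commit-level if every affected path records the same set of merged revisions in its svn:mergeinfo value. The helper names and sample values are hypothetical. Source paths are ignored on purpose, since a subdirectory's mergeinfo normally names the matching subdirectory of the source branch.

```python
def merged_revisions(mergeinfo):
    """Collect all revisions named in an svn:mergeinfo value.

    Each line looks like '/branches/foo:3-10,12'; a trailing '*' marks a
    non-inheritable range and is ignored here.
    """
    revs = set()
    for line in mergeinfo.splitlines():
        if not line.strip():
            continue
        _source, _, ranges = line.rpartition(":")
        for item in ranges.split(","):
            low, _, high = item.strip().rstrip("*").partition("-")
            revs.update(range(int(low), int(high or low) + 1))
    return revs

def is_commit_level_merge(mergeinfo_by_path):
    """True if all affected paths agree on which revisions were merged."""
    sets = [merged_revisions(value) for value in mergeinfo_by_path.values()]
    return all(s == sets[0] for s in sets)

# Hypothetical examples: a whole-branch merge, then a per-file mix.
sane = {
    "trunk": "/branches/foo:3-10",
    "trunk/lib": "/branches/foo/lib:3-10",
}
messy = {
    "trunk/a.c": "/branches/foo/a.c:3-10",
    "trunk/b.c": "/branches/bar/b.c:4-6,9",
}
print(is_commit_level_merge(sane), is_commit_level_merge(messy))
```

A revision that fails this check is the case with no easy Git equivalent: it would need either several merge commits or a note recording the per-file sources.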
There has been brief discussion of that possibility on the Subversion list [1]: "What we might need is an RA call that has the server provide the N last revisions to have undergone revprop edits..." I'm guessing that there is not such a log now but the developers might be open to a patch adding such a log (for the sake of svnsync and Yes, exactly. In some cases, this "git replace" step would have to be accomplished by a separate command (or even "by hand") to get the job done:

    alice> git clone svn://svn.example.com/
    upstream> svnadmin propedit ...
    bob> git clone svn://svn.example.com/

In this situation, alice and bob have diverging histories, just as if upstream had rewritten history (because, well, upstream has). Now if alice fetches from bob and notices that, then she must do

    alice> git replace AA BB

(or its user-friendly equivalent, or a batch equivalent to search for and handle cases like this). Exactly. Well, one can mitigate the performance problems by running "git filter-branch" every once in a while. :) Regards, Jonathan [1] http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122840/focus=122944 --
Hi Will, Yep, and I'm to blame for that- sorry I didn't CC you earlier. I got confused between "Tomas Carnecky" and "Will Palmer". To avoid this confusion in future, I'd request everyone to display the names they use on the list in the IRC whois information (unless it's a privacy Yeah, that was my plan too originally, but I clearly haven't thought about it enough. I'm currently noting down the various scenarios that the others are quoting -- there are quite a few I hadn't thought about earlier. [...] -- Ram --
