Git supports renames/moves in a different way. Instead of recording renames (which has its own troubles, for example with renames made by applying a patch), it detects them from content. [...] There is trouble with file-ids. The most common example is a file that was created independently in two branches (two repositories), after which the branches got merged. Most (all?) file-id based rename detection has trouble with repeated merging of those branches, even if there are no true conflicts. Read Linus's post about file-id based rename detection: Message-ID: <Pine.LNX.4.64.0610201049250.3962@g5.osdl.org> http://permalink.gmane.org/gmane.comp.version-control.bazaar-ng.general/18458 Not that content-based rename detection doesn't have its own pitfalls: Message-ID: <7virha4cnm.fsf@assigned-by-dhcp.cox.net> http://permalink.gmane.org/gmane.comp.version-control.git/31899 -- Jakub Narebski Warsaw, Poland ShadeHawk on #git -
Do you mean the case where the 2 files should be merged into 1 file? If they should be 2 files with different names, there is no problem using file identifiers, but if they should be merged into one file, then I can see that this would cause problems. You would have to delete one of the files and copy its changes into the other, which would create conflicts when that file is modified in the other branch. This is a problem if you *only* have file identifiers. But if you tracked both file identifiers *and* content identifiers (as I was trying to say in my first post), this wouldn't be a problem, would it? When content is changed you use the content identifiers, but when files are changed by renaming or deleting you use file identifiers. To me at least, it doesn't seem like a choice of one or the other, or that one is stupid and the other isn't, but that you need them both. bzr uses file ids and git uses content ids. It would be nice if there were an RCS that used both - then you get the best of both worlds, don't you? So I don't think you want to use file identifiers to track changes to content (as bzr would do in this case), nor do you want to use content identifiers to track changes to files (as git does, to my understanding, when a file is renamed). Nick -
This can't be fail-safe, though. I would prefer to also have the option to *explicitly* tell the RCS that a file was renamed, rather than have it try to detect the rename from the content, which is bound to have corner cases that fail. When I know I renamed a file, why can't I explicitly tell the RCS so that it records the change against the *file identifier*? If I change the content, then the change is recorded not against the file identifier but against the line/content identifier. Nick -
You want to tell git about a rename that will never fail to be detected? No problem.
  $ git mv oldname newname
  $ git commit
The corner cases you speak about are when you rename and edit. I prefer that case to be detected, because at least the detection algorithm can be tuned - there is no fixing it if the VCS was forced to consider it a rename. When I started using git I was worried about the lack of rename recording, but now I realise that it's not needed - it's pointless. The VCS is snapshotting moments in time, that's it. Then, by making cleverer and cleverer interpreters of those snapshots, you have the potential to do stuff that is far more useful than "just" rename recording. Andy -- Dr Andrew Parkins, M Eng (Hons), AMIEE andyparkins@gmail.com -
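To make the "interpreter of snapshots" idea concrete, here is a minimal Python sketch of content-based rename detection. It pairs paths that vanished between two snapshots with paths that appeared, ranked by content similarity. This is not git's actual algorithm: difflib.SequenceMatcher stands in for git's own similarity scoring, and the 0.5 threshold is an assumption chosen to mirror git's default 50% rename threshold. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity score between two file contents."""
    return SequenceMatcher(None, a, b).ratio()


def detect_renames(old: dict[str, str], new: dict[str, str], threshold: float = 0.5):
    """Pair paths deleted from `old` with paths added in `new` by content.

    `old` and `new` map path -> file content for two snapshots.
    Returns a list of (old_path, new_path, score) tuples.
    """
    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    candidates = []
    for d in deleted:
        for a in added:
            score = similarity(old[d], new[a])
            if score >= threshold:
                candidates.append((score, d, a))
    # Accept the best-scoring pairs first; each path takes part in one rename at most.
    candidates.sort(reverse=True)
    used_old, used_new, renames = set(), set(), []
    for score, d, a in candidates:
        if d not in used_old and a not in used_new:
            used_old.add(d)
            used_new.add(a)
            renames.append((d, a, score))
    return renames


before = {"hello.txt": "hello world\n", "README": "docs\n"}
after = {"hello1.txt": "hello world\n", "README": "docs\n"}
print(detect_renames(before, after))  # [('hello.txt', 'hello1.txt', 1.0)]
```

The `threshold` parameter is the kind of tunable knob Andy means: lowering it finds renames with heavier edits at the cost of more false matches.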
Having not used git, I can't really say whether git is better than bzr in this regard. In the kind of development I do, the case where a file with the same name has been added independently in 2 different branches is a pretty rare one. Usually, when it has happened, the files should have been 2 separate files with different names anyway - so bzr would have no problem with this. However, renaming a file is pretty common, and I would rather be explicit about it and have file name changes easily visible/searchable in my log. Just out of curiosity: how does git handle the case where one file is renamed differently in 2 branches and then the branches are repeatedly merged? I know that bzr handles this very well, and in various tests I did there were absolutely no repeated conflicts. Would git behave as well in this scenario? Nick -
Ok - I got curious and decided to install git and try this myself. In this test I had a file hello.txt that got renamed to hello1.txt in one branch and hello2.txt in another. Then I merged the changes between the 2 branches. Here is how it looked after the merge in bzr:
  $ bzr status
  renamed:
    hello2.txt => hello1.txt
  conflicts:
    Path conflict: hello2.txt / hello1.txt
  pending merges:
    Nicholas Allen 2006-11-28 Renamed hello to hello1
and here's how it looked in git:
  $ git status
  #
  # Changed but not updated:
  #   (use git-update-index to mark for commit)
  #
  #       unmerged: hello.txt
  #       unmerged: hello1.txt
  #       unmerged: hello2.txt
  #       modified: hello2.txt
  #
  nothing to commit
So git is not telling me that I have a conflict due to the same file being renamed differently in 2 branches - well, at least not in a way I can comprehend! Whereas bzr made this very clear. Also, in git I ended up with 2 files:
  $ ls
  hello1.txt  hello2.txt
whereas in bzr there was only one file, and I just had to decide which name it should be given to resolve the conflict. I'm not sure how I should resolve the conflict in git, but that's probably just because I am not familiar with it yet, and the message it gave was not comprehensible or helpful to me in the slightest. In bzr it was very easy, and repeatedly merging caused no trouble at all - the name conflict had to be resolved only once. While it was good that git detected my file rename (although this was not hard, as the contents did not change at all), the process in bzr was *much* smoother and more user friendly than it was in git. When you have conflicts, I think it's especially important that the RCS inform you of what is really happening so you do not make mistakes. Bzr was much more informative than git and told me exactly why there was a conflict and made it easy to resolve it. This situation is a pretty common one and it seems to me that git's content based approach is not as useful in this ...
Ehh. It told you exactly what happened when you actually did the merge, didn't it? Yeah, "git status" won't tell you _why_ it results in unmerged paths, but the merge will have told you. You must have seen that, but decided to just ignore it and not post it, because it didn't support the conclusion you wanted to get, did it? There are lots of reasons why "git status" may tell you that something isn't merged, the most common one by far being an actual data conflict, not a name conflict. The reason why something conflicts is always told at merge time. Linus -
Except when you are doing a large merge, your terminal scrollback is really short, and there are a lot of conflicts. Then you can't see what merge said about any given file. :-( Fortunately it's easy to back out of the merge and redo it with a large enough scrollback, or to redirect it to a file for later review, but it's annoying that we don't save that information for later review. -- Shawn. -
Heh. Which is partly why I just do "git diff", which usually tells me what is up, or "git log --stat --merge", which is usually even better. I've never actually had to scroll up. [ But I'll also admit that I used to have a "xterm*savedlines=5000" in my .Xdefaults, and it might be worth it for some people. I haven't actually needed it with git, because the _real_ reason for it used to be applying patch-sets, and I've made sure that the git patch-application is so robust that I never need to go back and look for reasons for conflicts - if something conflicts, it just _stops_ and undoes the whole patch instead of continuing to apply the rest or leaving the already-applied part applied. ] Although I agree that we could probably also improve "git status" output, especially as I doubt it has been tested much. People don't tend to use "git status" very much, I suspect - the most common usage is not in "git status" itself, but simply as the commit message template, and that one obviously cannot have any unmerged stuff at all (since then we'd refuse to even go as far as asking for a commit message in the first place). Figuring out that the reason for a conflict is a name clash is not necessarily possible after the merge, though: it's really up to the merge policy to decide whether to merge a file cleanly or not, and the "why" part of why some particular merge policy decided not to resolve a file is really internal to the policy, and not externally visible in the tree itself. (But we can certainly see whether it was a pure content conflict or whether it had some component of a name clash by just looking at what stages we have for a name: so we could at least separate out the causes.) [...] I personally find "git log --merge" to be a huge timesaver. But I have to say, I don't think I've seen more than one or two name conflicts ever, and almost all of the true issues tend to be just regular data conflicts. So that's what I personally care about ...
I didn't do this deliberately - it's just that merge spewed out a whole load of stuff at me that I didn't understand, and therefore I overlooked the conflict message in it. I wasn't expecting to see it there anyway, and was hoping for a short and informative summary that I would understand when I did a status. Also, what happens if I lose the messages because they scrolled off screen, or the power goes down, or I need to reboot for some reason, or I don't have time and want to shut down my computer, restart another day and resolve the conflicts then? All useful conflict status is lost, isn't it? That's why I expected git status to tell me this in some understandable manner; I was not expecting it to be only in the merge output. Nick -
I'd suggest just re-doing the merge. Something like
  git reset --hard
  git merge -m "dummy message" MERGE_HEAD
will do it for you (that's the new "nicer syntax" for doing a merge, in [...]) No, it's actually there, but "git status" doesn't really explain it to you. The go-to command tends to be "git diff", which after a merge will not show anything that already merged correctly (because it will have been updated in the git index _and_ updated in the working tree, so there will be no diff from stuff that auto-merged). So any output at all from "git diff" after a failed merge generally tells you exactly what failed. But since 99%+ of all merge conflicts are data conflicts, I suspect the output is mostly geared towards that. The other useful tools are "git log --merge" (explained in a separate mail) and, for people like me who like the git index and grok it fully, doing a
  git ls-files --unmerged --stage
is probably what I'd do (but I have to admit, that is _not_ a very user-friendly interface - you not only need to have understood the index file, you actually need to understand it on a very deep level). "git status" used to be just a stupid wrapper around "git ls-files" (it's now largely a built-in), but it was really _so_ stupid that it doesn't really try to explain what it does - it's more like a simplified version of ls-files with some of the information pruned away, and other parts in a slightly more palatable format ;) So improving "git status" might mean that some people could avoid having to learn about the index file details ;) Linus -
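Following the remark that the stages recorded for a name show whether a conflict involved a name clash, a rough Python sketch could read `git ls-files --unmerged --stage` output (each line is "<mode> <object> <stage>", a tab, then the path) and guess the kind of conflict. The classification rules are a heuristic assumption, not something git itself reports, the function names are hypothetical, and the sample output below is invented for illustration.

```python
from collections import defaultdict


def parse_unmerged(output: str) -> dict[str, set[int]]:
    """Map each unmerged path to the set of index stages present for it.

    Each line of `git ls-files --unmerged --stage` looks like
    "<mode> <object> <stage>\t<path>".
    """
    stages = defaultdict(set)
    for line in output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        _mode, _object, stage = meta.split()
        stages[path].add(int(stage))
    return dict(stages)


def classify(stage_set: set[int]) -> str:
    """Heuristic guess at the kind of conflict (1 = base, 2 = ours, 3 = theirs)."""
    if stage_set == {1, 2, 3}:
        return "content conflict"  # base, ours and theirs all recorded
    if stage_set == {2, 3}:
        return "add/add"  # both sides added the path; there is no base version
    if len(stage_set) == 2:
        return "modify/delete or rename"  # {1, 2} or {1, 3}: one side lacks the path
    return "probable name conflict"  # only a single stage is present


# Invented output, shaped like a rename/rename conflict on hello.txt.
sample = (
    "100644 1111111 1\thello.txt\n"
    "100644 1111111 2\thello1.txt\n"
    "100644 1111111 3\thello2.txt\n"
)
for path, stage_set in sorted(parse_unmerged(sample).items()):
    print(path, sorted(stage_set), classify(stage_set))
```

As noted in the thread, a criss-cross rename can still look like a plain data conflict from the index alone, so a classifier like this can only ever be a hint.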
Side note, to clarify: in the _simple_ cases it's all actually there. I can well imagine that in more complex cases, involving multiple different files, you may well want to re-do the merge and let the merge tell you why it refused to merge something. So the index, for example, contains just a "final end result" of what the merge gave up on, and while for a simple rename conflict like your example you could certainly see that directly from the index state (and thus we could, for example, have a "git status" that talks about it being a filename conflict), if you have a criss-cross rename, the index itself doesn't really tell you _why_, and it could look superficially like a data conflict. In such a case, you'd really have to either go back to the merge itself to see what happened, or use the "git log" thing and just work it out from there (ie you can ask "git log" to tell you about any renames as they happened etc). I don't think I've actually hit a complex enough merge to need this yet, but the graphical tools should help too, ie "gitk --merge" should give you everything that "git log --merge" gives you (ie just the commits that aren't common, simplified to just the ones that matter for the unmerged filenames in the end result). I can well imagine that being useful too. So the tools are certainly there. "git status" just isn't necessarily the best one (or the best that it could be, for that matter). Linus -
I guess I hit a limitation in the output of status as opposed to a limitation in what git can do ;-) Nick -
Hi, I think it is something different altogether: you learnt how to use CVS, and you learnt how to use bzr, and you are now biased towards using the same names for the same operations in git. I actually use git-status quite often, just before committing, to know what I changed. But I will probably retrain my mind to use "git diff" or even "git diff --stat", because it is more informative. As for your scenario: there really should be a "what to do when my merge screwed up?" document. Ciao, Dscho -
It would be nice to have a git-resolved (or git-resolve) wrapper around git-update-index, similar to git-add, git-mv and git-rm, which would mark a file as resolved without the need for git-update-index, git-add and git-rm, even in the case of CONFLICT(rename/rename). Although I'm not sure it could work in all cases in the simple form of "git resolved <file>", e.g. in the case of CONFLICT(add/add). By the way, I wonder if git can detect the case when the same (or nearly the same) file was added in two different branches under different filenames... -- Jakub Narebski Poland -
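A minimal sketch of such a wrapper in Python. `git_resolved` and `resolve_command` are hypothetical names, not real git commands. The sketch assumes that re-adding a path with `git update-index --add`, or dropping a deleted one with `git update-index --remove`, replaces its unmerged entries. That was the manual way of marking a conflict resolved, as the "use git-update-index" hint quoted earlier suggests. For a rename/rename conflict, you would delete the unwanted names from the working tree first, then call it on every involved path.

```python
import os
import subprocess


def resolve_command(path: str, exists: bool) -> list[str]:
    """Build the update-index call that should mark `path` as resolved."""
    if exists:
        # Stage the working-tree version as the merged result.
        return ["git", "update-index", "--add", "--", path]
    # The file was deleted while resolving: drop it from the index.
    return ["git", "update-index", "--remove", "--", path]


def git_resolved(*paths: str) -> None:
    """Hypothetical 'git resolved <file>...' built on git update-index."""
    for path in paths:
        subprocess.run(resolve_command(path, os.path.lexists(path)), check=True)
```

For example, after keeping hello1.txt and deleting hello2.txt in Nick's test, `git_resolved("hello.txt", "hello1.txt", "hello2.txt")` would add hello1.txt and remove the other two names.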
I have a few example scenarios and some notes on cleaning up after failed merges in my slides from the presentation I did at OLS last summer. Feel free to grab them from www.jdl.com! jdl -
Hi, This is actually the most meaningful argument for not hiding the index. Usually I explain it to people as a "staging area" standing between your working directory, and the next committed state. But I will start explaining the index with "what if your merge failed?". Ciao, Dscho -
The thing is, the staging area is needed for a lot more than just merges. Every single SCM has one, because even something as _trivial_ as "commit all files" actually needs it. People just don't always think about it, and the git staging area is "bigger" than most others. Most other SCMs have a staging area that is just a list of filenames (nobody thinks about it, but "commit everything" doesn't actually commit everything at all - it just commits everything /in the list of files that the SCM knows about/). Git's staging area is just more complete than most other SCMs'. It contains not just the list of filenames, but their permissions too (where a lot of other SCMs *cough*CVS*cough* don't do permissions at all), and also their content, and in the case of a merge conflict, the content of the base version and the two branches to be merged. So the index really _is_ required for pretty much all operations (very much including "git commit -a", if only because of the filename list), but yeah, if you start by talking about merge conflicts, maybe people understand WHY it's also important to actually stage the _contents_ of a file too (multiple times, in fact, for a merge conflict), not just its name. So most of the time, when you use git, you can ignore the index. It's really important, and it's used _all_ the time, but you can still mostly ignore it. But when handling a merge conflict, the index is really what sets git apart, and what really helps a LOT. I've used other systems, but the git handling of merge conflicts really is superior. Other SCMs think that the merge algorithm is interesting and important, and that's bullshit. Merge algorithms are largely trivial and uninteresting. The interesting and important thing is to just handle failures well, and git does that _really_ well. Linus -
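A toy Python model of the point about "commit everything" (not git's real index format): each entry carries a path, its permissions, its content and a stage number. A "commit -a"-style refresh only touches paths the index already knows about, and it refuses while conflict stages remain. The class and function names are hypothetical, and `content` stands in for git's blob ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    path: str
    mode: int        # permissions, e.g. 0o100644 or 0o100755
    content: bytes   # stands in for the id of the staged blob
    stage: int = 0   # 0 = merged; 1 = base, 2 = ours, 3 = theirs


def unmerged(index: list[IndexEntry]) -> list[str]:
    """Paths that still have base/ours/theirs entries from a failed merge."""
    return sorted({e.path for e in index if e.stage != 0})


def refresh_tracked(index: list[IndexEntry], worktree: dict[str, bytes]) -> list[IndexEntry]:
    """Re-stage every tracked path from the working tree, like 'commit everything'.

    Untracked files in `worktree` are ignored, because only paths already in the
    index are considered; tracked files missing from the working tree are dropped.
    """
    conflicts = unmerged(index)
    if conflicts:
        raise ValueError("unmerged paths: " + ", ".join(conflicts))
    return [
        IndexEntry(e.path, e.mode, worktree[e.path])
        for e in index
        if e.path in worktree
    ]


index = [IndexEntry("a.txt", 0o100644, b"old")]
worktree = {"a.txt": b"new", "untracked.txt": b"scratch"}
print(refresh_tracked(index, worktree))
```

The untracked file never reaches the new index, which is the sense in which "commit everything" only commits what the staging area already lists.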
Actually, people (at least I) dislike the index because in the most common operations (status, diff, commit), they have to know that the command doesn't actually display all their work, but just the 'indexed' part of it. For people used to cvs, svn and other systems, it would be nicer if diff -a and commit -a (and possibly other commands) were the default. The index is of course necessary during merging, ... and as a speed optimization for applying patches when you know the working copy is clean. Mark -
Hi, No. It does display all your work. However, as Linus pointed out, if there are automatically merged entries without conflicts, it will not display them. Which is sane! And yes, you can hide some modifications by putting the modified file into [...] And what exactly do you think is happening when "cvs add" and "svn add" did _not_ really add the file to the repository, but only a subsequent [...] I think that it is one major achievement of git to make clear and sane definitions of branches (which are really just pointers into the revision graph), and of the index (which is the staging area). Ciao, Dscho -
Something resembling the index is needed anyway: 1) for "commit all changed files", to prepare the list of files to commit, excluding ignored files; 2) to mark files as "to be added" or "to be removed" (well, the git index could be a little bit smarter here in marking "intent to add"); 3) as a place for doing the merging. Git just doesn't hide it. I agree that git's definition of branches, and git not hiding the index, are its advantages... and disadvantages to those who learned version control on other SCMs. -- Jakub Narebski Poland -
I don't see your point, really. Nothing forces you to change the index. None of the normal operations do that, for example, and you really have to _explicitly_ ask git to update the index for you. So you can really think of it as a better list of names than what CVS and others maintain for you. It's exactly the same as the CVS "Entries" file, except it's got capabilities that CVS will never have - tracking not just the filename, but the merge status, the permissions, and the actual contents of an entry. And by default, and in the absence of any failed merges, you will _never_ [...] Why? I mean really.. Why do people mind the index? If you've not done anything to explicitly update it, and you just write "git commit", it will tell you exactly which files are dirty, which files are untracked, and then say "nothing to commit". Maybe we shouldn't even say "use git-update-index to mark for commit", we should just say "use 'git commit -a' to mark for commit", but the point is, there really is no downside. So you forget to mention which files to commit; what's the downside, really? It tells you what is up, and you can just mention the files explicitly, or use "-a" to say "ok, commit everything that is dirty", and it doesn't really get any simpler than that. And the ADVANTAGES of the index are legion. You may not appreciate them initially, but the disadvantages people talk about really don't exist in real life, and once you actually start doing merges with conflicts, and fix things up one file at a time (and perhaps take a break and do something else before you come back to the rest of the conflicts), the index saves your sorry ass, and is a _huge_ advantage. Similarly, it _allows_ you to do things that just a list of files never allows you to. You don't _have_ to use it to mark individual files as being ready to be committed, but you _can_. It's nothing that you need to know or worry about if you're not aware of the index, but it's a capability that ...
To start with, that message confuses a lot of new users. "What do you mean there's nothing to commit? I just made changes. And I know you noticed them because you just mentioned the names of the files with the changes to me!". So at the very least, there's some missing guidance as to how to get from the "nothing to commit" stage to actually committing the files the user was trying to commit when they typed "git commit" in the first place. [...] Yes, I submitted a patch for this. I don't think Junio picked it up, because it got him thinking about all the other situations where "git status" doesn't give as much guidance as it should. [...] Even with that, the user has to go through the process of:
    git commit
    "hmm... why didn't that work"
    read message
    git commit -a
That's not a _huge_ problem, but it is a little road-bump that a lot of people meet on their first attempt at git. In the thread on the fedora mailing list that prompted my first "user-interface warts" and the patch I mentioned above, the process was worse:
    git commit
    "hmm... why didn't that work"
    read message
    git update-index
    git commit
    "crap... it still didn't work even when I did what it told me to do"
Here's the original version of that report: [...] In none of these recent threads have I been arguing disadvantages of the index. I'm really just trying to remove one small hurdle that does trip up new users (see above). I'm not trying to introduce any large conceptual change into how git works, nor even what experienced [...] Let's help people do exactly that by making the behavior of "git commit -a" the default for "git commit". -Carl
I meant "good" there, for anyone confused (I'm not sure how that slipped past my spell-checker). -Carl
Maybe we could do that _only_ if the index matches HEAD, and otherwise keep the current behavior? So people who don't care about the index won't get tripped up, and when [...] -- best regards Ray -
I thought of that tonight and almost suggested it myself. It would be an attempt to satisfy both "sides" of the debate without either side having to fight with a default they didn't like or configure it away. I did wonder if the powers that be would find it a bit too magic (the problem with magic things is that they can sometimes be quite confusing when they don't do exactly what you want). But this might just work. It wouldn't be too bad to document (we already have several commands that change slightly if the index doesn't match, often by just refusing to do anything in a dirty tree). And, significantly, this would allow for documenting the simple sequence of:
    # edit file
    git commit
in the tutorial, while also allowing what Junio wanted:
    git update-index file
    git commit
with the behavior of "I already said I wanted to do a staged commit when I explicitly updated the index, so don't make me say anything special again when I go to commit". Can we really get the best of both worlds here? -Carl
I have been (silently) following the git commit discussion and started out fully on the side of git commit -a being the default, but was slowly moving over towards the "git commit -i as the default" camp. This post seems like a Eureka moment - chew over the problem long enough and someone comes in from left field with an off-the-wall remark that suddenly clarifies everything. -- Alan Chandler http://www.chandlerfamily.org.uk -
I hate the if clause. Suppose I prefer the update-index way: I would have to check whether HEAD matches the index every time I do a commit, to make sure it won't do it the other way. Either -a or -i should be the default; no "if", please. By the way, I do use the update-index way, but vote for -a by default. I don't mind adding " -i" to every commit command. -- Duy -
No you won't. If you don't use update-index, then the index will match HEAD and you will commit the changes in the working tree. That is the way for newbies. As soon as you do the first update-index, the index will no longer match HEAD, so commit will do the same as it does now. And if you are not sure which you have done, then presumably you do what you do now, or git commit -a or git commit -i as you need. -- Alan Chandler http://www.chandlerfamily.org.uk -
Plus, one assumes, the git-generated comments in the commit message will tell you what kind of commit it has decided to do. I like this suggestion a lot. Thinking back over my git usage recently, which has included both styles of commits (though mostly -a ones), I think this would have done the right thing by default in every case. -Steve -
Hi, So many people have spoken for it that it's time I crash the wedding. From a usability viewpoint, it is a horrible convention. The user has to remember too many side effects to handle the commit operation. The function of the program would no longer depend only on the command line arguments and your config, but _also_ on something as volatile as the index. You would literally end up asking "did I change the index?" _every time_ before you commit. And remember, even a simple "git add" changes the index! (Why it does is brutally clear once you grasp the concept of the staging area.) Worse, doing a "git commit --amend" should _not_ automatically add "-a" _even_ if the index matches HEAD, since it is quite possible that you had a typo in the message you want to fix up. And quite possibly other options would not want that either. But here's an idea: tell the user that she has to tell git-commit which files she wants committed. Yes! That's it. Just tell it the friggin' files. And if you are a lazy bum, and want to commit _all_ modified files, git has a nice shortcut for ya: "-a". Ciao, Dscho -
It reminds me of the Microsoft Office Assistant :-) Let's make a "git assistant mode" that tries hard to guess users' desires and give them guidance. Once they get used to git, they can disable that mode and go back to "plain git". -- Duy -
Hi, See git-gui from Shawn. It should really help new users with a graphical user interface. Ciao, Dscho -
Sounds sane. Especially if we couple it with a hint for the user to use "commit -a" when he/she wants to do blanket commits. So in essence that would mean: if no pathspecs are given and the index matches the current HEAD, print out "Nothing to commit but changes in working tree. Assuming 'git commit -a'" and then act accordingly. Carl, do you think that would satisfy the desires of your RedHat peers? Always doing '-a' by default is terribly wrong for those of us who actually use partial commits a lot, and it would also rob git of a lot of its power. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
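The rule Andreas spells out can be written as a small decision function. This is a hypothetical Python sketch of the proposal as stated, not how git commit actually behaves; `choose_commit_mode` and its return values are invented names.

```python
def choose_commit_mode(pathspecs: list[str], index_matches_head: bool,
                       all_flag: bool = False) -> str:
    """Decide what a proposed 'git commit' would commit.

    Returns "all" (like commit -a), "paths" (only the named files)
    or "index" (whatever has been staged).
    """
    if all_flag:
        return "all"
    if pathspecs:
        return "paths"
    if index_matches_head:
        # Nothing staged: fall back to committing all tracked changes, with a hint.
        print("Nothing to commit but changes in working tree. Assuming 'git commit -a'")
        return "all"
    return "index"


print(choose_commit_mode([], index_matches_head=True))
print(choose_commit_mode([], index_matches_head=False))
```

Note that the sketch does not special-case --amend, which Johannes argues above should never imply -a; a real implementation would need an extra condition for that.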
Unless you do "git update-index" (and thus are already using the index) on any files, "git diff" shows you exactly the changes between your last commit and the working tree. There's nothing magic, odd or confusing about it, no matter which scm you come from. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Until you make the mistake of reading the git-diff man page, at which
point the novice git user runs screaming into the night...
Show changes between two ents, an ent and the working tree, an
ent and the index file, or the index file and the working
tree. The combination of what is compared with what is
determined by the number of ents given to the command.
* When no <ent> is given, the working tree and the index file
is compared, using git-diff-files.
* When one <ent> is given, the working tree and the named tree
is compared, using git-diff-index. The option --cached can
be given to compare the index file and the named tree.
* When two <ent>s are given, these two trees are compared using
git-diff-tree.
Looking at the man page, it does raise one interesting question ---
So exactly what is the difference between Treebeard and Quickbeam?
And how many working trees do we need before we call it an Entmoot? :-)
- Ted
-
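The git-diff man page excerpt quoted above boils down to a lookup on the number of <ent> arguments, plus --cached. Below is a small Python restatement of that text. `diff_plan` is a hypothetical helper, and the plumbing commands it names are the ones the quoted man page mentions.

```python
def diff_plan(num_ents: int, cached: bool = False) -> tuple[str, str]:
    """Return (what is compared, plumbing command) per the quoted man page."""
    if num_ents == 0:
        return ("working tree vs index", "git-diff-files")
    if num_ents == 1:
        if cached:
            return ("index vs named tree", "git-diff-index --cached")
        return ("working tree vs named tree", "git-diff-index")
    if num_ents == 2:
        return ("tree vs tree", "git-diff-tree")
    raise ValueError("the quoted man page only covers zero, one or two <ent>s")


for n in (0, 1, 2):
    print(n, diff_plan(n))
print(1, diff_plan(1, cached=True))
```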
Not so rare in a true DSCM scenario where people submit patches via email or a bug tracker. Say two developers apply the same patch to their trees, and one of them tweaks it a bit. While I don't personally do kernel development, I understand that's reasonably common in the linux dev team. It also happens quite a bit if you cherry pick across branches patches that create files. In such cases, I find GIT does the right thing 99% of the time, including spotting situations where the file got added at different patchlevels in different branches. cheers, martin -
