Hello I'm coming from cogito. There you can run: cg-add $file ; cg-rm $file and everything is as before; it adds the file to the directory index/cache, and just removes it again from the latter. Whereas with git, git-add $file; git-rm $file is giving the error error: '..file..' has changes staged in the index (hint: try -f) And sure enough, git rm -f $file will remove the file from the index, but also unlink it from the directory. (Ok, I did remember that cogito's -f option is unlinking the file, so I was cautious and didn't try it on an important file, but still...) Turns out that git rm -f --cached $file will do the same action as cg-rm $file. Why so complicated? Why not just make git-rm without options behave like cg-rm? (Or at the very least, I'd change the hint to say "try -f --cached".) Christian. -
It is probably a matter of taste. Personally, I am really upset by this behaviour that cvs, cogito, stgit and others share, which forces me to issue 2 commands to really delete a file from version control and from the filesystem. Do you really need to undo an add more often than you need to remove a file from version-control ? It may be worth, however, to make things easier. Maybe "git add --undo foo" would be a solution ? Not sure we'd want to add --undo to many git commands, though. Opinions ? Best regards, -- Yann -
It doesn't force you to issue 2 commands: the -f option to cg-rm unlinks the file for you. So you only have to type an additional "-f". Yes it's probably (partly) a matter of taste: in my bash startup files I have mv aliased to mv -i and rm to rm -i, so that it asks me whether I'm sure to delete or overwrite a file. If I know in advance that I'm sure, I just type "rm -f $file", which then expands to "rm -i -f $file" where the -f overrides the -i. cg-rm -f just fits very well into this scheme (the only difference being that "cg-rm $file" doesn't explicitely ask me whether I also want the file to be unlinked). (BTW note that usually for removing a file I use a "trash" (or shorter alias "tra") command, which moves it to a trash can instead of deleting; so I use "tra $file" by default, and only for big files or when I'm sure I immediately want to delete them, I use rm, and then if the paths are clear I add the -f flag, if not (like globbing involved), I don't add the -f and thus am asked for confirmation.) If I could alias the git-rm command so that the default action is the reverse of git-add and adding an -f flag removes it from disk, that This doesn't sound very intuitive to me (and I couldn't fix it with an alias). I don't per se require undo actions. I just don't understand why git-rm refuses to remove the file from the index, even if I didn't commit it. The index is just an intermediate record of the changes in my understandings, and the rm action would also be intermediate until it's being committed. And a non-committed action being deleted shouldn't need a special confirmation from me, especially not one which is consisting of a combination of two flags (of which one is a destructive one). Christian. -
I'd say it does so, so you won't loose any uncommitted changes without knowing it - and "git add -f" is available when you have checked that It already works as such: it will warn you if you have already staged the file in the index, but it has not been committed, in which case the data would be lost as well: $ echo foo > bar /tmp/test$ git rm bar fatal: pathspec 'bar' did not match any files /tmp/test$ git add bar /tmp/test$ git rm bar error: 'bar' has changes staged in the index (hint: try -f) That is, "git rm" will only ever remove the file without asking, when it is safe do so, in that you can retrieve your file from history. Or do you think of another way, in which more safety would be needed ? Best regards, -- Yann -
Defaulting to --cached would be an obvious way to avoid data-loss. _At least_, mentionning --cached in the error message in case of staged changes would be a considerable step forward. At the moment, the non-expert user will have difficulties to unversion the file without deleting it. I just see it as $ git rm foo error: 'foo' has changes staged in the index (hint: to hang yourself, try -f) $ _ -- Matthieu -
Hi, What's so wrong with our man pages? You know, there have been man hours invested in them, and they are exclusively meant for consumption by people who do not know about the usage of the commands... Hth, Dscho -
What's wrong is just that I shouldn't have to read a man page to avoid data-loss. I should have to read them to do non-trivial things, for sure. Useability is not just about documenting surprising behaviors, it's really about avoiding them. -- Matthieu -
Hi, Okay, Mr Moy. How did you learn that "rm" leads to data-loss? Because it does. Hmm. How did you expect then, that git-rm does _not_ lead to data loss? If in doubt, you _have_ to read the manual. Especially if the tool is powerful. Ciao, Dscho -
Glad to be called by my name. Is it a tradition here, or a way to make It obviously does, and I can't imagine any other behavior than Because there are tons of possible behaviors for "$VCS rm", and I'd expect it to be safe even if VCS=git, since it is with all the other VCS I know. What's wrong with the behavior of "hg rm"? What's wrong with the behavior of "svn rm"? What's wrong with the behavior of "bzr rm"? (no, I won't do it with CVS ;-) ) None of these commands have the problem that git-rm has. -- Matthieu -
Hi, Which proves exactly my point. There are a ton of interpretations that Guess what. I do not know how they operate! I have no idea what the behaviour of the commands you mentioned is. So before I would answer (if they were not rethoric questions), I would actually really read the man page to know what they are supposed to do. Ciao, Dscho -
Hm. They all behave roughly the same: They unversion the file and unlink it, unless it is modified, in which case they unversion it and leave it alone. Now git has the extra complexity that index contains also content of the file. But the behaviour can be easily adapted like this (HEAD =3D version in HEAD, index =3D version in index, tree =3D version in tree): - if (HEAD =3D=3D index && index =3D=3D version) unversion and unlink - else if (HEAD =3D=3D index || index =3D=3D version) unversion - else print message and do nothing Would you consider that a sane behaviour? --=20 Jan 'Bulb' Hudec <bulb@ucw.cz>
Yes. Roughly, they'll ask you a --force flag whenever you'd risk
data-loss. bzr gives you the choice between --force and --keep (that
^^^^- I suppose you meant "version"
here since you don't use
Just to be more precise:
- if (HEAD == index && index == version) unversion and
* if (--cached is not given) unlink
To me, that's a sane behavior.
It makes a few senarios easy and safe, like this:
$ git add <whatever>
# Ooops, no, I didn't want to version this one :-(
$ git rm some-file
# Cool, I just cancelled my mistake without loosing anything ;-)
One benefit is: you don't have to use "-f" for a non-dangerous
senario. That seems stupid, but for the plain "rm" command, the "-rf"
is hardcoded in the fingers of many unix users, and I know several
people having lost data by typing it a bit too mechanically (with a
typo behind, like forgetting the "*" in "*~" ;-).
I'll try writting patch for that if people agree that this is saner
that the current behavior.
--
Matthieu
-
Here's a first attempt (I'm still not familiar with the git codebase, so the patch is probably not so good). Note: currently, git-rm still shows those "rm '...'" messages on stdout. AAUI, they were actually useful at a time when git-rm didn't actually remove the files, and people actually ran the "rm" commands after. They can probably be removed now, but that's another topic. From f4f4aa047b2b9050d968704d1f2db07b2a1a79cc Mon Sep 17 00:00:00 2001 From: Matthieu Moy <Matthieu.Moy@imag.fr> Date: Sun, 8 Jul 2007 19:27:44 +0200 Subject: [PATCH] Make git-rm obey in more circumstances. In the previous behavior of git-rm, git refused to do anything in case of a difference between the file on disk, the index, and the HEAD. As a result, the -f flag is forced even for simple senarios like: $ git add foo # oops, I didn't want to version it $ git rm -f [--cached] foo # foo is deleted on disk if --cached isn't provided. This patch proposes a saner behavior. When there are no difference at all between file, index and HEAD, the file is removed both from the index and the tree, as before. Otherwise, if the index matches either the file on disk or the HEAD, the file is removed from the index, but the file is kept on disk, it may contain important data. Otherwise, that's an error, and git-rm aborts. The above senario becomes $ git add foo $ git rm foo # back to the initial state. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> --- Documentation/git-rm.txt | 9 ++++--- builtin-rm.c | 55 ++++++++++++++++++++++++++++++++++++--------- 2 files changed, 49 insertions(+), 15 deletions(-) diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt index 78f45dc..180671c 100644 --- a/Documentation/git-rm.txt +++ b/Documentation/git-rm.txt @@ -11,10 +11,11 @@ SYNOPSIS DESCRIPTION ----------- -Remove files from the working tree and from the index. The -files have to be identical to the tip of the branch, and no -updates to its contents ...
Hi, This is not really a good patch title. Since it only obeys your particular understanding of what it should do. You are changing However, if some of the files are of the first kind, and some are of the This is wrong, too. Yes, it works. But it really should be I suspect that this case does never fail. 0 means success for remove_file(). Not good. You should at least have a way to ensure that it removed the files from the working tree from a script. Otherwise there Style: the old code set and used "path" for readability. You should do the same (with "file", probably). Additionally, since this changes semantics, you better provide test cases to show what is expected to work, and _ensure_ that it actually works. Ciao, Dscho -
I'm not sure whether this is really wrong. The things git should really care about are the index and the repository itself, and the proposed behavior is consistant regarding that (either remove all files from the index, or remove none). I'm not opposed to your proposal, but I'd like to have other You don't need to argue, that was a typo. My code is definitely wrong, I've changed it to have exit_status = 1 if git-rm aborted before starting, and 2 if git-rm skiped some file removals (and of course, 0 Sure. I forgot to mention it in my message, but I wanted to have feedback before getting into the testsuite stuff. I'm posting the updated patch for info, but it should anyway not be merged until * We agree on the behavior when different files have different kinds of changes * I add a testcase. From f39ae646049b95b055e34da378ea470ef3f3caef Mon Sep 17 00:00:00 2001 From: Matthieu Moy <Matthieu.Moy@imag.fr> Date: Sun, 8 Jul 2007 19:27:44 +0200 Subject: [PATCH] Change the behavior of git-rm to let it obey in more circumstances without -f. In the previous behavior of git-rm, git refused to do anything in case of a difference between the file on disk, the index, and the HEAD. As a result, the -f flag is forced even for simple senarios like: $ git add foo $ git rm -f [--cached] foo This patch proposes a saner behavior. When there are no difference at all between file, index and HEAD, the file is removed both from the index and the tree, as before. Otherwise, if the index matches either the file on disk or the HEAD, the file is removed from the index, but the file is kept on disk, it may contain important data. Otherwise, that's an error, and git-rm aborts. The above senario becomes $ git add foo $ git rm foo Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> --- Documentation/git-rm.txt | 9 +++-- builtin-rm.c | 71 ++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 64 insertions(+), 16 deletions(-) diff ...
Hi, Well, I think it is wrong for the same reason as it is wrong to apply the changes to _any_ file when one would fail. And since "git apply" shares Oh, so you do not take the return value of this function to determine if it has or has not done something with the files? That's a bit confusing. Besides, it would be all the more a reason for a test case, so that I can I think it should be the other way. If you change semantics with the patch, but another revision changes semantics _differently_, it is really easy to get lost. In order to demonstrate what should be true, you have to provide examples. And if you are already providing examples, just wrap them into test_description <description> . ./test-lib.sh ... test_done and prefix each test with "test_expect_success", and you're done. It is Please do not do this. I meant to complain about your OP, but this time it is even worse. The best way to guarantee that a patch gets lost in a thread is to move it _at the end_ of a reply. Please follow the form that you change the subject, still reply, but but the quoted mail with your answers to that text between the "---" and the diffstat. If that text is too long, you should use a separate email for the patch. Ciao, Dscho -
I did, but previously, I kept the code that "die()"s if the first call to remove_file() "fails". In remove_file_maybe(), not removing a file because it's not sure it's safe to delete it is not a failure, so I had to put a "return 0;" here to avoid the fatal error. My first patch had return status !=0 if we tried to remove the file, and it failed. I I had posted the patch for info, but I did expect this one to get lost, since it's definitely not complete. I'll post an updated patch with a testcase and an appropriate subject line within a few days (I don't have time right now). Thanks, -- Matthieu -
OK, I've been thinking about it for some time (not having time to hack can be good, it lets time for thinking instead ;-) ). I'm actually still not convinced that my proposal was wrong, but I think we disagree because we disagree on what is a "failure". I consider leaving the file in the working tree to be just a safety precaution, not a failure, and to me, it's OK to do that only for the files that need it. Fixing my patch by just "applying the same strategy to all files" would be wrong: leaving _all_ the files on disk when just one has local modifications is very misleading, and if the user notices it after running the command, he or she does not always have an easy way to get back to a clean situation (re-running the same command with -f wouldn't work for example). So, I went a shorter way from the current semantics: * Allow --cached in more situations, so that -f is really needed in very particular situation (as I mentionned above, forcing -f too often means the -f gets hardcoded in the fingers, and makes it useless). * Better error message, which points to --cached in addition to -f. That's very close to what bzr does, BTW. Drawback: it still doesn't solve the "rm isn't the inverse of add". The patch is quite straightforward, and will be in a followup email. -- Matthieu -
In the previous behavior, "git-rm --cached" (without -f) had the same
restriction as "git-rm". This forced the user to use the -f flag in
situations which weren't actually dangerous, like:
$ git add foo # oops, I didn't want this
$ git rm --cached foo # back to initial situation
Previously, the index had to match the file *and* the HEAD. With
--cached, the index must now match the file *or* the HEAD. The behavior
without --cached is unchanged, but provides better error messages.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
Documentation/git-rm.txt | 3 ++-
builtin-rm.c | 32 ++++++++++++++++++++++++++------
t/t3600-rm.sh | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index 78f45dc..be61a82 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -14,7 +14,8 @@ DESCRIPTION
Remove files from the working tree and from the index. The
files have to be identical to the tip of the branch, and no
updates to its contents must have been placed in the staging
-area (aka index).
+area (aka index). When --cached is given, the staged content has to
+match either the tip of the branch *or* the file on disk.
OPTIONS
diff --git a/builtin-rm.c b/builtin-rm.c
index 4a0bd93..9a808c1 100644
--- a/builtin-rm.c
+++ b/builtin-rm.c
@@ -46,7 +46,7 @@ static int remove_file(const char *name)
return ret;
}
-static int check_local_mod(unsigned char *head)
+static int check_local_mod(unsigned char *head, int index_only)
{
/* items in list are already sorted in the cache order,
* so we could do this a lot more efficiently by using
@@ -65,6 +65,8 @@ static int check_local_mod(unsigned char *head)
const char *name = list.name[i];
unsigned char sha1[20];
unsigned mode;
+ int local_changes = 0;
+ int staged_changes = 0;
pos = cache_name_pos(name, strlen(name));
if (pos < ...This does make more sense, but there are still some inconsistencies. Is it OK to lose content that is only in the index, or not? If it is OK, then --cached shouldn't need _any_ safety valve (and after all, anything you remove in that manner is recoverable with git-fsck until the next prune). If it isn't OK, then you are not addressing the cases where git-rm without --cached loses index content (that is different than HEAD and the working tree). -Peff -
I'd say it isn't OK. At least, that's what the previous git-rm Either I didn't understand your question, or the answer is "yes, I do.". The behavior without --cached is not modified, except for the error message, and the previous was to require -f whenever the index doesn't match the head, *or* doesn't match the file. So, without --cached, you need to have file=index=HEAD to be able to git-rm. If I missunderstand you, please, provide a senario where my patch doesn't do the expected. -- Matthieu -
Right, my point was that there is a case where running without --cached could lose content: when there is no working tree file. However, thinking about it more, I recall that Junio made the point that allowing that behavior means the CVS idiom of "rm file; git-rm file" will just work. Not that that was a problem you introduced; I merely wanted to push for total consistency rather than just handling --cached. But I think the non --cached behavior is actually right now, so let me retract my complaint. And assuming the "git-rm when no working tree file" current behavior is OK, then I think your patch removes the last consistency problem that I mentioned in my state table here: http://article.gmane.org/gmane.comp.version-control.git/51449 So in a round-about way, I totally approve of your patch. Sorry for the confusion. -Peff -
Although I would not be using it often myself, I think this would make "git rm" more pleasant to use. Thanks for the patch, and my thanks also go to people who commented on the patch. -
Having said that, I think this comment is not quite right.
+ else if (!index_only) {
+ /* It's not dangerous to git-rm --cached a
+ * file if the index matches the file or the
+ * HEAD, since it means the deleted content is
+ * still available somewhere.
+ */
Personally I do not think "rm --cached" needs any such "safety",
even though I'll keep the check for now, primarily because
loosening the restriction later is always easier than adding new
restriction. I really do not think this is about protecting the
user from "deleted content is not available anywhere else".
In this sequence:
edit a-new-file
git add a-new-file
edit a-new-file
git add a-new-file
we do not complain, even though we are *losing* the contents we
earlier staged. If you replace the second "git add" with
"git-rm --cached", the sequence should work the same way. In
either case, you are working towards your next commit, and most
likely are doing a partial commit (iow, your working tree does
not match any of the commit you create in the middle). Earlier
you thought you would want one state of the file in the next
commit, but now you decided against putting that new file in the
first commit in the series. You may make further updates to the
index and would make a commit, but after making the commit, your
working tree still has "a-new-file" and you can add the contents
from it for the later commit.
-
I agree that this is something you can argue about. But in this case, the behavior without -f should be changed too. If the file matches HEAD, then "git-rm file" should work, regardless of the index then (but this situation is less frequent). In any case, the situation where you might lose content in the index by doing git-rm are rare: it means you started working on a file, did "git-add" at least once, and edited the file again later, and then decided you wanted to remove the file. So, requiring the -f flag in those situation is not a real problem, even if the situation is slightly-dangerous-but-not-quite-so. I'm willing to work on another patch on top of this one if there's an agreement on a better semantics. This one was about fixing something which was IMHO wrong, but doesn't necessarily achieve perfection ;-). -- Matthieu -
I'm really realising that git-rm $file # where $file *has* been committed previously does remove *and* unlink the file. (cg-rm does unlink only with the -f flag, as said.) So there's no -f flag in normal git-rm usage. It thus has a different meaning, namely "force the operation pair of removing from index and unlinking", not "force this operation also onto the checked out files" as is the case with cogito. So I now understand better why they invented the -f flag to git-rm for the case you're mentioning above and why the hint doesn't warn about it's danger, since git-rm is always dangerous. (Ok, as is "rm" without the "-i"; I just found it normal that cogito behaved like my "-i" setup.) Regarding the issue of "lost files" because they have been created, added, and removed again before committing: as far as I remember this has never happened to me with cogito. I commit often, so if I add a file or a few, in most cases I commit just this fact (that they have been added), before doing more fancy stuff. I'm maybe used to thinking in database terms, work that isn't committed is lost. So if I create a file and add it, in my brain the "attention, uncommitted work" flag is on, and it usually doesn't happen that I later erroneously think the work has been committed when in fact it isn't. (I can always check with a quick cg-status (which shows the files as "A", which makes them stand out better than in the git-status output)). Just before writing this mail I had a case where I wanted to remove a file from versioning control, but keep it on disk (I used git-rm and that's how I learned that it really also unlinks the local file without asking(*)). Note that this has not been an undo action; the file has been committed previously. (Well it's not safe if you want to remove the file *from the index* and I think we have just two different points in our view where we think safety matters. Regarding the man pages: yes the git-rm man page is fine, and it's nice to see the ...
Yes, git-rm is used in several situations, and the idea is that it should behave safely in all situations; that is, without -f you can't delete any data that can't be recovered from your history (but maybe that means we shouldn't suggest -f in so cavalier a fashion). Each file has three "states" that we care about: in the HEAD (H), in the index (I), and in the working tree (W). Let's say you call 'git-rm'. Here's a table of possibilities (A is some content "A", B is some content not equal to "A", and N is "non-existant"): H I W | ok? | why? --------------------------------------------------- * N * | no | file is not tracked N A N | ? | currently ok, but 'A' recoverable only through fsck N A A | no | 'A' recoverable only through fsck N A B | no | local modification 'B' would be lost A A N | yes | 'A' recoverable through history A A A | yes | 'A' recoverable through history A A B | no | local modification 'B' would be lost A B N | ? | currently ok, but 'B' recoverable only through fsck A B A | no | 'B' recoverable only through fsck A B B | no | 'B' recoverable only through fsck B * * | | equivalent to H=A With --cached on, it is a little different: H I W | ok? | why? --------------------------------------------------- * N * | no | file is not tracked N A N | ? | currently ok, but 'A' recoverable only through fsck N A A | ? | currently not ok, but 'A' still available in W N A B | no | 'A' recoverable only through fsck A A N | yes | 'A' recoverable through history A A A | yes | 'A' recoverable through history or working tree A A B | ? | currently not ok, but 'A' still available in H A B N | ? | currently ok, but 'B' recoverable only through fsck A B A | no | 'B' recoverable only through fsck A B B | ? | currently not ok, but 'B' still available in W B * * | | equivalent to H=A So it looks like our safety valve is a bit overbearing in a few situations, and still misses some situations where data has to be pulled out of the database ...
These were explicitly done per request from git-rm users (myself
not one of them) who wanted to:
rm the-file
git rm the-file
sequence not to barf. I suspect they were from CVS background
who are used to the SCM that complains if you still have the
file in the working tree when you say "scm rm".
I personally do not think we would need any safety check for
"git rm --cached", as it does not touch the working tree. If
one cares about the differences among three states, one would
not issue "rm --cached" anyway. The only reason "rm --cached"
is used is because one _knows_ that any blob should not exist at
that path in the index.
-
It depends on how we want to define "lost" data. In many cases, we are protecting against losing content that will still be available until the next git-prune. Should our safety valve protect against that case, or should it not? We are totally inconsistent. The main one for --cached, of course, is when that content exists _only_ in the index, but no longer in the working tree (!A A N or !A A B). You really should be using regular git-rm (in the first case, since you are saying "I don't want this file anymore") or git-add (throw out the old data, use my new version). OTOH, clearly git-add can "lose" data in this way as well, since a "modify, git-add, modify, git-add" will "lose" any reference to the index state after the first add. So maybe that is not worth worrying about at all (in which case our safety valve is too strict in many places). We could also issue a warning when "losing" reference to data that is in the object db, which would include the sha1; in that case, an immediate How about: git-add foo echo changes >>foo # oops, I don't want to commit foo just yet git-rm --cached foo but in that case, maybe the user doesn't actually _care_ about that intermediate state of 'foo'. -Peff -
Exactly. And not considering that lossage helps us keep our sanity. I think "git rm --cached" falls into the same category. If the user wants to discard what is in the index without losing a copy in the working tree, I think we should let Yes, that is (at least, "used to be") exactly the use case "rm --cached" is supposed to help. Added something prematurely to the index, not ready to commit that part of the changes yet. Of course you could do partial commits with "add --interactive" these days, so there is not as much need for this as it used to be anymore. -
OK. So should we _remove_ the safety valve in all cases where we're just losing stuff that's in the index? It is, after all, recoverable. Should there be a warning (I suspect it would get annoying very quickly)? I think this would help by making the use of '-f' more rare, which is the thing that can _really_ screw you, since it turns off the safety valve even for things that aren't recoverable. -Peff -
I personally do not think we would need any safety check for "git rm --cached", as it does not touch the working tree. For non-cached case I think the current behaviour is fine. But I should warn you that I rarely use "git rm" myself. -
