Re: [RFC][PATCH] Re: git-rm isn't the inverse action of git-add

Previous thread: FC6 now has git v1.5.2.2 by Patrick Doyle on Monday, July 2, 2007 - 10:09 am. (4 messages)

Next thread: challenges using fast-import and svn by David Frech on Monday, July 2, 2007 - 12:26 pm. (3 messages)
From: Christian Jaeger
Date: Monday, July 2, 2007 - 11:09 am

Hello

I'm coming from cogito. There you can run:

  cg-add $file ; cg-rm $file

and everything is as before; it adds the file to the directory
index/cache, and just removes it again from the latter.

Whereas with git,

  git-add $file; git-rm $file

is giving the error

  error: '..file..' has changes staged in the index (hint: try -f)

And sure enough, git rm -f $file will remove the file from the index,
but also unlink it from the directory. (Ok, I did remember that cogito's
-f option is unlinking the file, so I was cautious and didn't try it on
an important file, but still...)

Turns out that

  git rm  -f --cached $file

will do the same action as cg-rm $file.

Why so complicated? Why not just make git-rm without options behave like
cg-rm? (Or at the very least, I'd change the hint to say "try -f --cached".)

Christian.

-

From: Yann Dirson
Date: Monday, July 2, 2007 - 12:42 pm

It is probably a matter of taste.  Personally, I am really upset by
this behaviour that cvs, cogito, stgit and others share, which forces
me to issue 2 commands to really delete a file from version control
and from the filesystem.

Do you really need to undo an add more often than you need to remove a
file from version-control ?  It may be worth, however, to make things
easier.  Maybe "git add --undo foo" would be a solution ?  Not sure
we'd want to add --undo to many git commands, though.  Opinions ?

Best regards,
-- 
Yann
-

From: Christian Jaeger
Date: Monday, July 2, 2007 - 1:23 pm

It doesn't force you to issue 2 commands: the -f option to cg-rm unlinks
the file for you. So you only have to type an additional "-f".

Yes it's probably (partly) a matter of taste: in my bash startup files I
have mv aliased to mv -i and rm to rm -i, so that it asks me whether I'm
sure to delete or overwrite a file. If I know in advance that I'm sure,
I just type "rm -f $file", which then expands to "rm -i -f $file" where
the -f overrides the -i. cg-rm -f just fits very well into this scheme
(the only difference being that "cg-rm $file" doesn't explicitely ask me
whether I also want the file to be unlinked). (BTW note that usually for
removing a file I use a "trash" (or shorter alias "tra") command, which
moves it to a trash can instead of deleting; so I use "tra $file" by
default, and only for big files or when I'm sure I immediately want to
delete them, I use rm, and then if the paths are clear I add the -f
flag, if not (like globbing involved), I don't add the -f and thus am
asked for confirmation.)

If I could alias the git-rm command so that the default action is the
reverse of git-add and adding an -f flag removes it from disk, that

This doesn't sound very intuitive to me (and I couldn't fix it with an
alias).

I don't per se require undo actions. I just don't understand why git-rm
refuses to remove the file from the index, even if I didn't commit it.
The index is just an intermediate record of the changes in my
understandings, and the rm action would also be intermediate until it's
being committed. And a non-committed action being deleted shouldn't need
a special confirmation from me, especially not one which is consisting
of a combination of two flags (of which one is a destructive one).

Christian.

-

From: Yann Dirson
Date: Monday, July 2, 2007 - 1:40 pm

I'd say it does so, so you won't loose any uncommitted changes without
knowing it - and "git add -f" is available when you have checked that

It already works as such: it will warn you if you have already staged
the file in the index, but it has not been committed, in which case
the data would be lost as well:

$ echo foo > bar
/tmp/test$ git rm bar
fatal: pathspec 'bar' did not match any files
/tmp/test$ git add bar
/tmp/test$ git rm bar
error: 'bar' has changes staged in the index (hint: try -f)

That is, "git rm" will only ever remove the file without asking, when
it is safe do so, in that you can retrieve your file from history.  Or
do you think of another way, in which more safety would be needed ?

Best regards,
-- 
Yann
-

From: Matthieu Moy
Date: Monday, July 2, 2007 - 1:54 pm

Defaulting to --cached would be an obvious way to avoid data-loss.
_At least_, mentionning --cached in the error message in case of
staged changes would be a considerable step forward.

At the moment, the non-expert user will have difficulties to unversion
the file without deleting it. I just see it as

$ git rm foo
error: 'foo' has changes staged in the index
(hint: to hang yourself, try -f)
$ _

-- 
Matthieu
-

From: Johannes Schindelin
Date: Monday, July 2, 2007 - 2:05 pm

Hi,


What's so wrong with our man pages? You know, there have been man hours 
invested in them, and they are exclusively meant for consumption by people 
who do not know about the usage of the commands...

Hth,
Dscho

-

From: Matthieu Moy
Date: Tuesday, July 3, 2007 - 3:37 am

What's wrong is just that I shouldn't have to read a man page to avoid
data-loss. I should have to read them to do non-trivial things, for
sure.

Useability is not just about documenting surprising behaviors, it's
really about avoiding them.

-- 
Matthieu
-

From: Johannes Schindelin
Date: Tuesday, July 3, 2007 - 5:09 am

Hi,


Okay, Mr Moy. How did you learn that "rm" leads to data-loss? Because it 
does. Hmm. How did you expect then, that git-rm does _not_ lead to data 
loss? If in doubt, you _have_ to read the manual. Especially if the tool 
is powerful.

Ciao,
Dscho

-

From: Matthieu Moy
Date: Tuesday, July 3, 2007 - 6:40 am

Glad to be called by my name. Is it a tradition here, or a way to make

It obviously does, and I can't imagine any other behavior than

Because there are tons of possible behaviors for "$VCS rm", and I'd
expect it to be safe even if VCS=git, since it is with all the other
VCS I know.

What's wrong with the behavior of "hg rm"?
What's wrong with the behavior of "svn rm"?
What's wrong with the behavior of "bzr rm"?
(no, I won't do it with CVS ;-) )

None of these commands have the problem that git-rm has.

-- 
Matthieu
-

From: Johannes Schindelin
Date: Tuesday, July 3, 2007 - 7:21 am

Hi,



Which proves exactly my point. There are a ton of interpretations that 

Guess what. I do not know how they operate! I have no idea what the 
behaviour of the commands you mentioned is. So before I would answer (if 
they were not rethoric questions), I would actually really read the man 
page to know what they are supposed to do.

Ciao,
Dscho

-

From: Jan Hudec
Date: Wednesday, July 4, 2007 - 1:08 pm

Hm. They all behave roughly the same: They unversion the file and unlink it,
unless it is modified, in which case they unversion it and leave it alone.

Now git has the extra complexity that index contains also content of the
file. But the behaviour can be easily adapted like this (HEAD =3D version in
HEAD, index =3D version in index, tree =3D version in tree):
 - if (HEAD =3D=3D index && index =3D=3D version) unversion and unlink
 - else if (HEAD =3D=3D index || index =3D=3D version) unversion
 - else print message and do nothing

Would you consider that a sane behaviour?

--=20
						 Jan 'Bulb' Hudec <bulb@ucw.cz>
From: Matthieu Moy
Date: Thursday, July 5, 2007 - 6:44 am

Yes. Roughly, they'll ask you a --force flag whenever you'd risk
data-loss. bzr gives you the choice between --force and --keep (that
                                  ^^^^- I suppose you meant "version"
                                        here since you don't use

Just to be more precise:

   - if (HEAD == index && index == version) unversion and
       * if (--cached is not given) unlink

To me, that's a sane behavior.

It makes a few senarios easy and safe, like this:

  $ git add <whatever>
  # Ooops, no, I didn't want to version this one :-(
  $ git rm some-file
  # Cool, I just cancelled my mistake without loosing anything ;-)
  
One benefit is: you don't have to use "-f" for a non-dangerous
senario. That seems stupid, but for the plain "rm" command, the "-rf"
is hardcoded in the fingers of many unix users, and I know several
people having lost data by typing it a bit too mechanically (with a
typo behind, like forgetting the "*" in "*~" ;-).

I'll try writting patch for that if people agree that this is saner
that the current behavior.

-- 
Matthieu
-

From: Matthieu Moy
Date: Sunday, July 8, 2007 - 10:36 am

Here's a first attempt (I'm still not familiar with the git codebase,
so the patch is probably not so good).

Note: currently, git-rm still shows those "rm '...'" messages on
stdout. AAUI, they were actually useful at a time when git-rm didn't
actually remove the files, and people actually ran the "rm" commands
after. They can probably be removed now, but that's another topic.


From f4f4aa047b2b9050d968704d1f2db07b2a1a79cc Mon Sep 17 00:00:00 2001
From: Matthieu Moy <Matthieu.Moy@imag.fr>
Date: Sun, 8 Jul 2007 19:27:44 +0200
Subject: [PATCH] Make git-rm obey in more circumstances.

In the previous behavior of git-rm, git refused to do anything in case of
a difference between the file on disk, the index, and the HEAD. As a
result, the -f flag is forced even for simple senarios like:

$ git add foo
# oops, I didn't want to version it
$ git rm -f [--cached] foo
# foo is deleted on disk if --cached isn't provided.

This patch proposes a saner behavior. When there are no difference at all
between file, index and HEAD, the file is removed both from the index and
the tree, as before.

Otherwise, if the index matches either the file on disk or the HEAD, the
file is removed from the index, but the file is kept on disk, it may
contain important data.

Otherwise, that's an error, and git-rm aborts.

The above senario becomes

$ git add foo
$ git rm foo
# back to the initial state.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 Documentation/git-rm.txt |    9 ++++---
 builtin-rm.c             |   55 ++++++++++++++++++++++++++++++++++++---------
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index 78f45dc..180671c 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -11,10 +11,11 @@ SYNOPSIS
 
 DESCRIPTION
 -----------
-Remove files from the working tree and from the index.  The
-files have to be identical to the tip of the branch, and no
-updates to its contents ...
From: Johannes Schindelin
Date: Sunday, July 8, 2007 - 11:10 am

Hi,


This is not really a good patch title.  Since it only obeys your 
particular understanding of what it should do.  You are changing 

However, if some of the files are of the first kind, and some are of the 

This is wrong, too.  Yes, it works.  But it really should be 


I suspect that this case does never fail. 0 means success for 
remove_file().  Not good.  You should at least have a way to ensure that 
it removed the files from the working tree from a script.  Otherwise there 

Style: the old code set and used "path" for readability.  You should do 
the same (with "file", probably).

Additionally, since this changes semantics, you better provide test cases 
to show what is expected to work, and _ensure_ that it actually works.

Ciao,
Dscho

-

From: Matthieu Moy
Date: Sunday, July 8, 2007 - 1:34 pm

I'm not sure whether this is really wrong. The things git should
really care about are the index and the repository itself, and the
proposed behavior is consistant regarding that (either remove all
files from the index, or remove none).

I'm not opposed to your proposal, but I'd like to have other

You don't need to argue, that was a typo. My code is definitely wrong,

I've changed it to have exit_status = 1 if git-rm aborted before
starting, and 2 if git-rm skiped some file removals (and of course, 0


Sure. I forgot to mention it in my message, but I wanted to have
feedback before getting into the testsuite stuff.

I'm posting the updated patch for info, but it should anyway not be
merged until

* We agree on the behavior when different files have different kinds
  of changes

* I add a testcase.



From f39ae646049b95b055e34da378ea470ef3f3caef Mon Sep 17 00:00:00 2001
From: Matthieu Moy <Matthieu.Moy@imag.fr>
Date: Sun, 8 Jul 2007 19:27:44 +0200
Subject: [PATCH] Change the behavior of git-rm to let it obey in more circumstances without -f.

In the previous behavior of git-rm, git refused to do anything in case of
a difference between the file on disk, the index, and the HEAD. As a
result, the -f flag is forced even for simple senarios like:

$ git add foo
$ git rm -f [--cached] foo

This patch proposes a saner behavior. When there are no difference at all
between file, index and HEAD, the file is removed both from the index and
the tree, as before.

Otherwise, if the index matches either the file on disk or the HEAD, the
file is removed from the index, but the file is kept on disk, it may
contain important data.

Otherwise, that's an error, and git-rm aborts.

The above senario becomes

$ git add foo
$ git rm foo

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 Documentation/git-rm.txt |    9 +++--
 builtin-rm.c             |   71 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 64 insertions(+), 16 deletions(-)

diff ...
From: Johannes Schindelin
Date: Sunday, July 8, 2007 - 2:49 pm

Hi,


Well, I think it is wrong for the same reason as it is wrong to apply the 
changes to _any_ file when one would fail.  And since "git apply" shares 


Oh, so you do not take the return value of this function to determine if 
it has or has not done something with the files?  That's a bit confusing.

Besides, it would be all the more a reason for a test case, so that I can 

I think it should be the other way.  If you change semantics with the 
patch, but another revision changes semantics _differently_, it is really 
easy to get lost.  In order to demonstrate what should be true, you have 
to provide examples.  And if you are already providing examples, just wrap 
them into

	test_description <description>
	. ./test-lib.sh

	...

	test_done

and prefix each test with "test_expect_success", and you're done.  It is 



Please do not do this.

I meant to complain about your OP, but this time it is even worse.  The 
best way to guarantee that a patch gets lost in a thread is to move it _at 
the end_ of a reply.

Please follow the form that you change the subject, still reply, but but 
the quoted mail with your answers to that text between the "---" and the 
diffstat.

If that text is too long, you should use a separate email for the patch.

Ciao,
Dscho
-

From: Matthieu Moy
Date: Monday, July 9, 2007 - 2:45 am

I did, but previously, I kept the code that "die()"s if the first call
to remove_file() "fails". In remove_file_maybe(), not removing a file
because it's not sure it's safe to delete it is not a failure, so I
had to put a "return 0;" here to avoid the fatal error. My first patch
had return status !=0 if we tried to remove the file, and it failed. I

I had posted the patch for info, but I did expect this one to get
lost, since it's definitely not complete.

I'll post an updated patch with a testcase and an appropriate subject
line within a few days (I don't have time right now).

Thanks,

-- 
Matthieu
-

From: Matthieu Moy
Date: Friday, July 13, 2007 - 10:36 am

OK, I've been thinking about it for some time (not having time to hack
can be good, it lets time for thinking instead ;-) ).

I'm actually still not convinced that my proposal was wrong, but I
think we disagree because we disagree on what is a "failure". I
consider leaving the file in the working tree to be just a safety
precaution, not a failure, and to me, it's OK to do that only for the
files that need it.

Fixing my patch by just "applying the same strategy to all files"
would be wrong: leaving _all_ the files on disk when just one has
local modifications is very misleading, and if the user notices it
after running the command, he or she does not always have an easy way
to get back to a clean situation (re-running the same command with -f
wouldn't work for example).

So, I went a shorter way from the current semantics:

* Allow --cached in more situations, so that -f is really needed in
  very particular situation (as I mentionned above, forcing -f too
  often means the -f gets hardcoded in the fingers, and makes it
  useless).

* Better error message, which points to --cached in addition to -f.

That's very close to what bzr does, BTW.

Drawback: it still doesn't solve the "rm isn't the inverse of add".

The patch is quite straightforward, and will be in a followup email.

-- 
Matthieu
-

From: Matthieu Moy
Date: Friday, July 13, 2007 - 10:41 am

In the previous behavior, "git-rm --cached" (without -f) had the same
restriction as "git-rm". This forced the user to use the -f flag in
situations which weren't actually dangerous, like:

$ git add foo           # oops, I didn't want this
$ git rm --cached foo   # back to initial situation

Previously, the index had to match the file *and* the HEAD. With
--cached, the index must now match the file *or* the HEAD. The behavior
without --cached is unchanged, but provides better error messages.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 Documentation/git-rm.txt |    3 ++-
 builtin-rm.c             |   32 ++++++++++++++++++++++++++------
 t/t3600-rm.sh            |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index 78f45dc..be61a82 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -14,7 +14,8 @@ DESCRIPTION
 Remove files from the working tree and from the index.  The
 files have to be identical to the tip of the branch, and no
 updates to its contents must have been placed in the staging
-area (aka index).
+area (aka index).  When --cached is given, the staged content has to
+match either the tip of the branch *or* the file on disk.
 
 
 OPTIONS
diff --git a/builtin-rm.c b/builtin-rm.c
index 4a0bd93..9a808c1 100644
--- a/builtin-rm.c
+++ b/builtin-rm.c
@@ -46,7 +46,7 @@ static int remove_file(const char *name)
 	return ret;
 }
 
-static int check_local_mod(unsigned char *head)
+static int check_local_mod(unsigned char *head, int index_only)
 {
 	/* items in list are already sorted in the cache order,
 	 * so we could do this a lot more efficiently by using
@@ -65,6 +65,8 @@ static int check_local_mod(unsigned char *head)
 		const char *name = list.name[i];
 		unsigned char sha1[20];
 		unsigned mode;
+		int local_changes = 0;
+		int staged_changes = 0;
 
 		pos = cache_name_pos(name, strlen(name));
 		if (pos < ...
From: Jeff King
Date: Friday, July 13, 2007 - 10:57 am

This does make more sense, but there are still some inconsistencies. Is
it OK to lose content that is only in the index, or not?

If it is OK, then --cached shouldn't need _any_ safety valve (and after
all, anything you remove in that manner is recoverable with git-fsck
until the next prune).

If it isn't OK, then you are not addressing the cases where git-rm
without --cached loses index content (that is different than HEAD and
the working tree).

-Peff
-

From: Matthieu Moy
Date: Friday, July 13, 2007 - 11:53 am

I'd say it isn't OK. At least, that's what the previous git-rm

Either I didn't understand your question, or the answer is "yes, I
do.". The behavior without --cached is not modified, except for the
error message, and the previous was to require -f whenever the index
doesn't match the head, *or* doesn't match the file. So, without
--cached, you need to have file=index=HEAD to be able to git-rm.

If I missunderstand you, please, provide a senario where my patch
doesn't do the expected.

-- 
Matthieu
-

From: Jeff King
Date: Friday, July 13, 2007 - 8:42 pm

Right, my point was that there is a case where running without --cached
could lose content: when there is no working tree file. However,
thinking about it more, I recall that Junio made the point that allowing
that behavior means the CVS idiom of "rm file; git-rm file" will just
work.

Not that that was a problem you introduced; I merely wanted to push for
total consistency rather than just handling --cached. But I think the
non --cached behavior is actually right now, so let me retract my
complaint.

And assuming the "git-rm when no working tree file" current behavior is
OK, then I think your patch removes the last consistency problem that I
mentioned in my state table here:

  http://article.gmane.org/gmane.comp.version-control.git/51449

So in a round-about way, I totally approve of your patch. Sorry for the
confusion.

-Peff
-

From: Junio C Hamano
Date: Friday, July 13, 2007 - 11:52 pm

Although I would not be using it often myself, I think this
would make "git rm" more pleasant to use.

Thanks for the patch, and my thanks also go to people who
commented on the patch.


-

From: Junio C Hamano
Date: Saturday, July 14, 2007 - 12:16 am

Having said that, I think this comment is not quite right.

+		else if (!index_only) {
+			/* It's not dangerous to git-rm --cached a
+			 * file if the index matches the file or the
+			 * HEAD, since it means the deleted content is
+			 * still available somewhere.
+			 */

Personally I do not think "rm --cached" needs any such "safety",
even though I'll keep the check for now, primarily because
loosening the restriction later is always easier than adding new
restriction.  I really do not think this is about protecting the
user from "deleted content is not available anywhere else".

In this sequence:

	edit a-new-file
	git add a-new-file
        edit a-new-file
        git add a-new-file

we do not complain, even though we are *losing* the contents we
earlier staged.  If you replace the second "git add" with
"git-rm --cached", the sequence should work the same way.  In
either case, you are working towards your next commit, and most
likely are doing a partial commit (iow, your working tree does
not match any of the commit you create in the middle).  Earlier
you thought you would want one state of the file in the next
commit, but now you decided against putting that new file in the
first commit in the series.  You may make further updates to the
index and would make a commit, but after making the commit, your
working tree still has "a-new-file" and you can add the contents
from it for the later commit.

-

From: Matthieu Moy
Date: Saturday, July 14, 2007 - 3:14 am

I agree that this is something you can argue about.

But in this case, the behavior without -f should be changed too. If
the file matches HEAD, then "git-rm file" should work, regardless of
the index then (but this situation is less frequent).

In any case, the situation where you might lose content in the index
by doing git-rm are rare: it means you started working on a file, did
"git-add" at least once, and edited the file again later, and then
decided you wanted to remove the file. So, requiring the -f flag in
those situation is not a real problem, even if the situation is
slightly-dangerous-but-not-quite-so.

I'm willing to work on another patch on top of this one if there's an
agreement on a better semantics. This one was about fixing something
which was IMHO wrong, but doesn't necessarily achieve perfection ;-).

-- 
Matthieu
-

From: Christian Jaeger
Date: Monday, July 2, 2007 - 2:20 pm

I'm really realising that

git-rm $file # where $file *has* been committed previously

does remove *and* unlink the file. (cg-rm does unlink only with the -f
flag, as said.)

So there's no -f flag in normal git-rm usage. It thus has a different
meaning, namely "force the operation pair of removing from index and
unlinking", not "force this operation also onto the checked out files"
as is the case with cogito.

So I now understand better why they invented the -f flag to git-rm for
the case you're mentioning above and why the hint doesn't warn about
it's danger, since git-rm is always dangerous. (Ok, as is "rm" without
the "-i"; I just found it normal that cogito behaved like my "-i" setup.)

Regarding the issue of "lost files" because they have been created,
added, and removed again before committing: as far as I remember this
has never happened to me with cogito. I commit often, so if I add a file
or a few, in most cases I commit just this fact (that they have been
added), before doing more fancy stuff. I'm maybe used to thinking in
database terms, work that isn't committed is lost. So if I create a file
and add it, in my brain the "attention, uncommitted work" flag is on,
and it usually doesn't happen that I later erroneously think the work
has been committed when in fact it isn't. (I can always check with a
quick cg-status (which shows the files as "A", which makes them stand
out better than in the git-status output)).

Just before writing this mail I had a case where I wanted to remove a
file from versioning control, but keep it on disk (I used git-rm and
that's how I learned that it really also unlinks the local file without
asking(*)). Note that this has not been an undo action; the file has
been committed previously.


(Well it's not safe if you want to remove the file *from the index* and

I think we have just two different points in our view where we think
safety matters.

Regarding the man pages: yes the git-rm man page is fine, and it's nice
to see the ...
From: Jeff King
Date: Monday, July 2, 2007 - 9:12 pm

Yes, git-rm is used in several situations, and the idea is that it
should behave safely in all situations; that is, without -f you can't
delete any data that can't be recovered from your history (but maybe
that means we shouldn't suggest -f in so cavalier a fashion).

Each file has three "states" that we care about: in the HEAD (H), in the
index (I), and in the working tree (W). Let's say you call 'git-rm'.
Here's a table of possibilities (A is some content "A", B is some
content not equal to "A", and N is "non-existant"):

H I W | ok? | why?
---------------------------------------------------
* N * | no  | file is not tracked
N A N | ?   | currently ok, but 'A' recoverable only through fsck
N A A | no  | 'A' recoverable only through fsck
N A B | no  | local modification 'B' would be lost
A A N | yes | 'A' recoverable through history
A A A | yes | 'A' recoverable through history
A A B | no  | local modification 'B' would be lost
A B N | ?   | currently ok, but 'B' recoverable only through fsck
A B A | no  | 'B' recoverable only through fsck
A B B | no  | 'B' recoverable only through fsck
B * * |     | equivalent to H=A

With --cached on, it is a little different:

H I W | ok? | why?
---------------------------------------------------
* N * | no  | file is not tracked
N A N |  ?  | currently ok, but 'A' recoverable only through fsck
N A A |  ?  | currently not ok, but 'A' still available in W
N A B | no  | 'A' recoverable only through fsck
A A N | yes | 'A' recoverable through history
A A A | yes | 'A' recoverable through history or working tree
A A B |  ?  | currently not ok, but 'A' still available in H
A B N |  ?  | currently ok, but 'B' recoverable only through fsck
A B A | no  | 'B' recoverable only through fsck
A B B |  ?  | currently not ok, but 'B' still available in W
B * * |     | equivalent to H=A

So it looks like our safety valve is a bit overbearing in a few
situations, and still misses some situations where data has to be pulled
out of the database ...
From: Junio C Hamano
Date: Monday, July 2, 2007 - 9:47 pm

These were explicitly done per request from git-rm users (myself
not one of them) who wanted to:

	rm the-file
        git rm the-file

sequence not to barf.  I suspect they were from CVS background
who are used to the SCM that complains if you still have the
file in the working tree when you say "scm rm".


I personally do not think we would need any safety check for
"git rm --cached", as it does not touch the working tree.  If
one cares about the differences among three states, one would
not issue "rm --cached" anyway.  The only reason "rm --cached"
is used is because one _knows_ that any blob should not exist at
that path in the index.


-

From: Jeff King
Date: Monday, July 2, 2007 - 9:59 pm

It depends on how we want to define "lost" data. In many cases, we are
protecting against losing content that will still be available until the
next git-prune. Should our safety valve protect against that case, or
should it not? We are totally inconsistent.

The main one for --cached, of course, is when that content exists _only_
in the index, but no longer in the working tree (!A A N or !A A B).  You
really should be using regular git-rm (in the first case, since you are
saying "I don't want this file anymore") or git-add (throw out the old
data, use my new version).

OTOH, clearly git-add can "lose" data in this way as well, since a
"modify, git-add, modify, git-add" will "lose" any reference to the
index state after the first add. So maybe that is not worth worrying
about at all (in which case our safety valve is too strict in many
places).

We could also issue a warning when "losing" reference to data that is in
the object db, which would include the sha1; in that case, an immediate

How about:

  git-add foo
  echo changes >>foo
  # oops, I don't want to commit foo just yet
  git-rm --cached foo

but in that case, maybe the user doesn't actually _care_ about that
intermediate state of 'foo'.

-Peff
-

From: Junio C Hamano
Date: Monday, July 2, 2007 - 10:09 pm

Exactly.  And not considering that lossage helps us keep our
sanity.  I think "git rm --cached" falls into the same
category.  If the user wants to discard what is in the index
without losing a copy in the working tree, I think we should let

Yes, that is (at least, "used to be") exactly the use case "rm
--cached" is supposed to help.  Added something prematurely to
the index, not ready to commit that part of the changes yet.
Of course you could do partial commits with "add --interactive"
these days, so there is not as much need for this as it used to
be anymore.



-

From: Jeff King
Date: Monday, July 2, 2007 - 10:12 pm

OK. So should we _remove_ the safety valve in all cases where we're just
losing stuff that's in the index? It is, after all, recoverable. Should
there be a warning (I suspect it would get annoying very quickly)?

I think this would help by making the use of '-f' more rare, which is
the thing that can _really_ screw you, since it turns off the safety
valve even for things that aren't recoverable.

-Peff
-

From: Junio C Hamano
Date: Monday, July 2, 2007 - 11:26 pm

I personally do not think we would need any safety check for
"git rm --cached", as it does not touch the working tree.  For
non-cached case I think the current behaviour is fine.

But I should warn you that I rarely use "git rm" myself.

-

Previous thread: FC6 now has git v1.5.2.2 by Patrick Doyle on Monday, July 2, 2007 - 10:09 am. (4 messages)

Next thread: challenges using fast-import and svn by David Frech on Monday, July 2, 2007 - 12:26 pm. (3 messages)