Re: [RFC] Two conceptually distinct commit commands

Previous thread: Re: Re: Moving a directory into another fails by Jakub Narebski on Monday, December 4, 2006 - 12:10 pm. (14 messages)

Next thread: [ANNOUNCE] gitfs pre-release 0.04 by Mitchell Blank Jr on Monday, December 4, 2006 - 12:40 pm. (1 message)
From: Carl Worth
Date: Monday, December 4, 2006 - 12:08 pm

[
  I think the proposal below is original, and more correctly captures
  the essence of the "commit interface wart" than any previous
  proposal I've made. This proposal is also based entirely on what is
  useful for all git users, and what I perceive git's conceptual
  models to be. That is, this proposal concerns what _I_, (as a fairly
  experienced git user), actually want, without any bias for any
  assumptions about what an imagined "new user" might want. Notably,
  it does not try to satisfy naive (and likely incorrect) assumptions
  about git's model.

  Finally, this proposal intentionally uses ludicrously long command
  names. This is because a discussion of realistically short names
  triggers the two loaded issues of "muscle memory" and which concepts
  get blessed as "defaults". In previous threads, those issues have
  muddied the conceptual issues I'd like to focus on here. Let's talk
  about the concepts first, and save discussions of naming for later
  if necessary.
]

Proposal
-------
Here are the two commit commands I would like to see in git:

  commit-index-content [paths...]

    Commits the content of the index for the given paths, (or all
    paths in the index). The index content can be manipulated with
    "git add", "git rm", "git mv", and "git update-index".

  commit-working-tree-content [paths...]

    Commits the content of the working tree for the given paths, (or
    all tracked paths). Untracked files can be committed for the first
    time by specifying their names on the command-line or by using
    "git add" to add them just prior to the commit. Any rename or
    removal of a tracked file will be detected and committed
    automatically.

Rationale summary
-----------------
These two commands capture a distinct conceptual split that is useful
for what users want to do with git. The split is necessary and
sufficient to provide access to four different useful pieces of commit
machinery. This is more functionality than in current ...
From: Carl Worth
Date: Monday, December 4, 2006 - 1:10 pm

BTW, the current "apply --index" doesn't allow what I imagined in the
scenario above. It notices that the affected file is different in the
working tree compared to the index and just refuses to do anything.

Given that safety-valve in git-apply, the current behavior of "git
commit paths" would allow for splitting a submitted patch into two
commits.

The difference is that it only works if the local modifications do not
affect any of the same paths as the patch. The user is freed from
worrying about this somewhat, since if it's not the case, then
git-apply just complains and doesn't do anything.

But what might be very interesting is a modified "git-apply --index"
that would not fail in this case, but would instead do the following:

	1. Apply the patch to the working tree

	2. Apply the patch to the index

And of course, if either fails then the entire apply operation fails,
leaving no changes to working tree or to the index.

With that new git-apply behavior, then the scenario I outlined above
would work, and would work in spite of any changes to the same file in
both the working tree and the index. It would also require the
separate commands for commit-index-content vs.
commit-working-tree-content as described in my original message above.

-Carl
From: Horst H. von Brand
Date: Monday, December 4, 2006 - 5:52 pm

Carl Worth <cworth@cworth.org> wrote:


Edit somefile with, e.g, emacs: Get backup called somefile~
Realize that somefile is nonsense, delete it(s edited version)
commit-working-tree-contents: Now you have the undesirable somefile~ saved

Edit somefile, utterly changing it: Get backup called somefile~
mv somefile newfile
commit-working-tree-contents: somefile~ saved, newfile lost

Edit somefile a bit, move it to newfile. Make sure no backups left over.
commit-working-tree-contents: somefile deleted, newfile lost

This is /not/ easy to get right, as it depends on what the user wants, and
the random programs run in between git commands.

You need to tell git somehow what files you want saved, and which ones are
junk. I.e., just the first command (unfortunately).
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239
Casilla 110-V, Valparaiso, Chile               Fax:  +56 32 2797513
-

From: Carl Worth
Date: Monday, December 4, 2006 - 6:18 pm

The semantics I intended to describe for commit-working-tree-content
would not add this file. That's a "new file" so would have to be
mentioned either explicitly on the command-line or in a git-add

OK, you've found a bug in my description above, (though not in the
intended semantics). By "rename...detected automatically" I meant only
that the fact that a file has "disappeared" as part of a rename need
not be mentioned to git. The fact that the contents are made available
as a new file name still would need to be told to git with "git add",

Perhaps I was too oblique in calling this thing
commit-working-tree-contents. This isn't some fabricated-from-scratch
command. The intent of my message was that readers would recognize the
description as matching what the current "commit -a" and "commit
files..."  commands do. So I really wasn't trying to invent anything
really different than those. So almost any problems of unexpected
behavior you can find almost surely apply to "commit -a" already.

I did throw one new thing into the description, (that does not exist
in current git). That's the mention that new files could be added by
mentioning them explicitly on the command-line. This was intended as a
way to allow a tutorial to sidestep the details of how "git add"
interacts with the index. If this one feature is a bad idea, it could
be dropped with no impact on the rest of the proposal nor my
discussion of it.

Similarly, I worded the mention of "git add" to suggest it be done
"just prior to the commit". Again, I did this just to avoid having to
mention anything about the need to "git add" again if the file was
edited between the time of add and the time of commit. That language
is already proposed for the git-add documentation, so there's no need
to repeat it all here.

-Carl
From: Horst H. von Brand
Date: Monday, December 4, 2006 - 7:14 pm

How do you distinguish a "new file, same contents as old file" from "old
file, renamed"? What is the difference between:

  mv somefile newfile

and

  cp somefine newfile
  rm somefile

?

How should

  cp somefile newfile
  vi somefile

be handled? How about

  cp somefile oldfile
  vi somefile

or just

  mv somefile oldfile

? Or

  cp somefile somefile.my-own-bakup
  vi somefile

?

The whole problem is your description based on "file renaming" and
such. AFAIU git has a list of file names it is tracking, and for those
names it keeps track of what the contents for each are at each commit. That
the name somefile had some contents that later show up as newfile (both
names tracked) is recorded just as that. You could /interpret/ this as a
"rename" if somefile is then gone, but it could well be something else.
Besides, you'd have to search for the old somefile contents among /all/
newfiles just to find out it was renamed. Better don't mix facts with
interpretation (== guesses on what operations came in between the snapshots
git takes). Note that it should never matter what strange ideas a random
user gets for naming her temporary backup files, or their git configuration.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239
Casilla 110-V, Valparaiso, Chile               Fax:  +56 32 2797513
-

From: Carl Worth
Date: Monday, December 4, 2006 - 7:32 pm

There is no difference. This is git, a content tracker.


OK. Strike the words "or rename" from the description, leaving just:

	Any removal of a tracked file will be detected and committed
	automatically.

The rest of my analysis still stands, I believe. And I'd be glad to
accept further suggestions on documenting these. The goal is simply to
have a user-oriented description of the semantics that are consistent
between the current "git commit -a" and "git commit files..."
commands.

-Carl
From: Theodore Tso
Date: Monday, December 4, 2006 - 8:51 pm

I think this is a very interesting proposal, although I think I
disagree with the last part:

      Any [rename or] removal of a tracked file will be detected and
      committed automatically.

If adds aren't going done automatically (because otherwise you have
problems with foo.c~ accidentally getting checked it), then it's
non-symmetric to expect that deletes will also happen automatically.
It's relatively rare that files are removed or renamed, and sometimes
files accidentally disappear.  

So in the case where there are no pathnames given to "git
commit-working-tree-content", I would argue that it does not do any
implicit "git add" on new files NOR any implicit "git rm" on missing
files unless the user actually specifies an --implicit-add or
--implicit-delete option, respectively.  If users want to make
--implicit-add and/or --implicit-delete the default, that could be a
configuration option, but I don't think it should be a default.


A second issue which you left unspecified is what should
commit-working-tree-content do if the index != HEAD.  In particular,
in this case:

edit foo.c
git update-index
edit foo.c
git commit-working-tree-content foo.c

What should happen to foo.c in the index?  Should it be stay the same?
Should the contents be replaced with version of foo.c that has just
been commited?  The latter seems to make sense, but runs the risk of
losing the data (what was in the index).  The former has the downside
that the index might have a version of foo.c which is older than what
has been just commited, which could be confusing.  Or should git
commit-working-tree abort with an error message if index != HEAD?

						- Ted
-

From: Junio C Hamano
Date: Monday, December 4, 2006 - 11:33 pm

That is exactly the "'commit --only' jumps the index" issue.

Updating the index with what is committed makes sense because
the commit after this --only commit happens builds on top of it,
and not doing so would mean the change to foo.c would be
reverted.  As you mentioned above, updating the index with the
committed version of foo.c means information loss of what was
staged earliser, and the traditional behaviour has been to
"abort with an error if index != HEAD" at that path, which was a
safety valve.

However, In the recent discussion, everybody (Linus, Nico, and I
included) seems to think this information loss is acceptable and
in fact is even useful.  I've sent a patch to remove the
obsolete safety valve for comments today, but haven't applied it
to any of my public branches yet, but most likely I will, and it
will happen sooner with encouragement from the list.

-

From: Carl Worth
Date: Monday, December 4, 2006 - 11:38 pm

It's non-symmetric, yes, but it's what I would personally like. It's
not an essential aspect of the proposal, so it could go either way as
the git crowd decides.

To explain my personal preference, I like the notion of all files
being "untracked" until I inform the system about their
existence. After that, I'd like the system to take care of them and

The ability to configure --implicit-delete and --implicit-add to

This case is already under debate in a separate thread. There "git
commit files", (which really is commit-working-tree-content already),
currently errors out in this case, but the proposal is to allow it to
proceed with the commit, (thereby "losing" the intermediate staged
content).

-Carl

From: Junio C Hamano
Date: Tuesday, December 5, 2006 - 6:13 pm

(This is offtopic)

I often faced situations like that during git.git history.  One
patch to expose the bug in the existing code, and another to fix
it.  And there are three ways to make that commit.

 (1) one commit exposes, then another fixes.
 (2) one commit fixes, then another verifies the bug is no more.
 (3) one commit to include both.

In my experience, (1) is only useful during the time I am coming
up with the fix (if I am fixing it myself) or during the time I
am reviewing and committing the fix (if I am applying somebody
else's patch).  Committing in that order lets me validate the
brokenness after making the first commit, and then lets me feel
good by not seeing that problem after the second commit.  But
this means I deliberately record a state that is known not to
pass the test, which means it is a problem for somebody else in
the future when the history needs to be bisected to hunt for an
unrelated bug.  If the "test" is just an optional test in the
test suite, then it is easy to work around (the person who is
bisecting can ignore that bug by not running that particular
test), but if it is an assert somewhere deep inside the code,
ignoring it is not very easy, especially if the person who is
bisecting is not familiar with that part of the code.

What I recommend people to do these days is either (2) or (3),
but do so _after_ verifying the fix in the reverse order.  The
criteria to choose between (2) or (3) is fairly simple: if the
"test" is easily separable (e.g. changes to a test script file
that does not overlap with the "fix" patch), roll both in one
commit.  Then it would not later cause problems for bisection.

Enough of offtopic.

The sequence to split a patch in place would be (I'll speak in
the present tense and pretend Nico's "git add" does not exist
yet):

	git apply
        git update-index <files for the first batch>
        git commit
        git commit -a ;# the remainder

so you do not necessarily need a new "concept".  It ...
From: Carl Worth
Date: Tuesday, December 5, 2006 - 9:53 pm

Granted, something like an assert that breaks the library would not be
a useful thing to have in the history. I'm certainly not in favor of
something like that.

I'm talking about tests that demonstrate pre-existing broken-ness in
the code. In the case of cairo, our test suite is entirely optional,
and each test is out-of-process, so even if a test totally crashes the
suite continues and it's easy to ignore that.

But it's not just the correctness test suite. We also have a
performance test suite, and I encourage the same "add test case, then
performance fix" pattern there. This shows an even more obvious
example of why it's useful to have separate commits in the history,
(since someone may want to verify the performance impact on a separate
system at any point in the future, and the two commits makes it easy
to get "before" and "after" for the performance fix results from the

Yes, I can use this today, and I do, (as I mentioned in my mail). The
only requirement is that I start with a non-dirty working tree. I can
arrange that, but it would be just a bit less inconvenient if I didn't

No, I don't need it. And this "commit-index-content [paths...]" was
the least significant part of my proposal. As I said, originally I was
just going to say this "might be useful in some cases", but then
someone just happened to request this feature on the list at the same
time I was considering the proposal.

Anyway, it spite of this being an accidental feature of y proposal, it
seems to be the only part you commented on. Even if this functionality
weren't made available at all, I'd still be interested in your
comments on the main thrust of my proposal. I think that consists of:

	1. Unifying the two current commands that provide
	   commit-working-tree-content semantics into a single,
	   use-oriented description.

	2. Avoiding a change of semantics triggered by merely applying
	   pathname arguments without any command-line option or

Indeed. I like that discipline very much. And ...
From: Johannes Schindelin
Date: Wednesday, December 6, 2006 - 2:54 am

Hi,


Note that (1) maybe would reflect history better, but (2) and (3) are way 
nicer to bisecting.

I fell very strongly that I want (3) in the history.

(Though I am guilty of many instances of (2)...)

Ciao,
Dscho

-

From: Carl Worth
Date: Wednesday, December 6, 2006 - 9:14 am

By the way, the original command-line convention I used in the
proposal was that the omission of an optional argument should be
equivalent to supplying some default argument. Here's another
convention that is also useful to examine:

	Adding path-name arguments limits the behavior of the command,
	(and does not otherwise change the semantics).

I don't know that this is as universal a convention outside of git,
but it's quite strong within git. The path name limiting exists in
deep parts of the machinery and allows for things like:

	git log -- paths...	# path-limited version of "git log"
	git diff -- paths...	# path-limited version of "git diff"
	etc.

It's interesting to look at how the various commit commands fit (or
do not fit) this convention:

  git commit paths...
  git commit --only paths...

	This command cannot be explained in terms of the semantics of
	"git commit" (without command-line options). This command
	_can_ be explained as a path-limited version of "git commit -a".

  git commit --include paths...

	This command does something _extra_ to the given paths before
	executing the equivalent of "git commit". I think this is a
	fairly unique violation of the path-limiting convention.

The proposal I made with commit-working-tree-content and
commit-index-content consistently follow the path-limiting convention.

I think consistency of command-line conventions like this are
important for making the tool usable. And there have been notable
improvements to consistency of convention in git recently, (for
example, using <since>..[<until>] in git-format-patch rather than
<his> <mine>).

-Carl
Previous thread: Re: Re: Moving a directory into another fails by Jakub Narebski on Monday, December 4, 2006 - 12:10 pm. (14 messages)

Next thread: [ANNOUNCE] gitfs pre-release 0.04 by Mitchell Blank Jr on Monday, December 4, 2006 - 12:40 pm. (1 message)