Re: [RFC] git-add update with all-0 object

Previous thread: Re: [PATCH 0/2] Making "git commit" to mean "git commit -a". by Jakub Narebski on Thursday, November 30, 2006 - 5:41 pm. (3 messages)

Next thread: Re: [RFC] git-add update with all-0 object by Jakub Narebski on Thursday, November 30, 2006 - 6:41 pm. (2 messages)
To: <git@...>
Cc: Junio C Hamano <junkio@...>
Date: Thursday, November 30, 2006 - 6:08 pm

One thing that I think is non-intuitive to a lot of users (either novices
or those who just don't do it much) is that it matters where in the process you
do "git add <path>" if you're also changing the file. Even if you
understand the index, you may not realize (or may not have internalized
the fact) that what git-add does is update the index with what's there
now.

I think the more obvious behavior is to have it record the fact that you
want to have the path tracked, but require one of the usual updating
mechanisms to get a particular content into the index.

This should be pretty simple to implement: use --cacheinfo 0 0 $path
instead of --add -- $path, and teach programs that look at the objects
recorded in the index (rather than just hashes or other info) about all-0
hashes meaning "but no content there". write-tree would probably just
skip the entry (and then you could add a file, but still produce commits
without it until you actually do either an update-index explicitly or one
of the commit option sets that updates it); diff would treat it as empty;
checkout would ignore it.

-Daniel
*This .sig left intentionally blank*
-
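
The behavior described in the message above can be reproduced in a scratch repository. This is only a minimal sketch, assuming a reasonably recent git and a POSIX shell; the temporary directory and the file name notes.txt are made up for illustration. It shows the existing "git add" behavior, not the all-0 proposal.

```shell
# Scratch repository; the path and file name are illustrative only.
repo=$(mktemp -d)
cd "$repo" && git init -q

echo "first draft" > notes.txt
git add notes.txt                 # the index records the content as it is right now
echo "second draft" > notes.txt   # later edit, never staged

# Content held in the index versus content in the working tree.
staged=$(git show :notes.txt)
working=$(cat notes.txt)
echo "index:        $staged"
echo "working tree: $working"
```

A plain "git commit" at this point would record "first draft"; the later edit only goes in after another "git add" (or with "git commit -a").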

To: Daniel Barkalow <barkalow@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Thursday, November 30, 2006 - 6:46 pm

While this certainly matches the git model better than just automatically
taking whatever state exists at commit time (you instead introduce it as a
special "empty state" case), I don't think you really want it.

Why?

Two reasons:

- you're still left with all the same issues (ie you do need to use "git
commit -a" because that is simply fundamental, and if you don't, "git
commit" now causes an ERROR, which is just illogical - you just added
the data!)

So it's simply better to just tell people "git add" adds the whole
state. Explain to them that git doesn't track "filenames", it tracks
state, and when you do a "git add", it really adds the _data_ and the
permissions too.

Really, if you didn't come from years of broken SCMs, you'd think that
it's _natural_ that when you add a file for tracking, you add its
contents too. It's not that git is surprising or unnatural, it's that
CVS is.

- you generally really don't want to see "git diff" show you the big diff
for a new creation. You only think you do, but trust me, you generally
don't. It's the same thing as with doing merges - keeping the
automerged state in the index is actually nice, because it means that
the default "git diff" can just shut the heck up about the things that
may be the _bulk_ of the change, but it's not the interesting part.

So I would suggest that if people are irritated with "git diff" for
example not showing newly added files AT ALL, then the solution to that
isn't that they should be added as "empty" or "all zeroes". We do have
other state bits in the index already (we need them for marking things as
being unmerged etc), and if the problem is that you want to see that you
have a pending add, it's easy enough to have "git add" always set a bit
saying "this file is new".

A normal "read tree object" would populate index entries with that bit
cleared, and so it would be possible to have

	git add file.c
	git diff

show so...
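
For reference, the situation under discussion, where a freshly added file is invisible to plain "git diff", can be seen with the commands below. This is a hedged sketch against current git, not the "new" bit suggested in the message; the scratch directory is an assumption, and file.c is simply the name used in the message.

```shell
# Scratch repository; the location is illustrative only.
repo=$(mktemp -d)
cd "$repo" && git init -q

printf 'int main(void) { return 0; }\n' > file.c
git add file.c

# Index versus working tree: identical right after the add, so no output.
unstaged=$(git diff)
# Index versus HEAD (no commit exists yet): the new file shows up as added.
staged=$(git diff --cached --name-status)
echo "git diff:          '${unstaged}'"
echo "git diff --cached: ${staged}"
```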

To: Linus Torvalds <torvalds@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Thursday, November 30, 2006 - 8:12 pm

I'm not sure I want to see the whole added file more when diffing two
trees, or when I do "git diff --cached" after "git update-index --add",
than when I do "git diff" after "git add", but I'll concede that viewing
the content of a new file as a diff is no fun. (Maybe diff-against-nothing
for display needs work in general? It'd solve the whole root commit thing,

This is where I think "git add" is really broken. For every other git
command, if the command causes the index to not match HEAD, the command
contains "index" either in the name of the command or in an option.

So, if you understand the index, and you understand git's model, but you
don't know this one weird corner case, you will come to the conclusion
that "git add <path>" leaves <path> such that the index matches HEAD.

Now *you* know that "git add" really is "git update-index --add", because
you were typing the latter (well, "git update-cache --add", anyway) before
"git add" existed at all. But for new users, and anyone who wasn't adding
a lot of files back then, it's a surprising exception that has to be
learned and internalized.

"git checkout" leaves the index matching HEAD or its original state.
"git commit" leaves the index matching HEAD (the new HEAD) or its original
state.
"git reset" (all options) leaves the index matching HEAD or its original
state.
"git pull/merge" does disrupt the index, but it also starts to prepare a
commit based on multiple *HEAD files, and it leaves every stage of the
index matching some *HEAD or its original state. And new users still seem
to wonder where the merge happens, because it doesn't say "in the index".
"git apply" leaves the index alone.

"git update-index" says it works on the index.
"git apply --index" says it works on the index.

Am I missing any violations of the rule? I guess "git rm", but that's just
for the CVS-damaged, unnecessary anyway, and it still doesn't care about
the state of the working directory at any particular point in t...
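
Whether a command leaves the index matching HEAD can be checked mechanically. The following is a small sketch, assuming current git; the file names and the inline commit identity are placeholders chosen for the example. It relies on "git diff --cached --quiet" exiting with 0 when the index matches HEAD and 1 when it does not.

```shell
# Scratch repository; names and identity below are illustrative only.
repo=$(mktemp -d)
cd "$repo" && git init -q
echo one > a.txt
git add a.txt
# Identity passed inline so the sketch does not depend on global config.
git -c user.name=Example -c user.email=example@example.invalid commit -q -m "initial"

after_commit=0; git diff --cached --quiet || after_commit=$?

echo two > b.txt
git add b.txt                      # "git add" makes the index differ from HEAD
after_add=0; git diff --cached --quiet || after_add=$?

git reset -q                       # puts the index back to HEAD
after_reset=0; git diff --cached --quiet || after_reset=$?

echo "after commit: $after_commit, after add: $after_add, after reset: $after_reset"
```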

To: Daniel Barkalow <barkalow@...>
Cc: Junio C Hamano <junkio@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Friday, December 1, 2006 - 12:57 am

But it's not just this one weird corner case. You yourself said that
"git pull/merge" leave the index where it's != HEAD.

I have serious trouble believing that "if the command leaves index !=
HEAD, the command must contain 'index' in either the name of the
command or the option" is all that important of a consistent rule or
principle that must be maintained at all costs.

By the way, after thinking about this for a while, part of the problem
is that the name "index" really sucks. Which is perhaps why Linus is
now trying to stop us from actually using the term "index" in these
discussions. :-) If we called it a "staging area", as our Great
Leader has suggested, I think it would be a lot easier for novice
users to understand. Consider what is in the git man page:

The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random
time. It does so by a simple array that associates a set of
names, dates, permissions and content (aka "blob") objects
together. The cache is always kept ordered by name, and names
are unique (with a few very specific rules) at any point in
time, but the cache has no long-term meaning, and can be
partially updated at any time.....

In particular, the index file can have the representation of
an intermediate tree that has not yet been instantiated. So
the index can be thought of as a write-back cache, which can
contain dirty information that has not yet been written back
to the backing store.

For a kernel programmer, this might not be understandable --- but for
your typical application programmer, this is enough to cause him or
her to conclude that git is simply not meant for use by mere mortals.

So as Junio and Linus have both said, it's all about your mental
model, and if we think about it in terms of a staging area for a
commit, and we think about what commands are most natural given that
model, it's far more important than whether a command has "index" in
its n...

To: Theodore Tso <tytso@...>
Cc: Junio C Hamano <junkio@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Friday, December 1, 2006 - 4:10 am

My position on this subject is that "index" is a good name, but that
description is a terrible description, and "index" is a word that needs a
good description in context. If we just said up front:

Git's "index" is a staging area that you use to prepare commits. It maps
filenames to content. It allows git to remember changes you want to put
into the next commit while you do more work. For normal commits, it is
not necessary to use the index, but it is very helpful for complicated
commits, because it lets you focus on the part you're still working on
while git remembers the part you're done with.

I think people would get it. (If it were called the "cache" still, it
would be hopeless, because "cache" implies false things; "index" doesn't
imply anything initially.)

Of course, we'd still have to disabuse people of the notion that the index
can store the information "there's nothing at this path yet, but I'm
interested in it", because that's a piece of information people often know
before a file is ready, and think git would be able to remember in a
staging area.

-Daniel
*This .sig left intentionally blank*
-

To: <git@...>
Date: Friday, December 1, 2006 - 5:37 am

If we need to explain what "index" means in the context of diff then it's not
a good name :-)

An index /everywhere else/ is a lookup table: topic->page number;
author->book title; record id->byte position. There is never any content in
an index, indices just point at content.

I imagine that's how git's index got its name. (I'm only guessing as I've
not looked at what's actually inside git's "index"). Here's my guess:

git update-index file1 hashes file1, stores it somewhere under that hash and
writes the hash->filename connection to .git/index. That is why git's index
is called an index. It's a hash->filename index.

Unfortunately, "index" in colloquial git actually means the combination
of .git/index plus the hashed file itself. That's no longer an index, it's
a "book". :-)

It's made worse, I think, by the fact that git doesn't want to do any
index-like things with the "index". Being content-oriented rather than
name-oriented means that an entry like "file1->NOTHING" is impossible in git.
This leads to the sort of "git-add means track this filename" confusion that
turns up a lot with new users.

It's probably all too late to change the nomenclature, but I've always been of
the opinion that names are important, they confer meaning. When we use a
common word, with common meaning and deviate from that common meaning we are
bound to create confusion. New users don't have any "git-way-of-thinking"
knowledge when they begin, so when they hear "index" they can only fall back
on their standard understanding of that word. We shouldn't be surprised then
when new users don't get "the index".

Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com
-
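
The guess in the message above can be checked directly. In current git, "git ls-files --stage" prints the index entries, which are keyed by path and each carry a mode and the hash of a blob kept in the object database, so the mapping runs filename->hash rather than hash->filename. A minimal sketch, assuming a scratch directory (file1 is the name used in the message):

```shell
# Scratch repository; the location is illustrative only.
repo=$(mktemp -d)
cd "$repo" && git init -q
echo hello > file1
git add file1

# One line per index entry: <mode> <blob hash> <stage><TAB><path>
entry=$(git ls-files --stage)
# Hash git computes for the file's current content.
blob=$(git hash-object file1)
echo "$entry"
echo "$blob"
```

The hash in the index entry is the same one "git hash-object" reports for the file, which is why staging a path always stages some concrete content.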

To: Theodore Tso <tytso@...>
Cc: Junio C Hamano <junkio@...>, Daniel Barkalow <barkalow@...>, <git@...>
Date: Friday, December 1, 2006 - 3:10 am

Hey, it was originally called "cache".

I don't care _what_ it's called, I just want people knowing about it,
because hiding it will just cripple git (ie at the very least, when you
hit a merge conflict, you really do want to understand it if you ever
want to go to the "next level").

If people are more comfortable just calling it the "staging area", and

Yes.

And even "git diff" isn't really a problem once you understand the staging
area. If people feel worried, let them use "git diff HEAD". You won't need
to use git for _that_ long until you realize that since the staging area
is going to match the HEAD under normal circumstances (and when it
doesn't, you actually tend to prefer to get the diff against the staging
area _anyway_), you'll find people just starting to use "git diff" and not
worry about it.

Linus
-
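
The difference between "git diff" and "git diff HEAD" mentioned above shows up as soon as something has been staged. A hedged sketch, assuming current git; the file names and the inline commit identity are invented for the example.

```shell
# Scratch repository; names and identity below are illustrative only.
repo=$(mktemp -d)
cd "$repo" && git init -q
printf 'a\n' > staged.txt
printf 'b\n' > unstaged.txt
git add staged.txt unstaged.txt
git -c user.name=Example -c user.email=example@example.invalid commit -q -m "initial"

printf 'a2\n' > staged.txt && git add staged.txt   # change placed in the staging area
printf 'b2\n' > unstaged.txt                       # change left in the working tree only

vs_index=$(git diff --name-only)        # working tree versus staging area
vs_head=$(git diff --name-only HEAD)    # working tree versus last commit
echo "git diff:"
echo "$vs_index"
echo "git diff HEAD:"
echo "$vs_head"
```

Plain "git diff" lists only the file whose change has not been staged, while "git diff HEAD" lists both.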

To: Theodore Tso <tytso@...>
Cc: Daniel Barkalow <barkalow@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Friday, December 1, 2006 - 2:20 am

All good points. The only slight worry I have is that just
moving EXAMPLE up deviates from the traditional UNIX manpage
order of presenting information.

I think the plumbing manuals can (and probably should) stay as
the technical manual for Porcelain writers. "git diff", "git
add" and friends that are clearly Porcelain should talk about
what they do in terms of end-user operation in the
DESCRIPTION section and put less stress on how things work
behind the scenes in technical terms. For example, from
git-diff(1):

DESCRIPTION
-----------
Show changes between two trees, a tree and the working tree, a
tree and the index file, or the index file and the working tree.
The combination of what is compared with what is determined by
the number of trees given to the command.

That may be an accurate description of what the command does in
technical terms, but it does not tell why the user may want to
compare "a tree and the working tree". The users would want to
know which case applies to their current situation and we should
make it easier for them to find that information.

For example, although --cached is technically speaking one of
the --diff-options, it should be separated out from other
options when we talk about 'git-diff'. Also, although 'git-diff'
is designed to work on tree-ish, Porcelain users will use it with
commit-ish (either a commit or an annotated signed tag that
points at a commit) 99.9% of the time, so we should mention
<tree-ish> at the end as a sidenote and talk about <commit>.

DESCRIPTION
-----------
This command shows changes between four combinations
of states.

* 'git-diff' [--options] [--] [<path>...]

is to see the changes you made relative to the index
(staging area for the next commit). In other words, the
differences are what you _could_ tell git to further add
to the index but you still haven't. You can stage
these c...

To: Daniel Barkalow <barkalow@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Thursday, November 30, 2006 - 6:34 pm

And actually I think this is a good thing. This is what makes the index
worth it. Better find a way to make it obvious to people what's
happening.

Nicolas
-

To: Daniel Barkalow <barkalow@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Thursday, November 30, 2006 - 6:32 pm

Hi,

I fear that this is just your being used to the CVS mindset. Please see
http://article.gmane.org/gmane.comp.version-control.git/32792 for details.

Hth,
Dscho

-
