Re: [FAQ?] Rationale for git's way to manage the index

To: <git@...>
Date: Sunday, May 6, 2007 - 12:10 pm

Hi,

I've read the manual, and I believe I have a correct understanding of
how the index works, technically speaking. Still, I'm not clear about
the rationale for such a design.

Almost any other decent system has an equivalent to cache the stat
information (bzr calls this stat-cache, hg calls it dirstate IIRC).
That is, if you run "$vcs diff" twice, the second run will only need
to stat all files, never diff them.

But the fact that git actually remembers the _content_ of files in the
index, and that the default behavior for "commit" is to commit only
the content that is explicitly "git add"ed is something I've never
seen outside git.
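
A minimal sketch of that behaviour, assuming a throwaway repository created
only for the demonstration (the file name, contents and identity below are
made up): the same path can hold three different contents at once, in HEAD,
in the index and in the working tree, and a plain "git commit" records the
index copy.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "v1" > file.txt
git add file.txt
git commit -q -m "initial"

echo "v2 staged" > file.txt      # edit the file...
git add file.txt                 # ...and record that content in the index
echo "v3 not staged" > file.txt  # keep hacking without "git add"

in_head=$(git show HEAD:file.txt)   # content of the last commit
in_index=$(git show :file.txt)      # content remembered by the index
in_tree=$(cat file.txt)             # content on disk

git commit -q -m "commit what was added"   # no -a: commits the index copy
after_commit=$(git show HEAD:file.txt)
still_dirty=$(git diff --name-only)        # index vs. working tree
```

After the commit, "git diff" still lists file.txt, because the latest edit
was never added.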

At first, I find it rather annoying. My usual workflow is

<hack hack hack>
% $vcs status
% $vcs commit -m "describe whatever I did"
<hack hack hack>
...

With git, I'd do

<hack hack hack>
% git status
% git add X
% git add Y
% git status
% git commit

or

<hack hack hack>
% git status -a
% git commit -a -m "..."

In the former case, I have more commands to type, and in the second
case, I lose part of the stat-cache benefit: If I run "git status -a"
twice, the second run will actually diff all the files touched since
the last run, since "git status -a" actually updated a temporary
index, and discarded it afterwards, so it doesn't update the stat
information in the index (while "git status" would have).

In both cases, I can't really see the benefit. I'm pretty sure this is
a FAQ, and I'm also pretty sure there are good arguments for it, but I
can't find it anywhere.

Thanks for your answers,

-- 
Matthieu
-
To: <git@...>
Date: Monday, May 7, 2007 - 7:40 am

Hi,

As a newbie, I agree with Matthieu: Git's index is surprising
for people coming from the CVS/SVN (mindless?) world. So good
documentation about this, even in tutorials, is really important.

In order to improve my productivity with Git, and in order to avoid
traps around moving from SVN to Git, I often use the Git Emacs mode.
It is really useful for beginners as it works similarly for CVS, SVN
and Git: synthetic view of all modifications, easy selection of what
will be committed... The biggest drawback of this "porcelain": using
it, you do not come to understand Git's index philosophy.

-- 
Guilhem BONNEFILLE
-=- #UIN: 15146515 JID: guyou@im.apinc.org MSN: guilhem_bonnefille@hotmail.com
-=- mailto:guilhem.bonnefille@gmail.com
-=- http://nathguil.free.fr/
-
To: Guilhem Bonnefille <guilhem.bonnefille@...>
Cc: <git@...>
Date: Monday, May 7, 2007 - 6:23 pm

I think that the confusing thing isn't really the index, but the fact that 
git, by default, will make commits where the content in the commit is 
different from the content in the working directory. (In fact, you can use 
git-hash-object --stdin and git-update-index --cacheinfo to do a commit 
which shares no content at all with any present or past state of the 
working directory!)
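
A sketch of that parenthetical, assuming an empty throwaway repository; the
path ghost.txt and its content are hypothetical. Note that hash-object needs
-w to actually store the blob, which the sentence above leaves implicit.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Store a blob straight from stdin; nothing is written to the working tree.
blob=$(echo "never existed on disk" | git hash-object -w --stdin)

# Register that blob in the index under a path of our choosing.
git update-index --add --cacheinfo 100644 "$blob" ghost.txt

# A plain commit records the index, regardless of what is on disk.
git commit -q -m "commit content that was never in the working tree"

committed=$(git show HEAD:ghost.txt)
if [ -e ghost.txt ]; then on_disk=yes; else on_disk=no; fi
```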

In other version control systems, you have to use some option or argument 
to make that kind of non-matching commit (and you're generally limited in 
how your commits can fail to match the working directory). I think the 
confusion is that git requires an option to say that you want the commit 
to match the working directory, as opposed to creating a non-matching 
commit, which is generally the more advanced and more unusual case.

I think this is why people mostly get to understand the index by way of 
using it to resolve a conflicted merge: in that case, you have to make the 
index match the working directory before committing, and the index tracks 
your progress in reaching this state, which is the intuitive use of the 
index in normal situations.

	-Daniel
*This .sig left intentionally blank*
-
To: Guilhem Bonnefille <guilhem.bonnefille@...>
Cc: <git@...>
Date: Monday, May 7, 2007 - 8:55 am

Hi,


So, you are not only a newbie, but you have to unlearn some CVS 
braindamage.

I don't know how to make it even more prominent that CVS users should read 
a special introduction first. AFAICT such a hint is in all the appropriate 
places. (I mean, you would not expect to be able to fly a plane, just 
because you have learnt to drive a car, would you?)

Ciao,
Dscho

-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Guilhem Bonnefille <guilhem.bonnefille@...>, <git@...>
Date: Wednesday, May 9, 2007 - 9:14 am

Hi,


  http://www.kernel.org/pub/software/scm/git/docs/tutorial.html does not
talk about anything like that (it links to "Git for CVS users" but
that's really just about importing from CVS and the shared repository
workflow).

  On the other hand, I think the tutorial linked above gives quite a
clear explanation of git commit -a, git add etc. Guilhem, what do you
find missing in the tutorial about this topic?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Guilhem Bonnefille <guilhem.bonnefille@...>, <git@...>
Date: Monday, May 7, 2007 - 3:31 pm

Well, people worried that documentation and command set before
1.5.0 exposed index too much, making learning curve too steep by
having one extra thing people need to learn before starting to
be productive with git.  Now post 1.5.0 people are confused,
quite rightly, that they are not told about index early enough.


Let alone flying.  Just taxiing straight was hard for me until I
shook the habit I picked up from driving a car.


-
To: Guilhem Bonnefille <guilhem.bonnefille@...>
Cc: <git@...>
Date: Monday, May 7, 2007 - 8:16 am

git-gui is a good tool here (so good, in fact, that this is the second
time today I spam the list about it). It shows very pedagogically the
diff between HEAD and index, and the diff between index and working
dir, and allows you to point and click your way to committing
precisely the subset of changes you intended to commit. As an added
bonus, it's perfectly usable even if you don't know anything about
emacs.

-- 
Karl Hasselström
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 1:25 pm

Yeah. You'd better get used to it, because it's fundamental.

Here's the rationale list:

 - It's fundamentally the only sane thing to do.

   Git tracks content at _all_ levels, not "files". So this is more than 
   an implementation issue, it's a fundamental "how the world works" 
   issue. The fact that everybody else gets it wrong is _their_ problem. 

   [ Corollary: the fact that your brain has rotted from using those 
     broken systems is obviously your problem, and sadly, there's nothing 
     else we can do than try to show the right way and hope that the 
     neurons re-generate. CVS has caused endless suffering, this is just 
     one small example of it ]

 - You fundamentally cannot do it any other way. 

   Not doing it the way git does it (point to the content) means that the 
   index-replacement has to point to something else, namely a "file ID". 

   That's so broken as to be really really sad. In CVS, for example, there 
   obviously isn't any "file ID", so what does the "index" in CVS point 
   at? 

   Right. The "index" in CVS is the Entries file, and it not only lacks 
   stat information, it also lacks any other information, which means that 
   the "file ID" is _literally_ just the pathname itself. That causes 
   obvious problems, so nobody sane would ever suggest that this is a good 
   idea.

   So what do other people use? They tend to not have understood the 
   "content is king" thing (which is what git uses), so they add something 
   *else* to the "index" file than the content. What can I say? People are 
   morons. I'm constantly amazed at just how stupid SCM people seem to be.

   In most systems, that "something else" is a "file ID". That just means 
   that they are fundamentally broken whenever they do any trivial merge 
   with renames. Just don't do it. I've talked before about why tracking 
   file ID's is wrong - it's just as wrong as thinking that the "ID" of a 
   file is the path. 

 - Tracking content in the index is f...
To: <git@...>
Date: Sunday, May 6, 2007 - 2:23 pm

Thanks a lot for the detailed explanations.

Note that I'm not "complaining", but just not understanding something.

(I would actually complain about the documentation not being clear
enough, but I'll try to complain with a contribution instead ;-) I'll

Well, git's index still tells more than "the content FOOBAR exists,

Of course, I don't have a strong argument against it. The biggest
annoyance is that my fingers are used to "commit -m message", and now
type "commit -a message", but ...

The reason why I'm posting this is that I was wondering whether
"commit -a" not being the default was supposed to be a message like
"you shouldn't use it too often".

It seems it isn't. I'll just get used to "commit -a" (and probably
alias it), and discover the actual benefits of the index little by
                                   ^^^
Not sure this was intentional, but your spelling of "as" when used to
talk about CVS seems to reveal something about your state of mind ;-).

Thanks,

-- 
Matthieu
-
To: <git@...>
Date: Wednesday, May 9, 2007 - 1:18 pm

As promised, here's a FAQ entry on the wiki:

http://git.or.cz/gitwiki/GitFaq#head-3aa45c7d75d40068e07231a5bf8a1a0db9a8b717

Feel free to correct it.

Anyway, thanks for the interesting discussion.

-- 
Matthieu
-
To: <git@...>
Date: Monday, May 7, 2007 - 11:16 pm

Heh. Making the index very visible makes sense when you are merging,
Linus and Junio are both integrators and spend a lot of time merging.
Hence the default is for git-commit to observe the index.

I agree with Linus' other points too, but at the end of the day, it
makes life easier and saner mainly when merging, at the expense of
having to pay a bit more attention in common commits. The tradeoff
makes sense _especially_ if you are the integrator.

So I do git-commit -a, and typing that '-a' is small price to pay for
the best SCM I've ever used ;-)

cheers,


martin
-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Tuesday, May 8, 2007 - 7:07 am

Hi,


You're saying that the main use of the index is to help merging. I have to 
disagree strongly.

When I have been chasing a bug all over the place, and finally found it, 
my working tree is a mess. Lots of assertions, lots of debugging 
statements, some of them commented out. So, now it is cleanup time, right?

The problem is that more often than not, I broke my fix while cleaning up.

Therefore, I now put all changed files into the index (git add -u), and 
clean up the files one by one, always checking with "git diff" and "git 
diff HEAD" what I still have to do.

Yes, very often I can just take the original version of a file (git reset 
--soft <file...> would be handy here), but it helped me quite a number of 
times to have my messed-up-but-working state in the index.

In a sense, I am using the index as the stash commit we talked about every 
once in a while.
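
A minimal sketch of this cleanup loop, assuming a throwaway repository and a
hypothetical one-line prog.c: the messy-but-working state is parked in the
index, "git diff" shows what the cleanup changed since then, "git diff HEAD"
shows the whole fix, and "git checkout -- <file>" brings the parked state
back if the cleanup broke something.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'int main(void) { return 1; }\n' > prog.c
git add prog.c
git commit -q -m "buggy version"

# Messy but working: the real fix plus leftover debugging.
printf 'int main(void) { /* DEBUG */ return 0; }\n' > prog.c
git add -u                      # park the working state in the index

# Clean up in the working tree only.
printf 'int main(void) { return 0; }\n' > prog.c
cleanup=$(git diff)             # index -> working tree: the cleanup so far
whole_fix=$(git diff HEAD)      # HEAD -> working tree: everything to commit

# Suppose the cleanup broke the fix: restore the parked state from the index.
git checkout -- prog.c
restored=$(cat prog.c)
left_over=$(git diff --name-only)
```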

Ciao,
Dscho

-
To: Martin Langhoff <martin.langhoff@...>
Cc: <git@...>
Date: Tuesday, May 8, 2007 - 12:45 am

It is definitely true that some of the advantages of the way git does the 
index really start shining when merging and you have content conflicts. 
What we've done to "git diff" really makes things a lot easier (and 
anybody who hasn't used "gitk --merge" after a content conflict really 
hasn't realized how *helpful* git is when merging content conflicts).

However, in all honesty, while the whole "index for merges" comes from 
pretty damn early in git history (the whole "stage number" thing appeared 
on April 15th 2005 - so it was about a week after the first release), it 
wasn't the original impetus of the way git works.

Git used explicit index updates from day 1, even before it did the first 
merge. It's simply how I've always worked. I tend to have dirty trees, 
with some random patch in my tree that I do *not* want to commit, because 
it's just a Makefile update for the next version (to remind me - I've 
released kernel versions too many times with an old version number, just 
because I forgot to update the Makefile).

Or other things like that - I have small test-patches in my tree that I 
want to build, but that I don't want to commit, and I end up doing big 
merges and whole patch-application sequences with such a dirty tree 
(obviously if the patch or merge wants to change that file, I then need to 
do something about that dirty state, but it happens surprisingly seldom).

So the whole "update stuff to be committed explicitly" ends up _really_ 
shining during a merge, but it actually is how I do non-merge development 
too.
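
A sketch of that habit, assuming a throwaway repository with two hypothetical
files: a Makefile edit is kept in the working tree as a reminder while an
unrelated change is added and committed on its own.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "VERSION = 1" > Makefile
echo "old driver code" > driver.c
git add Makefile driver.c
git commit -q -m "initial"

echo "VERSION = 2-rc" > Makefile          # reminder for the next release
echo "new and improved driver code" > driver.c

git add driver.c                          # only this goes into the index
git commit -q -m "driver: update"

committed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
dirty=$(git diff --name-only)             # what is still uncommitted
```

The Makefile change stays in the tree, untouched by the commit.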

			Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 9:41 am

Hmm, does this really work so well for you guys? Because thanks to Mr.
Murphy, in my case, when I have some custom Makefile tweak, I always
need to commit some unrelated changes involving Makefile more often than
usual, and so on; so in general case, file-level changes exclusion
doesn't really work so well for me.

So this use of index seems to me really as a workaround for more
fine-grained change control (in a similar way that rename following
would be a workaround for lack of more fine-grained content moves
tracking). I will have to look into git-gui's hunk-level control and
maybe reimplement it in tig.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Petr Baudis <pasky@...>
Cc: Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 11:52 am

Well, one thing is that I obviously mainly work on a relatively large 
project, and one that has been carefully de-centralized over a long long 
time, so the source code I work with - the kernel - may be more amenable 
to my workflow than most.

For example, we have long long since tried to avoid having central files 
that everybody changes - because it's such a pain to manage, even with 
good automated merging (and even more with central people still using just 
series of patches).

In other words, in well-maintained larger projects, you simply don't see 
those kinds of conflicts very often: people don't work on the same files 
very much. I regularly go for days, and easily merging hundreds of 
thousands of lines of changes, with a dirty tree, and the merges don't 
affect it at all.

And if I happen to hit a dirty file, the pull will just say "cannot 
merge", and I can stash away my changes, and just re-do. So the cost of a 

Many people seem to enjoy per-hunk commits, but I seldom do that. Maybe 
it's just because I'm *so* comfortable with diffs, that when I clean up an 
ugly sequence of commits, what I do is literally:

 - I make sure that my ugly sequence of commits is on some temporary 
   branch, but that the _end_result_ is good and clean (ie I will have 
   tested the end result fairly well, and made sure that there are no 
   debug statements etc crud left).

   I would call this branch something like "target", because the end 
   result of that branch is what I'm looking for - even if the commits in 
   the sequence that gets me there are individually ugly!

 - I just switch back to my starting point (and now I'm usually on 
   "master"), and do

	git diff -R target > diff

   to create a diff of my current tree (which is initially the starting 
   point) to the good result.

 - I actually edit the "diff" file by hand, and edit it down to the part I 
   actually want to commit as the first in the series. And then I just do 
   a "git-apply diff" to actu...
To: Linus Torvalds <torvalds@...>
Cc: Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 8:31 pm

I obviously agree with this.  As I said a few times I regret
introducing "add -i" --- it encourages a wrong workflow, in that
what you commit in steps never matches what you had in the working
tree and could have tested until the very end.

-
To: Junio C Hamano <junkio@...>
Cc: Linus Torvalds <torvalds@...>, Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Thursday, May 10, 2007 - 6:06 pm

Which is why I'm considering shelving support (of some kind) in
git-gui...  but I'm probably not going to take away the current
index view, nor am I going to take away the current hunk selection.

But I would like to make it easier for non-patching-editing gods
(Linus) to pull hunks in from a shelf, test them, and commit them.

Said shelf probably would be another branch, much as Linus' nicely
documented workflow does...

-- 
Shawn.
-
To: Shawn O. Pearce <spearce@...>
Cc: Junio C Hamano <junkio@...>, Linus Torvalds <torvalds@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Thursday, May 10, 2007 - 6:51 pm

FWIW, Cogito supports shelving of uncommitted changes when switching a
branch (so that they are not retained through the switch but restored
when you switch back to the original branch) by committing the local
changes to refs/shelves/HEADNAME.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Junio C Hamano <junkio@...>
Cc: Linus Torvalds <torvalds@...>, Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 10:27 pm

On the other hand, not all changes require any testing at all. For 
example, if you're using git to manage documentation, it is totally 
reasonable to commit a fix for a simple spelling error in one part of a 
file while not committing an in-progress rewrite of another part.

-Steve

-
To: Steven Grimm <koreth@...>
Cc: Junio C Hamano <junkio@...>, Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 10:39 pm

Yeah, I don't think "git add -i" is a horrible flow - it just shouldn't be 
the only or the primary one (ie apparently it *is* the primary one for 
darcs, and that's a mistake!)

Of course, whether "git add -i" is a nice interface or not, I dunno. 
Personally, if I wanted to do hunk selection, I think I'd stick to 
something graphical where I can just click on the hunks. But that's just 
me.

		Linus
-
To: <git@...>
Date: Thursday, May 10, 2007 - 4:00 am

Note that darcs has a way to test before commit even for partial
commits. It re-creates your working tree, hardlinking unmodified
files, and runs a command there as a precommit hook.

I still prefer the old good "you commit what's in the tree, and run
whatever you want before commit", but their approach seems interesting
also in this case.

-- 
Matthieu
-
To: Linus Torvalds <torvalds@...>
Cc: Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 1:39 pm

It only sounds like a complicated sequence because you didn't write a 
script to do it...

$ git checkout -b clean origin
$ git-refine target
  (edit the patch in the editor that pops up)
$ git-refine
Test changes and commit
$ make test
...
$ git commit
  (write message)
$ git-refine
  (edit the patch, etc)
  ...
$ git commit
$ git-refine
All done.

I actually wrote it years ago, but I couldn't describe my workflow well 
enough, so I didn't submit it. If everybody seems to be doing the same 
thing, I can submit my script...

	-Daniel
*This .sig left intentionally blank*
-
To: Daniel Barkalow <barkalow@...>
Cc: Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 2:16 pm

Well, I actually think it sounds like a complicated sequence because I 
tried to explain what I do.

The "script" parts don't really end up being any smaller, and not 
scripting it actually means that I can do (and often do) things outside of a 
strict scripting environment.

As mentioned, I not only mix it up with "git cherry-pick", but since I 
just use "git diff", I can - and do - do things like pick only a certain set 
of files to diff and edit the patch on. 

So it's an iterative process at several levels (the "outer" level is the 
act of actually committing each change, and iterating to the next one, 
while the "inner" level is often a sequence of "git diff" exploration), 
it's not very fixed. 

For example, when I said that I do a 

	git diff -R target > diff

that's not strictly true. The "git diff -R" is useful for comparing the 
current working tree to another commit, but quite often I actually end up 
doing it differently, and doing it as

	git diff ..target file > diff
	.. edit ..
	git apply diff

or, if I don't need the edit (ie just the fact that I limit it to a single 
file is a sufficient "edit" in itself), I might just do

	git checkout target file

instead, which will fetch the whole file from the "target" branch (and 
also update it in the index, which may or may not actually be what I want, 
but that's a different issue).

So the "process" as far as I'm concerned is actually much more fluid than 
necessarily always working with diffs. Git gives you so many ways to do 
things like this, and I'm pretty comfortable with lots of them.
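
The two per-file variants above can be sketched end to end as follows. This
assumes a throwaway repository in which a branch named "target" already holds
the tested end result; the file names, contents and commit messages are
hypothetical, and the hand-editing of the patch is skipped.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "starting point"
start=$(git rev-parse --abbrev-ref HEAD)   # whatever the initial branch is called

# The "ugly" branch whose end result is good.
git checkout -q -b target
echo "a, improved" > a.txt
echo "b, improved" > b.txt
git commit -q -a -m "ugly but tested end result"
git checkout -q "$start"

# Variant 1: diff a single file against target, (optionally edit), apply.
patch=$(mktemp)
git diff ..target -- a.txt > "$patch"
git apply "$patch"
git commit -q -a -m "first clean step: a.txt"

# Variant 2: take the whole file from target (this also updates the index).
git checkout -q target -- b.txt
git commit -q -m "second clean step: b.txt"

remaining=$(git diff HEAD target)          # empty once the series is complete
steps=$(git rev-list --count HEAD)
```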

			Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>, Junio C Hamano <junkio@...>, <danahow@...>
Date: Wednesday, May 9, 2007 - 12:33 pm

Geez,  this is similar [in nature, not scale] to what I've been doing.
After reading about people "right-clicking on hunks in git-gui",
I was convinced I needed to force myself to do more manipulations
inside git itself.  Hmm...

Maybe, in addition to [or in] the User Manual, git should have some
workflow examples, which have been cribbed from various emails
on this list?

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
-
To: Dana How <danahow@...>
Cc: Linus Torvalds <torvalds@...>, Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>, Junio C Hamano <junkio@...>
Date: Wednesday, May 9, 2007 - 1:18 pm

That's something several people have asked for, and I think it's a great
idea--I just haven't personally had much time to get to it.  But I'd
happily take even very rough patches and help get them into shape.

The way I'd thought of doing it was having an "examples" section at the
end of each chapter, with subsections for each individual example; see
the one at the end of the "exploring git history" chapter:

	http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#history-examples

They should use the material introduced in the associated chapter,
but it's also OK to introduce new commands (with references to the man
pages) when their use in the example is pretty self-explanatory.  (In
fact, this is a great way to introduce more commands and options--git
has so many that it would be tedious to try to be comprehensive, but
they'd fit well in examples.)

The patch-editing stuff discussed above might fit best at the end of
"rewriting history and maintaining patch series".

--b.
-
To: J. Bruce Fields <bfields@...>
Cc: Dana How <danahow@...>, Linus Torvalds <torvalds@...>, Martin Langhoff <martin.langhoff@...>, <git@...>, Junio C Hamano <junkio@...>
Date: Wednesday, May 9, 2007 - 1:26 pm

There is some workflow-related discussion accumulated over years in
Documentation/howto/, some of it already suffering from quite a bit of
bitrot.  :-(

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Petr Baudis <pasky@...>
Cc: Dana How <danahow@...>, Linus Torvalds <torvalds@...>, Martin Langhoff <martin.langhoff@...>, <git@...>, Junio C Hamano <junkio@...>
Date: Wednesday, May 9, 2007 - 1:29 pm

Yup.  I think we should one-by-one update those and suck them into the
manual.  (Patches accepted!)

--b.
-
To: Linus Torvalds <torvalds@...>
Cc: Petr Baudis <pasky@...>, Martin Langhoff <martin.langhoff@...>, <git@...>
Date: Wednesday, May 9, 2007 - 12:29 pm

On Wed, 9 May 2007 08:52:09 -0700 (PDT), Linus Torvalds wrote:
[Snip good description of rebuilding a branch to meet some "target"
state.]

That's all really good stuff. And as you mentioned you sometimes use
cherry-pick during this rebuilding, one can also use "git add -i" to
help with splitting up an ugly commit that should have been multiple
commits.

For example, a sequence might look like this, (I always use "desired"
where you use target):

	git diff HEAD desired | git apply
	git add -i
	git commit
	git reset --hard
	# test here and commit --amend as needed

And repeat that as needed. It's really no different than your "edit
the diff" approach. It's just using "add -i" instead of a text
editor. But I do admit that the commit;reset;test;--amend sequence

This reminds me of a confusing semantic issue that came about with the
"new" add. It can be quite natural to commit a single file in one step
with:

	git commit some-file.c

or to do that in two steps with:

	git add some-file.c
	git commit

(which is particularly useful if one wants to add multiple files).

I recently found myself wanting to do a similar thing with a directory
path. I can commit a path with:

	git commit path/

but I don't get anything at all like the same semantics if I do:

	git add path/
	git commit

(since "git add" will recursively add all untracked files under path/).

Now the "recursively add all files" behavior is older, and has been an
essential part of git-add forever. But I found it to be not at all
what I wanted in this case, (where I'm now trained to say "git add" to
stage things into the index).

I don't know of any good fix for the problem now. Maybe I'll just need to
remember to break out that old "git update-index" for a situation like
this, but that sure feels clunky.
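
One possible way around this, offered as a sketch rather than as what the
list settled on: "git add -u path/" stages changes only to files git already
tracks under path/ and leaves untracked files alone. This assumes a git
recent enough for "add -u" to accept a path; the file names are hypothetical.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

mkdir path
echo "v1" > path/tracked.c
git add path/tracked.c
git commit -q -m "initial"

echo "v2 changed" > path/tracked.c     # modification to a tracked file
echo "scratch" > path/untracked.tmp    # new file we do not want to commit

git add -u path/                       # stage tracked files only

staged=$(git diff --cached --name-only)
untracked=$(git ls-files --others path/)
```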

-Carl
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Tuesday, May 8, 2007 - 1:35 am

Totally, when merging git's approach is incredibly useful. gitk
--merge and the resolved conflicts not appearing in the default git
diff is great stuff.

For small, simpleminded and mostly-linear development it's not
that important. Of course, I use git on projects large and small, so I
can understand it. For someone using it with a small mostly-linear
project, the whole index thing is overkill, and the explanations

On a large project it's always a good idea to commit with explicit
paths -- regardless of your SCM. As it happens, I have to use explicit
paths with CVS, or it'll punish me by taking solid minutes to do a 2
file commit. I am sure that the mozilla and OpenOffice developers
using CVS also commit with explicit paths. Life's too short to waste
an hour.

(The times are from working on Moodle, hosted on SF.net with ~4K
files, 700 directories.).

cheers,


m
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 7:51 pm

Hi,


As you pointed out yourself, the index _has_ an idea of the content of 
that file. So, arguably, it does not point to _that_ file, but rather to 

Just another reason to hate CVS. Because it trained people to do that. If 
it was not for the training by CVS, I would have strongly opposed the 
introduction of the "-m" switch to commit. It _encourages_ bad commit 
messages.

Now, with Git I usually let git-commit start up the editor. Because then I 
am actually encouraged to make up my mind, and put down a meaningful 
message, which might not only help _others_ to understand why I did it, 

IMHO yes, that is the message.

In addition to being nice to people used to the behaviour of "git commit" 
_without_ other arguments.

Ciao,
Dscho

-
To: <git@...>
Date: Monday, May 7, 2007 - 4:02 am

Well, this really depends on the use-case, size of commit, ...

I often use a version control system for very low importance stuff. I
don't want to type a 3-lines long message to describe a 2-lines long
change in my ~/.emacs.el for example. I also work with people using
(sorry) svn to work collaboratively, but they don't even provide a log
message: the version control system here is just a replacement for
unison/NFS/whatever other way to have people edit files from different
machines.

For sure, in a context where code quality and review is important, 
-m "xxx" isn't the way (except if you prefer your shell's line editor
to your actual editor).

-- 
Matthieu
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Monday, May 7, 2007 - 7:05 am

Hi,




I positively _hate_ empty commit messages. There is _always_ something to 

I also find it very useful for my own pleasure when reviewing some logs. I 
track config files, small scripts, documents, etc. with Git, and I found 
myself looking for something in _all_ of them. The commit messages helped.

Commit messages, BTW, are somewhat of an artform. You cannot imagine how 
slow I am writing them, because they should be helpful not only for the 
reviewer, but also for the casual git-blame user, who wants to find out 
the rationale of a change.

Ciao,
Dscho

-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Wednesday, May 9, 2007 - 9:07 am

Hi,


  I'm maybe somewhat standing out of the crowd, but I sometimes use -m
for *very* long commit messages - just using separate -m parameters for
paragraphs and writing on; I tend to find it much more natural than
spawning an editor. Only when I find later that I've made an ugly typo
in the middle of 250-characters commandline or I figure out that I
should add some figure to the message, I throw in -e at the end and add
the final touches.
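
A small sketch of the multiple -m habit, assuming a throwaway repository and
a made-up message: each -m becomes its own paragraph, separated by a blank
line. Adding -e to the same command would open the editor on the assembled
message.

```shell
set -eu
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "data" > notes.txt
git add notes.txt

# One -m per paragraph; the first one is the headline.
git commit -q -m "Add notes" \
    -m "First paragraph of the explanation." \
    -m "Second paragraph."

message=$(git log -1 --pretty=format:%B)
expected=$(printf 'Add notes\n\nFirst paragraph of the explanation.\n\nSecond paragraph.')
```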


  But I agree that commit messages are somewhat of an artform, and
just finding a good headline can be quite difficult sometimes. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 6:53 pm

Well, personally I practically never use it; I find having a 
separation between what the current state of my tree is and what will be 
committed to be one of the really "oh wow, why doesn't everything else do 
this?" features.  However, I tend to be working on more than one thing at 
once, and switch between them - so I commit work on A while work on B is 
still unfinished, then start C, finish B some point later and commit it, 
and then I can finish C.  Git is the first VCS that supports a butterfly 

"git add -i" - this is a feature I have wanted since I started using 
version control ...

-- 
Julian

  ---
Your good nature will bring you unbounded happiness.
-
To: Julian Phillips <julian@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Monday, May 7, 2007 - 2:35 am

git-gui is really handy for adding/committing a subset of the changes
in your working tree. Especially for those of us with goldfish memory,
since it's so easy to see exactly what's happening: what's going to be

I thought "git add -i" was the best thing since sliced bread -- until
I found the same feature in git-gui, but with a _much_ better
interface. Just right-click on a hunk in a diff, and you have the
option of staging/unstaging that hunk. Pure magic.

-- 
Karl Hasselström
-
To: Karl <kha@...>
Cc: Julian Phillips <julian@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Monday, May 7, 2007 - 9:41 pm

"git add -i" has a hunk splitting feature that git-gui lacks.
I'm thinking of adding features to git-gui to let you select a
region of a hunk using the text selection, and then stage only
that selection.  I also want to let you revert hunks from the
working directory copy.

But after reading Junio's comments about "git add -i" being a
possibly bad idea and instead letting you park everything into
a shelf, reset --hard your working directory to HEAD and then
pull things back off the shelf to be staged, I might want to
do that differently in git-gui...  like use a shelf.  ;-)


But I'm glad someone else finds the hunk feature useful in
git-gui.  I use it far too often myself.

-- 
Shawn.
-
To: Shawn O. Pearce <spearce@...>
Cc: Julian Phillips <julian@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Tuesday, May 8, 2007 - 3:37 am

On 2007-05-07 21:41:14 -0400, Shawn O. Pearce wrote:

> Karl Hasselström
To: Karl <kha@...>
Cc: Julian Phillips <julian@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Tuesday, May 8, 2007 - 10:52 am

Yea, I've played that game before too (reduce content lines) to
try and simulate a hunk splitter.  ;-)  Doesn't always work.

Right now I feel like a huge chunk of the git-gui code is simply not
maintainable.  The 0.7.0 release is really more about refactoring the
code to make it more maintainable, than it is about actual features
(though there are some new things, like vi-keys).

The hunk selection stuff is just one part of the 2,000 lines
still left in git-gui.sh itself, and that still uses a lot of
messy globals.  I want to get the code better organized before

True, but that beats the tar out of copying the - lines to your
clipboard and pasting them into your text editor, then deleting
the - prefix.  Especially if it's a couple of hunks that you want
to revert.  Which I find myself doing all too often.

Actually I work around it today by staging what I care about,
then reverting the file.  Since the revert comes out of the index,
I get (mostly) the same action as reverting a particular hunk.
But it does mean that I lose my index state, if that happened to

I haven't looked at StGIT in a while.  I've seen noise on the list
about nifty features being added, but I haven't kept up with what
those features actually are.  I think you are right about this and
maybe git-gui should try to be compatible with StGIT's unapplied

Indeed; I was thinking that this very morning.  Making an index that
you stage things into, but then also saying you cannot really do that
and instead have to shelve what you don't want - that's just evil.
I'll have to think about it more.

The blame interface in git-gui needs help more than the index
staging features.  The colors suck.  ;-)
 
-- 
Shawn.
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 3:54 pm

Indeed. 

Git's index is basically very much defined as

 - sufficient to contain the total "content" of the tree (and this 
   includes all metadata: the filename, the mode, and the file contents 
   are all *parts* of the "content", and they are all meaningless on their 
   own!)

 - additional "stat" information to allow the obvious and trivial (but 
   hugely important!) filesystem comparison optimizations.
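
As a rough illustration of the two points above (not part of the original explanation), `git ls-files --stage` prints the index entries directly: mode, object name of the content, stage number, and path. The scratch repository and the file name `X` are made up for this sketch.

```shell
# Scratch repository; the location and the file name X are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf 'hello\n' > X
git add X

# Each index entry is printed as "<mode> <object name> <stage><TAB><path>".
# The path, the mode and the content id always travel together.
entry=$(git ls-files --stage)
printf '%s\n' "$entry"
```

The stat information kept alongside each entry is not shown by this command; it only serves to avoid re-reading files that have not changed.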

So you really should see it as *being* the content. The content is not the 
"file name" or the "file content" as separate parts. You really cannot 
separate the two. Filenames on their own make no sense (they have to have 
file content too), and file content on its own is similarly senseless (you 
have to know how to reach it).

What I'm trying to say is that git fundamentally doesn't _allow_ you to 
see a filename without its content. The whole notion is insane and not 
valid. It has no relevance for "reality".

Also, you should realize that when you do

	git add X

you are *not* adding the filename X. No, "X" is literally a "content path 
pattern", the same way it is when you do something like

	gitk X

and it's worth always keeping in mind that in neither case is "X" 
necessarily a single file, but literally a pathname pattern that is used 
as a "filter" on all the possible patterns.

(Of course, the filtering rules are different for "git add" and "gitk": in 
the "git add" example, you filter the working tree files, while in "gitk" 
you filter the files that git already knows about, so they are different, 
but in both cases you really should think of them as filters, not as 
"filenames", even though one _trivial_ filter is to give a filter that 

No, "git commit -a" is undoubtedly _convenient_. You can use it as often 
as you like.

So as long as you see it as a convenience feature, and realize that "git 
commit" is actually a lot more powerful than just being able to always do 
the convenient, go on and use "git commit -a" all the time.

When you h...
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 12:51 pm

Hi,



The benefit is a clear distinction between DWIM and low level. The 
index contains _exactly_ what you told it to contain. By forcing users to 
use "-a" with "git commit", you make it clear that a separate update 
step is involved, and if you made an error (which you see from the file 
list), you can abort, and start over with the original index.

Hth,
Dscho

-
To: <git@...>
Date: Sunday, May 6, 2007 - 1:34 pm

Reading my message (including the last 5 words of the sentence you're

In other systems, commit commits _exactly_ the content of files on


Well, with that kind of argument, I could have my web browser not do
DNS resolution for me, because it would make it clear that a separate
step from HTTP request is involved. Still, this low-level thing brings
no benefit to the user, and I know no web browser forcing the user to

You don't necessarily see your error from the file list:

% vi foo.c
% git add foo.c
% vi foo.c
% git commit -m foo
[...]
 create mode 100644 foo.c
%

This committed the old content of foo.c, while I hardly see any
scenario where this is the expected behavior.
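
For readers who want to reproduce the sequence, here is a minimal sketch in a scratch repository. The file contents and the identity passed to `git commit` are placeholders, and `printf` stands in for the two editing sessions.

```shell
# Scratch repository; contents and identity below are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf 'first version\n' > foo.c
git add foo.c                       # stages "first version"
printf 'second version\n' > foo.c   # edited again, but not re-added

# Plain "git diff" compares the working tree with the index, so it
# lists foo.c here; "git diff --cached" would show what is staged.
unstaged=$(git diff --name-only)

git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m foo

# The commit holds the staged content, not the file as it is on disk.
committed=$(git show HEAD:foo.c)
```

Running `git diff` or `git status` before committing is one way to notice the difference; the file list printed by the commit itself does not reveal it.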

Then, being able to repair the error if I made it is interesting, but
I don't get the reason why the error could not just be avoided.

Well, indeed, I just found a thread talking about this:

  http://lists-archives.org/git/196050-making-git-commit-to-mean-git-commit-a.html

I'll go through it, I might understand better after that ;-).

Thanks,

-- 
Matthieu
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 7:42 pm

Hi Matthieu,


Okay, let me rephrase the (badly worded) question:


Because they do not realize that the file _names_ are actually only a key, 
not the value.

With Git, it is possible to stage changes, but also to have a dirty stage.

Think, for example, about debugging a program. Many programs have 
Makefiles, which define CFLAGS without "-g". Now you want to debug. Since 
gdb acquired the bad habit of not working properly at all without that 
flag (which is especially apparent when single stepping jumps around 
wildly in the source code), you _have_ to change the Makefile to include 
"-g" with the CFLAGS.

But you don't want to commit _that_. It is not a useful change for the 
project. Submitting such a patch makes you look foolish. So, you leave it 
out of the commit.

And to make you _aware_ that it is a real possibility, and often a 
desirable one, git-commit makes you specify "-a" when you are _sure_ that 
you want to commit _all_ of your changes to the tracked files.
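
The Makefile scenario can be sketched as follows. Everything here is illustrative: the file contents, the commit messages, and the small `commit` wrapper (which only supplies a placeholder identity) are assumptions, not something from the original message.

```shell
# Scratch repository; all names and contents are made up for the sketch.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical wrapper so the sketch does not depend on a configured identity.
commit() {
    git -c user.name=Example -c user.email=example@example.invalid \
        commit -q "$@"
}

printf 'CFLAGS = -O2\n' > Makefile
printf 'int main(void) { return 1; }\n' > main.c
git add Makefile main.c
commit -m "Initial import"

# A local debugging tweak plus the real fix.
printf 'CFLAGS = -O2 -g\n' > Makefile
printf 'int main(void)\n{\n\treturn 0;\n}\n' > main.c

git add main.c                 # stage only the fix
commit -m "Return success from main"

# The "-g" change stays in the working tree, uncommitted.
still_dirty=$(git diff --name-only)
```

With `git commit -a` both files would have gone into the second commit.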

With CVS (which has been bashed on a lot on this list, and rightfully so), 
after a mistaken "cvs commit" _with_ irrelevant changes (like the change 
to the Makefile I illustrated above), you have two options:

	- leave it as it is (possibly undoing the change in a subsequent 
	  commit), or

	- edit the files, which often leads to an inconsistent repository. 
	  Yeah, sure, you can checkout the newest state, but you cannot 

Well, I use it quite a lot. But 30% of the time, I prefer to commit with 
specific filenames, so I can be sure _what_ I commit. FWIW, I picked up on 
that practice when using CVS...

About 20% of the time, I even use "git commit" _without_ 
any parameters, because I used "git add" to tell Git that I resolved some 
conflicts, or that I want this file to be committed, while other files 

No. _You_ never need to tell the browser _not_ to resolve via DNS.

But _you_ sometimes _need_ to commit with _different_ parameters than 
"-a". You might not realize that _now_. But a...
To: <git@...>, Matthieu Moy <Matthieu.Moy@...>, <danahow@...>
Date: Sunday, May 6, 2007 - 2:22 pm

You might find it useful to break your question into 2 pieces.

One is what information should be in the index,
which essentially is what Linus addresses.
The way I look at this, at the moment,
is that the index contains whatever's required to make git-write-tree
work without collecting information elsewhere.
I suspect this is the correct historical way to look at this,
but I wasn't on this list then.

The other is how to get information into the index.
I think this is the original thing that seemed strange to you?
It did to me.  But,  in part,  since git has both "git-commit"
and "git-commit -a",  this is somewhat recognized.
I've wondered if there's a way to improve this,  but I don't
have any coherent ideas right now.  Thanks for finding
and posting that thread;  that was helpful.

Also, the idea of an index isn't all that strange.  I need
to use perforce at work,  and it has an index (called "db.have").
But it is stored on the server and has everyone's state mixed
together,  uses the type of file IDs Linus complains about,
and is more difficult to manipulate (hence less useful).
Being on the server is a great performance bottleneck as well.

Dana



-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
-
To: Matthieu Moy <Matthieu.Moy@...>
Cc: <git@...>
Date: Sunday, May 6, 2007 - 1:43 pm

One reason why is because you are using "-m foo" (a very
non-descriptive commit message that would not help anybody
including yourself in the future).  Try the above without giving
such a bogus message with "-m" to commit, but instead let
it spawn your editor --- you would be doing that in real life
when you are doing anything nontrivial.  Then notice what
appears in the "Changed but not updated" section of the file list.

A single liner "-m" is handy for "Oops, typofix in foo.c" kind
of commit, but in such a case you literally would be changing
only the typofix and won't have "edit foo.c; git add foo.c; edit
foo.c; git commit" sequence anyway.

I think Linus addressed the doubts in your original message
quite well, and I do not have anything to add.

-
To: Junio C Hamano <junkio@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Wednesday, May 9, 2007 - 8:52 am

I don't get this argument - I frequently write quite long descriptions
inside the -m argument(s), since I just find it more convenient than
having to edit it in an editor, for various reasons. So there is really
no reason why the "-m is only for short single-liner commit messages"
hypothesis could hold true.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Petr Baudis <pasky@...>
Cc: Junio C Hamano <junkio@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Wednesday, May 9, 2007 - 9:57 am

Hi,


Another reason is that you can see what the end result will look like in an 
editor. For example, you'll have a hard time making sure in the 
command line that the lines are no longer than 76 characters.

Ciao,
Dscho

-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Junio C Hamano <junkio@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Wednesday, May 9, 2007 - 10:24 am

Hi,


  oh, indeed - good point. cg-commit uses fmt to format the message, I
think git-commit should do the same; let's see how controversial such a
change would be.

---
This makes git-commit filter log messages provided on commandline by fmt,
thus making nice paragraphs from them. This makes it possible to specify
even long commit messages on command line without worrying about this, akin
to cg-commit.

Signed-off-by: Petr Baudis <pasky@suse.cz>
---

 git-commit.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-commit.sh b/git-commit.sh
index f28fc24..28cbb55 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -432,7 +432,7 @@ fi
 
 if test "$log_message" != ''
 then
-	echo "$log_message"
+	echo "$log_message" | fmt
 elif test "$logfile" != ""
 then
 	if test "$logfile" = -


-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: Petr Baudis <pasky@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Wednesday, May 9, 2007 - 8:45 pm

Two points.

 * You would not want to wrap the first line;

 * 75-column is not ideal for every project, so this needs to be
   customizable;

 * If we were to munge the given message, we would probably also
   want to enforce "single-liner summary, empty line, and then
   the rest" convention.

Well, I have three there, but I suspect the first two somebody else
may have said already, so...

This is slightly related, but I have been wondering about the
interaction with "single-liner summary, empty line and then the
rest" convention and various commands in the log family.

Currently, --pretty=oneline and --pretty=email (hence format-patch)
take and use only the first line.  I think we could change it to:

 - take the first paragraph, where the definition of the first
   paragraph is "skip all blank lines from the beginning, and
   then grab everything up to the next empty line".

 - replace all line breaks with a whitespace.

This change would not affect well-behaved commit messages that
adhere to the convention, as their first paragraph always
consists of a single line.  On the other hand, people from a
different culture can get frustrated by their commit message
chomped at the first linebreak in the middle of sentence right
now, which would be helped by this change.
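
A minimal sketch of the two rules above, written as a shell function around awk. The name `first_paragraph` is hypothetical, and this only illustrates the proposed behaviour; it is not how any git command is implemented.

```shell
# Hypothetical helper: read a commit message on stdin and print its
# first paragraph as a single line.
first_paragraph() {
    awk '
        !started && /^[[:space:]]*$/ { next }  # skip leading blank lines
        started && /^[[:space:]]*$/  { exit }  # stop at the next empty line
        {
            # join the lines of the paragraph with a single space
            printf "%s%s", (started ? " " : ""), $0
            started = 1
        }
        END { if (started) print "" }
    '
}
```

It could be tried on an existing commit with something like `git cat-file commit HEAD | sed '1,/^$/d' | first_paragraph`, where the `sed` step drops the commit header. A message that already follows the one-line-summary convention comes out unchanged.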

Their Subject: and --pretty=oneline output would become very
long and unsightly, but their commit messages are already
ugly anyway, and such a change at least avoids the loss of
information.

If we were to do this, Subject: line would most likely use
RFC2822 line folding at the places where line breaks were in the
original, but that goes without saying.

What do people think?


-
To: Petr Baudis <pasky@...>
Cc: Junio C Hamano <junkio@...>, Matthieu Moy <Matthieu.Moy@...>, <git@...>
Date: Wednesday, May 9, 2007 - 11:01 am

Hi,


FWIW, I have a builtin git-fmt in my local repo, which uses the (slightly 
enhanced) functions in utf8.c... Maybe after 1.5.2 I dare to submit 
this...

Ciao,
Dscho

-
To: <git@...>
Date: Wednesday, May 9, 2007 - 10:59 am

I wouldn't do that for the first line of the message.

Someone typing

$ git commit -m "a very very very very very very very very very very very very long summary" \
             -m "a longer description of the above summary"

probably doesn't want his first line to be broken (otherwise,
git-format-patch and other tools would be confused).

So, that would be more like

echo "$log_message" | (read first_line; echo "$first_line"; fmt)
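
A slightly more defensive variant of the same idea, as a sketch only: `wrap_body` is a hypothetical name, the width argument and its default of 75 are assumptions, and `read -r` with `printf` is used so that backslashes in the first line survive.

```shell
# Hypothetical helper: pass the first line through untouched and
# re-flow everything after it with fmt.
# $1 is the target width (75 is only an assumed default).
wrap_body() {
    width=${1:-75}
    (
        IFS= read -r first_line        # consumes exactly one line of stdin
        printf '%s\n' "$first_line"    # emit it verbatim, however long
        fmt -w "$width"                # fmt sees only the remaining lines
    )
}
```

Usage would look like `echo "$log_message" | wrap_body 72`. This relies on the shell's `read` not consuming input beyond the first newline when reading from a pipe, which is what leaves the rest for `fmt`.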


Perhaps another option would be to provide, say, a -M option, doing

log_message="$log_message

$(echo "$1" | fmt)"

to allow people to explicitly say whether they want reformatting. But
that's probably overkill.

-- 
Matthieu
-
To: <git@...>
Date: Wednesday, May 9, 2007 - 11:11 am

Hmm, I don't really know if it's more evil to split an extra-long line
to two or keep it longer than the maximum sane width. Since I'm torn,
I'd prefer to go for the version that's simpler (also, avoids weird
results for those who for some reason chose not to follow the usual
convention, but that's a minor point).

I don't really care, but if no one else does either, I'd stay with the
current simple version. :)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett
-
To: <git@...>
Date: Wednesday, May 9, 2007 - 11:32 am

The evil already happened several times in git's repository ;-).

$ git log --all --pretty=oneline | grep \
 ' ................................................................................' \
 | wc -l
81
$
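
The same count can be written with an explicit threshold instead of a row of dots. This is a sketch; `count_long_lines` is a hypothetical helper and the default of 80 is an assumption matching the pattern above.

```shell
# Hypothetical helper: count the lines on stdin that are at least
# $1 characters long (default 80).
count_long_lines() {
    awk -v min="${1:-80}" 'length($0) >= min { n++ } END { print n + 0 }'
}
```

Something like `git log --all --pretty=format:%s | count_long_lines 80` feeds it the summary lines only, without the commit hash that `--pretty=oneline` puts in front of each one.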

When I encounter such a long line, I often just don't care, since my
terminal or tool (gitk ...) is often wider than 80 characters. And in the
cases I care, the fix is just to enlarge the window or to scroll (only
people using a text-mode console would _really_ be disturbed).

With the other solution (breaking the line automatically), I have no
easy fix. In gitk, I have the beginning of a sentence in the summary
field, in a mailed patch, I have the sentence split between the
Subject: header and the body.

(but we agree that both cases are evil. Perhaps just "ERROR: you're
doing evil" would be better ...)

-- 
Matthieu
-