Re: git and bzr

Previous thread: [PATCH] Add support for commit.signoff config option by Andy Parkins on Tuesday, November 28, 2006 - 8:02 am. (1 message)

Next thread: Re: [PATCH 0/2] Making "git commit" to mean "git commit -a". by Jakub Narebski on Tuesday, November 28, 2006 - 9:23 am. (1 message)
To: <git@...>
Cc: <bazaar-ng@...>
Date: Tuesday, November 28, 2006 - 8:37 am

That doesn't change the fact that the "git pickaxe" abilities in "git blame"
are more than just the equivalent of "cvs annotate".

----
bzr annotate FILENAME
Show the origin of each line in a file.

----
git-blame [-c] [-l] [-t] [-f] [-n] [-p] [-L n,m] [-S <revs-file>]
[-M] [-C] [-C] [--since=<date>] [<rev>] [--] <file>

Annotates each line in the given file with information from the revision
which last modified the line. Optionally, start annotating from the given
revision.

Also it can limit the range of lines annotated.
[...]
You can also use regular expressions to specify the line range.
git blame -L '/^sub hello {/,/^}$/' foo
would limit the annotation to the body of the hello subroutine.
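
As a rough illustration of how a /start/,/end/ range like the one above could be resolved to line numbers, here is a minimal Python sketch. It is not git's implementation: the function name resolve_regex_range and the sample Perl source are made up for this example, and Python's regex dialect differs from the one git uses (hence the escaped brace).

```python
import re

def resolve_regex_range(lines, start_pattern, end_pattern):
    """Return 1-based (first, last) line numbers of the first block that
    starts at a line matching start_pattern and ends at the next later
    line matching end_pattern, or None if no such block exists."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    first = None
    for number, text in enumerate(lines, start=1):
        if first is None:
            # Still looking for the line that opens the range.
            if start_re.search(text):
                first = number
        elif end_re.search(text):
            # First match after the opening line closes the range.
            return first, number
    return None

# Hypothetical file contents, invented for this sketch.
source = [
    "#!/usr/bin/perl",
    "sub hello {",
    '    print "hello\\n";',
    "}",
    "sub bye {",
    '    print "bye\\n";',
    "}",
]

print(resolve_regex_range(source, r"^sub hello \{", r"^\}$"))
```

With the sample above, the range covers only the hello subroutine and stops before the bye subroutine, which is the behaviour the -L example describes.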

When you are not interested in changes older than the version v2.6.18, or
changes older than 3 weeks, you can use revision range specifiers similar
to git-rev-list:
git blame v2.6.18.. -- foo
git blame --since=3.weeks -- foo

http://kernel.org/pub/software/scm/git/docs/git-blame.html
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

-

To: Jakub Narebski <jnareb@...>
Cc: <bazaar-ng@...>, <git@...>
Date: Tuesday, November 28, 2006 - 9:35 am

Hi,

You should also mention that git-annotate can follow code movements
through file renames.

I know, because I was already rightfully blamed for code which was moved
by somebody else.

Ciao,
Dscho

-

To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Jakub Narebski <jnareb@...>, <bazaar-ng@...>, <git@...>
Date: Tuesday, November 28, 2006 - 12:08 pm

.. and within the same file, and _copied_ from other files.

A good example of this is still just doing a

git blame -C revision.c

because that "revision.c" file was created by splitting the old
"rev-list.c" into two files (revision.c and rev-list.c). And the fact that
"git blame" catches it and shows it in a very natural format is really
quite nice.

(rev-list.c has since been renamed to "builtin-rev-list.c", so if you want
to see the "other" side of the split, just do

git blame -C builtin-rev-list.c

in order to realize how well git blame follows both renames _and_ pure
data movement).

The reason this is a good example is simply the fact that it should
totally silence anybody who still thinks that tracking file identities is
a good thing. It explains well why tracking file identities is just
_stupid_.

You simply couldn't have done that kind of split sanely with file identity
tracking (well, that one only had a single copy, so you could argue that a
file identity tracker with copies could have done it, but the fact is that
(a) they never do and (b) "git blame" can equally well track stuff that
comes from _multiple_ different "file identities").

Such a "multiple sources" case can actually be found by doing

git blame -C tree-walk.c

which (correctly) figures out that the code comes from both merge-tree.c
(the "entry compare/extract" functions) _and_ from sha1_name.c (the
"find_tree_entry()" function).
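
To make the "multiple sources" idea concrete, here is a toy Python sketch of attributing the lines of a new file to whichever file in the parent commit already contained them. This is only an assumption-laden illustration, not how "git blame -C" is implemented: it matches single lines by exact text, whereas a real tool compares blocks of lines. The function name attribute_lines and the file contents are invented; only the file names are borrowed from the example above.

```python
def attribute_lines(new_lines, parent_files):
    """For each line of a newly created file, return (origin, line), where
    origin is the parent-commit path that already held an identical line,
    or None when the line appears to be new."""
    # Index every non-blank line of the parent commit by its stripped text.
    origin_of = {}
    for path in sorted(parent_files):
        for text in parent_files[path]:
            key = text.strip()
            if key:
                origin_of.setdefault(key, path)
    # Blank lines are never indexed, so they also come back as None.
    return [(origin_of.get(line.strip()), line) for line in new_lines]

# Invented contents; only the file names come from the discussion.
parent_files = {
    "merge-tree.c": [
        "int entry_compare(void);",
        "void entry_extract(void);",
        "int merge(void);",
    ],
    "sha1_name.c": [
        "int find_tree_entry(void);",
        "int get_sha1(void);",
    ],
}
tree_walk = [
    "int entry_compare(void);",
    "void entry_extract(void);",
    "int find_tree_entry(void);",
    "int update_tree_entry(void);",
]

for origin, line in attribute_lines(tree_walk, parent_files):
    print(f"{origin or '(new)':<14}{line}")
```

Nothing in the sketch depends on a file identifier: the origin is recovered purely from content, so a single new file can be traced back to several old ones.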

So yes, "git blame" is a _hell_ of a lot more powerful than anybody else's
"annotate", as far as I know. I literally suspect that nobody else comes
even close.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, <bazaar-ng@...>, <git@...>
Date: Tuesday, November 28, 2006 - 1:44 pm

I'm unfamiliar with git so I could be totally wrong here!

I know that bzr supports file renames/moves very effectively and I
understood that git doesn't support this to the same extent (correct me
if I am wrong as I have not used git at all!).

If that is the case, could that be because bzr gives each file its own
id and can detect this easily but git's content based approach can't? If
so then claiming that file identifiers are *stupid* seems a bit extreme. So I
would have thought *both* file identifiers and line/content identifiers
are needed for tracking changes made to the files and to their contents
respectively. When a file is copied then the contents are copied and it
is given a new file identifier. When a file is moved it keeps the same
identifier. So don't you need file identifiers as well as line/content
identifiers?

Nick
-

To: Linus Torvalds <torvalds@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, <bazaar-ng@...>, <git@...>
Date: Tuesday, November 28, 2006 - 1:07 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

No need to be aggressive about this. Yes, it's true that file identity
doesn't directly solve this problem, but it doesn't prove that an
identity-based approach is wrong.

In the end, everything comes down to identity of some kind. Because if
you're going to apply someone else's changes, you must apply them to the
same thing that they changed.

Git determines identity based on content, while bzr has the user
indicate it. Both approaches work.

Bzr supports merging based on line identity (our weave merge, not our
knit merge). At the moment, our concept of line identity is based on

I think you're wrong about that. There's nothing stopping bzr from
inferring a file split, or even explicitly recording it. bzr doesn't
record copies, because we think there are no sane merge semantics across

I notice that blame has an option to limit the annotation to recent
history. I can only assume that is for performance reasons. bzr
annotate doesn't need a feature like that, because annotations are
explicit in bzr's storage format. I expect that even if we were to
extend annotate to track content across files, it would still be so fast
that we wouldn't need it.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFbGy70F+nu1YWqI0RAt75AKCAy0ALi0IKzqZpgnavJrx97+lhDgCfaMSe
fs4Lt77k1/OXC82aFbh5pKg=
=/OiA
-----END PGP SIGNATURE-----
-

To: Aaron Bentley <aaron.bentley@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, <bazaar-ng@...>, <git@...>
Date: Tuesday, November 28, 2006 - 2:00 pm

You'd assume wrong.

Trust me, if you talk about performance, bzr will lose. I can pretty much
guarantee you that you perform worse. The mozilla discussion pointed to a
performance test between hg and bzr, and hg in that test tended to perform
better by a factor of 2-10. And git tends to be another factor faster than
_that_.

Performance is important to git, but it's important not in the sense of
"let's not do it because it performs badly", but in the sense of "things
should be so fast that people don't even realize that they are done". You
guys may count commit times in seconds. I still want to commit multiple
patches _per_second_ to the kernel tree. THAT is performance.

So no, performance wasn't the reason.

The reason is simple: be logical. The original blame/annotate semantics
were

git blame filename

which is what people traditionally use, but then to specify which version
to _start_ with (in case you wanted to go backwards in time), you had an
optional revision argument at the end.

Which is totally against how all the other git programs work, and I
complained, because I had actually wanted to see the blame at a particular
release version, and what my fingers typed didn't work. I want to be able
to do

git blame [revno] [--] filename

the same way I can ask for a git log, git whatchanged, gitk, and any
other such history tool.

And once you do the same command line parsing as the other log-related
commands, you pretty much automatically get the revision limiting. So now
you can do

git blame v2.6.17..v2.6.18 filename

on the kernel archive to see who is to blame for certain lines in a
certain _range_ of commits. It just fell out of using the same syntax
everywhere.
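
The effect of limiting blame to a revision range can be sketched in a few lines of Python. This is a toy model under loud assumptions, not git's algorithm: history is strictly linear, lines are matched by exact text, and the names blame, r1, r2 and r3 are hypothetical. Lines that already existed at the lower bound are attributed to that bound and marked with a leading "^", mirroring the way git blame flags boundary commits.

```python
def blame(history, boundary=None):
    """history is a list of (commit_id, lines), oldest first.
    Attribute each line of the newest version to the oldest commit, in an
    unbroken run back from the tip, that still contains that line. Commits
    older than `boundary` are never examined."""
    ids = [commit_id for commit_id, _ in history]
    start = ids.index(boundary) if boundary is not None else 0
    window = history[start:]
    tip_id, tip_lines = window[-1]
    result = []
    for line in tip_lines:
        origin = tip_id
        # Walk backwards from the commit just before the tip.
        for commit_id, lines in reversed(window[:-1]):
            if line not in lines:
                break
            origin = commit_id
        if boundary is not None and origin == boundary:
            # The line predates the range; only the boundary is known.
            origin = "^" + boundary
        result.append((origin, line))
    return result

# Hypothetical three-commit history of one file.
history = [
    ("r1", ["a", "b"]),
    ("r2", ["a", "b", "c"]),
    ("r3", ["a", "B", "c", "d"]),
]

print(blame(history))
print(blame(history, boundary="r2"))
```

In the bounded call, everything that was already present at r2 collapses onto the boundary, so only the changes made after it stand out.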

It also happens to be useful. Quite often, you know something broke
after a particular known-good release, so you're interested in the blame,
but anything older than that known-good release is simply noise, and
actually takes AWAY from the information, by just maki...
