Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle

From: Jakub Narebski
Date: Saturday, October 14, 2006 - 9:40 am

It is quite obvious that a comparison of programs of a given type (SCM)
hosted on one program's site (Bazaar-NG) is usually biased towards that program,
perhaps unconsciously: by emphasizing the features which were important

For example simple namespace for git: you can use shortened sha1
(even to only 6 characters, although usually 8 are used), you can
use tags, you can use ref^m~n syntax.
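
A minimal sketch of how a shortened id can be resolved, assuming the
repository can list all of its object ids. The helper names
(resolve_prefix, shortest_unique_prefix) and the sample ids are made up
for illustration and are not git's own code.

```python
import hashlib

def resolve_prefix(prefix, object_ids):
    """Return the single object id starting with prefix, or raise KeyError."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise KeyError(f"{prefix!r} is ambiguous: {len(matches)} candidates")
    return matches[0]

def shortest_unique_prefix(oid, object_ids, minimum=4):
    """Return the shortest prefix of oid that matches no other known id."""
    for n in range(minimum, len(oid) + 1):
        if sum(1 for o in object_ids if o.startswith(oid[:n])) == 1:
            return oid[:n]
    return oid

# Hypothetical object ids: SHA-1 of arbitrary strings, not real commits.
ids = [hashlib.sha1(f"commit {i}".encode()).hexdigest() for i in range(1000)]
full = ids[42]
short = shortest_unique_prefix(full, ids)
```

A prefix is usable only while it stays unique, which is why longer
abbreviations are the safer habit in a growing repository.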

I'm not sure about "No" in "Supports Repository". Git supports multiple
branches in one repository, and what's better supports development using
multiple branches, but cannot for example do a diff or a cherry-pick
between repositories (well, you can use git-format-patch/git-am to
cherry-pick changes between repositories...).

About "checkouts", i.e. working directories with repository elsewhere:
you can use the GIT_DIR environment variable or the "git --git-dir" option,
or symlinks, and if Nguyen Thai Ngoc Duy's proposal to have a .gitdir/.git
"symref"-like file pointing to the repository passes, we can use that.

Partial checkouts are only partially supported as of now; it means
you have to do some low-level stuff to do a partial checkout, and be
careful when committing. BTW it depends what you mean by partial
checkout, but they are somewhat incompatible with atomic commits
to a snapshot-based repository.

Git supports renames in its own way; it doesn't use file ids, nor
remember renames (the new "note" header for use e.g. by porcelains 
didn't pass if I remember correctly). But it does *detect* moving
_contents_, and even *copying* _contents_ when requested. And of
course it detects renames in merges.
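
To illustrate what detecting moved _contents_ (rather than tracking file
ids) can look like, here is a rough sketch. It is not git's actual
algorithm: it uses Python's difflib as a stand-in similarity measure, and
the 0.5 threshold and the function name are assumptions made for the
example.

```python
from difflib import SequenceMatcher

def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    deleted and added map path -> file contents. Pairs scoring below
    threshold are not reported as renames.
    """
    renames = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = (best_path, round(best_score, 2))
    return renames

source = "int main(void)\n{\n    return 0;\n}\n"
deleted = {"a.c": source}
added = {"src/main.c": source, "README": "totally different words here"}
found = detect_renames(deleted, added)
```

Nothing is recorded at commit time in this model; the rename is inferred
afterwards by comparing what disappeared with what appeared.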

Git doesn't have some "plugin framework", but because it has many
"plumbing" commands, it is easy to add new commands, and also new
merge strategies, using shell scripts, Perl, Python and of course C.
So the answer would be "Somewhat", as git has pluggable merge strategies,

Gaah, subscribe-to-post mailing list!
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


-

From: Jon Smirl
Date: Saturday, October 14, 2006 - 10:18 am

I believe they mean checking out only the latest few revisions instead
of copying the whole repo. This issue is a problem for Mozilla. If you
want to change a line in the git version you have to download the

I believe partial checkout means being able to check one directory
tree out of the repo and work on it while ignoring what is happening
in the rest of the repo. This is another issue for Mozilla which has



-- 
Jon Smirl
jonsmirl@gmail.com
-

From: Jakub Narebski
Date: Saturday, October 14, 2006 - 10:42 am

From http://bazaar-vcs.org/RcsComparisons
  A "Checkout" is a working tree that points elsewhere for its RCS data.

You can always do as the Linux kernel did, splitting the repository into a 
current and a historical part (which would also contain dead branches), 
and creating and publishing a current-historical graft file, to join 

So split different projects into different repositories. There was some 
helper program (git-splitrepo or something like that) for that posted 
on git mailing list. And use "superrepository" to gather all projects 
together (see last discussion about subprojects on git mailing list).
-- 
Jakub Narebski
Poland
-

From: Aaron Bentley
Date: Monday, October 16, 2006 - 3:26 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Bazaar's namespace is "simple" because all branches can be named by a
URL, and all revisions can be named by a URL + a number.

If that's true of Git, then it certainly has a simple namespace.  Using

That sounds right.  So those branches are persistent, and can be worked

It sounds like the .gitdir/.git proposal would give Git "checkouts", by

Yes, I'm very much aware of that tension.  It will be fun when Bazaar

You'll note we referred to that behavior on the page.  We don't think
what Git does is the same as supporting renames.  AIUI, some Git users

It sounds like you're saying it's extensible, not that it supports
plugins.  Plugins have very simple installation requirements.  They can
provide merge strategies, repository types, internet protocols, new
commands, etc., all seamlessly integrated.

What you're describing actually sounds like the Arch approach to
extensibility: provide a whole bunch of basic commands and let users
build an RCS on top of that.

As the author of two different Arch front-ends, I can say I haven't
found that approach satisfactory.  Invoking multiple commands tends to
re-invoke the same validation routines over and over, killing
efficiency, and diagnostics tend to be pretty poorly integrated.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFNAb90F+nu1YWqI0RAvRDAJ9HHHdbhT1+aA3wOGeuUDkjRIr7BQCcDBKB
cL+DAy5GdTDk8Iz9TUkQ//M=
=AJAu
-----END PGP SIGNATURE-----
-

From: Andy Whitcroft
Date: Monday, October 16, 2006 - 3:35 pm

In my experience there are two key features to rename support.  The
first is that files move about efficiently, i.e. we don't have to carry a
different copy of the same file for each name it has had; this git
handles nicely.  The second is the seamless following of history 'back';
this git does not do trivially (when limited to specific files).  git
log on a renamed file pretty much stops at the rename point and you have
to deal with it yourself.

I would love to see someone respond with a pickaxe-like command line
which would list each and every change and its origin through merges and
the like.

Hmmm.

-apw
-

From: Jakub Narebski
Date: Monday, October 16, 2006 - 4:19 pm

Well, all refs (branches and tags) are named by [relative] path. So for
example we can have 'master', 'next', 'jc/diff' branches, 'v1.4.0' and
'examples/tag' tags. Cogito for example uses <repository URL>#<branch>

Well, <ref>~<n> means <n>-th _parent_ of a given ref, which for branches
(which constantly change) is a moving target.

There was a proposal to add some kind of serial number to git (like
Subversion revision numbers) and even a solution for how to do this...
but one must realize that any serial number must be _local_ to the
repository. One cannot have universally valid revision numbers (even
only per branch) in distributed development. Subversion can do that only
because it is a centralized SCM. Global numbering and a distributed nature
don't mix... hence content-based sha1 as commit identifiers.
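
A small sketch of why serial numbers end up local, assuming each
repository simply numbers commits in the order they arrive. The Repo
class and commit_id function are invented for illustration; the hashing
only loosely imitates how a content-based id depends on the parents.

```python
import hashlib

def commit_id(parent_ids, message):
    """Content-based id: hash of the parents' ids plus the message."""
    payload = "\n".join(sorted(parent_ids)) + "\n" + message
    return hashlib.sha1(payload.encode()).hexdigest()

class Repo:
    def __init__(self):
        self.serial = {}  # commit id -> local serial number

    def add(self, cid):
        """Record a commit in arrival order; the numbering is purely local."""
        if cid not in self.serial:
            self.serial[cid] = len(self.serial) + 1
        return self.serial[cid]

root = commit_id([], "initial")
a = commit_id([root], "feature A")
b = commit_id([root], "feature B")

alice, bob = Repo(), Repo()
for cid in (root, a, b):   # Alice wrote A first, then fetched B
    alice.add(cid)
for cid in (root, b, a):   # Bob wrote B first, then fetched A
    bob.add(cid)
```

Both repositories hold exactly the same commits under the same ids, yet
"revision 2" names a different commit in each of them.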


But this doesn't matter much, because you can have really lightweight
tags in git (especially now with packed refs support). So you can have

Branches are persistent, have _separate_ (!) namespace (are not
incorporated in repository URL according to some kind of convention
like in Subversion), can be worked independently, you can easily
switch between branches in one working directory. Branches are cheap
in git (notion of topic branches).

I wonder if any SCM other than git has an easy way to "rebase" a branch,
i.e. cut a branch at its branching point, and transplant it to the tip
of another branch. For example you work on the 'xx/topic' topic branch,
and want to have the changes in that branch but applied to current work,
not to the version from some time ago when you started working on
said feature.

What your comparison matrix lacks, for example, is whether a given SCM
saves information about branching points and merges, so you can
find where two branches diverged, and when one branch was merged into

Actually it is better to work with clone of repository, perhaps either
symlinking object database, or by alternates mechanism (with alternates
repositories would share old history, but gather new ...
-

From: Nguyen Thai Ngoc Duy
Date: Monday, October 16, 2006 - 4:39 pm

I agree. Each Git repository is designed to work with one working
directory. Using .gitdir/.git proposal, you are likely to checkout two
working directories from one repo.
-- 
Duy
-

From: Aaron Bentley
Date: Monday, October 16, 2006 - 9:56 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Ah.  Bazaar uses negative numbers to refer to <n>th parents, and
positive numbers to refer to the number of commits that have been made

Sure.  Our UI approach is that unique identifiers can usefully be
abstracted away with a combination of URL + number, in the vast majority

The nice thing about revision numbers is that they're implicit-- no one

If I understand correctly, in Bazaar, you'd just merge the current work

I'm not sure what you mean about divergence.  For example, Bazaar
records the complete ancestry of each branch, and determining the point
of divergence is as simple as finding the last common ancestor.  But are
you considering only the initial divergence?  Or if the branches merge
and then diverge again, would you consider that the point of divergence?
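
The question can be made concrete with a sketch, assuming history is
stored as a map from each commit to its parents. The commit names and the
merge_bases helper are hypothetical; this is neither Bazaar's nor git's
implementation, only the "last common ancestor" idea written out.

```python
def ancestors(commit, parents):
    """Return commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def merge_bases(a, b, parents):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}

# Hypothetical history: two branches fork at "root", branch A merges
# branch B at "m", and then both continue separately.
parents = {
    "root": [],
    "a1": ["root"], "b1": ["root"],
    "m": ["a1", "b1"],
    "a2": ["m"],
    "b2": ["b1"],
}
initial = merge_bases("a1", "b1", parents)
after_merge = merge_bases("a2", "b2", parents)
```

With merges recorded, the answer moves forward: before the merge the
branches diverge at "root", afterwards at "b1", the last commit of B that
A has already absorbed.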

merge-point tracking is a prerequisite for Smart Merge, which does

I'm not sure what you mean by API, unless you mean the commandline.  If
that's what you mean, surely all unix commands are extensible in that
regard.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNGKQ0F+nu1YWqI0RAsW+AJoDOsNRmBjo3raT43JL6qn7SuJNRwCfe9l5
oAZ9OyrxMQlHnwrruhcjz9Y=
=RNuG
-----END PGP SIGNATURE-----
-

From: Shawn Pearce
Date: Monday, October 16, 2006 - 10:20 pm

But this only works when the URL is public.  In Git I can just look up
the unique SHA1 for a revision in my private repository and toss it
into an email with a quick copy and paste.  With Bazaar it sounds
like I'd have to do that relative to some known public repository,
which just sounds like more work to me.

But I don't want to see this otherwise interesting thread devolve into

Git has two approaches:

 - merge: The two independent lines of development are merged
   together under a new single graph node.  This is a merge commit
   and has two parent pointers, one for each independent line of
   development which was combined into one.  Up to 16 independent
   lines can be merged at once, though 12 is the record.

 - rebase: The commits from one line of development are replayed
   onto a totally different line of development.  This is often
   used to reapply your changes onto the upstream branch after the
   upstream has changed but before you send your changes upstream.
   It can often generate more readable commit history.
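
The difference between the two approaches can be sketched on a toy commit
store. Everything here (make_commit, the dict layout, the short ids) is
invented for illustration; the sketch handles only a linear topic branch
and two parents, and ignores conflicts entirely.

```python
import hashlib

def make_commit(store, parents, change):
    """Store a commit and return its id (hash of parents plus change)."""
    text = "|".join(parents) + "|" + change
    cid = hashlib.sha1(text.encode()).hexdigest()[:12]
    store[cid] = {"parents": list(parents), "change": change}
    return cid

def merge(store, ours, theirs):
    """Join two lines of development under one commit with two parents."""
    return make_commit(store, [ours, theirs], "merge")

def rebase(store, base, topic_tip, onto):
    """Replay the commits in base..topic_tip on top of onto."""
    chain, c = [], topic_tip
    while c != base:
        chain.append(store[c]["change"])
        c = store[c]["parents"][0]
    tip = onto
    for change in reversed(chain):
        tip = make_commit(store, [tip], change)
    return tip

store = {}
root = make_commit(store, [], "root")
u1 = make_commit(store, [root], "upstream 1")
t1 = make_commit(store, [root], "topic 1")
t2 = make_commit(store, [t1], "topic 2")

m = merge(store, u1, t2)          # one new node with two parents
r = rebase(store, root, t2, u1)   # new copies of topic 1 and topic 2
```

The merge keeps t1 and t2 as they are and adds one node; the rebase
leaves them behind and creates new commits with new ids, whose history
reads as if the topic had been started from u1.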

I believe what you are talking about in Bazaar is the former (merge)

I believe you nailed what Jakub was talking about on the head.
And yes, I noticed it's in your matrix but it's not very clear.
I think that some additional explanation there may help other
readers.
 
-- 
Shawn.
-

From: Martin Pool
Date: Tuesday, October 17, 2006 - 1:21 am

Yes, but then people need to know how to get it out of your private
repository.  For stuff that goes into well-known repositories I suppose

You can also name a revision using its UUID, in which case things will


For the 'rebase' operation in Bazaar you can use 'bzr graft':

  http://spacepants.org/src/bzrgraft/

-- 
Martin
-

From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 1:16 am

What do you do once a branch has been thrown away, or has had 20 other 
branches merged into it? Does the offset-number change for the revision 

merge != rebase though, although they are indeed similar. Let's take the 
example of a 'master' branch and topic branch topicA. If you rebase 
topicA onto 'master', development will appear to have been serial. If 
you instead merge them, it will either register as a real merge or, if 
the branch tip of 'master' is the branch start-point of topicA, it will 
result in a "fast-forward" where 'master' is just updated to the 

I'm fairly certain he's talking about the API in the sense it's being 
talked about in every other application. Extensive work has been done to 
libify a lot of the git code, which means that most git commands are 
made up of less than 400 lines of C code, where roughly 80% of the code 
is command-specific (i.e., argument parsing and presentation).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 1:01 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


We always track the number of parents since the initial commit in the

Ah, now I see what you mean, and the "graft" plugin mentioned by others


Ah, okay.

So it sounds to me like git is extensible, though not as thoroughly as bzr.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFNTat0F+nu1YWqI0RAn9aAJ9WzMrM72be+3SlwCpvJXQ/X2Y3nQCfeYk3
NTIJuZSze9URUaAsiO4Hu5o=
=9nvr
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 2:01 pm

While this I think is quite reliable (there was an idea to store a "generation
number" with each commit, e.g. using the not-implemented "note" header, or a
commit-id to generation number "database", as a better heuristic than
timestamp for revision ordering in git-rev-list output), and probably
independent of the repository (it is a global property of commit history,
and commit history is included in sha1 of its parents), numbering branching

Very useful as a kind of poor-man's-Quilt (or StGit). You develop some
feature step by step, commit by commit in your repository cooking it
in topic branch. Then before sending it to mailing list or maintainer
as a series of patches (using git-format-patch and git-send-email)
you rebase it on top of current work (current state), to ensure that

Fast-forward is a really good idea. Perhaps you could implement it,

I think having a good API for C, shell and Perl (and to a lesser extent for any
scripting language) means that it is more extensible. Git is not as of yet
libified; once it is, we could think about bindings for other
programming languages (there is a preliminary Java binding/interface).
-- 
Jakub Narebski
Poland
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 2:27 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



We support it as 'pull', but merge doesn't do it automatically, because
we'd rather have merge behave the same all the time, and because 'pull'

I guess it's a value judgement on which is more important to extensibility:

Git has more language support.

Bzr has plugin autoloading, Protocol plugins, Repository format plugins,
and more.  Because Python supports monkey-patching, a plugin can change
absolutely anything.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFNUrP0F+nu1YWqI0RAizXAJ0Wnf2ZoIRpaba3mX2L4pN9XcWDPQCePtg/
G/W6Oxm+kd8SzhGEEfLAxL8=
=VqC7
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 2:51 pm

We want linear history, not polluted by merges. For example you cannot
send merge commit via email. Another problem is that you want to
send _series_ of patches, string of commits (revisions), creating feature
part by part, with clean history; with merge you get _final result_
which will apply cleanly, with rebase you would get that series

I smell yet another terminology conflict (although this time the fault is
on the git side), namely that in git terminology "pull" is "fetch"
(i.e. getting changes done in the remote repository since the last "fetch"

Which is _not_ a good idea. Git is created in such a way that the repository
is abstracted away (the introduction of the pack format, and improvements to it,
could be and were done "behind the scenes", without changing any porcelain-ish
(user) commands), and we don't want any change that would break this abstraction.
Changing the repository format is not a good idea for "dumb" protocols; the native
protocol is quite extensible (for example the multi-ack extension was introduced
for better downloading of multiple branches with a smaller number
of objects in the pack sent; even earlier thin packs were introduced),
and does a kind of feature detection between client and server. Adding
cURL-based FTP read-only support to the existing HTTP support was a matter
of a few lines, if I remember correctly.

Besides, if monkey-patching is something akin to advices, I guess that
performance might suffer.


To make a perhaps not that good analogy: in git, adding new commands is
like adding a new filesystem to the Linux kernel using the existing VFS interface,
or the existing FUSE/LUFS interface. In Bazaar, adding a new command is like
writing new filesystem support (a plugin) in a microkernel like L4/Mach.
(And please take note of what project git was created for :-))

-- 
Jakub Narebski
ShadeHawk on #git
Poland
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 3:28 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Yes, that's something that I'd heard about the kernel development
methodology-- that a series of small patches is preferred to one patch
that makes the whole change.

That's not the way we operate.  We like to review all the changes at
once.  But because bundles are applied with a 'merge' command, not a
'patch' command, an old bundle will tend to apply more cleanly than an


I'm not sure what you think Bazaar does.  In Bazaar, a repository format
plugin  implements the same API that a native repository format does.


I can't parse this.  Repository formats and protocols are different

I was meaning dumb protocol extension.  I can't say how extensible the

We support read and write over native, ftp and WebDAV (a plugin).  We

No, monkey-patched code executes at the same speed as unpatched code.
There are arguments against monkey-patching, but speed is not one of them.
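
For readers unfamiliar with the term, a minimal sketch of monkey-patching
in Python. The Repository class and plugin_fetch function are
hypothetical and are not Bazaar's API; the point is only that the patch
is an ordinary attribute rebind.

```python
class Repository:
    def fetch(self, url):
        return f"native fetch from {url}"

def plugin_fetch(self, url):
    """Hypothetical plugin code that takes over Repository.fetch."""
    if url.startswith("webdav://"):
        return f"plugin fetch from {url}"
    return original_fetch(self, url)   # fall back to the unpatched method

original_fetch = Repository.fetch
Repository.fetch = plugin_fetch  # the monkey-patch: rebind the class attribute

repo = Repository()
handled = repo.fetch("webdav://example.invalid/repo")
passed_through = repo.fetch("http://example.invalid/repo")
```

After the rebind, callers look up and call fetch exactly as before, so
the replacement itself runs like any other method. A wrapper that
delegates to the original, as this one does, adds one ordinary function
call.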

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNVkM0F+nu1YWqI0RAjCaAJwOcWSUdVy7RpUZROJVxAC9aj/V/wCfUg0T
uHkdc9k6i+v0QnhEvTXdszM=
=YO8G
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 3:57 pm

Perhaps it would be nice to have "bundles" in git too. As of now
we can save arbitrary part of history in a pack, but it is binary
not textual representation.

Some of git workflow stems from old, pre-SCM Linux kernel workflow
of sending _patches_ via email.


By the way, are bzr "bundles" compatible with ordinary patch?
git-format-patch patches are. They have additional metainfo,

But if I remember correctly Subversion does not remember merge points
(merge commits), so how can you provide full Bazaar-NG compatibility
with Subversion repository as backend? Some repository formats lack
some features. Besides, as I said repository database and stuff is
quite well abstracted away.

In git we have import tools (most of them capable of incremental import),
a few exchange tools like git-cvsexportcommit, git-cvsserver, and

"Dumb" protocols in git are protocols for which server provides access
to the contents of the git repository plus some additional info (usually generated
using hooks). The client (be it git-fetch or git-push) discovers which
files to download or what to upload, but it can only download the repository
"as is". So if the server repository was created with a repository format plugin,

The native git protocol (git:// and git+ssh://) does feature discovery, then
negotiates what contents have to be sent, and finally tries to send minimal

Git has read-only access over the git:// protocol (served by git-daemon on
port 9418), read-write access over the git+ssh:// protocol (you can limit
exposure using git-shell), read-only access via the HTTP, HTTPS and FTP "dumb"
protocols, and read-write access via the WebDAV "dumb" protocol.

Git is open-source, we don't need plugins ;-)
-- 
Jakub Narebski
ShadeHawk on #git
Poland
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 3:59 pm

And read-only (I think) access via the deprecated rsync:// "dumb" protocol,
which is suggested only for cloning.
-- 
Jakub Narebski
Poland
-

From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 4:16 pm

Actually, the reason to _not_ have bundles very much stems from the fact 
that BK did have bundles, and they were pretty horrid.

It would be easy to send the exact same data as the native git protocol 
sends over ssh (or the git port) as an email encoding. We did that a few 
times with BK (there it's called "bk send" and "bk receive" to pack and 
unpack those things), and after doing it about five times, I absolutely 
refused to ever do it again. There's just no point, except to make your 
mailbox grow without bounds, and it was really annoying. 

So sending things as patches is just a lot more convenient if you want 
emails.  And if you want to sync two repos directly, I think we've gotten 
sufficiently past the old UUCP days when you want to use email as a 
packetization medium.

That said, "bundles" certainly wouldn't be _hard_ to do. And as long as 
nobody tries to send _me_ any of them, I won't mind ;)

		Linus
-

From: Jeff King
Date: Tuesday, October 17, 2006 - 10:36 pm

I never used BK, but my understanding is that it was based on
changesets, so a bundle was a group of changesets. Because a git commit
represents the entire tree state, how can we avoid sending the entire
tree in each bundle? The interactive protocols can ask "what do you
have?" but an email bundle is presumably meant to work without a round
trip.

We could always make a guess ("git send --remote-has master~10") but
that seems awfully error-prone. I assume a changeset-oriented system
would implicitly keep some concept of "I think Linus is at master~10"
and do it automatically.

-Peff
-

From: Junio C Hamano
Date: Tuesday, October 17, 2006 - 10:57 pm

We could always anchor at a well known point ("git send v2.6.18..").
If you as the recipient do not have the preimage, the "bundle" would
identify what the assumed common ancestor is and you can fetch
it before proceeding.

-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 7:52 am

That's not the problem. That's easy to handle - and we already do. That's 
the whole point of the wire-transfer protocol (ie sending deltas, and only 

Right, but they can do exactly what bk did: you have to have a reference 
to what the other side has. In git, that's usually even simpler: you'd do

	git send origin..

and that "origin" is what the other end is expected to already have.

Of course, if you send an unconnected bundle (ie you give an origin that 
the other end _doesn't_ have), you're screwed.

In other words, to get such a pack, we'd _literally_ just do something 
like

	git-rev-list --objects-edge origin.. |
		git-pack-objects --stdout |
		uuencode

and that would be it. You'd still need to add a "diffstat" to the thing, 
and tell the other end what the current HEAD is (so that it knows what 
it's supposed to fast-forward to), but it _literally_ is that simple.
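
The wrapping around the pack data could look roughly like the sketch
below. The "Base:" and "Head:" header names, the function names and the
use of base64 in place of uuencode are all assumptions made for the
example; the pack bytes would come from a pipeline like the one above.

```python
import base64

def make_bundle(base, head, pack_bytes):
    """Wrap pack data in a mail-safe text bundle.

    base is the commit the recipient is expected to have already, head is
    the commit the recipient should fast-forward to.
    """
    body = base64.encodebytes(pack_bytes).decode("ascii")
    return f"Base: {base}\nHead: {head}\n\n{body}"

def apply_bundle(text, known_commits):
    """Return (head, pack_bytes), refusing bundles whose base is unknown."""
    header, _, body = text.partition("\n\n")
    fields = dict(line.split(": ", 1) for line in header.splitlines())
    if fields["Base"] not in known_commits:
        raise LookupError(f"unconnected bundle: missing {fields['Base']}")
    return fields["Head"], base64.decodebytes(body.encode("ascii"))

payload = b"PACK\x00\x01 stand-in for real pack data"
mail_text = make_bundle("v0.18", "abc123", payload)
head, data = apply_bundle(mail_text, known_commits={"v0.18"})
```

The check in apply_bundle is the "unconnected bundle" case: without the
base commit on the receiving side there is nothing to apply the pack
against.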

"plug-in architecture" my ass. "I recognize this - it's UNIX!".

		Linus
-

From: Petr Baudis
Date: Wednesday, October 18, 2006 - 11:52 am

Dear diary, on Wed, Oct 18, 2006 at 04:52:25PM CEST, I got a letter

Took me exactly an hour from mkdir cogito-bundle to cg-push to
kernel.org. :-)

cogito-bundle is an example on how to create third-party addons or
plugins adding own commands to Cogito and using Cogito's infrastructure.
It's not _that_ easy currently since you have to replicate large part of
the build infrastructure locally; that could be fixed by installing some
"library makefiles" and asciidoc toolkit to /usr/share or something, if
there would be a real demand for such an addon API. cg-help and the cg
wrapper will pick up the newly installed commands automagically. The
only thing missing is updating cogito(7) to list the addon commands,
which would take a bit more work.

Though it's an example, it's actually supposed to be useful, by doing
exactly what is outlined above - it lets you exchange commits over
mail by so-called "bundles", similar to e.g. Bazaar bundles - basically,
it is like push or fetch, but over email, and the commit ids are
preserved when transferred in bundles (if you just send patches, the
commit ids will end up different).

The provided cg-bundle and cg-unbundle commands are rather crude and
don't support many things - they don't actually include a diff, only a
diffstat, etc. The uuencoded bundle is inlined in the mail, which I
suspect isn't very useful; perhaps it would be more practical to just
attach it binarily. Feel free to send patches (or bundles ;).

An example bundle is available at

	http://pasky.or.cz/~pasky/cp/example-bundle.txt

as generated by

	cogito.master$ cg-bundle -r v0.18 -m"Subject is this" \
		-m"And some body now..." --stdout

and cogito-bundle is available at

	git://git.kernel.org/pub/scm/cogito/cogito-bundle.git/
	(gitweb http://kernel.org/git/?p=cogito/cogito-bundle.git)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo ...
-

From: Petr Baudis
Date: Wednesday, October 18, 2006 - 11:59 am

Dear diary, on Wed, Oct 18, 2006 at 08:52:25PM CEST, I got a letter

By the way, originally I just wanted to index and save the pack, but
when trying to feed it to git-index-pack, I kept getting

	fatal: packfile '.git/objects/pack/pack-b2ab684daebea5b9c5a6492fa732e0d2e1799c8e.pack' has unresolved deltas

while feeding it to git-unpack-objects works fine. Any idea what's wrong?

(BTW, I got the id by sha1summing the pack file; is there an existing
way to name a pack properly if I have it lying around, unnamed? sha1sum
seems to be specific to a fairly new GNU coreutils version.)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 12:04 pm

Yes.  You told the pipeline, with --objects-edge, to create a
thin pack.  By definition that is _not_ indexable.



-

From: Nicolas Pitre
Date: Wednesday, October 18, 2006 - 12:13 pm

Ah true.  I missed the "thin" pack.

Any idea why we should still prevent this?  It is not like it was a 
technical limitation.


Nicolas
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 12:18 pm

It still is in sha1-file.c; or at least the last time I looked at
that code.  The base is always resolved from the same pack/index
as the delta.  If you fix sha1-file.c sure, I don't see why you
can't allow indexing thin packs.

-- 
Shawn.
-

From: Nicolas Pitre
Date: Wednesday, October 18, 2006 - 12:33 pm

If there are advantages to do so then maybe. That would be for another 
day though, as I've been burned a bit with packs recently.


Nicolas
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 1:46 pm

I guess it's my turn then to work on the mmap window code, huh?  :-)

-- 
Shawn.
-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 2:17 pm

There are bigger reasons to _never_ allow packs to contain deltas to 
outside of themselves:

 - there's no point. 

   If you have many small packs, you're doing something wrong. The whole 
   _point_ of packs is to put things into the same file, so that you can 
   avoid the filesystem overhead. And once packs are big and few, the 
   advantage of having deltas to outside the pack is basically zero.

 - it's a bad design. 

   Self-sufficient packs means that a pack is a "safe" thing. When the 
   index says that it contains an object, then it damn well contains it.

   In contrast, if you had packs that only contained a delta, and the pack 
   needed some _other_ pack (or loose object) to actually generate that 
   object, then it's not safe any more. You could end up with a situation 
   where you get two packs from two different sources, and they contain 
   deltas to _each_other_, and you have no way of actually generating the 
   object itself any more.

   (Or you end up having to have rules to figure out when you have a loop,
   and stop looking just in the packed files, and start looking for loose 
   objects instead)

   In other words, it has potentially _serious_ downsides.
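
The failure mode can be shown with a toy model, assuming a pack is a
dict from object name to either full data or a delta against a named
base. Real deltas are copy/insert instructions; here a delta merely
appends a suffix, and all names are hypothetical.

```python
def resolve(obj, packs, in_progress=None):
    """Rebuild obj from any pack, following delta bases across packs.

    Entries are ("full", data) or ("delta", base_name, suffix).
    """
    in_progress = in_progress or set()
    if obj in in_progress:
        raise RuntimeError(f"delta loop while rebuilding {obj}")
    for pack in packs:
        if obj in pack:
            entry = pack[obj]
            if entry[0] == "full":
                return entry[1]
            base = resolve(entry[1], packs, in_progress | {obj})
            return base + entry[2]
    raise KeyError(f"{obj} not found in any pack")

# A self-sufficient pack: every delta base is in the same pack.
safe_pack = {"A": ("full", "x"), "B": ("delta", "A", "y")}
rebuilt = resolve("B", [safe_pack])

# Two packs from different sources whose deltas point at each other.
pack1 = {"A": ("delta", "B", "1")}
pack2 = {"B": ("delta", "A", "2")}
```

With the two cross-referencing packs, both indexes claim to contain
their object, yet neither object can be generated.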

So DAMMIT! Stop looking to make the data structures worse. The fact is, 
the git data structures are FINE. They are well-designed. They work well. 
There's no _point_ in changing them, especially since changing them seems 
to be all about making things less reliable for dubious gain.

One of the advantages of git is that you can explain things with object 
relationships, and that the file format is stable as _hell_. That's a GOOD 
thing. Please realize that if you want to change the file formats, you'd 
have a hell of a better reason for it than "just because I can".

Please. Really.

So next time somebody suggests a new pack-format, ask yourself:

 - does it save disk-space by 50% or more?

 - does it drop memory usage by 50% or more?

 - does it improve performance by 50% ...
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 2:32 pm

That and all of the other reasons you cited in your message are
why I haven't finished trying to use some sort of dictionary based
compression for packing objects.

On the other hand we've already seen how packs >1.5 GiB in size
(certainly well within the 4 GiB limitation in the current index
file format) cannot be repacked by git-repack-objects on a 32
bit address space as the entire pack file is mmap'd in one shot.
After the kernel space of ~1 GiB and the pack file at ~1.5 GiB
there's very little address space left for the application code.

My comment that you quoted was about mmap'ing the pack files in
large chunks (around 64-128 MiB at a time, but configurable from
.git/config) rather than as an entire massive mapping.  It had
absolutely nothing to do with changing the pack file format, the
index format, or any other on-disk format, although it would add
a new pair of configuration options to .git/config.  Is that change
too radical?  :-)
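
The windowed mapping idea, sketched with Python's mmap module rather
than the C code under discussion. The read_at function is hypothetical;
the window here is a single allocation granule so the demo stays small,
where the proposal talks about tens of MiB. The window size must be a
multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import os
import tempfile

def read_at(path, offset, length, window_size=mmap.ALLOCATIONGRANULARITY):
    """Read length bytes at offset, mapping only window-sized pieces."""
    out = bytearray()
    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        end = min(offset + length, file_size)
        while offset < end:
            # Map offsets must be aligned, so round down to a window start.
            start = offset - offset % window_size
            size = min(window_size, file_size - start)
            with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ,
                           offset=start) as window:
                chunk = window[offset - start:min(end - start, size)]
            out += chunk
            offset += len(chunk)
    return bytes(out)

# Demo on a throwaway file that spans several windows.
G = mmap.ALLOCATIONGRANULARITY
data = bytes(i % 251 for i in range(3 * G + 100))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
result = read_at(tmp.name, G - 10, G + 50)
os.unlink(tmp.name)
```

Only one window is mapped at any moment, so the address space used is
bounded by the window size instead of the size of the file.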

With such a change the Git and Linux kernel repositories would both
still mmap in one chunk but much larger projects like Mozilla or
very large pack files coming out of git-fastimport would actually
be usable on 32 bit architectures without running into address space
limitations so quickly.  Git would also be slightly more usable for
some people who have a lot of very incompressible data stored in Git.


Unless of course you are actively working on a fix for the Linux
kernel so that we can actually have all 4 GiB of virtual address
space available for the userspace git-repack-objects process.
Or have some sort of secret plan to upgrade everyone who uses Git
to 64 bit processors which support 64 bit address spaces...

-- 
Shawn.
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 2:42 pm

I wonder what you would need the configuration options for.

If mmap() pack works well, it works well, and if it is broken
nobody has reason to enable it.  The code should be able to
adjust the mmap window to appropriate size itself and its
automatic adjustment does not even have to be the absolute
optimum (since the user would not know what the optimum would be
anyway), so maybe your configuration options would not be
"enable" nor "window-size" -- and I am puzzled as to what they
are.


-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 2:52 pm

All very true.

However what do we do about the case where we mmap over 1 GiB worth
of pack data (because the mmap succeeds and we have at least that
much in .pack and .idx files) and then the application starts to
demand a lot of memory via malloc?  At some point malloc will return
NULL, xmalloc will die(), and that's the end of the program.

If the user was able to set the maximum threshold of how much data
we mmap then they could initially prevent us from mmap'ing over 1 GiB;
instead using a smaller upper limit like 512 MiB.

Of course as I write this I think the better solution to this
problem is to simply modify xmalloc (and friends) so that if the
underlying malloc returned NULL and we have a large amount of stuff
mmap'd from packs we try releasing some of the unused pack windows
and retry the malloc before die()'ing.
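
A minimal sketch of that retry idea, using a toy model in which mapped pack windows and heap allocations share one address-space budget. Every name here (toy_malloc, map_window, release_pack_windows, xmalloc_retry) is hypothetical and only illustrates the control flow; it is not Git's actual xmalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of a small address space: mapped pack windows and heap
 * allocations share one budget.  All names here are hypothetical. */
static size_t space_limit = 1024;	/* pretend address space, in bytes */
static size_t mapped_bytes;		/* bytes held by pack windows */
static size_t heap_bytes;		/* bytes handed out by toy_malloc() */

/* Stand-in for malloc(): fails once the shared budget is exhausted. */
static void *toy_malloc(size_t size)
{
	if (mapped_bytes + heap_bytes + size > space_limit)
		return NULL;
	heap_bytes += size;
	return malloc(size);
}

/* Pretend to mmap a pack window; returns 0 on success, -1 on failure. */
static int map_window(size_t size)
{
	if (mapped_bytes + heap_bytes + size > space_limit)
		return -1;
	mapped_bytes += size;
	return 0;
}

/* Drop up to 'want' bytes of unused windows; returns bytes released. */
static size_t release_pack_windows(size_t want)
{
	size_t freed = want < mapped_bytes ? want : mapped_bytes;

	mapped_bytes -= freed;
	return freed;
}

/* xmalloc-like wrapper: on failure, release windows and retry once. */
static void *xmalloc_retry(size_t size)
{
	void *p = toy_malloc(size);

	if (!p && release_pack_windows(size) > 0)
		p = toy_malloc(size);
	return p;	/* a real xmalloc would die() here on NULL */
}
```

With 900 of the 1024 bytes mapped, xmalloc_retry(200) fails on the first attempt, releases 200 bytes of windows and then succeeds. A request that cannot fit even with nothing mapped still returns NULL.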


The other configuration option is the size of the mmap window.
This should by default be at least 32 MiB, probably closer to
128 MiB.  But it's nice to be able to force it as low as a single
system page to set up test cases in the t/ directory for the mmap
window code.

Earlier this summer we discussed this exact issue and said this
value probably needs to be configurable if only to facilitate the
unit tests.
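
As a rough illustration of what a configurable window size implies, the sketch below picks the window that covers a given pack offset. The struct and function names are made up for this example, and it assumes the window size is a non-zero multiple of the system page size and that the offset lies inside the pack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one mapped window of a pack file. */
struct pack_window {
	size_t start;	/* file offset where the window begins */
	size_t len;	/* number of bytes mapped */
};

/*
 * Pick the window that covers 'offset'.  'window_size' is assumed to be
 * a non-zero multiple of the system page size, so every window start is
 * page aligned, as mmap() requires for its file offset.  'offset' is
 * assumed to be smaller than 'pack_size'.
 */
static struct pack_window window_for(size_t offset, size_t window_size,
				     size_t pack_size)
{
	struct pack_window w;

	w.start = offset - (offset % window_size);
	w.len = window_size;
	if (w.start + w.len > pack_size)
		w.len = pack_size - w.start;	/* last window is short */
	return w;
}
```

Forcing window_size down to a single page, as suggested for the tests in t/, makes even a tiny pack span several windows, so the window-switching code gets exercised.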

-- 
Shawn.
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 3:02 pm

I see.  So you are allowing users to control individual window
size and total mmap memory.  That makes sense.

-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 2:55 pm

Sure. I agree that we should do that, if only because it's clearly getting 
hard to handle large pack-files on a 32-bit architecture.

You just seemed to say that in the _context_ of wanting to support having 
multiple pack-files open (in order to allow deltas to refer to things 
outside their own pack-file).

I just wanted to head that particular idea off at the pass.

I think thin packs have been a good idea, and they certainly cut the 
amount of data sent over the network down by a large amount (much more 
than 50%), so I think thin packs are a great idea. Just _not_ when 
indexed.

So I don't object to mmap windows at all. I object to them only in the 
context of "they would allow us to use deltas between two different packs"
discussion ;)

		Linus
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 3:05 pm

Having mmap windows or not has no impact on using deltas between
packs.  We already map multiple packs at once.  We just don't do
delta resolution between them, for the reasons you have already
given.

The two are totally unrelated.  I apologize for somehow making
you (and others) think they are.

-- 
Shawn.
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 3:07 pm

Ah, I feel quite behind.  I was about to say "oh have you been
pushing with --thin option?", and then realized that we made it
default since late March this year.

I need to run memtest86 on myself X-<.

-

From: Nicolas Pitre
Date: Wednesday, October 18, 2006 - 2:41 pm

Remember what I said earlier: "If there are advantages to do so then 

To me this is the real killer.

Shawn was talking about a different issue though.


Nicolas
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 2:41 pm

Actually there is a point to storing thin packs.  When I pull from
a remote repo (or push to a remote repo) a huge number of objects
and the target disk that is about to receive that huge number of
loose objects is slooooooooow I would rather just store the thin
pack than store the loose objects.

Ideally that thin pack would be repacked (along with the other
existing packs) as quickly as possible into a self-contained pack.
But that of course is unlikely to happen in practice; especially

Yes, it does.

But it could also be useful when you fetch 20k+ objects onto a
Windows system or push 1k+ objects onto the slowest NFS system I
have ever seen...  where writing file data (aka packs) is reasonable
but creating or deleting files takes nearly 1 second per file.
I don't want to kill the better part of an hour waiting for a push
to complete!

-- 
Shawn.
-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 3:00 pm

I'm really nervous about keeping thin packs around. 

But a possibly good (and fairly simple) alternative would be to just 
create a non-thin pack on the receiving side. Right now we unpack into a 
lot of loose objects, but it should be possible to instead "unpack" into a 
non-thin pack.

In other words, we could easily still use the thin pack for communication, 
we'd just "fill it out" on the receiving side.

		Linus
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 3:11 pm

Funny, I had the same thought.  :-)

We already know how many objects are coming in on a thin pack;
it's right there in the header.  We could just have some threshold
at which we start writing a full pack rather than unpacking.

Writing such a full pack would be a simple matter of copying the
input stream out to a temporary pack, but sticking any delta bases
into a table in memory.  At the end of the data stream if we have any
delta bases which weren't actually in that pack then find them and
copy them onto the end, update the header and recompute the checksum.
git-fastimport does some of that already, though it's trivial code...
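
The "table of delta bases" step could look roughly like the sketch below: after the stream has been copied, every recorded base that is not itself in the pack has to be appended. The struct and function names are invented for illustration, and the linear search is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ID_LEN 20	/* length of a SHA-1 object name in bytes */

/* Hypothetical wrapper so object names can be passed around by value. */
struct obj_name {
	unsigned char sha1[ID_LEN];
};

/* Return 1 if 'name' appears among the 'nr' names in 'list'. */
static int has_name(const struct obj_name *list, size_t nr,
		    const struct obj_name *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!memcmp(list[i].sha1, name->sha1, ID_LEN))
			return 1;
	return 0;
}

/*
 * Collect every delta base that is not itself in the pack.  'missing'
 * must have room for 'nr_bases' entries.  Returns the number of
 * distinct missing bases, i.e. how many objects would have to be
 * appended to the pack and added to the count in its header.
 */
static size_t find_missing_bases(const struct obj_name *in_pack, size_t nr_in_pack,
				 const struct obj_name *bases, size_t nr_bases,
				 struct obj_name *missing)
{
	size_t i, nr_missing = 0;

	for (i = 0; i < nr_bases; i++) {
		if (has_name(in_pack, nr_in_pack, &bases[i]))
			continue;	/* base travels in the pack */
		if (has_name(missing, nr_missing, &bases[i]))
			continue;	/* already scheduled for appending */
		missing[nr_missing++] = bases[i];
	}
	return nr_missing;
}
```

In the worst case described next (a pack that is 100% deltas against objects outside the pack), the returned count equals the number of distinct bases.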

Worst case scenario would be the incoming thin pack is 100% deltas
as we would need to copy in a base object for every object mentioned
in the pack.

-- 
Shawn.
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 3:13 pm

It should not be hard to write another program that generates a
packfile like pack-objects does but taking a thin pack as its
input.  Then receive-pack can drive it instead of
unpack-objects.

-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 3:42 pm

Give me half an hour. It should be trivial to make "unpack-objects" write 
the "unpacked" objects into a pack-file instead.

		Linus
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 3:48 pm

Heh, three people having the same idea that goes in the same
direction at the same time is not necessarily a good sign of
efficient project management...

I am currently fighting with FC5 so please go ahead.

-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 4:22 pm

Or maybe it is just a sign of a good way to resolve the issue I
was raising.  :-)

-- 
Shawn.
-

From: Nicolas Pitre
Date: Wednesday, October 18, 2006 - 4:18 pm

If you use builtin-unpack-objects.c from next, you'll be able to 
generate the pack index pretty easily as well, as all the needed info is 
stored in the obj_list array.  Just need to append objects remaining on 
the delta_list array to the end of the pack, sort the obj_list by sha1 
and write the index.
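
A sketch of the "sort by sha1 and write the index" step, under the assumption that the index begins with a 256-entry fan-out table followed by the entries in name order. The struct below is a stand-in for this example, not the obj_list layout used in builtin-unpack-objects.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-object record: pack offset plus object name. */
struct idx_entry {
	unsigned long offset;		/* where the object starts in the pack */
	unsigned char sha1[20];		/* object name */
};

/* qsort() callback: order entries by object name. */
static int cmp_sha1(const void *a, const void *b)
{
	const struct idx_entry *ea = a, *eb = b;

	return memcmp(ea->sha1, eb->sha1, 20);
}

/*
 * Sort entries by object name and fill the fan-out table:
 * fanout[i] is the number of objects whose first name byte is <= i.
 */
static void build_index(struct idx_entry *entries, unsigned nr,
			unsigned fanout[256])
{
	unsigned i;

	qsort(entries, nr, sizeof(*entries), cmp_sha1);
	memset(fanout, 0, 256 * sizeof(fanout[0]));
	for (i = 0; i < nr; i++)
		fanout[entries[i].sha1[0]]++;
	for (i = 1; i < 256; i++)
		fanout[i] += fanout[i - 1];
}
```

Writing the table and the sorted entries to disk, plus the checksums, is left out here.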

Pretty trivial indeed.


Nicolas
-

From: Johannes Schindelin
Date: Wednesday, October 18, 2006 - 4:50 pm

Hi,


Easy! You take all the fun out of it!

Ciao,
Dscho

-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 5:07 pm

Actually, I've hit an impasse.

The index isn't the problem. The problem is actually writing the resultant 
pack-file itself in one go.

The silly thing is, the pack-file contains the number of entries in the 
header. That's a silly problem, because the _natural_ way to turn a thin 
pack into a normal pack would be to just add the missing objects from the 
local store into the resulting pack. But we don't _know_ how many such 
missing objects there are, until we've gone through the whole source pack. 

So you can't easily do a streaming "write the result as you go along" 
version using that approach.

So there's _another_ way of fixing a thin pack: it's to expand the objects 
without a base into non-delta objects, and keeping the number of objects 
in the pack the same. But _again_, we don't actually know which ones to 
expand until it's too late.

The end result? I can expand them all (I have a patch that does that). Or 
I could leave as deltas the ones I have already seen the base for in the 
pack-file (I don't have that yet, but that should be a SMOP). But I'm not 
very happy with even the latter choice, because it really potentially 
expands things that didn't _need_ expansion, they just got expanded 
because we hadn't seen the base object yet.

So I'll happily send my patches to anybody who wants to try (I don't write 
the index file yet, but it should be easy to add), but I'm getting the 
feeling that "builtin-unpack-objects.c" is the wrong tool to use for this, 
because it's very much designed for streaming.

It would probably be better to start from "index-pack.c" instead, which is 
already a multi-pass thing, and wouldn't have had any of the problems I 
hit. 


So it's conceptually totally trivial to rewrite a pack-file as another 
pack-file, but at least so far, it's turned out to be less trivial in 
practice (or at least in a single pass, without holding everything in 
memory, which I definitely do _not_ want to do).

So I'm leaving this for today, and ...
From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 5:15 pm

A potentially even simpler way would probably be to literally just use 
"git-pack-objects" directly, and just have a very special mode that allows 
mapping the thin pack as if it was a real pack (ie basically 
pre-populating a fake pack entry, where the fake part comes from adding 
the missing objects by hand to the mapping).

So many ways to do it, so little real motivation ;)

		Linus
-

From: Johannes Schindelin
Date: Wednesday, October 18, 2006 - 5:31 pm

Hi,


You do not write this to stdout, right? Why not just come back and correct 
the number of objects? Of course, the SHA1 has to be calculated _after_ 
that.
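
A small sketch of "come back and correct the number of objects" on a seekable file. It assumes the 12-byte pack header layout ("PACK", version, object count, each number 4 bytes in network byte order); the helper names are hypothetical, and recomputing the trailing SHA1 afterwards is not shown.

```c
#include <assert.h>
#include <stdio.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *p, unsigned long v)
{
	p[0] = (v >> 24) & 0xff;
	p[1] = (v >> 16) & 0xff;
	p[2] = (v >> 8) & 0xff;
	p[3] = v & 0xff;
}

/* Load a 32-bit value stored in network byte order. */
static unsigned long get_be32(const unsigned char *p)
{
	return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
	       ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/*
 * Overwrite the object count of a pack stored in a seekable file.
 * The count is assumed to live at byte offset 8.  Returns 0 on success.
 */
static int fix_pack_count(FILE *f, unsigned long nr_objects)
{
	unsigned char buf[4];

	put_be32(buf, nr_objects);
	if (fseek(f, 8, SEEK_SET))
		return -1;
	if (fwrite(buf, 1, 4, f) != 4)
		return -1;
	return fflush(f) ? -1 : 0;
}

/* Read the object count back from the header.  Returns 0 on success. */
static int read_pack_count(FILE *f, unsigned long *nr_objects)
{
	unsigned char buf[4];

	if (fseek(f, 8, SEEK_SET) || fread(buf, 1, 4, f) != 4)
		return -1;
	*nr_objects = get_be32(buf);
	return 0;
}
```

This only works because the output is a regular file; a pack written to a pipe or stdout cannot be patched this way.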

Ciao,
Dscho

-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 5:46 pm

That's the issue. I wanted the pack-file thing to look as similar to the 
old code as possible. And that means using the "sha1write()" interfaces, 
which calculate the SHA1 checksum _as_ we write.

So yes, I wanted to do it all in one phase.

Anyway, if anybody is interested, here's a series of four patches that do 
something that _almost_ works. I save away the SHA1's and the offsets so 
that I could write an index too, but I didn't actually do that part.

But with this, I can rewrite a pack-file "in flight", and the end result 
can then have "git index-pack" run on it, and used as a pack. It's just 
that there are no deltas left because of some of the silly problems I 
outlined (the code to write out deltas is actually there and just 
uncommented - it works, but it leaves the end result with unsatisfied 
deltas again).

		Linus
---
commit 4efd9b0f44635b3075c9aad6d1cc8830e3abded3
Author: Linus Torvalds <torvalds@osdl.org>
Date:   Wed Oct 18 17:22:04 2006 -0700

    Fix up csum-file interfaces
    
    Add "const" where appropriate
    
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/csum-file.c b/csum-file.c
index b7174c6..3237228 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -47,7 +47,7 @@ int sha1close(struct sha1file *f, unsign
 	return 0;
 }
 
-int sha1write(struct sha1file *f, void *buf, unsigned int count)
+int sha1write(struct sha1file *f, const void *buf, unsigned int count)
 {
 	while (count) {
 		unsigned offset = f->offset;
@@ -115,7 +115,7 @@ struct sha1file *sha1fd(int fd, const ch
 	return f;
 }
 
-int sha1write_compressed(struct sha1file *f, void *in, unsigned int size)
+int sha1write_compressed(struct sha1file *f, const void *in, unsigned int size)
 {
 	z_stream stream;
 	unsigned long maxsize;
@@ -127,7 +127,7 @@ int sha1write_compressed(struct sha1file
 	out = xmalloc(maxsize);
 
 	/* Compress it */
-	stream.next_in = in;
+	stream.next_in = (void *) in;
 	stream.avail_in = size;
 
 	stream.next_out = ...
From: Nicolas Pitre
Date: Wednesday, October 18, 2006 - 8:01 pm

Hmmm.... unpack-objects receives a (possibly thin) pack over its stdin.  
That part has to be streamed.  But its output is currently always 
written to multiple files as separate objects.  So, while the input 
comes from a stream, the output doesn't have to.

In that case, why not just write the input directly to a temporary file, 
append the missing objects, seek back to adjust the object number, and 
finally run a SHA1_Update() on the whole thing?  This forces you to 
write everything and then read everything back, but this should not be 
too bad especially since the written data is likely to still be cached.  
Once its final sha1sum is written then it just needs to be moved with the 

Most base objects, well all of them nowadays, are written before their 
deltas.  So in practice the only objects that will get expanded are the 

But index-pack is totally incompatible with any streaming.  It mmap()s 
the whole pack and happily performs random accesses.  So you'd need to 
write the entire thin pack to disk anyway before it could work on it.  
This is not really better than the unpack-objects option.  At least 
unpack-objects is structured to perform work on the fly as data is 

I'll have a look at your patches tomorrow as well.  I have many ideas 
brewing, including rendering index-pack obsolete since actually 
unpack-objects could do it all already (both tools have many concepts in 
common).


Nicolas
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 8:46 pm

pack-objects.c::write_one() makes sure that we write out base
immediately after delta if we haven't written out its base yet,
so I suspect if you buffer one delta you should be Ok, no?


-

From: Nicolas Pitre
Date: Thursday, October 19, 2006 - 7:27 am

If we create full packs out of thin packs the base objects will end up 
at the end of the pack so this assumption is a bad one to rely upon if 
we want to make things robust (like being able to feed such a pack 
back).


Nicolas
-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 7:55 am

It doesn't matter. I realized that my bogus patch to unpack-objects was 
more seriously broken anyway: even the "un-deltify every single object" 
was broken. And that's despite the fact that I _tested_ it, and verified 
the end result by hand.

Why? Because I tested it within one repo, by just piping the output of 
git-pack-objects --stdout directly to the repacker. That seemed to be a 
good way to test it without setting up anything bigger. But it turns out 
that it misses one of the big problems: if you don't unpack the objects in 
a way that later phases can read, none of the streaming code works at all, 
and you have to buffer up _everything_ in memory just to be able to read 
any previous _non_delta objects too.

So my patch-series works - but it only works in a repo that already has 
all the objects in question, because then it can look up the objects in 
the original database. Which makes it useless. Duh.

So forget about unpack-objects. It's designed to be streaming (and it's a 
_good_ design for what it does), but repacking really cannot be done that 
way. Repacking needs to be done by saving the thin pack to disk, and then 
doing a multi-pass over it (like git-index-pack does, for example).

Just throw my patch away. It's not even useful as a basis for anything 
else, unless you want to use it as a way to keep all the objects in memory 
and use the "unpack-objects" logic to just _parse_ the incoming pack.

I suspect using "index-pack" is saner (since it already has the multi-pass 
logic), or just doing something that maps all the objects in memory, and 
then calls builtin-pack-objects once it has set up the new thin pack so 
that others can see/use the new objects without realizing that they aren't 
in the canonical pack-format.

		Linus
-

From: Jan Harkes
Date: Thursday, October 19, 2006 - 9:07 am

You are correct that it is not possible to create a pack with all
objects expanded in a single pass. But that doesn't mean that a single
pass conversion to a full pack is impossible.

If we find a delta against a base that is not found in our repository we
can keep it as a delta, the base should show up later on in the
thin-pack. Whenever we find a delta against a base that we haven't seen
in the received part of the thin pack, but is available from the
repository we should expand it because there is a chance we may not see

About that patch series, is there a simple way to import the series into
a local repository? git-am doesn't like it, even after splitting it into
separate files on the linebreaks. I guess git-mailinfo could be taught
to recognise the git-log headers. Or have I missed some useful git apply
trick?

Jan

-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 9:48 am

Yes, indeed. We can also have another heuristic: if we find a delta, and 
we haven't seen the object it deltas against, we can still keep it as a 
delta IF WE ALSO DON'T ALREADY HAVE THE BASE OBJECT. Because then we know 
that the base object has to be there later in the pack (or we have a 
dangling delta, which we'll just consider an error).
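
Put together, the two heuristics amount to a small decision table. The sketch below is one way to write it down; the enum and function names are made up for illustration and do not come from the patch series.

```c
#include <assert.h>

/* What to do with an incoming delta while rewriting the pack. */
enum delta_action {
	KEEP_DELTA,		/* base already seen in this pack: copy as is */
	EXPAND_DELTA,		/* base only in the repository: write a full object */
	KEEP_AND_EXPECT_BASE	/* base unknown so far: keep the delta, and treat
				 * it as an error if the base never shows up */
};

/*
 * 'base_seen_in_pack' is non-zero if the base was already received in
 * the pack being processed; 'base_in_repo' is non-zero if the local
 * object database has it.
 */
static enum delta_action decide(int base_seen_in_pack, int base_in_repo)
{
	if (base_seen_in_pack)
		return KEEP_DELTA;
	if (base_in_repo)
		return EXPAND_DELTA;	/* the pack may never carry the base */
	return KEEP_AND_EXPECT_BASE;	/* must appear later, or dangling delta */
}
```

The EXPAND_DELTA case is the one that can expand needlessly: the base may still arrive later in the same pack.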

So yeah, maybe my patch-series is something we can still save.

However, the thing that makes me suspect that it is _not_ saveable, is 
this:

 - let's assume we have a nice thin pack, with object A B C D (in that 
   order), which is actually a good pack in itself (ie it _might_ be thin, 
   but it's actually self-sufficient)

 - let A be a full object, and B be packed as a delta off A, C as a delta 
   off B, and D as a delta off C.

 - Try to repack it as a streaming thing (the end result _should_ 
   obviously be exactly the same as the input, since it turns out to be 
   self-sufficient)

Looks trivial, no?

The answer is: no. It's not trivial. Or rather, it _is_ trivial, but you 
have to _remember_ all of the actual data for A, B, C and D all the way to 
the end, because only if you have that data in memory can you actually 
_recreate_ B, C and D even enough to get their SHA1's (which you need, 
just in order to know that the pack is complete, much less to be able to 
create a non-delta version in case it hadn't been).

So we can definitely do the one-pass creation, but it requires that we 
keep track of everything we've expanded so far in memory (because we won't 
have the data available any other way - we don't have them as objects in 
our object database, and we don't have a good new pack yet).


No, you've not missed anything. I didn't really expect anybody to want to 
seriously play with it, so I didn't bother to do things properly. 

Especially since I hadn't even written very good commit messages.

Anyway, I just pushed the "rewrite-pack" branch to my git repo on 
kernel.org, so once it mirrors out, if you ...
From: Jan Harkes
Date: Thursday, October 19, 2006 - 5:20 pm

It looks like you were really close. When we cannot resolve a delta, we
just write it to the packfile and we don't queue it. If it can be
resolved we write it as a full object.

The only thing that cannot be reliably tracked is the pack index
information. The offsets are trivial, but we cannot calculate the SHA1
for a delta without applying it to its base. If the base comes later
the existing code could do it, but if it has already been written to the
pack we can't easily track back.

And why add all the extra complexity? Running git-index-pack after
git-unpack-objects --repack not only generates the correct index without
a problem, it also serves as an extra consistency check and we keep this
code isolated from any possible future changes to the index file format.

I'll try to follow this up with 2 patches, one is an almost trivial
change to your code that makes it write out a pack with all full objects
and resolvable deltas converted to full objects, any unresolved deltas
are expected to be relative to some other object in the same pack.

The rewritten pack is indexed correctly even when I run git-update-index
in a repository that does not contain any of the objects in the thin-pack.
Of course it also works when the objects are available, but the resulting
full pack is considerably bigger since we can find a suitable base for

Only if you want to build the index at the same time, we don't need to

I think I still left quite a bit of the mess unfixed.

Jan
-

From: Jeff King
Date: Friday, October 20, 2006 - 7:41 am

If I understand correctly, if we see an unresolvable delta, we are just
making the assumption that its base has arrived (or will arrive) in the
same pack (without checking).  This means that we could end up with a
corrupted repository if the sender gives us a bad pack. I believe that
git's network interaction has been designed specifically to avoid such
possibilities (e.g., verifying completeness and integrity of downloaded
objects).

-Peff
-

From: Jan Harkes
Date: Thursday, October 19, 2006 - 5:20 pm

The resulting pack should be correct if we have the base somewhere else in
the received pack, if we didn't have the base the received pack would be
faulty and can't be unpacked as loose objects either.

The internal pack index information is not updated correctly anymore.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>

---
 builtin-unpack-objects.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index f139308..b95c93c 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -246,7 +246,10 @@ static void unpack_delta_entry(unsigned 
 	}
 
 	if (!has_sha1_file(base_sha1)) {
-		add_delta_to_list(base_sha1, delta_data, delta_size);
+		if (pack_file)
+			write_pack_delta(base_sha1, delta_data, delta_size);
+		else
+			add_delta_to_list(base_sha1, delta_data, delta_size);
 		return;
 	}
 	base = read_sha1_file(base_sha1, type, &base_size);
-- 
1.4.2.1

-

From: Jan Harkes
Date: Thursday, October 19, 2006 - 5:20 pm

Tracking the offsets is not that hard, but calculating the sha1 for the
deltas is tricky, we may have already seen and written out the base we
need. So it is actually easier to avoid the complexity altogether and
rely on git-index-pack to rebuild the index. The indexing step is also a
useful validation whether the final pack contains a base for every delta.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>

---
 builtin-unpack-objects.c |   57 +++++++++++-----------------------------------
 1 files changed, 14 insertions(+), 43 deletions(-)

diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index b95c93c..3df7938 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -89,29 +89,6 @@ static void *get_data(unsigned long size
 }
 
 static struct sha1file *pack_file;
-static unsigned long pack_file_offset;
-
-struct index_entry {
-	unsigned long offset;
-	unsigned char sha1[20];
-};
-
-static unsigned int index_nr, index_alloc;
-static struct index_entry **index_array;
-
-static void add_pack_index(unsigned char *sha1)
-{
-	struct index_entry *entry;
-	int nr = index_nr;
-	if (nr >= index_alloc) {
-		index_alloc = (index_alloc + 64) * 3 / 2;
-		index_array = xrealloc(index_array, index_alloc * sizeof(*index_array));
-	}
-	entry = xmalloc(sizeof(*entry));
-	entry->offset = pack_file_offset;
-	hashcpy(entry->sha1, sha1);
-	index_array[nr++] = entry;
-}
 
 static void write_pack_delta(const unsigned char *base, const void *delta, unsigned long delta_size)
 {
@@ -122,11 +99,9 @@ static void write_pack_delta(const unsig
 	sha1write(pack_file, header, hdrlen);
 	sha1write(pack_file, base, 20);
 	datalen = sha1write_compressed(pack_file, delta, delta_size);
-
-	pack_file_offset += hdrlen + 20 + datalen;
 }
 
-static void write_pack_object(const char *type, const unsigned char *sha1, const void *buf, unsigned long size)
+static void write_pack_object(const void *buf, unsigned long size, const char *type, const unsigned char *sha1)
 {
 ...
From: Nicolas Pitre
Date: Thursday, October 19, 2006 - 6:11 pm

I don't think it is a good idea.

After looking at the problem for a while I should side with Linus.  
unpack-objects is not the proper tool for the job.  The way to go is to 
make input to index-pack streamable.

This patch in particular creates additional restrictions on pack 
files that were not present before.  And I don't think this is a good 
thing.

This patch imposes an ordering on REF_DELTA objects that doesn't need to 
exist.  Say for example that an OFS_DELTA depends on an object which is 
a REF_DELTA object.  With this patch any pack with the base for that 
REF_DELTA stored after the OFS_DELTA object will be broken.

And to really do thin pack fixing properly we really want to just append 
missing base objects at the end of the pack which falls in the broken 
case above.



Nicolas
-

From: Junio C Hamano
Date: Thursday, October 19, 2006 - 6:35 pm

I agree.

By the way, it is rather rare for us to see a NAK on this list.
I'd welcome seeing more of them ;-).



-

From: Jan Harkes
Date: Thursday, October 19, 2006 - 7:27 pm

I don't see where it imposes any ordering.

If we see a complete object it will remain complete. If we find a delta,
and we have the base in the current repository it will be expanded to a
complete object. When we get a delta that doesn't have a base in the
current repository it will remain unresolved and is written out as a
delta.

So the output pack will always contain fewer deltas than the input.

btw. I don't really know what OFS_DELTA and REF_DELTA objects are, I
grepped the source and found no references to either. I can only find
an OBJ_DELTA.

But if any of the deltas depend on an object that is not in the thin
pack, the base has to be available in the current repository and as such
it will be expanded to a full object, replacing the possibly external
delta reference with an internal base object. If the base is not found
in the current repository the base has to be another object in the
original thin pack so we can write out the delta as is.

There is no before or after decision here. We don't look back in the
thin pack, and we don't have to look forward either. So I don't
understand why your example would break or not depending on if the base

I guess I'll grep through the mailing lists to try to figure out what
these OFS and REF deltas are and why they behave so differently
depending on their order in the pack.

Jan
-

From: Junio C Hamano
Date: Thursday, October 19, 2006 - 7:30 pm

It's been cooking in "next" branch for quite a while.

-

From: Jan Harkes
Date: Thursday, October 19, 2006 - 7:46 pm

Ah yes, just went through the thread about the git-index-pack breaking on
64-bit systems and the back and forth about the possible complexity of
...

I guess one of these must be false.

But clearly this patch breaks those offset-based deltas when we expand
random deltas in place.
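
A toy model of why that breaks, assuming an offset-based delta records the distance from its own position back to its base. The entry layout and function names are invented for this sketch and ignore the pack header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pack entry: its size and, for offset-based deltas,
 * which earlier entry is the base and how far back that base starts. */
struct entry {
	size_t size;	/* bytes this entry occupies in the pack */
	int base;	/* index of the base entry, or -1 if none in the pack */
	size_t back;	/* recorded distance from this entry back to its base */
};

/* Offset of entry 'i': the sum of the sizes of all entries before it. */
static size_t entry_offset(const struct entry *e, int i)
{
	size_t off = 0;
	int k;

	for (k = 0; k < i; k++)
		off += e[k].size;
	return off;
}

/* Record, for every delta, the distance from its start to its base. */
static void record_distances(struct entry *e, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (e[i].base >= 0)
			e[i].back = entry_offset(e, i) - entry_offset(e, e[i].base);
}

/* Return 1 if every recorded distance still lands on the base's offset. */
static int distances_valid(const struct entry *e, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (e[i].base < 0)
			continue;
		if (entry_offset(e, i) - e[i].back != entry_offset(e, e[i].base))
			return 0;
	}
	return 1;
}
```

Growing any entry that sits between a delta and its base, for example by expanding it into a full object, shifts the delta's position while the recorded distance stays the same, so the check fails.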

Jan

-

From: Nicolas Pitre
Date: Thursday, October 19, 2006 - 8:36 pm

But the point of the whole exercise is actually to avoid unresolved 
deltas.  And you know if you have unresolved deltas only when the whole 
pack has been processed.

If the base object is not in the repository but it is in the pack 
_after_ the delta that needs it, you won't have resolved it.  If this is 
a thin pack with missing base objects for whatever reason you're 
screwed.

If the delta has its base object in both the repository _and_ in the 
pack but after the delta then you will have expanded the delta 
needlessly.

So your solution is suboptimal.

The optimal solution really consists of appending missing base objects 
to a thin pack in order to make it complete, or error out if those 
cannot be found.


Nicolas
-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 2:56 pm

We've tried this already, and I shelved the patch for 64-bit index
for now due to exactly the same reasoning as yours (and it would
have conflicted heavily with Shawn's windowed-mmap() patch).  It
involved updating just the index file format, so you are right
on both counts.

But you are always right anyway, so it may not be news at all
;-).

-

From: Junio C Hamano
Date: Wednesday, October 18, 2006 - 12:33 pm

It is a technical limitation.  We have never assumed that the
virtual address space is big enough to hold more than one whole
pack mmapped at the same time.

Lifting this needs the piecemeal mmap() change somebody was
talking about.

I might bite the bullet and do that myself but I've been hoping
to get an appliable patch from somewhere else ;-).

-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 1:47 pm

Even though it's not big enough for some larger packs on a 32

I might be able to do it this weekend.  I'll try to spend some time
on it.  You'll either see a patch series, or you won't.  ;-)

-- 
Shawn.
-

From: Nicolas Pitre
Date: Wednesday, October 18, 2006 - 12:09 pm

Did you really manage to miss the "heads-up: git-index-pack in "next" is 
broken" thread?

The fix:

diff --git a/index-pack.c b/index-pack.c
index fffddd2..56c590e 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -23,6 +23,12 @@ union delta_base {
 	unsigned long offset;
 };
 
+/*
+ * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
+ * to memcmp() only the first 20 bytes.
+ */
+#define UNION_BASE_SZ	20
+
 struct delta_entry
 {
 	struct object_entry *obj;
@@ -211,7 +217,7 @@ static int find_delta(const union delta_
                 struct delta_entry *delta = &deltas[next];
                 int cmp;
 
-                cmp = memcmp(base, &delta->base, sizeof(*base));
+                cmp = memcmp(base, &delta->base, UNION_BASE_SZ);
                 if (!cmp)
                         return next;
                 if (cmp < 0) {
@@ -232,9 +238,9 @@ static int find_delta_childs(const union
 
 	if (first < 0)
 		return -1;
-	while (first > 0 && !memcmp(&deltas[first - 1].base, base, sizeof(*base)))
+	while (first > 0 && !memcmp(&deltas[first - 1].base, base, UNION_BASE_SZ))
 		--first;
-	while (last < end && !memcmp(&deltas[last + 1].base, base, sizeof(*base)))
+	while (last < end && !memcmp(&deltas[last + 1].base, base, UNION_BASE_SZ))
 		++last;
 	*first_index = first;
 	*last_index = last;
@@ -312,7 +318,7 @@ static int compare_delta_entry(const voi
 {
 	const struct delta_entry *delta_a = a;
 	const struct delta_entry *delta_b = b;
-	return memcmp(&delta_a->base, &delta_b->base, sizeof(union delta_base));
+	return memcmp(&delta_a->base, &delta_b->base, UNION_BASE_SZ);
 }
 
 static void parse_pack_objects(void)
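
To see why the comparison length matters, here is a stand-alone sketch. The sha1 member of the union is an assumption (only the offset member is visible in the hunk above), and base_cmp is a hypothetical helper, not a function from index-pack.c.

```c
#include <assert.h>
#include <string.h>

/* Assumed shape of the union: a 20-byte object name overlaid with an
 * offset.  Where unsigned long is 8 bytes wide, alignment pads the
 * union to 24 bytes, leaving 4 bytes that no sha1 ever initializes. */
union delta_base {
	unsigned char sha1[20];
	unsigned long offset;
};

#define UNION_BASE_SZ	20

/* Compare only the meaningful 20 bytes, ignoring any padding. */
static int base_cmp(const union delta_base *a, const union delta_base *b)
{
	return memcmp(a, b, UNION_BASE_SZ);
}
```

Two bases holding the same sha1 but different padding bytes compare equal with base_cmp(), while memcmp() over sizeof(union delta_base) reports a difference wherever the union is larger than 20 bytes.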


Nicolas
-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 1:08 pm

Since you created a "thin" pack (that's what the "--objects-edge" means), 
the pack actually contains deltas to objects that are _not_ in the pack. 

In other words, it's not a valid stand-alone pack, it's only a valid thin 
pack, useful to transfer data to the other end (and the other end had 
better have the objects that the deltas are against already).

As a result, index-pack refuses to index it: it cannot be used as a 
stand-alone pack, it's _only_ useful as a transfer medium.

So don't even _try_ to use it as a standalone pack-file. It won't work.

(If you want something that actually works as a stand-alone pack-file, 
change the "--objects-edge" flag to just "--objects" - that makes the 
pack-file self-sufficient, and doesn't try to delta against "edge" 

A properly named _standalone_ pack gets named not by its actual contents, 
but by the SHA1-sum of the sorted list of objects it contains. That's so 
that a pack-file will be named the same thing regardless of how the 
contents are actually packed.

A thin pack cannot be named that way at all, for the same reason you 
cannot index it: it has a set of objects it enumerates (so you could name 
it by them), but it _also_ has a set of objects outside of it that it 
depends on. 

That said, even a thin pack internally has a SHA1 checksum of its 
contents: the last 20 bytes should be the SHA1-sum of all preceding bytes. 
So if you just want _some_ kind of name, you can use the last 20 bytes of 
a pack, which is just its internal integrity-checksum (but that is 
_different_ from the "pack-xxxxxx.idx"/"pack-xxxxxx.pack" naming).
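
For the "use the last 20 bytes" fallback, a sketch along these lines would do. The function name is hypothetical, and it only formats the trailer as hex; it does not verify that the trailer matches the preceding bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SHA1_LEN 20

/*
 * Format the trailing 20 bytes of a pack held in memory as 40 hex
 * digits.  'hex' must have room for 41 bytes.  Returns 0 on success,
 * -1 if the buffer is too short to hold a trailer at all.
 */
static int pack_trailer_hex(const unsigned char *pack, size_t len, char *hex)
{
	const unsigned char *trailer;
	size_t i;

	if (len < SHA1_LEN)
		return -1;
	trailer = pack + len - SHA1_LEN;
	for (i = 0; i < SHA1_LEN; i++)
		sprintf(hex + 2 * i, "%02x", trailer[i]);
	return 0;
}
```

The resulting string names the byte stream, not the set of objects, so two differently packed copies of the same objects get different names.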

			Linus
-

From: Sean
Date: Wednesday, October 18, 2006 - 12:57 pm

On Wed, 18 Oct 2006 20:52:25 +0200


Couldn't these just as easily have been written as git-bundle and

Not sure if it would be useful, but it shouldn't be too hard to have

Think you're right about making it an attachment instead.

Sean
-

From: Alexander Belchenko
Date: Wednesday, October 18, 2006 - 11:46 pm

Petr Baudis wrote:

You probably miss the main idea of bzr bundles. A bundle is not just a
way to send part of a repository via e-mail or another appropriate
transport. It was primarily designed to be human-readable like a usual
diff (i.e. a patch). It was designed to solve two things simultaneously:

- be as informative for a human as a usual patch
- be consistent for the machine.

--
Alexander

-

From: Sean
Date: Thursday, October 19, 2006 - 3:40 am

On Thu, 19 Oct 2006 09:46:32 +0300

Petr already mentioned that the data currently shown in the email
text isn't really useful.  But it's simple to make it an attachment
and show a combined diff instead.

Although that might just make the email bigger for not a lot of
gain.  It's easy to use the git command line and gui tools to inspect
the bundle after importing it into your repository.  And just as
easy to expunge the bundle afterward if it isn't up to grade.

Sean
-

From: Aaron Bentley
Date: Friday, October 20, 2006 - 7:03 am

In Bazaar bundles, the text of the diff is an integral part of the data.
 It is used to generate the text of all the files in the revision.

Bazaar bundles were designed to be used on mailing lists.  So you can
review the changes from the diff, comment on them, and if it seems

It's my understanding that most changes discussed on lkml are provided
as a series of patches.  Bazaar bundles are intended as a direct
replacement for patches in that use case.

Aaron
-

From: Sean
Date: Friday, October 20, 2006 - 8:37 am

On Fri, 20 Oct 2006 10:03:16 -0400

Perhaps I missed something in the earlier mails about this feature.
As I understood it, the email sent has a combined diff that shows
the net effect of all the commits included in the bundle.  (Whereas
the current Cogito version only shows a diffstat)

If the recipient of such a bundle is unable to extract the diff of
each separate commit included in the bundle then I can't see any
value in the feature at all.  But showing a combined diff in the
email may have marginal value, so long as when the bundle is 
imported into the recipient repository the individual commits

A combined diff of a bunch of changes would usually be most _unwelcome_
for review on lkml.  The constant refrain is to ask people to split their
changes up into smallish individual patches for review.

Sean
-

From: Jeff King
Date: Wednesday, October 18, 2006 - 2:20 pm

OK, that was how I was envisioning it, as well, but I was concerned
about the "screwed" part. But I'm not sure how often that would be an
issue in practice (after all, patches require some matchup of the base,
though not as strict as SHA1s).

Thanks for the explanation.

-Peff
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 4:33 pm

That's true.  We support merge points in a way that's compatible with
svk.  Subversion allows revisions to have arbitrary properties, and svk

Bzr's subversion support is quite nice.  You can commit, merge, run
history viewers.

There are screenshots and stuff here:
http://bazaar-vcs.org/BzrForeignBranches/Subversion

Aaron
-

From: Andreas Ericsson
Date: Wednesday, October 18, 2006 - 1:13 am

Sounds a bit like [PATCH 0/8] would have the output of

	git diff $(git merge-base master topic-branch)..topic-branch

for any given patch-series. It might be easier to review the whole 
patch-series in some cases. Especially with patch-series where more than 
one patch touches the same part of the code.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 11:22 pm

AIUI, the initial claim was that after a rebase, git can do a
fast-forward, but Aaron has missed the /after a rebase/ part.

And yes, in bzr terminology, bzr can do a "pull" after a "graft".
I don't think there's a fundamental difference here.

-- 
Matthieu
-

From: Sean
Date: Tuesday, October 17, 2006 - 3:00 pm

On Tue, 17 Oct 2006 17:27:44 -0400

But really why does any of that matter?  This is the open source world.
We don't need plugins to extend features, we just add the feature to
the source.  The example I asked about earlier is a case in point. 
Apparently in bzr "bisect" was implemented as a plugin, yet in Git it
was implemented as a command without any issue at all, no plugins
needed, and it's compiled and runs at machine speed.

Sean
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 3:44 pm

That can lead to feature bloat.  Some plugins are not useful to
everyone, e.g. Mercurial repository support.  Some plugins introduce
additional dependencies that we don't want to have in the core (e.g. the
rsync, baz-import and graph-ancestry commands).

Plugins also don't have Bazaar's rigid release cycle, testing
requirements and coding conventions, so they are a convenient way to try
out an idea, before committing to the effort of getting it merged into

The bisect plugin is just as performant as any other bzr command.  (The
whole VCS is in Python.)  Most people don't use it, so we don't ship it
as part of the base install, but anyone who wants it can have it.

Aaron
-

From: Sean
Date: Tuesday, October 17, 2006 - 3:56 pm

On Tue, 17 Oct 2006 18:44:11 -0400

Shrug, it's really not that tough to do in regular ole source code.
On Fedora for instance you have your choice of which rpms you want

Hmm.. It's pretty easy to test out Git ideas too.  People do it all
the time, and without plugins.  Junio maintains several such trees
for instance.  Dunno.. I just think plugins _sound_ good to developers

Sure, and anyone who wants to use StGit on top of Git can download and
use it as well.

Sean
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 4:11 pm

/me too post ;-)


git-core, git-email, git-arch, git-cvs, git-svn, gitk
(and git-debuginfo).

gitk and gitweb were developed in their own repositories, but some time
ago were incorporated into the git repository. We also have a contrib/ area.

Thanks to the many low-level (plumbing, in git-speak) commands, it is very
easy to prototype (actually, write) a new command in a language suitable
for fast prototyping, i.e. shell or Perl (or Python, too). Then, if it is
performance-critical, or if the shell script version gets troublesome to
manage, it gets rewritten in C as a builtin command.
-- 
Jakub Narebski
Poland
-

From: Charles Duffy
Date: Wednesday, October 18, 2006 - 2:04 pm

Example time!

There's a plugin for Bzr which adds Cygwin-compatible symlink support 
on Windows. (IIRC, this involves monkey-patching some of the Python 
standard library bits).

Now, this is something which is *proposed* as a feature to be merged 
into upstream bzr, and it may happen at some point. That said, when I 
have a Windows-using coworker who wants to check out a repository that 
has symlinks in it (with his win32-native, no-cygwin-required bzr 
upstream binary), I don't need to tell him to go download and build bzr 
from a third party; instead, I just need to tell him to run a single 
command to check out the plugin in question into the bzr plugins folder.

From an end-user convenience perspective, it's a pretty significant win.

-

From: Sean
Date: Wednesday, October 18, 2006 - 2:29 pm

On Wed, 18 Oct 2006 16:04:52 -0500

You'll need a better example than that.  Git has supported a version
of Cygwin-compatible symlink support on Windows for quite some time.
And no plugins were needed.

Sean
-

From: Charles Duffy
Date: Wednesday, October 18, 2006 - 4:31 pm

The win32-compatible symlink support is not, in and of itself, the point.

The point is that core, pervasive functionality can be modified at 
runtime, with no recompilation or installation of tools not included in 
the bzr package itself, simply by dropping a directory into place. This 
means that folks who don't have the skillset to merge three branches 
together (say, upstream plus two different trees adding extra 
functionality) and run a build can still install a few plugins to 
enhance their copy of bzr (which was installed by their IT staff, or a 
shiny click-through idiot-friendly Windows installer, etc).

And yes, there are people like that who are part of bzr's target 
audience. Think (of the lower end of the set of) DBAs, QA folk and such.


Granted, I'm speaking with my IT hat on here rather than my developer 
hat -- but plugins are a pretty clear usability win.

-

From: Johannes Schindelin
Date: Wednesday, October 18, 2006 - 4:48 pm

Hi,


Please note that this is not welcome here. I _need_ to trust my SCM. And 
_that_ means that no strange non-mainline beast can be allowed to change 
core features.

So, the wonderful upside of plugins you described here is actually the 
reason I will never, _never_ use bzr with plugins.

Ciao,
Dscho

--

It's not paranoia. It's called experience.

-

From: Charles Duffy
Date: Wednesday, October 18, 2006 - 6:58 pm

I presume that for this reason you will also never, _never_ use a 
non-mainline branch of git -- even if its actual code only touches UI 
enhancements or something similarly non-core -- because third-party 
branches have the ability, in theory, to make changes to the core of the 
revision control system. And that you will never, _never_ use 
third-party wrappers because they might play LD_PRELOAD tricks. Or run 
any software with root privileges you haven't personally written. Or...

Sean's point that plugins are a comparatively minor win made inexpensive 
on account of bzr's use of Python is reasonable (though we may choose to 
differ on what level of value we attach to the utility). The claim that 
an extensibility mechanism should be rejected wholesale on account of 
being excessively powerful, on the other hand, is just silly.



(If you couldn't write a plugin that *didn't* touch the core, this would 
be a different story. This is, however, very much not the case).

-

From: Johannes Schindelin
Date: Thursday, October 19, 2006 - 4:01 am

Hi,


you neatly clipped the most important part of my email: I quoted you 

NO! The point was that I will not gladly run anything which could change 
the core. If I know it touches only the UI, there is no problem.

If I get a shell script using git-core programs to do its job, I 
_know_ that my repository will not be fscked afterwards.


Most of it comes down to trust. And yes, you are correct, I will not run 
git with some obscure module LD_PRELOADed that some guy from some planet 
sent me.

You might have missed my argument being about the SCM, and not the 

Oh, but NO! An extensibility mechanism which allows for a fragile system 
_is_ silly. Not my rejection of it.

Just take an example (illustrating that once again, one should not 
attribute everything to malevolence...): I write a plugin for bzr. It does 
really wonderful things, it even cooks you dinner.

Only that I happened to make a small mistake (if you followed some threads 
on the git list, you'd know that small mistakes are a hobby of mine), and 
by this mistake, your repository is ... gone. Small mistake, big 
consequence. That is what is wrong with such a powerful system which caters 
for developers, who are human after all.

Note that such a small mistake would be much more likely caught in git: if 
it touches the core, plenty of eyes look at it.

Ciao,
Dscho

-

From: Charles Duffy
Date: Thursday, October 19, 2006 - 4:10 am

If you're willing to look at the source of a branch to know that it 
touches only the UI, why would you not be willing to look at the source 

It's a silly point. If you're willing to look at what your shell script 
does and validate that it doesn't do LD_PRELOAD tricks or swap out git 
core pieces, why wouldn't you be willing to accept a plugin after a 
similar level of review, rather than stating outright that you would 

Shell scripts allow for a fragile system because they could include C 
code snippets which they then compile and LD_PRELOAD. Sure, they "allow 
for" a fragile system -- but the author has to go out of their way to 
make it so. Similarly, folks writing bzr plugins need to take explicit 
actions to monkeypatch existing code (as opposed to adding a new 
transport/storage format/command/etc but leaving the old ones alone).
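
The distinction drawn here, between a plugin that only adds something new and one that explicitly replaces existing code, can be sketched in Python. Everything below is hypothetical: `Core`, `additive_plugin` and `monkeypatching_plugin` are made-up names for illustration and are not the bzrlib API.

```python
# Hypothetical mini "core"; none of these names come from bzrlib.
class Core:
    def __init__(self):
        self.commands = {"status": self.status}

    def read_link(self, path):
        return "unsupported"

    def status(self, path):
        return "link " + path + ": " + self.read_link(path)

def additive_plugin(core):
    """Registers a new command and leaves existing behaviour untouched."""
    core.commands["hello"] = lambda path: "hello " + path

def monkeypatching_plugin():
    """Explicitly replaces a core method for every Core instance."""
    def read_link(self, path):
        return "target-of-" + path
    Core.read_link = read_link

core = Core()
additive_plugin(core)
before = core.commands["status"]("a")   # "link a: unsupported"
monkeypatching_plugin()
after = core.commands["status"]("a")    # "link a: target-of-a"
```

In this sketch the additive plugin cannot change what `status` does, while the monkeypatching one changes it for every caller. That is roughly the trade-off the two sides of this exchange are arguing about.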

If you trust the author of your shell script not to build their own 
LD_PRELOAD at runtime, why don't you trust the author of your bzr plugin 
not to monkeypatch in replacements to core code if they say they aren't?
-

From: Johannes Schindelin
Date: Thursday, October 19, 2006 - 4:24 am

Hi,


That is why I said I'd be gladly using a shell-script using git-core 
programs. It is typically no more than 20 lines, and I can review that 

Well, I do not expect people to misbehave. You do not compile a nasty 
C-program from a shell script _by mistake_.

I also expect people not to constantly miss my point. It could be that I 
am not as proficient in the English language as I thought. In that case, 
I'd better shut up.

Ciao,
Dscho

-

From: Charles Duffy
Date: Thursday, October 19, 2006 - 4:30 am

You also don't replace bzrlib functionality (in your terms, plumbing) in 

I think your point is predicated on a misunderstanding of how plugins work.
-

From: Sean
Date: Wednesday, October 18, 2006 - 4:49 pm

On Wed, 18 Oct 2006 18:31:32 -0500

Sure they can be.  But their value I think is overstated, especially
in an open source project where anyone can grab a copy of the source
and update it with a trial feature.  This updated copy can be wrapped
in a nice GUI installer just as easily as any plugin.

Now, I suppose plugins let end users mix and match trial features
slightly easier, but hopefully your base package isn't so devoid of
features that this is honestly necessary.

As Petr pointed out, all this comes to Bzr essentially for free
since it's a part of python.  So be it, but I've yet to hear an
example where plugins were anything more than a minor convenience
rather than a fundamental win over the way Git is developing.

For an example, just look how few lines of git were needed to
implement the essential features of the bzr bundle feature.
With no plugins or monkey business needed ;o)

Sean
-

From: Matthieu Moy
Date: Friday, October 20, 2006 - 2:43 am

The plugin vs. core feature question is not a technical problem. The code
for a plugin and for a core functionality will be roughly the same, just in
a different file.

There can be many reasons why you want to implement something as a
plugin:

* This is project-specific, upstream is not interested (for example,
  bzr has a plugin to submit a merge request to a robot, it will
  probably never come in the core).

* The feature is not mature enough, so you don't want to merge it in
  upstream, but you want to make it available to people without
  patching (for example, "bzr uncommit" was once in the bzrtools
  plugin, and finally landed in upstream).

* The feature you're adding is only of use to a small subset of
  users. You don't want to pollute, in particular "bzr help commands"
  with it, especially not to disturb beginners. I've been arguing in
  favor of a configuration option to hide commands from "bzr help
  commands" instead, but nobody seemed interested.

* Explicit divergent points of view between the implementor of the
  plugin and upstream. That avoids a fork. I don't remember any such
  case with bzr.

I'd compare bzr's plugins to Firefox extensions. Geeks used to like
the big Mozilla-with-tons-of-config-options, but
Firefox-with-only-the-most-relevant-features is the one which allowed
a wide adoption by non-geeks. Still, geeks can customize their
browser, and add features without having to wait for Mozilla Foundation
to incorporate it in upstream.

Now, I don't know git well enough to know whether the way it is extensible
allows all of the above, but bzr's plugin system is quite good at that.
At the time git was almost exclusively used by the kernel, you didn't
have all those problems since you targeted only one community, but I
guess you already had some needs for flexibility.

-- 
Matthieu
-

From: Lachlan Patrick
Date: Monday, October 23, 2006 - 11:02 pm

So, bzr's plug-in architecture provides a 'protocol' for communicating
with bzr? Or is it functionally the same as a Python module which is
loaded after being named on the bzr command-line (or placed in a special
folder) then executed along with all the other plug-ins? I'm trying to
understand if writing a plug-in is any simpler than understanding the
bzr source code.

Can I ask the git folks what Sean meant in the above about a 'command'?
Are you talking about shell scripts? Is 'git' the only program you need?

AFAIK, 'bzr' is the sole program in Bazaar, and everything is done with
command line options to bzr. Is that true of git? To what extent is git
tied to a [programmable] shell? I've heard someone say there's no
Windows version of git for some reason, can someone elaborate?

Ta,
Loki
-

From: Shawn Pearce
Date: Monday, October 23, 2006 - 11:23 pm

'git' is actually two things:

  1) It's a wrapper command which executes 'git-foo' if you call it
     with 'foo' as its first parameter.  It searches for 'git-foo'
     in the GIT_EXEC_PATH environment variable, which has a default
     set at compile time, usually to the directory you are going to
     install Git into.

  2) It's most of the core Git plumbing.  There are currently around 48
     'builtin' commands.  These are things which 'git' knows how to do
     without executing another program.  If you look at the installation
     these 48 builtin commands are just hardlinks back to 'git'.  For
     example 'git-update-index' is really just a hardlink back to 'git'
     and 'git' knows to perform the update index logic when it's called
     as either 'git-update-index' or as 'git update-index'.
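
The two calling styles in #2 can be sketched as a small dispatcher that looks at the name it was invoked under. This is a Python illustration only (git itself is C); `BUILTINS`, `resolve` and `dispatch` are hypothetical names, and the table holds two stand-in entries rather than the real builtin list.

```python
import os

# Hypothetical builtin table; real git has many more entries.
BUILTINS = {
    "update-index": lambda args: "update-index " + " ".join(args),
    "log": lambda args: "log " + " ".join(args),
}

def resolve(argv):
    """Return (command_name, remaining_args) for either calling style."""
    prog = os.path.basename(argv[0])
    if prog.startswith("git-"):
        # Invoked through a hardlink such as 'git-update-index'.
        return prog[len("git-"):], argv[1:]
    if len(argv) < 2:
        raise ValueError("no command given")
    # Invoked as 'git <command> ...'.
    return argv[1], argv[2:]

def dispatch(argv):
    name, args = resolve(argv)
    if name in BUILTINS:
        return BUILTINS[name](args)
    # Not builtin: a real wrapper would exec 'git-<name>' here (#1 above).
    return "exec git-" + name
```

Both `dispatch(["git", "update-index", "--add", "f"])` and `dispatch(["/usr/bin/git-update-index", "--add", "f"])` end up in the same builtin, which is the effect the hardlinks are there to achieve.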

We're moving more towards #2, but there are still a large number

No.  In Git at least half of the things Git can do are not builtin to
'git' and thus require exec()'ing an external program (e.g. git-fetch).
However these often appear as though they are command line options to
'git' as 'git fetch' just means exec 'git-fetch' (by #1 above).

On the other hand there are a wide range of tools which are more or
less the same thing, just with different options applied to them.
All of the diff programs, log, whatchanged, show - these are all
just variations on a theme.  Their individual implementations are

Git is still very much tied to a shell.  For example 'git commit'
is really the shell script 'git-commit'.  This is a rather long
shell script and it does a lot of things for the user; not having
it would make Git useless for most people.  It also has not been
rewritten in C.  There is a roadmap however to convert it to C to
help remove the programmable shell requirement and people have been

Git runs on Cygwin.  But there's no native Win32 (without Cygwin)
version of Git because:

 - Git uses POSIX APIs and expects POSIX behavior from the OS it's
   running on.  Without ...

From: Linus Torvalds
Date: Monday, October 23, 2006 - 11:31 pm

Historically, "git" was _only_ a wrapper program. When you did

	git log

it just executed the real program called "git-log", which was often a 
shell-script. That was just so that things could easily be extended, and 
you could use shell-script for simple one-liner things, and native C for 
more "core" stuff.

For example, "git log" used to be a one-line shell-script that just did

	git-rev-list --pretty HEAD | LESS=-S ${PAGER:-less}

but it ended up being a lot more capable, and eventually just rewritten 
as an internal command..

These days, most of the simple things like "git log" are all built into 
the "git" program, although for anything not built in, it still acts as 
just a wrapper, which allows not only random functionality to still be 
written in shell (or sometimes perl), but also ends up being the simplest 
possible plug-in mechanism: you can define your own commands by just 
writing a shell-script thing, calling it "git-mycommand", installing it in 
the proper place, and it ends up being accessible as "git mycommand".
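
The wrapper behaviour described above amounts to a lookup of `git-<name>` in a known set of directories. Here is a rough Python sketch; `find_external` is a hypothetical helper, the temporary directory stands in for "the proper place", and the executable-bit check a real wrapper would need is deliberately left out.

```python
import os
import tempfile

def find_external(name, exec_dirs):
    """Return the path of 'git-<name>' in the first directory that has it."""
    for d in exec_dirs:
        candidate = os.path.join(d, "git-" + name)
        # Simplification: a real wrapper would also need the file to be executable.
        if os.path.isfile(candidate):
            return candidate
    return None

# Demo in a throwaway directory standing in for the install location.
demo_dir = tempfile.mkdtemp()
script = os.path.join(demo_dir, "git-mycommand")
with open(script, "w") as f:
    f.write("#!/bin/sh\necho hello from mycommand\n")
os.chmod(script, 0o755)

found = find_external("mycommand", [demo_dir])
missing = find_external("nosuchcommand", [demo_dir])
```

Once such a lookup succeeds, the wrapper only has to execute the file it found, so adding a command never requires touching the `git` binary itself.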


Almost all of "core" git is pure C, which unlike something like python or 
perl obviously tends to have a fair amount of system issues. That said, 
much of it really is fairly portable, so doing the built-in git stuff 
should _largely_ work even natively under Windows with some effort.

The problem ends up being that few enough people seem to develop under 
Windows, and the cygwin port works better (because it handles a number of 
the portability issues and also handles the scripts that are still shell). 
Those two issues seem to mean that not a lot of effort has been put into 
aiming for a native windows binary (or into moving away from shell 
scripts).

Most of the shell scripts really are fairly simple. So if somebody 
_really_ wanted to, it would probably not be hard to spend some effort to 
either just write them as C and turn them into built-ins, or port them 
to some other scripting language.

Of course, most Windows users ...

From: David Rientjes
Date: Monday, October 23, 2006 - 11:45 pm

Some of the internal commands that have been coded in C are actually much 
better handled by the shell in the first place.  It's much simpler to 
write and extend as well as being much more traceable for runtime 
problems.  The shell commands that would be used for most of these git
routines have options for requesting it to be more verbose so the user 
actually has a lot more power over reporting and/or logging.  In addition 
it tends to be more portable and the amount of code is drastically reduced 
in a script style of programming.  The criticisms against such use of 
shell scripting tend to be a matter of personal taste.  People believe, 
for some reason or another, that it is a lower-class type of programming 
that is less robust and is harder to understand.  Seldom have there been 
cogent arguments for coding such features in C as opposed to shell 
scripting, especially in the case of git where the shell becomes a very 
powerful ally.

		David
-

From: Linus Torvalds
Date: Tuesday, October 24, 2006 - 8:15 am

Yes. However, from a portability (to Windows) standpoint, shell is just 
about the worst choice.

Not that perl/python/etc really help - unless the _whole_ program is one 
perl/python thing. Windows just doesn't like pipelines etc very much.

So I'd like all the _common_ programs to be built-ins..

		Linus
-

From: David Rientjes
Date: Tuesday, October 24, 2006 - 1:12 pm

And I would prefer the opposite because we're talking about git.  As an 
information manager, it should be seen and not heard.  Nobody is going to 
spend their time to become a git or CVS or perforce expert.  As an 
individual primarily interested in development, I should not be required 
to learn command lines for dozens of different git-specific commands to do 
my job quickly and effectively.  I would opt for a much simpler 
approach and deal with shell scripting for many of these commands because 
I'm familiar with them and I can pipe any command with the options I 
already know and have used before to any other command.

As a developer on Linux based systems, I should not need to deal with 
code in a revision control system that is longer and less traceable 
because the authors of that system decided they wanted to support Windows 
too.  Moving away from the functionality that the shell provides is a 
mistake for a system such as git where it could be so advantageous because 
of the inherent nature of git as an information manager.

This is the reason why I was a fan of git long ago and used it for my own 
needs before tons of unnecessary features and unneeded complexity were 
added on.

		David
-

From: Jeff King
Date: Wednesday, October 25, 2006 - 1:48 am

I don't understand how converting shell scripts to C has any impact
whatsoever on the usage of git. The plumbing shell scripts didn't go
away; you can still call them and they behave identically.


Some shell->C conversions may have made the code "longer and less
traceable." However, many of those conversions caused the code to be
shorter (because communication between C functions is simpler than going
over pipes, and because anything involving a data structure more complex
than a string is difficult in the shell) and more robust (fewer
opportunities for quoting/parsing errors, and none of the shell gotchas
like missing the error code in "foo | bar").
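
The "foo | bar" gotcha mentioned above is easy to reproduce. The sketch below assumes a POSIX `sh` is available on the PATH and drives it from Python; `pipeline_status` is a hypothetical helper name.

```python
import subprocess

def pipeline_status(script):
    """Run a shell snippet with sh and return its exit status."""
    return subprocess.run(["sh", "-c", script]).returncode

# 'false' fails, but the pipeline reports the status of its last command.
lost = pipeline_status("false | true")
# Without the pipe, the failure is visible to the caller.
seen = pipeline_status("false")
```

Here `lost` comes back as success even though the first stage failed, which is exactly the kind of error a script can silently miss and a C function call cannot.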

Do you have any specific reason to believe that the git code is of worse

Is there something you used to do with git that you no longer can? Is
there a reason you can't ignore the newer commands?

-Peff
-

From: David Rientjes
Date: Wednesday, October 25, 2006 - 2:19 am

No, my criticism is against the added complexity which makes the 
modification of git increasingly difficult with every new release.  It's a 
pretty limited use case of the entire package, I'm sure, but one of the 
major advantages that I saw in git early on was the ability to tailor it 
to your own personal needs very easily with some simple shell knowledge 

You're ignoring the advantageous nature of the shell with regard to git.  
The shell is so much better prepared to deal with information managers by 
nature than the C programming language.  It's not a matter of shorter 
code, per se, it's about the developer's ability to make small changes to 
the operation of the information manager on demand to tailor to his or her 
_current_ needs.  For any experienced shell programmer it is so much 
easier to go in and change an option or pipe to a different command or 
comment out a simple shell command in a .sh file than editing the C code.  
And sometimes it's necessary to have several different variations of that 
command which is very easy with slightly renamed .sh files instead of 
adding on more and more flags to commands that have become so complex at 
this point that it's difficult to know the basics of how to manage a 
project.

This all became very obvious when the tutorials came out on "how to use 
git in 20 commands or less" effectively.  These tutorials shouldn't need 
to exist with an information manager that started as a quick, efficient, 
and _simple_ project.  You're treating git development in the same light 
as you treat Linux development; let's be honest and say that 99% of the 
necessary git functionality was there almost a year ago and ever since 
nothing of absolute necessity has been added that serious developers care 
about in a revision control system.  Look at LKML, nobody is waiting on 
these new releases and upgrading to them when they're announced.  And this 
is the community that git has _targeted_.  Most other projects don't care 
about the syntactics ...

From: Jeff King
Date: Wednesday, October 25, 2006 - 2:49 am

OK, you seemed to imply problems for end users in your first paragraph,

Yes, it's true that some operations might be easier to play with in the
shell. However, does it actually come up that you want to modify
existing git programs? The more common usage seems to be gluing the
plumbing together in interesting ways, and that is still very much

You can do the same thing in C. In fact, look at how similar
git-whatchanged, git-log, and git-diff are.

I don't understand how a shell->C conversion has anything to do with
options being added. If you look at all of the conversions, they

Sorry, I don't see how this is related to the programming language _at
all_. Are you arguing that the interface of git should be simplified so
that such tutorials aren't necessary? If so, then please elaborate, as
I'm sure many here would like to hear proposals for improvements. If
you're arguing that git now has too many features, then which features

I don't agree with this. There are tons of enhancements that I find
useful (e.g., '...' rev syntax, rebasing with 3-way merge, etc) that I
think other developers ARE using. There are scalability and performance
improvements. And there are new things on the way (Junio's pickaxe work)
that will hopefully make git even more useful than it already is.

If you don't think recent git versions are worthwhile, then why don't
you run an old version? You can even use git to cherry-pick patches onto


I don't agree, but since you haven't provided anything specific enough

Can you name one customization that you would like to perform now that
you feel can't be easily done (and presumably that would have been
easier in the past)?

-Peff
-

From: Andreas Ericsson
Date: Wednesday, October 25, 2006 - 6:49 am

Indeed. I still use my old git-send-patch script whenever I want to send 
patches, simply because I don't like git-send-email and its defaults 
much. The interface hasn't changed one bit since I wrote it. That's 
pretty stable, since send-patch was created a couple of hours before git.c 
was submitted to the list, as I wrote the "send-patch" script to send 
the patch that did the rewriting.

I'm personally all for a rewrite of the necessary commands in C 
("commit" comes to mind), but, like many others, I have no personal 
interest in doing the actual work. I'm fairly certain that once we get 
it working natively on windows with some decent performance, windows 
hackers will pick up the ball and write "wingit", which will be a log 
viewer and GUI thing for 
fetching/merging/committing/reverting/rebasing/sending patches and 
whatnot. Possibly it will have hooks to Visual C++ or some other IDE. I 
don't know how that sort of thing works, but I'm sure someone clever and 
bored enough will want to investigate the possibilities.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: David Lang
Date: Wednesday, October 25, 2006 - 2:51 pm

a quick lesson on program naming

              ^^^^^^

how many other people read this as 'wing it' rather than 'win git'? ;-)

David Lang

-

From: Shawn Pearce
Date: Wednesday, October 25, 2006 - 3:15 pm

Yes, that's certainly a less than optimal name...

What about gitk?  Is it "gi tk" or "git k" ?  This has actually
been the source of much local debate.  :-)

-- 
Shawn.
-

From: David Lang
Date: Wednesday, October 25, 2006 - 3:41 pm

in this case I think it's both, (or technically git tk with the double t's 
combined to save typing)

David Lang
-

From: David Rientjes
Date: Wednesday, October 25, 2006 - 10:21 am

Yes, it does.  I'll give you an example from six months ago: there was a 
need for the group that I work with to support a faster type of hashing 
function for whatever reason.  This would have been simple with previous 
versions of git, but if you've ever looked at the SHA1 code in git, you'll 
realize that you're probably better off never trying to touch it.  There 
is absolutely _no_ abstraction of it at all and the code is so deeply 
coupled in the source that abstracting it away is a pain.

Likewise, there is always room for personal or organizational tweaks on 
the part of the developer.  Things like distributed pulling and 
merging should actually be pretty simple to implement if the complexity 
wasn't so high in the merge-* family.  This is something I implemented 
after an enormous headache because we were dealing with very large 
projects: yes, larger than the Linux kernel.  And this is _exactly_ where 
piping would help; we have implementations of distributed grep over very 

No you can't.  Making a one line addition, commenting out a line, or 
changing a simple flag in a shell script is much easier.  And like I 
already said, you can save multiple versions for your common use if you 
work on a specific project much of the time and change how it operates 
depending on the needs of that one project so you never need to do it 
again or you can _distribute_ that shell file to your colleagues so that 
everybody is doing their work via the same method.  This makes it so you 
can just say "type X, then type Y, then type Z" and everybody is operating 

It's not, it's related to the original vision of git which was meant for 
efficiency and simplicity.  A year ago it was very easy to pick up the 
package and start using it effectively within a couple hours.  Keep in 
mind that this was without tutorials, it was just reading man pages.  
Today it would be very difficult to know what the essential commands are 
and how to use them simply to get the job done, unless you use the ...
From: Jeff King
Date: Wednesday, October 25, 2006 - 2:03 pm

First off, thanks for giving examples. I was having trouble seeing where

Is this really an artifact of the C code versus the shell code? A lot of
parts of the system need to touch SHA1 hashes, and I think it has been
sprinkled throughout the code from the beginning. In fact, I think the
libification of git-rev-list has made the code a lot _cleaner_ (and
shorter), in that the C programs can all use the same nice interface.
The external interface is still there, but now there is consistency
among programs when using rev syntax (ISTR issues in the distant past
where program X didn't understand syntax because the parsing was all

I guess I don't see how this was ever any easier. Do you mean that when
we called an external grep, it was easier to plug in your distributed


The "same thing" I referred to was changing behavior trivially based on

Sure, shell can be easier to modify (though in well-written C, you're
likely just commenting out a few lines or a function call -- maybe you
can argue whether or not git is well-written). However, I remain
unconvinced that this is a common use case, or that it is something that
should weigh heavily when compared with portability, efficiency, or


Simplicity is fine if all you want is plumbing. But normal people want
to _use_ git without hacking their own shell scripts, so it makes sense
to provide the scripts that other people have hacked together (as shell,
perl, C, or whatever). Do I want to use git-send-email? Hell no, the
interface is terrible to me. But do the plumbing commands still exist so
that I can use the scripts I hacked together? Absolutely. I can take

Was it? The most common complaint I've heard about git, starting a year
ago, was the lack of documentation and tutorials and the complexity of

I think this has been the case for a long time. It's just that there

No, it illustrates a lack of simplicity that currently exists; it says

There has been work on scaling to larger repositories (e.g., mozilla and
xorg prompting ...
From: Andreas Ericsson
Date: Thursday, October 26, 2006 - 4:15 am

Compared to today's version, original git was neither efficient nor
simple. Unless you mean "some random version along the way where git had 
everything *I* need and not the useless cruft that other people use", in 

Have you tried "git --help"? It shows the most common commands and a 
short description of what they do. It's a very good pointer to which 
man-pages you need to read, and I imagine this would actually be one of 
the very first commands that new git users try. If they don't but just 
expect things to work according to some premade mental model they have 

No it hasn't. The ten or so commands that Linus first introduced when 
announcing git still work pretty much the same. Nobody in their right 
mind would ever claim that those ten commands made up anything that even 
remotely resembled a complete scm, but they were something to build on 
by anyone who wanted to extend it. So far, ~220 people have wanted to 
extend it in ways that others thought useful, because their patches are 

Well, my head hurt when I tried to learn CVS without a tutorial, and 
mercurial and darcs and svn as well. I didn't pick up the functionality 
of the 'ls' command completely without reading the man-page for it. If 
you want something that works for everyone without having to read any 
documentation whatsoever, buy Lego, 'cause computers ain't for you, my

Actually, I don't see why git shouldn't be perfectly capable of handling 
a repo containing several terabytes of data, provided you don't expect 
it to turn up the full history for the project in a couple of seconds 
and you don't actually *change* that amount of data in each revision. If 
you want a vcs that handles that amount with any kind of speed, I think 
you'll find rsync and raw rvs a suitable solution.

On the other hand, you fellas at google don't really use git to store 
the data from the search database, do you? I mean, it's written for 
source control management. People that tried to keep their mboxes in git 
failed ...
From: David Lang
Date: Thursday, October 26, 2006 - 9:30 am

actually, there are some real problems in this area. the git pack format can't
be larger than 4G, and I wouldn't be surprised if there were other issues with
files larger than 4G (these all boil down to 32 bit limits). once these limits
are dealt with then you will be right.

David Lang
-

From: Nicolas Pitre
Date: Thursday, October 26, 2006 - 10:03 am

There is no such limit on the pack format.  A pack itself can be as 
large as you want.  The 4G limit is in the tool not the format.

The actual pack limits are as follows:

	- a pack can have infinite size

	- a pack cannot have more than 4294967296 objects

	- each non-delta object can be of infinite size

	- delta objects can be of infinite size themselves but...

	- current delta encoding can use base objects no larger than 4G

The _code_ is currently limited to 4G though, especially on 32-bit 
architectures.  The delta issue could be resolved in a backward 
compatible way but it hasn't been formalized yet.

The pack index is actually limited to 32-bits meaning it can cope with 
packs no larger than 4G.  But the pack index is a local matter and not 
part of the protocol so this is not a big issue to define a new index 
format and automatically convert existing indexes at that point.
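
To make the distinction concrete, here is a minimal sketch of what a 32-bit
offset field implies. It is an illustration only: the function names and the
4-byte big-endian layout are assumptions made for this example, not git's
actual index code.

```python
import struct

# Largest value an unsigned 32-bit field can hold (just under 4G).
U32_MAX = 2**32 - 1


def fits_in_index(offset: int) -> bool:
    """True if an object starting at this pack offset fits a 32-bit field."""
    return 0 <= offset <= U32_MAX


def encode_offset(offset: int) -> bytes:
    """Store an offset in a 4-byte big-endian field (hypothetical layout)."""
    if not fits_in_index(offset):
        raise OverflowError(f"offset {offset} does not fit in 32 bits")
    return struct.pack(">I", offset)
```

The pack data itself is untouched by this limit; only the side table that
records where each object starts would need wider fields.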


Nicolas
-

From: David Lang
Date: Thursday, October 26, 2006 - 10:04 am

the offset within a pack for the starting location of an object cannot be larger
than 4G.

David Lang
-

From: Linus Torvalds
Date: Thursday, October 26, 2006 - 10:16 am

Well, strictly speaking, even that isn't actually a limit on the _pack_ 
format itself.  It's really just the (totally separate) index that 
currently uses 32-bit offsets.

For example, you can actually use the pack-file to transfer more than 4GB 
of data over the network. You'd not need to change the format at all. Only 
the local _index_ of the result needs to change - but we never transfer 
that at all (it's always generated locally), so that's really a separate 
issue.

It's not even hard to fix. It's just that right now, the biggest 
repository that we know about (mozilla) is not even close to the limit. 
And it took them ten years to get there. So if the mozilla people switch 
to git, and keep going at the same rate, we have about 70 years left 
before we need to fix the indexing ;)

(Of course, other projects, like the kernel, seem to grow faster, so it 
might be "only" a decade or two - but since the index format is a local 
thing, even that won't be too painful, since we don't really need a global 
flag-day once we decide to start supporting larger offsets in the index)
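
The "70 years" figure can be reproduced with a back-of-the-envelope
calculation. The sketch below assumes strictly linear growth and a 4 GiB
limit; the repository size it derives is only what those two numbers imply,
not a figure stated in the message.

```python
GIB = 2**30
LIMIT = 4 * GIB  # 32-bit offset limit of the current index


def years_left(current_size, years_so_far, limit=LIMIT):
    """Years until `limit` is reached if growth stays linear."""
    return (limit - current_size) * years_so_far / current_size


def implied_size(years_so_far, years_remaining, limit=LIMIT):
    """Size today that would make `years_remaining` come out as stated."""
    return limit * years_so_far / (years_so_far + years_remaining)
```

Under these assumptions, ten years of history with seventy years of headroom
corresponds to a repository of roughly half a gigabyte today.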

		Linus
-

From: Nicolas Pitre
Date: Thursday, October 26, 2006 - 10:24 am

To be more exact, yes.  But I don't think we'll ever consider use 
scenarios with packs > 4G with the current index format.  There is 
simply no point.


Nicolas
-

From: Junio C Hamano
Date: Wednesday, October 25, 2006 - 2:08 pm

That's also what I wondered, but I can also understand where David is
coming from, and I agree with him to a certain degree.

When I learned git, I learned a lot from trying to piece my own
plumbing together, since there wasn't much Porcelain to speak
of back then.  Then we had many usability enhancements before
the 1.0 release to add Porcelainish done as shell scripts.

This had two positive effects, aside from adding usability.
Interested people had more shell scripts to learn from.  The
scripts were easy to adjust to feature requests from the list,
and as we learned from user experience based on these scripts it
was definitely quicker to codify the best current practice
workflow in them than if they were written in C.  It would have
taken us a lot more effort to add "git commit -o paths" vs "git
commit -i paths" if it were already converted to C, for example.
This continued and our Porcelainish scripts matured quickly.

Then 1.3 series started to move some of the mature ones into C.
As many people already have pointed out, being written in C and
not doing pipe() has two advantages (better portability to
platforms with awkward pipe support and one less process usually
mean better performance).  git-log family with path limiting had
a real boost in performance because the path limiting can be
done in the revision traversal side not diff-tree that used to
be on the downstream side of the pipe.  So this in overall was a
right thing to do.

One thing we lost during the process, however, is a ready access
to the pool of "sample scripts" when people would want to
scratch their own itches.  Linus's original tutorial talked
about "this pattern of pipe is so useful that we have a three
liner shell script wrapper that is called git-foo", and
interested people can easily look at how the plumbing commands
fit together.

The plumbing is still there, and I and people who already know
git would still script around git-rev-list when we need to (by
the way, scripting around git-log is a ...
From: Jeff King
Date: Wednesday, October 25, 2006 - 2:16 pm

I think this is part of the complication of discussion I'm having with
David. There are really two sets of users for git: people who want to
hack scripts based on plumbing, and people who want everything to "just
work." I think it's a good point that as the system matures (movement

Housing historical implementations seems like it would just lead to

I think this is a better approach. I think it also makes sense to
let people know that it's an acceptable approach to start new features
as shell and then have them mature to C (looking at the current
codebase, and some of Dscho's rantings, one might get the impression
that git isn't accepting new shell scripts).

-Peff
-

From: Junio C Hamano
Date: Wednesday, October 25, 2006 - 2:32 pm

I agree.  Although that ought to be rare in principle, given
that one advertised feature of git is that the plumbing is
supposed to be stable, we occasionally had to subtly
break things to improve plumbing and at the same time run around
to make sure that all the script users (both in-tree and

New commands like pickaxe and for-each-ref were easier to code
in C, and cherry rewrite in C was really about how crufty the
shell script version was from the beginning (and there weren't
in-tree users of it left so it was not maintained at all but
thanks to plumbing being stable it just kept working perhaps
correctly but still horribly).

-

From: Junio C Hamano
Date: Wednesday, October 25, 2006 - 2:50 pm

I meant "Documentation/howto"; sorry for the noise.

-

From: Andreas Ericsson
Date: Thursday, October 26, 2006 - 4:25 am

Isn't this how git has been developed since day one, more or less? If a 
command is missing, it gets added as a shell-script. I agree with you on 
the "pipes from this sent here does this, and look how useful it is" 
lectures are gone since many commands were rewritten. Otoh, they're gone 
because they now instead provide examples on how to interface with the 
libified parts of git, so it's not a loss per se, just a switch in what 
it teaches.

I also agree with David that shell is much more fun to muck around with 
and prototype in, because you see results so much faster. However, since
our plumbing is so rock-solid (and getting extended with --stdin options 
to more and more commands), I see no reason why we shouldn't have a "how 
to extend git" with the old shell-based porcelain scripts up somewhere 
at the web. Perhaps it would kill two birds with one stone and increase 
the addition of new utilities to git, while at the same time keeping the 
already rewritten commands in C.

Btw, the old shell-versions still work with the new plumbing (well, 
mostly anyways). They just have problems with filenames and revisions 
with spaces and special chars and things like that, same as they've 
always had.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Linus Torvalds
Date: Wednesday, October 25, 2006 - 7:29 pm

Others have answered this, but the thing is, it was a _wonderful_ way to 
prototype things, and to add obvious (and nice) early UI issues that made 
git much more usable.

But no, things are not better handled in shell.

Shell tends to make some things really _hard_ to do. A fair chunk of the 
rewrite was because core functionality made things easier. For example, 
the whole internal revision parsing library is really actually a lot more
capable than we could easily expose as a simple pipeline: the original 
"git log" pipeline worked very well, and you can actually still use those 
kinds of pipelines for a lot of work, but at the same time, some things 
really just work better when you have "deeper" interfaces.

For example, the revision parsing library not only makes "git log" trivial 
as C, it's also needed for an efficient "git annotate/blame/pickaxe" kind 
of thing. There are also things that are just ludicrously hard to do in 
shell-script, like exclusive and atomic file operations.

We used perl and python for some things, but finding people who know them 
tends to be problematic, and python in particular was also a dependency
problem, so the fact that the default recursive merge was python
wasn't wonderful.

So I think the shell-scripts are great (and some of them quite likely will 
remain around for the foreseeable future) for prototyping, but for core
functionality they were not wonderful. 

They are sometimes good examples of how powerful a scripting language git 
can be, though. Scripting is still very important, even though a lot of 
the core stuff doesn't necessarily depend on being scripts itself. 

But error handling in scripting is very hard or inconvenient, especially 
in pipelines. So some things were actively problematic (ie "git-rev-list 
--all --objects | git-pack-objects") and moving it to use the internal 
library interface was simply technically the right thing to do.
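
The pipeline problem can be illustrated with a small sketch. In a plain shell
pipeline the exit status is, by default, that of the last command, so a
failing producer goes unnoticed unless both ends are checked. The two child
commands below are stand-ins chosen for this example, not the git commands
named above.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer` and return both exit codes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Drop our copy of the pipe so the consumer sees EOF when the producer exits.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


# Stand-in producer: writes partial output, then fails.
FAILING_PRODUCER = [sys.executable, "-c",
                    "import sys; print('partial'); sys.exit(1)"]
# Stand-in consumer: reads everything and exits successfully.
CONSUMER = [sys.executable, "-c", "import sys; sys.stdin.read()"]
```

A caller that only looks at the consumer's status would treat this run as a
success even though the producer died halfway; checking both codes is the
kind of bookkeeping that an in-process library call avoids.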

Others had real performance issues, eg the new merge in C is a lot ...
From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 3:03 pm

Excuse me? What does that "throws away your local commit ordering" mean?

A fast-forward does no such thing. It leaves the local commit ordering 
alone, it just appends other things on top of it. It's the only sane thing 
you can do, since the work you merged was already based on your top 
commit.

So generating an extra "merge" commit would be actively wrong, and adds 
"history" that is not history at all.

It also means that if people merge back and forth from each other, you get 
into an endless loop of useless merge commits. What's the point? They only 
clutter up the history, and they mean that you can never agree on a common 
state.

There's no reason _ever_ to not just fast-forward if one repository is a 
strict superset of the other.

You must be doing something wrong. Is it just that people want to pee in 
the snow and leave their mark?

		Linus
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 3:53 pm

Say this is the ordering in branch A:

a
|
b
|
c

Say this is the ordering in branch B:

a
|
b
|\
d c
|/
e

When A pulls B, it gets the same ordering as B has.  If B did not have e

It's not a tree change, but it records the fact that one branch merged

You can pull if you don't want that.  We haven't found that people are

Maybe not in Git.

Aaron
-

From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 4:09 pm

Sure. But that doesn't throw away any local commit ordering. The original 
order (a->b->c) is still very much there. The fact that there was a branch 
off 'b' and there is also (a->b->d) and a merge of the two at 'e' doesn't 

But that's a totally specious "record". It has no meaning in a distributed 
SCM. There is absolutely zero semantic information in it.

The fact that you _locally_ want to remember where you were is a total 
non-issue for a true distributed system. You shouldn't force everybody 
else to see your local view - since it has no relevance to them, and 

I don't think there is any in bzr either. Can you explain?

In other words, the empty merge is totally semantically empty even in the 
bazaar world. Why does it exist?

		Linus
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 5:23 pm

After the pull, it's no longer the mainline ordering for the branch.  c
is represented as a revision that was merged into the branch, while d is

It means the order in which revisions are shown in log commands changes,

It records the committer, the date, the commit message, the parent


It exists because it is useful.  Because it makes the behavior of bzr
merge uniform.  Because in some workflows, commits show that a person
has signed off on a change.

It's not something special-- it's just another commit, like regular
commits, and merge commits.  It would be harder to forbid than it is to
permit.

Aaron
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 5:46 pm

Well, that is another example while generation number is/can be global,


...but that means that revision numbers are totally, absolutely useless.
Unless by some miracle of engineering, or adding namespace, they can be

All totally empty information. What should the commit message be? "I have
fetched changes from remote repository"? You can remove one of the parents
(the one pointing to the state before the fast-forward "merge") without
changing reachability.

              ---------
             /         \

But if you record "fast-forward merge", you force all people pulling
from your repository to have this purely local and without any significant

Signing off the fact of fetching changes? For true merge you are signing
off the fact that there were no conflicts, or you sign off your conflict

Actually the check is very easy. And you have to do a similar check when
fetching/pushing to ensure that you don't clobber your changes.
-- 
Jakub Narebski
Poland
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 6:00 pm

No.  The numbering always follows the leftmost parent.  So each revision


No, because no one pulls unless they're trying to maintain a mirror of

Even if I agreed that the revision was meaningless, the cost of such a

You sign off on the contents of the revision you fetched.  You say "I

Agreed.  It's just that not checking is easier still.

Aaron
-

From: Carl Worth
Date: Tuesday, October 17, 2006 - 6:25 pm

Aaron, thanks for carrying this thread along and helping to bridge
some communication gaps. For example, when I saw your original two
diagrams I was totally mystified how you were claiming that appending
a couple of nodes and edges to a DAG could change the "order" of the
DAG.

I think I understand what you're describing with the leftmost-parent
ordering now. But it's definitely an ordering that I would describe as
local-only. That is, the ordering has meaning only with respect to a
particular linearization of the DAG and that linearization is

If in practice, nobody does the mirroring "pull" operation then how
are the numbers useful? For example, given your examples above, if
I'm understanding the concepts and terminology correctly, then if A
and B both "merge" from each other (and don't "pull") then they will
each end up with identical DAGs for the revision history but totally
distinct numbers. Correct?

So in that situation the numbers will not help A and B determine that
they have identical history or even identical working trees. So what
good are the numbers?

I can see that the numbers would have applicability with reference to
a single repository, (or equivalently a mirror of that repository),
but no utility as soon as there is any distributed development
happening.
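
The leftmost-parent numbering discussed here can be sketched as follows. This
is an illustration under stated assumptions, not bzr's implementation: the
dictionaries model the two example branches from earlier in the thread, and
it is assumed that merge e lists d as its leftmost parent, as drawn.

```python
def mainline_revnos(parents, head):
    """Number revisions along the leftmost-parent chain from root to head."""
    chain = []
    node = head
    while node is not None:
        chain.append(node)
        node_parents = parents.get(node, [])
        # Follow only the first (leftmost) parent; merged-in sides are skipped.
        node = node_parents[0] if node_parents else None
    chain.reverse()
    return {rev: number for number, rev in enumerate(chain, start=1)}


# Branch A: a - b - c
BRANCH_A = {"a": [], "b": ["a"], "c": ["b"]}
# Branch B: a - b - d - e, with c merged in at e (assumed parent order: d, c)
BRANCH_B = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d", "c"]}
```

With these inputs, number 3 names c when counted from A's head but d when
counted from B's head, and c has no mainline number in B at all, which is
the sense in which the numbers are local to one linearization.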

-Carl
From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 8:10 pm

Well, the linearization for any particular head is well-defined, but

The DAGs will be different.  If A merges B, we get:

a
|
b
|\
c d
|\|
| e
|/
f

If B merges A before this, nothing happens, because B is already a
superset of A.

If B merges afterward, we get this:
a
|
b
|\
d c
|/|
e |
|\|
| f
|/


They are good for naming mainline revisions that introduced particular

Well, there's distributed, and then there's *DISTRIBUTED*.  We don't
quasi-randomly merge each others' branches.  We have a star topology
around bzr.dev.  So when we refer to revnos, they're usually in bzr.dev.

Aaron
-

From: Andreas Ericsson
Date: Wednesday, October 18, 2006 - 1:39 am

Seems like an awful lot of merge commits. In git, I think these trees 
would be identical (actually both to bazaar and to each other), with the 
exception that the 'g' commit wouldn't exist, since git does 
fast-forward and relies on dependency-chain only to present the graph 
instead of mucking around with info in external files (recording of 

As explained above, they would be identical in git. The fact that you 
register a fast-forward as a merge makes them not so, but this is 

So in essence, the revnos work wonderfully so long as there is a central 
server to make them immutable?

Doesn't this mean that one of your key features doesn't actually work in 
a completely distributed setup (i.e., each dev has his own repo, there 
is no mother-ship, everyone pulls from each other)?

I can see the six-line hook that lays the groundwork for this in git 
before me right now. I'll happily refuse to write it down anywhere. I 
get the feeling that sha's are easier to handle in the long run, while 
revno's might be good to use in development work. In git, we have 
<branch/tag/"committish">~<number> syntax for this.

In my experience, finding the revision sha of an old bug is what takes 
time. Copy-paste is just as fast with 20 bytes as with 4 bytes. Honestly 
now, do you actually remember the revno for a bug that you stopped 
working on three weeks ago, or do you have to go look it up? If someone 
wants to notify you about the revision a bug was introduced, do they not 
communicate the revno to you by email/irc/somesuch?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Peter Baumann
Date: Wednesday, October 18, 2006 - 2:04 am

Ok. This I don't get. Let me recapitulate:

Branch A
a
|
b
|
c

Branch B
a
|
b
| \
d c
| /
e

In branch A, if you merge branch B (git pull B), you get branch B as the
result, because A fast-forwards to B and you don't get a merge commit f.

In branch B, if you merge branch A (git pull A), the result would be
branch B, because we are already up to date.

You _never_ have a commit f or g.

-Peter
-

From: Jakub Narebski
Date: Wednesday, October 18, 2006 - 2:07 am

Revnos were supposed to be superior to using sha1 (or shortened sha1)
as commit identifiers because of two key features:
 1. They were simpler than sha1, therefore easier to use
 2. Given two revisions related by lineage (i.e. one is ancestor of
    the other) you can from a glance know which revision was earlier

But the details invalidated 1.: for complicated history, for a large
project, with many contributors and nonlinear development we have 
www.repository.com:127.2.31.57 vs 988859a (7 chars shortcut of sha1)
to have immutable revno. And we have to use _immutable_ (up to few
years) revision identifiers, unless we want our "simple ids" scheme
to make a mess...

And I'm not sure that 2. is true, since even for revisions with direct
lineage we might have to compare 127.15.2.16 with 210.2.20.3, for
example. Having a generation number would solve 2.; as of now git
checks for the fast-forward case by checking whether the merge-base of
the two revisions is one of the revisions.
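
As a rough sketch of that check: the merge base of two revisions equals one
of them exactly when that one is an ancestor of the other, so a plain
ancestry walk is enough to classify the case. The helper names and the
dictionary-based history below are assumptions made for illustration; git's
real merge-base code is more involved.

```python
def ancestors(parents, rev):
    """Return rev plus everything reachable from it through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def merge_kind(parents, ours, theirs):
    """Classify merging `theirs` into `ours`."""
    if theirs in ancestors(parents, ours):
        return "up to date"       # merge base is theirs: nothing to do
    if ours in ancestors(parents, theirs):
        return "fast-forward"     # merge base is ours: just move the head
    return "true merge"           # histories diverged: a merge commit is needed


# The example history from this thread: c and d both follow b, e merges them.
HISTORY = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d", "c"]}
```

On this history, merging e into c is a fast-forward, merging c into e is
already up to date, and only merging d into c needs a new commit.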
-- 
Jakub Narebski
Poland
-

From: Matthew D. Fuller
Date: Wednesday, October 18, 2006 - 3:32 am

On Wed, Oct 18, 2006 at 10:39:32AM +0200 I heard the voice of

It seems from my somewhat detached perspective that there's a lot of
conflation of 'conventions' with 'capabilities' around this thread...


With a single linear branch, revnos work wonderfully, and are probably
much more useful than any sort of UUID.  It would be silly in this day
and age to design a VCS aimed specifically for this use case, of
course.  That doesn't mean a VCS shouldn't make it easy, though.


With a star config, revnos are useful locally and with reference to
the "main" branch[es].  And, most of the world is star configs of one
sort or another.  Actually, one might say that practically ALL the
world outside of linux-kernel is star-configs   ;)

In many cases in the star setup, a revno (particularly along the
'trunk') is more directly useful than a UUID; consider particularly
the case of somebody who's just mirroring/following, not actively
developing.  In some cases, the UUID is more useful.  Certainly, using
a revno in a case where the UUID is more appropriate is Bad, but
that's just a matter of using the right tool.


With an uber-distributed full-mesh setup, revnos may be basically
useless for anything except local lookups (which boils down to
"useless for most anything you'd identify a revision for").  For that
case, you'd practically always use the UUID, and pretend revnos don't
exist.


The merge revno forms (123.5.2.17 and the like), I'm somewhat
ambivalent about in many ways.  But, you don't have to use them any
more than you have to use "top-level" revnos.  If either form of revno
is Wrong for your case (whether it be because "I hate numbers
wholesale", or because "Numbers don't cover this case usefully"), then
you just use the UUID and pretend the number isn't there.  If you
wanted them completely out of sight, I wouldn't expect it to be very
hard to talk bzr into never showing the revnos and just showing the
UUID ("revid").



[ I don't speak for bzr, despite the fact that ...
From: Andreas Ericsson
Date: Wednesday, October 18, 2006 - 4:19 am

That might be the case today. However, since we introduced git at the 
office, mini-projects are cropping up like mad, and pieces of toy-code 
are being pushed around among the employees. When something is found to 
be useful enough to attract management attention, it's given a spot at 
the "master site". It doesn't need one. It's just that we have this one 
place where gitweb is installed, which management likes whereas devs 
don't have that on their laptop. It's also convenient to have one place 
to find all changes rather than pulling from 1-to-N different people 
just to have a look at what they've done.

The point I'm trying to make here is that the star config might be the 
most common case today because
a) old scm's enforced this use case and it is therefore the most common
way just out of habit.
b) projects you actually *see* have gotten past the "Joe made some cool 

I can easily imagine the use case Linus pointed out with BK. Because 
revnos work wonderfully 80% of the time, people get confused, frustrated 

But they *do* exist, and they *usually* work, so people are bound to try 
them first. Teaching them when they work and when they don't (or rather, 
when they should and when they shouldn't, cause they will work by 
accident sometimes too) is bound to be a lot harder than sending them a 

So what's the point in having them? You can't seriously tell me that you 
think of 123.5.2.17 as something you can easily remember, do you? Count 

Not really. It's just that case 3 is the most flexible of them all. It's 
trivial to enforce linear development in git. Just add a hook that 
forbids merge commits. Set up a "master repo" and put the hook there and 
you've turned it into CVS with off-line log-browsing (more or less).
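
The decision such a hook has to make is small. The sketch below models it on
a parent table rather than a live repository; in a real server-side hook the
incoming revisions would come from git itself (for instance,
`git rev-list --merges old..new` lists the merge commits in a pushed range).
The function names and message text are hypothetical.

```python
def merge_commits(parents, new_revs):
    """Return the incoming revisions that have more than one parent."""
    return [rev for rev in new_revs if len(parents.get(rev, [])) > 1]


def check_push(parents, new_revs):
    """Decide whether a push is acceptable under a 'linear history only' rule."""
    offenders = merge_commits(parents, new_revs)
    if offenders:
        return False, "rejected: merge commits not allowed: " + ", ".join(offenders)
    return True, "ok"


# Example history: e is a merge of d and c.
HISTORY = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d", "c"]}
```

A hook built around this would exit non-zero whenever the first element of
the result is False, which is what makes the server refuse the push.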

Set up a master-server and enable the reflog there and you've turned it 
into bazaar, more or less.

In git, the mothership repo is there for convenience, because it's nice
to have one place to set up mailing-list hooks, gitweb, git-daemon and ...
From: Matthew D. Fuller
Date: Wednesday, October 18, 2006 - 5:43 am

On Wed, Oct 18, 2006 at 01:19:10PM +0200 I heard the voice of


c) Stars work well as a mental model for humans.

Heck, in large, Linux is star-ish.  There's "2.6.1", "2.6.2", etc;
that's a trunk.  Any time you have releases, you're establishing a
"master" branch.  For most people using Linux, there's a trunk,
whether it's the kernel.org trunk, or the "What Redhat ships" trunk,
etc.  The closer you drill to the day-to-day work on the kernel, the
farther it gets from trunks, but if it were full-mesh at all levels I
don't think it would be nearly as usable for regular computing tasks
as it is.


Perhaps someday a heavy full-mesh setup will be the common case for
VCS usage.  I find that very difficult to buy for various reasons, but
it could happen.  If it does, bzr may well revisit the choice and
decide revnos contribute little enough marginal value as to be a loss,

Perhaps, for some projects.  And in those cases, perhaps you'd want to
flip a hypothetical "dump those numbers in the bin" switch.  That
doesn't mean every project wants to, or that those projects who don't
and have no trouble and discernible gain from revno usage are

No, I don't.  But I don't use merge revnos for various reasons, one of
the primary ones being that they don't currently intuitively follow
from me (and that intuitiveness is the major attraction of revnos in
the first place).

I rarely refer to non-mainline revisions at all, in fact.  And I use
revnos for mainline revisions regularly.  Heck, I communicate revnos
_verbally_; people handle that easily with numbers, not so easily with
hex strings.  The vast majority of my branches are simple cases, and I
like simple tools that match simple mental models for them.  For the
more intricate cases, revids provide a more rigorous tool, and I WANT
a VCS that lets me choose which is appropriate.  If I wanted a

Yes, but this doesn't necessarily mean everything you seem to try and
cover with it.  The more rigorous tool will cover the simplest case
(those ...
From: Sean
Date: Wednesday, October 18, 2006 - 6:02 am

On Wed, 18 Oct 2006 07:43:20 -0500

Just to be clear here, Git is also able to support this model if
you so choose.  It's quite easy for a server to generate Git tags
for every commit it gets.

It's just that this is basically a non issue in the Git world.  People
who use Git aren't crying out for salvation from sha1 numbers.  So I
think this entire discussion is a bit overblown.

But just to be clear, there is nothing in the Git model that prohibits
tagging every commit with something you find less objectionable than
sha1's.  They can appear in the log listings and in gitk etc, and
everyone who pulls from the central server will get them.  In fact,
for some imports of other VCS into Git, exactly that is done; so every
commit can be referenced by its sha1 _or_ the "friendly" number it was
known by in its original VCS.
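
A minimal sketch of that idea, in Python rather than as a real server
hook. The TaggingServer class, the "r<N>" tag naming and the abbreviated
commit ids are all made up for illustration; Git itself ships nothing by
these names.

```python
class TaggingServer:
    """Toy central server that gives every received commit a numeric tag."""

    def __init__(self):
        self.tags = {}       # friendly tag name -> commit id
        self.by_commit = {}  # commit id -> friendly tag name

    def receive(self, commit_id):
        # A commit that was already received keeps its existing tag.
        if commit_id in self.by_commit:
            return self.by_commit[commit_id]
        tag = f"r{len(self.tags) + 1}"
        self.tags[tag] = commit_id
        self.by_commit[commit_id] = tag
        return tag

    def resolve(self, name):
        # Accept either the friendly tag or the commit id itself.
        return self.tags.get(name, name)


server = TaggingServer()
for sha in ["9fceb02", "1a410ef", "cac0cab"]:  # hypothetical abbreviated ids
    server.receive(sha)
```

Anyone who fetches the tags can then say "r2" or the sha1 and mean the
same commit; the numbers are only as global as the one server that
hands them out.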

Sean
-

From: Jakub Narebski
Date: Wednesday, October 18, 2006 - 6:10 am

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 9:07 am

I really don't think that's even true.

Most projects do tend to have a star-like setup, but I think that's 
largely due to historical tools, not mental models. 

For example, I used CVS professionally for too long a few years ago, and 
the thing I _really_ hated was exactly how it forced people who were 
working on "experimental stuff" to be so tightly organized around the 
central repository (and how they had to do things that were visible and 
annoying to the mainline).

And I think that's where the "star-like" situation breaks down: when you 
have a group of people who go off to do something experimental. Suddenly 
the "mainline" in that case isn't the central and most important 
repository any more, and instead you really have another second (and 
third, fourth etc) "centerpoint" that another group works around.

Now, what does that mean? It means that whenever you look at a big project 
from the outside, you tend to see a star-like thing: there's the "big 
common thing", and you won't even be _seeing_ the off-shoots, because they 
tend to be used by developers to try out new ideas etc. So it looks like a 
star, but it really isn't, and shouldn't be.

An SCM should support the _developers_, not the users. The users don't 
need an SCM, they just need a place to fetch the "standard" thing 
(preferably with a vendor that supports them or at least makes them feel 
comfy). But an SCM really should support the off-shoots, because that's 
where the exciting stuff happens.

Btw, this is also why distribution is so fundamentally important:

Most of the off-shoots tend to be failures, but that is as it should be. 
Again, this is where SVN and CVS and other centralized models fail 
_miserably_. Because branches are in a centralized repository, the cost of 
failure is visible to all, and thus people don't like creating branches 
for things that don't look "obviously viable" to the people around the 
central repository.

In contrast, in a truly distributed environment, a ...
From: Carl Worth
Date: Wednesday, October 18, 2006 - 8:38 am

Wow. Thanks for elucidating---again I was making some incorrect
assumptions about the system, so your answer was surprising and
appreciated.

So, am I correct in my understanding now that it's impossible for two
users to establish identical code history on both sides through merge?
If the two kept merging back and forth the history would pick up a new
commit each time even though there were no code changes. Right?

That's a startling property. I'm surprised to learn that the
generally-used mechanism for getting new changes doesn't have a mode
where it says "you're already up to date---doing nothing".

I do understand that there's a separate "pull" that does allow for
correct synchronization of a local repository with a remote
repository, and it does have the "up to date---doing nothing"
behavior. But as you already said, it's often avoided specifically
because it destroys locally-created revision numbers.

Another way of describing bzr's "pull" is that it establishes a
master-slave relationship between the remote and local repository,
(his numbers are more important than mine, so I'll throw mine away).
I think Linus already provided a good argument in this thread about
why that kind of asymmetry is bad for software projects and why tools
should not provide it.

So there are some aspects of the bzr design that rob from its ability
to function as a distributed version control system. It really does
bias itself toward centralization, (the so called "star topology" as
opposed to something "fully" distributed).

And by the way, some people seem to have the opinion that there's
something unique about the way the linux kernel is developed that
allows it to benefit from a fully distributed system. The assumption
seems to be that projects with a central tree won't benefit the same
way, and don't really need the full set of features of a distributed
system. That's not true in my experience.

With cairo, for example, we had been using cvs. Obviously, it imposes
a centralized ...
From: Matthew D. Fuller
Date: Thursday, October 19, 2006 - 2:10 am

On Wed, Oct 18, 2006 at 08:38:24AM -0700 I heard the voice of

I think this has the causality backward.  It's avoided because it
changes the ancestry of the branch in question, by rearranging the
left parents; this ties into Linus' assertion that all parents ought
to be treated equally, which I'm beginning to think is the base
lynchpin of this whole dissension.


Without a differentiation of the parents, there's no such creature as
a "mainline" on a branch, so it's hard to find anything to base revnos
on from the get-go; the whole discussion becomes meaningless and
incomprehensible then.

With the differentiation, numbering along the leftmost 'mainline'
makes sense, and fits the way people tend to work.  "I did this, then
I did this, then I merged in Joe's stuff, then I did this", and the
numbering follows along that.  And as long as it's the same branch,
those revnos will always be the same; I can't go back and add
something in between my first and second commits.  THAT'S where revnos
are useful; referring to a point on a given branch.


Certainly, they're of no (or extremely limited) use when referring to
_different_ branches.  And when you change the arrangement of parents
on a branch, you create a different branch.  That's why bzr (the
project, not the program) tends toward trunks that are merged into,
rather than ephemeral trunks that are merged from and then replaced
with the new trunk, and has its UI optimized by default for that case;
because the ordering of the parents IS considered important and to be
preserved.  Ancestry changes aren't avoided because it would screw up
the revnos; the revnos don't get screwed up because the ancestry
changes are avoided for their OWN sake, and it's BECAUSE of that
pre-existing tendency that the revnos could come into being in the
first place.


If you need to refer to a specific revision in a vacuum, a revno is
the *WRONG* tool for the job.  Revnos exist to refer to points along a
branch.  And in cases where there's a meaningful ...
From: Andreas Ericsson
Date: Thursday, October 19, 2006 - 4:15 am

You, and others, keep saying "leftmost". What on earth does left or 
right have to do with anything? Or rather, how do you determine which 

So long as the given branch is, in git-speak, "master"? I think I'm 
starting to see how this would work, but I still fail to see how you can 
then come up with revnos such as 2343.1.14.7.19, since the only ones 
that seem to actually make any sense are the ones that track the 
strictly linear development.

In git, this can be accomplished by auto-tagging each update of any 
branch with a tag named numerically and incrementally, although no-one 
really bothers with it.

Let's say you have the following graph, where A is the root commit, B 
introduces the base for a couple of new features that three separate 
coders start to work on in their own repositories. The feature started 
on in D is logically coded as a two-stage change. F fixes a bug 
introduced in D. I is the result of an octopus merge of all three 
branches, where the three features are implemented and all bugs are 
fixed (this is btw by far the most common pattern we have in our repos 
here at work).

   A
   |
   B
  /|\
C |  D
| |  |\
| |  E F
| |  |/
| |  G
| H /
  \|/
   I

Now a couple of questions arise.
- How do I get to C, D, E, F, G and H?
- When these get merged, which one will be considered the "left" parent, 




I'm sure it's supported. The question is whether or not bazaar makes it 
easy for those developers to exchange valuable information (revids, 
since their revnos will be mixed up) so they can communicate detailed 
info about "commit X introduced a bug in foo_diddle(). I fixed it in 
commit Y, so if you merge it we can release". If revids are always 
printed anyways, I see even less need for revnos. If it's hard to get 
the revids I wouldn't consider the truly distributed workflow supported 
any more than I consider CVS file rename support 
From: Matthieu Moy
Date: Thursday, October 19, 2006 - 5:04 am

Not sure it's the same in git, but in bzr, a new revision is always
created by a commit (it can be "fetched" by other commands though). If
you "merge", then you have to commit after.

What people call "leftmost ancestor" is the revision which used to be
the tip at the time you committed. For example, if you do "bzr diff;
bzr commit" the diff shown before is the same as the one got with
"bzr diff -r last:1" right after the commit.

I believe this doesn't make a difference for merge algorithms, but in
the UI, it's here when you say, e.g.:

bzr diff -r last:12..before:revid:foo@bar-auents987aue

(once in "last:", and once in "before:")

-- 
Matthieu
-

From: Petr Baudis
Date: Thursday, October 19, 2006 - 5:33 am

Dear diary, on Thu, Oct 19, 2006 at 02:04:14PM CEST, I got a letter

The lack of parents ordering in Git is directly connected with
fast-forwarding.

Consider

 repo1   repo2

   a       a
  /       /
 b       c

Now repo2 merges with repo1:

 repo1   repo2

   a       a
  /       / \
 b       c   b
          \ /
           m

repo1 tip ('b') is not ancestor of repo2 tip ('c') so a three-way merge
is done and a new 'm' merge commit is created.

And now repo1 merges with repo2:

 repo1   repo2

   a       a
  / \     / \
 c   b   c   b
  \ /     \ /
   m       m

Because previous repo1 tip ('b') was ancestor of repo2 tip ('m'), a
fast-forward happened and repo1 tip simply moved to 'm'. But this
"flipped" the development from repo1 POV - you cannot assume anymore
that the first ("leftmost") parent is special.
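
The scenario above can be replayed with a small Python model. This is
only a sketch: the merge() helper and the shared "commits" table are
hypothetical, and parents are kept as an ordered list purely to show
which one was recorded first.

```python
def ancestors(commits, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c])
    return seen


def merge(commits, mine, theirs, merge_name):
    """Return the new tip after merging 'theirs' into 'mine'."""
    if mine in ancestors(commits, theirs):
        return theirs                     # fast-forward: just move the tip
    if theirs in ancestors(commits, mine):
        return mine                       # already up to date
    commits[merge_name] = [mine, theirs]  # real merge: 'mine' is recorded first
    return merge_name


# Commit id -> list of parents, in recorded order.
commits = {"a": [], "b": ["a"], "c": ["a"]}
repo1, repo2 = "b", "c"

repo2 = merge(commits, repo2, repo1, "m")  # repo2 merges with repo1
repo1 = merge(commits, repo1, repo2, "m")  # repo1 merges with repo2
```

Both tips end up at 'm', whose first recorded parent is 'c'. From
repo1's side the previous tip 'b' is now the second parent, which is
the "flip" described above.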

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Matthieu Moy
Date: Thursday, October 19, 2006 - 6:44 am

Yes, bzr has similar thing too. AIUI, the difference is that git does
it automatically, while bzr has two commands in its UI, "merge" and
"pull".

In your case, the "leftmost ancestor" of m is b, because at the time
it was created, it was committed from b.

One problem with that approach is that from revision m and looking
backward in history (say, running "bzr log"), you have two ways to go
backward:

1) Take the history of _your_ commits, and your pull till the point
   where you've branched.

2) Follow the history taking the leftmost ancestor at each step.

In bzr, the notion of "branch" corresponds to a succession of
revisions, which are explicitly stored in a file (ls
.bzr/branch/revision-history), which is what commands like "log"
follow, and what is used for revision numbering. And this succession of
revisions must obey (at most) one of the above. In the past, it was 1),
which means that "pull" (i.e. fast-forward) was only adding revisions
to a branch. In your scenario, repo1 would get a revision history of
"a c m" while repo2 would have had "a b m" with the same tip.

Today, the revision history follows leftmost ancestor. One good
property of this is that revision history is unique for a given
revision. But the terrible drawback is that "pull" and "push" do not
/add/ revisions to your revision history, they rewrite the target one
with the source one. That means I can have

$ bzr log --line
1: some upstream stuff
2: started my work
3: continued my work

# upstream merges.

$ bzr pull
$ bzr log --line
1: some upstream stuff
2: some other upstream stuff ...
3: ... commited while I was working
4: merged from Matthieu this terrible feature
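
A rough Python model of why the numbers change, assuming revnos are
simply positions along the leftmost-parent chain. The revision names
and the mainline()/revnos() helpers are invented for this example and
are not bzr's actual implementation.

```python
def mainline(parents, tip):
    """Follow the first (leftmost) parent from tip back to the root."""
    line = []
    while tip is not None:
        line.append(tip)
        first = parents[tip]
        tip = first[0] if first else None
    return list(reversed(line))  # oldest first, so index + 1 is the revno


def revnos(parents, tip):
    return {i + 1: rev for i, rev in enumerate(mainline(parents, tip))}


# Revision -> ordered parents (leftmost first); names are hypothetical.
parents = {
    "upstream-1": [],
    "mine-1": ["upstream-1"],
    "mine-2": ["mine-1"],
    "upstream-2": ["upstream-1"],
    "upstream-3": ["upstream-2"],
    "upstream-merge": ["upstream-3", "mine-2"],  # upstream merged my work
}

before_pull = revnos(parents, "mine-2")         # my branch before the pull
after_pull = revnos(parents, "upstream-merge")  # same branch after the pull
```

Before the pull, revno 2 is "mine-1"; afterwards revno 2 is
"upstream-2", and my own revisions are no longer on the numbered
mainline at all.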

-- 
Matthieu -- definitely curious to give a real try to git ;-)
-

From: Carl Worth
Date: Thursday, October 19, 2006 - 9:03 am

Yes. We're identifying the core underlying technical difference behind
the recent discussion. Namely bzr treats one parent as special, (the
parent that was the branch tip previously). And this special treatment
eliminates the ability to fast-forward, adds merge commits that
wouldn't exist with fast forwarding, and is able to make its revision

There's a bit more to it than that though. The git command named
"pull" will perform a fast-forward if possible, but will create a
merge commit if necessary. For example:

	a       a                      a
	| pulls | and fast-forwards to |
	b       b                      b
	        |                      |
	        c                      c

whereas:

	a       a                       a
	| pulls | and creates a merge  / \
	b       c                     b   c
                                       \ /
                                        m

So I'm curious. What does bzr pull do in the case of divergence like

It should be mentioned that git can, (annoyingly not by default), save
a file detailing the history of a branch, (time and revision ID for
every time the branch tip moved). This is the "reflog" support and
provides the same information that bzr is encoding in its "leftmost
ancestor" branches.

Importantly, though, git's reflog is entirely local and is not

Uhm, don't you really have to follow both? And the only ambiguity is

OK. With git the two reflogs on the two machines would also have "a c
m" and "a b m". But is this the only kind of log that exists? If I
had code history as above and wanted to ask questions about what led
to commit m, then I would want to know about both b and c which
contribute to it.

And that's what "git log" provides. It lists all the commits that are
reachable from a given commit by following parent links. Surely bzr
has a way to view the complete history that way?
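
As a sketch of that traversal in Python (the log() function here is a
plain breadth-first walk over a hypothetical parent table; real
"git log" also sorts by commit date, which this ignores):

```python
from collections import deque


def log(parents, tip):
    """List all commits reachable from tip by following every parent link."""
    seen, order, queue = {tip}, [], deque([tip])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for p in parents[commit]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return order


# The merged history from the example: m has parents b and c.
parents = {"a": [], "b": ["a"], "c": ["a"], "m": ["b", "c"]}
history = log(parents, "m")
```

Both b and c show up in the result, regardless of which one happened
to be the branch tip when m was created.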

Meanwhile, I suggest that there really is no significance to which
parent of a commit used to have the branch head pointing at ...
From: Matthieu Moy
Date: Thursday, October 19, 2006 - 9:38 am

No.

bzr could trivially do fast-forward too. It's an explicit design

They don't exist either with "pull".

The difference between bzr and git is smaller than you think on this

The bzr command "pull" will do a fast-forward if possible, but will
refuse to continue and ask you to create the merge commit with other


Here, bzr will refuse to pull. It will say "branches have diverged"
and tell you to use merge.

Then, you'll do

$ bzr merge

# optionally "bzr status"

$ bzr commit -m "merged such or such thing"
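
The behaviour described above, pull refusing on divergence and leaving
the merge commit to the user, could be modelled roughly like this in
Python. All names (DivergedBranches, pull, merge_and_commit) are
hypothetical and only mirror the description, not bzr's code.

```python
class DivergedBranches(Exception):
    """Raised when neither tip is an ancestor of the other."""


def is_ancestor(parents, old, new):
    """True if 'old' is reachable from 'new' by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def pull(parents, local_tip, remote_tip):
    if is_ancestor(parents, local_tip, remote_tip):
        return remote_tip  # fast-forward
    if is_ancestor(parents, remote_tip, local_tip):
        return local_tip   # nothing new to fetch
    raise DivergedBranches("branches have diverged; use merge")


def merge_and_commit(parents, local_tip, remote_tip, new_id):
    # The local tip is recorded as the first (leftmost) parent.
    parents[new_id] = [local_tip, remote_tip]
    return new_id


parents = {"a": [], "b": ["a"], "c": ["a"]}
try:
    tip = pull(parents, "c", "b")
except DivergedBranches:
    tip = merge_and_commit(parents, "c", "b", "m")
```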


So, "git pull" seems roughly equivalent to something like


Not yet. The "numbers will be changed" is if b pulls, right after.


Then, one other difference is in the UI. bzr shows you commits in a
kind of hierarchical manner, like (fictitious example, that's not the real
exact format).

$ bzr log
commiter: upstream@maintainer.com
message:
  merged the work on a feature
  ------
  commiter: contributor@site.com
  message:
    prepared for feature X
  ------
  commiter: contributor@site.com
  message:
    implemented feature X
  ------
  commiter: contributor@site.com
  message:
    added testcase for feature X
------
commiter: upstream@maintainer.com
message:
  something else

No big difference in the model either, but it probably reveals a
different vision of what "history" means.

-- 
Matthieu
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 4:50 am

I have lost somewhere among many emails in this thread the email I 
wanted to reply to, the one mentioning for the first time the lack of 
parents ordering in GIT, but this one should do.



There are exactly _two_ places where Git treats first parent specially 
(correct me if I'm wrong).

First, <commit-ish>^ is shortcut for <commit-ish>^1, i.e. for first 
parent of commit. <commit-ish>~<n> is shortcut for <commit-ish>^^...^ 
(n-times '^'), which means that <commit-ish>~<n> is the n-th ancestor in 
the 1st-parent lineage of <commit-ish>. But you can always use names
like for example next~12^2^^2~2.
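
To make the ^ and ~ rules concrete, here is a small Python resolver
over a hypothetical parent table. The resolve() function, the "refs"
mapping and the one-letter commit ids are illustrative assumptions; it
handles only ^, ^<n> and ~<n>, nothing else from Git's revision syntax.

```python
import re


def resolve(parents, refs, expr):
    """Resolve expressions such as 'master~1^2^' against a parent table."""
    m = re.match(r"[^~^]+", expr)
    commit = refs.get(m.group(0), m.group(0))
    for op, num in re.findall(r"([~^])(\d*)", expr[m.end():]):
        n = int(num) if num else 1
        if n == 0:
            continue                         # ^0 and ~0 name the commit itself
        if op == "^":
            commit = parents[commit][n - 1]  # n-th parent of this commit
        else:
            for _ in range(n):               # n steps along first parents
                commit = parents[commit][0]
    return commit


# m is a merge of b (first parent) and c (second parent); n sits on top.
parents = {"a": [], "b": ["a"], "c": ["a"], "m": ["b", "c"], "n": ["m"]}
refs = {"master": "n"}
```

With this table, "master~2" walks first parents only and lands on b,
while "master^^2" steps to m and then takes its second parent, c.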

Second, git-diff with only one <commit-ish> generates diff to first
parent. But you can always use '-c' or '-cc' combined diff format
or '-m' with default diff format to compare to _all_ parents.
-- 
Jakub Narebski
Poland
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 6:26 am

I stand corrected: git-diff refuses to show anything if provided
with only one commit, and commit has more than one parent. So it
does not treat first parent specially.
-- 
Jakub Narebski
Poland
-

From: Karl
Date: Thursday, October 19, 2006 - 4:27 am

Yes, it seems you have found the needle. :-) In git, history is a DAG;
a commit has a _set_ of parents, so by definition they are not
ordered. This has a number of consequences. For example, you can't
really answer the question "Which branch was this commit on?". All you
can say is that "This commit is reachable from (and therefore part of)
branches X, Y, and Z."

In all other SCMs I have seen, a "branch" is conceptually an ordered
series of commits (some of which may be merges). In git, a "branch" is
a pointer to a commit, period. The commit knows its set of parents, so
all its history is there, but there is fundamentally no way to tell
which branch a commit was "on" when it was created.
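
A short Python sketch of "branch as pointer", under the assumption that
a branch is nothing more than a name mapped to a commit id. The helper
names are made up; the result is roughly what "git branch --contains"
reports in current Git.

```python
def reachable(parents, tip):
    """All commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def branches_containing(parents, branches, commit):
    """Names of all branches whose tip can reach the given commit."""
    return sorted(
        name for name, tip in branches.items()
        if commit in reachable(parents, tip)
    )


parents = {"a": [], "b": ["a"], "c": ["a"], "m": ["b", "c"]}
branches = {"master": "m", "topic": "c", "old": "b"}  # name -> tip commit
```

Asking about c yields both "master" and "topic"; nothing in the data
says which of the two c was "on" when it was created.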

This is an important point; it means there is no concept of "my" or
"your" branch. Every participant is adding commits to the same DAG,
and may at any point decide to share her additions with someone else,
or keep them private forever. And because "branches" don't really
exist, every commit really is created equal.

Really, every commit. Not even the initial commit of a project is
special -- it's just a commit with an empty parent set. And, it's
perfectly possible to make a (merge) commit whose parents belong to
previously disconnected parts of the DAG. This of course means that
it's not even possible to differentiate commits based on which project
they're part of, since one can create a commit whose parents belong to
different projects. All commits are _really_ born equal! There's just
one great DAG of all git commits that could possibly exist. (This has
been done in git's own history; the graphical viewer gitk was
originally a separate project, with its own initial commit, but that
initial commit is now reachable from all commits currently being made
to git -- that is, it has been merged.)

This structure of things may seem complex, since it's different, but
mathematically it's quite simple, and that's what counts in the end if
you want to do nontrivial things.

-- 
Karl Hasselström
From: Petr Baudis
Date: Thursday, October 19, 2006 - 4:46 am

Dear diary, on Thu, Oct 19, 2006 at 01:27:59PM CEST, I got a letter
where Karl Hasselström
From: Matthew D. Fuller
Date: Thursday, October 19, 2006 - 9:01 am

On Thu, Oct 19, 2006 at 01:46:39PM +0200 I heard the voice of

By default, merge will refuse to do its thing if there are uncommitted
changes in the working tree, whether those changes are something
you've done, or the pending results of a previous merge.  A '--force'
arg to merge will make it go forward though, so yes, you can merge
multiple other branches in one merge if you want to.

Actually, I can kill 2 birds here.  Quick little bictopus merge:

% bzr log --show-ids
------------------------------------------------------------
revno: 2
revision-id: fullermd@over-yonder.net-20061019151856-c3b406b8bcdfb537
parent: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
parent: fullermd@over-yonder.net-20061019151800-2fe41e4949f5e237
parent: fullermd@over-yonder.net-20061019151807-3d7047e387edcad9
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: a
timestamp: Thu 2006-10-19 10:18:56 -0500
message:
  merge
    ------------------------------------------------------------
    revno: 1.2.1
    merged: fullermd@over-yonder.net-20061019151800-2fe41e4949f5e237
    parent: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
    committer: Matthew Fuller <fullermd@over-yonder.net>
    branch nick: b
    timestamp: Thu 2006-10-19 10:18:00 -0500
    message:
      bar
    ------------------------------------------------------------
    revno: 1.1.1
    merged: fullermd@over-yonder.net-20061019151807-3d7047e387edcad9
    parent: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
    committer: Matthew Fuller <fullermd@over-yonder.net>
    branch nick: c
    timestamp: Thu 2006-10-19 10:18:07 -0500
    message:
      baz
------------------------------------------------------------
revno: 1
revision-id: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: a
timestamp: Thu 2006-10-19 10:14:37 -0500
message:
  ...
From: Matthew D. Fuller
Date: Thursday, October 19, 2006 - 10:06 am

On Thu, Oct 19, 2006 at 11:01:03AM -0500 I heard the voice of

Let me elaborate a little on this.



for the previously discussed merge, basically duplicating
'fast-forward' behavior.  It doesn't currently, but it could just as
well without disturbing the attributes it gains from assigning meaning
to the left-most parent.  The choice to create E is the result of an
independent decision from the choice to treat the left path as
special.


What the leftmost discussion impacts is the case of 

    a-.
    |\ \
    | b c
    |/ /
    D-'

vs

    a-.-.
     \ \ \
      b c |
     / / /
    D-'-'

Now, the branches are distinct to bzr, but they're not different.  If
you try to merge one from the other, merge will quite rightly tell you
there's nothing to do, since you both have all the same revs.  git
doesn't recognize the distinction at all, of course.  The difference
is mostly cosmetic.  But, it's a cosmetic difference that bzr devs
(and users, I venture) find _useful_, which is why it's fought for.
And everything else seems to follow from that.

If you don't think the distinction is meaningful or useful, you can
ignore it, and the tool should work just fine.  The main place the
distinction would show up is in the cosmetics of how "log" looks (and
probably similarly in any tool that graphically describes ancestry),
and a custom log output formatter could probably be very easily
written to obviate even that.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 8:35 pm

Right. You have to do it your way, because of the "simple revision 
numbers".

Which gets us back to where we started: "simple" is in the eye of the 
beholder. I personally think that git revision naming is a lot simpler, 
exactly because it doesn't impose arbitrary rules on users.

For example, what happens is that:
 - you like the simple revision numbers
 - that in turn means that you can never allow a mainline-merge to be done 
   by anybody else than the main maintainer
 - that in turn means that the whole situation is no longer distributed, 
   it's more like a "disconnected access to a central repository"

The "main trunk matters" mentality (which has deep roots in CVS - don't 
get me wrong, I don't think you're the first one to do this) is 
fundamentally antithetical to a truly distributed system, because it 
basically assumes that some maintainer is "more important" than others. 

That special maintainer is the maintainer whose merge-trunk is followed, 
and whose revision numbers don't change when they are merged back. 

That may even be _true_ in many cases. But please do realize that it's a 
real issue, and that it has real impact - it does two things:

 - it impacts the technology and workflow directly itself: "pull" and 
   "merge" are different (a central maintainer would tend to do a "merge", 
   and one more in the outskirts would tend to do more of a "pull", 
   expecting his work to then be merged back to the "trunk" at some later 
   point)

 - it will result in _psychological_ damage, in the sense that there's 
   always one group that is the "trunk" group, and while you can pass the 
   baton around (like the perl people do), it's always clear who sits 
   centrally.

Maybe this is fine. It's certainly how most projects tend to work. 

I'll just point out that one of my design goals for git was to make every 
single repository 100% equal. That means that there MUST NOT be a "trunk", 
or a special line of development. There is no "vendor branch". ...
From: Aaron Bentley
Date: Wednesday, October 18, 2006 - 8:10 pm


That's not true of bzr development.  The "main maintainer" that runs the
bzr.dev is an email bot.  It's not an integrator-- its work is purely
mechanical.  It can't resolve merge conflicts.

Most of the merge work is done in integration branches run by the core
developers.  Although Martin is our project leader, lays out ground
rules, and makes design decisions, he doesn't have to be involved in any

Linus, if you got hit by a bus, it would still be a shock, and it would
still take time for the Linux world to recover.  Your insights and
talent, both technical and social, make you the most important kernel
developer.  And it stays that way because you deserve it.  Projects with
good leadership don't fork, or if they do, the fork withers and dies
pretty quickly.

It is fine to say all branches are equal from a technical perspective.
From a social perspective, it's just not true.

The scale of Bazaar development is much smaller than the scale of kernel
development, so it doesn't make sense to maintain long-term divergent
branches like the mm tree.  We do occasionally have long-lived feature



As I mentioned earlier, there are four people who each run their own

I think you're implying that on a technical level, bzr doesn't support
this.  But it does.  Every published repository has unique identifiers
for every revision on its mainline, and it's exceedingly uncommon for
these to change.  There are special procedures to maintain bzr.dev, but
there's nothing technically unique about it.  People develop against
bzr.dev rather than my integration branch, because they have
non-technical reasons for wanting their changes to be merged into

On an actively-developed bzr branch, the first parent *is* special:
- it's a revision that you committed
- the diff between a revision and its first parent is the same as the

I don't think your analysis holds together completely, because all
actively-maintained branches have very stable ...
From: Carl Worth
Date: Wednesday, October 18, 2006 - 10:21 pm

That's actually a very important insight, but supporting the wrong
conclusion.

In a healthy situation, the only thing that makes a branch special are
social issues, such as you describe. That's how it should be.

But think about your favorite example of an unhealthy social situation
around a software project and a big, nasty fork. Every example I can
think of involves some technical distinction that makes one branch
more special than another.

Now, those situations also involve social problems, and those are even
more significant. But the technical blessing of one branch does not
help. And I think it contributes to the social problems in many cases.

So, I think the technical thing that is distributed version control is
an extremely important thing for us to use to help maintain healthy
social software projects. Reducing the technical hurdle of a fork, (to
where continual forking is actually a totally expected part of the
process), is a very healthy thing.

Now, both bzr and git are distributed systems, and either one will
help a great deal in the respects I'm talking about compared to
something like cvs.

As far as the revision numbers, my impression is that the numbers
would be confusing or worthless if I were to use bzr the way I'm

Which just says to me that the bzr developers really are sticking to a
centralized model. That's fine, but it does have impacts, and the tool

Every argument you make for the number change being uncommon just
strengthens the argument that it will be all that more
confusing/frustrating when the numbers do change.

In cairo, for example, we've made a habit of including a revision
identifier in our bug tracking system for every commit that resolves a
bug. I like having the assurance that those numbers will survive
forever. And it doesn't matter if the repository moves, or the project
is forked, or anything else. Those numbers cannot change.

I understand that bzr also has unique identifiers, but it sounds like
the tools try to hide ...
From: Martin Pool
Date: Wednesday, October 18, 2006 - 10:56 pm

There is a mix of 

 - Just giving the overall tarball version number, which is most 
   meaningful to users (and not related to bzr versions)

 - Giving a mainline revision number, which will never revert because we
   never pull (fast-forward) that branch.  That has the substantial
   (imo) benefit that you can immediately compare these numbers by eye,
   and they are easy to quote.

 - Giving a unique id, which is obviously most definitive and
   appropriate if you're talking about something which is not 
   on the mainline or a well known branch.  The launchpad.net 
   bug tracker links branches to bugs and does this through 
   revision ids.

-- 
Martin
-

From: Aaron Bentley
Date: Thursday, October 19, 2006 - 7:58 am


I'm not as familiar with those details.  The one fork that I know a lot
about, when Baz (the old Bazaar architecture) forked off from Arch,
showed me that for each developer branch, one branch must be special.

This is just because it is hard to maintain a branch that applies
cleanly to two diverging codebases.  So each developer must develop
against the fork that they want to merge their code into.  If they want
their code to be applied to the other fork, someone must port it.

So I really do feel that special branches are inescapable.

With bzr, you have the freedom to choose which branch you consider
special, and change your mind at any time.  There are no technical

They would remain stable if you only used pull to update your origin

I don't see why you're reaching that conclusion.  I'd like to understand
that better, because Linus seems to be concluding the same thing, and it

That doesn't follow.  Just because something is arguably true doesn't
make it bad.  And in this case, I'm not arguing that it's true, I'm

We do it the other way around: we put a bug number in the commit
message.  And I personally have been developing a bugtracker that is
distributed in the same way bzr is; it stores bug data in the source

Yes, we put revnos in our bug trackers.  No, we can't prove that they
will always be valid.  But there are significant disincentives to
changing them, so I am quite comfortable assuming they will not change.
And the older a revno gets, the less likely it is to change.

On the other hand, I think your revision identifiers are not as
permanent as you think.

In the first place, it seems fairly common in the Git community to
rebase.  This process throws away old revisions and creates new
revisions that are morally equivalent[1].  I don't know whether Git
fetches unreferenced revisions, but bzr's policy is to fetch only
revisions referenced in the ancestry DAG of the branch.

In the second place, one must ...
From: Carl Worth
Date: Thursday, October 19, 2006 - 9:59 am

First, I want to point out that I think we're having a delightfully
enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to
demonstrate my thinking here. The first is far-fetched but perhaps
easier to understand the implications. But the second is the real,
everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We
need not suppose any acrimony, just an amiable split as two subsets of
the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply
equally well to either team's codebase, (obviously, since they are
identical). But as the projects diverge they each start publishing
revision numbers with respect to their own repositories in their own
bug trackers, etc. Obviously, each project has its own "mainline" so
these new revision numbers are only unique within each project and not
between the two.

Time passes...

Finally the two teams (who had remained good friends after the
breakup) find a unifying theory that will let them work on a single
tool that will meet the needs of both user bases. So they want to
merge their code together.

After the merge, there can be only one mainline, so one team or the
other will have to concede to give up the numbers they had generated
and published during the fork. That is, the numbers will not be usable
within the new, merged repository.

Everyday
--------
Now, the above scenario is just silly. It's not likely to ever happen,
so it's really not worth considering as a motivating case.

But, what does (and should) happen everyday is exactly the same. So
here's a realistic situation that is worth considering:

An individual takes the bzr codebase and starts working on it. It's
experimental stuff, so it's not pushed back into the central
repository yet. But our coder isn't a total recluse, so his friends
help him with the code ...
From: Aaron Bentley
Date: Thursday, October 19, 2006 - 4:01 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



I don't think this is true.  The abandoned mainline does not need to be
destroyed.  It can be kept at the same location that it always was, with
the numbers that it always had.  So the number + URL combo stays
meaningful.  Additionally, the new mainline can keep a mirror of the
abandoned mainline in its repository, because there are virtually no

They certainly can.

The coder says "I've put up a branch at http://example.com/bzr/feature.
In revision 5, I started work on feature A.  I finished work in
revision 6.  But then I had to fix a related bug in revision 7."

As long as that coder is active, they'll keep their repository at the
same location.  And because branches are cheap (even cheaper than
delta-compressed revisions), there's no reason to delete old branches.

This is true, but his code is likely to all land in the mainline at
once.  Since his own revnos are more fine-grained, he's not likely to want


I felt that you were mischaracterizing my _statement_ that "it's
exceedingly uncommon for [revnos] to change" as an _argument_ "it's
exceedingly uncommon for [revnos] to change".  The reality is that we
keep saying revnos don't change because git users keep saying "but what

If you're interested, it's called "Bugs Everywhere" and it's available here:
http://panoramicfeedback.com/opensource/


So actually, not all branches are treated equally by Git users.  Public
branches are treated as append-only, but private branches are treated as


Same here.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFOAPm0F+nu1YWqI0RAhkdAJ9InxuEjbToGQU2AOJmfZw124Lb2wCeMmDC
9w08eZbmL19FfVQmtpPcYkQ=
=AmGo
-----END PGP SIGNATURE-----
-

From: Carl Worth
Date: Thursday, October 19, 2006 - 4:42 pm

Sure that's possible, but it gets rather unwieldy the more
repositories you have involved. I've been arguing that bzr really does
encourage centralized, not distributed development, and you were having
trouble seeing how I came to that conclusion. Do you see how "maintain
an independent URL namespace for every distributed branch" doesn't

And this part I don't understand. I can understand the mainline
storing the revisions, but I don't understand how it could make them
accessible by the published revision numbers of the "abandoned"


...which is what you just said there yourself.

On the other hand, git names really do live forever, regardless of
where the code is hosted or how it moves around. When I'm talking
about historical stability, I'm talking about being able to publish
numbers that live forever.

It sounds like bzr has numbers like this inside it, (but not nearly as
simple as the ones that git has), but that users aren't in the
practice of communicating with them. Instead, users communicate with
the unstable numbers. And that's a shame from an historical

What I'd like to be able to do, is advertise a temporary repository,
and while using it, publish names for revisions that will still be
valid when the code gets pushed out to the mainline. That is
supporting distributed development, and everything I'm hearing says

OK.

The original claim that sparked the discussion was that bzr has a
"simple namespace" while git does not. We've been talking for quite a
while here, and I still don't fully understand how these numbers are
generated or what I can expect to happen to the numbers associated
with a given revision as that revision moves from one repository to
another. It's really not a simple scheme.

Meanwhile, I have been arguing that the "simple" revision numbers that
bzr advertises have restrictions on their utility, (they can only be
used with reference to a specific repository, or with reference to
another that treats it as canonical). I _think_ I understand ...
From: Aaron Bentley
Date: Thursday, October 19, 2006 - 6:06 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


I understand your argument now.  It's nothing to do with numbers per se,

I meant that the active branch and a mirror of the abandoned branch
could be stored in the same repository, for ease of access.

Bazaar encourages you to stick lots and lots of branches in your
repository.  They don't even have to be related.  For example, my repo

I can see where you're coming from, but to me, the trade-off seems
worthwhile.  Because historical data gets less and less valuable the
older it gets.  By the time the URL for a branch goes dark, there's

When you create a new branch from scratch, the number starts at zero.
If you copy a branch, you copy its number, too.

Every time you commit, the number is incremented.  If you pull, your
numbers are adjusted to be identical to those of the branch you pulled from.
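
The four rules above can be sketched as a toy model. This is only an illustration under stated assumptions, not bzr's implementation: a branch is modelled as an ordered list of revision ids, the revno is a position in that list, and the `Branch` class, its method names, and the revid format are all hypothetical.

```python
import itertools

# Toy model of the revno rules described above; NOT bzr's real implementation.
# A branch is an ordered list of revision ids (its mainline); the revno of a
# revision is its 1-based position in that list.

_counter = itertools.count(1)


class Branch:
    def __init__(self, mainline=None):
        # A branch created from scratch has no revisions, so its revno is 0.
        self.mainline = list(mainline or [])

    def revno(self):
        return len(self.mainline)

    def copy(self):
        # Copying a branch copies its history, and therefore its number.
        return Branch(self.mainline)

    def commit(self, author):
        # Each commit appends one revision, so the revno goes up by one.
        revid = f"{author}-{next(_counter)}"  # hypothetical revid format
        self.mainline.append(revid)
        return revid

    def pull(self, other):
        # Pull makes the numbering identical to the source branch.
        # Assumption: divergence checks are left out of this sketch.
        self.mainline = list(other.mainline)

    def revno_of(self, revid):
        # None means the revision is not on this branch's mainline.
        if revid in self.mainline:
            return self.mainline.index(revid) + 1
        return None


trunk = Branch()
trunk.commit("alice")
trunk.commit("alice")

feature = trunk.copy()            # same history, same revno as trunk
fix = feature.commit("bob")       # next revno in feature
other = trunk.commit("alice")     # next revno in trunk, a different revision

mirror = Branch()
mirror.pull(trunk)                # mirror now numbers revisions like trunk

print(feature.revno_of(fix), trunk.revno_of(other), trunk.revno_of(fix))
print(mirror.revno_of(other))
```

In this model the same number names two different revisions in `feature` and `trunk`, which is the point Carl raises, while a mirror that only pulls always agrees with its source, which is the point Aaron raises.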


Sure.  It's the "favors centralization" thing that I don't agree with,

In my experience, users who don't understand distributed systems don't


What's nice is being able to see the revno 753 and knowing that "diff -r
752..753" will show the changes it introduced.  Also, checking the revno
on a branch mirror and knowing how out-of-date it is.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFOCEf0F+nu1YWqI0RAhgtAJwK4jkWFjjF2iHJb1VyXqgszsHElACff2U7
olZJiAED80tIS6kgkqFsJps=
=BkRZ
-----END PGP SIGNATURE-----
-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 10:05 pm

I don't know if that is what Carl's problem is, but yes, to somebody from 
the git world, it's totally insane to have the _same_ commit have ten 
different names just depending on which branch it was in.

In git-land, the name of a commit is the same in every branch.

Do you have something like

	gitk --all

in your graphical viewers? That one shows _all_ the branches of a 
repository, and how they relate to each other in git. How do you name your 
commits in such a viewer, since every branch has a _different_ name for 
the same commit?

			Linus
-

From: Lachlan Patrick
Date: Friday, October 20, 2006 - 12:47 am

I've been following the git-vs-bzr discussion, and I'd like to ask a
question (being new to both bzr and git). How does git disambiguate SHA1
hash collisions? I think git has an alternative way to name revisions
(can someone please explain it in more detail, I've seen <ref>~<n>
mentioned only in passing in this thread). It seems to me collisions are
a good argument in favour of having two independent naming schemes, so
that you're not solely relying on hashes being unique.

A strong argument is that a global namespace based on hashes of data is
ideal because the names are generated from the data being named, and
therefore are immutable. Same data => same name for that data, always
and forever, which is desirable when merging named data from many
sources. But the converse isn't true: one name does not necessarily map
to only that data. Have I misunderstood? Is this a problem?
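
As a small illustration of "same data => same name", here is a sketch of how a git blob name is derived from file content, assuming the SHA-1 object format (a short "blob", size, NUL header followed by the raw bytes). The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    # followed by the raw content; the hex digest is the object name.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = git_blob_id(b"hello world\n")
again = git_blob_id(b"hello world\n")
changed = git_blob_id(b"hello world!\n")

print(first == again)    # identical content gives an identical name
print(first == changed)  # different content gives a different name
print(git_blob_id(b""))  # name of the empty blob
```

The name depends only on the bytes, not on the repository, branch, or machine that computed it, which is why the same object gets the same name everywhere.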

Ta,
Loki
-

From: Johannes Schindelin
Date: Friday, October 20, 2006 - 1:38 am

Hi,


It does not. You can fully expect the universe to go down before that 
happens.
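
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes SHA-1 output behaves like a uniformly random 160-bit value and considers accidental collisions only, not deliberate attacks; the object counts are arbitrary illustrative figures.

```python
from fractions import Fraction

HASH_BITS = 160  # size of a SHA-1 digest


def collision_probability_bound(num_objects: int) -> Fraction:
    # Union bound over all pairs of objects:
    #   P(any two share a hash) is at most n*(n-1)/2 divided by 2**bits.
    pairs = num_objects * (num_objects - 1) // 2
    return Fraction(pairs, 2 ** HASH_BITS)


for n in (10**6, 10**9, 10**12):
    bound = float(collision_probability_bound(n))
    print(f"{n:.0e} objects: upper bound {bound:.1e}")
```

Even for repositories far larger than any that exist, the bound stays many orders of magnitude below anything worth worrying about.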

The only reasonable worry is about SHA-1 being broken some time in future, 
i.e. being able to construct a malign version of some source code _which 
has the same hash_. There were plenty of discussions about that; please 
search the mailing list. (The consensus was that those do not matter, 
because an existing object will _never_ be overwritten by a fetch, so you 
would not get that invalid object anyway.)

Hth,
Dscho


-

From: Petr Baudis
Date: Friday, October 20, 2006 - 3:13 am

Hi,

Dear diary, on Fri, Oct 20, 2006 at 10:38:48AM CEST, I got a letter

  well, that's somewhat a bold statement, since when you have a way to
fabricate malicious objects, you probably can socially engineer to have
it distributed to a large portion of repositories if you try hard
enough. Or you hack kernel.org and replace the object. Who knows.

  But the thing is that no one has come any closer to this kind of attack
at all. Currently known attacks are that you can relatively fast (which
doesn't mean "5 minutes"; I think that in case of SHA1 the complexity is
still huge, just smaller than intended, but I may remember wrong; you
can get a MD5 collision of this kind within one minute on a standard
notebook) create a _pair_ of objects sharing the same hash, where both
objects contain a big binary blob. So you would first have to engineer
to have one of those objects accepted officially, then engineer the
malicious one getting in. Generating an object that hashes to a
predetermined value is a much harder problem and AFAIK there's not much
progress in breaking this.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Petr Baudis
Date: Friday, October 20, 2006 - 3:16 am

Dear diary, on Fri, Oct 20, 2006 at 09:47:16AM CEST, I got a letter

This is just a notation that lets you point to revisions relative to a
given id. <id>~<n> means n-th ancestor of the given commit.

-- 
				Petr "Pasky" Baudis
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 2:57 am

If you want pretty name, you tag it. Tags are exchanged during 
fetch/push operation. And you can have pretty names of revisions
From: Matthieu Moy
Date: Friday, October 20, 2006 - 3:02 am

How does git choose which ancestor to use if this revision has more
than one in this case?

-- 
Matthieu
-

From: Andy Whitcroft
Date: Friday, October 20, 2006 - 3:45 am

Well if there is more than one parent, then there is more than one
diff.  For instance this is a merge commit which I asked to 'see'.

This gets shown in the combined diff format, showing the results of the
conflict resolution.

diff --cc this
index fbbafbf,10c8337..43b7af0
--- a/this
+++ b/this
@@@ -1,3 -1,3 +1,4 @@@
  1
+ 2a
 +2b
  3

If you want to know each individual diff in a more 'standard' form you
can ask about the parents specifically.

apw@pinky$ git diff HEAD^1..
diff --git a/this b/this
index fbbafbf..43b7af0 100644
--- a/this
+++ b/this
@@ -1,3 +1,4 @@
 1
+2a
 2b
 3

apw@pinky$ git diff HEAD^2..
diff --git a/bar b/bar
new file mode 100644
index 0000000..8dc5f23
--- /dev/null
+++ b/bar
@@ -0,0 +1 @@
+this that other
diff --git a/this b/this
index 10c8337..43b7af0 100644
--- a/this
+++ b/this
@@ -1,3 +1,4 @@
 1
 2a
+2b
 3
-

From: James Henstridge
Date: Friday, October 20, 2006 - 3:45 am

If a revision has multiple parents, what does it diff against in this
case?  Do you get one diff against each parent revision?

James.
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 5:01 am

If a revision has multiple parents (i.e. it is a merge commit), git-diff
(which is used by git-show) does not show differences (unless you
give two revisions in the git-diff case).

You can either use '-m' option to show differences from all its
parents, or '-c'/'--cc' to show combined diff ('--cc' shows more
compact diff).
-- 
Jakub Narebski
Poland
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 4:00 am

> repository. 
From: Jeff King
Date: Friday, October 20, 2006 - 7:12 am

I was accustomed to doing such things in CVS, but I find the git way
much more pleasant, since I don't have to do any arithmetic:
  diff d8a60^..d8a60
(Yes, I am capable of performing subtraction in my head, but I find that
a "parent-of" operator matches my cognitive model better, especially
when you get into things like d8a60^2~3).

Does bzr have a similar shorthand for mentioning relative commits?

-Peff
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 7:40 am

By the way "diff d8a60" also works (unless d8a60 is merge commit, in

By the way, git has the following extended SHA1 syntax for <commit-ish>
(documented in git-rev-parse(1)):
 * full SHA1 (40-chars hexadecimal string) or abbreviation unique for
   repository
 * symbolic ref name. E.g. 'master' typically means commit object referenced
   by $GIT_DIR/refs/heads/master; 'v1.4.1' means commit object referenced
   [indirectly] by $GIT_DIR/refs/tags/v1.4.1. You can say 'heads/master'
   and 'tags/master' if you have both head (branch) and tag named 'master',
   but don't do that. HEAD means current branch (and is usually default).
 * <ref>@{<date>} or <ref>@{<n>} to specify value of <ref> (usually branch)
   at given point of time, or n changes to ref back. Available only if you
   have reflog for given ref.
 * <commit-ish>^<n> means n-th parent of given revision. <commit-ish>^0
   means commit itself. <commit-ish>^ is a shortcut for <commit-ish>^1.
   <commit-ish>~<n> is shortcut for <commit-ish>^^..^ with n*'^', for
   example rev~3 is equivalent to rev^^^, which in turn is equivalent
   to rev^1^1^1
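
The ^ and ~ rules in the last item can be sketched as a tiny resolver over a toy history. This is an illustration only and not git's parser: the commit names, the `PARENTS` table, and the `resolve` function are all hypothetical, and SHA1 abbreviations, refs, and reflog syntax are left out.

```python
import re

# Hypothetical toy history: commit name -> ordered list of parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "X": ["A"],
    "Y": ["X"],
    "M": ["C", "Y"],  # merge commit: first parent C, second parent Y
}

_NAME = re.compile(r"[^\^~]+")
_STEP = re.compile(r"([\^~])(\d*)")


def resolve(spec: str) -> str:
    """Resolve a commit name followed by any mix of ^n and ~n suffixes."""
    name = _NAME.match(spec)
    if name is None or name.group(0) not in PARENTS:
        raise ValueError(f"unknown commit in {spec!r}")
    commit = name.group(0)
    pos = name.end()
    while pos != len(spec):
        step = _STEP.match(spec, pos)
        if step is None:
            raise ValueError(f"cannot parse {spec!r}")
        op, digits = step.groups()
        n = int(digits) if digits else 1  # a bare ^ or ~ means 1
        if op == "^":
            if n != 0:                    # ^0 is the commit itself
                commit = PARENTS[commit][n - 1]
        else:
            for _ in range(n):            # ~n follows the first parent n times
                commit = PARENTS[commit][0]
        pos = step.end()
    return commit


print(resolve("M^"), resolve("M^2"), resolve("M~3"), resolve("M^2~2"))
```

Note that ^n selects among the parents of one commit (sideways), while ~n walks back n generations along first parents (backwards), so M^2 and M~2 name different commits here.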

Additionally it has following undocumented extended SHA1 syntax to refer
to trees (directories) and blobs (file contents)
 * <revision>:<filename> gives SHA1 of tree or blob at given revision
 * :<stage>:<filename> (I think for blobs only) gives SHA1 for different
   versions of file during unresolved merge conflict.

I'm not enumerating here all the ways to specify part of DAG of history,
except that it includes "A ^B" meaning "all from A", "exclude all from B",
"B..A" meaning "^B A", "A...B" meaning "A B --not $(git merge-base A B)",
and of course "A -- path" meaning "all from A", "limit to changes in path".
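
The range notations above are set operations on "everything reachable from a commit". A minimal sketch over a toy history, with hypothetical commit names and helper functions, and with path limiting left out:

```python
# Hypothetical toy history: commit name -> list of parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # one line of development: A, B, C
    "X": ["A"],
    "Y": ["X"],  # another line forked from A: A, X, Y
}


def ancestors(commit):
    """All commits reachable from `commit`, including the commit itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def two_dot(b, a):
    # "B..A" is "^B A": everything reachable from A, minus everything
    # reachable from B.
    return ancestors(a) - ancestors(b)


def three_dot(a, b):
    # "A...B": commits reachable from exactly one of A and B, i.e. both
    # sides minus their common history.
    return ancestors(a) ^ ancestors(b)


print(sorted(two_dot("C", "Y")))
print(sorted(three_dot("C", "Y")))
```

With this history, "C..Y" is what Y has that C lacks, and "C...Y" is what either side has that the other lacks.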

What about _your_ SMC? ;-)
-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Friday, October 20, 2006 - 7:52 am

Hi,


I could be wrong, but I have the impression (even after actually testing 
it) that "git diff d8a60" is equivalent to "git diff d8a60..HEAD", _not_ 
"git diff d8a60^..d8a60".

IIRC we had a "-p" flag to denote "parent" once upon a time, but that no 
longer works...

"git-show" is definitely what you want.

Ciao,
Dscho

-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 8:34 am

Ooops, I mixed git-diff-tree (which behaves as mentioned above) with
git-diff, which according to documentation compares with working tree
(and not HEAD) if only one <tree-ish> is given.

git-diff(1):
       *  When one <tree-ish> is given, the working tree and the named tree are
          compared, using git-diff-index. The option --cached can be given to
          compare the index file and the named tree.

git-diff-tree(1):
       If there is only one <tree-ish> given, the commit is compared with its
       parents (see --stdin below).
-- 
Jakub Narebski
Poland
-

From: Aaron Bentley
Date: Saturday, October 21, 2006 - 10:57 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Yes, you could e.g. do:

bzr diff -r before:753..753

Aaron

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFOl9s0F+nu1YWqI0RAhW7AJ4vi4kgen/8h6j2AgueU+kcsmLrPwCeKry9
pp68K4rAmXjjkPvK32LvmPk=
=qDn2
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 11:20 am

What about grandparent of commit (d8a60^^ or d8a60~2 in git),
or choosing one of the parents in merge commit (d8a60^2 is second
parent of a commit)? before:before:753 ?

-- 
Jakub Narebski
Poland
-

From: Matthieu Moy
Date: Sunday, October 22, 2006 - 7:27 am

Yes, "before:" can take any revision specifier, including
"before:something-else".

-- 
Matthieu
-

From: Carl Worth
Date: Friday, October 20, 2006 - 2:48 pm

Well, I'm glad to know we each feel like we are communicating at

The entire discussion is about how to name things in a distributed
system. The premise that Linus has put forth in a very compelling way,
is that attempting to use sequential numbers for names in a
distributed system will break down. The breakdown could be that the
names are not stable, or that the system is used in a centralized way
to avoid the instability of the names.

Now, that causality might not accurately describe the way bzr has
developed. It may be that the centralization bias was determined by
other reasons, and that given those, using sequential numbers for
names makes perfect sense.

But it really is fundamental and unavoidable that sequential numbers

Granted, everything can be stored in one repository. But that still
doesn't change what I was trying to say with my example. One of the
repositories would "win" (the names it published during the fork would
still be valid). And the other repository would "lose" (the names it
published would be not valid anymore). Right?

Now, maybe there's some "simple" mapping from old names to new names
for the losing repository, (something like adding a prefix of
"losers/" to the beginning of the names or something or adding a "15."
prefix or whatever). The point is that the old names are
invalidated. And there's no way to guarantee this kind of change won't
happen in the future, (no matter how old a project is).

I constructed that example to show that the naming has a social impact
in forcing a distinction between winners and losers in the merge, (or
mainline and side branch, or whatever you want to name the
distinction). The two re-joining projects could be really amiable,
create a new virgin mainline and treat both histories as side
branches. In this version, everyone loses as all the old names are

Git allows this just fine. And lots of branches belonging to a single
project is definitely the common usage. It is not common (nor
encouraged) for unrelated ...
From: Matthew D. Fuller
Date: Saturday, October 21, 2006 - 6:01 am

On Fri, Oct 20, 2006 at 02:48:52PM -0700 I heard the voice of

I think we're getting into scratched-record-mode on this.


Git: Revnos aren't globally unique or persistent.

Bzr: Yes, we know.

G: Therefore they're useless.

B: No, they're very useful in [situation] and [situation], and we deal
   with [situation] all the time, and they work great for that.

G: But they fall apart totally in [situation].

B: Yes, so use revids there.

G: So use revids everywhere.

B: Revnos are handier tools for [situation] and [situation] for
   [reason] and [reason].

*brrrrrrrrrrrrrrrrip!!!*    *skip back to start*


I'm not sure there's any unturned stone left along this line, so I'm
not sure how productive it really is to keep walking down it.  So, to
make something productive of it, I'm going to put it onto my todo list
to spend some time with bzr trying to use revids for stuff.  I'm
fairly certain that, due to the bzr cultural tendency to use revnos
where possible, there are some rough edges in the UI when using revids
that should be filed down (though I think it much less likely to turn

I think it's more accurately describable as a branch-identity bias.
The git claim seems to be that the two statements are identical, but I

The term is somewhat overloaded, which is why it's causing you trouble
(and did me).  It refers both to the conceptual entity ("a line of
development" roughly, much like what 'branch' means in git and VCS in
general), and to the physical location (directory, URL) where that
branch is stored, and where it'll often have a working tree.  Branches


Then all branches stored under that 'bzrtest' dir will use the
bzrtest/.bzr/ dir for storing the revisions, and shared revisions will
only exist once saving the space/time for multiple copies.

Probably, you'd actually want 'init-repo --trees' in this case,
because repos default to being [working]tree-less.  In a tree-less
setup, you'd create a [lightweight] checkout of the branch(es) you
wanted to work on ...
From: Jakub Narebski
Date: Saturday, October 21, 2006 - 7:08 am

From: zindar
Date: Saturday, October 21, 2006 - 9:31 am

This is wrong. There are two kinds of checkouts:
lightweight and "normal/heavyweight".

I think you are getting this a little wrong, and I think the reason is
that you are thinking of repositories, while in bzr you normally think
of branches.

For example, I think (correct me if I'm wrong) that if I have a git
repository tracking an upstream linux repo (Linus' for example), I'll
use "pull" to keep my copy up to date with the upstream repo.  If
I then would like to hack on something special, I would "clone" the repo
and get a new repo, and that's where I do my work.  Is that correct?

In bzr you never (well...) clone a full repository, but you clone one
line-of-development (a branch).  So "bzr branch" is always the
equivalent of a one-branch-only "clone" in git or cg.

"bzr checkout" is a "bzr branch" followed by a setting saying
"whenever you commit here, commit in the master branch also".

"bzr checkout --lightweight" is a way to get only a snapshot of the
working tree out of a branch. Whenever you commit, it's done in the
remote branch.

/Erik
-

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 9:59 am

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 10:41 am

Note: instead of symlinking .git/objects/ objects database,
you can simply set and export GIT_OBJECT_DIRECTORY environment
variable.

-- 
Jakub Narebski
Poland
-

From: Matthew D. Fuller
Date: Saturday, October 21, 2006 - 11:11 am

On Sat, Oct 21, 2006 at 04:08:18PM +0200 I heard the voice of


This is obviously some new meaning of "centralization" bearing no
resemblance whatsoever to how I understand the word.

In git, apparently, you don't give a crap about a branch's identity
(alternately expressible as "it has none"), and so you throw it away
all the time.  Given that, revnos even if git had them would never be
of ANY use to you, so it's no wonder you have no use for the notion.

I DO give a crap about my branches' identities.  I WANT them to retain
them.  If I have 8 branches, they have 8 identities.  When I merge one
into another, I don't WANT it to lose its identity.  When I merge a
branch that's a strict superset of a second into that second, I don't
WANT the second branch to turn into a copy of the first.  If I wanted
that, I'd just use the second branch, or make another copy of it.  I
don't WANT to copy it.  I just want to merge the changes in, and keep
on with my branch's current identity.

Maybe that's what you mean by 'centralization'; each branch is central
to itself.  That seems a pretty useless definition, though.  In my
mind, actually, it's MORE distributed; my branch remains my branch,
and your branch remains your branch, and the difference doesn't keep
us from working together and moving changes back and forth.  Forcing
my branch to become your branch sounds a lot more "centralized" to me.


Now, we can discuss THAT distinction.  I'm not _opposed_ to git's
model per se, and I can think of a lot of cases where it'd be really
handy.  But those aren't most of my cases.  And as long as we don't
agree on branch identity, it's completely pointless to keep yakking
about revnos, because they're a direct CONSEQUENCE of that difference
in mental model.  See?  They're an EFFECT, not a CAUSE.  If bzr didn't
have revnos, I'd STILL want my branch to keep its identity.  You could
name the mainline revisions after COLORS if you wanted, and I'd still
want my branch to keep its identity.  Aren't we ...
From: Jeff King
Date: Saturday, October 21, 2006 - 12:19 pm

OK, let's discuss. :)

I think the concept of "my" branch doesn't make any sense in git.
Everyone is working collectively on a DAG of the history, and we all
have pointers into the DAG. Something is "my" branch in the sense that I
have a repository with a pointer into the DAG, but then again, so do N
other people. I control my pointer, but that's it.

So don't think of it as "git throws away branch identity" as much as
"git never cared about branch identity in the first place, and doesn't
think it's relevant."

Now, there are presumably advantages and disadvantages to these
approaches. I like the fact that I can prepare a repository from
scratch, import it from cvs, copy it, push it, or do whatever I like,
and the end result is always exactly the same (revids included). With
your model, on the other hand, it seems the advantages are that in many


The difference, I think, is that it's easier in git to move the upstream
around: you simply start fetching from a different place. I'm not clear
on how that works in bzr (if it invalidates revnos or has other side
effects).

-Peff
-

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 12:30 pm

That's a good example of a fully distributed approach. I can fetch directly
(actually, I cannot) from Junio's private repository, I can fetch from the
public git.git repository, using either the git:// or http:// protocol, and
I can fetch from somebody else's clone of the git repository. I can intermix
those fetches, and revids (commit-ids) remain constant and unchanged.
-- 
Jakub Narebski
Poland
-

From: Jan Hudec
Date: Saturday, October 21, 2006 - 12:47 pm

Moving upstream around does not invalidate revnos. Switching to a different
upstream (ie. the head revisions are different) does. And this may
happen by doing a merge with the previous mainline as non-first parent
-- revnos are simply short aliases for revids, not persistent unique

So they (revids) do in bzr.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>
-

From: Linus Torvalds
Date: Saturday, October 21, 2006 - 12:55 pm

This is nice for a couple of situations:

 - if some particular machine is down, nobody really cares. It doesn't 
   really change the workflow at all if "master.kernel.org" were to be 
   off-line due to some trouble - it just happens to be a machine with 
   good bandwidth that a number of kernel (and git) developers have access 
   to, but if you want to sync with something else, go wild. We could just 
   sync directly between developers, although most people tend to have 
   firewalls (I certainly have a very anal one - not even ssh gets in) 
   making it usually easier to go through some - any - public place.

   But in git, the "public place" really is just an intermediary. It has 
   nothing to do with anything history-wise, and its revision IDs are a 
   non-issue. It's just a temporary staging area (although re-using the 
   same repo over and over for pushing things out obviously means you can 
   do just incremental updates, so most everybody does that)

 - sometimes you have multiple branches in the same tree that have very 
   _different_ sources. For example, you might start out cloning my tree, 
   but if you _also_ want to track the stable tree, you just do so: you 
   can just do

	git fetch <repo> <remote-branch-name>:<local-branch-name>

   at any time, and you now have a new branch that tracks a different 
   repository entirely (to make it easier to keep track of them, you'd 
   probably want to make note of this in your .config file or your remote 
   tracking data, but that's a small "usability detail", not a real 
   conceptual issue).

 - the same "multi-source" thing is true for pushing things out too, not 
   just fetching: I still have my personal git.git repository on 
   kernel.org for historical reasons, even though Junio maintains the 
   normal one. So when I did some experimental (and broken) stuff for "git 
   unpack-objects" in a local branch, and others were interested in fixing 
   it, I just pushed it out to my git repo as a ...
From: Jakub Narebski
Date: Saturday, October 21, 2006 - 1:19 pm

Linus Torvalds wrote:
> 
From: Matthew D. Fuller
Date: Saturday, October 21, 2006 - 2:46 pm

On Sat, Oct 21, 2006 at 03:19:49PM -0400 I heard the voice of

This is as I understand it.


But in my mind, it does make sense.  I fundamentally DO think of "my
commits" differently from "revisions I've merged", and I want the tool
to preserve that for me.  "My commits" tend to be steps along a path,
"merges" tend to be completed paths.  I usually use bzr's "log
--short" for looking at logs, which doesn't show merged revs at all.
That works, because most of the time I don't care about them; I know
if I merged something, it's a completed piece, which I described in
the log message; it's not a PART of a task like my commits usually
are.  So, just the message for my merge rev tells me what I need to
know, and if I need to drill down into it, I can use the regular
(--long) log output to look at the revision in it.  This lets me know,
for instance, that if I want to re-check something I did 3 commits
ago, and I just merged another branch, the commit I'm interested in is
the 4th commit back on the mainline; I don't need to grub through a
bunch of revisions that aren't mine to try and find it.

So, if me and Bob are working on different bits of the same project in
parallel, finish up, and merge back and forth to sync up (ignoring for
the moment the "empty merge commit" bit), even though we now both have
the 'same' stuff, we have the same head rev with all the same parents,
the parents are in a different order, and my 'mainline' (the path of
left-most parents, or 'first' as I understand git calls them) is
different than his; my mainline is my commits, his mainline is his.
If one of us were to 'pull' the other, our branch would become a
duplicate of his and so adopt his 'mainline', which we want to avoid
because then it doesn't fit the mental model of "what I did", which is
what I think of my branch as.
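
To make the "mainline as path of left-most parents" idea concrete, here is a minimal sketch. It is not bzr code: the revision names, the `parents` table and both helper functions are invented for illustration, and it simplifies by giving each developer their own merge revision.

```python
# Hypothetical toy DAG: each revision maps to its ordered list of parents.
# The first (left-most) parent is the one a "mainline" follows.
parents = {
    "base": [],
    "mine-1": ["base"],
    "bob-1": ["base"],
    # Same two parents, recorded in a different order by each developer.
    "my-merge": ["mine-1", "bob-1"],
    "bob-merge": ["bob-1", "mine-1"],
}

def mainline(head):
    """Walk first parents from head back to the root; return oldest first."""
    path = []
    rev = head
    while rev is not None:
        path.append(rev)
        ps = parents[rev]
        rev = ps[0] if ps else None
    return list(reversed(path))

def revnos(head):
    """Number the mainline revisions 1..n, like a branch-local revno."""
    return {rev: n for n, rev in enumerate(mainline(head), start=1)}

mine = mainline("my-merge")
bobs = mainline("bob-merge")
```

Both heads reach the same set of revisions, but `mine` passes through "mine-1" while `bobs` passes through "bob-1", so the numbering each of us sees differs.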


Obviously, this is a totally foreign mentality to git, and that's
great because it seems to work for you.  I can see advantages to it,
and I can conceive of situations where I ...
From: Sean
Date: Saturday, October 21, 2006 - 3:06 pm

On Sat, 21 Oct 2006 16:46:29 -0500

It's not completely foreign, it's one of the things you can use the
git reflog feature to record.  It's just that it's utterly clear in
Git that this is a local feature and is never replicated as part

This is where the git model is clearly superior and allows a true
distributed model.  Because there is no concept of a "mainline"
(except locally via reflog) you can always merge with anyone
participating in the DAG without having to overwrite or lose ordering.

Sean
-

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 3:25 pm

Matthew D. Fuller wrote:

> great because it seems to work for you. 
From: Jeff Licquia
Date: Saturday, October 21, 2006 - 4:42 pm

I don't think so.  Recently, I've been trying to track a particular
patch in the kernel.  It was done as a series of commits, and probably
would have been its own branch in bzr, but when I was trying to group
the commits together to analyze them as a group, the easiest way to do
that was by the original committer's name.

Now, there's probably a better way to hunt that stuff down, but in this
case hunting the user down worked for me.  (It may have made a
difference that I was using gitweb instead of a local clone.)

And the case of hunting down your own commits is just a degenerate case
of hunting down someone else's.

-

From: Carl Worth
Date: Saturday, October 21, 2006 - 4:49 pm

As far as "its own branch in bzr" would such a branch remain available

Vast, huge, gaping, cosmic difference.

Almost none of the power of git is exposed by gitweb. It's really not
worth comparing. (Now a gitweb-alike that provided all the kinds of
very easy browsing and filtering of the history like gitk and git
might be nice to have.)

-Carl
From: Jeff Licquia
Date: Saturday, October 21, 2006 - 5:07 pm

Yes, in the sense that you can recreate the branch by using that
branch's last commit.  But not in the git sense that there's a branch ID
pointing at the commit in question.

You know what?  It occurs to me that much of the problem with git
branches vs. bzr branches might be solved when bzr gets proper tagging
support.  Because, after all, aren't branches more like special tags in

So, very probably, I would have had a far easier time of it if I had
been able to really use git to do the work, instead of gitweb.

I still don't think, though, that it's a sign of a small project to be
concerned about one's own branches more than others.

-

From: Linus Torvalds
Date: Saturday, October 21, 2006 - 5:47 pm

Both branches _and_ tags in git are 100% the same thing: they're just 
shorthand for the commit name. That's _literally_ all they are. They are a 
symbolic name for a 160-bit SHA1 hash.

So yes, you can say that branches are like special tags, or that 
(unsigned) tags are like special branches. There's no real "technical" 
difference: in both cases, it's just an arbitrary name for the top commit.
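
As a rough illustration of the "name->SHA1" translation, here is a small sketch. It is not git's implementation: the `REFS` table, the hash values and the simplified lookup order are all assumptions made for the example, and it only covers unsigned tags that name a commit directly.

```python
import re

# Made-up refs table for illustration; both hash values are invented.
REFS = {
    "refs/heads/master": "1" * 40,
    "refs/heads/next": "2" * 40,
    "refs/tags/v1.0": "1" * 40,  # unsigned tag naming the same commit as master
}

def resolve(name):
    """Translate a branch name, tag name, or full SHA1 into a commit name."""
    if re.fullmatch(r"[0-9a-f]{40}", name):
        return name  # already a 160-bit SHA1 written as 40 hex digits
    # Simplified lookup order, assumed for this sketch.
    for prefix in ("", "refs/tags/", "refs/heads/"):
        sha1 = REFS.get(prefix + name)
        if sha1 is not None:
            return sha1
    raise KeyError(name)
```

With this table, `resolve("master")` and `resolve("v1.0")` return the same string: once translated, nothing remembers whether the name was a branch or a tag.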

However, there are some purely UI differences between tags and branches, 
which really don't affect any of the "name->SHA1" translation at all, but 
which affect how you can _use_ a tag-name vs a branch-name.

 - A branch is always a pointer to a _commit_ object.

   In contrast, a tag can point to anything. It can point to a tree (and 
   that means that you can do _diff_ between a tag and a branch, but such 
   a tree doesn't have any "history" associated with it - it's purely 
   about a certain "state", so you cannot say that it has a parent or 
   anything like that).

   A tag can also point to a single file object ("blob": pure file 
   content), which is something that the git.git repository uses to point
   to the GPG public key that Junio uses to sign things, for example.

   But perhaps more commonly, a tag can also point to a special "tag" 
   object, which is just a form of indirection that can optionally contain 
   an explanation and a digitally signed verification. When I cut a kernel 
   release, for example, my tags don't point to the commit that is the
   release commit, they point to a GPG-signed tag-object that in turn 
   points to the commit. 

   With those signed tags, people can verify (if they get my public key) 
   that a particular release was something I did. And due to the 
   cryptographic nature of the hash, trusting the tag object also means 
   that you can trust the commit it points to, and the whole history that 
   points to.

   So while from a _revision_lookup_ standpoint a "branch" and a "tag" do 
   100% the same thing, we put ...
From: Petr Baudis
Date: Sunday, October 22, 2006 - 9:02 am

Dear diary, on Sun, Oct 22, 2006 at 01:49:04AM CEST, I got a letter

http://repo.or.cz/git-browser/by-commit.html?r=linux-2.6.git

It could use plenty of improvement, though.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Andreas Ericsson
Date: Wednesday, October 25, 2006 - 2:52 am

There was one, but it got discontinued due to performance issues. Shame 
that, because it would have been nice to have to show "foreign" visitors 
how gitk/qgit works. It would especially show the way git thinks about 
branches and stuff like that.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 12:41 pm

Perhaps I'd better use "star topology bias" instead of "centralization
bias".

In git, branches are lightweight. Branch names are local to a repository.
Repositories have identity. A bzr "branch" is a strange mix of a one-branch
git repository and a git branch.

Git's main workflow is fully decentralized. All clones of the
same repository are created equal. In bzr the suggested workflow
(with revnos) forces one (or more) branches to be mainline (use "merge",
get empty-merges, revnos don't change) and leaf (use "pull", revnos
change).

I don't understand. If I merge 'next' branch into 'master' in git, I 
still have two branches: 'master' and 'next'.

And I don't understand why you are so hung up on branch identities. Yes, if
somebody clones your 'repo' repository, he can have your 'master' branch
(refs/heads/master) named 'repo' (refs/heads/repo) or 'repo/master'
(refs/remotes/repo/master), but why does that matter to you? It is _his_

For revnos to work you MUST have one "branch" to be considered
special, the hub in star topology. This very much precludes fully
distributed development. 

BTW. I get that you can use revids in revnos in bzr for fully
distributed and not star-topology geared development. But
Bazaar-NG revids are uglier than Git commit-ids.


In git I can fetch your changes but I don't need to merge them. Take
for example Junio's 'pu' (proposed updates) branch: this is the branch
you shouldn't merge, as its history is constantly being rewritten.

If you don't want for your WIP to be publicly available, you don't
publish it. For example as far as I understand Junio works on Git
in his private repository, with many, many feature branches, but
he does push to public [bare] repository only some subset of branches,
and we can fetch/pull only those.

But still, if I am impatient I can pull from Junio every hour, and
I don't get 24 totally useless empty merge messages if he took day

But please, have you realized that in this workflow the two clones
of the same ...
From: David Clymer
Date: Sunday, October 22, 2006 - 12:18 pm

I think you missed the point. Speaking for myself, I want to maintain
the identity of _my_ branches. If you clone one of them, I _don't_ care.
That's your branch. Branch identity as presented here is not intended to

OK, just to clarify what you are saying here:

1. revnos don't work because they don't serve the same purpose as revids
or git's SHA1 commit ids.

2. bzr does not support fully distributed development because revnos
"don't work" as stated in #1.

3. Ok, bzr does support distributed development, I just say it doesn't
because I think revids are ugly.

Thus, revids are ugly.

Is this really the argument you want to be making? I'm not disagreeing
with you; it's just that I'm not sure it's relevant.

Can we just put the whole "revnos don't work" thing to rest?

Revnos are only intended to be significant relative to a given branch.
They are not intended to serve as an absolute, global identifier.

Revnos + a url _are_ globally significant, but are not static except in
certain topologies.

Revids are globally significant and static in any topology.

If a user does not like or cannot use revnos, they may use revids.
Revnos are not a tool to be used for every job. In no way does that mean
that they are broken.

If a given developer or group of developers primarily use revnos or
revids, it _may_ indicate that _they_ have a bias towards central (or
star) or distributed development, but does not necessarily have any

I think that when I attempt to pull from one branch to another, if they
are identical, neither branch changes. Merging + pulling results in
identical history, causing revnos on the pulling branch to change. Just
merging maintains divergent views of the same history.

Perhaps bzr has a central bias in the view that each developer has the
option of seeing their own branch as the central focus of his/her
development. This view would be the same from each branch; each
developer views his/her own branch as special. If the developer does not
want ...
From: Jakub Narebski
Date: Sunday, October 22, 2006 - 12:57 pm

Branches in bzr are both one-source (one head) DAG (of parents), and
the "mainline" i.e. track of commits committed in this branch-as-place.
Bazaar-NG tries to keep both information in DAG by using first parent
to mark commits on current branch-as-place.

Additionally bzr by default uses revnos, numbering commits on branch,
which needs maintaining mainline identity for revnos not to change
even for one branch-as-place.

This leads to the need to use "merge" if you want to maintain revnos
unchanged, and "pull" if you are not interested in that.


Git correctly realizes that mainline identity is local information,
and instead of trying to save local information in DAG which is shared,
it uses reflog.

That is the EFFECT of preferring fast-forward over preserving
"first parent is my branch" property. So the RESULT is that
shared history is identically ordered.
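
A small sketch of the difference between preferring fast-forward and always recording a merge. This is neither git nor bzr code; the `dag` layout, the function names and the merge naming scheme are made up for illustration.

```python
def ancestors(parents, rev):
    """Return rev plus everything reachable from it via parent links."""
    seen, todo = set(), [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(parents[r])
    return seen

def integrate(parents, mine, theirs, fast_forward=True):
    """Return the new head of 'mine' after taking in 'theirs'.

    May add a merge node to the (hypothetical) parents table.
    """
    if theirs in ancestors(parents, mine):
        return mine                      # nothing new to take in
    if fast_forward and mine in ancestors(parents, theirs):
        return theirs                    # just move the branch pointer
    merge = f"merge({mine},{theirs})"
    parents[merge] = [mine, theirs]      # first parent stays "my" line
    return merge

dag = {"a": [], "b": ["a"], "c": ["b"]}
ff_head = integrate(dag, "b", "c")                         # pointer moves to "c"
merge_head = integrate(dag, "b", "c", fast_forward=False)  # new merge node
```

With fast-forward both sides end up on the very same head; without it, the side that merges gets an extra node whose only job is to keep its own line as first parent.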

-- 
Jakub Narebski
Poland
-

From: Jakub Narebski
Date: Sunday, October 22, 2006 - 1:06 pm

Revnos work only locally, or in star-topology configuration. They have
some consequences: treating first parent specially, need for merges
instead of fast-forward even if fast-forward would be applicable,
two different "fetch" operators: "pull" (which uses revids on the
Bazaar is biased towards centralized/star-topology development if we
want to use revids. In fully distributed configuration there is no
I think that bzr revids are uglier than git commit-ids.

If "simple namespace" is on the pros side of bzr, you must remember that
it is a simple namespace only for not fully distributed development. The
pro of a "simple namespace" comes with the cons of "merge" vs "pull" and the
centralization required for uniqueness of revids.
-- 
Jakub Narebski
Poland
-

From: David Clymer
Date: Monday, October 23, 2006 - 4:56 am

s/revids/revnos/g  but yes, I think I said this later in my previous

So revnos aren't globally meaningful in fully distributed settings. So
what? I don't see how this translates into bias. There is a lot of
functionality provided by bazaar that doesn't really apply to my use

I think you've switched revids and revnos, but I get what you are
saying. In fact, I think I said pretty much the same thing in the email
you are replying to. I don't think that anyone is disagreeing about
anything other than the assertion that bzr is biased because revnos are
used to simplify cases where it is possible to do so.

In any case, Matthew Fuller & Carl Worth cover this in greater detail in
emails further down in this thread (or one of its siblings), so I think
I'll stop here.

-davidc

-- 
gpg-key: http://www.zettazebra.com/files/key.gpg
From: Jakub Narebski
Date: Monday, October 23, 2006 - 5:54 am

First, bzr is biased towards using revnos: bzr commands use revnos
by default to specify a revision (you have to use the revid: prefix/operator
to use revision identifiers), bzr commands output revids only when
requested, and usage examples use revision numbers.

In order to use revnos as _global_ identifiers in distributed development,
you need central "branch", mainline, to provide those revnos. You have
either to have access to this "revno server" and refer to revisions by
"revno server" URL and revision number, or designate one branch as holding
revision numbers ("revno server") and preserve revnos on "revno server"
by using bzr "merge", while copying revnos when fetching by using bzr "pull"
for leaf branches. In short: for revnos to be global identifiers you need
star-topology.

Even if you use revnos only locally, you need to know which revisions are
"yours", i.e. beside branch as DAG of history of given revision you need
"ordered series of revisions" (to quote Bazaar-NG wiki Glossary), or path
through this diagram from given revision to one of the roots (initial,
parentless revisions). Because bzr does that by preserving mentioned path
as first-parent path (treating first parent specially), i.e. storing local
information in a DAG (which is shared), to preserve revnos you need to
use "merge" instead of "pull", which means that you get empty-merge in
clearly fast-forward case. This means "local changes bias", which some
might take as not being fully distributed.

Sidenote 1: Why does Bazaar-NG try to store "branch as ordered series
of revisions"/"branch as path through revisions DAG" in the DAG instead
of storing it separately (like reflog stores the history of the tip of a
branch, which is roughly equivalent to "branch as path" in bzr)? It needs
some kind of cache of the mapping from revno to the revision itself anyway
(unless performance doesn't matter to bzr developers ;-)! All that's
left is to propagate this mapping on "pull"...

Sidenote 2: "Fringe" developer using default git ...
From: James Henstridge
Date: Monday, October 23, 2006 - 8:01 am

As has been said before, you can set an alias to always show revision

Why do you continue to repeat this argument?  No one is claiming that
a revision number by itself, as Bazaar uses them, is a global
identifier.  In fact, we keep on saying that they only have meaning in
the context of a branch.  If you want to use a revision number as part
of a globally unique identifier, it needs to be in combination with

I won't dispute that Bazaar has features that make it easier to work
with the revisions in the line of development of the branch you're
working on in comparison to the revisions from merges.  But given that
every Bazaar branch has this same bias towards their own main line of
development, how can that affect whether or not it is distributed?

James.
-

From: Aaron Bentley
Date: Monday, October 23, 2006 - 10:18 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


And, unlike git, Bazaar branches are all independent entities[1], and
they each have a URL.

So:

http://code.aaronbentley.com/bzrrepo/bzr.ab 1695

is a name for

abentley@panoramicfeedback.com-20060927202832-9795d0528e311e31

And it does not depend on any other branch, especially not bzr.dev

Since:
1. anyone with write access to the urls can create them
2. anyone with read access to the urls can read them
3. the maintainers of the mainline have no control over them
   (except as provided by 1)

these identifiers are not centralized.

Aaron

[1] The fact that they may share storage is not important to the model.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFPPlm0F+nu1YWqI0RAlmLAJ9cpw5X7UXQ82EmoIeUrKzEaFbhdACfZPsS
CRJ69XWi7XAWJRi7Fgt9ICU=
=WrV9
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Monday, October 23, 2006 - 10:53 am

If you don't use centralized numbers (i.e. always referring to bzr.dev,
either by always using (bzr.dev URL, revno), or by using "merge" for
bzr.dev and "pull" for the rest), the numbers are volatile. If the URL
vanishes, then the (URL, revno) to revid mapping is no longer valid. Yeah,
I know, cool URIs don't change...

Besides, you need [constant] network access for this mapping.
-- 
Jakub Narebski
Poland
-

From: Linus Torvalds
Date: Monday, October 23, 2006 - 11:04 am

I _think_ that Aaron was trying to say that

	abentley@panoramicfeedback.com-20060927202832-9795d0528e311e31

is always constant, so you can use that.

Of course, nobody will ever do that, because in practice they're not 
shown, the same way the "true" BK revision names were never shown and thus 
never really used.

		Linus
-

From: Jakub Narebski
Date: Monday, October 23, 2006 - 11:21 am

By the way, I wonder if accidentally identical revisions
(see example for accidental clean merge on revctrl.org)
would get the same revision id in bzr. In git they would.
-- 
Jakub Narebski
Poland
-

From: Jelmer Vernooij
Date: Monday, October 23, 2006 - 11:26 am

They won't. The revision id is made up of the committer's email address,
a timestamp and a bunch of random data. It wouldn't be hard to switch
to using checksums as revids instead, but I don't think there are any plans
in that direction.
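
A rough sketch of an identifier built that way. This is not bzr's code: `make_revid` is a made-up name, and the exact layout (email, UTC timestamp, 16 hex digits of random data) is only inferred from the example revid quoted earlier in this thread.

```python
import secrets
import time

def make_revid(committer_email, when=None):
    """Build an id from email + timestamp + random data (assumed layout)."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(when))
    # 8 random bytes, written as 16 hex digits
    return f"{committer_email}-{stamp}-{secrets.token_hex(8)}"

# Same committer, same second, same (unseen) content: still two distinct ids.
first = make_revid("someone@example.com", 0)
second = make_revid("someone@example.com", 0)
```

Nothing about the revision's content goes into the id, which is why two accidentally identical revisions would not share one.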

Cheers,

Jelmer
-- 
Jelmer Vernooij <jelmer@samba.org> - http://samba.org/~jelmer/
From: Jakub Narebski
Date: Monday, October 23, 2006 - 11:31 am

The place for timestamp and committer info is in the revision metadata
(in commit object in git). Not in revision id. Unless you think that
"accidentally the same" doesn't happen...
-- 
Jakub Narebski
Poland
-

From: Jelmer Vernooij
Date: Monday, October 23, 2006 - 11:44 am

The revision id isn't parsed by bzr. It's just a unique identifier that
is generated at commit-time and is currently created by concatenating
those three fields. It can be anything you like. The bzr-svn plugin for
example creates revision ids in the form
svn:REVNUM@REPOS_UUID-BRANCHPATH and bzr-git uses git:GITREVID. Nothing
will break if bzr would start using a different format.

Cheers,

Jelmer

-- 
Jelmer Vernooij <jelmer@samba.org> - http://samba.org/~jelmer/
From: Linus Torvalds
Date: Monday, October 23, 2006 - 11:45 am

Well, git and bzr really do share the same "stable" revision naming, 
although in git it's more indirect, and thus "covers" more.

In git, the revision name indirectly includes the commit comments too (and 
git obviously also distinguishes between "committer" and "author", and 
those end up being indirectly credited in the name of the commit too). But 
in a very real sense, the bzr stable ("real") revision name does 
effectively contain the same things as a git ID: it's just that it's a 
small subset (only committer+date+random number) of what git includes in 
its names.

So you could more easily _fake_ a commit name in bzr, and depending on how 
things are done it might be more open to malicious attacks for that reason 
(or unintentionally - if two people apply the exact same patch from an 
email, and take the author/date info from the email like git does, you
might have clashes. But with a 64-bit random number, that's probably 
unlikely, unless you also hit some other bad luck like having the 
pseudo-random sequence seeded by "time()", and people just _happen_ to 
apply the email at the exact same second).

The git use of hashes and parenthood information make any accidental 
clashes like that a non-issue: if you have exactly the same information, 
it really _is_ the same commit, since the hash includes the parenthood 
too. So you're left with just malicious attacks, and those currently look 
practically impossible too, of course.

So I don't think bzr and git differ in this respect. I think you can 
_trust_ stable git names a lot more, but that's a separate issue.

			Linus
-

From: Jelmer Vernooij
Date: Monday, October 23, 2006 - 11:56 am

There are no requirements on what a revid is in bzr. It's a unique
identifier, nothing more. It can be whatever you like, as long as it's
unique for that specific commit. The committer+date+random number is
Bzr stores a checksum of the commit separately from the revision id in
the metadata of a revision. The revision is not used by itself to check
the integrity of a revision.

Cheers,

Jelmer

-- 
Jelmer Vernooij <jelmer@samba.org> - http://samba.org/~jelmer/
From: Shawn Pearce
Date: Monday, October 23, 2006 - 12:02 pm

I think Linus' original point here was that if you communicate the
revision id to another person and they fetch that revision there
is no assurance that the commit they have received is the exact
same commit you had.

In Git that assurance is implicitly present as the unique
identification you communicated to the other person is also that
integrity verification.  Therefore it's nearly impossible to spoof.

-- 
Shawn.
-

From: Jakub Narebski
Date: Monday, October 23, 2006 - 12:12 pm

In an unpacked git repository the commit-id is also the commit's address.
Pack files add another level of indirection via the pack index file. The
commit-id also functions as a checksum.

P.S. I'm interested in what the bzr equivalents of git's different types
of objects are: commits (revision info), and what is stored in there besides
the commit message and "snapshot"; trees/manifests, i.e. how files are
gathered together to form a given revision; blobs, i.e. what the storage
format is and how it is divided: changeset-like as in Arch, file "buckets"
as in Mercurial and CVS, or something different altogether. Is there an
equivalent of git tags and tag objects?
-- 
Jakub Narebski
Poland
-

From: Linus Torvalds
Date: Monday, October 23, 2006 - 12:18 pm

That wasn't what I was trying to aim at - the problem is that the bzr 
revision ID isn't "safe" in itself. Anybody can create a revision with the 
same names - and they may both have checksums that match their own 
revision, but you have no idea which one is "correct".

So you just have to trust the person that generates the name, to use a 
proper name generation algorithm. You have to _trust_ that your 64-bit 
random number really is random, for example. And that nobody is trying to 
mess with your repo.

This isn't a problem in normal behaviour, but it's a problem in an attack 
scenario: imagine somebody hacking the central server, and replacing the
repository with something that had all the same commit names, but one of
the revisions was changed to introduce a nasty backdoor problem. Change
all the checksums to match too..

It would _look_ fine to somebody who fetches an update, and the maintainer 
might not ever even notice (because he wouldn't send the _old_ revision 
again, and _his_ tree would be fine, so he'd happily continue to send
out new revisions on top of the bad one on the public site, never even 
realizing that people are fetching something that doesn't match what he is 
pushing).

In contrast, in git, if you replace something in a git repository, the 
name changes, and if I were to try to push an update on top of a broken 
repo like that, it simply wouldn't work - I couldn't fast-forward my own 
branch, because it's no longer a proper subset of what I'm trying to send.

So in git, you can _trust_ the names. They actually self-verify. You can't 
have maliciously made-up names that point to something else than what they 
are. 
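
A minimal sketch of what "self-verifying" means here. The `store` dictionary and the function names are invented for the example; the only part taken from git itself is the naming rule, SHA1 over a "<type> <size>" header, a NUL byte, and the object data.

```python
import hashlib

def object_name(obj_type, body):
    """SHA1 over '<type> <size>\\0' followed by the object data."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def receive(store, obj_type, body):
    """Store an incoming object under a name derived from its data."""
    name = object_name(obj_type, body)
    store[name] = (obj_type, body)
    return name

def verify(store):
    """Return the names whose stored data no longer hashes to that name."""
    return [name for name, (obj_type, body) in store.items()
            if object_name(obj_type, body) != name]

store = {}
name = receive(store, "blob", b"good code\n")
clean = verify(store)                          # nothing wrong yet
store[name] = ("blob", b"nasty code\n")        # tamper, keeping the old name
tampered = verify(store)                       # the swap is detected
```

Whoever holds the old name can spot the replacement without trusting the server: the data either hashes to the name or it does not.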

[ Also, as a result, and related to this same issue: the git protocol 
  actually never sends object names when sending the object itself. It 
  just sends the object data, and the _recipient_ generates the name from 
  that.

  So you can't do the _other_ kind of spoofing, and make a repository that 
  _claims_ to have ...
From: Linus Torvalds
Date: Monday, October 23, 2006 - 11:34 am

git can have no "accidentally identical revisions". They'd have to be 
purposefully done, but yes, they'd obviously (on purpose) get the same 
revision name if that's the case.

You may think of tree (not commit) identity, where git on purpose names 
trees the same regardless of how you got to them. So on a _tree_ level, 
you are always supposed to get the same result regardless of how you 
import things (ie two people importing the same tar-ball should always get 
exactly the same tree ID).

But the actual commit names are identical only if the same people are 
claimed to have authored (and committed) them at the same time - so it's 
definitely not "accidental" if the commits are called the same: they 
really _are_ the same.
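
A sketch of why that is so. The helper names, the people and the timestamps are invented; the naming rule (SHA1 over a "<type> <size>" header, a NUL byte and the object data) and the general layout of a commit object follow git, though real commits may carry further headers.

```python
import hashlib

def git_object_id(obj_type, body):
    """SHA1 over '<type> <size>\\0' followed by the object data."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, author, committer, message):
    """Lay out a commit object: tree, parents, people with dates, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()

# Content naming: the same data always gets the same name, whoever imports it.
tree = git_object_id("tree", b"")   # an empty tree, to keep the example short

alice = "Alice <alice@example.com> 1161625000 +0000"   # hypothetical person
bob = "Bob <bob@example.com> 1161625000 +0000"         # hypothetical person

by_alice = git_object_id("commit", commit_body(tree, [], alice, alice, "import\n"))
by_bob = git_object_id("commit", commit_body(tree, [], bob, bob, "import\n"))
again = git_object_id("commit", commit_body(tree, [], alice, alice, "import\n"))
child = git_object_id("commit", commit_body(tree, [by_alice], alice, alice, "import\n"))
```

`by_alice` and `by_bob` name the same tree but are different commits; `again` equals `by_alice` because every field matches; `child` differs because parenthood is part of what gets hashed.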

Btw, I think you misunderstand the term "accidental clean merge". It means 
that two identical changes on two branches will merge without conflicts 
being reported.

A merge algorithm that doesn't do "accidental clean merge" is totally 
broken. The accidental clean merge is a usability requirement for pretty 
much anything - you often have two branches doing the same thing (possibly 
for different reasons - two people independently found the same bug that 
showed itself in two different ways - so they may even think that they 
are fixing different issues, and may have written totally different 
changelogs to explain the bug, but the solution is identical and should 
obviously merge cleanly).

So "accidental clean merge" may _sound_ like something bad, but it's 
actually a seriously good property (it's really just a special case of 
"convergence" - again, that's a good thing).

		Linus
-

From: Jeff King
Date: Monday, October 23, 2006 - 1:06 pm

Sorry, I don't understand this statement. How are git branches not
independent? Sure, they tend to exist in repositories with other
branches, but there's no need to (it simply allows the sharing of object
storage). There's no reason I can't move any branch from any repo into
its own repo, or vice versa move any unrelated branch into a repo with
other branches.

It all Just Works because there _isn't_ any branch information. It's
simply a pointer into the DAG, so if I have the right parts of the DAG
(which git is careful to make sure of), I can just make a pointer, and I

In cogito, branches can each have a URL, but git-clone doesn't have a
way (that I know of) to clone only a subset of branches. It would be

The git analog is of course:

http://kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git v2.6.18

as a name for

e478bec0ba0a83a48a0f6982934b6de079e7e6b3

The difference being that Linus assigned the "local" name of v2.6.18

Of course. For me, the above commit is actually

  ssh://peff.net/home/peff/git/linux-2.6 v2.6.18

but once it is in my local repository, it's indistinguishable from one I
pulled directly from kernel.org.

And I wonder if THAT is at the root of this discussion. bzr isn't
"centralized" in the sense that you have to talk to a central server, or
rely on it for doing any operations.  But you actually CARE about where
your commits come from, and git fundamentally doesn't.

-Peff
-

From: Jakub Narebski
Date: Monday, October 23, 2006 - 1:29 pm

By the way, git repositories (remember that working area in bzr is
associated with branch, and in git with repository) can share storage,
either sharing only immutable "old history" (part of DAG) via 
$GIT_DIR/objects/info/alternates file or GIT_ALTERNATE_OBJECT_DIRECTORIES
environment variable, or via having shared commit object database
via symlinking $GIT_DIR/objects directory or via setting 
GIT_OBJECT_DIRECTORY variable. 

Git doesn't support the latter fully out of the box (you must be careful
with prune) but on the other hand bzr doesn't support cloning whole

Well, with exception of reflog, which is local to repository

On the other hand, Cogito doesn't have a way to clone all the branches.

-- 
Jakub Narebski
Poland
-

From: David Clymer
Date: Monday, October 23, 2006 - 8:24 pm

Agreed. Of course, I want the simplest case to be the simplest. When
working on my own branch, regardless if it is a standalone project or
part of a distributed one, I don't want to have to type SHA hashes or
revids. Numbers serve my purposes best in this case. When I communicate

Ok. Let's not repeat this again. I think I said this once, and you've
said it in two following emails. It's a given. Assume that we all know

"local changes bias" I can buy that. I even like it. I don't even care
if that makes bazaar "not fully distributed." I don't think the
distinction between "fully" and "almost, except for some technicality"
distributed is one that has much practical value.

-davidc
-- 
gpg-key: http://www.zettazebra.com/files/key.gpg
From: Carl Worth
Date: Saturday, October 21, 2006 - 1:47 pm

I apologize if I've come across as beating a dead horse on this. I've
really tried to only respond where I'm still confused, or there are
explicit indications that the reader hasn't understood what I was
saying, ("I don't understand how you've come to that conclusion",
etc.). I'll be even more careful about that below, labeling paragraphs

I'm missing something:

I still haven't seen strong examples for this last claim. When are
they handier? I asked a couple of messages back and two people replied
that given one revno it's trivial to compute the revno of its
parent. But that's no win over git's revision specifications,

Maybe I wasn't clear:

There's no doubt that there has been semantic confusion over the term
branch that has been confounding communication on both sides. Here's
my attempt to describe the situation, (which only became this clear
recently as I started playing with bzr more). This is not an attempt
at a complete description, but is hopefully accurate, neutral, and
sufficient for the current discussion:

  Abstract: In a distributed VCS we are using a distributed process to
  create a DAG, (nodes are associated with revisions and point to parent
  nodes). The distributed nature means that the collective DAG will have
  multiple source nodes, (often termed heads or tips).

  Git: A subset of the DAG is stored in a "repository". The DAG in the
  repository may have many source nodes. A "branch" is a named reference
  to a node (whether or not a source). Multiple local repositories may
  share storage for common objects. There are inter-repository commands
  for copying revisions and adjusting branch references, but basically
  all other operations act within a single repository.

  Bzr: A subset of the DAG is stored in a "branch". The DAG in the
  branch has a single source node. Multiple local branches may share
  storage for common objects through a "repository". Basically all
  operations (where applicable) can act between branches.

Let me know if I ...
From: Jakub Narebski
Date: Saturday, October 21, 2006 - 1:55 pm

git-show-branch also shows git-name-rev like names.

BTW. git-show-branch has somewhat strange, and different from other git 
commands UI. You can think of it as text version of gitk/qgit history 
viewer (although you can use tig for CLI (ncurses) graph).
-- 
Jakub Narebski
Poland
-

From: Jeff Licquia
Date: Saturday, October 21, 2006 - 4:07 pm

Having used both (though my familiarity with git is less), in my opinion
the biggest win is the obvious one: sequential numbers work in the head
better than SHA1 checksums.

"But it's not a problem in practice!" is a good retort, except that I
wonder whether the set of "practices" you're using includes anyone who
decided to pass on git in favor of something else--perhaps because they
saw a few SHAs float by and ran in terror.  Beware of self-selection
bias.

Put another way, "strength" of example is often in the eye of the
beholder.  That we continue to give you the same "weak" examples may be
evidence that we have a different impression of their strengths, and
that your analysis of their strengths isn't convincing to us.

I suppose this line of conversation still has value if you don't see any
benefit at all, but OTOH if you really don't see how sequential numbers
are easier to work with in the head than SHA sums with modifiers, I'm


I wonder if part of the problem is that the revno scheme we've been
talking about (the x.y.z... format) doesn't technically exist in any
released version of bzr that I know of.

Previous to 0.12, bzr revnos were absolutely a local thing; revisions
from merges didn't even have revnos (except for the merge commit
itself).  If you merged a branch and you later wanted to recreate that
branch, or see a diff from that branch, etc., you had to use revids.

So when you talk of a "centralization bias" in bzr, a lot of us get
confused, defensive, etc., because from our perspective, bzr and git
weren't all that much different until just recently.

Now it may be that you're right that "global" revnos like bzr has now
introduce a bias in favor of centralization.  If that's true, I'm not
sure that totally vindicates the git model.  We have to ask if the bias
is a good thing, but so do you; after all, we may have done so because
of user demand, and if our users want it, maybe yours will want it too
someday.

(I say "may" because I haven't been paying ...
From: Sean
Date: Saturday, October 21, 2006 - 4:25 pm

On Sat, 21 Oct 2006 19:07:10 -0400

There is no need to speculate: the numbers will only be reliable on a local
basis.  So yes, you can force a single repository like bzr.dev to always "win"
any conflict and force the other guy to change, i.e. a central repo model.
But they cannot be maintained consistently in a truly distributed
system.  As Linus pointed out, that is fact, not opinion.

Now the opinion of the bzr people is that it doesn't matter and that for
all important cases it works well enough.  If all the people who don't like
the look of sha1's self select bzr, so be it, but that doesn't change the
fundamental argument.

But just to reiterate, the design of Git is flexible enough that you
can automatically generate "revno" tags for every commit in your repo
_today_.  You'd end up with the exact same problems that bzr will
eventually hit, but Git already has everything you need today to refer
to every commit in your repo as r1 r2 r3 r4 etc...  
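
A rough sketch of why such locally generated numbers cannot agree between peers, in Python rather than as actual git tags. The numbering rule (order of arrival in the local repository) and the commit names are assumptions for illustration only:

```python
def assign_revnos(arrival_order):
    """Number commits r1, r2, ... in the order this repository first saw them."""
    return {commit: f"r{i}" for i, commit in enumerate(arrival_order, start=1)}

# Both repositories end up with the same four commits (hypothetical IDs),
# but each one recorded its own work before receiving the other's.
alice = assign_revnos(["base", "alice-fix", "bob-fix", "merge"])
bob = assign_revnos(["base", "bob-fix", "alice-fix", "merge"])

# Same set of commits on both sides ...
same_commits = set(alice) == set(bob)
# ... but the local numbers disagree for some of them.
disagree = sorted(c for c in alice if alice[c] != bob[c])

print(same_commits)
print(disagree)
```

Unless one side is declared the winner and the other renumbers, "r2" means a different commit to each developer.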

Sean
-

From: Matthew D. Fuller
Date: Sunday, October 22, 2006 - 5:46 am

[ Time to trim up CC's a bit ]

On Sat, Oct 21, 2006 at 01:47:08PM -0700 I heard the voice of

Oh, I don't mean the whole topic in general.  It's just that there are
only so many ways one can say "revnos are only valid in certain
situations", and I really think we must have hit them all by now.  We
all agree on that; we just disagree (probably highly based on


This seems correct; at least, it's correct enough to work from until


Rather, unless you can one way or another access the branch the number

I think it's using that 'c' word there that's causing contention here;
we're ascribing different meanings to it.

Revnos only apply to a specific "branch" (in this usage, I'm talking
about branch abstractly and somewhat specifically; more in a moment),
and so except by wild coincidence are only useful in talking about
that branch.  One of the two cases (the second discussed later) where
that's useful is when you have long-lived branches.  In git,
apparently, you don't have long-lived "branches" in this particular
meaning of the word, but the way people use bzr they do.  Perhaps this
is what you mean by 'centralization'.

That long-lived branch doesn't have to be any sort of "trunk", though
it usually is; it could as easily be something totally peripheral.


Now, details of that use of "branch".  In mathematical terms, a branch
may be defined purely by its head rev (and the graph built up by
recursing through all the parents), but in [bzr] UI and mental model
terms, a "branch" is that plus its mainline[0]; the left-most or first
line of descent, which colloquially is the difference between 'things
I commit' and 'things I merge'.
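
The "mainline" idea can be sketched in a few lines of Python. This is only an illustration of following first-parent pointers, not bzr's implementation; the revision names and the functions `mainline` and `revnos` are hypothetical:

```python
# Each revision lists its parents; the FIRST parent is the one the
# committer was on ("things I commit"), later parents are merged in.
parents = {
    "a1": [],
    "a2": ["a1"],
    "b1": ["a1"],          # work done on another branch
    "b2": ["b1"],
    "a3": ["a2", "b2"],    # merge: first parent a2, merged parent b2
    "a4": ["a3"],
}

def mainline(dag, head):
    """Follow first-parent links from head back to the origin."""
    line = []
    rev = head
    while rev is not None:
        line.append(rev)
        ps = dag[rev]
        rev = ps[0] if ps else None
    line.reverse()          # oldest first
    return line

def revnos(dag, head):
    """Number mainline revisions 1, 2, 3, ... relative to this head only."""
    return {rev: n for n, rev in enumerate(mainline(dag, head), start=1)}

print(mainline(parents, "a4"))
print(revnos(parents, "a4"))
print(revnos(parents, "b2"))
```

In this toy model the merged revisions b1 and b2 get no simple number on the branch headed by a4, and the number 3 names a different revision depending on which head you start from.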

Let me try flexing my git-expression muscles here.  Given a branch at
a specific point in time, you point at the head rev, and there's a
subset we call 'mainline' of the whole set of parents, which is
expressed by following the 'first' parent pointers back to a single
origin (there can be 50 origins in the whole graph, of course, but
only one of ...
From: David Clymer
Date: Sunday, October 22, 2006 - 12:36 pm

I would say that: revnos are handier tools than revids...etc

I think that since G: was making a statement about revids, B: was making
an implicit comparison with them.

bzr log -r before:1

being handier than

bzr log -r before:revid:david@zettazebra.com-20061022175244-4b85cb5f0cbc79ad


-davidc
-- 
gpg-key: http://www.zettazebra.com/files/key.gpg
From: Andreas Ericsson
Date: Wednesday, October 25, 2006 - 2:35 am

This is new to me. At work, we merge our toy repositories back and forth 
between devs only. There is no central repo at all. Does this mean that 
each merge would add one extra commit per time the one I'm merging with 
has merged with me?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Jakub Narebski
Date: Wednesday, October 25, 2006 - 2:46 am

From what I understand, "bzr merge" will create one extra commit to
preserve the "first parent is my branch" feature. "bzr pull" will do
a fast-forward if your DAG is a proper subset of the pulled
branch/repository DAG, but at the cost of changing your revno-to-revision
mapping to that of the pulled repository.

That's a consequence of recording a branch as "my work", i.e. as a path
through the DAG that treats the first parent as special, instead of
saving that information outside the DAG.

-- 
Jakub Narebski
Poland
-

From: James Henstridge
Date: Wednesday, October 25, 2006 - 3:08 am

Actually, "bzr merge" does not create any commits on the branch -- you
need to run "bzr commit" afterwards (possibly after resolving
conflicts).  The control files for the working tree record a pending
merge, which gets recorded when you get round to the commit.

So you can easily check if there were any tree changes resulting from the merge.

If there aren't, or you made the merge by mistake, you can make a call
to "bzr revert" to clean things up without ever having created a new
revision.

James.
-

From: Carl Worth
Date: Wednesday, October 25, 2006 - 8:54 am

One result of this approach is that developers of different trees
don't necessarily have common revision IDs to compare. Imagine a
question like:

	When you ran that test did you have the same code I've got?

In git, the answer would be determined by comparing revision IDs.

In bzr, the only answer I'm hearing is attempting a merge to see if it
introduces any changes. (I'm deliberately avoiding "pull" since we're
talking about distributed cases here).

And to comment on something mentioned earlier in the thread, there's
no need for "wildly complex" distributed scenarios. All of these
issues are present with developers working together as peers, (and
each considering their own repository as canonical).

A harder question (for bzr) is:

	Do you have all of the history I've got?

(The problem being that when one developer is missing some history and
merges it in, she necessarily creates new history, so there's never a
stable point for both sides to agree on.)

-Carl
From: James Henstridge
Date: Thursday, October 26, 2006 - 1:52 am

Can you really just rely on equal revision IDs meaning you have the
same code though?

Let's say that I clone your git repository, and then we both merge the
same diverged branch.  Will our head revision IDs match?  From a quick
look at the logs of cairo, it seems that the commits generated for
such a merge include the date and author, so the two commits would
have different SHA1 sums (and hence different revision IDs).

So I'd have a revision you don't have and vice versa, even though the

Or run "bzr missing".  If the sole missing revision is a merge (and
not the revisions introduced by the merge), you could assume that you

Why does it matter if they create a new revision?  They can still tell
if they've got all the history you had.

James.
-

From: Junio C Hamano
Date: Thursday, October 26, 2006 - 2:33 am

If you two have the same commit that is a guarantee that you two
have identical trees.  The reverse is not true as logic 101
would teach ;-).

Doing a fast-forward instead of a "useless" merge helps
somewhat but not in cases like two people merging the same
branches the same way or two people applying the same patch on
top of the same commit.  You need to compare tree object IDs for

Is it "you could assume" or "it is guaranteed"?  If former, what
kind of corner cases could invalidate that assumption?


-

From: James Henstridge
Date: Thursday, October 26, 2006 - 2:57 am

That was the point I was trying to make.  Carl asserted that in git
you could tell if you had the same tree as someone else based on
revision IDs, which doesn't seem to be the case all the time.

The reverse assertion (that if you have the same revision ID, you have

Sure, you can do the same in Bazaar by comparing the inventories for

The merge revision will also include any manual conflict resolution.
If the other person resolved the conflicts differently.

James.
-

From: Jeff King
Date: Thursday, October 26, 2006 - 3:10 am

If you have the same revision (commit IDs), you have the same tree (at
the same time, by the same committer, etc).

If you have a different revision (commit), you may or may not have the
same tree. You can then check the tree id, which will either be the same
(you have the same tree) or differ (you don't).

Thus, in the converse, if you have the same tree, you _will_ have the
same tree id. You may or may not have the same commit id.

-Peff
-

From: Vincent Ladeuil
Date: Thursday, October 26, 2006 - 3:52 am

>>>>> "Jeff" == Jeff King <peff@peff.net> writes:

    Jeff> On Thu, Oct 26, 2006 at 05:57:20PM +0800, James Henstridge wrote:
    >> >If you two have the same commit that is a guarantee that you two
    >> >have identical trees.  The reverse is not true as logic 101
    >> >would teach ;-).
    >> 
    >> That was the point I was trying to make.  Carl asserted that in git
    >> you could tell if you had the same tree as someone else based on
    >> revision IDs, which doesn't seem to be the case all the time.

    Jeff> If you have the same revision (commit IDs), you have
    Jeff> the same tree (at the same time, by the same committer,
    Jeff> etc).

    Jeff> If you have a different revision (commit), you may or
    Jeff> may not have the same tree. You can then check the tree
    Jeff> id, which will either be the same (you have the same
    Jeff> tree) or differ (you don't).

    Jeff> Thus, in the converse, if you have the same tree, you
    Jeff> _will_ have the same tree id. You may or may not have
    Jeff> the same commit id.

Ok, so git makes a distinction between the commit (code created by
someone) and the tree (code only).

Commits are defined by their parents.

Trees are defined by their content only ?

If that's the case, how do you proceed ? 

Calculate a sha1 representing the content (or the content of the
diff from parent) of all the files and dirs in the tree ?  Or
from the sha1s of the files and dirs themselves recursively based
on sha1s of the files and dirs they contain ?

I ask because the latter seems to provide some nice effects
similar to what makes BDD
(http://en.wikipedia.org/wiki/Binary_decision_diagram) so
efficient: you can compare graphs of any complexity or size in
O(1) by just comparing their signatures.

    Vincent



-

From: Jeff King
Date: Thursday, October 26, 2006 - 4:13 am

Yes (a commit is a tree, zero or more parents, commit message, and



Recursively. Each tree is an ordered list of 4-tuples: pathname, type,
sha1, mode. If the type is "blob" then the sha1 is the hash of the file
contents. If the type is "tree" then the sha1 is the id of a sub-tree.

Yes, if two trees' hashes compare equal, they contain the same data. I
believe we are not currently using this optimization to find merge
differences, but there was some discussion earlier this week about doing
so.
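
As a rough illustration of the recursive scheme, here is a Python sketch. It is deliberately simplified and does not reproduce git's exact byte layout; the `blob:`/`tree:` prefixes, the fixed mode strings, and the function names are assumptions made for the example:

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def blob_id(content: bytes) -> str:
    # Simplified: real git hashes a small header before the content.
    return sha1_hex(b"blob:" + content)

def tree_id(directory: dict) -> str:
    """directory maps name -> bytes (a file) or dict (a sub-directory)."""
    entries = []
    for name in sorted(directory):
        item = directory[name]
        if isinstance(item, dict):
            entries.append((name, "tree", tree_id(item), "040000"))
        else:
            entries.append((name, "blob", blob_id(item), "100644"))
    # The tree's id is a hash over its ordered list of 4-tuples.
    listing = "\n".join(" ".join(e) for e in entries)
    return sha1_hex(b"tree:" + listing.encode())

mine = {"README": b"hi\n",
        "src": {"main.c": b"int main(void) { return 0; }\n"}}
yours = {"src": {"main.c": b"int main(void) { return 0; }\n"},
         "README": b"hi\n"}
changed = {"README": b"hi\n",
           "src": {"main.c": b"int main(void) { return 1; }\n"}}

print(tree_id(mine) == tree_id(yours))     # same content, same id
print(tree_id(mine) == tree_id(changed))   # one byte differs deep down
```

A change in any file changes the id of its directory, and so on up to the top-level tree, which is why comparing two top-level ids is enough.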

-Peff
-

From: Jeff King
Date: Thursday, October 26, 2006 - 4:15 am

Sorry, I should clarify: a commit is a _tree id_, zero or more _parent
ids_, commit message, etc.

-Peff
-

From: Linus Torvalds
Date: Thursday, October 26, 2006 - 8:05 am

Commits are defined by a _combination_ of:
 - the tree they commit (which is recursive, so the commit name indirectly 
   includes information about EVERY SINGLE BIT in the whole tree, in every 
   single file)
 - the parent(s) if any (which is also recursive, so the commit name 
   indirectly includes information about EVERY SINGLE BIT in not just the 
   current tree, but every tree in the history, and every commit that is 
   reachable from it)
 - the author, committer, and dates of each (and committer is actually 
   very often different from author)
 - the actual commit message

So a commit really names - uniquely and authoritatively - not just the 
commit itself, but everything ever associated with it.

Where "contents" does include names and permissions/types (eg execute bit 
and symlink etc).

If you compare the commit name, and they are equal, you automatically know
 - the trees are 100% identical
 - the histories are 100% identical
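
A small Python sketch of that rule, and of the case raised earlier where two people record the same tree independently. The field layout only resembles git's and is not guaranteed to match it byte for byte; the ids, names, and addresses are placeholders:

```python
import hashlib

def commit_id(tree, parents, author, committer, message):
    """Hash every field that defines a commit (toy layout)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()

tree = "t" * 40      # stand-in for a tree id
base = "b" * 40      # stand-in for a shared parent commit id

# Two people record the identical tree on the same parent ...
mine = commit_id(tree, [base], "A <a@example.org> 1161873881 -0700",
                 "A <a@example.org> 1161873881 -0700", "Merge topic")
yours = commit_id(tree, [base], "B <b@example.org> 1161873999 +0800",
                  "B <b@example.org> 1161873999 +0800", "Merge topic")
# ... and a third commit with exactly the same fields as the first.
again = commit_id(tree, [base], "A <a@example.org> 1161873881 -0700",
                  "A <a@example.org> 1161873881 -0700", "Merge topic")

print(mine == yours)   # same tree, different commit names
print(mine == again)   # identical fields give the identical name
```

Equal commit names therefore imply equal trees and histories, while unequal commit names say nothing about the trees until you compare the tree names themselves.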

If you only care about the actual tree, you compare the tree name for 
equality, ie you can do

	git-rev-parse commit1^{tree} commit2^{tree}

and compare the two: if and only if they are equal are the actual contents 


This is exactly what git does. You can compare entire trees (and 
subdirectories are just other trees) by just comparing 20 bytes of 
information.

How do you think we can do a diff between two arbitrary kernel revisions 
so fast? Why do you think we can afford to do a 

	git log drivers/usb include/linux/usb*

that literally picks out the history (by comparing state) for every commit 
in the tree?

I can do the above log-generation in less than ten _seconds_ for the last 
year and a half of the kernel. That's 20k+ lines of logs of commits that 
only touch those files and directories. And I _need_ it to be fast, 
because that's literally one of the most common operations I do.

And the reason it's fast is that we can compare 20,000 files (names, 
contents, permissions) by just comparing a _single_ 20-byte SHA1.
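
The pruning this makes possible can be sketched in Python. This is a toy model, not git's diff machinery: the `Tree` class, the hashing prefixes, and the `stats` counter are all invented for the example. The point is only that a subtree whose id matches is never opened:

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

class Tree:
    """Toy immutable tree node that computes its id once, bottom-up."""
    def __init__(self, entries):
        # entries: name -> Tree (sub-directory) or bytes (file contents)
        self.entries = dict(entries)
        parts = []
        for name in sorted(self.entries):
            item = self.entries[name]
            child = item.id if isinstance(item, Tree) else h(b"blob:" + item)
            parts.append(f"{name} {child}")
        self.id = h(("tree:" + "\n".join(parts)).encode())

def changed_paths(old, new, prefix="", stats=None):
    """List changed file paths, skipping whole subtrees whose ids match."""
    if stats is not None:
        stats["compared"] += 1
    if old.id == new.id:
        return []            # one comparison covers everything below
    out = []
    for name in sorted(set(old.entries) | set(new.entries)):
        a, b = old.entries.get(name), new.entries.get(name)
        path = prefix + name
        if isinstance(a, Tree) and isinstance(b, Tree):
            out += changed_paths(a, b, path + "/", stats)
        elif a != b:
            out.append(path)
    return out

usb = Tree({"core.c": b"usb core\n", "hub.c": b"usb hub\n"})
net = Tree({f"drv{i}.c": b"net\n" for i in range(1000)})
v1 = Tree({"drivers": Tree({"usb": usb, "net": net})})
usb2 = Tree({"core.c": b"usb core\n", "hub.c": b"usb hub v2\n"})
v2 = Tree({"drivers": Tree({"usb": usb2, "net": net})})

stats = {"compared": 0}
print(changed_paths(v1, v2, stats=stats))
print(stats["compared"])     # tree comparisons made, despite 1000+ files
```

The thousand files under the unchanged directory cost a single id comparison, which is the effect described above.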

In git, revision names (and _everything_ has a revision name: commits, ...
From: Vincent Ladeuil
Date: Thursday, October 26, 2006 - 9:04 am

>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:

    Linus> On Thu, 26 Oct 2006, Vincent Ladeuil wrote:
    >> 
    >> Ok, so git makes a distinction between the commit (code created by
    >> someone) and the tree (code only).
    >> 
    >> Commits are defined by their parents.

    Linus> Commits are defined by a _combination_ of:

    Linus>  - the tree they commit (which is recursive, so the
    Linus>  commit name indirectly includes information about EVERY
    Linus>  SINGLE BIT in the whole tree, in every single file)

And here you keep that separate from any SCM related info,
right ?

    Linus>  - the parent(s) if any (which is also recursive, so
    Linus>  the commit name indirectly includes information about
    Linus>  EVERY SINGLE BIT in not just the current tree, but
    Linus>  every tree in the history, and every commit that is
    Linus>  reachable from it)

    Linus>  - the author, committer, and dates of each (and
    Linus>  committer is actually very often different from
    Linus>  author)

    Linus>  - the actual commit message

    Linus> So a commit really names - uniquely and authoritatively
    Linus> - not just the commit itself, but everything ever
    Linus> associated with it.

Thanks for the clarification. But no need to shout about EVERY
SINGLE BIT, the pointer to BDDs was already talking a bit about
bits :) 

But I agree, this is the important point that may be missed.

    >> Trees are defined by their content only ?

    Linus> Where "contents" does include names and
    Linus> permissions/types (eg execute bit and symlink etc).

Which can also be expressed as: "Everything the user can
manipulate outside the SCM context", right ?

    >> If that's the case, how do you proceed ? 

    Linus> If you compare the commit name, and they are equal,
    Linus> you automatically know

    Linus>  - the trees are 100% identical
    Linus>  - the histories are 100% identical

And that's the only info you can get, no ...
From: Linus Torvalds
Date: Thursday, October 26, 2006 - 9:21 am

I don't understand that question.

The commits contain the tree information. A raw commit in git (this is the 
true contents of the current top commit in my kernel tree, just added 
indentation and an empty line between the command I used to generate it 
and the output, to make it stand out better in the email) looks something 
like this:

   [torvalds@g5 linux]$ git-cat-file commit HEAD

   tree ba1ed8c744654ca91ee2b71b7cdee149c8edbef1
   parent 2a4f739dfc59edd52eaa37d63af1bd830ea42318
   parent 012d64ff68f304df1c35ce5902f5023dc14b643f
   author Linus Torvalds <torvalds@g5.osdl.org> 1161873881 -0700
   committer Linus Torvalds <torvalds@g5.osdl.org> 1161873881 -0700
   
   Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
   
   * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
     [SPARC64]: Fix memory corruption in pci_4u_free_consistent().
     [SPARC64]: Fix central/FHC bus handling on Ex000 systems.

where the _name_ of the commit is 

   [torvalds@g5 linux]$ git-rev-parse HEAD

   e80391500078b524083ba51c3df01bbaaecc94bb

ie the commit itself contains the exact tree name (and the name of the 
parents), and the name of the commit is literally the SHA1 of the contents 
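
For anyone who wants to check the "name is the SHA1 of the contents" claim by hand, here is a small Python sketch. One detail is worth knowing: git hashes a short header (object type, content length in bytes, a NUL byte) in front of the contents. The function name `git_object_id` is made up for this example:

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over a '<type> <size>' header, a NUL byte, then the contents."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has the same well-known name in every git repository.
print(git_object_id("blob", b""))
# → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# A commit is named the same way, with type "commit" and the raw text
# of the commit (tree, parent, author, committer lines, blank line,
# message) as its contents.
```

Feeding the exact bytes of a commit object to the same function with type "commit" should give that commit's name, provided the bytes are the raw ones and not a copy re-indented for email.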

Again, I'm not sure what you mean by that. The SCM does not track 
_everything_. It does not track user names and inode numbers, so in a 
sense a developer can change things that the SCM simply doesn't _care_ 
about and never tracks. But yes, the tree contents uniquely identify the 

No, there is ordering there too. But yes, the ordering is not in the name 
itself, you have to go look at the actual commit history to see it.


No. 

If the signatures are equal, the contents are equal, and vice versa. It 

No. Don't even think that way. That just confuses you. The hash is 
cryptographic, and large enough, that you really can equate the contents 
with the hash. Anything else is just not even interesting.

		Linus
-

From: Joseph Wakeling
Subject: git and bzr
Date: Monday, November 27, 2006 - 5:01 pm

Hello all,

Following the very interesting debate about the differences between bzr
and git, I thought it was about time I tried to learn properly about git
and how to use it.  I've been using bzr for a good while now, although
since I'm not a serious developer I only use it for simple purposes,
keeping track of code I write on my own for academic projects.

So, a few questions about differences I don't understand...

First off a really dumb one: how do I identify myself to git, i.e. give
it a name and email address?  Currently it uses my system identity,
My Name <username@computer.(none)>.  I haven't found any equivalent of
the bzr whoami command.

Now to more serious business.  One of the main operational differences I
see as a new user is that bzr defaults to setting up branches in
different locations, whereas git by default creates a repository where
branches are different versions of the directory contents and switching
branches *changes* the directory contents.  bzr branch seems to be
closer to git-clone than git-branch (N.B. I have never used bzr repos so
might not be making a fair comparison).

With this in mind, is there any significance to the "master" branch (is
it intended e.g. to indicate a git repository's "stable" version
according to the owner?), or is this just a convenient default name?
Could I delete or rename it?  Using bzr I would normally give the
central branch(*) the name of the project.

(* Central or main on my own system.  Not intended to be central in the
sense of a CVS-style version control setup:-)

Any other useful comments that can be made to a bzr user about working
with this difference, positive or negative aspects of it?

Next question ... one of the reasons I started seriously thinking about
git was that in the VCS comparison discussion, it was noted that git is
a lot more flexible than bzr in terms of how it can track data (e.g. the
git pickaxe command, although I understand that's not in the released
version [1.4.4.1] yet?).  A ...
From: Sean
Date: Monday, November 27, 2006 - 5:40 pm

On Tue, 28 Nov 2006 01:01:46 +0100

Assuming you have a recent version of git, then:

$ git repo-config --global user.email "you@email.com"
$ git repo-config --global user.name "Your Name"

This will set up a ~/.gitconfig in your home directory; these settings
will apply in any repo you use.  Drop the "--global" to set them

It's just a common convention and carries no special significance;

Don't be afraid to git-clone your local repo, especially with the -l
and -s options.  That will get you a separate repo/working directory
while not taking up much extra disk space (objects from your first
repo will be shared with the second).

Once you get comfortable with multiple branches in a single repo/
working directory, it often is much better than the alternatives.

The Git cherry-pick command lets you grab specific commits from
other branches in your repo.  But cherry-pick works at the commit
level, there is no easy way to grab a single function for instance
and merge just its history into another branch.

However, you can merge an entire separate project into yours even
though they don't share a base commit.  This has been done several
times in the history of Git itself. For instance you can see two
separate "initial" commits in the Git repo with a command like
"gitk README gitk" which gives a graphical history of the "gitk"
and "README" files and shows each started life in a separate
initial commit.  Use "git show 5569b" to see Linus bragging on

Don't think a direct bridge between the two has been written yet.

Cheers,
Sean
-

From: Linus Torvalds
Date: Monday, November 27, 2006 - 7:57 pm

Depending on whether you like editing config files by hand or not, you 
would either just edit your ~/.gitconfig file and add a section like:

	[user]
		name = My Name Goes Here
		email = myemail@work.com

or you would use "git repo-config" to do it for you. Personally, I find it 
easier to just edit the .gitconfig file directly, since the config file 
syntax is actually rather pleasant, but if you want to do it with a git 
command, you'd do

	git repo-config --global user.name "Joseph Wakeling"
	git repo-config --global user.email joseph.wakeling@webdrake.net

(where the "--global" just tells repo-config to use the user-global 
~/.gitconfig file - you can also do this on a per-repository basis in the 
repository .git/config file if you want to have different identities for 

You can do either, it's almost purely a matter of taste.

Using a local branch and switching between them in place has some 
advantages once you get used to it: most notably you can trivially use git 
commands that work on data from different branches at the same time. So 
with that kind of setup it's very natural to do things like "show me 
everything that is in branch 'x', but _not_ in branch 'y'", and once you 
get used to that, you really appreciate it.

But at the same time, if you want to actually keep several branches 
checked out at the same time, and prefer to work on them that way, just 
use "git clone" to create the other branch instead. It really is just a 
matter of taste.

I suspect that most people tend to end up using the "multiple branches in 
the same directory and switching between them" approach after a time, but 
that's really just an unsubstantiated feeling, and it certainly isn't 

It's just a convenient default name, and it has no real meaning otherwise. 
Feel free to rename it any way you want (just make sure to edit HEAD to 

There should be no difference, although since everybody seems to use 
"master" by default, the documentation is probably geared towards it, ...
From: Joseph Wakeling
Date: Tuesday, November 28, 2006 - 7:23 pm

Thanks to everyone for your very detailed responses. :-)

On the subject of blame and pulling patches from unrelated branches,




So ... if I understand correctly, I can get patches from somewhere else,
but in the branch history, I will not be able to tell the difference
from having simply newly created them?

With regards to git blame/pickaxe/annotate, the idea of tracking *code*
rather than files was one thing that really excited me when I read about
it in the earlier discussion, and is probably the main reason I'm trying
out git.  I'd like to understand this properly so is there a simple
exercise I can do to demonstrate its capabilities?  I tried an
experiment where I created one file with two lines, then cut one of the
lines, pasted it into a new file, and committed both changes at the same
time.  But git blame -C on the second file just gives me the
time/date/sha1 of its creation, and no indication that the line was
taken from elsewhere.

Back to the more basic queries ... one more difference I've observed
from bzr, after playing around for a while, involves the commands to
undo changes and commits.  It looks like git reset combines the
capabilities of both bzr uncommit and bzr revert: I can undo changes
since the last commit by resetting to HEAD, and I can undo commits by
resetting to HEAD^ or earlier.

Some things here I'm not quite sure about:
(1) the difference between git reset --soft and git reset --mixed,
probably because I don't understand the way the index works, the
difference between changed, updated and committed.
(2) How to remove changes made to an individual file since the last commit.

Last, could someone explain the git merge command?  git pull seems to do
many things which I would need to use bzr merge for---I can "pull"
between branches which have diverged, for example.  I don't understand
quite what git merge does that's different, and when to use one or the
other.

Many thanks again to everyone,

    -- Joe
-

From: Linus Torvalds
Date: Tuesday, November 28, 2006 - 8:51 pm

Think of it this way: if the _patch_ looks like it's a code movement, then 
"git blame" will show it as a code movement. Ie, if the patch (to a human) 
looks like it's moving a function from one file into another (which in a 
patch will obviously be a question of removing it from one file, and 
adding it to another), then git will also see it that way, and then "git 
blame" will also follow its history as it moved.

But if somebody sends you a patch that just adds a new function that 
didn't exist in that context at all, then "git blame" won't ever realize 

Actually, I think you found a bug.

Now, with small changes, "git blame -C" will just ignore copies entirely, 
so your particular test might not have even been supposed to work, but 
trying with a new git repo with two bigger files checked in at the initial 
commit, I'm actually not seeing "git blame -C" do the right thing even for 
real code movement.

And the problem seems to go to the "root commit": if the file existed in 
the root, the logic in "git blame" to diff against the (nonexistent) 
parent of the root commit won't do the right thing, and that just confuses 
git blame entirely.

I think Junio screwed up at some point. I'll send him a bug-report once 
I've triaged this a bit more, but I can recreate your breakage if I start 
a new git database and create two files in the root, and move data between 
them in the second commit (but if I instead create the second file in the 
second commit, and do the movement in the third commit, git blame -C works 

I'm not quite sure what "bzr revert" does. Git does have a "revert" too, 
but it will append a _new_ commit that actually undoes the commit you're 
asking to revert. If you want to just "undo history" (whether it's one 
commit or many - I don't see why it would be different) then yes, "git 
reset" is the thing to use.
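
The difference between the two can be pictured with a toy model in Python. This is only a sketch of the idea, not of git's internals: history is a plain list, each commit's effect is a made-up integer, and the helper names are hypothetical:

```python
# Toy history: a list of (commit_name, net_change) pairs, oldest first.
def state(history):
    """The resulting tree state is the sum of all recorded changes."""
    return sum(change for _, change in history)

def revert(history, name):
    """Append a NEW commit that applies the opposite of an earlier one."""
    change = dict(history)[name]
    return history + [(f"revert-{name}", -change)]

def reset(history, keep):
    """Move the branch tip back, dropping later commits from the branch."""
    return history[:keep]

history = [("c1", +5), ("c2", +3), ("c3", -1)]

reverted = revert(history, "c3")   # history grows by one commit
rewound = reset(history, 2)        # history shrinks by one commit

print([name for name, _ in reverted], state(reverted))
print([name for name, _ in rewound], state(rewound))
```

Both end up with the same contents, but only the reset makes the unwanted commit disappear from the branch, which is why it is the one to use for "undoing history" that nobody else has seen yet.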

I _suspect_ that bzr people use "uncommit" to undo a commit in order to 
fix it up. In git, you could do that with "git reset" and a new ...
From: Joseph Wakeling
Date: Wednesday, November 29, 2006 - 5:17 am

Obvious when I think about it, otherwise every 'int i;' in the kernel

Actually my setup was like the latter situation you describe, so blame
was probably working fine and just ignoring the small change.  But
serendipity is a wonderful thing. :-)

    -- Joe

-

From: Linus Torvalds
Date: Wednesday, November 29, 2006 - 9:39 am

Indeed. We didn't do that heuristic originally, and the most common 
sequence that was "blamed" on being copied from somewhere else was 
something like the string

	"<tab><tab><tab>}<nl><tab><tab>}<nl><tab>}<nl>"

which is obviously very common in C, especially when you have coding 

Yeah. As it turns out, the bug was really that "git blame" ended up just 
not showing the filenames (that it had followed correctly), because it had 
decided (incorrectly) that they weren't interesting because it all came 
from the same commit, and it had already shown that commit (just not that 
_file_ in that commit).

So it's fixed now, and probably would never trigger except for the stupid 
special case that was "let's just show an example of this" ;)

		Linus
-

From: Joseph Wakeling
Date: Thursday, November 30, 2006 - 11:24 am

I'm very happy my stupidity could help. ;-)

On a related note ...


I do think that bzr has quite an intuitive set of commands, and it is
easy to learn, though at this point I don't feel git is really *that*
much more difficult in itself.  Although the terminal output for some
problems could be improved, most of my difficulties are stemming from
overlap of command names when the commands themselves do different
things, and the fact that git's documentation is somewhat more technical
than bzr's.

What would be nice would be to have in the documentation a whole bunch
of stupid examples for the main commands, something where someone can
create a repo from scratch, create and modify some simple files
according to instructions, and see the particular command in action.
The tutorials do this, of course, but only for a few cases, when to be
honest it's the more complex commands that most need such explanation.
For beginners, especially less technically skilled ones, it would be
good to have a lot more of, "Do this, here's what git will respond, this
is what it means, here's how to fix it...."

As a relatively non-technical user, perhaps I should keep track of my
difficulties (and others') and try to write something up.

    -- Joe
-

From: Linus Torvalds
Date: Thursday, November 30, 2006 - 11:44 am

100% agreed. A lot of the man-pages etc have been written to be about the 
technology, not about the _use_ of it.

I encouraged people at some point to add an "Examples" section to some of 
the functions to show what it all _means_, so for "man git-log", I think 
some of the most useful stuff is that examples section that shows the 
combination of revision naming and path-name limiting, for example. I 
personally think that that is a much better way of teaching people what 
the commands actually do than by mentioning the arguments one by one.

But that only exists for a couple of man-pages, and mostly for the simple 
ones at that. And a lot of the real examples would need "real data" to 
work on, so it can't easily be done as a trivial example in a man-page, it 
really needs a tutorial to "build up" to the situation where you can then 

Yeah. The git "tutorial.txt" should be extended, and preferably be a whole 
nice set of "follow along with the bouncing ball" kind of web-page 
sequence.

So I absolutely agree. It's just that at least me personally, I just can't 
write documentation. I wrote some of the original tutorial, I've written 
some of the original tech docs, but I just can't get into the whole 
"document it" mindset, especially not from a user perspective. It doesn't 
float my boat, and judging by a lot of the discussions, I obviously also 
don't even see why something could _possibly_ cause confusion.

To make things worse, a lot of the docs (and by that I also mean some of 
the error messages and helpful hints) tend to be old.

The whole fact that "git commit" mentions "git update-index" is exactly 
that kind of thing: it's largely a legacy message. You'd almost never 
actually _use_ git-update-index itself these days, and it's much more 
convenient to just list the files you want to commit to "git commit" 
directly (or just use the -a flag, if that is what you want to do).

But that message exists, because it was written in an earlier age.

		Linus
-

From: Carl Worth
Date: Thursday, November 30, 2006 - 12:55 pm

Here's a crazy idea. How about a "git tutorial" builtin or "git
example" or something that would create a repository into some useful
state for demonstrating something.

I know that I'm regularly putting stuff into emails like:

	mkdir gittest
	cd gittest
	git init-db
	echo hello > hello
	git add hello
	git commit -m "add hello"
	git checkout -b other
	echo other > other
	git add other
	git commit -m "add other"
	git checkout master

	# OK, that was just setup, here's what I want to demonstrate
	git pull . other
	...

So maybe if there was a command to set up a standard example
repository, ("git boilerplate", "git sandbox", "git playground" ?),
then the documentation could use that to have full-fledged examples
without having to duplicate similar setup each time.

And then there could be a way for this command to also spit out the
commands it is using to reach some state so it could even serve as a
sort of self-documenting tutorial of some sort.
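
A minimal sketch of that idea, written here in Python rather than as a git builtin. The `play` helper, the `STEPS` table and the one-line explanations are hypothetical names and text made up for illustration; only the commands themselves come from the setup quoted above. By default it just prints the commands it would run.

```python
import subprocess

# Commands taken from the setup quoted above; the explanations are
# illustrative additions, not text from any git documentation.
STEPS = [
    ("git init-db", "create an empty repository"),
    ("echo hello > hello", "create a file"),
    ("git add hello", "tell git to track it"),
    ('git commit -m "add hello"', "record the first commit"),
    ("git checkout -b other", "create and switch to branch 'other'"),
    ("echo other > other", "create a second file"),
    ("git add other", "track it as well"),
    ('git commit -m "add other"', "commit on 'other'"),
    ("git checkout master", "return to 'master'"),
]


def play(steps, workdir=None, dry_run=True):
    """Print each command with its explanation; run it unless dry_run.

    Returns the list of transcript lines that were printed.
    """
    transcript = []
    for command, why in steps:
        line = f"$ {command}    # {why}"
        print(line)
        transcript.append(line)
        if not dry_run:
            # Runs through the shell so that redirections like '>' work.
            subprocess.run(command, shell=True, cwd=workdir, check=True)
    return transcript


if __name__ == "__main__":
    # Dry run: only show what would be executed.
    play(STEPS)
```

To actually execute the steps, pass a scratch directory as `workdir` and `dry_run=False`. That assumes git is installed, a committer identity is configured, and the initial branch is named `master`.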

Anyone interested in exploring something like that?

-Carl
From: Johannes Schindelin
Date: Thursday, November 30, 2006 - 3:17 pm

Hi,


That sounds fine! Actually, it should be very simple to turn the tutorial 
into such a script, displaying the command with an explanation, and 
executing the command. It could even call gitk from time to time, so the 
user can form a mental model of the ancestor graph.

Ciao,
Dscho

-

From: J. Bruce Fields
Date: Thursday, November 30, 2006 - 3:24 pm

Currently tutorial.txt doesn't work like that--there are places where it
just tells the user to edit a file, or make a few commits, without
listing commands to do so.  It also isn't linear.  That could all be
"fixed", but I think the result would just make it more tedious.

But I agree that a "git tutorial" command to set up a canonical example
repository might be fun.

--b.
-

From: Junio C Hamano
Subject: Re: git blame
Date: Thursday, November 30, 2006 - 3:38 pm

Doesn't one of our existing t/ scripts do that?

-

From: Johannes Schindelin
Subject: Re: git blame
Date: Thursday, November 30, 2006 - 3:53 pm

Hi,


;-) I did not forget... t1200-tutorial.sh

But it serves a different purpose: it makes sure that we did not break the 
commands in the tutorial. (I fear that the script and the tutorial have 
diverged a little bit, though).

git-tutorial should not test that, rather it should show the user what is 
possible, and encourage playing with git.

Ciao,
Dscho

-

From: zindar
Date: Tuesday, November 28, 2006 - 5:10 am

usage: bzr annotate FILENAME
aliases: ann, blame, praise

Show the origin of each line in a file.


/Erik
-

From: Nicholas Allen
Date: Thursday, November 30, 2006 - 5:36 am

I also have a basic question about git regarding its content tracking 
and merging.

Does this mean if I have, for example, a large C++ file with a bunch of 
methods in it and I move one of the methods from the bottom of the file 
to the top and in another branch someone makes a change to that method 
that when I merge their changes git will merge their changes into the 
method at the top of the file where I have moved it?

If so that would be really quite impressive!

Cheers,

Nick


-

From: Johannes Schindelin
Date: Thursday, November 30, 2006 - 5:47 am

Hi,


As of now, no, it does not. This is a shortcoming of RCS merge, which does 
the heavy lifting.

Having said that, stay tuned for new developments: the functionality of 
merge is being integrated in git. This opens the door to make use of the 
code tracking support in git, to do exactly what you just proposed.

Ciao,
Dscho

-

From: Linus Torvalds
Date: Thursday, November 30, 2006 - 9:45 am

Right now (and in the near future), nope. "git blame" will track the 
changes (so the pure movement wasn't just an addition of new code, but 
you'll see it track it all the way down to the original), but "git merge" 
is still file-based.

In other words, "git merge" does use a data similarity analysis that 
could be used for smaller chunks than a whole file, but at least for now 
it does it on a file granularity only (and then passes it off to the 
standard RCS three-way merge on a file-by-file basis).

That said, if the movement happens _within_ a file, then just about any 
SCM could do what you ask for, by just using something smarter than the 
standard 3-way merge. So that part isn't even about tracking data across 
files - it's just about a per-file merge strategy.

The "track data, not files" thing becomes more interesting when you factor 
out a file into two or more files, and can continue to merge across such a 
code re-filing event. Git can do it for "annotate", but doesn't do it for 

Indeed, and it's one of the potential future goals that was discussed very 
early in the git design phase. The point of _not_ doing file ID tracking 
is exactly that you can actually do better than that by just tracking the 
data.

So some day, we may do it. And not just within one file, but even between 
files. Because file renames really are just a very specific special case of 
data movement, and I don't think it's even the most common case.

That said, there are several reasons why you might not actually _ever_ 
want it in practice, and why I say "potential future goal" and "we may do 
it". I think this is going to be a matter of not just writing the 
code (which we haven't done), but also deciding if it's really worth it.

Because merges are things where you may not want too much smarts:

 - Quite often, a failed merge that needs manual fixup may even be 
   _preferable_ to a successful merge that did the merge "technically 
   correctly", but in an unexpected ...
From: Andreas Ericsson
Date: Thursday, October 26, 2006 - 2:50 am

Yes. Because each commit contains parent revision id's, which in turn 
contain *their* parent revision id's, which in turn..., you know you 
have exactly the same revision, code, and history leading up to that 
revision. You may have other revisions on top or on other branches, but 
all commits, including merge-points and whatnot, leading to that 

Merges preserve author and commit info. You may need to create a new 
branch (a git branch, the cheap kind which is a 41-byte file) and fetch 
"his" into "yours". This will be very cheap if you both have the same 
code but not the same history, as everything but a few commit-objects 
will be shared. A more likely scenario though is this:

Bob writes a feature that doesn't work as per spec. He doesn't know why.
He asks Alice to have a look, so he communicates the commits to her by 
"please pull this branch from here", or by sending patches and telling 
Alice the branch-point revision to apply them to.
Alice creates the "bobs-bugs/nr1232" branch at the branch-point and fetches 
Bob's branch into that or applies the patches on top of that (in the 
fetch scenario she wouldn't need to know the branch point, since git 
would figure this out for her).
She knows this should create a revision named 00123989aaddeddad39, so if 
it doesn't, she doesn't have the same code.
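
A toy illustration of why a matching revision name implies matching history. This is a sketch, not git's actual object format: real commit ids also cover the tree, author, dates and message, and the `commit_id` and `build_history` helpers are hypothetical. The only point is that each id is a hash over the parent ids plus the content, so a change anywhere in the ancestry changes the tip.

```python
import hashlib


def commit_id(content, parent_ids):
    """Toy commit id: SHA-1 over the parent ids followed by the content."""
    h = hashlib.sha1()
    for parent in parent_ids:
        h.update(b"parent " + parent.encode() + b"\n")
    h.update(content.encode())
    return h.hexdigest()


def build_history(changes):
    """Chain a list of changes into a linear history; return the tip id."""
    tip = None
    for change in changes:
        tip = commit_id(change, [tip] if tip else [])
    return tip


bob = build_history(["base", "feature", "bugfix"])
alice_same = build_history(["base", "feature", "bugfix"])
alice_other = build_history(["base", "feature v2", "bugfix"])

# Identical content and history give the same tip id.
print(bob == alice_same)
# A difference in an earlier commit changes the tip id as well.
print(bob == alice_other)
```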


I imagine this works roughly the same in bazaar, although the original 
case where tests have already been done and the testers wanted to know 

"assume" != "know", or was that just sloppy phrasing?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Matthieu Moy
Date: Wednesday, October 25, 2006 - 2:57 am

Two things differ in bzr and git, here:

* bzr doesn't do "autocommit" after a merge. So, new revisions are
  created only if you use "commit".

* bzr has two commands, "pull" and "merge". "pull" just does what the
  git people call "fast-forward", and only this (it refuses to do
  anything if the branches diverged). In particular, you never have to
  commit after a pull (well, except if you had some local, uncommitted
  changes). "merge" changes your working directory, and you have to
  commit after. "merge" will never do fast-forward, it will never
  change the revision to which your working tree refers, and it's
  your option to commit or not after (if you see that it introduces no
  changes, you might not want to commit).

The final rule in bzr would be "you create an extra commit each time
you commit" ;-).

As a side-note, it could be interesting to have a git-like merge
command (choosing automatically between merge and pull), probably not
in the core, but as a plugin.

-- 
Matthieu
-

From: Aaron Bentley
Date: Saturday, October 21, 2006 - 1:05 pm


So I'd say that revnos without the context of a location can only refer
to the current branch that the user is working on.  They don't refer to
the mainline, which typically has its own numbers that don't match the
user's.

If you're saying that bzr is "centralized" in that the user's current

Right.  You need something guaranteed to be unique.  It's the revno +
url combo that is unique.  That may not be permanent, but anyone can

No.  It would be silly for the losing side to publish a mirror of the
winning branch at the same location where they had previously published
their own branch.  So the old number + URL combination would remain valid.

If the losing faction decided to maintain their own branch after the
merge, they'd have two options

1. continue to develop against the losing "branch", without updating its
numbers from the "winning" branch.  It would be hard to tell who had won
or lost in this case.

2. create a new mirror of the "winning" branch and develop against that.
 I'm not sure what the point of this would be.

I think the most realistic thing in this scenario is that they leave the
"losing" branch exactly where it was, and develop against the "winning"

Right.  This is a difference between Bazaar and Git that I'd
characterize as being "branch-oriented" vs "repository-oriented".  We'll

I got the impression there was also a local ordering of revisions.  Is
that wrong?

A Bazaar branch is a directory inside a repository that contains:
 - a name referencing a particular revision
 - (optional) the location of the default branch to pull/merge from
 - (optional) the location of the default branch to push to
 - (optional) the policy for GPG signing
 - (optional) an alternate committer-id to use for this branch
 - (optional) a nickname for the branch
 - other configuration options

A Bazaar branch doesn't contain any commit objects ("revisions" in
Bazaar parlance).  Those are retrieved from the ...
From: Jakub Narebski
Date: Saturday, October 21, 2006 - 1:48 pm

No, there is no such thing as a local ordering of revisions.

Each revision (commit) has a link to its parent(s). Technically, a branch
is just a reference to a particular commit object. The commit itself
gives us a sub-DAG of the DAG of the whole history: the DAG of all
ancestors of said commit. The lineage of the commit pointed to by a branch
is conceptually the branch; i.e. a branch is a DAG of development (not a
line of development, as the first parent has no special meaning).

In a git repository you can also have a reflog, which records the values
of the branch-as-reference, i.e. the tip of the branch-as-named-lineage.
But for example fetch and fast-forward 5 commits in history is
Erm, wasn't revno to revid mapping also part of bzr "branch"?

We store configuration per repository, not per branch, although
there is some branch specific configuration.


Workingtree:
~/


Gaah, it's even more inconvenient. Certainly more than using name

Is there a command to list all branches in bzr? Is there a command

That's the opposite of the git view. In git, the working area is associated with
repository (clone of repository), not branch. We copy whole repositories

Which shells? As I understand it, '^' was chosen (for example as the
NOT operator for specifying a sub-DAG, instead of '!') because it causes
no problems with shell expansion. And considering that many git commands
are/were written in shell, one certainly would have noticed that.

-- 
Jakub Narebski
Poland
-

From: Edgar Toernig
Date: Saturday, October 21, 2006 - 3:52 pm

In the traditional Bourne shell ^ is an alias for the pipe symbol |.

Ciao, ET.
-

From: Aaron Bentley
Date: Saturday, October 21, 2006 - 4:39 pm



It's not part of the conceptual model.  The revno-to-revid mapping is
done using the DAG.  The branch just tracks the head.

The .bzr/branch/revision-history file is from an earlier model in which
branches had a local ordering.  Nowadays, it can be treated as:
 - a reference to the head revision

The notation was that ~/repo would contain the .git directory for the

Of course if you have a copy of bzr.dev on your computer, you don't need
to type the full URL.  It's just like the 'merge ../b' above.

But how can you use the branch name of a branch that isn't on your
computer?  I suspect git requires a separate 'clone' step to get it onto



Sorry, it's been quite a long time since people complained at me for
using ^, so I don't remember.  Perhaps Edgar is right about it being the
pipe character in old shells.

Aaron
-

From: Carl Worth
Date: Saturday, October 21, 2006 - 5:04 pm

No. You can merge a branch from a remote repository in a single step:

	git pull http://example.com/git/repo branch-of-interest

But if you want to do something besides (or before) a merge, (for
example, just explore its history, do some diffs etc.) then you would
fetch it instead, assigning it a local branch name in the process:

	git fetch http://example.com/git/repo branch-of-interest:local-name

After which "local-name" is all one would need to use. So after a
fetch like the above, the equivalent of "bzr missing --theirs-only"
would be:

	git log ..local-name

[This shows some of the expressive power of git revision
specifications. There's no need for a separate "missing" command. It's
just one case of viewing a particular subset of the DAG. And the
specification language makes almost all interesting subsets easy. The
--mine-only specification would be "local-name.."]
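
To make the ".." notation concrete, here is a toy model of it: a sketch over a hand-written parent map, not how git implements revision walking. The commit names and the `ancestors` and `rev_range` helpers are hypothetical. The range "X..Y" is simply everything reachable from Y minus everything reachable from X.

```python
def ancestors(parents, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def rev_range(parents, exclude, include):
    """Toy 'exclude..include': in include's history but not in exclude's."""
    return ancestors(parents, include) - ancestors(parents, exclude)


# Hypothetical history: A - B - C (HEAD) and A - B - D - E (local-name).
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

theirs_only = rev_range(parents, "C", "E")  # like: git log ..local-name
mine_only = rev_range(parents, "E", "C")    # like: git log local-name..
print(sorted(theirs_only))
print(sorted(mine_only))
```

With this history the first range contains D and E and the second contains only C, which matches the "theirs only" and "mine only" readings above.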

And beyond what bzr missing does (I believe) it's easy to also see the
patch content of each commit with:

	git log -p ..local-name

And then if everything is happy, one could merge that branch in:

	git pull . local-name

(And, yes, it is the case that "pull" with a repository URL of "." is
how merging is done. It's bizarre to me that this is not "git merge
local-name" instead. There actually _is_ a "git merge" command that
could be used here, but it is somewhat awkward to use, (requiring both
a commit message (without the -m of git-commit(!)) and an explicit
mention of the current branch). So using it would be something like:

	git merge "merge of local-name" HEAD local-name

I've never claimed that git is completely free of its UI
warts---though there are fewer now than when I started using it.)

But, yes, the notion in git is to bring things in to the current
repository and then work with them locally. This has an advantage that
network traffic is spent only once if doing multiple operations, (say
the three steps shown above: 1) investigate commit messages, 2)
investigate patch content, ...
From: Jakub Narebski
Date: Saturday, October 21, 2006 - 5:14 pm

In git the DAG is a DAG of parents. There are no "child" links. So it is natural
to refer to n-th ancestor of given commit (in git <ref>~<n>, in bzr -<m>).

To have incrementing (from 1 for the first revision on a given branch)
revision numbers you either have to have links to "children", which
automatically means that revisions cannot be immutable if branching at an
arbitrary revision is to be allowed, or to traverse the DAG there and back
again (perhaps with a cache of the revno-to-revid mapping to help
performance).
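
A small sketch of the difference, using a hand-written parent map. This is a toy model, not the code of either tool; the commit names and both helpers are hypothetical, and following the first parent is an assumption of the sketch. Going back n steps from a commit only needs the parent links, while numbering revisions from 1 needs a walk over the whole chain.

```python
def nth_ancestor(parents, commit, n):
    """Toy 'commit~n': follow the first parent n times."""
    for _ in range(n):
        commit = parents[commit][0]
    return commit


def revno_map(parents, tip):
    """Number the first-parent chain ending at tip, oldest commit = 1.

    The walk visits every commit on the chain, so it is linear in history.
    """
    chain = []
    commit = tip
    while commit is not None:
        chain.append(commit)
        commit_parents = parents[commit]
        commit = commit_parents[0] if commit_parents else None
    chain.reverse()
    return {number: c for number, c in enumerate(chain, start=1)}


# Hypothetical history: A - B - M - C, where merge M also has parent X.
parents = {"A": [], "B": ["A"], "X": ["A"], "M": ["B", "X"], "C": ["M"]}

print(nth_ancestor(parents, "C", 2))
print(revno_map(parents, "C"))
```

Here `C~2` resolves to B in two steps, whereas the revno table (1 for A up to 4 for C) has to be rebuilt from the tip whenever the chain changes, unless it is cached.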

Additionally, to have incrementing revision numbers you have to remember
which part of the DAG is our branch, i.e. which parent in a merge to choose
to follow. Bazaar-NG decides here to distinguish the first parent; to keep
the first parent immutable it doesn't use fast-forward and always uses merge, sometimes

The default layout of "clothed" repository is

 Repository:
 ~/repo/.git/

 Branches:
 ~/repo/.git/refs/heads/

 Workingtree:

No, as was said in other messages in this thread, you can fetch
a branch (or branches), even from a repository other than the one you
cloned from, into a given branch (or branches). For git it would be
  $ git fetch <URL> <remotebranch>:<localbranch>
You would probably want to save the above info in a remotes file or in the config.
For cg (Cogito) it would be
  $ cg branch-add <localbranch> <URL>#<remotebranch>
  $ cg fetch <localbranch>

In git you always use names like 'master', 'next', 'HEAD' (meaning current
branch) and also HEAD^, next~5 when comparing branches, viewing history,
merging branches, switching to branch etc. Not '../master'...

-- 
Jakub Narebski
Poland
-

From: Sean
Date: Saturday, October 21, 2006 - 1:53 pm

On Sat, 21 Oct 2006 16:05:18 -0400

Of course it works as long as you accept the implicit requirements of
supporting them and ignore the cases where they change out from
underneath the user.  But as soon as users want to embrace distributed
models where there isn't a central shared repo, at best revno's are
unhelpful and at worst they are counterproductive.  The proof of this
is that if revno's were sufficient bzr wouldn't need revid's.

Since the utility provided by revno's seems so minimal even in the
case where they do work, Git simply doesn't bother with them.  And
"our" experience is that Git really does work well without them.

Sean
-

From: Linus Torvalds
Date: Saturday, October 21, 2006 - 2:10 pm

Yes. This really is what it boils down to.

The _only_ time you actually use revision numbers (as opposed to 
branch-names or tag-names) is when you want a _stable_ number.

It's that simple. You never really need a revision number otherwise. In 
other situations, you do things like 

	git log --since=2.days.ago
	gitk v2.6.18..
	git diff --stat --summary ORIG_HEAD.. 

or whatever. It's clearly not "stable", but it's also clearly not a 
revision number from a UI perspective.

When you want a revision number is _exactly_ when you're moving things 
between branches, or reporting a bug to somebody else, or similar. And 
that's also _exactly_ when you want the number to be stable and meaningful 
(ie the other end should be able to rely on the number).

And if you need to refer to a central repository to do that, it's clearly not 
distributed. Not needing such a central reference point is what the word 
"distributed" _means_ in computer science for chrissake!

			Linus
-

From: Jan Hudec
Date: Sunday, October 22, 2006 - 12:45 am

But it is *not* *distributed*. The definition of a distributed system
requires, among other things, that resource identifiers are independent of
the location of the resources. So only using the revision-ids is really

I regularly use bzr and I never used git. But I'd not hesitate a second
to pull --overwrite over the old location. Because the url has a meaning
"the base I develop against" for me and I'd want to preserve that

This, on the other hand, is one of the things I like better in bzr than in git.
Because it is really branches and not repositories that I usually care
about.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>
-

From: Jakub Narebski
Date: Sunday, October 22, 2006 - 2:05 am

>>>> repository. 
From: zindar
Date: Sunday, October 22, 2006 - 2:56 am

Why not? I think it really does.  And due to the fact that merges are
merges and will show up as such, I think it's very suitable for
feature branches.

In fact, in the development of bzr itself, all commits are done
in feature branches and then merged into bzr.dev (the main "trunk" of
bzr) when they are considered stable.

Consider the following
bzr branch mainline featureA
cd featureA
hack hack; bzr commit -m 'f1'; hack hack; bzr commit -m f2; etc
Now I want to merge in mainline again
bzr merge ../mainline; bzr commit -m merge
hack hack; bzr commit -m f3; hack hack; bzr commit -m f4; etc

Right now, I would have something like this in the branch log
-----------------------------------------------------------------
committer: Erik B
From: Jakub Narebski
Date: Sunday, October 22, 2006 - 6:23 am

I think I haven't properly explained what "feature branch" means.
A "feature branch" is a short (or medium) lived branch, created for
the development of one isolated feature. When the feature reaches a
stable stage, we merge the feature branch and forget about it. We are
not interested in the fact that a given feature was developed on a given
branch. BTW, in the published git.git repository, for example, feature
branches are only available in the form of the "digest" 'pu' (proposed
updates) branch.

I guess what you are talking about are long lived "development
branches" (like git.git 'maint', 'master', 'next' and 'pu' branches),
or perhaps long lived another user's clone of given git repository.

Git considers clones of a given repository totally equivalent,
and considers the fast-forward property more important than remembering
"which branch (which clone) did this commit come from" or at least
"this commit is from this (current) branch-clone".

You have graphical history viewers (bzr has its own: bzr-gtk),
committer and author info, and reflog if enabled if you really,
Which if I remember correctly (at least by default) needs and generates

As was clarified during this long discussion, bzr "branches" are
something between git branches and one-branch [local] clones.
Can you, for example, create a branch starting from an arbitrary revision,
not only from the tip of a branch?

The above sequence of operations can be done in (at least) two different
ways in git.

Less used:
 $ cd /somewhere/else
 $ git clone -l -s <mainrepo>/.git featureA
 $ cd featureA
 $ hack; hack; git commit -a -m "f1"; hack; hack; git commit -a -m "f2"; etc   
 $ cd <mainrepo>
 $ git pull /somewhere/else/featureA/.git
 (this does commit and merge)

But more commonly used is:
 $ git branch featureA mainline
 $ git checkout featureA
 $ hack; hack; git commit -a -m "f1"; hack; hack; git commit -a -m "f2"; etc
 $ git checkout mainline
 $ git pull . featureA
The automatic merge message takes care of this, if we enable
merge.summary config option. For ...
From: Carl Worth
Date: Sunday, October 22, 2006 - 7:25 am

At Sun, 22 Oct 2006 11:56:32 +0200, "Erik Bågfors"

Thanks for sharing this example. I think when we look at concrete
things that the tools actually let you do, we have a better
conversation. Plus, this example highlights some very interesting
differences between the tools.

So here is a complete sequence of git commands to construct the
scenario (even the extra hacking in mainline):

	mkdir gittest; cd gittest
	git init-db
	touch mainline; git add mainline; git commit -m "Initial commit of mainline"
	git checkout -b featureA
	touch f1; git add f1; git commit -m f1
	touch f2; git add f2; git commit -m f2
	git checkout -b mainline master
	touch sd; git add sd; git commit -m "something done in mainline";
	touch se; git add se; git commit -m "something else done in mainline";
	git checkout featureA
	git pull . mainline
	touch f3; git add f3; git commit -m f3
	touch f4; git add f4; git commit -m f4

For reference, here's the same with bzr:

	mkdir bzrtest; cd bzrtest
	bzr init-repo . --trees
	bzr init mainline; cd mainline
	touch mainline; bzr add mainline; bzr commit -m "Initial commit of mainline"
	cd ..; bzr branch mainline featureA; cd featureA
	touch f1; bzr add f1; bzr commit -m f1
	touch f2; bzr add f2; bzr commit -m f2
	cd ../mainline/
	touch sd; bzr add sd; bzr commit -m "something done in mainline"
	touch se; bzr add se; bzr commit -m "something else done in mainline"
	cd ../featureA
	bzr merge ../mainline/; bzr commit -m "merge"
	touch f3; bzr add f3; bzr commit -m f3
	touch f4; bzr add f4; bzr commit -m f4

[As has recently been pointed out, the tools really are more the same

OK. So here is a difference in the tools. With git, you don't get the
indentation for the "non-mainline" commits. This is because git
doesn't recognize any branch in the DAG to be more significant than
any other. Instead, git provides a flat, and (heuristically)
time-sorted view of the commits. (It's heuristic in that git just uses
the time stamps in ...
From: Jakub Narebski
Date: Sunday, October 22, 2006 - 7:55 am

Carl Worth wrote:
> Erik B
From: zindar
Date: Sunday, October 22, 2006 - 7:48 am

Thanks for this mail, this makes me happy to see. The tools are pretty
much the same but have somewhat different views on how to do things.


If I understand you correctly, you'll get the same thing with "bzr missing".

$ bzr missing ../mainline/
You have 1 extra revision(s):
------------------------------------------------------------
revno: 2
committer: Erik B
From: Jakub Narebski
Date: Sunday, October 22, 2006 - 8:04 am

From: Matthew D. Fuller
Date: Sunday, October 22, 2006 - 11:53 am

On Sun, Oct 22, 2006 at 07:25:41AM -0700 I heard the voice of

This throws me a little.  I'd expect it to Just Do It when it's
fast-forwarding, but if it's doing a merge, I'd prefer it to stop and
wait before creating the commit, even if there are no textual
conflicts.  I realize you can just look at it afterward and back out

Every branch has a nickname, settable with 'bzr nick' (defaulting to
whatever the directory it's in is), and that's stored as a text field
in each commit.  It's mostly cosmetic, but it's handy to see at a


From what I can gather from this, though, that means that when I merge
stuff from featureA into mainline (and keep on with other stuff in
featureA), I'll no longer be able to see those older commits from this
command.  And I'll see merged revisions from branches other than
mainline (until they themselves get merged into mainline), correct?
It sounds more like a 'bzr missing --mine-only' than looking down a

The branch: (head) and ancestor: (latest common rev) revspecs let you
refer to the respective bits of other branches, which I think would

Well, what would be the fun in that?   8-}


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: Jakub Narebski
Date: Sunday, October 22, 2006 - 12:27 pm

Or you can use --no-commit option to git pull, and commit later.
But it is true that you can always amend the commit with

If I remember correctly, Linus argued against it, because a branch
name is something local to a repository (the most common example is
"my 'master' is your 'origin'").

There was a proposal for a "note" header for notes like the merge algorithm
used, or the branch name, visible only in 'raw' mode, but it wasn't 

That's true. That is what history viewers (gitk, qgit, tig,
gitview, git-show-branch, git-browser) are for.

And there is always reflog (if you enable it, of course).
-- 
Jakub Narebski
Poland
-

From: David Lang
Date: Monday, October 23, 2006 - 9:57 am

one thing you are missing: 'mainline' in this git command is not saying 
'everything that's in the 'main' published branch'. it's saying 'everything 
reachable from the tag 'mainline''

so when you branched off for your feature development you could set a tag that 
says 'branchpoint' and no matter what gets merged in mainline after that you can 
always do branchpoint..featureA and find what you've done.

that being said, mainline..featureA is also extremely useful, it tells you what 
development stuff you have done that have not yet been merged into mainline

David Lang
-

From: Linus Torvalds
Date: Monday, October 23, 2006 - 10:29 am

The thing that the bzr people don't seem to realize is that their choice 
of revision naming has serious side effects, some of them really 
technical, and limiting.

I already brought this up once, and I suspect that the bzr people simply 
DID NOT UNDERSTAND the question:

 - how do you do the git equivalent of "gitk --all"

which is just another reason why "branch-local" revision naming is simply 
stupid and has real _technical_ problems.

I really suspect that a lot of people can't see further than their own 
feet, and don't understand the subtle indirect problems that branch-local 
naming causes. 

For example, how long does it take to do an arbitrary "undo" (ie forcing a 
branch to an earlier state) in a project with tens of thousands of 
commits? That's actually a really important operation, and yes, 
performance does matter. It's something that you do a lot when you do 
things like "bisect" (which I used to approximate with BK by hand, and 
yes, re-weaving the branch history was apparently a big part of why it 
took _minutes_ to do sometimes).

Again, this is something that people don't expect to have _anything_ to do 
with revision numbering, but the fact is, it's a big part of the picture. 
If you have branch-local revision numbering, you need to renumber all 
revisions on events like this, and even if it is "just" re-creating the 
revno->"real ID" cache, it's actually an expensive operation exactly 
because it's going to be at least linear in history.

One of the git design requirements was that no operation should _ever_ 
need to be linear in history size, because it becomes a serious limiter of 
scalability at some point. We were seeing some of those issues with BK, 
which is why I cared.

So in git, doing things like jumping back and forth in history is O(1). 
Always (with a really low constant cost too). Of course, checking out the 
end result is then roughly O(n), but even there "n" is the size of the 
_changes_, not number of revisions or number of ...
From: Matthew D. Fuller
Date: Monday, October 23, 2006 - 3:21 pm

On Mon, Oct 23, 2006 at 10:29:53AM -0700 I heard the voice of

I for one simply DO NOT UNDERSTAND the question, because I don't know
what that is or what I'd be trying to accomplish by doing it.  The

I don't understand the thrust of this, either.  As I understand the
operation you're talking about, it doesn't have anything to do with a
branch; you'd just be whipping the working tree around to different

I agree, and I currently find a number of places bzr doesn't hit the
level of performance I think it should.  I'm not convinced, however,
that any notable proportion of that has to do with the abstract model
behind it.  And insofar as it has to do with the physical storage
model, that can easily be (and I'm confident will be, considering it's

I consider it a _technical_ sign of a way of thinking about branches I
prefer   8-}


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: David Lang
Date: Monday, October 23, 2006 - 3:28 pm

on many modern VCS systems it's O(n) on the number of changes (start from where 
you are and apply the patch to change it to rev -1, then apply the patch to 
change it to rev -2, etc)

on git it's O(1) (write the new files into place)

David Lang
-

From: Linus Torvalds
Date: Monday, October 23, 2006 - 3:44 pm

gitk (and all other logging functions) can take as its argument a set of 
arbitrary revision expressions.

That means, for example, that you can give it a list of branches and tags, 
and it will generate the combined log for all of them. "--all" is just 
shorthand for that, but it's really just a special case of the generic 
facility.

This is _invaluable_ when you want to actually look at how the branches 
are related. The whole _point_ of having branches is that they tend to 
have common state.

For example, let's say that you have a branch called "development", and a 
branch called "experimental", and a branch called "mainline". Now, 
_obviously_ all of these are related, but if you want to see how, what 
would you do?

In git, one natural thing would be, for example, to do

	gitk development experimental ^mainline

(where instead of "gitk" you can use any of the history listing 
things - gitk is just the visually more clear one) which will show you 
what exists in the branches "development" and "experimental", but it will 
_subtract_ out anything in "mainline" (which is sensible - you may want to 
see _just_ the stuff that is getting worked on - and the stuff in mainline 
is thus uninteresting).

See? When you visualize multiple branches together, HAVING PER-BRANCH 
REVISION NUMBERS IS INSANE! Yet, clearly, it's a valid and interesting 
operation to do.

An equally interesting thing to ask is: I've got two branches, show me the 
differences between them, but not the stuff in common. Again, very simple. 
In git, you'd literally just write

	gitk a...b

(where "..." is "symmetric difference"). Or, if you want to see what is in 
"a" but _not_ in "b", you'd do

	gitk b..a

(now ".." is regular set difference, and the above is really identical to 
the "a ^b" syntax).
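
As a rough illustration of what these revision expressions select, here is a small Python sketch over a toy parent table. It is not how git implements them; `reachable`, `rev_set` and the commit names are made up for the example, and the only thing taken from the text is the set semantics ("^x" subtracts, ".." is set difference, "..." is symmetric difference).

```python
def reachable(parents, tips):
    """Return every commit reachable from any of `tips` via parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def rev_set(parents, include, exclude=()):
    """Commits reachable from `include` minus those reachable from `exclude`."""
    return reachable(parents, include) - reachable(parents, exclude)

# Hypothetical history: "a" and "b" fork from mainline at r2.
#   r1 - r2 - a1 - a2   (branch "a")
#          \
#           b1          (branch "b")
parents = {
    "r1": [], "r2": ["r1"],
    "a1": ["r2"], "a2": ["a1"],
    "b1": ["r2"],
}

both_minus_mainline = rev_set(parents, ["a2", "b1"], ["r2"])  # like "a b ^mainline"
only_a = rev_set(parents, ["a2"], ["b1"])                      # like "b..a" or "a ^b"
sym_diff = reachable(parents, ["a2"]) ^ reachable(parents, ["b1"])  # like "a...b"

print(sorted(both_minus_mainline))  # ['a1', 'a2', 'b1']
print(sorted(only_a))               # ['a1', 'a2']
print(sorted(sym_diff))             # ['a1', 'a2', 'b1']
```

Nothing in the selection depends on which branch a commit was "made on"; only the parent links are consulted.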

And trust me, these are all very valid things to do, even though you're 
talking about different branches.


No. If you "undo", you'd undo the whole history too. And if you undo to a 
point ...
From: Matthew D. Fuller
Date: Monday, October 23, 2006 - 5:26 pm

On Mon, Oct 23, 2006 at 03:44:13PM -0700 I heard the voice of

I have zero problem believing that.  It seems from all accounts a
wonderful swiss-army chainsaw, and while none of that power is useful
to me personally in anything I'm VCS'ing at the moment, I'd feel awful
shiny knowing it was sitting there waiting for me.  All else being
equal, I'd think more highly of a VCS with those capabilities than one
without.

bzr-the-program doesn't have a lot of that capability, and what it
does have is rather more verbose to access.  Perhaps some attribute of
bzr-the-current-storage-model would make some bit of that
significantly more expensive than it has to be (I don't know of any,
and can't think offhand of anywhere it might hide, but that's way off
my turf).

But I don't understand how bzr-the-abstract-data-model makes such
things impossible, or even significantly different than doing so in
git.  In git, you're just chopping off one DAG where another one
intersects it (or similar operations).  To do it in bzr, you'd do...
exactly the same thing.  The revnos, or the mainline, are completely
useless in such an operation of course, but they don't hurt it; the
tool would just ignore them like it does the SHA-1 of files in

I wouldn't be so absolutist about it, but certainly they're of
extremely limited utility if of any at all in such cases.  And yes, it
can be an interesting operation.  But what does that have to do with
using revnos in other cases?  You keep saying "having" where I would

Well, I guess in this particular case I still don't see why you'd
generally undo big hunks of a branch versus just flipping your working
tree to different versions.  But contrived examples are still
examples, and even if so, truncate()'ing a list of numbers is a
constant time operation.  And even if you had to renumber totally...
my $DEITY, I'd expect my old 200MHz PPro to renumber a hundred

Quite frankly, I just don't think you understand that I WANT to care
about first parents.  No, ...
From: David Lang
Date: Tuesday, October 24, 2006 - 8:58 am

one key difference is that with bzr you have to do this chopping by creating the 
branches at the time changes are done, with git you do this chopping after the 
fact when you are displaying the results.

As such you can chop and compare things in ways that were never contemplated by 


nobody is saying that the bzr approach is invalid for your workflow.

what people are saying is that it doesn't easily support a truly distributed 
workflow. this is a very different statement.

your workflow isn't truly distributed so bzr's model works well for you. no 
problem, just don't claim that because you haven't run into any problems with 
your workflow that there are no problems with bzr with other workflows.

David Lang
-

From: Matthew D. Fuller
Date: Tuesday, October 24, 2006 - 9:34 am

On Tue, Oct 24, 2006 at 08:58:56AM -0700 I heard the voice of

HUH?  Why on earth do you think that?

To do this in a git data model, you point at 2 (or 3, or 4, or...)
revisions, anywhere in the revision-space universe.  You derive back a
DAG of the history from each of them by recursing over parent links.
You figure out where (if anywhere) those DAG's intersect.  And based
on that, you alter what and how you display; including or excluding
certain revs, changing the angles of lines or columnation of dots in a
graph, etc.

To do it in a bzr data model, you would follow *EXACTLY* the same

And it's one that carries around a lot of unstated assumptions about
what "truly distributed" means, which *I*'m certainly not
understanding, because any meaning I can apply to the term doesn't
lead me to the conclusions it does you.  Certainly, depending on your
workflow, certain parts of the UI are of lesser utility than they are
in mine, down to and including zero.  And it's probably certain that
some parts of the UI aren't up to handling various workflows, too,
including OUR workflow.  That's kinda what "in development" means...

But that's a very different statement from the claim that they CAN'T
be without changes to the conceptual model underneath.  Just because a
UI is built around maintaining the fiction of a mainline doesn't mean
the system requires it.  All you'd have to do to abandon it is write a
different log formatter that didn't show revnos and didn't nest merge
commits, and change (or add an option to) 'merge' to fast-forward if
possible.  The difference between the views on how the pieces should
fit together really IS just that fine.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: David Lang
Date: Tuesday, October 24, 2006 - 11:03 am

it sounded like you were saying that the way to get the slices of the DAG was to 
use branches in bzr. to do this you need to create the branches with the correct 
info on each branch. this is only practical if the branches are created as the 
changes are made, if you try to do this after the fact you need to create the 
changes in the branch before you do the slicing.

with git you can look at the DAG and pick any arbitrary points in it as points 

the claim isn't that bzr can't be modified to support these other workflows (it 
sounds as if just changing the tools to use the internal revid's rather than the 
current revno's would come very close to solving this problem), it's that the 
current revno's (use of which is strongly encouraged by the current UI) cannot 
support some workflows, and therefore the claim that it supports fully 
distributed workflows as well as git is false

remember that this entire thing started with a feature comparison checklist; 
the definitions of some of the items on the checklist are being questioned.

after that there's the issue of if the VCS in question has the feature.

this discussion started with two topologies

1. Centralized: all commits must go to one repository, connectivity required to check-in 
2. Distributed: everything else

since then one additional topology has been defined, and one has been redefined

1. Centralized: all commits must go to one repository, connectivity required to check-in

2. Star: one repository is 'special' or 'primary' and all other repositories 
sync to this, but development can take place against local repositories, 
connectivity is only requred when syncing the repositories. as updates take 
place the history is defined by the primary repository, and can overwrite or 
change the history as defined by local repositories.

3. Distributed: all repositories are equal (any definition of 'primary' is a 
matter of convention, not a requirement of the tool) development can take place 
against local ...
From: Matthew D. Fuller
Date: Tuesday, October 24, 2006 - 5:27 pm

On Tue, Oct 24, 2006 at 11:03:20AM -0700 I heard the voice of

I'm not entirely sure I understand what you mean here, but I think
you're saying "Nobody's written the code in bzr to show arbitrary

I think this statement arouses so much grumbling because (a) bzr does
support such a lot better than often seems implied, (b) where it
doesn't, the changes needed to do so are relatively minor (often
merely cosmetic), and (c) disagreement over whether some of the

I think there's a real intent for bzr TO support at least all common
topologies.  I'll buy that current development has focused more on
[relatively] simple topologies than the more wildly complex ones.  I
look forward to more addressing of the less common cases as the tool
matures, and I think a lot of this thread will be good material to
work with as that happens.  It's just the suggestion that providing
fruit for simple topologies _necessarily_ prejudices against complex

That's a good enough reason for me.  Before this thread, I wasn't
interested in using git.  I'm still not, but now I understand much
better /why/ I'm not.  And when (I'm sure it'll happen sooner or
later) some project I follow picks up using git, I'll have enough
grounding in the tool's mental model to work with it when I have to.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: David Lang
Date: Wednesday, October 25, 2006 - 3:40 pm

I think we are talking past each other here.

what I think was said was

G 'one feature of git is that you can view arbitrary slices trivially'

B 'bzr can do this too, you just use branches to define the slices'

G 'but this limits you because branches are defined as code is developed, git 
lets you define slices at viewing time'

by the way, I think it's more than just saying 'well, the code could be written 
to do this in $VCS' some decisions and standard ways of doing things can impact 
how hard it is to implement a feature, and some decisions can make it 

one concern that the git people are voicing is that the things that work for 
simple topologies (revno's) can't be used with the more complex ones (where you 
need the revid's). especially the fact that users need to do things 
significantly different when there are fairly subtle changes to the topology.

the scenario that came up elsewhere today where you have

    Master
    /    \
dev1   dev2

and then dev1 and dev2 both start working on the same thing (without knowing 
it), then discover they are working on the same thing. they now have three 
options

1. merge their stuff up to the master so that they can both pull it down.
   but this puts broken, experimental stuff up in the master

2. declare one of the dev trees to be the master

this changes the topology to

Master--dev1--dev2

3. pull from each other frequently to keep in sync.

this changes the topology to

    Master
    /   \
dev1--dev2

if they do this with bzr then the revno's break, they each get extra commits 
showing up (so they can never show the same history).

in git this is a non-issue, they can pull back and forth and the only new 
history to show up will be changes.
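
a small Python sketch of why the numbers diverge in option 3. this is a toy model, not bzr code: it assumes revnos are assigned by counting along the first-parent chain, and the commit names and `mainline_revnos` helper are invented for the example.

```python
def mainline_revnos(parents, tip):
    """Number commits 1..n along the first-parent chain ending at `tip`."""
    chain = []
    cur = tip
    while cur is not None:
        chain.append(cur)
        ps = parents[cur]
        cur = ps[0] if ps else None
    chain.reverse()
    return {c: i for i, c in enumerate(chain, start=1)}

parents = {
    "m1": (), "m2": ("m1",),
    "x": ("m2",),            # dev1's work
    "y": ("m2",),            # dev2's work
    "merge1": ("x", "y"),    # dev1 merges dev2 (dev1's side is first parent)
    "merge2": ("y", "x"),    # dev2 merges dev1 (dev2's side is first parent)
}

dev1 = mainline_revnos(parents, "merge1")
dev2 = mainline_revnos(parents, "merge2")
print(dev1)  # {'m1': 1, 'm2': 2, 'x': 3, 'merge1': 4}
print(dev2)  # {'m1': 1, 'm2': 2, 'y': 3, 'merge2': 4}
```

both trees contain the same changes, but "revision 3" names a different commit in each, and each side has a merge commit the other lacks.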

this is the situation that the kernel developers are in frequently. it sounds as 
if you haven't needed to do this yet, so you haven't encountered the problems.

David Lang

-

From: Matthew D. Fuller
Date: Wednesday, October 25, 2006 - 4:53 pm

On Wed, Oct 25, 2006 at 03:40:00PM -0700 I heard the voice of

Ah.  This is more like "bzr [mostly] only does this now in terms of a
single branch (or some point back along it)".  The slices that go
between branches are very limited ('missing' gives you one view;
'branch:' and 'ancestor:' revision specifications give you another).
bzrk/'visualize' gives an interface similar to gitk, but also only in
the context of a single branch/head looking backward through its
previous tree AFAIK.  Any random DAG-slicing of what you have in the
revision store can be done, somebody would just have to write the code
for it.  Nothing about 'the workflow preserves parents' would make
that any harder than writing the code for git was.

Much of this is probably a result of the 'branch'-centric (rather than
'repository'-centric) view of the world; similarly to the fact that
branches are referred to by location (local ../otherbranch, or remote
http/sftp/etc) rather than by a name.  This is one of the bits of bzr


These two are either/or, not and; either they pull (in which case
their old mainline is no longer meaningful), or they merge (in which

In git, this is a non-issue because you don't get to CHOOSE which way
to work.  You always (if you can) pull and obliterate your local
mainline.  In bzr, it's only an 'issue' because you CAN choose, and
CAN maintain your local mainline.  You CAN choose, right now, to do a
git and pull back and forth and only new history show up as changed by
creating a 'bzr-pull' shell script that does a 'bzr pull || bzr merge'
(though you'd be a lot better off adding a '--fast-forward-if-you-can'
option to merge and aliasing that over).
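
The "fast-forward if you can, else merge" behaviour can be sketched in a few lines of Python over a toy parent table. This is an in-memory illustration only; it does not call bzr or git, and `is_ancestor`, `pull_or_merge` and the commit ids are hypothetical.

```python
def is_ancestor(parents, old, new):
    """True if `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

def pull_or_merge(parents, local_tip, other_tip):
    """Fast-forward when possible; otherwise record a merge commit.

    Returns (new_tip, action). A merge keeps `local_tip` as first parent.
    """
    if is_ancestor(parents, other_tip, local_tip):
        return local_tip, "up-to-date"
    if is_ancestor(parents, local_tip, other_tip):
        return other_tip, "fast-forward"
    merge_id = "merge(%s,%s)" % (local_tip, other_tip)
    parents[merge_id] = (local_tip, other_tip)
    return merge_id, "merge"

parents = {"a": (), "b": ("a",), "c": ("a",)}
first = pull_or_merge(parents, "a", "b")    # "a" has nothing of its own
second = pull_or_merge(parents, "b", "c")   # "b" and "c" have diverged
print(first)   # ('b', 'fast-forward')
print(second)  # ('merge(b,c)', 'merge')
```

The only policy decision is the middle branch: whether a tip that is a strict ancestor of the other side gets moved or gets a merge commit anyway.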

More basically, though, I don't think that "histories become exactly
equivalent" is a necessary pass-word to enter the Hallowed City of
Truly Distributed Development.  And I certainly see no reason to
believe we'll agree on it this time any more than We (in broad) have
the last 6 times it came up in the thread.


-- 
Matthew ...
From: Andreas Ericsson
Date: Thursday, October 26, 2006 - 3:13 am

Yes they do. They can (and in this case probably will) create a 
topic-branch named "the-other-dev/featureX" and keep it solely for 
tracking the other peer's changes, keeping their own topic-branch for 
their own changes, and another branch where they merge both changes in, 
or cherry-pick from each branch to get to the desired result fast. This 
works easily because in git
a) branches are as cheap as I can ever imagine an SCM making them.
b) the "slice the DAG and view anything you like from any branch you 
like any time you like and mix them however you want" approach of the 
visualizers makes it trivial for a 10-year old fledgling programmer to 
see what changes what, and where, and by whom, and why.

The "b" above was a feature I didn't know I needed until it became 
available to me. Thanks to Paul Mackerras (spelling?) for creating the 
wonderful gitk tool, and to Marco Costalba for making a faster and, imo, 

Git puts emphasis on code. Bazaar puts emphasis on developers and 
branch-structure. Depending on your preference, I imagine one suits 
some people better. I really, really, really don't care if my branch-tip 
gets moved because I hadn't made any changes to it while the other dev 
hacked away or if it causes a merge because we had decided to work on 
different parts of the feature. Perhaps this is a result of the insanely 
good visualizers (kudos again to Paul and Marco) that easily let me see 
who did what when and where anyways. What I *do* care about is being 
able to easily make sure all the devs have the same code to work and 

The only issue I have with bzr's revno's and truly distributed setup is 
that, by looking at the table, it seems to claim that you have found 
some miraculous way to make revnos work without a central server. Since 
everyone agrees that they don't, this should IMO be listed as mutually 
exclusive features.

On a side-note, git has made my life easier, so I childishly want to 
defend it and see it on top of every list in the world. ...
From: zindar
Date: Thursday, October 26, 2006 - 3:45 am

Haha, I feel the same way about bzr. Some of the features that bazaar
has, such as how it preserves the leftmost parent and treats that
specially in some cases, are things that I REALLY love and don't want
to live without.

All in all, I feel that git and bazaar are both excellent products,
what will happen in the future will be interesting to see.

/Erik
-- 
google talk/jabber. zindar@gmail.com
SIP-phones: sip:erik_bagfors@gizmoproject.com
sip:17476714687@proxy01.sipphone.com
-

From: Matthew D. Fuller
Date: Thursday, October 26, 2006 - 5:12 am

On Thu, Oct 26, 2006 at 12:13:39PM +0200 I heard the voice of

Not where I was going with that section of the mail; I was looking at
just the merge vs fast-forward distinction.  In git, you don't get to
choose; in bzr you do.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: Aaron Bentley
Date: Thursday, October 26, 2006 - 6:47 am

The "simple namespace" is both a URL and a revno.

And therefore, it's just as distributed and decentralized as the web.

There is very little difference between this:

http://example.com/mywebpage#5

And this:

http://example.com/mybranch 5

In fact, we've been planning to unify them into one identifier.

Aaron
-

From: Jan Hudec
Date: Monday, October 30, 2006 - 2:46 pm

Since bzr branch is, and is ONLY, a pointer to a revision, I don't see
any design decision that would make this harder in bzr. The UI was only

The more I read this thread I actually think bzr does support
distributed topology as well as git.

The whole difference is that bzr makes a distinction between the first
and other parents of a revision, while git does not. This distinction is
done in two places:

1. The log shows the first parent and then, as an indented subsection, the
   ancestry of other parents until the point where the ancestries meet
   again. This actually captures a pattern people usually use. When you
   merge, you usually put in the log something along the lines:

   "merged X, which bars and fixes foo."

   when you actually merge M, which you consider a "mainline" and
   therefore not worth mentioning and X. Linus does it this way too --
   he actually posted a log message as an example, that showed exactly
   this.

2. Assigns revision aliases in this same order (except the "major"
   number for the subsection is based on the common ancestor, not on the
   merge point). They are not special thing that is generated at commit
   time; they are inferred from the shape of the DAG (and cached for
   performance reasons).

And the only issue I think is, that the bzr UI and documentation pushes
forward these aliases (revnos) more than appropriate for fully
distributed case and hides the real revision names (revids) too much for

That's a deficiency of merge not telling that a merge is pointless.
Actually I think that bzr merge *should* reduce to pull in all cases:

- If the common ancestor is on the leftmost path of the other branch,
  then the existing revnos as seen on this branch will not change in any
  case, only more than one is added. I think it's safe for merge to
  reduce to pull in this case and consider it a bug in bzr that it does
  not.
- If the common ancestor is not on the leftmost path on the other
  branch, then it is because the branch was ...
From: Matthieu Moy
Date: Tuesday, October 24, 2006 - 2:51 am

There are two things to do:

* Mark the tree as corresponding to a different revision in the past.
  This is roughly "echo 'revision@id-123' > .bzr/checkout/last-revision"
  in bzr. Obviously, writing the file is O(1), but to compute the
  revision identifier when you say "bzr switch -r 42" (I'm not sure
  switch accepts this BTW), you have to load the revision history.
  Indeed, bzr would load it anyway to make sure that the revision you
  switch to is in the revision history.

  In bzr, you have .bzr/branch/revision-history for each branch, which
  is a newline-separated list of revision-identifiers. In the case of
  bzr.dev, for example, this file is 112KB as of now. This is
  O(history), with "history" being the length of the path from HEAD to
  the initial commit, following the leftmost ancestor (i.e. number of
  revisions in a centralized workflow, and less than this otherwise).
  That said, the constant factor is very small. For example, on
  bzr.dev, I did "grep -n some-rev-id" (which does revid-to-revno), it
  takes 0.004 seconds (Vs 0.003 seconds to grep in /dev/null
  instead ;-) ), so you'd need many orders of magnitude before this
  becomes a limitation.

  Linus's point AIUI is that this will _never_ be a limitation of git.

* Then, do the "merge" to make your tree up to date. You can hardly do
  faster than git and its unpacked format, but this is at the cost of
  disk space. But as you say, in almost any modern VCS, that's
  O(diff). In a space-efficient format, that's just the tradeoff you
  make between full copies of a file and delta-compression.
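
The revid/revno lookups described in the first item amount to a linear scan of a newline-separated list. A minimal Python sketch, assuming only that file format; the helper names and the sample revision ids are invented, and real bzr revision ids look different:

```python
def revid_to_revno(history_text, revid):
    """Return the 1-based line number of `revid`, or None if absent.

    This is what "grep -n some-rev-id" does: a scan over the whole list.
    """
    for revno, line in enumerate(history_text.splitlines(), start=1):
        if line == revid:
            return revno
    return None

def revno_to_revid(history_text, revno):
    """Return the revision id on line `revno`, or None if out of range."""
    lines = history_text.splitlines()
    if 1 <= revno <= len(lines):
        return lines[revno - 1]
    return None

# Hypothetical contents of a revision-history file.
history = "rev-a\nrev-b\nrev-c\n"
print(revid_to_revno(history, "rev-b"))  # 2
print(revno_to_revid(history, 3))        # rev-c
```

Both directions read the whole list, which is the O(history) step with a small constant factor.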

-- 
Matthieu
-

From: James Henstridge
Date: Thursday, October 19, 2006 - 7:53 pm

With this sort of setup, I would publish my branches in a directory
tree like this:

    /repo
        /branch1
        /branch2

I make "/repo" a Bazaar repository so that it stores the revision data
for all branches contained in the directory (the tree contents,
revision meta data, etc).

The "/repo/branch1" essentially just contains a list of mainline
revision IDs that identify the branch.  This could probably just
store the head revision ID, but there are some optimisations that make
use of the linear history here.

If the ancestry of "/repo/branch2" is a subset of branch1 (as it might
be in the case of forked then merged projects), then all its
revision data will already be in the repository when branch1 was
imported.  The only cost of keeping the branch around (and publishing
it) is the list of revision IDs in its mainline history.

For similar reasons, the cost of publishing 20 related Bazaar branches
on my web server is generally not 20 times the cost of publishing a
single branch.
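
A toy Python model of that cost argument, in case it helps. It is not Bazaar's storage format: the `repository` and `branches` dictionaries and the `commit` and `fork` helpers are assumptions made up for the illustration.

```python
# Shared store: revision id -> revision data, stored once per repository.
repository = {}
# Each branch is only a list of mainline revision ids.
branches = {}

def commit(branch, revid, data):
    """Store the revision data once and append its id to the branch."""
    repository.setdefault(revid, data)
    branches.setdefault(branch, []).append(revid)

def fork(src, dst):
    """Publishing a related branch copies the id list, not the data."""
    branches[dst] = list(branches[src])

commit("branch1", "r1", "tree 1")
commit("branch1", "r2", "tree 2")
fork("branch1", "branch2")
commit("branch2", "r3", "tree 3")

stored = len(repository)
referenced = sum(len(ids) for ids in branches.values())
print(stored, referenced)  # 3 5
```

Two branches reference five revisions between them, but only three revisions' worth of data is kept.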

I understand that you get similar benefits by a GIT repository with

With the repository structure mentioned above, the cost of publishing
multiple branches is quite low.  If I continue to work on the project,
then there is no particular bandwidth or disk space reasons for me to
cut off access to my old branches.

For similar reasons, it doesn't cost me much to mirror other people's

If you need that level of stability then you want the revision
identifier in both the GIT and Bazaar cases.

As for simplicity, note that Bazaar doesn't extract any special
meaning from the "$email-$date-$random" format of the revision
identifiers.  The only property it cares about is that they are
globally unique.  For example, revision identifiers generated by the
Arch -> Bazaar importer have a different format and are handled the

That is correct.  The revision numbers assigned to particular
revisions in the context of one branch won't necessarily be the same

I can't say anything ...
From: Jakub Narebski
Date: Friday, October 20, 2006 - 2:51 am

And here we have a feature which is as far as I see unique to git,
namely to have persistent branches with _separate namespace_. It means
that we can have hierarchical branch names (including names like
"remotes/<remotename>/<branch of remote>", or "jc/diff"), and we don't
have to guess where repository name ends and branch name begins.

The idea of "branches (and tags) as directories" was if I understand
it correctly introduced by Subversion, and from what can be seen from
troubles with git-svn (stemming from the fact that division between
project name and branch name is the matter of _convention_) at least

You can get similar benefits by a GIT repository with shared object
database using alternates mechanism. And that is usually preferred
over storing unrelated branches, i.e. branches pointing to disconnected
DAG (separate trees in BK terminology) of revisions, if that is what you mean by
multiple head revisions (because in GIT there is no notion of "mainline"

But the revision number in this case _changes_. It is from 7 to
branch:7 but still it changes somewhat.


Emphasis on _potential_. SHA1 id abbreviated to 6 characters might
be not unique in larger project, but for example the chance that
SHA1 id abbreviated to 7 or 8 characters is not unique is really low.
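
A back-of-the-envelope way to put numbers on this, as a Python sketch. It assumes ids behave like uniformly random hex strings, and the object count of 100000 is an arbitrary figure picked for the example, not a measurement of any project:

```python
def clash_probability(num_objects, hex_digits):
    """Chance that one given abbreviation also prefixes some other object.

    Assumes ids are uniformly random over 16**hex_digits possible prefixes.
    """
    space = 16 ** hex_digits
    return 1.0 - (1.0 - 1.0 / space) ** (num_objects - 1)

for digits in (6, 7, 8):
    p = clash_probability(100000, digits)   # object count is an assumption
    print(digits, "hex digits:", "%.4f%%" % (100 * p))
```

Each extra hex digit divides the chance of an ambiguous abbreviation by roughly 16.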


Yet another analogy:

SHA1 identifiers of commits (and not only commits) can be compared
to Message-Ids of Usenet messages, while revision numbers can be compared
to Xref number of Usenet message which if I understand correctly is unique
only for given news server. But Message-Ids cannot be shortened
meaningfully like SHA1 ids can; nevertheless they are used in communication
without any problems. Even if namespace is not simple ;-)

-- 
Jakub Narebski
Poland
-

From: James Henstridge
Date: Friday, October 20, 2006 - 3:42 am

With the above layout, I would just type:
    bzr branch http://server/repo/branch1

This command behaves identically whether the repository data is in
/repo or in /repo/branch1.  Someone pulling from the branch doesn't
have to care what the repository structure is.  Having a separate
namespace for branch names only really makes sense if the user needs
to care about it.

As for hierarchical names, there is nothing stopping you from using
deeper directory structures with Bazaar too.  Bazaar just checks each

I think you are a bit confused about how Bazaar works here.  A Bazaar
repository is a store of trees and revision metadata.  A Bazaar branch
is just a pointer to a head revision in the repository.  As you can
probably guess, the data for the branch is a lot smaller than the data
for the repository.

You can store the repository and branch in the same directory to get a
standalone branch.  The layout I described above has a repository in a
parent directory, shared by multiple branches.

If you are comparing Subversion and Bazaar, a Bazaar branch shares
more properties with a full Subversion repository rather than a

I may have got the git terminology wrong. I was trying to draw
parallels between the .git/refs/... files in a git repository and the
way multiple branches can be stored in a Bazaar repository.

I am not claiming that you'll get bandwidth or disk space benefits for
storing unrelated branches in a single Bazaar repository.  But if the
branches are related, then there will be space savings (which is what

A revision number only has meaning in the context of a branch.  If
I mirror a branch, the revision numbers in the context of each will
refer to the same revision IDs.


My point was that by shortening the IDs with GIT, you are trading
global uniqueness (i.e. the identifier may clash with one found in a
different context) for the convenience of shorter identifiers.

Provided you know that the tradeoff is being made, it isn't generally
much of a ...
From: Jakub Narebski
Date: Friday, October 20, 2006 - 6:17 am

With Cogito (you can think of it either as alternate Git UI, or as SCM
built on top of Git) you would use

   $ cg clone http://server/repo#branch

for example

   $ cg clone git://git.kernel.org/pub/scm/git/git.git#next

to clone _single_ branch (in bzr terminology, "heavy checkout" of branch).
But you can also clone _whole_ repository, _all_ published branches with

   $ cg clone git://git.kernel.org/pub/scm/git/git.git

With core Git it is the same, but we don't have the above shortcut
for checking only one branch; branches to checkout are in separate
arguments to git-clone.

In bzr it seems that you cannot distinguish (at least not only
from URL) where repository ends and branch begins.
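
With an explicit "#" separator the split is mechanical. A tiny Python sketch of just the parsing step; `split_clone_url` is a hypothetical helper, not part of Cogito or Git, and it says nothing about what a clone then fetches:

```python
def split_clone_url(url):
    """Split 'repo#branch' into (repo, branch); branch is None if absent."""
    repo, sep, branch = url.partition("#")
    return repo, (branch if sep else None)

with_branch = split_clone_url("git://git.kernel.org/pub/scm/git/git.git#next")
without_branch = split_clone_url("git://git.kernel.org/pub/scm/git/git.git")
print(with_branch)     # ('git://git.kernel.org/pub/scm/git/git.git', 'next')
print(without_branch)  # ('git://git.kernel.org/pub/scm/git/git.git', None)
```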

*Sidenote:* In current version of gitweb you can get file
in given repository in given branch using the following
notation:

   http://path/to/gitweb.cgi/repo/sitory/branch/name:file/name

gitweb can detect where repository name ends and branch name
begins; usually (by convention) "bare" git repositories use
<project>.git name, "clothed" git repositories use
<project>/.git



Oh, that explained yet another difference between Bazaar-NG (and other
SCM which uses similar model) and Git.

In Git branch is just a pointer to head (top) commit (hence they are stored
under .git/refs/heads/) in given line of development. Git also stores
information (in .git/HEAD) about which branch we are currently on, which
means on which branch git puts new commits. Nothing more (well, there
can be log of changes to head in .git/logs/refs/heads/ but that is optional
and purely local information). In Bazaar-NG you have to store (if I
understand it correctly) mapping from revnos to revisions.
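
To show how little there is to a branch in this model, here is a Python sketch that resolves the current branch from a directory laid out as described. It is a simplification under stated assumptions: HEAD is a text file of the form "ref: refs/heads/<name>", refs are plain files containing a commit id, and the commit id used is a dummy value.

```python
import os
import tempfile

def current_branch_and_commit(git_dir):
    """Read HEAD; if it is a symbolic ref, follow it to the ref file."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return None, head                 # HEAD holds a commit id directly
    ref = head[len("ref: "):]
    with open(os.path.join(git_dir, *ref.split("/"))) as f:
        return ref, f.read().strip()

# Build a throwaway directory that mimics the layout described above.
git_dir = os.path.join(tempfile.mkdtemp(), ".git")
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("0123456789abcdef0123456789abcdef01234567\n")   # dummy id

result = current_branch_and_commit(git_dir)
print(result)
```

Committing on the current branch then comes down to writing a new id into that one ref file; no per-branch list of revisions is involved.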
 
By default (it means for example default behavior of git-clone, if we don't
use --bare option) git repository is _embedded_ in working area. We have

   .git/
   .git/HEAD
   ...
   .git/refs/heads/
   ...
   <working area files, e.g.>

So repo/branch wouldn't work, because 'branch' would conflict ...
From: Petr Baudis
Date: Friday, October 20, 2006 - 6:36 am

Dear diary, on Fri, Oct 20, 2006 at 03:17:26PM CEST, I got a letter

Nope, cg clone will in this case clone the master branch (or whatever
the remote HEAD points at). cg clone -a is planned but not implemented

You don't need to, you can switch your working tree between various
branches.  I think Linus said he does that (or was it Junio?), and I do that
as well, as well as many others.

A good question would be "when to create another branch and when to
clone the repository". And I don't think there's any good answer, except
"when you are comfortable with it". :-) Both approaches have pros/cons.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 7:12 am

That's probably because Cogito still uses obsolete branches/


$ git clone git://git.kernel.org/pub/scm/git/git.git

clones _whole_ repository, all the branches and tags, and saves information

I should have said: bring working area to state given by some revision
(instead of "populate working area").

-- 
Jakub Narebski
Poland
-

From: James Henstridge
Date: Friday, October 20, 2006 - 7:59 am

My understanding of git is that this would be equivalent to the "bzr
branch" command.  A checkout (heavy or lightweight) has the property

I suppose that'd be useful if you want a copy of all the branches at


Two points:
(1) if we are publishing branches, we wouldn't include working trees
-- they are not needed to pull or merge from such a branch.
(2) if we did have working trees, they'd be rooted at /repo/branch1
and /repo/branch2 -- not at /repo (since /repo is not a branch).

In case (2) there is a potential for conflicts if you nest branches,
but people don't generally trigger this problem with the way they use

That is fairly similar to the default mode of operation with Bazaar:
you have a repository, branch and working tree all rooted in the same
directory.  If you have separated working trees and branches, then


The layout of a standalone branch would be:
  .bzr/repository/ -- storage of trees and metadata
  .bzr/branch/ -- branch metadata (e.g. pointer to the head revision)
  .bzr/checkout/ -- working tree book-keeping files
  source code

If we use a shared repository, the contained branches would lack the
.bzr/repository/ directory.  The parent directory would instead have a
.bzr/repository/, but usually wouldn't have .bzr/branch/ (unless there
is a branch rooted at the base of the repository).

If we are publishing a branch to a web server, we'd skip the working
tree, so the source code and .bzr/checkout/ directory would be
missing.

In the case of a checkout, the .bzr/branch/ directory has a special
format and acts as a pointer to the original branch.  If the checkout
is lightweight, the .bzr/repository/ directory would be missing, and

Okay.  So using Bazaar terminology, this seems to be an issue of the
working tree being associated with the repository rather than the
branch?



Well, a branch can easily have multiple URLs even if there is only one
copy of it.  I might write to it via local file access or sftp (which
would be a file: or sftp: ...
From: Jakub Narebski
Date: Friday, October 20, 2006 - 3:50 pm

Not exactly (my mistake in explaining it). "cg clone git://host/repo@branch"
clones only the part of the history DAG of commits reachable from the given branch.
Still, it is a full repository. You can add branches to it later with

That is _very_ useful. And that is default option for Git. For
example with git.git repository I'm interested both in 'master'
branch (main line of development), and in 'next' branch (development
branch). For example, if I send some patches based on 'master' and they
get accepted, but into 'next' (to cook there for a while, say), and
I want to do further work in this direction, I have to base my
new work on the 'next' branch.

It looks like the Bazaar-NG "branches" are equivalent of the
one-branch-clone of Git.

And if there is no command to clone the whole repository, how
do you set up a public repository?

See below.


Same with Git. Public repositories are usually "bare" clones, i.e.
without working directory. We can clone/fetch from "clothed" repo


There is no problem in Git with having a git repository nested within a
working area: of course you had better ignore its .git directory; you can
ignore the files in this embedded repository or not.


A git repository (the result of git clone, which is the equivalent of bzr branch)
has the following layout:
  .git/objects/ -- repository objects database
  .git/refs/ -- heads (branches) and tags
  .git/index -- staging area for commit (adding files, merge resolving)
  .git/HEAD -- which branch is current branch
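
To illustrate the .git/HEAD entry above: when HEAD is a symbolic ref, the file holds one line of the form "ref: refs/heads/<branch>". The following is only a minimal Python sketch that extracts the branch name from that text; the helper name current_branch is hypothetical, and any HEAD content that does not start with that prefix is simply reported as None.

```python
def current_branch(head_text):
    """Return the branch named in the text of .git/HEAD, or None."""
    line = head_text.strip()
    prefix = "ref: refs/heads/"
    if line.startswith(prefix):
        return line[len(prefix):]
    return None  # HEAD does not name a branch under refs/heads/


print(current_branch("ref: refs/heads/master\n"))  # master
```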

The equivalent of a shared repository would be making .git/objects/
a symlink to some directory which would serve as the common area
storing the object database.

You can use an alternates file: .git/objects/info/alternates can hold a
list of absolute pathnames (one per line) where objects can be found
instead. If I understand correctly, new objects get written to the current
repository's object database, so to get the equivalent of symlinking the
.git/objects directory you would need, for every repository with which you
want to share an object database, to have ...
From: Petr Baudis
Date: Friday, October 20, 2006 - 3:58 pm

Dear diary, on Sat, Oct 21, 2006 at 12:50:31AM CEST, I got a letter

It's not exactly convenient, but you can do

	xpasky@machine[0:0]~/git$ GIT_ALTERNATE_OBJECT_DIRECTORIES=../cogito/.git/objects cg-diff -r `GIT_DIR=../cogito/.git cg-object-id -c HEAD`..HEAD

I don't personally think it's worth a special UI, but there're no
boundaries for initiative... :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Carl Worth
Date: Thursday, October 19, 2006 - 10:01 am

First, I want to point out that I think we're having a delightfully
enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to
demonstrate my thinking here. The first is far-fetched but perhaps
easier to understand the implications. But the second is the real,
everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We
need not suppose any acrimony, just an amiable split as two subsets of
the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply
equally well to either team's codebase, (obviously, since they are
identical). But as the projects diverge they each start publishing
revision numbers with respect to their own repositories in their own
bug trackers, etc. Obviously, each project has its own "mainline" so
these new revision numbers are only unique within each project and not
between the two.

Time passes...

Finally the two teams (who had remained good friends after the
breakup) find a unifying theory that will let them work on a single
tool that will meet the needs of both user bases. So they want to
merge their code together.

After the merge, there can be only one mainline, so one team or the
other will have to concede to give up the numbers they had generated
and published during the fork. That is, the numbers will not be usable
within the new, merged repository.
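
The fork-and-merge scenario can be played out on a toy model. This sketch assumes, as discussed elsewhere in this thread, that a revision number is a commit's position on the leftmost-parent ("mainline") chain; the commit names and the function mainline_revnos are made up for illustration and are not bzr's actual implementation.

```python
def mainline_revnos(parents, tip):
    """Number the commits on the leftmost-parent chain ending at tip (oldest = 1)."""
    chain = []
    rev = tip
    while rev is not None:
        chain.append(rev)
        ps = parents[rev]
        rev = ps[0] if ps else None
    chain.reverse()
    return {rev: n for n, rev in enumerate(chain, start=1)}


# Shared history before the fork, then one commit per team.
parents = {
    "base1": [],
    "base2": ["base1"],
    "teamA-1": ["base2"],
    "teamB-1": ["base2"],
}
print(mainline_revnos(parents, "teamA-1")["teamA-1"])  # 3 in team A's repository
print(mainline_revnos(parents, "teamB-1")["teamB-1"])  # 3 in team B's repository

# The teams merge, and team A's line is kept as the leftmost parent.
parents["merge"] = ["teamA-1", "teamB-1"]
merged = mainline_revnos(parents, "merge")
print(merged["teamA-1"], merged["merge"])  # 3 4
print("teamB-1" in merged)                 # False
```

In the merged history the number 3 names team A's commit only, so the "3" that team B published during the fork no longer points at team B's work.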

Everyday
--------
Now, the above scenario is just silly. It's not likely to ever happen,
so it's really not worth considering as a motivating case.

But, what does (and should) happen everyday is exactly the same. So
here's a realistic situation that is worth considering:

An individual takes the bzr codebase and starts working on it. It's
experimental stuff, so it's not pushed back into the central
repository yet. But our coder isn't a total recluse, so his friends
help him with the code ...
From: J. Bruce Fields
Date: Thursday, October 19, 2006 - 10:14 am

Note that the id's are still permanent in this case; they will never
(modulo some assumptions about the crypto) be reused.  So a given id
points at one and only one object, for all time; it's just that we may

So in this case you can certainly lose the launch codes.  But you have
forever granted everyone a way to determine whether a given guess at the
launch codes is correct.  (Again, assuming some stuff about SHA1).

--b.
-

From: Jeff King
Date: Friday, October 20, 2006 - 7:31 am

In what sense? Yes, you can make a guess if you have stored the SHA1
that contained the launch codes. But the point is that that particular
SHA1 is no longer part of the repository. Keeping that SHA1 is no easier
than just keeping the launch codes in the first place.

-Peff
-

From: J. Bruce Fields
Date: Friday, October 20, 2006 - 8:33 am

Well, I thought the discussion was about what meaning references have
after branches were modified or removed.  In which case the interesting
situation is one where an object is gone but someone somewhere still
holds a reference (because the SHA1 was mentioned in a bug report or an

Could be.

Anyway, the important difference between the SHA1 references and small
integers is that there's no aliasing in the former case.  Which is
important--I'd rather have a reference to nothing than a reference to
the wrong thing....

--b.
-

From: Jeff King
Date: Friday, October 20, 2006 - 8:43 am

Git tries very hard to make sure you don't have a reference to something
that doesn't exist. But yes, you could have a reference to the SHA1 in
another, non-git source, and try to guess the data from it. However,
there's a bit of a two-step procedure, since the SHA1 will likely be of
the commit. You have to guess the commit author, date, message, and
the contents of the rest of the tree to make a correct guess.
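
Checking a guess against a known id can be sketched in a few lines of Python. For simplicity this uses a blob rather than a commit: git computes a blob's id as the SHA-1 of the header "blob <size>" followed by a NUL byte and the content. The "launch code" strings and the helper names are made up; for a commit id the guess would also have to reproduce the author, date, message and tree, as described above.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 that git assigns to a blob with exactly this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def guess_matches(guess: bytes, known_id: str) -> bool:
    """True if the guessed content hashes to the id somebody wrote down."""
    return git_blob_id(guess) == known_id


# Somebody kept the id of a file that has since been removed from the repository.
known_id = git_blob_id(b"launch code: 0000\n")

print(guess_matches(b"launch code: 1234\n", known_id))  # False
print(guess_matches(b"launch code: 0000\n", known_id))  # True
```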

In practice I think most "launch code" scenarios are less about
guessable confidentiality, and more about ceasing to publish things you
shouldn't be (like copyright or patent encumbered code).

-Peff
-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 8:25 am

bzr seems to use the classic UUID format, and it's funny how much it looks 
like a real BK ChangeSet revision number ("key").

Here's the quoted bzr "true" revision ID:

	Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d

and here's a BK "ChangeSet Key":

	adi@zaphod.bitmover.com|ChangeSet|20031031183805|57296

(I don't have BK installed anywhere, so I had to google for changeset 
keys, and this was just some random key in the BK bugzilla ;)

They look very similar, don't they? And yes, the true revision ID is stable 
over time (at least it was in BK, and I assume it is in bzr too).

The biggest difference seems to be that in bzr, the final checksum is 
64-bit, while for BK, it was just a 16-bit checksum/unique number (the 
rest is just user-name/machine-name and date: I assume that the bzr commit 
was done at 10/17/2006 3:20:29PM, and the example BK ChangeSet was created 
10/31/2003 6:38:05PM - it looks like _exactly_ the same date format).
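
The two identifiers quoted above can be taken apart mechanically. This is only a sketch based on the two examples in this message, not on either tool's documentation; the function names are hypothetical, and the field layout (user, timestamp as YYYYMMDDhhmmss, trailing number) is an assumption read off the examples.

```python
from datetime import datetime


def parse_bzr_revision_id(rev_id):
    """Split 'email-YYYYMMDDhhmmss-suffix' from the right (emails may contain '-')."""
    email, stamp, suffix = rev_id.rsplit("-", 2)
    return email, datetime.strptime(stamp, "%Y%m%d%H%M%S"), suffix


def parse_bk_changeset_key(key):
    """Split 'user@host|ChangeSet|YYYYMMDDhhmmss|number'."""
    who, kind, stamp, number = key.split("|")
    return who, kind, datetime.strptime(stamp, "%Y%m%d%H%M%S"), int(number)


email, when, suffix = parse_bzr_revision_id(
    "Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d")
print(when, len(suffix) * 4)       # 2006-10-17 15:20:29 64

who, kind, bk_when, bk_number = parse_bk_changeset_key(
    "adi@zaphod.bitmover.com|ChangeSet|20031031183805|57296")
print(bk_when, bk_number < 2**16)  # 2003-10-31 18:38:05 True
```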

With BK, you can also use a "md5 key", and I don't actually know how they 
work. They may just be the md5 hash of the ChangeSet key, I think that may 
be how those things are indexed. So in bkcvs, you'll see a line like this:

	BKrev: 42516681VmgTWL0bkLcltPGiI6Yk5Q

which is the BK md5 key for my last kernel revision in BK (2.6.12-rc2). 
Again, these numbers are stable, unlike the simple revisions.

Note that from a usability standpoint, the UUID's look more readable to a 
human, but are actually much worse than the md5 keys (or the SHA1's that 
git uses). At least with a hash, the first few digits are likely to be 
unique, so you can do things like auto-completion (or just short names). 
With the email+date+random number kind of UUID, you don't have that.
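
The difference shows up when computing the shortest prefix that still identifies one id within a set. A small Python sketch follows; all ids in it are made up (SHA-1 hashes of arbitrary strings, and two ids in the email+date+random style from the same committer in the same hour), and the function name is hypothetical.

```python
import hashlib


def shortest_unique_prefix_len(target, ids):
    """Smallest n such that target[:n] is a prefix of no other id in ids."""
    others = [i for i in ids if i != target]
    for n in range(1, len(target) + 1):
        prefix = target[:n]
        if not any(o.startswith(prefix) for o in others):
            return n
    return len(target)


# Fifty made-up hash ids.
hash_ids = [hashlib.sha1(b"commit %d" % i).hexdigest() for i in range(50)]

# Two made-up ids in the email+date+random style.
uuid_ids = [
    "dev@example.org-20061019151437-5b99dff6ed1d76cd",
    "dev@example.org-20061019153102-0c1d2e3f4a5b6c7d",
]

# Worst case over all fifty hashes: a handful of hex digits.
print(max(shortest_unique_prefix_len(h, hash_ids) for h in hash_ids))
# The two UUID-style ids share the email, the date and the hour.
print(shortest_unique_prefix_len(uuid_ids[0], uuid_ids))  # 27
```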

(Pure hashes obviously also tend to just all have the same length, and are 
easier to parse automatically, so from a programmatic standpoint they are 
a lot easier too - but the surprising thing is how they are actually 
easier on humans too, even if the UUID's look more ...
From: Matthew D. Fuller
Date: Thursday, October 19, 2006 - 9:13 am

On Thu, Oct 19, 2006 at 08:25:26AM -0700 I heard the voice of

Actually, as best I know, it's not a checksum, just random bits (a

This I agree with, at least in part.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 9:49 am

Ahh. They may be that even in BK. I know BK had various 16-bit CRC 
checksums, but they were probably on the actual _file_ contents, not in 
the key itself.

		Linus
-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 11:30 am

Btw, I do believe that bzr seems to be acting a lot like BK, at least when 
it comes to versioning. I suspect that is not entirely random either, and 
I suspect it's been a conscious effort to some degree.

Which is fine, in the sense that there are certainly much worse things to 
try to copy.

That said, at least BK was up-front about the versions changing, and 
didn't try to do anything to hinder it. It still confused some people, and 
it wasn't a great naming system, but it did work.

In the big picture, the version naming between BK and git hasn't been an 
issue for anybody in practice, I suspect.

So if you want to look at features that actually matter more, try out 
something like

	gitk drivers/scsi include/scsi

on the kernel archive (I assume that somebody has tried importing the 
kernel git tree into bzr - quite frankly, if bzr cannot handle that size 
tree without problems, you have much bigger issues!).

In other words, being able to look at history of more than a single file 
has been a _huge_ bonus. 

The other big difference is being able to do merges in seconds. The 
biggest cost of doing a big merge these days seems to literally be 
generating the diffstat of the changes at the end (which is purely a UI 
issue, but one that I find so important that I'll happily take the extra 
few seconds for that, even if it sometimes effectively doubles the 
overhead).

Looking at the dates of the merges yesterday, they're literally half a 
minute apart, and that's not me _scripting_ them - that's me actually 
looking up the emails, typing in the "git pull " and pasting the source 
repository, and git fetching the data over the network and merging it, and 
checking out the result (and me verifying that the resulting diffstat 
matches what the email says). Doing four of those in a row in less than 
two minutes is actually a really big deal.

At some point, "performance" is just more than a question of how fast 
things are, it becomes a big part of ...
From: Matthieu Moy
Date: Thursday, October 19, 2006 - 11:54 am

By curiosity, how would you compare git and Bitkeeper, on a purely
technical basis? (not asking for a detailed comparison, but an "X is
globally/much/terribly/not better than Y" kind of statement ;-) )

-- 
Matthieu
-

From: Linus Torvalds
Date: Thursday, October 19, 2006 - 1:47 pm

I think git is better for kernel work these days, but a large portion of 
that is that a lot of the features have literally been tweaked for us (for 
very obvious reasons).

For example, the whole "rebase" thing (or explicitly making cherry-picking 
easy) is something that a number of kernel people do, and even if I have 
to admit to not liking the practice very much (it kind of hides the "true" 
development history), it does have huge advantages, and it makes history a 
lot easier to read.

Similarly, I often used the single-file graphical history viewing in BK 
("revtool"), but being able to follow the history of multiple files as one 
"entity" really is something that once you get used to, it's really really 
hard going back, and "gitk" does generate a much more readable graph.

And I think the git way of doing branches is just simply superior. Git 
always did branches in the sense that the way merges happened you _always_ 
had several heads, but actually making them available and switching 
between them was something that wasn't my idea, and that I even was a bit 
apprehensive about. I was wrong. Git branches are branches done right. I 
just don't see how you _could_ do them better.

That said, a lot of the features I like and _I_ consider really important 
are possibly not that important to others. For example, maybe nobody else 
really cares about viewing the history of a particular subsystem, the way 
I do. For a lot of people, single-file is probably ok. 

For example, while git now does "annotate" (or "blame"), it's not 
lightning fast, and I simply don't care. Doing a

	git blame kernel/sched.c

takes about three seconds for me, and that's on a pretty good machine (and 
on the kernel tree, which for me is always in the cache ;). Quite frankly, 
if I cared deeply about that kind of annotation, I'd probably be upset 
about it. There are basically _no_ other git operations that take that 
long. I can get the _full_ log of the last 18 months of the kernel much ...
From: Junio C Hamano
Date: Friday, October 20, 2006 - 10:49 pm

ll.6041-6091 of that file is blamed to arch/ia64/kernel/domain.c
by pickaxe -C (attributed to commit 2.6.12-rc2) while blame says
they are brought in by commit 9c1cfa, which says "Move the ia64
domain setup code to the generic code".  I am slowly realizing
that comparing the output from blame and pickaxe might be a good
way to study the project history.



-

From: Ryan Anderson
Date: Thursday, October 19, 2006 - 4:28 pm

Having used both in a past job setting (simultaneously even),
BitKeeper was a huge win over CVS, but after a while, some of its
tools  were just very frustrating in comparison with comparable Git
interfaces, and I had actually written a terribly slow BK -> Git
converter just so I could incrementally import our BK tree, then use
Git's history-viewing because it was so much more pleasant to work
with.

For small projects (~5 people), they weren't hugely different, but Git
just felt more comfortable after a while.  (It was actually possible
to do a commit from the command line in a single command, without
getting annoyed by the interface, for a trivial example.)
-

From: Junio C Hamano
Date: Thursday, October 19, 2006 - 12:16 pm

An interesting effect of this is that when people have a column for
merge performance in an SCM comparison table, they would include
the time to run the diffstat as part of the time spent merging
when they fill in the number for git, but not for any other SCM.

I know you won't misunderstand me but for the sake of others, I
should add this: I am not saying diffstat should be optional.

-

From: Jan Hudec
Date: Wednesday, October 18, 2006 - 10:33 pm

The point here is, that because of using the bot, the revnos on bzr.dev
are indeed stable (and many of the merges are in fact pointless merges
(i.e. merges of a revision and its ancestor)). But if you don't use the
bot, then doing:

bzr merge mainline
bzr push mainline

makes your revision the leftmost parent, not the one
from "mainline". The fact that bzr treats the leftmost parent somewhat
specially makes people replace the above with

bzr branch mainline
cd mainline
bzr merge feature-branch
bzr push

which is, well, more complicated (but you see it's not about main
maintainer -- anybody with write access can push).

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>
-

From: zindar
Date: Thursday, October 19, 2006 - 12:02 am

I'd like to point out that the same thing has happened in bzr-land.
Back in the "pre-bot" days, only Martin put things in "his branch",
which most people got bzr from (same as Linus' git branch), but he was
away for a few weeks and during this time there were 3 (or perhaps 4)
other branches, called integration branches, that were being used.
They were all maintained by different people.

Everyone learned really quickly to use them instead of Martin's
branch. When Martin came back, he just pulled/merged these branches
and everything was back to normal.

I'd say in this case, bzr was even more "without a trunk" than in the
example Linus gives above.

What seems to be one interesting thing in this discussion is that,
because people use bzr and git in slightly different ways, they think
that one or the other cannot be used in another way.

bzr's use of revision numbers doesn't mean it hasn't got unique
revision identifiers, and I can't see any reason why it couldn't be
used in the same way as git.  Both are excellent tools, and since git
is more specialized (built to support the exact workflow used in
kernel development), it's more suited for that exact use.

bzr tries to take a broader view, for example, it does support a
centralized workflow if you want one.  Most people don't, but a few
might. Because of this, it probably fits kernel development less
well than git.  That's fine I think! It happens to fit my workflow
better than git does :)

Regards,
Erik
-

From: Christian MICHON
Date: Thursday, October 19, 2006 - 1:49 am

close to 200 posts on the bzr-git war!
is this the right place (git mailing list) to discuss future
features of bzr ?

-- 
Christian
-

From: Andreas Ericsson
Date: Thursday, October 19, 2006 - 1:58 am

Perhaps not, but the tone is friendly (mostly), the patience of the 
bazaar people seems infinite and lots of people seem to be having fun 
while at the same time learning a thing or two about a different SCM.
Best case scenario, both git and bazaar come out of the discussion as 
better tools. If there would never be any cross-pollination, git 
wouldn't have half the features it has today.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Matthieu Moy
Date: Thursday, October 19, 2006 - 2:10 am

I second this.

I'm a bzr user and occasional developer, and I learnt a lot about git
in the discussion. I hope I could also explain some of the
features of bzr well to some git guys; it's always interesting to
understand why other people do things in a different way, or why they
do it in the same way.

-- 
Matthieu
-

From: Tim Webster
Date: Thursday, October 19, 2006 - 7:57 am

Thanks everyone for taking time to explain details.

However, I don't use SCM for code development. I use it for collaborative
documentation, white boarding and tracking configurations.
In fact in my company no one uses SCM for code development.
Everyone here uses it for collaborative documentation and white boarding.
Only I use SCM for tracking configurations.

I think of SCMs in terms of an SCM core and SCM tools.

First I want to say every SCM I know of sucks when it comes to tracking
configurations, simply because they don't record or restore file metadata
such as permissions, ownership, and ACLs. I don't see recording or restoring
file metadata as part of the SCM core. I do however feel an SCM core needs to
have provisions for extended file inventory information. The problem
with extended file inventory information is that it is fs specific. For this
reason I feel it is essential that the SCM core allow multiple sets of extended
file inventory information. The SCM tools are responsible, based on the local
config, for recording metadata, creating the extended file inventory, and
translating file metadata from one file system to another. When tracking
configurations, octopus merges are surprisingly common. If a configuration
change is not signed off by a responsible person, it cannot be accepted. Doing
otherwise is simply an invitation to attackers and makes troubleshooting
far too difficult. Also, configuration files in one directory will most often
not be members of the same repo. For example, each file in the etc directory
would be a member of a different repo according to its associated application/pkg.
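
The recording side of such an extended file inventory could look roughly like the sketch below. It is not taken from any existing SCM: the function names and the dictionary layout are hypothetical, it covers only POSIX mode bits and numeric owner/group ids (no ACLs), and it restores only the mode because changing ownership normally needs extra privileges.

```python
import os
import stat
import tempfile


def record_metadata(path):
    """Capture the metadata an SCM tool might store next to the file content."""
    st = os.stat(path)
    return {"mode": stat.S_IMODE(st.st_mode), "uid": st.st_uid, "gid": st.st_gid}


def restore_mode(path, inventory):
    """Put the recorded permission bits back; ownership is left untouched."""
    os.chmod(path, inventory["mode"])


# Demonstration on a throwaway file (POSIX permissions assumed).
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)
inventory = record_metadata(path)   # what would be stored at commit time

os.chmod(path, 0o600)               # a checkout created the file with other bits
restore_mode(path, inventory)
restored = record_metadata(path)["mode"]
print(oct(restored))
os.remove(path)
```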

Some things I would like the SCM tools to handle: personally I would like the
SCM tools to be platform independent. This would ensure that the correct
things happen on ext3 mounted on Windows.
I don't think the execute bit belongs in the basic file inventory information.
Instead I would like to see it replaced by a filter in the extended
file inventory
indicating what file metadata, if any, should be recorded or ...
From: Aaron Bentley
Date: Thursday, October 19, 2006 - 8:30 am

Arch supports that kind of metadata.

I believe SVN supports recording arbitrary file properties, so it's just

Our choices have been predicated on producing the best SCM we can for
the purpose of developing software.  We find that the execute bit is
very useful for build scripts and other incidental scripts.

The other attributes didn't seem useful for software development, so

An XML diff/patch or merge will not handle ODF properly.  There's too


The bzr "webserve" plugin provides rss feeds.

Aaron
-

From: Tim Webster
Date: Thursday, October 19, 2006 - 8:14 pm

yes svn has arbitrary properties which can be manipulated.
They are not really intended for permissions, ownership, and acl.
To use the svn properties for this requires adding scm tools.
Also svn does not allow files in the same directory to live in

I have only experimented with xml diffs on odf files.
From my experience xml diffs work fine on svg files.
For more information, please refer to

yes, Multiple merge sources is handy for collaborative document editing
-

From: Aaron Bentley
Date: Thursday, October 19, 2006 - 9:05 pm

Agreed.  I think it's okay to require extra work to set the scm up to

It would surprise me if many SCMs that support atomic commit also

That's something I'd like for software development, too.

Aaron
-

From: Jan Hudec
Date: Saturday, October 21, 2006 - 5:30 am

In fact I think svk would. You would have to switch them by setting
an environment variable, but it's probably doable. That is because
unlike other version control systems, it does not store the information
about checkout in the checkout, but in the central directory and that
can be set. I don't know git well enough to tell whether git could do
the same by setting GIT_DIR.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>
-

From: Matthieu Moy
Date: Thursday, October 19, 2006 - 9:14 am

That's not a simple matter.

Tracking ownership hardly makes sense as soon as you have two
developers on the same project. What does it mean to checkout a file
belonging to user foo and group bar on a system not having such user
and group?

Just restoring the complete user/group/other rwx permission is already
a mess. In my experience (GNU Arch did this):

1) It sucks ;-). Me working with umask 022 so that my colleagues can
   "cp -r" from me, working on a project with people having umask 077,
   I got some files not readable, some yes, well, a mess. *I* have set
   my umask, and *I* want my tools to obey.

2) It's a security hole. If you work with people having umask=002 (not
   indecent if your default group contains just you), you end up with
   group-writable files in your ${HOME} (and your own default group may
   well be shared with others).
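
The effect of the three umask values mentioned above can be worked out directly. The sketch assumes the usual POSIX rule that a newly created regular file gets the requested mode (0666 here) with the umask bits cleared; the helper name is hypothetical. An SCM that restores recorded modes verbatim bypasses this rule, which is the problem described in both points.

```python
import stat


def mode_after_umask(requested, umask):
    """Permission bits a new file ends up with under the given umask."""
    return requested & ~umask


for umask in (0o022, 0o077, 0o002):
    mode = mode_after_umask(0o666, umask)
    # stat.filemode needs the file type bits, so mark it as a regular file.
    print(oct(umask), oct(mode), stat.filemode(stat.S_IFREG | mode))
```

Under umask 022 the file is readable by everyone (0644), under 077 only by its owner (0600), and under 002 it is additionally writable by the group (0664).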

That said, it can be interesting to have it, but disabled by default.

The 'x' bit, OTOH, is definitely useful.

-- 
Matthieu
-

From: Tim Webster
Date: Thursday, October 19, 2006 - 8:40 pm

Yes I agree it should be disabled by default. And enabled based on the
local settings.
-

From: Ramon Diaz-Uriarte
Date: Thursday, October 19, 2006 - 8:45 am

I fully agree with Andreas: I am just a bzr user (not even a bzr
developer) and when looking for a decentralized VCS I also looked at
git and a few others. I think I am learning quite a bit  about bzr,
git, and VCS in general.



-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
-

From: Petr Baudis
Date: Thursday, October 19, 2006 - 4:37 am

Dear diary, on Thu, Oct 19, 2006 at 09:02:16AM CEST, I got a letter

There is perhaps no "technical" reason, but it's also what the user
interface is designed around - most probably, using UUIDs instead of
revnos would be a lot less convenient for bzr people because you
probably primarily show revnos everywhere and UUIDs only in few special
places and/or when asked specifically through a command (correct me if
I'm wrong). Also, do you support "UUID autocompletion" so that you can

I think they are in fact just as flexible (+-epsilon). Git can support
centralized workflow as well - you have some central repository
somewhere and all the developers clone it, then pull from it and push to
it in basically the same way they would use CVS. And it is perhaps
even more used in practice than the "single-man" workflow
nowadays, as more projects are using Git.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Matthew D. Fuller
Date: Thursday, October 19, 2006 - 8:17 am

[ trim back CC a bit ]

On Thu, Oct 19, 2006 at 01:37:31PM +0200 I heard the voice of

The primary place you'd see either is in 'log'.  To show the UUID,
you'd add a "--show-ids" arg to it (and via per-user config aliasing,
you could just alias 'log' to 'log --show-ids' if you always wanted to
see them, so you wouldn't have to type it).  The output looks something
like:

revno: 1
revision-id: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: a
timestamp: Thu 2006-10-19 10:14:37 -0500
message:
  Foo

(without --show-ids, it's the same, except not showing the

With the form of bzr UUID's, that's not particularly useful, since
you're probably into the minutes/seconds of the timestamp before it
becomes unique, at which point you're close to 2/3 of the way through
the whole string.



-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
-

From: Horst H. von Brand
Date: Friday, October 20, 2006 - 6:22 am

So? It makes no sense to me to cater only to "successful projects"... most

Yes, but what matters here is the principle... if branches aren't equal, it
makes some things unnecessarily hard (i.e., forking, passing maintainership
over, ...). Sure, they aren't activities that should be actively


"Very rare" != "never". The "very rare" cases /will/ come back to bite you,
once you grow accustomed to "hasn't ever happened"


What makes a "published repository" special, as opposed to my local

Are they different among repositories, even though they came from another


OK.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239
Casilla 110-V, Valparaiso, Chile               Fax:  +56 32 2797513
-

From: Christian MICHON
Date: Friday, October 20, 2006 - 6:46 am

funny. I actually read another post from Linus, and when I
"merge" with your post (understand: bisect), the following
comes out:

- git is the fastest scm around
- git has the smallest scm footprint
- git is also aimed at small(ish) projects

my personal proof of concept on the last point is that I'm an
IC design engineer who threw away other scm in favor of git
since git-1.4.2 and regret now the years wasted on _other_
scm. But your mileage may vary.

-- 
Christian
-

From: Ryan Anderson
Date: Tuesday, October 17, 2006 - 8:25 pm

In the Git world that happens via "git tag -s", i.e, a
cryptographically strong "signoff".
(There's also the secondary convention of appending Signed-off-by: to
email-applied patches, but that's something that would translate
effectively to any other system, since it's outside the SCM.)
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 4:24 pm

Aaron Bentley wrote:

> You can pull if you don't want that. 
From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 4:50 pm

For non-git people (and maybe even git people who didn't follow some of 
the "reflog" work):

 - git does actually have "local view" support, but it is very much 
   _defined_ to be local. It does not pollute any history as seen by 
   anybody else. It's called "reflog" (where "ref" is just the git name 
   for any reference into a tree, and the "log" part is hopefully obvious)

So each git repository can have (if you enable it) a full log of all the 
changes to each branch. But it's not in the core git datastructures that 
get replicated - because the local view of how the branches have changed 
really _is_ just a local view. It's just a local log to each repository 
(actually, one per branch).

It's what allows a git person to say

	git diff "master@{5.hours.ago}"

because while "5 hours ago" is _not_ well-defined in a distributed 
environment (five hours ago for _whom_?) it's perfectly well-defined in a 
purely _local_ sense of one particular branch.
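
A rough sketch of the idea, not git's actual reflog format: each branch keeps a purely local, time-ordered list of the values it has pointed at, so a local time expression can be answered by a lookup in that list. The commit ids, the timestamps and the `ref_at` helper below are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical local log for one branch: (time the ref was updated, commit id).
# Entries are appended in time order and are never sent to other repositories.
reflog = [
    (datetime(2006, 10, 17, 8, 0), "aaa111"),
    (datetime(2006, 10, 17, 11, 30), "bbb222"),
    (datetime(2006, 10, 17, 15, 45), "ccc333"),
]

def ref_at(log, when):
    """Return the commit id the branch pointed at on or before `when`."""
    times = [t for t, _ in log]
    i = bisect_right(times, when)
    if i == 0:
        raise LookupError("the local log does not go back that far")
    return log[i - 1][1]

now = datetime(2006, 10, 17, 16, 50)
five_hours_ago = ref_at(reflog, now - timedelta(hours=5))
print(five_hours_ago)  # bbb222
```

Another repository with the same commits but a different update history would give a different answer for the same expression, which is why the log stays local.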

So there's no need for a fakey "merge" that isn't a real merge and that 
doesn't make sense for anybody else because it doesn't actually add any 
real knowledge about the _history_ of the tree (only about a single 
repository). If you want to see how the history of a particular repository 
has evolved, you can just look at the reflog (although admittedly, common 
tools like "gitk" don't even show it - the data is there if they would 
want to, but the most common usage is the above kind of "show me what 
happened in the last five hours in my current branch").

			Linus
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 4:35 pm

On Tuesday, 17 October ...
From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 1:15 am

On Tuesday, 17 October ...
From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 2:20 am

How does that work with branching points, and with merges? For example,
in the case depicted below, how do you refer to the commit marked X?

          ---- time --->

    --*--*--*--*--*--*--*--*--*-- <branch>
          \            /
           \-*--X--*--/

The branch it used to be on is gone...


Besides, in git a commit object has pointers (in the form of sha1 ids)
to all its parents. So <ref>^ (parent of <ref>), or <ref>^<m> (m-th
parent of <ref>), or <ref>~<n> (n-th ancestor in the first-parent lineage
of <ref>) are natural, and fast. <ref>+<n> (which would add yet another
forbidden character in branch names) would need either a database mapping
serial numbers (per repository or per branch) to commit ids, or fetching
the full history and looking the commit up there.
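
To illustrate why the parent-based suffixes are cheap, here is a minimal sketch that resolves `^`, `^<m>` and `~<n>` by following parent pointers only. The graph, the commit names and the `resolve` helper are hypothetical; real git parses many more forms than this.

```python
import re

# Hypothetical history: commit id -> parent ids, first parent first.
PARENTS = {
    "M": ["C", "Y"],  # merge commit: first parent C, second parent Y
    "C": ["B"],
    "B": ["A"],
    "Y": ["X"],       # side branch that was merged; its name is gone
    "X": ["A"],
    "A": [],
}
REFS = {"branch": "M"}  # the only surviving branch name

def resolve(spec):
    """Resolve a name followed by any mix of ^, ^<m> and ~<n> suffixes."""
    name = re.match(r"[^^~]+", spec).group(0)
    commit = REFS.get(name, name)
    for op, digits in re.findall(r"([\^~])(\d*)", spec[len(name):]):
        n = int(digits) if digits else 1
        if op == "^" and n > 0:
            commit = PARENTS[commit][n - 1]   # n-th parent
        elif op == "~":
            for _ in range(n):                # n steps along first parents
                commit = PARENTS[commit][0]
    return commit

print(resolve("branch^2~1"))  # X
```

Each step is a single lookup from a commit to its parents, so a commit on a merged side branch stays reachable from the merge without any per-branch numbering.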

Branches in git are remembered not by their starting points, but by

Git could do that too, by having a file (or files) mapping serial number,
or branch/tag + serial number, to commit id. But this would
have to be a local matter. And this would take some disk space, and
would seriously affect fetch performance (now git just downloads
what it doesn't have and dumps it into the repository database).

BTW, what if a repository is moved from one URL to another, for example
by moving to a different host? All "abstracted away" identifiers get

Two words: post-commit hook. You can automate the action of adding tags
(especially now with packed refs, which means that we can have huge number

That is the alternative solution, but this would mean that the merge would be
recorded (unless you squash it). And for published branches (like 'next'
for example) it is the better solution, because rebase is in fact rewriting
history.

But rebase means that you had

                 A---B---C topic
                /
           D---E---F---G master

Rebasing 'topic' branch on top of master would mean that you would get

                         A'--B'--C' topic
                        /
           D---E---F---G master

where A', B', C' represent the same changeset as A, ...
From: Robert Collins
Date: Tuesday, October 17, 2006 - 2:40 am

In bzr 0.12 this is:
2.1.2

(assuming the first * is numbered '1'.)

These numbers are fairly stable, in particular everything's number in
the mainline will be the same number in all the branches created from it
at that point in time, but a branch that initially creates a revision or
obtains it before the mainline will have a different number until they
synchronise with the mainline via pull.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 3:08 am

So basically anyone can pull/push from/to each other but only so long as 
they decide upon a common master that handles synchronizing of the 
number part of the url+number revision short-hands?

One thing that's been nagging me is how you actually find out the 
url+number where the desired revision exists. That is, after you've 
synced with master, or merged the mothership's master-branch into one of 
your experimental branches where you've done some work that went before 
mothership's master's current tip, do you have to have access to the 
mothership's repo (as in, do you have to be online) to find out the 
number part of url+number shorthand, or can you determine it solely from 
what you have on your laptop?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 3:47 am

I can't say for bzr 0.>12 which do not exist ;-)

For previous versions, it didn't have that "simple" number, and you
had to use the rev-id.

-- 
Matthieu
-

From: Robert Collins
Date: Tuesday, October 17, 2006 - 9:55 pm

Anyone can push and pull from each other - full stop. Whenever they
'pull' in bzr terms, they get fast-forward happening (if I understand
the git fast-forward behaviour correctly). After a fast-forward, the
dotted decimal revision numbers in the two branches are identical - and
they remain immutable until another fast forward occurs. Push always
fast forwards, so the public copy of one's own repository that others
pull or merge from is identical to your own. In a 'collection of
branches with no mainline' scenario, people usually have fast forward
occur from time to time, keeping the numbers consistent from the point

You can determine it locally - if you know any of the mothership's
revisions locally, we can generate the dotted-revnos that the
mothership's master-branch would have from the local data - and the last
merge of mothership you did will have given you those details. I don't
think we have a ui command to spit this out just yet, but it will be
trivial to whip one up.

More commonly though, like git users have 'origin' and 'master'
branches, bzr users tend to have a branch that is the 'origin' (for bzr
itself this is usually called bzr.dev), as well as N other branches for
their own work, which is probably why we haven't seen the need to have a
ui command to spit out the revnos for an arbitrary branch.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
From: Andreas Ericsson
Date: Wednesday, October 18, 2006 - 1:53 am

This is where it breaks down for me. "until another fast forward occurs" 


To me, this means bazaar isn't distributed at all and I could achieve 
much the same distributedness(?) by rsyncing an SVN repo, working 
against that and then rsyncing it back with some fancy merging. In other 
words, bazaar requires there to be one Lord of the Code, or some of the 
key features break down.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Petr Baudis
Date: Wednesday, October 18, 2006 - 4:15 am

Dear diary, on Wed, Oct 18, 2006 at 10:53:16AM CEST, I got a letter

Well as far as I understand, the Lord of the Code is whoever you pulled
from the last time.

It's just a different focus here. If I understood everything in this
thread correctly, both Git and Bazaar have persistent (SHA1, UUID) and
volatile (revspec, revision number) revision ids. The only difference is
that Git primarily presents the user with the SHA1 ids while Bazaar
primarily presents the user with a revision number (and that revspecs
change after every commit while revision numbers change only after a
merge).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 8:31 am

You mis-understand.

git doesn't have a "ui command to spit out the revnos for an arbitrary 
branch" either.

Normally, you'd just use the branch-name. Nobody ever uses the SHA1's 
directly.

What git does (and does very well) is to be _scriptable_. It was designed 
that way. I'm a UNIX guy. I think piping is very powerful. And when you 
script things, your scripts pass SHA1's around internally.

So for example, to repack a git archive, you'd normally do

	git repack -a -d

and you don't have any "UI" with SHA1 numbers. But internally, this used 
to be

	git-rev-list --all --objects |
		git-pack-objects 

where "git-rev-list" is the one that lists all object names (which are the 
SHA1 numbers), and "git-pack-objects" is the one that takes a list of 
objects and packs them. 

(These days, since our internal C libraries have become so much better, 
the object traversal is done internally to packing, so we don't actually 
use the pipe any more for repacking an archive, but that's just an 
implementation detail)

You seem to think that we use SHA1 names as _humans_. We don't. The SHA1 
names are used internally, and humans just use the branch names.

The only case you'd (as a human) use the SHA1 name is when you want to 
pass it on to another person that may have a different archive (ie you 
mail somebody a revision that is problematic). It would obviously be 
totally unworkable to say "it's the grand-parent of my current HEAD 
commit", since that's a local description. So instead, you'd say "it's 
commit 9550e59c4587f637d9aa34689e32eea460e6f50c".

So I think people (totally incorrectly) think that git users use a lot of 
SHA1 names, just because they see the git users on the kernel mailing list 
sending each others SHA1 names. But that's because you see only the case 
where you _want_ to communicate a stable revision name to another side. 
Sending a number like 1.57.8.312 to describe what commit broke would be a 
_bug_, because a person who has a differently ...
From: Jakub Narebski
Date: Wednesday, October 18, 2006 - 8:50 am

With the exception of having sometimes commit-ids in the commit messages,
for example "Fixes bug introduced by aabbcc00" (although usually you just
write "Fixes bug in some_function in some_file"), and automatically
generated 
  This reverts d119e3de13ea1493107bd57381d0ce9c9dd90976 commit.
(in addition to 'Revert "<Commit title>"') for git-revert generated
commit messages.

And it is true that you usually use branchname, or branchname~n syntax.
Git even has git-name-rev to convert from sha1 to temporary, local
ref^m~n... syntax.


By the way, git has a very powerful syntax to get revisions, and
revision lists. For example "git-rev-list foo bar  ^baz" means
"list all the commits which are included in foo and bar lineage,
but not in baz", or more useful "git log origin..next".
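
As a sketch of what such a revision list means, the selection can be read as set arithmetic over ancestry: everything reachable from the included tips, minus everything reachable from the excluded ones. The graph and the `ancestors` and `rev_list` helpers below are hypothetical and ignore ordering and every other option of the real command.

```python
# Hypothetical history: commit id -> parent ids.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],   # tip of "origin"
    "D": ["B"],
    "E": ["D"],   # tip of "next"
}

def ancestors(parents, tip):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def rev_list(parents, include, exclude=()):
    """Commits reachable from any `include` tip but from no `exclude` tip."""
    wanted = set()
    for tip in include:
        wanted |= ancestors(parents, tip)
    for tip in exclude:
        wanted -= ancestors(parents, tip)
    return wanted

# "origin..next" reads as: in next, but not in origin.
print(sorted(rev_list(PARENTS, include=["E"], exclude=["C"])))  # ['D', 'E']
```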

How's that in bzr?
-- 
Jakub Narebski
Poland
-

From: Linus Torvalds
Date: Wednesday, October 18, 2006 - 9:22 am

Yes. But in both cases, that's usually because you literally ended up 
having the commit name because somebody else (which _can_ be you) searched 
for it (with something like "bisect") and gave it to you.

So even that case is really about communicating a stable name from one 
place (the "find the bug") to another (the "revert the buggy commit").

So yes, _communication_ should always happen by full SHA1's, because those 
are the only thing that always remain stable.

(The fact that "gitk" and I think "gitweb" can then turn them into 
hyperlinks in the commit message is obviously one reason we then tend to 
give them such prominent visibility - they actually end up being very 
useful later on).

In bzr, either you don't get the hyperlinks, or you need to use the 
non-simple name in the commit messages, since the simple names don't 
actually work. Either way, it's an inferior setup.

			Linus
-

From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 9:41 am

And here, by "fairly stable", you really mean "totally idiotic", don't 
you?

Guys, let's be blunt here, and just say you're wrong. The fact is, I've 
used a system that uses the same naming bzr does, and I've used it likely 
longer and with a bigger project than anybody has likely _ever_ used bzr 
for.

It sounds like bzr is doing _exactly_ what bitkeeper did. 

Those "simple" numbers are totally idiotic. And when I say "totally 
idiotic", please go back up a few sentences, and read those again. I know 
what I'm talking about. I know probably better than anybody in the bzr 
camp.

Those "simple" numbers are anything but. They may be short, most of the 
time, but when you bandy things like "-r 56" around, what you're ignoring 
is that for a _real_ project you actually get numbers like "1.517.3.57", 
which isn't really any simpler or shorter than saying "7786ce19". You 
still want to cut-and-paste it.

And the "simple" numbers have a real downside, which is that THEY CHANGE.

What happens is that somebody else started _another_ branch at revision 2, 
and did important work, and they also had a "2.1.2" revision, and then 
they merged your work, and you merged their merge back, that "simple" 
revision number changed, didn't it? Suddenly "2.1.2" means something 
different for one of the users.

We had people in the bitkeeper world that _never_ actually understood that 
the numbers changed. The "simple" numbers were stable enough that a lot of 
people thought they were real revisions, and then they were really 
_really_ confused when a number like "1.517.3.57" suddenly went away after 
a merge, and became something else instead.

And yes, bitkeeper had a "real key" internally too. If you actually wanted 
to give a real revision, you had to give something that looked a lot like 
what the bzr internal revision numbers look like.

Of course, most users didn't even _know_ or understand those revision 
numbers, so as a result, you had tons of people who used the ...
From: Robert Collins
Date: Tuesday, October 17, 2006 - 3:27 pm

Be as blunt as you want. You're expressing an opinion, and that's fine. I
happen to think that we're right : users appear to really appreciate
this bit of the UI, and I've not yet seen any evidence of confusion
about it - though I will admit there is the possibility of that
occurring.

I think it's completely ok that git and bzr have made different choices
in this regard, but I *don't* think our choice is in any regard 'totally
idiotic'.

[snip examples that are clearly predicated on how bk worked, not on how
bzr works].

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
From: Sean
Date: Tuesday, October 17, 2006 - 4:18 pm

On Wed, 18 Oct 2006 08:27:58 +1000

Yeah, but it's an opinion that is based on a huge real world project with
hundreds of developers.  If Bazaar is ever used in a project of that
size it may just see the same type of issues as Bk.  As has been mentioned
elsewhere, Git users really appreciate the short forms it provides for
referencing commits, so much so that there is no reason to invent a
new (unstable) numbering system or attempt to hide the true underlying
commit identities.

Just out of curiosity is there a Bazaar repo of the Linux kernel available
somewhere?

Sean
-

From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 2:59 am

Yup. The new command will also automagically appear in the "git help -a" 
output. Those two functions have been available since the C wrapper was 
born, although "git help -a" was the only available output for "command 
not found" until someone introduced the more newbie-friendly list that 
pops up nowadays.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Robert Collins
Date: Tuesday, October 17, 2006 - 2:37 am

Precisely how does this rebase operate in git?
Does it preserve revision ids for the existing work, or do they all
change?


bzr has a graft plugin which walks one branch applying all its changes
to another preserving the user's metadata but changing the uuids for
revisions.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
From: Sean
Date: Tuesday, October 17, 2006 - 3:01 am

On Tue, 17 Oct 2006 19:37:45 +1000

git rebase does exactly the same as you describe, including changing
the sha1 for each commit it moves.
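
A toy model of why the ids have to change: if a commit's id is a hash over its content and its parent's id, then replaying the same change on a new parent yields a new id, and so does every commit stacked on top of it. The `commit_id` function below is a simplified stand-in, not git's real commit encoding, and the tree and message strings are placeholders.

```python
import hashlib

def commit_id(tree, parent, message):
    """Toy commit id: SHA-1 over the snapshot, the parent id and the message."""
    data = f"tree {tree}\nparent {parent}\n\n{message}".encode()
    return hashlib.sha1(data).hexdigest()

# master: ... E --- G, with topic commits A and B originally based on E.
e = commit_id("tree-E", "", "E")
g = commit_id("tree-G", e, "G")

a = commit_id("tree-A", e, "A")
b = commit_id("tree-B", a, "B")

# Replaying A and B on top of G. The snapshots are kept identical here only
# to isolate the effect of the parent; in practice they would differ too.
a_rebased = commit_id("tree-A", g, "A")
b_rebased = commit_id("tree-B", a_rebased, "B")

print(a != a_rebased, b != b_rebased)  # True True
```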

Sean

-

From: Linus Torvalds
Date: Monday, October 16, 2006 - 4:35 pm

Hey, "simple" is in the eye of the beholder. You can always just define 
Bazaar's naming convention to be simple. 

I pretty much _guarantee_ that a "number" is not a valid way to uniquely 
name a revision in a distributed environment, though. I bet the "number" 
really only names a revision in one _single_ repository, right?

Which means that it's actually not a "name" of the revision at all. It's 
just a local shorthand that has no meaning, and the exact same revision 
will be called something different when in somebody else's repository.

I wouldn't call that "simple". I'd call it "insane".

In contrast, in git, a revision is a revision is a revision. If you give 
the SHA1 name, it's well-defined even between different repositories, and 
you can tell somebody that "revision XYZ is when the problem started", and 
they'll know _exactly_ which revision it is, even if they don't have your 
particular repository.
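
To make the "well-defined between repositories" point concrete, here is a small sketch of a content-derived name. It uses the blob object because its encoding is the simplest: the id is the SHA-1 of a short header followed by the content. The `git_blob_id` helper name is made up for this example; commits are named on the same principle, with a longer payload.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 of 'blob <size>' + NUL + content, as used for git blob objects."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Two unrelated "repositories" that never talked to each other still agree
# on the name, because the name is computed from the content alone.
in_my_repo = git_blob_id(b"int main(void) { return 0; }\n")
in_your_repo = git_blob_id(b"int main(void) { return 0; }\n")

print(in_my_repo == in_your_repo)  # True
print(git_blob_id(b""))            # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```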

Now _that_ is true simplicity. It does automatically mean that the names 
are a bit longer, but in this case, "longer" really _does_ mean "simpler".

If you want a short, human-readable name, you _tag_ it. It takes all of a 


Well, in the git world, it's really just one shared repository that has 
separate branch-namespaces, and separate working trees (aka "checkouts"). 
So yes, it probably matches what bazaar would call a checkout.

Almost nobody seems to actually use it that way in git - it's mostly more 
efficient to just have five different branches in the same working tree, 
and switch between them. When you switch between branches in git, git only 
rewrites the part of your working tree that actually changed, so switching 
is extremely efficient even with a large repo. 
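
A minimal sketch of why switching can be cheap, assuming each branch tip is described by a mapping from path to content id: only paths whose id differs between the two snapshots need touching. The snapshots and the `paths_to_rewrite` helper are hypothetical and leave out directories, file modes and local modifications.

```python
def paths_to_rewrite(current, target):
    """Paths whose content id differs between two snapshots (path -> id)."""
    changed = {
        path
        for path in current.keys() | target.keys()
        if current.get(path) != target.get(path)
    }
    return sorted(changed)

# Hypothetical snapshots of two branches in the same repository.
master = {"Makefile": "1a", "main.c": "2b", "doc.txt": "3c"}
bugfix = {"Makefile": "1a", "main.c": "9f", "new.c": "4d"}

# Makefile is identical on both branches, so it is left alone.
print(paths_to_rewrite(master, bugfix))  # ['doc.txt', 'main.c', 'new.c']
```

Files that are left alone also keep their timestamps, which is what lets a Makefile-based build skip them after a switch.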

So there is seldom any real need or reason to actually have multiple 

The fact is, git supports renames better than just about anybody else. It 
just does them technically differently. The fact that it happens to be the 
_right_ way, and everybody else is incompetent, is not my fault ...
From: Jakub Narebski
Date: Monday, October 16, 2006 - 4:55 pm

Unless you have branch(es) with totally different contents, like git.git

But without .git being either symlink, or .git/.gitdir "symref"-link,
you have to remember what to set GIT_DIR to, or parameter for --git-dir
option.

I'd like to mention once again that in Git branches and tags have
a namespace totally separate from the repository namespace.
-- 
Jakub Narebski
Poland
-

From: Johannes Schindelin
Date: Monday, October 16, 2006 - 5:04 pm

Hi,


But I _do_ work with it! I just don't need to "checkout" it! Example:

git -p cat-file -p todo:TODO


You'd just use alternates for that.

But as Linus mentioned in another email, you mostly can use the _same_ 
working directory. If you want to work on another branch, which is not all 
that different from the current branch (say, you have a bug fix branch on 
top of an upstream branch), you just _switch_ to it. Git recognizes those 
files which are changed, and updates only these. Therefore, if you have 
something like a Makefile system to build the project, you actually save 
(compile) time as compared to the multiple-checkout scenario.

I use this system a lot, since I maintain a few bugfixes for a few 
projects until the bugfixes are applied upstream. BTW the 
multiple-branches-in-one-working-directory workflow was propagated by Jeff 
a long time ago, and it really changed my way of working. Thanks, Jeff!

Ciao,
Dscho

-

From: Linus Torvalds
Date: Monday, October 16, 2006 - 5:23 pm

Ok, if there ever was an example of a strange git command-line, that was 

Well, you can just add

	[alias]
		cat=-p cat-file -p

to your ~/.gitconfig file, and you're there.

[ For all the non-git people here: the first "-p" is shorthand for 
  "--paginate", and means that git will automatically start a pager for 
  the output. The second "-p" is shorthand for "pretty" (there's no 
  long-format command line switch for it, though), and means that git 
  cat-file will show the result in a human-readable way, regardless of 
  whether it's just a text-file, or a git directory ]

So then you can do just

	git cat todo:TODO

and you're done.

[ So for the non-git people, what that will actually _do_ is to show the 
  TODO file in the "todo" branch - regardless of whether it is checked out 
  or not, and start a pager for you. ]

I actually do this sometimes, but I've never done it for branches (and I 
do it seldom enough that I haven't added the alias). I do it for things 
like

	git cat v2.6.16:Makefile

to see what a file looked like in a certain tagged release.

People sometimes find the git command line confusing, but I have to say, 
the thing is _damn_ expressive. I've never seen anybody else do things 
like the above that git does really naturally, with not that much 
confusion really.

Even that "alias" file is quite readable, although I'd suggest writing out 
the switches in full, ie

	[alias]
		cat=--paginate cat-file -p

instead. That kind of helps explain what the alias does and avoids the 
question of why there are two "-p" switches.

			Linus
-

From: Johannes Schindelin
Date: Monday, October 16, 2006 - 5:36 pm

Hi,


Ha! I have had that for a long time! Although I named it "s", since "git s 
todo:TODO" is two letters shorter...

Ciao,
Dscho

P.S.: BTW a certain person complained about ~/.gitconfig not being 
documented, but evidently the itch was not big enough for that person to 
document it himself...
-

From: Nguyen Thai Ngoc Duy
Date: Monday, October 16, 2006 - 6:17 pm

This very useful syntax (<ent>:<path>) didn't get documented
"officially" anywhere. It was actually documented in commit log
v1.4.1^0~255^2. Maybe someone should copy and paste it to git
documentation? Maybe core-tutorial.txt or git-rev-parse.txt, is there
any better place?
-- 
Duy
-

From: Christian MICHON
Date: Tuesday, October 17, 2006 - 12:26 am

_WONDERFUL_. Really :)

-- 
Christian
-

From: Linus Torvalds
Date: Monday, October 16, 2006 - 5:08 pm

Yes. I have to say, that's likely a fairly odd case, and I wouldn't be 
surprised if other VCS's don't support that mode of operation at _all_.

The fact that git branches can be independent of each other is very 

I'd strongly suggest that people who do this should actually do

	git clone -l

instead of actually playing games with symlinking .git/ itself or using 
GIT_DIR. It means that the two checkouts get separate branch namespaces, 
but that's really what you'd want most of the time. 

You _can_ share the whole branch namespace and do the symlink of .git (or 
just set GIT_DIR - but that's pretty inconvenient), and it might end up 
being "closer" to what some other VCS would do. But the natural thing to 
do with git is to just share some of the objects through local "slaving" 
of the repositories, and consider them otherwise entirely independent.

		Linus
-

From: Aaron Bentley
Date: Monday, October 16, 2006 - 9:31 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Bazaar also supports multiple unrelated branches in a repository, as
does CVS, SVN (depending how you squint), Arch, and probably Monotone.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNFy90F+nu1YWqI0RAgMeAJ99OikxXspSg+efnN6j3ySoPuOovQCfaKA6
yPCRw5Kl/V+ThnU6fsPA8TQ=
=DYAN
-----END PGP SIGNATURE-----
-

From: Luben Tuikov
Date: Monday, October 16, 2006 - 5:29 pm

It does work, very well at that.

I have a directory for each separate branch and simply use
cd(1) to change the current working directory to that branch.
So, instead of "git checkout <branch>", I do "cd ../<branch>".

One only needs to watch out when one updates the repository.
If there had been updates in those branches, then one needs
to git-reset the "branch" directory... (you know what I mean)
(For example when I come to work in the morning and sync up
 with home from my usb key...)

The script is called:
Usage: git-mkdir-of-branch <original-directory> <branch> <new-directory>
  where <branch> is the name of an existing branch in <original-directory>/.git/refs/heads

and uses simple symbolic links and some git plumbing to do the
job.  It can be found in my git trees.  I never bothered to send
it out to Junio, since it could be considered heretic. ;-)

     Luben

-

From: Aaron Bentley
Date: Monday, October 16, 2006 - 9:24 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Right.  That's why I said all revisions can be named by a URL + a
number, because it's the combination of the URL + a number that is

I agree that a revision is a revision, but I don't think that's a

When two people have copies of the same revision, it's usually because
they are each pulling from a common branch, and so the revision in that
branch can be named.  Bazaar does use unique ids internally, but it's

But tags have local meaning only, unless someone has access to your

The key thing about a checkout is that it's stored in a different
location from its repository.  This provides a few benefits:

- you can publish a repository without publishing its working tree,
  possibly using standard mirroring tools like rsync.

- you can have working trees on local systems while having the
  repository on a remote system.  This makes it easy to work on one
  logical branch from multiple locations, without getting out of sync.

- you can use a checkout to maintain a local mirror of a read-only

You can operate that way in bzr too, but I find it nicer to have one
checkout for each active branch, plus a checkout of bzr.dev.  Our switch
command also rewrites only the changed part of the working tree.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNFrv0F+nu1YWqI0RAgBHAJ9XpmdvuCNDysxFhnyeCmkEG/z0ggCggMsJ
WyW6lqGMokh0k0It1KOdgtk=
=L1SR
-----END PGP SIGNATURE-----
-

From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 12:50 am

The revision will change between different repos though, so 
random-contributor A that doesn't have his repo publicised needs to send 
patches and can't log his exact problem revision somewhere, which makes 
it hard for random contributor B that runs into a similar problem but on 
a different project sometime later to find the offending code. I prefer 
the git way, but I'm a git user and probably biased.

That said, it shouldn't be impossible to add fixed, user-friendly 
bazaar-like revision numbers for git. We just have to reverse the
<committish>[^~]<number> syntax to also accept <committish>+<number>.

This would work marvelously with serial development but breaks horribly 
with merges unless the first (or last) commit on each new branch gets 
given a tag or some such.

Either way, I'm fairly certain both bazaar and git need to distribute 
information to the user in need of finding the revision (which url and 
which number vs which sha). I also imagine that the bazaar users, just 
like the git users, are sufficiently apt copy-paste people to never 

Well, if two people have the same revision in git, you *know* they have 
pulled from each other, because ALL objects are immutable. The point of 

I imagine the bazaar-names with url+number only has local meaning unless 
someone has access to your repository too. One of the great benefits of 
git is that each revision is *always exactly the same* no matter in 
which repository it appears. This includes file-content, filesystem 


This I'm not so sure about. Anyone wanna fill out how shallow clones and 

Check. Well, actually, you just clone it as usual but with the --bare 

Works in git as well, but each "checkout" (actually, locally referenced 
repository clone) gets a separate branch/tag namespace.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 7:05 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


No, you don't.  They may have each pulled from a different repository.

Take revision 00aabbcc, created by Linus.  Linus has it because he
committed it.  I have it because I pulled Linus' repository.  You have
it because Andrew Morton pulled Linus' repository, and you pulled Andrew


In Bazaar, a revision id always refers to the same logical entity, but

With most SCMs that store the repository in the root of the tree,
disentangling the tree and repository requires care.  OTOH, this is just


In our terminology, if it can diverge from the original, it's a branch,
not a checkout.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFNOM10F+nu1YWqI0RAvNUAJwN/QviOs+sUuN9ep4Otyrgax9SmwCfSH7t
XdxOxo7smshNlzU3qoxq6Nw=
=nxsM
-----END PGP SIGNATURE-----
-

From: Sean
Date: Tuesday, October 17, 2006 - 7:34 am

On Tue, 17 Oct 2006 10:05:41 -0400

Well his point was that they have pulled from each other directly or
indirectly.  You can safely say that rev 00aabbcc.. in _any_ repository
is the same rev.  This discussion started because of doubt expressed
by some here on the list that the "simple" numbering scheme used by
bzr can offer the same guarantee.  That is, rev 1.2.1 may be completely


Why?  Uncommitted changes shouldn't be propagated.  Once you have cloned
the repo, you can checkout your own copy of the working tree files.

Sean
-

From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 8:05 am

I realized it as I read it now. What I meant was that you know you have 

This I don't understand. Let's say Alice has revision-154 in her repo, 
located at alice.example.com. Let's say that commit is accessible with 
the url "alice.example.com:revision-154". Bob pulls from her repo into 
his own, which is located at bob.example.com.

Lots of questions here, so I'll split them up. Feel free to delete the 
non-applicable ones.

Will the commit in Bob's repo be accessible at 
"bob.example.com:revision-154"?

If it's not, how can you backtrack from old bugreports and find the 
error being discussed?

If it is, how does that work if Bob suddenly wants to commit things 
before Alice is done working with her changes?

Also, suppose they both push to a master-repo where Caesar has pushed 
his changes and nicked the slot for revision-154. Does the master repo 
re-organize everything and then invalidate Bob's and Alice's changes, or 
does it tell Alice and Bob that they need to update and then reorganize 
their repos before they're allowed to push?

I really can't get my head around the usefulness of revision-numbers 
hopping around which is probably why I'm having such trouble grokking 

You get the working tree files by default. Use --bare if you don't want 
them to be checked out (i.e. written to the working tree) after the 

This clears things up immensely. bazaar checkout != git checkout.
I still fail to see how a local copy you can't commit to is useful, but 
it doesn't really matter to me as I've already found a tool that does 
everything I want wrt scm needs.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 8:32 am

Another equation can help.

Revision Identity != Revision Number.

$ bzr log --show-ids
------------------------------------------------------------
revno: 1
revision-id: Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d
committer: Matthieu Moy <Matthieu.Moy@imag.fr>
branch nick: foo
timestamp: Tue 2006-10-17 17:20:29 +0200
message:
  some message


See, bzr has this unique revision identifier (not based on a hashsum).
The design choice of bzr is to hide it as much as possible from the
user interface.

Then, if I'm in the branch in which I typed this command, I can refer
to this revision with simply

  bzr whatever -r 1

In the general case, I can access it with

  bzr whatever -r revid:Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d

(There's currently a lack in the UI to specify a remote revision-id,
but that's not a problem in the model itself)

bzr's internals use revision IDs almost exclusively (ancestry
information is all about revision IDs), and revnos are a UI layered on
top of them.

I don't have strong needs in revision control, but I actually never
encountered a case where I had to access a revision by providing its
ID. So, for people like me, revision numbers are sufficient, and they
are simple (for example, I can tell without running any command that
revision 42 is older than revision 56 in a particular branch).
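
The distinction between revision identity and revision number can be
sketched as follows. This is only a toy model, not how bzr is
implemented: a branch is assumed to be a plain ordered list of revision
ids, and the two ids for Alice and Bob are made up for illustration.

```python
# Toy model: a branch is an ordered list of globally unique revision ids.
# A revno is just a 1-based position in that list, so it is branch-local.

def resolve(branch, revno):
    """Map a branch-local revision number (1-based) to a revision id."""
    if not 1 <= revno <= len(branch):
        raise ValueError(f"no revision {revno} in this branch")
    return branch[revno - 1]


def revno_of(branch, revision_id):
    """Map a revision id back to its number in this branch, or None if absent."""
    try:
        return branch.index(revision_id) + 1
    except ValueError:
        return None


BASE = "Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d"
ALICE_REV = "alice@example.com-20061017160000-aaaaaaaaaaaaaaaa"  # hypothetical id
BOB_REV = "bob@example.com-20061017160500-bbbbbbbbbbbbbbbb"      # hypothetical id

# Both branches start from the same revision and then diverge.
alice = [BASE, ALICE_REV]
bob = [BASE, BOB_REV]
```

In this model "-r 2" means something different in each branch, while a
revision id names the same revision everywhere it is present.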

-- 
Matthieu
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 12:44 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


bzr differentiates between pull and merge.  Pull is a mirroring command.
 So with pull, yes revision-154 will be accessible at
bob.example.com:revision-154.

With merge, it won't.  Bob can refer to it as "154:alice.example.com",


I don't see how this applies.  You can always commit in a branch.  If
alice and bob both commit, then they are diverged and can't pull.  If


My bzr is run from a local copy I can't commit to.  To get the latest
changes from http://bazaar-vcs.org, I can run "bzr update ~/bzr/dev".
To merge the latest changes into my branch, I can run
"bzr merge ~/bzr/dev".  It's also convenient for applying other people's
patches to.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFFNTKl0F+nu1YWqI0RAhRkAJ0d5KyRElEiFm/m5iRrTIk00RyqywCfe2IY
dhW46SYWm+FTQpN30VY5tPs=
=6SFm
-----END PGP SIGNATURE-----
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 4:28 pm

Dear diary, on Tue, Oct 17, 2006 at 09:44:37PM CEST, I got a letter

The question is, why is it useful to enforce the "no commit" rule? Git
can work exactly the same, it just doesn't _enforce_ the rule. And is
the capability of enforcing such a rule important enough to warrant its
own column in the comparison table?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 4:39 pm

> My bzr is run from a local copy I can't commit to. 
From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 5:24 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Sure.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNXRU0F+nu1YWqI0RAptIAJ0btflKFEjF9a7Kt/qVZufK003DpACeK7Dc
leW4ICG1LbOC9DGrAd5ztlY=
=JGvL
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 1:30 am

Tags are propagated during clone, and during fetch/pull (getting changes
from repository). So in that sense they are global.

If you don't publish your repository, then neither tags, nor <URL>+<rev no>


In git we usually use "git clone --local" (with repository database
hardlinked) or "git clone --shared"/"git clone --reference <repository>"
(which automatically sets alternates, i.e. file pointing to alternate
repository database) for that. This way one gets his/her own refs
namespace, so two people can work on different branches simultaneously.

Alternate solution would be to symlink .git, or .git/objects (i.e.

In git you can access contents _without_ checkout/working area.
For example gitweb (one of git's web interfaces) uses only repository

Luben (IIRC) works this way.
-- 
Jakub Narebski
Poland
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 4:19 am

Bazaar can do this too. For example,
"bzr cat http://something -r some-revision" gets the content of a file
at a given revision. But that's not what Aaron was referring to.

In Bazaar, checkouts can be two things:

1) a working tree without any history information, pointing to some
   other location for the history itself (a la svn/CVS/...).
   (this is "light checkout")

2) a bound branch. It's not _very_ different from a normal branch, but
   mostly "commit" behaves differently:
   - it commits both on the local and the remote branch (equivalent to
     "commit" + "push", but in a transactional way).
   - it refuses to commit if you're out of date with the branch you're
     bound to.
   (this is "heavy checkout")

In both cases, this has the side effect that you can't commit if the
"upstream" branch is read-only. That's not fundamental, but handy.
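
The commit behaviour of a bound branch described above can be sketched
roughly like this. It is a toy model under stated assumptions, not
bzr's code: the class and method names are invented for illustration,
history is a plain list, and "transactional" is reduced to writing to
the master first so that a failure there leaves the local branch
untouched.

```python
class OutOfDate(Exception):
    """The local branch is behind the branch it is bound to."""


class Branch:
    def __init__(self, revisions=None, read_only=False):
        self.revisions = list(revisions or [])
        self.read_only = read_only

    def tip(self):
        return self.revisions[-1] if self.revisions else None

    def append(self, revision):
        if self.read_only:
            raise PermissionError("branch is read-only")
        self.revisions.append(revision)


class BoundBranch(Branch):
    """Hypothetical 'heavy checkout': a local branch bound to a master."""

    def __init__(self, master):
        super().__init__(master.revisions)  # start as a copy of the master
        self.master = master

    def update(self):
        # Assumes no local-only commits; just mirror the master's history.
        self.revisions = list(self.master.revisions)

    def commit(self, revision):
        if self.tip() != self.master.tip():
            raise OutOfDate("update before committing")
        self.master.append(revision)     # raises if the master is read-only
        self.revisions.append(revision)  # only reached if the master accepted it
```

With two checkouts bound to the same master, the second one to commit
gets OutOfDate until it calls update(), and a checkout of a read-only
master cannot commit at all.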

I use it for example to have several "checkouts" of the same branch on
different machines. When I commit, bzr tells me "hey, boss, you're out
of date, why don't you update first" if I'm out of date. And if commit
succeeds, I'm sure it is already committed to the main branch. I'm sure
I won't pollute my history with merges which would only be the result
of forgetting to update.

Once more, that's not fundamental, but handy.

The more fundamental thing I suppose is that it allows people to work
in a centralized way (checkout/commit/update/...), and Bazaar was
designed to allow several different workflows, including the
centralized one.

-- 
Matthieu
-

From: Sean
Date: Tuesday, October 17, 2006 - 4:38 am

On Tue, 17 Oct 2006 13:19:08 +0200

Git can do this from a local repository, it just can't do it from
a remote repo (at least over the git native protocol).  However,
over gitweb you can grab and unpack a tarball from a remote repo.

This doesn't sound right, at least in the spirit of git.  Git really
wants to have a local commit which you may or may not push to a
remote repo at a later time.  There is no upside to forcing it all to
happen in one step, and a lot of downsides.  Git's focus is to support
distributed offline development, not requiring a remote repo to be

Again this seems really anti-git.  There is no reason for your local
branch to be marked read only just because some upstream branch is

This is exactly the same in Git.  You really only ever push upstream
when your local changes fast forward the remote, (ie. you're up to date).

While Git really isn't meant to work in a centralized way there's nothing
preventing such a work flow.  It just requires the use of some surrounding
infrastructure.

Sean
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 5:03 am

Anyway, given the price of disk space today, this only makes sense if
you have a fast access to the repository (otherwise, you consider your
local repository as a cache, and you're ready to pay the disk space
price to save your bandwidth). In this case, it's often in your

I lied in my above description ;-).

I should have said "by default" ... but you have "commit --local" if
you want to have a local commit on a bound branch (at this point, I
should remind that not all branches are "bound branches". "bzr branch"

Well, take the example of my bzr setup.

I have one repository, say, $repo.

In it, I have one branch "$repo/bzr.dev" which is an exact mirror of
http://bazaar-vcs.org's branch.

I also have branches for patches (occasional in my case) that I'll
send to upstream. Say $repo/feature1, $repo/feature2, ...

If, by mistake, I start hacking on bzr.dev itself, I'll be warned at
commit time, create a branch, and commit in this new branch. I believe
git manages this in a different way, allowing you to commit in this
branch, and creating the branch next time you pull. But you know this

Yes, but you will have to do a merge at some point, right? While I'm
keeping a purely linear history (not that it is good in the general
case, but for "projects" on which I'm the only developer, I find it
good. For example, my ${HOME}/etc/).

But don't get me wrong, I also prefer the decentralized way in most
cases. And I'm happy that bzr and git work like this by default. Just
that at least *I* have cases where a centralized approach suits me
better, and then I'm happy with that particular feature of bzr.

-- 
Matthieu
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 5:56 am

Matthieu Moy wrote:
>> This is exactly the same in Git. 
From: Sean
Date: Tuesday, October 17, 2006 - 5:57 am

On Tue, 17 Oct 2006 14:03:21 +0200

This is most likely the reason that people using Git don't clamor
more for the ability to work without a local repository.  Disk is cheap
and it just makes sense the vast majority of the time to have a complete
copy of the repository yourself.  There are a lot of powerful things
you can do once you have all that information in your repo.  Not the least
of which is performing any and all operations while flying on a plane

Well, with Git the default is to only commit locally.  Of course, you
could set your post commit hook to always push it to a remote if

Well, it's just a slight difference in perspective rather than any
big issue here.  Git treats all repositories as peers, so it would never
assume that just because one other particular repo has a branch marked
as read only that it should be marked read only locally.  It lets you
commit to it, and then push to say a third and fourth repo that are
writable as well.  In practice this doesn't really cause any

Well if you're committing changes from multiple different machines,
how is that different from having say 3 different developers committing
changes to the central repo?  How does bzr avoid a merge when you're
pushing changes from 3 separate machines? 

You mentioned that if you try to push and you're not up to date you'll
be prompted to update (ie. pull from the upstream repo).  When you do such
a pull do your local changes get rebased on top or is there a merge?   By
your comments I guess you're saying they're rebased rather than merged, and
this is how you keep a linear history.  Git can do this easily, but it's
not done by default.

Sean
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 6:44 am

The workflow is different.

If I commit broken changes on a repository shared by multiple
developers, they'll insult me, and they'll be right. While I find
nothing wrong in committing broken changes to my ${HOME}/etc/ when

Err, the same way people have been doing for years ;-). If you don't
have local commits, "bzr update" will work in the same way as "cvs
update", it keeps your local changes, without recording history. Like
"git pull" does if you have uncommitted changes, I think.

-- 
Matthieu
-

From: Sean
Date: Tuesday, October 17, 2006 - 7:01 am

On Tue, 17 Oct 2006 15:44:36 +0200

Ah, okay.  Well Git can definitely manage this.  Just means you have to
rebase any local changes before pushing.  This will keep the history
linear and make sure that no merges are needed in the case you were asking
about.

So far, it sounds to me like bazaar and git are more alike than they are
different.  Each has a few commands the other doesn't but all in all
they sound very similar.  But I'm a Git fanboy so I ain't switching
now ;o)

Sean
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 7:19 am

Sure. As I said before, the little add-on of checkouts is that you say
once "I don't want to do local commit here", and bzr reminds you this
each time you commit. Well, where it can make a difference is that it
does it in a transactional way, that is, you don't have that little
window between the time you pull and the time you push your next

Sure. And at least, if you want to prove that your decentralized SCM
is the best, you'd better look at features other than the ability to
commit on a local branch ;-). If you want a _real_ flamewar, better
talk about rename management or revision identity.

The thing is that most people migrated from CVS/svn, so they found
their new SCM to be incredibly better than the existing one. But it's generally
not _so_ much better than the other modern alternatives ;-). (and
don't forget to thank Darcs and Monotone who brought most of the good

Probably not going to switch either, but that might happen.

-- 
Matthieu
-

From: Sean
Date: Tuesday, October 17, 2006 - 8:06 am

On Tue, 17 Oct 2006 16:19:46 +0200

Yeah, it would be bad luck, but Git wouldn't actually let the push
succeed if someone had changed the upstream repo in that small window.
It would complain that your push wasn't a fast forward and ask you

Heh, true enough.  And the fact is they're all "borrowing" the
best ideas from one another.  All of a sudden the others are all
getting git-like bisect and gitk guis.  And of course Linus has
said that he got quite a bit of inspiration from Monotone
originally.

Beyond the distributed offline nature of using Git, the killer
"feature" for me is its raw speed and flexibility[1].  It's
really nice to be able to branch in under a second and try
out a line of development etc.  Maybe this is just as easy
in Bazaar but it's not true of say Mercurial.  Honestly, I
just can't imagine any other SCM meeting my needs better than
Git.  So I have a hard time taking complaints about rename
management or revision identity seriously.

While they don't affect my usage, IMHO the two biggest failings
of Git are its lack of a shallow clone and its reliance on shell
and other scripting languages so there is no native Windows version.
I'm sure both of these areas are handled better by Bazaar and/or
some of the other new SCMs where they'd be a better choice than
Git.

Sean

[1] As an aside, I don't understand why bazaar pushes the idea
of "plugins".  For instance someone mentioned that bazaar has
a bisect "plugin".  Well Git was able to add a bisect "command"
without needing a plugin architecture.. so I'm at a loss as 
to why plugins are seen as an advantage.

-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 5:25 pm

Dear diary, on Tue, Oct 17, 2006 at 02:03:21PM CEST, I got a letter

(In rich countries. This may still be very different in poorer
countries.  E.g. some actual mplayer developer(s) from Turkey opposed
transition to a distributed version control system simply because they
have trouble affording the required additional diskspace for the full
history.  SVN is already very space-hungry for them.  (It stores
basically two complete checkouts in parallel.))

But the much bigger practical problem is bandwidth, plenty of people
still have internet connections where downloading several tens/hundreds
of megabytes of the complete history is quite a big thing, and the
servers ain't gonna be happy from that either, nor those paying the
bandwidth bills. ;-) And this is one of the big problems the Mozilla
guys have - having everyone download 450M worth of the full CVS-imported
history (and I'll bet no other VCS will beat that size) seems to be not

So how is the light checkout actually implemented? Do you grab the
complete new snapshot each time the remote repository is updated? Do all
the (at least read-only, like "log" and "diff", perhaps "status")
commands work on such a light checkout?

This is something sorely missing in Git but if it's really only "we just
provide bandwidth-expensive way to keep your tree up-to-date and that's
all," that would not be hard at all to implement in Git too, using
git-archive --remote.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 5:38 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


No, the lightweight checkouts store very little.  They have
- - a copy of tree shape (filenames, paths, sha1 sums) from the last
  commit.
- - a copy of tree shape for the current working directory

Yes.  And if you check out from a read-write branch, all write commands
work, too.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNXeN0F+nu1YWqI0RAsdrAJ0bUj4swxm5sod9WnsbPZ9yIQ7FVQCdE4UB
8x0ddFkbr5cPISTihw96d8c=
=/XAr
-----END PGP SIGNATURE-----
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 5:42 pm

Dear diary, on Wed, Oct 18, 2006 at 02:38:37AM CEST, I got a letter

I see, I guess that means "the index file and tree objects for the last

Ok, one last question - do you do most of the work locally, fetching
bits of data as you need, or remotely, only taking input/producing
output over the network (the pserver model)?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 5:50 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Personally, I do not do remote commits over slow links.  At home, I use
a single machine, and mirror my repository to a public machine using
rsync.  At work, I store my repository on an NFS server, and push my
repository to a public machine using rsync.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNXpu0F+nu1YWqI0RAjPTAJ4w9YOM5XLpnIP9jYywtfMr+LZLvACfdycA
/TYAGUVGweR5+cPtDVAIBq4=
=rsNR
-----END PGP SIGNATURE-----
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 5:57 pm

Dear diary, on Wed, Oct 18, 2006 at 02:50:54AM CEST, I got a letter

I meant the work of the commands (bzr log and such), not your personal
workflow. :-) Sorry for being unclear.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 6:05 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


When using the native network protocol, work can happen remotely.  (But
the native protocol is quite new, and support for "smart" operations is
currently limited.)  When using the dumb protocols, data is fetched from
the remote system and processed locally.  Light checkouts are not
recommended when the server is on a slow link, but heavyweight checkouts
are quite suitable in that situation.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNX3j0F+nu1YWqI0RAtRcAJ0fEZam6H3hs3YHY/dEYEhk3A73BQCdENHY
s9+KZTfqnDJg8mHNmC2C/Ok=
=Nqcn
-----END PGP SIGNATURE-----
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 5:48 pm

Ah. So in git terminology it stores index and working directory
(and perhaps the name of branch). 

-- 
Jakub Narebski
Poland
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 6:11 pm

Dear diary, on Tue, Oct 17, 2006 at 02:03:21PM CEST, I got a letter

In fact, in Git the branch is actually created at the moment you clone.

For simplicity's sake, let's say you cloned just a single branch, not the
whole repository (or imagine a repository with a single branch). Then,
in your local repository, two branches will be created: 'origin' and
'master'. The origin branch is considered readonly (though Git does
not enforce it) and only mirrors the branch in the remote repository.
The master branch is the branch you do your work on, and it corresponds
to the contents of your working tree.

Thus, when you are "updating" your repository (we also call that
"pull"), what happens is that new commits are _fetched_ from the remote
repository to your 'origin' branch and then the 'origin' branch is
_merged_ to the 'master' branch. (You can even separate those two steps
and do them manually. So you can e.g. periodically fetch but just check
diffs with your master branch and never actually merge, or whatever.)

If you never do any local commits on the repository, every time you
merge the 'master' branch is an ancestor of the 'origin' branch and only
a so-called fast-forward merge happens - the 'master' branch is updated to
point at the same commit as the 'origin' branch.

If you _did_ do some local commits, a real merge of the two branches
happens and a new merge commit tying the current master and origin
history together is recorded on the master branch.
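
The merge half of a pull, as described above, can be sketched over a
toy commit graph. This is a simplified model, not git's implementation:
commits are plain strings, 'parents' maps each commit to a tuple of its
parents, and the id of the merge commit is supplied by the caller
instead of being computed from content.

```python
def ancestors(parents, commit):
    """Return the set containing commit and everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge(parents, master, origin, new_commit_id):
    """Merge 'origin' into 'master' and return the new tip of 'master'."""
    if master in ancestors(parents, origin):
        # No local commits: fast-forward, master just moves to origin's commit.
        return origin
    if origin in ancestors(parents, master):
        # Nothing new was fetched: master already contains origin.
        return master
    # Both sides have commits of their own: record a two-parent merge commit.
    parents[new_commit_id] = (master, origin)
    return new_commit_id


# A <- B <- C is the fetched history; L is a local commit made on top of B.
history = {"A": (), "B": ("A",), "C": ("B",), "L": ("B",)}
```

Calling merge(history, "B", "C", "M") only moves the pointer, while
merge(history, "L", "C", "M") adds a merge commit "M" with parents
"L" and "C".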

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 11:44 pm

Out of curiosity, what happens if you accidentally commit to it?

-- 
Matthieu
-

From: Shawn Pearce
Date: Wednesday, October 18, 2006 - 12:16 am

It will quietly accept the commit.

Later when you attempt to run `git fetch` to download any changes
from the remote repository to your local origin branch the fetch
command will fail as it won't be a strict fast-forward due to
there being changes in origin which aren't in the remote repository
being downloaded.

The user can force those changes to be thrown away with `git fetch
--force`, though they probably would want to first examine the
branch with `git log origin` to see what commits (if any) should
be saved, and either extract them to patches for reapplication or
create a holder branch via `git branch holder origin` to allow them
to later merge the holder branch (or parts thereof) after the fetch
has forced origin to match the remote repository.
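
The check that makes the fetch fail, and the "holder" branch rescue, can
be sketched like this. It is a toy model rather than git's code: refs
are a plain dict, commits are strings, and the function names are
invented for illustration.

```python
class NotFastForward(Exception):
    """The local ref holds commits that the remote tip does not contain."""


def is_ancestor(parents, old, new):
    """True if 'old' is reachable from 'new' by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return False


def fetch(parents, refs, name, remote_tip, force=False):
    """Point refs[name] at remote_tip, refusing to drop local commits."""
    local_tip = refs.get(name)
    if local_tip is not None and not force:
        if not is_ancestor(parents, local_tip, remote_tip):
            raise NotFastForward(f"{name} does not fast-forward to {remote_tip}")
    refs[name] = remote_tip


# The remote has A <- B; X is an accidental local commit on 'origin' on top of A.
parents = {"A": (), "B": ("A",), "X": ("A",)}
refs = {"origin": "X"}
```

Here fetch(parents, refs, "origin", "B") raises NotFastForward. Saving
the stray commit first with refs["holder"] = refs["origin"] and then
fetching with force=True leaves 'origin' at "B" and keeps "X" reachable
through 'holder'.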

So in short by default Git stops and tells the user something fishy
is going on, but the error message isn't obvious about what that
is and how they can resolve it easily.

There has been discussion about marking these branches that we
know the user fetches into as read-only, to prevent `git commit`
from actually committing to such a branch (we also have the same
case with the special bisect branch), but I don't think anyone has
stepped forward with the complete implementation of that yet.

Like anything I think people get used to the idea that those branches
are strictly for fetching and shouldn't be used for anything else.
There's really no reason to checkout a fetched into branch anyway;
temporary branches are less than 1 second away with
`git checkout -b tmp origin` (for example).

-- 
Shawn.
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 4:45 am

Git cannot do that remotely (with exception of git-tar-tree/git-archive 
which has --remote option), yet. But you can get contents of a file 
(with "git cat-file -p [<revision>:|:<stage>:]<filename>"), list 
directory (with "git ls-tree <tree-ish>") and compare files or 
directories (git diff family of commands) without need for working 
directory.
 
AFAICT working area is required _only_ to resolve conflicts during 

In git by default in the top directory of working area you have .git 
directory which contains whole repository (object database, refs (i.e. 
branches and tags), information which branch is current, index aka. 
gitcache, configuration, etc.). You can share object database locally 
(which includes network filesystem).

You can have .git (usually <project>.git then) directory without working 
area.


There was a proposal to allow tracking branches to be marked 
read-only, but it has not been implemented yet.

But git has the reverse check: it refuses (unless forced by the user) to 
fetch into a branch which has local changes (i.e. which does not 
fast-forward). This makes sure that no information is lost.

The idea is that you fetch changes into tracking branch (e.g. 'master' 
branch of some parent remote repository into 'origin' or 
'remotes/<repository name>/master' branch); you don't commit changes to 
such branch. You do your own work either on 'master' branch, then merge 
(typically using "git pull") corresponding 'origin' tracking branch, or 
use separate private feature branch and use rebase after fetch.


Git is designed for distributed workflows, not for centralized one.
All repositories are created equal :-)

-- 
Jakub Narebski
ShadeHawk on #git and #revctl
Poland
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 5:02 am

And you can use GIT_DIR environmental variable or --git-dir option
to git wrapper.
-- 
Jakub Narebski
Poland
-

From: Sean
Date: Tuesday, October 17, 2006 - 5:07 am

On Tue, 17 Oct 2006 13:45:31 +0200

Interesting, I didn't know about the --remote option.  So in fact as long
as the remote has enabled upload-tar then anyone can do a "light checkout".
However, it appears that kernel.org for instance doesn't enable this feature.

Sean
  
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 6:33 am

Same as bzr then I believe. "bzr pull" will suggest you to use "merge"

Note that "bound branches" and "other branches" in bzr are not so
different. The "master" (the one you make a checkout of) doesn't have
to know it has checkouts, and the "checkout" just has one file
pointing to the "master", and you can switch from one flow to the
other with "bzr bind/unbind".

So, in Bazaar, all repositories are /almost/ created equal ;-).

-- 
Matthieu
-

From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 5:00 am

What about

3) getting the repo with all the history while still not having to be 
online to actually commit to *your* copy of the repo. When you later get 
online, you can send all your changes in a big hunk, or let bazaar email 

It appears we have different ideas of what's handy. Perhaps it's just a 
difference in workflow, or lack of "email-commits-as-patches" tools in 
bazaar, but the ability to commit to whatever branch I like in my local 
repo and then just send the diffs by email or please-pull requests to 
upstream authors is what makes git work so well for me. I can of course 
also pull the changes to another branch, or cherrypick them one by one, 
or...

OTOH, if by "commit" you mean "send your changes back to central 
server", and bazaar'ish for "register my current set of changes in the 
local clone of the repo" is called something else, it sounds very 

Centralized works in git too after a fashion. Most projects have a 
master repo hidden somewhere that frequently gets pushed out for 
publishing and which most (all?) contributors sync against from time to 
time, but it's by no means a certainty. What *is* a certainty is that 
the published branches are exactly identical to the ones in the master 
repo, and all the downstream authors will get a history where they can 
easily track master's development.

For git, I suppose Junio has the hidden master repo which he publishes 
at kernel.org. Linus does the same with the Linux repo.

On a side-note, it sounds as though the "bound branch" scenario 
encourages making a big change as one mega-diff, so long as it 
implements one feature, whereas the git workflow with topic-branches 
that eventually gets merged to master allows changes to sort of 
accumulate up to a feature in the steps one actually has to take to make 
the feature work.

Side-note 2: Three really great things that have made work a lot easier 
and more enjoyable since we changed from cvs to git and that aren't 
mentioned in the comparison table:
* ...
From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 6:27 am

Well, the discussion was about checkouts, so I was talking about
checkouts ;-).

What you mention is the default behavior of Bazaar when you use 
"bzr branch" or "bzr get". BTW, it's also possible to do this with a

You have "bzr bundle" in Bazaar, and there was work to have it
actually send the email ( http://bazaar-vcs.org/SubmitByMail ), but I
don't think it's finished yet.

And yes, this is a great feature, the first time I used it was with
Darcs, and I was impressed how easy I could submit a patch without any
setup and with a 5-lines tutorial. Even wiki seems complex after

Sure. Once again, Bazaar does it this way too. There's an _additional
feature_ called checkout which allows you to work in another way,
though. As with most "features", it's not useful to everybody.


Sure. And regarding this, hopefully, most modern VCS go in the same
direction.

> * Dependency/history graph display tools 
From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 6:55 am

> > * Dependency/history graph display tools 
From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 7:08 am

In bzr, the "bundle" appears like a patch, but it actually contains the
same information as the revision(s) it contains (I believe this
applies to hg and Darcs too). A bundle can be used almost like a
branch. That's a key point, since revision identity is not based on
content's hash, so applying a patch is very different from merging a

That's the key point, but patch review for non-accidental developers

Bazaar's bundles use base64 encoding for binaries. I don't think that's
an efficient binary diff (xdelta-like) though. Aaron has been fighting
quite a lot with MUA and MTA mixing up the patches (line ending in
particular) ...

-- 
Matthieu
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 7:41 am

The patch generated by git-format-patch has author information (in 
"From:" header), original commit date (in "Date:" header), commit 
message (first line in "Subject:", rest in message body), place for 
comments which are not to be included in commit message, diffstat for 
easier patch review, and git extended diff (with information about 
renames detection, mode changes, 7-characters wide shortcuts of file 
contents identifiers). It does not record parent information, original 
committer and committer date, which branch we are on, etc. You can quite 
easily provide ordering of patches.

Sending patches via email means the first line of the commit message 
must not begin with text enclosed in brackets (the subject is usually 
"[PATCH] Commit description" or "[PATCH n/m] Commit description"), and 
it enforces the git convention that a commit message consists of a 
short first line describing the commit, separated by an empty line 
from the longer description and signoff lines.
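
The mapping between a commit and a patch email described above can be
sketched with Python's standard email module. This is an illustration,
not what git-format-patch or git-am actually run: the helper names are
invented, the diffstat is left out, and the bracket stripping is a
simplified guess at the convention rather than git's exact rule.

```python
import re
from email.message import EmailMessage


def to_patch_mail(author, date, commit_message, diff, n=None, m=None):
    """Build a patch email: first line -> Subject, the rest -> message body."""
    first_line, _, rest = commit_message.partition("\n")
    prefix = "[PATCH]" if n is None else f"[PATCH {n}/{m}]"
    msg = EmailMessage()
    msg["From"] = author    # author information
    msg["Date"] = date      # original commit date
    msg["Subject"] = f"{prefix} {first_line.strip()}"
    # Text after the "---" line is not meant to become part of the commit message.
    msg.set_content(f"{rest.strip()}\n---\n{diff}")
    return msg


def subject_to_first_line(subject):
    """Recover the commit's first line by dropping leading [bracketed] tags."""
    return re.sub(r"^(\s*\[[^\]]*\])+\s*", "", str(subject))


mail = to_patch_mail(
    "A U Thor <author@example.com>",       # hypothetical author
    "Tue, 17 Oct 2006 16:41:02 +0200",
    "Fix frotz handling\n\nLonger description.\n\nSigned-off-by: A U Thor",
    "diff --git a/frotz.c b/frotz.c\n",    # placeholder diff
    n=1, m=2,
)
```

Because every leading bracketed tag is stripped on the receiving side, a
commit whose own first line began with something like "[core]" would
lose that tag on the round trip, which is why such first lines are
avoided.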



If I remember correctly git binary diff format is xdiff based, and uses 
kind of ascii85 encoding (PostScript).

-- 
Jakub Narebski
Poland
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 5:00 pm

Dear diary, on Tue, Oct 17, 2006 at 04:41:02PM CEST, I got a letter

It should be noted that there's no user interface for sending/receiving
that and I suspect no reasonably usable user interface for creating it.

How frequently are the bundles used in practice?

It's a cultural difference, I suspect. Git comes from an environment
based on intensive exchanges of patches and patch series and an
environment not mandating developers to use any tool besides diff/patch,
so Git is very focused on good support for applying patches and there
simply has been no big conscious demand for bundles support given this.

Another aspect of this is that Git (Linus ;) is very focused on getting
the history right, nice and clean (though it does not _mandate_ it and
you can just wildly do one commit after another; it just provides tools
to easily do it). This means that the downstream maintainers have to
rebase patches, possibly reorder them, and update the changesets with
bugfixes instead of stacking the bugfixes upon them in separate changes
- then Linus merges the patches and only at that point they are "etched"
forever. This means that the history will contain a neatly laid out
record of how $FEATURE was achieved, but of course also more work for
downstream maintainers.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 5:30 pm

Many times each day.  Most submissions to the bzr mainline are done with

Yes, rebasing is very uncommon in the bzr community.  We would rather
evaluate the complete change than walk through its history.  (Bundles
only show the changes you made, not the changes you merged from the
mainline.)

In an earlier form, bundles contained a patch for every revision, and
people *hated* reading them.  So there's definitely a cultural
difference there.

Aaron
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 5:39 pm

Dear diary, on Wed, Oct 18, 2006 at 02:30:14AM CEST, I got a letter

BTW, I think what describes Git's (the kernel's) stance very nicely is
what I call Al Viro's "homework problem":

	http://lkml.org/lkml/2005/4/7/176

If I understand you right, the bzr approach is what's described as "the
dumbest kind" there? (No offense meant!)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
-

From: zindar
Date: Wednesday, October 18, 2006 - 2:28 am

Yes and no. The bundle includes both the full final thing, and each
step along the way. Each step along the way is something you'll get
when you merge it.

Once merged, it will be "next one" in the description above. It would
typically look something like this in "bzr log" (shortened).  In this
example, doing C requires doing A and B as well...

committer: foobar@foobar.com
message: merged in C
      -------
      committer: bar@bar.com
      message: opps, fix bug in A
      -------
      committer: bar@bar.com
      message: implement B
      -------
      committer: bar@bar.com
      message: implement A

So, you'll get full history, including errors made :)  You can also
see who approved it to this branch (foobar) and who did the actual
work (bar).

/Erik
-

From: Petr Baudis
Date: Wednesday, October 18, 2006 - 4:08 am

Dear diary, on Wed, Oct 18, 2006 at 11:28:32AM CEST, I got a letter

I see, that's what I've been missing, thanks. So it's the middle path
(as with any other commonly used VCS for that matter, except maybe darcs?;
patch queues and rebasing count but it's a hack, not something properly
supported by the design of Git, since at this point the development
cannot be fully distributed).

I also assume that given this is the case, the big diff does not really
serve any purpose besides human review?

But somewhere else in the thread it's been said that bundles can also
contain merges. Does that mean that bundles can look like:

   1
  / \
 2   4
 |   | _
 3   5  |
  \ /   | a bundle
   6    |
       ~

In that case, against what is the big diff from 6 done? 2? 4? Or even 1?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
-

From: Jakub Narebski
Date: Wednesday, October 18, 2006 - 4:17 am

From: zindar
Date: Wednesday, October 18, 2006 - 6:09 am

When you run the "bundle" command, you can tell it what you want the
bundle to be created against.  So, if I just committed 5, I can run
"bzr bundle -r-1" to get the bundle against 4, or I can do "bzr bundle
path/to/other/branch" to get a bundle that relates to it.

To merge a bundle into a branch, the parent of the first revision in
the bundle has to exist in the branch it's being merged into (well,
unless you use patch, but that's outside of bzr, and bzr wouldn't know
about each revision in them).

This command will find a common root and create a bundle that
corresponds to it.  The "big diff" as you call it, would be the
changes between the point where the branch was created, and the last
commit.

In the case of just committing 5, and you want to create a bundle that
can be merged back at point 6, the "big diff" would be against 1 since
that's the branch point.
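
The "branch point" idea above can be sketched in Python on the small graph
from the earlier diagram (1 is the root, 2-3 and 4-5 are the two lines of
development, 6 merges them). This is only a toy model with made-up function
names, not bzr's actual algorithm; a breadth-first walk is a simplification
of real common-ancestor selection.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Parent map for the diagram in the thread (revision -> list of parents).
PARENTS: Dict[str, List[str]] = {
    "1": [], "2": ["1"], "3": ["2"], "4": ["1"], "5": ["4"], "6": ["3", "5"],
}


def ancestors(rev: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All revisions reachable from rev, including rev itself."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def branch_point(tip: str, other: str, parents: Dict[str, List[str]]) -> Optional[str]:
    """Walk back from tip breadth-first; return the first revision shared with other."""
    shared = ancestors(other, parents)
    queue = deque([tip])
    visited: Set[str] = set()
    while queue:
        current = queue.popleft()
        if current in shared:
            return current
        if current not in visited:
            visited.add(current)
            queue.extend(parents[current])
    return None


def bundled_revisions(tip: str, target: str, parents: Dict[str, List[str]]) -> List[str]:
    """Revisions the target is missing, i.e. what a bundle would need to carry."""
    return sorted(ancestors(tip, parents) - ancestors(target, parents))
```

Here branch_point("5", "3", PARENTS) gives revision 1, matching the
description above, and the revisions to be carried are 4 and 5.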

/Erik


-- 
google talk/jabber. zindar@gmail.com
SIP-phones: sip:erik_bagfors@gizmoproject.com
sip:17476714687@proxy01.sipphone.com
-

From: Jakub Narebski
Date: Tuesday, October 17, 2006 - 6:28 pm

Take for example 
 "[PATCH 0/6] ref deletion and D/F conflict avoidance with packed-refs."

Isn't it easier to review than a "bundle", a.k.a. a mega-patch?
-- 
Jakub Narebski
Poland
-

From: Carl Worth
Date: Tuesday, October 17, 2006 - 6:44 pm

There are even more important reasons to prefer a series of
micro-commits over a mega-patch than just ease of merging.

In the cairo project, I've often reviewed a single patch and said:

	"This all looks like perfectly good code and I'd be happy to
	have it all in the tree. But please rebuild this as a series
	of independent patches (perhaps along the lines of a, b, c,
	...)"

I do that not just to make the history "look nice" but because code
history is something we _use_ a lot and separate commits for separate
actions just make the history so much more usable.

We have great tools like bisect to identify commits that introduce
bugs. I know that I'd be delighted to see bisect come back pointing
at some minimal commit as causing a bug, (which would make finding the
bug so much easier).
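
For readers unfamiliar with the idea, bisecting is essentially a binary
search over history. The Python sketch below assumes a linear history and a
caller-supplied test function (both are simplifications made for this
example; the real tool works on a commit graph).

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history ordered oldest to newest.

    Assumes the newest commit is bad and that once a commit is bad,
    every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # the culprit is mid or something older
        else:
            lo = mid + 1      # the culprit is newer than mid
    return commits[lo]
```

The number of builds to test grows with the logarithm of the history length,
and the commit returned is only as informative as it is small, which is the
point made above.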

But it's also been my experience that the largest commits are also the
most likely to be the things returned by bisect. Big commits really do
introduce bugs more frequently than small commits.

Finally, if someone had gone through the useful work to create small,
independent changes, (and likely finding and fixing bugs in the
process), what a horrible shame it would be to throw away that work
and merge it as a single patch, (welcome to the pain of CVS branch
merging).

Now, I do admit that it is often useful to take the overall view of a
patch series being submitted. This is often the case when a patch
series is in some sub-module of the code for which I don't have as
much direct involvement. In cases like that I will often do review
only of the diff between the tips of the mainline and the branch of
interest, (or if I trust the maintainer enough, perhaps just the
diffstat between the two). But I'm still very glad that what lands in
the history is the series of independent changes, and not one mega
commit.

-Carl
From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 8:27 pm

A bundle isn't a mega-patch.  It contains all the source revisions.  So
when you merge or pull it, you get all the original revisions in your

Bisect should work equally well with revisions pulled or merged from a

The number of changes shown in the diff has nothing to do with the

So the difference here is that bundles preserve the original commits the
changes came from, so even though it's presented as an overview, you
still have a series of independent changes in your history.

Aaron
-

From: Jakub Narebski
Date: Wednesday, October 18, 2006 - 2:20 am

But what the patch reviewer sees is a mega-patch showing the changeset
of a whole "bundle", isn't it?

I think it is much better to review a series of patches commit by commit;
besides, it allows correcting some inner patches before applying the whole
series, or dropping one of the patches in the series (this has happened
from time to time on the git mailing list).

So if git introduces bundles, I think they would take the form of a series
of "patch" mails plus an introductory email with the series description
(currently it is not saved anywhere), shortlog, diffstat and perhaps more
metainfo like the bundle parent (which I think should really be an email
form of a branch), tags introduced, etc.
-- 
Jakub Narebski
Poland
-

From: Aaron Bentley
Date: Wednesday, October 18, 2006 - 9:31 am

Yes.  Carl was saying that, aside from the issue of what a reviewer
sees, a bundle is bad for other reasons.  I am saying those other
reasons don't apply.  I wasn't addressing the issue of what a reviewer sees.

To me, seeing the individual patches is like reading a book where every
page has a different word on it, and so it's hard to put it together
into a full sentence.  I'm not saying my way is The Right Way, just my
personal preference.

For larger pieces of work, we try to split them up into logical units,
and merge those units independently.

The Bundle format can also support a patch-by-patch output, but we don't

It's important to remember that bundles represent revisions, not
patches.  When you merge a bundle, you

1. install those revisions into your repository.  These revisions are
   latent, as though they were on another branch.
2. merge the head revision of the bundle into your branch.

Virtually any merge selection process that works with branches would
also work with bundles.  So tweaking before merging is really a matter

The parent in a bundle revision is the revision-id of the parent of that
revision in the branch.  I don't think it's possible to change that
parent id into something else, without changing the meaning of a bundle.

Aaron
-

From: Jan Hudec
Date: Saturday, October 21, 2006 - 8:56 am

As for what the reviewer wants to see, I think it depends on what kind
of code it is. Kernel code is complex and does not have (at least I have
not heard of) unit-tests, so short patches are preferable for review.
And since C is one of the more verbose languages, short patches mean
splitting them up into several pieces.

On the other hand bzr has unit-tests and python is less verbose, so the
single patch for a feature is not so big and is manageable. The patches
to bzr still come in logical steps, but usually one step per feature is
enough.

Also, programmers usually don't develop even a single logical step as a
single commit. Instead, they also commit to back up their work, when
they try something they think they may return to in the future, when they
need to continue on another computer, and so on. And these commits are
generally not logical steps. Also, the steps are often not in a logical
order. Therefore showing a diff for each commit in the bundle often does
not make sense.

So there is one bundle per logical step, and it therefore has a summary
diff. Individual bundles for individual steps are preferable anyway,
since the maintainer may decide to accept just some of them.  A tool to
generate a series of bundles (either each with just one commit or each
with several commits) would be possible; it is just that no one has been
interested enough to write it yet.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>
-

From: Jakub Narebski
Date: Saturday, October 21, 2006 - 9:13 am

In git you can back up your work on a temporary branch; besides there

That is why before sending patch series based on some feature branch,
you should at least rebase the branch on top of current work, to ensure
that the series would apply cleanly.

If feature branch/patch series needs cleanup (going from "answer" to
"solution" http://lkml.org/lkml/2005/4/7/176), i.e. patch (commit)
reordering, joining two patches into one, patch splitting, you can
use git-cherry-pick, git-cherry-pick --no-commit and git commit --amend
combination, or git-format-patch, patch editing and reordering, and git-am.
Or just use StGit or pg.

-- 
Jakub Narebski
Poland
-

From: Jeff Licquia
Date: Wednesday, October 18, 2006 - 11:03 am

You did.  The plugin is largely based on my experiences with the git
version, and explicitly gives credit in the comments.

-

From: Andreas Ericsson
Date: Tuesday, October 17, 2006 - 7:01 am

Differences in nomenclature are really messing this discussion up. In
git, a "checkout" is the act of pulling objects from the object database 

Now I'm really confused. Does bazaar have both "clone" (git-style 
fetching a full repo and all the branches) and "checkout" (cvs-style 
From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 7:24 am

Yes, it has both. That's "bzr branch" (git clone) and "bzr checkout"
(cvs checkout).

Difference between "bzr branch" and "git clone" is that bzr doesn't
fetch all the branches. It fetches one "branch" (succession of
revisions) with all the ancestors of the revisions of the branch.

-- 
Matthieu
-

From: Olivier Galibert
Date: Tuesday, October 17, 2006 - 7:19 am

You're not telling us bzr still follows the utterly stupid
update-before-commit model, right?  Right?

  OG.
-

From: Matthieu Moy
Date: Tuesday, October 17, 2006 - 8:37 am

One last time:

bzr _CAN_ follow the utterly stupid update-before-commit model.

It doesn't force you to do so, obviously.

-- 
Matthieu
-

From: Petr Baudis
Date: Tuesday, October 17, 2006 - 6:46 pm

Dear diary, on Tue, Oct 17, 2006 at 01:19:08PM CEST, I got a letter

It isn't very nice because it enforces the update-before-commit
workflow, which was a complaint of many CVS users, and I can remember it
being one of the selling points of the distributed VCSes in 2001 or so,
although it is not so emphasized lately. (I understand that this is
something optional in Bazaar.)

BTW, merge commits aren't bad. They reflect what really happened,
explicitly record the merge resolution taken, if there was any, and
protect you from accidentally losing or damaging [any portion of] your
changes. And they aren't cluttery either since we hide them from
non-graphical history listings by default.

Still, I can recognize that in some scenarios, people might find it
useful, and I can remember some people asking for it in the past. So I
couldn't resist and implemented it in Cogito as cg-commit --push. Pushed
out now. Took me about 5 minutes implementing it and 10 minutes documenting
it.  ;-)


P.S.: A general note for bleeding-edge Cogito users, I've rewritten the
local changes handling so that we always do three-way merge now instead
of that braindead patches diffing/applying, but it's not completely
stable yet, some testcases still fail. So be a bit careful when
updating/uncommitting/switching/... with uncommitted changes in the
working tree.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
-

From: Sean
Date: Tuesday, October 17, 2006 - 3:23 am

On Tue, 17 Oct 2006 00:24:15 -0400

Yeah, even in git you typically don't publish your working tree when
making it available for cloning.  In fact the native git network

That is a very nice feature.  Git would be improved if it could

I'm not sure what you mean here.  A bzr checkout doesn't have any history
does it?  So it's not a mirror of a branch, but just a checkout of the
branch head?

If so, Git can export a tarball of a branch (actually a snapshot as at
any given commit) which can be mirrored out.

Sean
-

From: Johannes Schindelin
Date: Tuesday, October 17, 2006 - 3:30 am

Hi,


It would also make things slow as hell. How do you deal with something 
like annotate in such a setup?

Ciao,
Dscho

-

From: Sean
Date: Tuesday, October 17, 2006 - 3:35 am

On Tue, 17 Oct 2006 12:30:27 +0200 (CEST)

Some commands like annotate might not make any sense in such a set up.

But one way to get the same (perhaps even better) feature into git 
would be to support shallow clones, in which case even annotate would
continue to work even if somewhat crippled by the lack of a complete
history.

Sean
-

From: Matthias Kestenholz
Date: Tuesday, October 17, 2006 - 3:45 am

Hi,


You'd probably have to do all processing server-side (git log, blame,
merges... like in subversion, where you can merge and rename/move files
remotely, IIRC). Of course, all the things which make git really useful
for me (gitk, git log with all its arguments etc.) would not be
available. Cheap checkouts would be made possible easily that way at the
cost of higher server load and an abstraction layer over network for
object access.

I don't know if that sounds reasonable at all.

	Matthias

-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 6:48 am

For the particular case of annotate, bzr is designed to store
annotations at commit time.  So annotate should require remote access to
a small amount of data from two files-- not a great cost.

But our default form of checkout contains a local copy of all history
data, so that readonly operations happen at local speed.

Aaron
-

From: Aaron Bentley
Date: Tuesday, October 17, 2006 - 12:51 pm

Sure, and so can bzr.  But using a checkout of the branch head means:
- No one has to do anything special to provide a working tree of a given
  revision
- I can still run any readonly operations I desire
- I can update to the latest version of bzr.dev with one command.

Aaron
-

From: Jan Hudec
Date: Saturday, October 21, 2006 - 11:58 am

If I can add some clarification: There is a lightweight checkout and a
heavyweight checkout. The former contains no history and does everything
(except status, and I am not sure about diff) by accessing the remote
data. The latter contains a mirror of the history data and does
write-through on commit (and otherwise behaves like a normal branch with
a repository).

What would be really useful would be a checkout, or even a branch (i.e.
with the ability to commit locally), that would only contain history data
since some point. This would allow downloading very little data when
branching, but then working locally as with a normal repository clone.

In bzr this was already discussed and the storage supports so called
"ghost" revisions, whose existence is known, but not their data. There
are even repositories around that contain them (created by converting
data from arch), but to my best knowledge there is no user interface to
create branches or checkouts with partial data.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>
-

From: Sean
Date: Saturday, October 21, 2006 - 12:02 pm

On Sat, 21 Oct 2006 20:58:25 +0200

In Git the same functionality can be achieved with so-called shallow
clones.  Unfortunately, they've only been discussed and not yet
implemented.

Sean
-

From: James Henstridge
Date: Friday, October 20, 2006 - 1:26 am

There are two forms of checkout: a normal checkout which contains the
complete history of the branch, and a lightweight checkout, which just
has a pointer back to the original location of the history.

In both cases, a "bzr commit" invocation will commit changes to the
remote location.  In general, you only want to use a lightweight
checkout when there is a fast, reliable connection to the branch (e.g.
if it is on the local file system, or local network).

Aaron would be talking about a normal (heavyweight) checkout here.
With a heavyweight checkout, you can do pretty much anything without
access to the branch.  In contrast, almost all operations on a
lightweight checkout need access to the branch.

James.
-

From: Jakub Narebski
Date: Friday, October 20, 2006 - 3:19 am

So the "lightweight checkout" is the equivalent of the "lazy clone" we
have discussed a lot on the git mailing list (without any resulting code,
unfortunately). The crux of the problem was how to do this fast, without
the need for a fast, reliable connection to the repository it was cloned
from: for example, whether to leave fetched objects in some kind of cache,
or even in the "lightweight checkout"/"lazy clone" repository database.

If the repository we do the "lightweight checkout"/"lazy clone" from is
on a local file system (perhaps a network file system), then we can use
the alternates mechanism (git clone -l -s). That's why "lazy clone" was

We have terminology conflict here. Bazaar-NG "pull" and "merge" vs.
GIT "fetch", "pull" and "merge"; Bazaar-NG "checkout" vs. GIT "clone"
and "checkout".

In GIT "clone" is what is used to copy whole repository, "checkout"
is what is used to extract given/current branch to [given] working area.
-- 
Jakub Narebski
Poland
-

From: zindar
Date: Friday, October 20, 2006 - 1:56 am

In bzr there are two different kinds of checkouts.  One is called a
lightweight checkout and that's really a "normal" checkout in the way
svn, for example, does it.  In this mode, you have the branch remotely
and only the working tree locally.  So it's just a checkout of the
branch head (or any other revision if using -r when doing the
checkout).

Then there are non-lightweight checkouts, i.e. heavyweight checkouts.
These are the default type.  A heavyweight checkout is in fact a full
branch locally, but it is "bound" to the remote branch.  What this
means is that all commands such as diff/status/log/etc can be done
locally. So it's really quick.

It acts the same as a lightweight checkout in most regards, so when I
run "bzr update" it actually pulls from the remote branch, and when I
run "bzr commit" it commits the same revision in both the remote
branch and the local branch. It does this in one transaction, so one
can't succeed while the other fails (they would both fail in that case).

What this also gives you is that when you want to clone the branch,
you don't need to go to the remote branch to get the revisions and
also, when being offline, you can commit locally.

Committing locally is a very cool feature in my mind.  If you work in
a centralized manner with checkouts, you normally commit directly to
the central branch, but when you are offline, that will fail (of
course :) ).  So what you can do then is to run "bzr commit --local"
to commit only to your local checkout branch, then when you get online
again you can run "bzr update".  In this case the update will take any
new commits that have been made while you were away, pull them into
your local branch, and make your local commits into something that has
been merged into the "checkout".

I find this REALLY useful.

Don't know if that made sense, here it is in commands.

$ bzr checkout t p
$ cd p
$ echo hej >> hosts
$ bzr commit --local -m 'offline'
$ echo hej >> hosts
$ bzr commit --local -m 'offline 2'

Now I get ...
From: Linus Torvalds
Date: Tuesday, October 17, 2006 - 8:03 am

Ehh. Exactly like the bzr numbers? You have to have access to the original 
repo to name it.

So your point is?

If you do

	git log v2.6.17

in a kernel repository, you'll see exactly what I see - because you'll 
have gotten the tags, aka the "easy revision names".

Now, I'm obviously biased, but the thing is, git really does do this 
right. No meaningless numbers. You give _meaningful_ revision names, and 
they can be extremely powerful.

And no, it's not just tags or the raw SHA1 numbers. You can do 
relationships like

	git log HEAD~5..

which means "show the log for everything since five parents ago" (which is 
_not_ the same as "show the last five revisions", because one of them may 
have been a merge, and brought in a lot more of new commits).
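
The difference between "since five parents ago" and "the last five
revisions" can be sketched with a toy commit graph in Python. The graph and
function names below are invented for illustration; the model assumes that
"~n" follows the first parent n times and that "A.." means "reachable from
HEAD but not from A".

```python
from typing import Dict, List, Set

# Toy history: mainline m0..m6, where m4 merges a side branch s1..s3.
GRAPH: Dict[str, List[str]] = {
    "m0": [], "m1": ["m0"], "m2": ["m1"], "m3": ["m2"],
    "m4": ["m3", "s3"], "m5": ["m4"], "m6": ["m5"],
    "s1": ["m0"], "s2": ["s1"], "s3": ["s2"],
}


def nth_first_parent(rev: str, n: int, graph: Dict[str, List[str]]) -> str:
    """Model of rev~n: follow the first parent n times."""
    for _ in range(n):
        rev = graph[rev][0]
    return rev


def reachable(rev: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from rev, including rev itself."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def since(head: str, n: int, graph: Dict[str, List[str]]) -> Set[str]:
    """Model of "head~n..": reachable from head but not from head~n."""
    return reachable(head, graph) - reachable(nth_first_parent(head, n, graph), graph)
```

In this toy graph since("m6", 5, GRAPH) contains eight commits rather than
five, because the merge m4 brings in s1, s2 and s3 as well.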

Or, you can say

	git diff mybranch@{2.days.ago}..nextbranch

which says exactly what you'd read it as: show the diff between what 
"mybranch" looked like 2 days ago and what "nextbranch" looks like right 
now.

Or, since the namespace is the same for commit history _and_ for actual 
file contents, and since some commands don't need commits, you can decide 
to name not a revision, but a specific file or subdirectory in a revision, 
and do things like

	git -p grep -1 request_irq v2.6.17~2:drivers/char

where the "revision" is not a commit revision at all, it's a _tree_ 
revision, because we've looked up the revision for "v2.6.17~2" (which 
means "the grandparent of the tag 2.6.17"), and then within that commit we 
looked up the tree "drivers/char", and then we grepped (recursively) for 
the string "request_irq" within that subtree (with one line of context), 
and then we paginated the output through "less" (or whatever your pager is 
set to).

In other words, yes, the above does _exactly_ what you'd expect it to do.

The fact is, nobody ever uses the SHA1 names directly in their normal 
work. You'd use the branch names, tag-names, or some relationship operator 
like "this long ago" or "the parent of" or ...
From: Johannes Schindelin
Date: Monday, October 16, 2006 - 4:45 pm

Hi Aaron,


How should this cope with a distributed project? IOW how does it deal with 
"this revision and that revision are exactly the same"?

If I understand you correctly, you are claiming that you are not really 
identifying a revision, but a revision _at a certain place with a 
place-dependent number_. This conflicts with my understanding of a 

It depends on your usage. If you want to do anything interesting, like 
assure that you have the correct version, or assure that two different 
person's tags actually tag the same revision, there is no simpler 

Of course! Persistence (and reliability) are the number one goal of git. 
Performance is the next one.

As an example of completely independent branches, look at the "next" and
the "todo" branch of git. They are _completely_ independent, i.e. not even 

Oh, we start another flamewar again?

Honestly, if you want to record renames, why don't you also support (with 
a command for each of those purposes) code copying? And refactoring? And 
copyright year bumps? _put your favourite here_

If you really, really think about it: it makes much more sense to record 
your intention in the commit message. So, instead of recording for _every_ 
_single_ file in folder1/ that it was moved to folder2/, it is better to 
say that you moved folder1/ to folder2/ _because of some special reason_!

Same goes for all other thinkable examples.

If you want to track code, then let the tracker do its work, i.e. let 
git-pickaxe figure where your code came from. It is likely being more 

It is more like the Unix way. Let each command do _one_ thing, but let it 

Welcome to git! Git's commands are very efficient, and you can even pipe 
them efficiently! And now that we have GIT_TRACE, diagnostics are no 
concern.

Ciao,
Dscho

-

From: Petr Baudis
Date: Monday, October 16, 2006 - 7:40 pm

Hi!

Dear diary, on Tue, Oct 17, 2006 at 01:45:34AM CEST, I got a letter

I think Aaron rather meant that in case of an error, the error messages
may seem incoherent from the perspective of a porcelain user if they have
been generated by the plumbing. And I had that problem in Cogito as well
a few times in the past, but I think most of those are reasonable now (I
can't think of a counter-example off the top of my head).

Calling multiple git commands _is_ a problem, especially in a loop, but
I think it's more the inherent fork()+execve() overhead than whatever
happens over and over when main() takes over. Many git commands got
adjusted so that you can call them just once and then feed from/to them
over longer time period.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
-

From: Aaron Bentley
Date: Monday, October 16, 2006 - 10:08 pm

There are two answers here.  One is that the URL + number is UI, not
internals.  A unique ID is used internally, so that can be compared.

But to fully ensure that there are no differences, i.e. that no one has

No, I am claiming that a revision at a certain place with a
place-dependent number is one name for a revision, but it may have other

I can use the 'bzr missing' command to check whether my branch is in
sync with a remote branch.  Or I can use the 'pull' command to update my

You'd be surprised.  When we last spoke to the Mercurial team, Mercurial
didn't support multiple persistent branches in one repository.  Pulling
from a remote repository could join two branches into one.  I'm told

I'd hope not.  It sounds as though you feel that supporting renames in
the data representation is *wrong*, and therefore it should be an insult
to you if we said that Git fully supported renames.

Aaron
-

From: Carl Worth
Date: Monday, October 16, 2006 - 10:25 pm

I think you missed the simplicity of the git naming here. With git, I
can receive a bug report that specifies a bug that appears in a
revision such as:

	71037f3612da9d11431567c05c17807499ab1746

And since I have a commit object in my repository with that same name
I have a strong assurance that I am testing the identical software as
the bug reporter without me ever needing any access to pull from the
reporter's repository.

And this works in an entirely distributed fashion. Any two users can
be certain they are working with identical software on both ends by
exchanging and comparing a few bytes, (in email, irc, bugzilla, what
have you), without any need to refer to a common repository which both
users have access to.
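
The naming scheme relied on above can be sketched in a few lines of Python:
an object's name is the SHA-1 of a short header plus its content, so two
people who compute the same name hold the same bytes. The function name is
made up for this example; the header layout ("type size NUL") is the
well-known one for git objects, shown here for blobs.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Compute a git-style object name: SHA-1 over "<kind> <size>\\0" + payload."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # The empty blob has the same name in every repository.
    print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since a commit object names its tree and its parents by such IDs, comparing
a single commit name is enough to compare the whole history behind it.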

-Carl

From: Shawn Pearce
Date: Monday, October 16, 2006 - 10:31 pm

It would seem that the majority of folks on the Git list feel that
way, myself among them.  I don't know that we'd find it an insult
to say Git fully supports renames but I do think we have had better
results from *not* recording them and looking for them after the
fact with smart tools.

Junio's recent work with git-pickaxe (or whatever its name finally
settles out to be) is a perfect example of this.  Despite not having
"recorded renames" git-pickaxe is able to fairly accurately detect
blocks of code moving between files, of which renaming files is just
a special case.  This provides some fairly accurate blame reporting
pointing to exactly which commit/author/datetime put a given line
of code into the project.

No additional metadata required.  All existing repositories can
immediately benefit from the new tool.  Rather slick if you ask me.
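
To illustrate what "looking for renames after the fact" can mean, here is a
deliberately naive Python sketch that pairs deleted and added paths between
two snapshots by content similarity. It is not git's or git-pickaxe's actual
algorithm (which, as described above, works on blocks of lines); the
function names and the 0.5 threshold are choices made for this example.

```python
import difflib
from typing import Dict


def similarity(old_text: str, new_text: str) -> float:
    """Line-based similarity between two file contents, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    return matcher.ratio()


def detect_renames(old: Dict[str, str], new: Dict[str, str],
                   threshold: float = 0.5) -> Dict[str, str]:
    """Map each deleted path to the most similar added path, if similar enough.

    old and new map path -> content for two snapshots; nothing about the
    rename is recorded in either snapshot.
    """
    deleted = {path: text for path, text in old.items() if path not in new}
    added = {path: text for path, text in new.items() if path not in old}
    renames: Dict[str, str] = {}
    for old_path, old_text in deleted.items():
        best = max(added, key=lambda p: similarity(old_text, added[p]), default=None)
        if best is not None and similarity(old_text, added[best]) >= threshold:
            renames[old_path] = best
            del added[best]  # each added file is matched at most once
    return renames
```

Because the detection only needs the two snapshots, it works on existing
history without any extra metadata, which is the property praised above.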

-- 
Shawn.
-

From: Junio C Hamano
Date: Monday, October 16, 2006 - 11:23 pm

Not recording and not supporting are quite different things.

What we don't do is to _record_ renames in the data structure.
I personally would not use a word as strong as _wrong_ (and
Linus may disagree), but (1) we can support renames without
recording them just fine, (2) recording renames would not help
to tell users about line movements across files which we would
want to do, and (3) we are getting closer to coming up with a way
to even do (2) without recording renames.  Given these, perhaps
I might say recording renames is _pointless_ when I am in a good
mood.


-

From: J. Bruce Fields
Date: Tuesday, October 17, 2006 - 11:52 am

Yes.  There's a risk of confusing a feature with an implementation
detail.  From http://bazaar-vcs.org/RcsComparisons:

	"If a user can rename a file in the RCS without losing the RCS
	history for a file, then renames are considered supported. If
	the operation results in a delete/add (aka "DA pair"), then
	renames are not considered supported. If the operation results
	in a copy/delete pair, renames are considered "somewhat"
	supported. The problem with copy support is that it is hard to
	define sane merge semantics for copies."

The first sentence sounds like a description of a user-visible feature.
The rest of it sounds like implementation.

And git probably has some deficiencies here, but it'd be more useful to
identify them in terms of things a user can't do.

--b.
-

From: Sean
Date: Tuesday, October 17, 2006 - 3:23 am

On Tue, 17 Oct 2006 01:08:59 -0400

The "bzr missing" command sounds like a handy one.  

Someone on the xorg mailing list was recently lamenting that git does not
have an easy way to compare a local branch to a remote one.  While this
turns out to not be a big problem in git, it might be nice to have such
a command.
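
For what it is worth, the comparison itself is just a set difference
over commit ancestry, the same thing git's A..B range notation
expresses. A toy sketch, assuming a hypothetical in-memory map from
each commit to its parents rather than a real repository:

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

def missing(parents, local_tip, remote_tip):
    """Return (commits only in local, commits only in remote)."""
    local = ancestors(parents, local_tip)
    remote = ancestors(parents, remote_tip)
    return sorted(local - remote), sorted(remote - local)

# Hypothetical history: A <- B <- C is local, A <- B <- D is remote.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
only_local, only_remote = missing(history, "C", "D")
print("only local: ", only_local)
print("only remote:", only_remote)
```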

Sean
-

From: Robert Collins
Date: Tuesday, October 17, 2006 - 2:33 am

Just a small nit here: bzr does /not/ record the move of every file: it
records the rename of folder1 to folder2. One piece of data is all that's
recorded - no new manifest for the subdirectory is needed.

Of course, a user can choose to move all the contents of a folder and
not the folder itself - it's up to the user.

By recording the folder rename rather than the contents rename, new
files added to folder1 in other branches are merged into folder2
automatically, without needing to do arbitrarily deep history processing
to determine that.
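
As a toy model of that merge behaviour (not how bzr actually represents
it; the function names and the dictionary of recorded renames are
invented for illustration), a single recorded folder rename is enough
to redirect files the other branch added under the old name:

```python
def apply_dir_renames(path, dir_renames):
    """Rewrite path according to recorded directory renames (old -> new)."""
    for old_dir, new_dir in dir_renames.items():
        if path == old_dir or path.startswith(old_dir + "/"):
            return new_dir + path[len(old_dir):]
    return path

def merge_added_files(our_files, their_added, dir_renames):
    """Add the other branch's new files to ours, honouring our renames."""
    merged = set(our_files)
    for path in their_added:
        merged.add(apply_dir_renames(path, dir_renames))
    return sorted(merged)

# Our branch renamed folder1 to folder2; the other branch, unaware of
# that, added a new file under folder1.
ours = {"folder2/a.txt"}
theirs_added = ["folder1/new.txt", "docs/notes.txt"]
print(merge_added_files(ours, theirs_added, {"folder1": "folder2"}))
```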

This also does not prevent us doing history analysis as well, to
determine other interesting things - such as cross file 'blame' as has
been mentioned in this thread.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.