Re: Git Vs. Svn for a project which *must* distribute binaries too.

Previous thread: [PATCH 2/2] cvsimport: add <remote>/HEAD reference in separate remotes more by Andy Whitcroft on Monday, June 4, 2007 - 5:01 am. (1 message)

Next thread: [PATCH] Makefile: Remove git-merge-base from PROGRAMS. by Johannes Sixt on Monday, June 4, 2007 - 7:53 am. (1 message)
To: <git@...>
Date: Monday, June 4, 2007 - 7:48 am

Hello git users / maintainers / fans,

My fellow projecteers and I recently watched a presentation on the
advantages of git that Linus Torvalds gave during a question session
at Google.

Our project, www.rockbox.org, an open source firmware replacement
project for digital audio players currently makes use of subversion
for its source code management system, but Linus's eloquent (though
sometimes rather blunt) speech has made us question whether git is
perhaps a better solution for us.

On the whole, we like a lot of the features it offers, but we have a
couple of issues which we've discussed, and so far have failed to come
up with a decent resolution for them.

1) Due to the nature of our project, with multiple architectures
supported, we strive to provide a binary build of our software with
every commit to the subversion repository. This is so that we can
provide a working firmware for the majority of our users that don't
have the necessary know-how for cross-compiling and so forth.

2) Unlike the Linux Kernel, which Linus uses as a prime example of
something git is very useful for, the Rockbox project has no central
figurehead for anyone to consider as owning the "master" repository
from which to build the "current" version of the Rockbox firmware for
any given target.

3) With a central repository, for which we have a limited number of
individuals having commit access, it's easy for us to automate a build
based on each commit the repository receives.

Given these three points, we wonder how we'd best achieve the same
using git. As far as we can make out we'd need to appoint someone as a
maintainer for a master repository whose job it is to co-ordinate
pulls from people based on when they've made changes we wish to
include in the latest version of our software. This sounds like a
time-consuming role for a project staffed only by volunteers.

Can anyone offer any insights for us here?

Bryan
-
To: Bryan Childs <godeater@...>
Cc: <git@...>
Date: Monday, June 4, 2007 - 11:20 am

Git has no problems with binaries, but I _really_ hope that you don't 
actually want to check these binaries into the repository? You could do 
that, and the git delta algorithm might even be able to compress the 
binaries against each other, but it could still be pretty nasty.

And by "pretty nasty" I don't mean that git won't be able to handle it: I 
suspect it's no worse from a disk size perspective than SVN.  But since 
git is distributed, it means that everybody who fetches it will get the 
whole archive with whole history - it means that cloning the result is 
going to be really painful with tons of old binaries that nobody really 
cares about being pushed around.

So I *hope* that you want to just have automated build machinery that 
builds the binaries to a *separate* location? You could use git to archive 
them, and you can obviously (and easily) name the resulting binary blobs 
by the versions in the source tree, but I'm just saying that trying to 
track the binaries from within the same git repository as the source code 

The kernel is really kind of odd in that it has just a single maintainer. 
That's usually the case only for much smaller projects.

And no, git is not at all exclusively *designed* for that situation, 
although it is arguably one situation that git works really well for. 

There is nothing to say that you cannot have shared repositories that are 
writable by multiple users. Anything that works for a single person works 
equally well for a "group of people" that all write to the same central 
git repo. It ends up not being how the kernel does things (not because of 
git, but because it's not how I've ever worked), but the kernel situation 
really _is_ pretty unusual.

So git makes everybody have their own repository in order to commit, but 
you can (and some people do) just view that as your "CVS working tree", 
and every time you commit, you end up pushing to some central repository 
that is writable by the "core group" that has commit access....
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Monday, June 4, 2007 - 11:38 am

On 6/4/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:

Oh lord no - I never meant to imply that we'd be checking those
binaries in, I just meant to highlight that we need a central
repository to build those binaries from - otherwise we'd end up with a
selection of binaries for our users to download which contain a bunch
of different features if they were built from a combination of
repositories. I know you think everyone else is a moron, but we're not
quite dumb enough to think maintaining binaries in a repository is a

This sounds like what we eventually came up with. I'm not sure how
soon we'll make a switch to a git repository, but when we do, this
seems to be the best model for the conversion in the short term, and

Yes, after I'd sent my email this morning I found you could do pushes

This is what I personally was trying to advocate in our discussion -
but I'm not sure everyone quite understood it. Hopefully your


Thanks for your time (and everyone else who replied) - it's very much
appreciated!

Bryan
-
To: Bryan Childs <godeater@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Monday, June 4, 2007 - 7:48 pm

Actually, I've been playing with using git's data-distribution mechanism 
to distribute generated binaries. You can do tags for arbitrary binary 
content (not in a tree or commit), and, if you have some way of finding 
the right tag name, you can fetch that and extract it.

I came up with this at my job when we were trying to decide what to do 
with firmware images that we'd shipped, so that we'd be able to examine 
them again even if we lose the compiler version we used at the time. We 
needed an immutable data store with a mapping of tags to objects, and I 
realized that we already had something with these exact characteristics.
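A minimal Python sketch of the idea, assuming a toy in-memory store rather than a real repository. Git names a blob by the SHA-1 of a `blob <size>\0` header followed by the content, so storing an image with `git hash-object -w <file>` and naming it with `git tag <name> <id>` yields an immutable, name-addressable archive. The class `TaggedBlobStore`, its refuse-to-move-a-tag policy, and the tag name `fw-player-r1` are hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 of b"blob <size>\\0" + content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class TaggedBlobStore:
    """Toy model: content-addressed objects plus a name -> object tag table."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # object id -> content
        self.tags: dict[str, str] = {}       # tag name -> object id

    def add(self, tag: str, data: bytes) -> str:
        oid = git_blob_id(data)
        existing = self.tags.get(tag)
        if existing is not None and existing != oid:
            # hypothetical policy: never move a tag, so the mapping stays immutable
            raise ValueError(f"tag {tag!r} already points at {existing}")
        self.objects[oid] = data  # identical content always yields the same id
        self.tags[tag] = oid
        return oid

    def fetch(self, tag: str) -> bytes:
        # a tag that names a blob links to nothing else, so one object is enough
        return self.objects[self.tags[tag]]
```

Re-adding identical bytes under the same tag is harmless because the id is derived from the content; only pointing an existing tag at different content is refused.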

	-Daniel
*This .sig left intentionally blank*
-
To: Daniel Barkalow <barkalow@...>
Cc: Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 8:21 pm

Yes, I think git should be very nice for doing binary stuff like firmware 
images too, my only worry is literally about "mixing it in" with other 
stuff.

Putting lots of binary blobs into a git archive should work fine: but 
if you would then start tying them together (with a commit chain), it just 
means that even if you only really want _one_ of them, you end up getting 
them all, which sounds like a potential disaster.

On the other hand, if you actually want a way to really *archive* the dang 
things, that may well be what you actually want. In that case, having a 
separate branch that only contains the binary stuff might actually be what 
you want to do (and depending on the kind of binary data you have, the 
delta algorithm might even be good at finding common data sequences and 

Yeah, if you just tag individual blobs, git will keep track of them, but 
won't link them together, so you can easily just look up and fetch a 
single one from such an archive. Sounds sane enough.
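To make the difference concrete, here is a toy Python model of which objects a fetch of a single ref has to transfer. It assumes each commit records exactly one binary blob and ignores tree objects; the ids `c1`..`c3` and `b1`..`b3` are made up.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    blob: str              # the one binary this commit records (trees omitted)
    parent: Optional[str]  # previous commit in the chain, if any


def objects_for_ref(target: Optional[str], commits: dict) -> set:
    """Object ids a fetch of `target` must transfer."""
    needed = set()
    while target is not None:
        needed.add(target)
        commit = commits.get(target)
        if commit is None:
            break  # a tagged blob: it has no outgoing links
        needed.add(commit.blob)
        target = commit.parent  # a commit drags in its whole ancestry
    return needed
```

Fetching the newest image through commit `c3` pulls every older image along with it, while a tag pointing straight at blob `b2` costs exactly one object.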

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Daniel Barkalow <barkalow@...>, Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 9:42 pm

If you put the binaries in a separate repository and do shallow clones to 
avoid getting all the old stuff, wouldn't that work well?

-
To: <david@...>
Cc: Daniel Barkalow <barkalow@...>, Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 11:58 pm

Yes. I'm not a huge fan of shallow clones, and I suspect they've not 
gotten all that much testing, but that would certainly solve the problem 
of getting unnecessarily much data.
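In the same kind of toy model, a shallow fetch (in git, `git clone --depth <n> <url>`) stops after the newest `n` commits, so only their binaries travel. The commit and blob ids are hypothetical, and trees are again ignored.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    blob: str              # the one binary this commit records (trees omitted)
    parent: Optional[str]  # previous commit in the chain, if any


def objects_for_shallow_fetch(head: str, commits: dict, depth: int) -> set:
    """Objects transferred when only the newest `depth` commits are fetched."""
    needed = set()
    target: Optional[str] = head
    for _ in range(depth):
        if target is None:
            break  # history is shorter than the requested depth
        commit = commits[target]
        needed.update((target, commit.blob))
        target = commit.parent
    return needed
```

With `depth=1` only the tip commit and its image are transferred, however long the archive's history grows.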

		Linus
-
To: Bryan Childs <godeater@...>
Cc: Linus Torvalds <torvalds@...>, <git@...>
Date: Monday, June 4, 2007 - 6:29 pm

If your infrastructure to build the binaries is automated, you can
easily script the build for new incoming commits. The output of
git-describe is really useful for this if you are going to name your
builds `git describe`-<arch>.tar.gz.

OTOH, commit is different from push (vs SVN where both are one op),
and that means that when using git you can present a large change as a
better-explained patch-series. That's actually a good practice for new
development, and it might not make sense to have literally
one-build-per-commit.

Maybe I'd enable auto-builds for maintenance/bugfixes branches, and on
other (experimental/devel) branches only auto-build commits selected
explicitly (tagged?).
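A minimal sketch of such a trigger, written as a Python post-receive hook on the central repository. Git feeds that hook one `<old> <new> <ref>` line per updated ref on standard input. The `refs/heads/maint` prefix, the "elsewhere only build pushed tags" rule, and the example target name are assumptions standing in for the policy suggested above.

```python
import subprocess
from typing import Callable, Iterable, Iterator, List, Tuple

ZERO = "0" * 40  # value of <new> when a ref is deleted (SHA-1 repositories)


def parse_post_receive(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (old, new, ref) for each '<old> <new> <ref>' line."""
    for line in lines:
        if line.strip():
            old, new, ref = line.split()
            yield old, new, ref


def should_build(ref: str, new: str) -> bool:
    if new == ZERO:
        return False  # the ref was deleted, nothing to build
    if ref.startswith("refs/heads/maint"):
        return True   # hypothetical: every push to maintenance branches
    return ref.startswith("refs/tags/")  # elsewhere only explicitly tagged commits


def git_describe(rev: str) -> str:
    # prints e.g. "v3.0-12-g1a2b3c4"; fails if no tag is reachable from rev
    return subprocess.run(["git", "describe", rev], capture_output=True,
                          text=True, check=True).stdout.strip()


def artifact_name(describe: str, arch: str) -> str:
    return f"{describe}-{arch}.tar.gz"


def plan_builds(lines: Iterable[str], arch: str,
                describe: Callable[[str], str] = git_describe) -> List[Tuple[str, str]]:
    """Return (ref, artifact file name) for every update that should be built."""
    return [(ref, artifact_name(describe(new), arch))
            for _old, new, ref in parse_post_receive(lines)
            if should_build(ref, new)]
```

To run as a real hook, the script would pass `sys.stdin` to `plan_builds` and hand each result to whatever drives the build machines; the `describe` parameter exists only so the naming can be exercised without a repository.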

cheers,


martin
-
To: Bryan Childs <godeater@...>
Cc: <git@...>
Date: Monday, June 4, 2007 - 12:23 pm

Heh. I get worried (and judging from other responses, I wasn't the only 
one) when people start talking about generated binaries and SCMs.

Because people _have_ traditionally done things like commit the generated 
files too. 

But if it's just an automated build server, everything is good. That's 

Yes. As mentioned, the kernel model of having just one person push is 
actually fairly rare. 

When you have multiple people pushing, you have issues that I never have, 
but that you've already seen with CVS/SVN, for all the same reasons: you 
may need to merge the changes that others have done while you were working 
on yours.

However, the git "push" model is *different* from the CVS/SVN "commit" 
model.

In CVS/SVN, if you want to commit, and somebody else has done updates to 
the central repository, the "cvs commit" phase will obviously tell you 
that you're not up-to-date, and you cannot commit at all. So you end up 
doing a "cvs update -d" equivalent to first update your tree, then you 
have to resolve any conflicts, and then you can try to commit again.

In git, this is technically very different, yet similar. Since you can 
always commit to your *local* repository, when you do a "git commit", 
you'll never have any conflicts at all, because there is no conflicting 
work!

But the conflicts happen when you then do a "git push" to send out your 
commit(s) to the central repository. If somebody else has pushed changes 
in the meantime, you'll get exactly the same kind of situation as when you 
do a CVS commit, and the server will tell you that you're not up-to-date, 
and will refuse to take your push.

(The message is different: git will tell you that you are trying to push a 
commit that is not a "strict superset" of what the central repository has).
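The rule can be modelled in a few lines of Python: the central repository accepts a push only if its current tip is already contained in the history being pushed (a "fast-forward"). The commit graph below is made up; in current git, `git merge-base --is-ancestor <old> <new>` answers the same question from the command line.

```python
from collections import deque
from typing import Dict, List, Optional


def is_ancestor(old: str, new: str, parents: Dict[str, List[str]]) -> bool:
    """True if `old` equals `new` or is reachable from it via parent links."""
    seen = set()
    queue = deque([new])
    while queue:
        commit = queue.popleft()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False


def accept_push(remote_tip: Optional[str], pushed_tip: str,
                parents: Dict[str, List[str]]) -> bool:
    # a new branch (no remote tip yet) is always accepted
    return remote_tip is None or is_ancestor(remote_tip, pushed_tip, parents)
```

Here someone else pushed `C` while you committed `D` on top of `B`: pushing `D` is refused, and after a pull creates the merge `M`, the push goes through.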

So when that happens with git, you actually have two different options:

 - you can do "git pull" to merge the central changes, and in that case 
   you get the exact same kinds of conflict markers for any conflicting 
   c...
To: Linus Torvalds <torvalds@...>
Cc: Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 1:57 pm

Thank you a lot. I finally understood what "git rebase" is all about!

        Thomas
-
To: Thomas Glanzmann <thomas@...>
Cc: Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 4:45 pm

I'd like to point out some more upsides and downsides of "git rebase".

Downsides:

 - you're rewriting history, so you MUST NOT have made your pre-rebase 
   changes available publicly anywhere else (or you are in a world of pain 
   with duplicate history and tons of confusion)

 - you can only rebase "simple" commits. If you don't just have a linear 
   history of your own commits, but have merged from others, rebasing 
   isn't a sane alternative (yeah, we could make it do something half-way 
   sane, but really, it's not worth even contemplating)

Upsides:

 - while there may be more conflicts you have to sort out, they may be 
   individually simpler, so you *might* actually prefer to do it that 
   way.

 - if the reason for the conflicts is that upstream did some nice cleanup 
   in the same area, and you decide that you would actually want to re-do 
   your development based on that nice cleanup, then "git rebase" can 
   actually be used as a way to help you do exactly that. IOW, you can 
   take _advantage_ of the conflicts as a way to re-apply the patches but 
   also then fix them up by hand to work in the new (better) world order.

And finally, the upside that is probably the most common case for using 
"git rebase", and has nothing to do with resolving conflicts before 
pushing them out with "git push":

 - if you actually want to send your changes upstream as emailed *patches* 
   rather than by pushing them out (or asking somebody else to pull them),
   rebasing is an excellent way to keep the set of patches "fresh" on top 
   of the current development tree.

   People who send their patches out as emails are also unlikely to have 
   the downsides (ie they normally send them as patches exactly *because* 
   they don't want to make their git trees public, and they probably just 
   have a small set of simple patches in their tree anyway)
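A toy Python model of the first downside: because a commit's id covers its parent, replaying the same changes on a new base produces entirely new ids, which is why anyone who already has the pre-rebase commits ends up with duplicate history. Real git ids also cover the tree, author and committer data; the change descriptions and the base names `B` and `C` are made up.

```python
import hashlib
from typing import List


def commit_id(parent: str, change: str) -> str:
    # simplified hashing: the id depends on the parent id as well as the change
    return hashlib.sha1(f"parent {parent}\n\n{change}".encode()).hexdigest()


def build_chain(base: str, changes: List[str]) -> List[str]:
    """Apply `changes` one after another on top of `base`; return the new ids."""
    ids: List[str] = []
    parent = base
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


def rebase(changes: List[str], new_base: str) -> List[str]:
    # the same patches, replayed in order on top of the current upstream tip
    return build_chain(new_base, changes)
```

The patches themselves are unchanged, but none of the resulting commits is the same object as before, so the old and new series coexist as unrelated history.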

So I have to say, I'm still very ambivalent about rebasing. It's 
definitely a very useful thing to do, but at ...
To: Linus Torvalds <torvalds@...>
Cc: Thomas Glanzmann <thomas@...>, Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 5:21 pm

Wouldn't it be possible to register the rebase somewhere (weak parent?
some kind of note not influencing the sha1 ?) that pull/merge could
follow?  Rebases and cherry-picking are a special kind of merge, so
maybe it can be handled like one where it counts...

  OG.
-
To: Olivier Galibert <galibert@...>
Cc: Linus Torvalds <torvalds@...>, Thomas Glanzmann <thomas@...>, Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 10:56 pm

Hi,


Actually, with reflogs (if you did not explicitly disable them), you 

There is something I have to add as a real disadvantage in rebase:

Usually you are expected to test your commits. So, say that you work on 
some patch series, and produce 3 well tested patches. Then you fetch 
upstream and realize it advanced by some commits, and rebase your three 
patches.

However, _none_ of your patches is well tested, because there is a quite 
real chance that your patches interact _badly_ with the patches you just 
fetched.

And if that is the case, git-bisect can very well attribute it to a wrong 
patch, either because more than one patch is bad, or because the last 
patch in your series _exposes_ the bug (but does not _introduce_ it).
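The bisection itself is a binary search over the history, sketched here in Python under the usual assumption that once a test starts failing it keeps failing. The commit names are hypothetical: if a bug introduced by `p1` only shows up once `p3` is applied, the search reports `p3`.

```python
from typing import Callable, List


def first_bad(history: List[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which `is_bad` holds.

    Assumes history[0] is good, history[-1] is bad, and the test is
    monotonic (good ... good bad ... bad), which is what bisection relies on.
    """
    lo, hi = 0, len(history) - 1
    if is_bad(history[lo]) or not is_bad(history[hi]):
        raise ValueError("need a good first commit and a bad last commit")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid  # first bad commit is at or before mid
        else:
            lo = mid  # first bad commit is after mid
    return history[hi]
```

The search can only be as accurate as the test it is given: it pinpoints where the failure becomes visible, not where the faulty code was written.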

Ciao,
Dscho

-
To: Olivier Galibert <galibert@...>
Cc: Thomas Glanzmann <thomas@...>, Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 5:33 pm

Well, it's not like duplicate history is a disaster from a *technical* 
angle. It might be a small space-waster etc, but that's really not the 
real issue.

The problem with duplicate history is that it just makes things much 
harder to look at. IOW, it's *messy*. So the "tons of confusion" part is 
basically purely about humans, not about git itself. Git won't really 
care, and there's no reason to "handle" it specially in that sense.

So I would strongly discourage people from ever making rebased history 
available, but that's not because of any particular git technical issue, 
but just because it's a good way to confuse all the _humans_ 
involved.

(That said, git's own 'pu' branch ends up jumping around, and it hasn't 
caused all that much confusion, so maybe I'm overstating even that human 
confusion)

			Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Olivier Galibert <galibert@...>, Thomas Glanzmann <thomas@...>, Bryan Childs <godeater@...>, <git@...>
Date: Monday, June 4, 2007 - 6:30 pm

It survives because it is well-known.  Everyone expects it to
break.  ocfs2 has an "ALL" branch that is everything we have working,
sort of a "test this bleeding edge" thing.  It gets rebased all the
time, and everyone knows that they can't trust it to update linearly.
Other developers have similar things in their repositories.

Joel

-- 

"What no boss of a programmer can ever understand is that a programmer
 is working when he's staring out of the window"
	- With apologies to Burton Rascoe

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
-
To: Joel Becker <Joel.Becker@...>
Cc: Linus Torvalds <torvalds@...>, Olivier Galibert <galibert@...>, Thomas Glanzmann <thomas@...>, Bryan Childs <godeater@...>, <git@...>
Date: Tuesday, June 5, 2007 - 7:19 am

I wonder if it would be useful to be able to flag a
branch as "jumping around a lot", where this flag would be
downloaded from another repository when it is cloned, so that a naive
user could get some kind of warning before committing a patch on top
of one of these branches that are known to jump around.  

	"This branch gets rebased all the time and is really meant for
	testing.  If you really want to commit this changeset, please
	configure yourself for expert mode or use the --force."

Or maybe just a warning, ala what we do with detached heads.

						- Ted
-
To: Bryan Childs <godeater@...>
Cc: <git@...>
Date: Monday, June 4, 2007 - 10:58 am

Hi,


Git has no problems with binaries. Actually, one could argue that it has 
fewer problems with binary files than with text files, since it only 
recently acquired the capability (disabled by default) to transcribe 
certain files into the CR/LF line ending some Windows programs still 
insist on.

As for checking in binaries, you even could set up a post-commit hook, 
which builds the binary, and checks it into a separate branch...
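A rough sketch of such a hook in Python, driving git plumbing through `subprocess`: the built file is stored with `git hash-object -w`, wrapped in a one-entry tree by `git mktree`, committed with `git commit-tree`, and the branch is moved with `git update-ref`. The branch `refs/heads/binaries`, the file names, and the assumption that the build step has already produced the file are hypothetical. Such a branch is still a commit chain, so, as noted elsewhere in this thread, cloning it fetches every binary ever stored on it.

```python
import subprocess
from typing import Optional


def git(*args: str, stdin: Optional[str] = None) -> str:
    result = subprocess.run(["git", *args], input=stdin, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def mktree_line(blob_id: str, name: str, mode: str = "100644") -> str:
    # `git mktree` reads entries in `git ls-tree` format: "<mode> <type> <id>\t<name>"
    return f"{mode} blob {blob_id}\t{name}"


def store_binary(path: str, name: str, branch: str = "refs/heads/binaries") -> str:
    """Commit the file at `path` as `name` on `branch`; return the new commit id."""
    blob = git("hash-object", "-w", path)
    tree = git("mktree", stdin=mktree_line(blob, name) + "\n")
    source = git("rev-parse", "HEAD")  # the source commit that was just made
    try:
        parent = ["-p", git("rev-parse", "--verify", "-q", branch)]
    except subprocess.CalledProcessError:
        parent = []  # first binary: the branch does not exist yet
    commit = git("commit-tree", *parent, "-m", f"binary built from {source}", tree)
    git("update-ref", branch, commit)
    return commit
```

Using plumbing instead of `git commit` keeps the working tree and index of the source branch untouched while the hook runs.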

Ciao,
Dscho

-
To: Bryan Childs <godeater@...>
Cc: <git@...>
Date: Monday, June 4, 2007 - 9:18 am

You might want to take a look at http://repo.or.cz for an example of
how you can have a limited number of trusted individuals with commit
access.  As has been said before, <SCM> is not a substitute for
communication, and if you have multiple people who can commit into a
repository, you had better make sure those trusted individuals with
commit access are talking to each other.  

There are some folks who have created hooks to do more fine-grained
access control systems, if you want to replicate SVN's ability to
control who can commit to which branch.  
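One such hook can be sketched in Python as an `update` hook, which git runs once per ref with the ref name, old id and new id as arguments, and which refuses that ref update if the hook exits non-zero. The access table and user names are hypothetical, and taking the pusher's identity from the `USER` environment variable assumes pushes arrive over ssh with one account per developer.

```python
import fnmatch
import os
import sys
from typing import Dict, List, Optional

# hypothetical access table: user -> ref patterns that user may update
ACL: Dict[str, List[str]] = {
    "alice": ["refs/heads/*", "refs/tags/*"],
    "bob": ["refs/heads/devel", "refs/heads/bob/*"],
}


def may_update(user: str, ref: str, acl: Dict[str, List[str]] = ACL) -> bool:
    # fnmatch's "*" also matches "/", so "refs/heads/*" covers nested branches
    return any(fnmatch.fnmatchcase(ref, pattern) for pattern in acl.get(user, []))


def main(argv: List[str], user: Optional[str] = None) -> int:
    ref = argv[1]  # git calls: hooks/update <ref> <old-id> <new-id>
    user = user if user is not None else os.environ.get("USER", "")
    if not may_update(user, ref):
        print(f"{user or 'unknown user'} may not update {ref}", file=sys.stderr)
        return 1  # non-zero exit: git rejects this ref update
    return 0
```

Installed as `hooks/update`, the script would finish with `sys.exit(main(sys.argv))`.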

Regards,

						- Ted
-
To: Bryan Childs <godeater@...>
Cc: <git@...>
Date: Monday, June 4, 2007 - 7:56 am

You can setup git to work in a centralised style if you wish.

See http://www.kernel.org/pub/software/scm/git/docs/cvs-migration.html

-- 
Julian

  ---
If reporters don't know that truth is plural, they ought to be lawyers.
 		-- Tom Wicker
-