Hello git users / maintainers / fans,

My fellow projecteers and I watched a presentation on the advantages of git that Linus Torvalds gave at a Google questions session sometime recently. Our project, www.rockbox.org, an open source firmware replacement project for digital audio players, currently uses Subversion as its source code management system, but Linus's eloquent (though sometimes rather blunt) speech has made us question whether git is perhaps a better solution for us. On the whole, we like a lot of the features it offers, but we have a couple of issues which we've discussed, and so far we have failed to come up with a decent resolution for them.

1) Due to the nature of our project, with multiple architectures supported, we strive to provide a binary build of our software with every commit to the Subversion repository. This is so that we can provide a working firmware for the majority of our users, who don't have the necessary know-how for cross-compiling and so forth.

2) Unlike the Linux Kernel, which Linus uses as a prime example of something git is very useful for, the Rockbox project has no central figurehead for anyone to consider as owning the "master" repository from which to build the "current" version of the Rockbox firmware for any given target.

3) With a central repository, to which a limited number of individuals have commit access, it's easy for us to automate a build based on each commit the repository receives.

Given these three points, we wonder how we'd best achieve the same using git. As far as we can make out, we'd need to appoint someone as maintainer of a master repository, whose job it would be to co-ordinate pulls from people whenever they've made changes we wish to include in the latest version of our software. This sounds like a time-consuming role for a project which is staffed only by volunteers.

Can anyone offer any insights for us here?

Bryan -
Git has no problems with binaries, but I _really_ hope that you don't actually want to check these binaries into the repository? You could do that, and the git delta algorithm might even be able to compress the binaries against each other, but it could still be pretty nasty.

And by "pretty nasty" I don't mean that git won't be able to handle it: I suspect it's no worse from a disk size perspective than SVN. But since git is distributed, it means that everybody who fetches it will get the whole archive with whole history - it means that cloning the result is going to be really painful with tons of old binaries that nobody really cares about being pushed around.

So I *hope* that you want to just have automated build machinery that builds the binaries to a *separate* location? You could use git to archive them, and you can obviously (and easily) name the resulting binary blobs by the versions in the source tree, but I'm just saying that trying to track the binaries from within the same git repository as the source code

The kernel is really kind of odd in that it has just a single maintainer. That's usually the case only for much smaller projects. And no, git is not at all exclusively *designed* for that situation, although it is arguably one situation that git works really well for.

There is nothing to say that you cannot have shared repositories that are writable by multiple users. Anything that works for a single person works equally well for a "group of people" that all write to the same central git repo. It ends up not being how the kernel does things (not because of git, but because it's not how I've ever worked), but the kernel situation really _is_ pretty unusual.

So git makes everybody have their own repository in order to commit, but you can (and some people do) just view that as your "CVS working tree", and every time you commit, you end up pushing to some central repository that is writable by the "core group" that has commit access....
On 6/4/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:

Oh lord no - I never meant to imply that we'd be checking those binaries in, I just meant to highlight that we need a central repository to build those binaries from - otherwise we'd end up with a selection of binaries for our users to download which contain a bunch of different features, depending on which combination of repositories each was built from. I know you think everyone else is a moron, but we're not quite dumb enough to think maintaining binaries in a repository is a

This sounds like what we eventually came up with. I'm not sure how soon we'll make a switch to a git repository, but when we do, this seems to be the best model for the conversion in the short term, and

Yes, after I'd sent my email this morning I found you could do pushes

This is what I personally was trying to advocate in our discussion - but I'm not sure everyone quite understood it. Hopefully your

Thanks for your time (and everyone else who replied) - it's very much appreciated!

Bryan -
Actually, I've been playing with using git's data-distribution mechanism to distribute generated binaries. You can do tags for arbitrary binary content (not in a tree or commit), and, if you have some way of finding the right tag name, you can fetch that and extract it. I came up with this at my job when we were trying to decide what to do with firmware images that we'd shipped, so that we'd be able to examine them again even if we lose the compiler version we used at the time. We needed an immutable data store with a mapping of tags to objects, and I realized that we already had something with these exact characteristics. -Daniel *This .sig left intentionally blank* -
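The "immutable data store with a mapping of tags to objects" described above can be modelled in a few lines. This is only a toy sketch, not git's implementation and not Daniel's actual setup: the `BlobArchive` class and the tag name used below are made up for illustration. The one part taken from git itself is the way a blob's id is derived from its content in a SHA-1 repository (a `blob <size>` header, a NUL byte, then the data).

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the id git gives a blob with this content (SHA-1 object format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class BlobArchive:
    """Toy store: objects keyed by content hash, named tags pointing at them."""

    def __init__(self):
        self._objects = {}  # object id -> raw bytes
        self._tags = {}     # tag name  -> object id

    def store(self, tag, data):
        """Save data under a new tag; existing tags are never rewritten."""
        if tag in self._tags:
            raise ValueError(f"tag {tag!r} already exists")
        object_id = git_blob_id(data)
        self._objects[object_id] = data
        self._tags[tag] = object_id
        return object_id

    def fetch(self, tag):
        """Look up a single image by tag, independently of all the others."""
        return self._objects[self._tags[tag]]


archive = BlobArchive()
# Hypothetical tag name and payload for a shipped firmware image.
image_id = archive.store("firmware-1.0-arm", b"\x7fFIRMWARE")
```

Because each tag points straight at one blob, nothing chains the images together, so fetching one image never drags in the rest.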
Yes, I think git should be very nice for doing binary stuff like firmware images too, my only worry is literally about "mixing it in" with other stuff. Putting lots of binary blobs into a git archive should work fine: but if you would then start tying them together (with a commit chain), it just means that even if you only really want _one_ of them, you end up getting them all, which sounds like a potential disaster.

On the other hand, if you actually want a way to really *archive* the dang things, that may well be what you actually want. In that case, having a separate branch that only contains the binary stuff might actually be what you want to do (and depending on the kind of binary data you have, the delta algorithm might even be good at finding common data sequences and

Yeah, if you just tag individual blobs, git will keep track of them, but won't link them together, so you can easily just look up and fetch a single one from such an archive. Sounds sane enough.

Linus -
If you put the binaries in a separate repository and do shallow clones to avoid getting all the old stuff, wouldn't that work well? -
Yes. I'm not a huge fan of shallow clones, and I suspect they've not gotten all that much testing, but that would certainly solve the problem of getting unnecessarily much data.. Linus -
If your infrastructure to build the binaries is automated, you can easily script the build for new incoming commits. The output of git-describe is really useful for this if you are going to name your builds `git describe`-<arch>.tar.gz. OTOH, commit is different from push (vs SVN where both are one op), and that means that when using git you can present a large change as a better-explained patch-series. That's actually a good practice for new development, and it might not make sense to have literally one-build-per-commit. Maybe I'd enable auto-builds for maintenance/bugfixes branches, and on other (experimental/devel) branches only auto-build commits selected explicitly (tagged?). cheers, martin -
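As a rough sketch of the naming scheme suggested above: when HEAD is not exactly on a tag, `git describe` prints the nearest tag, the number of commits since it, and an abbreviated commit id prefixed with `g`; on a tagged commit it prints just the tag. The helper names, the tag `v3.0` and the architecture `arm` below are assumptions for illustration, and a build script would feed in the real output of `git describe`.

```python
import re

# "<tag>-<commits since tag>-g<abbreviated commit id>"
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split git-describe output into tag, commit count and abbreviated id."""
    text = text.strip()
    match = DESCRIBE_RE.match(text)
    if match is None:
        # Exactly on a tag: the output is just the tag name.
        return {"tag": text, "commits_since": 0, "sha": None}
    return {
        "tag": match.group("tag"),
        "commits_since": int(match.group("count")),
        "sha": match.group("sha"),
    }


def build_name(describe_output, arch):
    """Name a build archive as `git describe`-<arch>.tar.gz."""
    return f"{describe_output.strip()}-{arch}.tar.gz"


# Hypothetical describe output, as a build script might capture it.
name = build_name("v3.0-12-gdeadbee\n", "arm")
```

Parsing the string is optional; it is only useful if the build server also wants to treat tagged commits (where `commits_since` is 0) differently from ordinary ones.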
Heh. I get worried (and judging from other responses, I wasn't the only one) when people start talking about generated binaries and SCM's. Because people _have_ traditionally done things like commit the generated files too.

But if it's just an automated build server, everything is good. That's

Yes. As mentioned, the kernel model of having just one person push is actually fairly rare.

When you have multiple people pushing, you have issues that I never have, but that you've already seen with CVS/SVN, for all the same reasons: you may need to merge the changes that others have done while you were working on yours.

However, the git "push" model is *different* from the CVS/SVN "commit" model. In CVS/SVN, if you want to commit, and somebody else has done updates to the central repository, the "cvs commit" phase will obviously tell you that you're not up-to-date, and you cannot commit at all. So you end up doing a "cvs update -d" equivalent to first update your tree, then you have to resolve any conflicts, and then you can try to commit again.

In git, this is technically very different, yet similar. Since you can always commit to your *local* repository, when you do a "git commit", you'll never have any conflicts at all, because there is no conflicting work!

But the conflicts happen when you then do a "git push" to send out your commit(s) to the central repository. If somebody else has done any changes, at that point, you'll get exactly the same kind of situation as when you do a CVS commit, and the server will tell you that you're not up-to-date, and will refuse to take your push. (The message is different: git will tell you that you try to push a commit that is not a "strict superset" of what the central repository has).

So when that happens with git, you actually have two different options:

- you can do "git pull" to merge the central changes, and in that case you get the exact same kinds of conflict markers for any conflicting c...
Thank you a lot. I finally understood what "git rebase" is all about!
Thomas -
I'd like to point out some more upsides and downsides of "git rebase".

Downsides:

- you're rewriting history, so you MUST NOT have made your pre-rebase changes available publicly anywhere else (or you are in a world of pain with duplicate history and tons of confusion)

- you can only rebase "simple" commits. If you don't just have a linear history of your own commits, but have merged from others, rebasing isn't a sane alternative (yeah, we could make it do something half-way sane, but really, it's not worth even contemplating)

Upsides:

- while there may be more conflicts you have to sort out, they may be individually simpler, so you *might* actually prefer to do it that way.

- if the reason for the conflicts is that upstream did some nice cleanup in the same area, and you decide that you would actually want to re-do your development based on that nice cleanup, then "git rebase" can actually be used as a way to help you do exactly that. IOW, you can take _advantage_ of the conflicts as a way to re-apply the patches but also then fix them up by hand to work in the new (better) world order.

And finally, the upside that is probably the most common case for using "git rebase", and has nothing to do with resolving conflicts before pushing them out with "git push":

- if you actually want to send your changes upstream as emailed *patches* rather than by pushing them out (or asking somebody else to pull them), rebasing is an excellent way to keep the set of patches "fresh" on top of the current development tree. People who send their patches out as emails are also unlikely to have the downsides (ie they normally send them as patches exactly *because* they don't want to make their git trees public, and they probably just have a small set of simple patches in their tree anyway)

So I have to say, I'm still very ambivalent about rebasing. It's definitely a very useful thing to do, but at ...
Wouldn't it be possible to register the rebase somewhere (weak parent? some kind of note not influencing the sha1 ?) that pull/merge could follow? Rebases and cherry-picking are a special kind of merge, so maybe it can be handled like one where it counts... OG. -
Hi,

Actually, with reflogs (if you did not explicitly disable them), you

There is something I have to add as a real disadvantage of rebase: Usually you are expected to test your commits. So, say that you work on some patch series, and produce 3 well tested patches. Then you fetch upstream and realize it advanced by some commits, and rebase your three patches.

However, _none_ of your patches is well tested any more, because there is a quite real chance that your patches interact _badly_ with the patches you just fetched.

And if that is the case, git-bisect can very well attribute it to a wrong patch, either because more than one patch is bad, or because the last patch in your series _exposes_ the bug (but does not _introduce_ it).

Ciao, Dscho -
Well, it's not like duplicate history is a disaster from a *technical* angle. It might be a small space-waster etc, but that's really not the real issue. The problem with duplicate history is that it just makes things much harder to look at. IOW, it's *messy*. So the "tons of confusion" part is basically purely about humans, not about git itself. Git won't really care, and there's no reason to "handle" it specially in that sense. So I would strongly discourage people from ever making rebased history available, but that's not so much because of any particular git technical issue as because it is a good way to confuse all the _humans_ involved. (That said, git's own 'pu' branch ends up jumping around, and it hasn't caused all that much confusion, so maybe I'm overstating even that human confusion) Linus -
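Where the "duplicate history" comes from can be shown with a toy model. A commit's id depends on its parent, so replaying the same patches on a new base yields commits with entirely new ids, and the old ones do not go away for anyone who already fetched them. The sketch below is an assumption-laden simplification: real git commit ids also cover the tree, author, committer, timestamps and message, and `commit_id` / `apply_series` are illustrative names, not git functions.

```python
import hashlib


def commit_id(parent, patch):
    """Toy commit id: a hash over the parent id and the change itself."""
    payload = f"{parent}\n{patch}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def apply_series(base, patches):
    """Stack patches on top of base and return the ids of the new commits."""
    ids = []
    parent = base
    for patch in patches:
        parent = commit_id(parent, patch)
        ids.append(parent)
    return ids


old_base = commit_id("", "initial import")
new_base = commit_id(old_base, "upstream cleanup")
patches = ["fix typo", "add feature"]

published = apply_series(old_base, patches)  # what others already fetched
rebased = apply_series(new_base, patches)    # the same patches after a rebase

# Same changes, no ids in common: whoever holds `published` and later
# merges `rebased` ends up with both copies of each patch in their history.
no_shared_ids = set(published).isdisjoint(rebased)
```

Nothing here is broken from the tool's point of view, which matches the point above: the cost is that humans now see every patch twice.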
It survives because it is well-known. Everyone expects it to break. ocfs2 has an "ALL" branch that is everything we have working, sort of a "test this bleeding edge" thing. It gets rebased all the time, and everyone knows that they can't trust it to update linearly. Other developers have similar things in their repositories. Joel -- "What no boss of a programmer can ever understand is that a programmer is working when he's staring out of the window" - With apologies to Burton Rascoe Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 -
I wonder if it would be useful to be able to flag a branch as "jumping around a lot", where this flag would be downloaded from another repository when it is cloned, so that a naive user could get some kind of warning before committing a patch on top of one of these branches that is known to jump around. "This branch gets rebased all the time and is really meant for testing. If you really want to commit this changeset, please configure yourself for expert mode or use the --force." Or maybe just a warning, a la what we do with detached heads. - Ted -
Hi, Git has no problems with binaries. Actually, one could argue that it has fewer problems with binary files than with text files, since it only recently acquired the capability (disabled by default) to transcribe certain files into the CR/LF line ending some Windows programs still insist on. As for checking in binaries, you could even set up a post-commit hook, which builds the binary, and checks it into a separate branch... Ciao, Dscho -
You might want to take a look at http://repo.or.cz for an example of how you can have a limited number of trusted individuals with commit access. As has been said before, <SCM> is not a substitute for communication, and if you have multiple people who can commit into a repository, you had better make sure those trusted individuals with commit access are talking to each other. There are some folks who have created hooks to do more fine-grained access control, if you want to replicate SVN's ability to control who can commit to which branch. Regards, - Ted -
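For a flavour of what such a hook might look like: git's server-side `update` hook is run once per ref being pushed, receives the ref name plus the old and new object ids as arguments, and rejects the update if it exits non-zero. The sketch below is not one of the existing hooks mentioned above. The ACL table, the user names and the way the pushing user is identified are all assumptions; how a server learns who is pushing depends on the transport it uses.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: user name -> ref patterns that user may update.
ACL = {
    "alice": ["refs/heads/*"],
    "bob": ["refs/heads/maint-*", "refs/heads/bob/*"],
}


def may_update(user, refname, acl=ACL):
    """Return True if any of the user's patterns matches the ref."""
    return any(fnmatchcase(refname, pattern) for pattern in acl.get(user, []))


def update_hook(argv, user):
    """Body of an `update` hook: argv is [hook, refname, old_id, new_id].

    Returns the exit status: 0 accepts the ref update, 1 rejects it.
    """
    refname = argv[1]
    return 0 if may_update(user, refname) else 1


# Example decision for a made-up push by "bob" to the master branch.
status = update_hook(["update", "refs/heads/master", "0" * 40, "1" * 40], "bob")
```

A real hook script would end with something like `sys.exit(update_hook(sys.argv, user))`, with `user` taken from wherever the server records the authenticated account.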
You can setup git to work in a centralised style if you wish. See http://www.kernel.org/pub/software/scm/git/docs/cvs-migration.html -- Julian --- If reporters don't know that truth is plural, they ought to be lawyers. -- Tom Wicker -
