Linux: Using Git For More Than The Kernel

Submitted by Jeremy on August 15, 2005 - 2:37pm.
Linux news

A discussion was raised as to whether GIT should be a service provided by development websites like SourceForge. Linus Torvalds suggested that this would be a good match-up. "The git architecture is admirably suited to an _untrusted_ central server," Linus explained, "ie exactly the SourceForge kind of setup." He went on to explain, "with git, developers don't have to trust SF, and if SF is down or something bad happens (disk crash, bad backups, whatever), you didn't 'lose' anything - the real development wasn't being done at SF anyway, it was a way to _connect_ the people who do real development."

As to whether or not this is likely to happen, Linus added, "it's possible that git usage won't expand all that much either. But quite frankly, I think git is a lot better than CVS (or even SVN) by now, and I wouldn't be surprised if it started getting some use outside of the git-only and kernel projects once people start getting more used to it. And so I'd be thrilled to have some site like SF support it."


From: Wolfgang Denk [email blocked]
To:  git
Subject: [OT?] git tools at SourceForge ?
Date:	Fri, 12 Aug 2005 21:07:39 +0200

This is somewhat off topic here, so I apologize, but  I  didn't  know
any better place to ask:

Has anybody any information if SourceForge is going to provide git  /
cogito / ... for the projects they host? I asked SF, and they opened
a new Feature Request (item #1252867); the message I received sounded
as if I was the first person on the planet to ask...

Am I really alone with this?

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [email blocked]
There are three things I always forget. Names, faces -  the  third  I
can't remember.                                         - Italo Svevo


From: Daniel Barkalow [email blocked]
Subject: Re: [OT?] git tools at SourceForge ?
Date: Fri, 12 Aug 2005 16:46:34 -0400 (EDT)

On Fri, 12 Aug 2005, Wolfgang Denk wrote:
> This is somewhat off topic here, so I apologize, but I didn't know
> any better place to ask:
>
> Has anybody any information if SourceForge is going to provide git /
> cogito / ... for the projects they host? I asked SF, and they opened
> a new Feature Request (item #1252867); the message I received sounded
> as if I was the first person on the planet to ask...
>
> Am I really alone with this?

The git architecture makes the central server less important, and it's
easy to run your own. Also, kernel.org is providing space to a set of
people with a large overlap with git users, since git hasn't been
particularly publicized and kernel.org is hosting git.

	-Daniel
*This .sig left intentionally blank*

From: Linus Torvalds [email blocked]
Subject: Re: [OT?] git tools at SourceForge ?
Date: Fri, 12 Aug 2005 15:27:52 -0700 (PDT)

On Fri, 12 Aug 2005, Daniel Barkalow wrote:
>
> The git architecture makes the central server less important, and it's
> easy to run your own.

On the other hand:

 - the git architecture is admirably suited to an _untrusted_ central
   server, ie exactly the SourceForge kind of setup. I realize that the
   people at SourceForge probably think they are trustworthy, but in the
   big picture, even SF probably would prefer people to see them as a
   _distribution_ point, not the final authority.

   IOW, with git (unlike, for example CVS), you can have a useful
   distribution point that is _not_ one that the developers have to
   control or even necessarily want to control. Which is exactly the kind
   of setup that would match what SF does.

   So with git, developers don't have to trust SF, and if SF is down or
   something bad happens (disk crash, bad backups, whatever), you didn't
   "lose" anything - the real development wasn't being done at SF anyway,
   it was a way to _connect_ the people who do real development.

 - Every developer wants to have their own history and complete source
   control, but very few developers actually have good distribution
   resources. "kernel.org" works for a few projects, and might be fine to
   expand a bit past what it does now, but kernel.org doesn't even try to
   do (nor _want_ to do, I bet) the kinds of things that SF does.

   Yes, developers can just merge with each other directly, and git
   allows that, but it's actually not very convenient - not because of
   git itself, but because of just doing the maintenance. For example, I
   don't allow incoming traffic to my machines, and I feel _much_ better
   that way. No MIS, no maintenance, and much fewer security issues.

This is _exactly_ where something like SF really ends up being helpful.
It's a _hosting_ service, and git is eminently suitable to being hosted,
exactly because of its distributed approach. It needs very few hosting
services: you could make do with a very very limited shell access, and in
fact I tried to design the "git push" protocol so that you could give
people ssh "shell" access, where the "shell" was not a real shell at all,
but something that literally just implemented four or five commands
("git-receive-pack" and some admin commands to do things like
creation/removal of whole archives etc).

> Also, kernel.org is providing space to a set of people with a large
> overlap with git users, since git hasn't been particularly publicized
> and kernel.org is hosting git.

kernel.org certainly works well enough for the few projects that use it,
but I don't think it's going to expand all that much.

And it's possible that git usage won't expand all that much either. But
quite frankly, I think git is a lot better than CVS (or even SVN) by now,
and I wouldn't be surprised if it started getting some use outside of the
git-only and kernel projects once people start getting more used to it.

And so I'd be thrilled to have some site like SF support it. bkbits.net
used to do that for BK projects, and there were a _lot_ of projects that
used it.

		Linus

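The restricted "shell" idea in the message above can be sketched roughly as follows. This is only an illustration, not how any real hosting site or git itself implements it: the only command name taken from the thread is "git-receive-pack"; the two admin command names, the one-argument rule, and the path checks are assumptions made up for the example.

```python
import shlex

# Hypothetical whitelist. Only "git-receive-pack" is named in the thread;
# the admin command names below are invented for illustration.
ALLOWED_COMMANDS = {
    "git-receive-pack": 1,  # one argument: the repository path
    "create-archive": 1,    # hypothetical admin command
    "remove-archive": 1,    # hypothetical admin command
}


def parse_restricted_command(command_line):
    """Return the argument vector if the request is allowed.

    Raises PermissionError for anything outside the whitelist, so the
    caller never hands the string to a real shell.
    """
    try:
        argv = shlex.split(command_line)
    except ValueError as exc:
        raise PermissionError(f"unparseable command: {exc}")
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError("command not allowed")
    if len(argv) - 1 != ALLOWED_COMMANDS[argv[0]]:
        raise PermissionError("wrong number of arguments")
    path = argv[1]
    # Assumed policy: no option-like arguments, no parent-directory escapes.
    if path.startswith("-") or ".." in path.split("/"):
        raise PermissionError("suspicious repository path")
    return argv
```

A login wrapper would call parse_restricted_command() on the requested command and execute the returned argument vector directly, without going through a general-purpose shell.
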
From: Martin Langhoff [email blocked]
Subject: Re: [OT?] git tools at SourceForge ?
Date: Sat, 13 Aug 2005 11:17:57 +1200

>  - the git architecture is admirably suited to an _untrusted_ central
>    server, ie exactly the SourceForge kind of setup. I realize that the

Definitely. And beyond that too. Using SF for CVS means that when SF's
CVS service is down (often enough) you can't commit (or even fscking
diff) until they are back up. Every single damn operation does a
roundtrip.

This also means a huge load on their servers. I'm sure SF will be glad to
see CVS fall out of favour.

>    Yes, developers can just merge with each other directly

I take it that you mean an exchange of patches that does not depend on
having public repos. What are the mechanisms available on that front,
other than patchbombs?

> This is _exactly_ where something like SF really ends up being helpful.
> It's a _hosting_ service, and git is eminently suitable to being

Not sure whether SF is offering rsync, but they do support hosting of
arbitrary data -- and a project using GIT can use that to host
several developer trees. It'd be nice if SF offered gitweb and
similar niceties. As my usage of GIT increases, I may add support for
it on Eduforge.org

If I had more (hw/time) resources I'd do the git proxying of CVS
projects, but that's huge.

> And so I'd be thrilled to have some site like SF support it.

Eduforge's charter is to host education-related projects, so that's not
a free-for-all-comers, but I'm considering git support, as our usage of
git is growing.

cheers,

martin

From: Linus Torvalds [email blocked]
Subject: Re: [OT?] git tools at SourceForge ?
Date: Fri, 12 Aug 2005 16:46:11 -0700 (PDT)

On Sat, 13 Aug 2005, Martin Langhoff wrote:
>
> >    Yes, developers can just merge with each other directly
>
> I take it that you mean an exchange of patches that does not depend on
> having public repos. What are the mechanisms available on that front,
> other than patchbombs?

Just have a shared trusted machine.

A lot of "core" developers end up having machines that they may not
control, and that they may not be able to use as major distribution
points, but that they _can_ share with others. For example,
"master.kernel.org" ends up being that for the kernel: you don't have to
have an account on master, but most of the core developers do, so they
can use it as a short-cut that is independent of the actual "public"
site.

Similarly, some people are perfectly willing to give other trusted
developers a ssh login on their machine - and that's a perfectly fine way
to sync repositories directly if you have even a slow DSL link. You'd
never want to _distribute_ the result over DSL, though.

The point being that you can certainly sync up to others without going
through a public site.

[ We _could_ also just send pack-files as email attachments. There's
  nothing fundamentally wrong with doing the object discovery that
  "git-send-pack" does on its own manually over email.

  In other words: you could easily do something like "Hey, I've got your
  commit as of yesterday, ID <sha1>, can you send me your current
  top-of-tree SHA1 name and the pack-file between the two?" and have
  direct git-to-git synchronization even over email.

  NOTE NOTE NOTE! BK did this, with a "bk send" and "bk receive". I hated
  it, which is why I'd never do scripts like that. But I think it's a
  valid thing to do when you're cursing the fact that the central
  repository is down, and has been down for five hours, and you don't
  know how long it will take to get back up, and you don't have _any_
  other choices ]

> > This is _exactly_ where something like SF really ends up being helpful.
> > It's a _hosting_ service, and git is eminently suitable to being
>
> Not sure whether SF is offering rsync, but they do support hosting of
> arbitrary data -- and a project using GIT can use that to host
> several developer trees.

The problem with the arbitrary data approach (and rsync) is that the git
repositories can get out of sync.

We haven't seen it very often on kernel.org, but we _do_ see it. I think
I've got something like three bug reports from people saying "your tree
is corrupted" because it so happened that the mirroring was on-going at
the same time I did a push, and the mirroring caught an updated HEAD
without actually having caught all of the objects that HEAD referenced.

Now, all the git tools do write things in the right order, and mirror
scripts etc _tend_ to mirror in alphabetical order (and "objects" come
before "refs" ;), so you really have to hit the right window where a git
tool updates the git repository at the same time as a mirroring sweep is
going on, but it definitely _does_ happen.

It just happens seldom enough that most people haven't noticed. But if
you've seen the gitweb page go blank for one of the projects, you now
know why it can happen..

And this is inevitable when you have a non-git-aware server. You really
need to update the objects in the right order, and to get the right order
you do have to be git-aware.

> It'd be nice if SF offered gitweb and
> similar niceties. As my usage of GIT increases, I may add support for
> it on Eduforge.org

I think we'll find that it's a learning process, to just find out what
drives site managers mad (we certainly found the problem with lots of
small files on kernel.org out ;). Having a few sites that do it and tell
the others what gotchas there are involved with it (and what scripts they
use for maintaining stuff like auto-packing etc) is all a learning
experience.

		Linus
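To make the mirroring race described above a bit more concrete, here is a rough sketch of what a "git-aware" copy order could look like. It is not a script anybody on the list posted. It assumes the repository layout mentioned in the thread (an "objects" directory, a "refs" directory and a HEAD file) and assumes writers always store objects before updating refs; the function names are made up. The idea is to read the refs first, then copy the objects, and only then publish the refs that were read, so a published ref can never name an object that was missed.

```python
import os
import shutil


def snapshot_refs(src):
    """Read HEAD and every file under refs/ into memory before copying objects."""
    snapshot = {}
    head = os.path.join(src, "HEAD")
    if os.path.isfile(head):
        with open(head, "rb") as f:
            snapshot["HEAD"] = f.read()
    for dirpath, _dirs, files in os.walk(os.path.join(src, "refs")):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                snapshot[os.path.relpath(full, src)] = f.read()
    return snapshot


def mirror_repository(src, dst):
    """Mirror src into dst; returns the sorted names of the refs published."""
    # 1. Freeze the refs we intend to publish.
    refs = snapshot_refs(src)
    # 2. Copy objects. Writers add objects before refs, so everything the
    #    frozen refs point at is already on disk at this point.
    shutil.copytree(os.path.join(src, "objects"),
                    os.path.join(dst, "objects"),
                    dirs_exist_ok=True)
    # 3. Publish the frozen refs last, each via a rename so readers never
    #    see a half-written file.
    for rel, data in refs.items():
        target = os.path.join(dst, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        tmp = target + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, target)
    return sorted(refs)
```

A push that lands in the middle of the sweep then only means the mirror publishes the slightly older refs; the newer ones get picked up on the next run.
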




SVN

August 15, 2005 - 3:47pm
Q (not verified)

Could someone experienced with SVN/CVS/etc. explain how come GIT surpassed them so quickly?
Or are we comparing just some specific functionality here?
Because it sounds like git is much faster, better and was developed in no time.

Can't say I've ever used SVN

August 15, 2005 - 5:30pm

Can't say I've ever used SVN / CVS but I know a little about git. Git's filesystem-like implementation allows high-performance implementations (albeit at some disk space cost if you don't repack), whilst being relatively simple. Likewise, its content-based indexing makes fully distributed (no central server, every developer has their own branch with full revision control for their private development work) development work very nicely.

Git is still behind CVS / SVN in terms of third-party tools, services such as SF.net (hence the above discussion), etc. 3rd party tool support is appearing relatively rapidly though. Another nice feature of git is that the "core plumbing" specifies a standard revision database and operations you can perform on it but doesn't specify what the SCM frontend should look like. Tools like "cogito" then place their own SCM-like interface on top: you can have several different SCMs all interoperating using the common git backend.

Much of what I've said is also applicable to the Mercurial SCM (except that Mercurial uses a similarly fast on-disk format that has a number of advantages including not requiring repacking). Mercurial is a complete SCM, including front-end (which is not an advantage or disadvantage, just different to git).

SVN

August 15, 2005 - 5:42pm
Kougar (not verified)

Simple: someone (Linus) made a solution that fixed what made those tools unsuitable for his use. And git seems so much better since people aren't trying to use it for things it's not designed for (yet). Just the usual way of things evolving. Try something till you know how it's broken. Fix. Hope people agree your fix makes things better. Repeat.

SVN versus Git

August 15, 2005 - 7:23pm
erikharrison (not verified)

Subversion is in my mind the best of the classic centralized version control systems.

Bitkeeper is the best of the new model of distributed version control systems.

Git is a fast userspace, versioning, content addressable file system, which can be used to implement many of the things that Bitkeeper does.

Linus really has a different philosophy of version control than that of the last 20 years. BK is very close to his vision, but not quite there. The centralized model, to Linus's mind (and I think he's right) is fundamentally broken, and SVN's on disk formats are pitifully slow for a project of the kernel's size. As such, the facts that

1) Git is not a complete version control tool
2) Git is relatively featureless
3) Git has very little in terms of a supporting toolset

don't matter to Linus. Git meets his needs better.

Git's rather raw but Git + Cogito is pretty sane

August 15, 2005 - 10:31pm

Git's rather raw but Git + Cogito make a reasonably sane SCM. I rather like the git model of having a set of core utilities and a standard database format that other SCMs can use to interoperate: as well as Cogito, Darcs has support for accessing and updating git repositories and there are various GUI tools (qgit, gct, gitk) that also build on the core git plumbing and so can work on any of these repositories.

I understand that Arch is going to use the git database format at some stage in the future, in order to improve performance.

svn branching worse than cvs

August 16, 2005 - 10:03am
Paul Houle (not verified)

I've really found that branching support in CVS is a pain (multiple merges between branches is a big pain and requires a lot of manual effort.) SVN, believe it or not, is worse than CVS -- although SVN addresses the problems that Java developers have with CVS (support for renaming) it doesn't address the real problems CVS has.

Also, SVN is slow. It's fine for tiny projects (but still noticeably slower than CVS) but it would take forever to do SVN operations on the Linux kernel. The idea of using a modified WebDAV for the communications protocol is cool, but it's the kind of decision you make if performance doesn't matter.

Also, SVN is slow. It's fine

August 16, 2005 - 10:48am

> Also, SVN is slow. It's fine for tiny projects (but still noticeably slower than CVS) but it would take forever to do SVN operations on the Linux kernel. The idea of using a modified WebDAV for the communications protocol is cool, but it's the kind of decision you make if performance doesn't matter.

You don't have to use webdav. There are other ways to access svn repositories, like the native svn server.

Hey Patrick, how are you? It's been a long time.

August 29, 2007 - 4:26pm
p-j (not verified)

Ha, I found you.

Are you still at príva?
Drop me a line.

p.j.m.broodbakker(AT)gmail.com

Regards, P-J

you sure about that?

August 17, 2005 - 7:46am

SVN doesn't have branching actually -- which is precisely the reason I love it. Many SVN clients/frontends implement "branching" features, so perhaps if you're unhappy with branching in SVN you should just try a new client/frontend.

I've worked in both small and medium (50k-100k line) projects in both CVS and SVN, and IMHO SVN has always been faster. Especially since branches and tags in SVN are simply other folders in your repository, it makes branching and merging very easy and fast.

I know SVN is far from perfect, but it seems extremely flexible to me since its underlying principles are basically that of a versioned filesystem tree.

BTW - Try svnserve + FSFS backend for good performance and stability. Also, SVN 1.2.x is supposed to be significantly quicker, but I'm still using 1.1.x.

svn speed

August 17, 2005 - 8:46am
IA (not verified)

SVN was pretty slow some time ago, but that's history. Have you tried the latest versions? We use it for quite a large project - 20k files, 170MB of sources, 80MB of binaries, 20 branches on a 1GHz box with 512MB RAM, and it works fine (server: apache2 with WebDAV).

The speed of Git

August 16, 2005 - 6:07pm
Zygo Blaxell (not verified)

Git sits on a continuum of features, access time, and storage size, which extends in one direction toward SCCS and CVS, and in the other direction toward XDFS (the XDelta File System...assuming it was ever implemented), and in a third direction towards Bitkeeper, ClearCASE, and so on (the continuum is sort of triangular ;-).

Git is really dead simple compared to the implementation complexity of CVS, SCCS, and the like. Its design is orthogonal to data storage issues (do we compress files? Which delta algorithm do we use? Who cares, someone will figure that out) and to user interface design issues (do we act like a bunch of trees, a single big tree with branches, or ordered stacks of patches? Who cares, let's do all three with room to add more later on). Git never had to bother being backward compatible with anything, which greatly accelerated its development--speaking from experience, it only takes a week or two to build a minimal SCM, for carefully restricted definitions of the words "minimal" and "SCM." Git's technical basis is a cryptographically secure data integrity algorithm, which allows it to solve most of the security problems of other SCM's in a single stroke. Finally, the words "written by Linus Torvalds" probably helped spread Git around just a little. Besides, how can you not like the name? ;-)

On the other extreme...from what I've read of XDFS (in 1998 or so), the authors were just obsessive about access latency and size, and they planned a whole lot of tricks to try to arrange versions of objects in order so that they could get minimum delta size, then used plain zlib compression on everything else. They also arranged what looked like balanced binary trees of forward and backward rsync-style deltas between versions, so that *nothing* in an XDFS was more than about 20 patches away from something analogous to the CVS HEAD revision. The whole thing would run on Berkeley Transactional Data Store (aka libdb) for data integrity. Adding arbitrary metadata and patchset support to something like that would be trivial, since it almost contains a relational database. Like Git, the XDFS design would allow for multiple user interfaces using a common storage format. I wonder if XDFS ever emerged from the land of vaporware? Maybe it became PRCS2? Maybe it just needed better marketing, or it spent too much time solving problems that only the XDFS maintainers actually have.

The refreshing thing about the Git design is that it restricts itself to exactly one of the problems that traditional SCM's have tried to solve--providing versioned views of a filesystem--and it doesn't get in the way of people who might solve all the other problems (ACID update, security, storage size, user interface). Most other SCM's try to solve all of the problems, share none of the infrastructure, maintain incompatible metadata semantics, and want to stay backward-compatible with previous versions of themselves...which means nothing ever gets done in any of them.

Another thing that happened to Git that hasn't happened to other SCM's is that a large number of relatively bright developers were suddenly and simultaneously put in a situation where they had to say "you know, the SCM we are using sucks, and it's worse than not having an SCM at all. Let's throw out our SCM today, and write a new one in a few weeks." Maybe that was Linus' plan all along. ;-)

Reiser4 & GIT

August 17, 2005 - 8:03pm

I just realised, it would be nice then to implement it on top of Reiser4, that means you get atomic commits with minimal extra code :)

Reiser's transaction stuff could well be useful for something :)

The irony of this has been am

August 19, 2005 - 12:08pm
0g (not verified)

The irony of that has been amusing me ever since Linus first called it a filesystem. ("layering violation", anyone?)

Re: Reiser4 & GIT

August 21, 2005 - 4:59am

> Reiser's transaction stuff could well be useful for something :)

I'm not so sure. Git is very careful to write data in just the right order so that the repository remains consistent at all times. Hence, any ordered filesystem will do -- including Reiser3, XFS, BSD's UFS with soft-updates, or even Ext3.

On the other hand, Git loves to create zillions of small files, so you probably want something that can handle that efficiently (i.e. Reiserfs). And a fair amount of memory -- Git's access patterns are horrible, you really want enough memory to keep all of the i-nodes under .git in cache.

Transaction support

August 24, 2005 - 12:02am

Does it also work for the case where I start pulling as you're pushing, or vice versa?

If so, I take back what I said. These kinds of corner cases are why one should care about transactions (as well as rollback, but as I understand it, Reiser4 doesn't implement that yet, it only knows how to roll back every pending transaction).

Cheers,

Michael

Re: Transaction support

August 25, 2005 - 11:01am

> Does it also work for the case where I start pulling as you're pushing, or vice versa?

Yes, you'll get the older head.

See my comment about Git's atomicity properties lower down.

Git Uses

August 15, 2005 - 7:46pm
Jeff Flowers (not verified)

I find Git to be a very interesting piece of software. Could Git be useful for managing a website made up of static webpages? Also, is there anything preventing Git from being used on platforms other than Linux, like Mac OS X and NetBSD?

I believe that GIT (with the

August 15, 2005 - 9:27pm
Brett (not verified)

I believe that GIT (with the cogito front-end) already runs on FreeBSD, so I can't see any reason it wouldn't run on NetBSD/OpenBSD/Mac.

Yup, it runs on Mac - there h

August 15, 2005 - 10:24pm

Yup, it runs on Mac - there have been posts on the git mailing list regarding it.

Yes, it could. Any modern SC

August 15, 2005 - 10:27pm

Yes, it could. Any modern SCM should be fine for managing a load of static webpages (different SCMs may give different performance, but I doubt it'd matter much for a typical website).

Hey, Joey Hess!

August 15, 2005 - 11:35pm

You wrote an article called "Your Life in Subversion". Maybe now we could have one called "Your Life in GIT", or "GIT A Life". ;)

We already use it.

August 16, 2005 - 8:12am
tru (not verified)

XMMS2 development has been done in GIT for more than 3 months now. We are very happy about it and it works really well since we came from the free version of bk before.

Our gitweb is here: http://git.xmms.se/

mantis and git

August 17, 2005 - 10:39am
Juan Liska (not verified)

How did you tie mantis to git? Does scmbug have git support?

Shared git repositories

August 16, 2005 - 12:10pm
Walter (not verified)

Does anybody know whether the different projects on kernel.org/git share a common git repository? That would probably save a lot of space as all those kernel projects probably share lots of files.

Shared git repositories

August 16, 2005 - 2:42pm
sadfdas (not verified)

IIUC, every git repository is like a cell, with a complete "DNA" set for the project. No hierarchy of any sort, as could be implied with a traditional, centralized repository arrangement.

The Git repository is nothing

August 16, 2005 - 2:54pm
Walter (not verified)

The Git repository is nothing but a big hash table with the hash function being a sha1 digest over the complete file in question.
Two identical files thus share the same hash and are stored in the same location independent of their metadata.
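The "big hash table" description above can be sketched in a few lines. This is a toy in-memory model, not git's storage code. One detail differs from the comment: as far as I know, git hashes a short header ("blob", the length, and a NUL byte) followed by the file contents rather than the bare contents, which is what blob_id() below assumes. The ObjectStore class and its method names are invented for the example.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA-1 name of a file's contents, using git's "blob <size>\\0" header."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Toy content-addressed store: the key is derived from the content alone."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = blob_id(data)
        # Identical content always maps to the same key, so it is stored once,
        # whatever the file was called or whichever project it came from.
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self):
        return len(self._objects)
```

Putting the same bytes in twice (say, an identical file from two kernel trees) returns the same key and leaves one stored copy, which is why sharing an object directory between related projects saves space.
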

Not yet but soon(?)

August 16, 2005 - 8:18pm
yashi (not verified)

git-clone-script now has a '-s' option to share a local repo. With that option and the GIT_ALTERNATE_OBJECT_DIRECTORIES env var, you can _easily_ share hashed objects.

Oh

August 17, 2005 - 3:19am
Catwalk (not verified)

Not having atomic commits is a serious lack. What happens to a GIT repo when the commit dies halfway through?

The diff algorithm, err, sucks. It takes more than a couple of months to write proper diff/merge functionality that handles the corner cases.

Other than that: good start.

Why? Git works fundamental

August 17, 2005 - 5:15am
yashi (not verified)

Why?

Git works fundamentally differently in this regard. A "commit" is nothing more than adding a new object, which holds a tree object and a comment describing why and how objects changed. If that fails, well, too bad, just redo it. All other objects (i.e. file and tree) are added when you run git-update-cache or git-write-tree, not at commit time.

Plus, because git isn't a server/client type of SCM, you can guarantee that you are the only one to access your _private_ repo. All others are pulling from your _public_ repo after you push. This makes Git much simpler compared to other SCMs.

Regarding diff/merge, I'm just starting to read the diffcore code, so I'm not qualified to comment on it.

The problem with lack of atom

August 17, 2005 - 7:39am
Rhiendal (not verified)

The problem with lack of atomicity is the case when pullers start pulling while you push, perhaps? I don't know the source, so this is just a guess. Another theory is that the original poster is a halfassed troll.

Re: The problem with lack of atomicity

August 21, 2005 - 4:52am

> The problem with lack of atomicity is the case when pullers start pulling while you push, perhaps?

When you do a commit in Git, Git writes out a number of files into .git/objects, and then updates a special file, .git/HEAD, to point at the file indexing the new data.

Because HEAD is written after the data, anyone pulling while you commit will either see the new head, or the old one, never something inconsistent.

There are two ways in which atomicity can be lost. If two people commit to the same Git repository at the same time, you can end up with a lost update (two parallel heads, one of which is inaccessible). This is easy to solve with a lock, which is what Darcs-Git does, but the Git folks don't want to do any locking.

What Linus was getting at is the problem of Git-unaware mirroring software modifying Git repositories. If the mirroring software copies .git/HEAD before it copies the actual data, there will be a span of time during which the target repository is inconsistent. If you pull from such an inconsistent repository, you'll most likely get an error, but your local copy should not become corrupted.
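The write ordering and the lock-based fix for lost updates described in this comment can be illustrated with a small sketch. It is a simplified model, not git's or Darcs-Git's actual code: objects are stored uncompressed and without git's header, HEAD is treated as a plain file holding an object ID, and the function names and the "HEAD.lock" file name are assumptions made for the example.

```python
import hashlib
import os


def write_object(repo, data):
    """Store data under objects/ by its SHA-1 and return the ID.

    Simplified: no object header and no compression.
    """
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(repo, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # the object appears whole or not at all
    return oid


def update_head(repo, expected_old, new_id):
    """Point HEAD at new_id only if it still holds expected_old.

    Returns False instead of overwriting when another committer got there
    first, which is the lost-update case described above.
    """
    head = os.path.join(repo, "HEAD")
    lock = head + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as lock_file:
            current = None
            if os.path.exists(head):
                with open(head) as f:
                    current = f.read().strip()
            if current != expected_old:
                return False
            lock_file.write(new_id + "\n")
        os.replace(lock, head)  # readers see the old head or the new one
        return True
    finally:
        if os.path.exists(lock):
            os.unlink(lock)
```

A commit in this model calls write_object() for every new object first and update_head() last, so someone pulling at any moment sees either the old head or the new one, both with all their objects present.
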

Other distributed systems

August 17, 2005 - 4:19am
Ruda M. (not verified)

If you think that there are just a few features in GIT, then have a look at monotone. Linus considered it as an option before he started his work on GIT. If I remember right, there were some performance issues.

There are performance issues

August 24, 2005 - 4:46pm
OE Developer (not verified)

There are performance issues in Monotone. The otherwise very nice OpenEmbedded.org project has been using it for some time.

When you do pull from a remote monotone repository, you'll need quite a time. To get the changes of one day, a cup of coffee is sufficient (51 changesets take about 3 Minutes). If you didn't pull for some time, then you can go for lunch or take out your girlfriend. I had pull times of more than 3 hours. While this happens, monotone uses about 100% of CPU time, mostly for cryptographic stuff.

Local operations are quite fast, e.g. an update or commit.

Because Monotone is so slow when exchanging data with other repositories, I stopped contributing to OpenEmbedded.org. It was really annoying just to be up-to-date.

Why does OE not use git? Because it doesn't have "git push".

we certainly found the proble

September 3, 2005 - 3:57pm
dada (not verified)

we certainly found the problem with lots of
small files on kernel.org out ;). - I think this is not true
