Feature: Git And The Linux Kernel Archives

Submitted by Jeremy
on June 20, 2005 - 7:33am

The Linux Kernel Archives provides an assortment of methods for obtaining the Linux Kernel source code. In an earlier article [story] we spoke with H. Peter Anvin, who has been maintaining kernel.org since its inception in 1997. In the beginning it operated on a generic PC connected to the Internet through a shared T1 housed by Transmeta Corporation. Since those early days it has been upgraded several times, finally reaching the current configuration, which includes two ProLiant DL585 4-way dual-core Opterons donated by HP, each with 24 gigabytes of RAM and 10 terabytes of disk space. Both of the servers have a full gigabit connection to the Internet donated by Internet Systems Consortium, Inc.

At the time our earlier article was published, the new hardware hadn't seen much of a stress test yet. Peter had noted that with the release of RedHat's Fedora Core 4, he expected to see the internet links saturated. However, when the release finally came last week, link saturation didn't quite happen. Instead, Peter noted, "we peaked at 1600 Mbit/s for less than 12 hours or so." He suggested, "I think that was in part due to the fact that FC4 had leaked before release." He did rule out flaws in the network infrastructure as the reason they didn't reach the full 2 gigabits per second in download rate, saying, "we got quite a few reports saying that downloading from kernel.org was a lot faster than BitTorrent, so I'm quite sure it was *not* due to upstream bottlenecks."


I asked Peter how the new servers are working, to which he replied, "they've performed beautifully." However, he did note that the introduction of git as a source control system that's housed on the same servers has been a little problematic. "This changed our usage pattern fundamentally," Peter explained. "git is extremely file (inode) intensive; in fact, the total number of files on kernel.org (excluding mirrors) has septupled since April."
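To get a feel for what "file intensive" means in practice, one can simply count the entries under a directory tree. The following is a minimal Python sketch, not anything kernel.org is known to use; the function name count_entries is made up for illustration.

```python
import os

def count_entries(root):
    """Return (files, directories) found below root.

    os.walk does not descend into symlinked directories by default,
    so a symlink to a directory is counted once and not followed.
    """
    files = 0
    dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs

if __name__ == "__main__":
    # Count everything below the current directory.
    n_files, n_dirs = count_entries(".")
    print(f"{n_files} files in {n_dirs} directories")
```

Running this before and after unpacking a repository into many small object files shows how quickly the entry count can grow while the total size on disk grows much less.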

While the public portion of the Linux Kernel Archives is served by the hardware described above, there is a third server, a private master server to which data is originally written and then pushed out to the public servers. The sevenfold increase in the number of files being archived has caused two problems for the interaction between the master server and the public servers.

The first problem involves rsync, which is used to synchronize the two public servers with the master server. Peter explains, "simply using 'rsync' to synchronize takes too long, just because the file list to compare is hundreds of megabytes long." To solve this, Peter says, "we're working on a stateful sync program." Nathan Laredo is currently working on the replacement for rsync, which it is hoped will be ready as soon as next week.
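The article gives no details of how the stateful sync program works, so the following Python sketch only illustrates the general idea, under our own assumptions: the sender remembers what the tree looked like after the last sync, so that only the paths that changed need to be exchanged instead of the full file list. All function names and the JSON state file are hypothetical.

```python
import json
import os

def snapshot(root):
    """Map each file path (relative to root) to [size, mtime in ns]."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            # Lists rather than tuples, so a snapshot loaded back from
            # JSON compares equal to a freshly built one.
            state[os.path.relpath(full, root)] = [st.st_size, st.st_mtime_ns]
    return state

def diff(old, new):
    """Return (changed_or_added, removed) paths between two snapshots."""
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    removed = sorted(p for p in old if p not in new)
    return changed, removed

def load_state(path):
    """Load the snapshot saved by the previous run, or {} on first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_state(path, state):
    """Persist the snapshot for the next run."""
    with open(path, "w") as f:
        json.dump(state, f)
```

A sync run would call load_state, build a fresh snapshot, hand the two lists from diff to whatever transfers the files, and then call save_state. Size and modification time are a cheap change test, not a content check; a real tool might add checksums.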

The second problem has to do with a hardware limitation: "master.kernel.org is still an i386 machine," Peter explained. "It's constantly hurting for lowmem since the dentry and inode caches can only live in lowmem." The solution for this problem is simpler: "we need a new master.kernel.org with a 64-bit CPU," Peter stated. Efforts are currently underway to obtain the new 64-bit hardware.

The introduction of git has changed the usage pattern of the Linux Kernel Archives, but the new hardware has performed very well. Peter stressed, "the new servers (zeus1 and zeus2) have not been the problem at all; they've performed beautifully. If we hadn't had those machines we would have been in much worse shape." Once the backend master server is upgraded to 64-bit hardware and rsync is replaced with more intelligent software, things will be performing smoothly again. As it is, most users of the kernel archives aren't even aware that the system is being stressed, as kernel downloads are as fast as ever.


Learn more about the Linux Kernel Archives in our earlier feature, "The Linux Kernel Archives".

Can anyone say which distro a

lagitus (not verified)
on
June 20, 2005 - 1:00pm

Can anyone say which distro and kernel version/patchset they are using on those beasts?

Please read the article linke

Erik Hensema (not verified)
on
June 20, 2005 - 1:34pm

Please read the article linked in this one (first link in this article):

At this time, the servers run Fedora Core and use the 2.6 kernel provided by RedHat.

So that's simply a stock kernel, not even a custom recompile. Which is what you want on a server, from a maintenance point of view. Everything you compile yourself is prone to bitrot.

Looks like Fedora http://t

kormoc (not verified)
on
June 20, 2005 - 1:42pm

openbsd.

Anonymous Coward (not verified)
on
June 20, 2005 - 2:08pm

openbsd.

The sister article goes into

MeMyself&I (not verified)
on
June 20, 2005 - 2:12pm

The sister article goes into it in better detail, but here is the highlight:

... At this time, the servers run Fedora Core and use the 2.6 kernel provided by RedHat. Peter explained, "it just comes down the upgrade pipe, which makes keeping it up to date a lot simpler." He added that for this reason they will continue to use vendor kernels so long as they're not lacking any critical features. The Linux Kernel Archives began serving data with the 2.6 kernel nearly a year ago on May 24'th, 2004. ...

Fedora Core and kernel 2.6 ac

Anon (not verified)
on
June 20, 2005 - 2:27pm

Fedora Core and kernel 2.6 according to this link.

64bit machine

Alphageek (not verified)
on
June 20, 2005 - 2:12pm

I'd recommend an old alpha. They can be obtained from various sources cheaply, even ebay. They're still beautiful machines, DEC built them to last for at least another 10 years.

replacement for rsync

Anonymouse (not verified)
on
June 20, 2005 - 2:49pm

Will it be open source?

Ask Larry McVoy

Anonymous
on
June 20, 2005 - 4:14pm

Ask Larry McVoy

Replacement for rsync

on
June 20, 2005 - 7:54pm

I'd suggest having a look at zsync.
It's more suitable for big files, but maybe it could help you.
More info at zsync.moria.org.uk/ and http://www.google.it/url?sa=U&start=4&q=http://freshmeat.net/projects/zs...

-------------------oOOo---oOOo------------------
There are only 10 types of people in this world:
those who understand binary, and those who dont

mrtg? rrd?

bert hubert (not verified)
on
June 21, 2005 - 6:04pm

Are there mrtg/rrd or whatever bandwidth graphs available? The bwbar is cool as it is but history would be nice.

bandwidth bar history

Nathan Laredo (not verified)
on
August 10, 2005 - 4:42pm

Look again. I've finally implemented the bandwidth bar history on kernel.org.

Neither mrtg nor rrd were required, but ghostscript was. All of the graphing was done in postscript code, then ghostscript was used to render the postscript as an image.

Enjoy.
-- Nathan

Fedora leaks found.

AnonymousUbuntu (not verified)
on
June 26, 2005 - 2:32am


At the time our earlier article was published, the new hardware hadn't seen much of a stress test yet. Peter had noted that with the release of RedHat's Fedora Core 4, he expected to see the internet links saturated. However, when the release finally came last week, link saturation didn't quite happen. Instead, Peter noted, "we peaked at 1600 Mbit/s for less than 12 hours or so." He suggested, "I think that was in part due to the fact that FC4 had leaked before release."

Perhaps Fedora is just not as popular as Red Hat would have people believe? Now if you had mirrored Ubuntu on there, you'd need another line to the Internet, and for an extra day to boot. ;-)
