The Linux Kernel Archives are perhaps most familiar through their web interface, http://kernel.org/. The latest release of the Linux kernel is easily found here, along with patches by various Linux kernel hackers, and mirrors of other popular free and open source projects. Countless people worldwide happily rely on this archive without giving much thought to the effort behind it.
In a recent announcement to the Linux Kernel Mailing List, H. Peter Anvin detailed a recent upgrade of the infrastructure behind kernel.org. The new servers were donated by Hewlett-Packard, and are each quad Opterons with 24 gigabytes of RAM and 10 terabytes of disk space. Internet Systems Consortium, Inc. donates the bandwidth in the form of two independent gigabit-connected datacenters, PAIX Palo Alto and e200paul in San Francisco. H. Peter Anvin, Nathan Laredo, and Kees Cook all donate time to maintain the archives. KernelTrap recently spoke with Peter Anvin to learn more about the history behind the Linux Kernel Archives.
The beginning:
Peter Anvin has been involved with Linux since nearly the beginning. When Linus Torvalds purchased his first computer, on which he began writing the Linux kernel, the state-of-the-art PC with 4 megabytes of RAM and running at 33 megahertz was too expensive for him to buy outright. Therefore, he financed much of the nearly $3,500 price, planning to pay it off over three years. Within a year, as the Linux kernel began to evolve and a community of users formed, Peter organized an online collection that raised $3,000 and paid it off.
Later, when Linus graduated from the University of Helsinki, Peter convinced him to move to California to work for Transmeta Corporation, where Peter himself had been working for about a year. At this point, the Linux Kernel Archives was born. "I've been taking care of kernel.org since its inception in 1997," Peter explained. In the beginning, the archives were housed on a generic white-box PC running the Linux kernel, connected to the Internet through Transmeta's T1. The idea was to provide Linus with a local server.
The 'kernel.org' domain name was picked because by that time in 1997 the more logical seeming Linux dot names were already taken. The Transmeta domain was intentionally not used to avoid creating the false perception that Transmeta owned Linux. "So kernel.org was taken as sort of a second choice," Peter explained, "and it has worked out obviously very well, as today it's instantaneously recognized as its own thing."
Second generation:
The original PC was replaced in 1998 with a Dual PII 550 donated by VA Linux Systems (now VA Software). Around that same time, Globix donated colocation for the server, providing a dedicated 100 megabit link at their data center in Santa Clara. Within a couple of years, the website was drawing that much bandwidth on a regular basis, and Peter noted "the relationship was getting fairly strained." In 2000 when the telecommunications industry came crashing down, Globix found that they needed to trim costs. Peter summarized, "they pretty much asked us to leave on short notice."
Third Generation:
At this point, Paul Vixie, who runs Internet Systems Consortium, Inc., contacted them to offer space at ISC's colocation in PAIX Palo Alto. "This was pretty much a dream colo for us," Peter said, "we were allowed to saturate a 100 megabit link into quite a few Internet backbones."
I emailed Paul Vixie asking for insight into how ISC is able to provide this hosting for kernel.org. He pointed out an impressive list of projects for which they currently provide hosting and explained, "we're a public benefit corporation and we do a lot of this kind of stuff. We recognize the Linux Kernel Archive project as a fellow traveler and it's clear to us that by helping Peter Anvin we help our own cause." He went on to add, "Peter Anvin's been great to work with. Kernel.org is one of our larger single traffic sources, and we're proud to be associated with it. ISC believes that our existence has an industry-wide and community-wide benefit, and that kernel.org's existence, likewise." Mr. Vixie went on to thank his friend Daryl Jones of SMRN who is helping pay for kernel.org's power and heat in the San Francisco location.
In 2001, Hewlett-Packard made their first hardware donation to the Linux Kernel Archives, a ProLiant DL380 G2 with Dual PIII's running at 1.1 gigahertz. "That machine had what then seemed an astounding 6 gigs of RAM, and a terabyte of disk," Peter said. "At that time it seemed like a way over-dimensioned system." ISC then upgraded kernel.org to a gigabit link, more bandwidth than the server could actually use. With a fair amount of tuning, they managed to get the server pushing out 600 megabits of data. That limitation, Peter explained, was because more data was being served than could fit in the 6 gigabytes of RAM, at which point disk bandwidth became the limiting factor.
Serving data with http and ftp is not very CPU intensive, but over time the amount of rsync traffic being fed by the kernel.org server continued to increase, and rsync is CPU intensive. "That's what rsync does," Peter said, "it trades bandwidth for CPU horsepower. We were getting to the point where we had all the bandwidth, but the Dual PIII 1.1's couldn't really keep up." He noted that the load average kept growing, well into triple digits. Referring to 32-bit systems, Peter noted, "we learned that the Linux load average rolls over at 1024. And we actually found this out empirically."
Fourth Generation:
As it became more apparent that the hardware needed an upgrade, Peter began to think about preparing a request to Hewlett-Packard for new hardware. Before he even made his request, HP contacted him basically saying, "hey, we noticed that you guys have been kind of struggling lately, what do you need?" Peter provided them with his wish list, and within two weeks the decision was made and new hardware was on the way. Peter noted, "HP came to us from a quite high level. They have been absolutely great."
Matt Taggart, part of the R&D lab within HP's Open Source & Linux Organization, noted that HP is a large company and that the different donations to kernel.org actually came from different divisions. "There are plenty of people in HP that recognize the value that kernel.org provides and that benefit (both directly and indirectly through HP's customers) from having it perform well," he explained. "This time the donation came from HP's Open Source and Linux Operation R&D Lab, but in the past they have come from other places such as the Industry Standard Server Division (the folks that do ProLiant)." He went on to add, "HP's IT organizations also use Linux and are big users of kernel.org, so it benefits them as well."
As for why HP has made these donations, Matt explained, "when possible, HP likes to help Free and Open Source software projects at the source. For example, if HP wants to contribute driver fixes for a piece of equipment that we ship, it is a better use of our time to work at the kernel.org level rather than duplicating effort by working individually with each distributor (or not being able to work with some at all). Providing kernel.org hardware is an easy way for us to give back to the project that has helped save us a lot of effort."
On the wishlist that Peter provided to HP he had two main requirements, a 64-bit processor, and two identical servers to allow one to have scheduled downtime while the other could continue to function. The new servers donated by HP are ProLiant DL585 4-way dual-core Opterons, with 24 gigabytes of RAM and 10 terabytes of disk space using a pair of MSA-30 arrays for each server. "The new machines can genuinely serve all the commonly requested files from RAM," Peter said. "That was a big reason why we asked for 24 gigabytes."
One of the new servers is located in San Francisco, the other in PAIX Palo Alto. As of April 9th, 2005, both of the new servers are online and serving traffic. Each of them should be capable of individually serving a full gigabit of traffic, though this hasn't happened yet. The CPU load average dropped from triple digits down to the low single digits.
"Each server is in a different ISC colo, connected to the Internet via gigabit fiber links," Peter summarized in an announcement to the Linux Kernel Mailing list. "Consequently, we should now see incredibly much better performance from kernel.org. Huge thanks to HP for the new hardware, and huge thanks to ISC for letting us quadruple our rack space requirements from 5U to 2x10U. We'll be saturating those links in no time :)"
Under The Hood:
The servers that power the Linux Kernel Archives have always used the Linux kernel. In the beginning, they ran the vanilla kernel, keeping up with the latest and greatest features providing the best performance. However, in the past couple of years, the archives have begun using vendor kernels. At this time, the servers run Fedora Core and use the 2.6 kernel provided by RedHat. Peter explained, "it just comes down the upgrade pipe, which makes keeping it up to date a lot simpler." He added that for this reason they will continue to use vendor kernels so long as they're not lacking any critical features. The Linux Kernel Archives began serving data with the 2.6 kernel nearly a year ago, on May 24th, 2004.
The web pages are served by Apache, upgraded to Apache 2 on December 4th, 2004. FTP is served by vsftpd, which replaced proftpd on May 26th, 2004. Beyond that, Peter noted, "very little fancy is going on, and that is good because fancy is hard to maintain." He explained that the only fancy thing being done is that all filesystems are mounted noatime, meaning that the system doesn't have to make writes to the filesystem for files which are simply being read, "that cut the load average in half." Beyond that, he explained that their main requirement is that everything use the sendfile system call, "which basically says take this file and herd it out this particular TCP port. That is 99 point something percent of what we do, so that is very important to us."
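To make the sendfile idea concrete, here is a minimal Python sketch of handing a file to a socket in one call. It is an illustration only, not kernel.org's setup (the article names just Apache and vsftpd); the function names serve_file and demo and the sample payload are hypothetical. Python's socket.sendfile() uses the operating system's sendfile call where one is available and otherwise falls back to ordinary send() calls.

```python
import os
import socket
import tempfile


def serve_file(conn: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `conn` and return the bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() hands the copy to the kernel via os.sendfile()
        # when the platform supports it; otherwise it falls back to send().
        return conn.sendfile(f)


def demo(payload: bytes):
    """Write `payload` to a temp file, push it through a local socket pair,
    and return (bytes_sent, bytes_received)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    server, client = socket.socketpair()
    try:
        # The payload is kept small so it fits in the socket buffer and the
        # single-threaded demo does not block waiting for a reader.
        sent = serve_file(server, path)
        server.close()  # closing the sender lets the reader see end-of-file
        received = b""
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            received += chunk
        return sent, received
    finally:
        server.close()
        client.close()
        os.unlink(path)


if __name__ == "__main__":
    sent, received = demo(b"example archive bytes\n" * 100)
    print(sent, len(received))
```

The point of the call is that the file's bytes never have to be copied into the server process and back out again, which is why a machine that mostly ships static files cares about it.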
Mirrors:
The Linux Kernel Archives Mirror System is managed by Kees Cook. Peter explained that this system originated back when kernel.org was using the Transmeta T1 link, "and horribly bogged down as a result." Several high bandwidth sites volunteered to act as mirrors, and a formal system was created. Essentially, each site agrees to a baseline of service, and links are provided from the kernel.org website. "We consistently have a little over 100 sites, and the number has been constant from pretty much the very beginning," Peter said. "Of course, the sites themselves change over time." When there was only one server running kernel.org, the mirrors would also take over when the main server needed maintenance.
In the current configuration there is no shortage of CPU or bandwidth, causing Peter to remark, "as far as the kernel is concerned, it wouldn't be a whole lot of skin of our teeth if the mirror system fell apart." However, for users downloading the kernel outside of North America, the mirrors are very helpful by providing them with a local source.
Other services:
Other active services fall under the kernel.org domain. For example, the Linux Kernel Mailing List is run on a server called vger. However, physically that machine has nothing to do with the Linux Kernel Archives. Peter explained that originally there was a Linux Activist mailing list run in Finland. It was eventually replaced with the Majordomo powered Linux Kernel Mailing List, managed by David Miller at Rutgers University. Later, when David went to work at RedHat and the server moved with him, some people became quite concerned about the LKML having a redhat.com domain. Peter offered at that time, "if it makes people feel better, we can make it vger.kernel.org." And so, while the server still is physically housed at RedHat, it is part of the kernel.org domain. "This is due to its function, not its location."
Bugzilla.kernel.org is another example of a server in the kernel.org domain that is housed elsewhere. In the case of the bugzilla server, it's run by OSDL. "It was because it kind of got blessed by Linus Torvalds and the general consensus of kernel developers that we put it in the kernel.org domain," Peter explained.
Bandwidth:
The normal bandwidth used by kernel.org is between 150 and 200 megabits per second, at times when "nothing major is happening," Peter said. "Quite honestly, the test releases aren't even a blip on our radar," he added, referring to the -pre and -rc kernels, explaining that they don't noticeably increase the amount of bandwidth that is consumed. Only when an official stable release is announced does kernel.org see a spike in traffic. For example, with the upcoming 2.6.12 release Peter predicted, "I expect it go to the high 200's, for about a day." He noted that even with a direct link from a busy website such as Slashdot, that was about as much bandwidth consumption as they see from a kernel release.
"What really drives up the load average is when one of the distributions that we mirror makes a release," he explained, "such as one of the Fedora cores. The kernel is only a few tens of megabytes, whereas a fedora core is a couple of gigabytes." With the upcoming release of Fedora Core 4, Peter predicts that both gigabit links will probably be saturated for 3 or 4 days. "This is largely speculation, because never before have we had the capability of serving that much traffic."
When asked about viewing the actual access logs, Peter explained that although they do occasionally get requests from various sorts of researchers, they generally don't make them available for privacy reasons. "We've only allowed access to people who are intimately involved with Linux already," he said, "people we already know." There has been discussion about making the logs available in an anonymized form, but it's not the top priority. "It gets talked about," he noted, "but it's largely a people time issue."
Making It Happen:
Currently there are three people who manage kernel.org, all to some degree in their spare time. In addition to Peter Anvin, Nathan Laredo and Kees Cook also help out. Peter, who is employed by Orion Multisystems, is in charge of the overall architectural design, providing developers with access to upload their patches, and public relations. Nathan, who also works at Orion, maintains the system and server software as well as the web pages. And Kees, employed by the OSDL, is in charge of the mirror system. Day-to-day administration is done by whoever gets to it.
When I asked Linus Torvalds about the Linux Kernel Archives, he replied, "I have been very happily relying on others to do all the work with kernel.org." He went on to say, "I've literally never needed to lift a finger for kernel.org maintenance, which is wonderful (both for me - since I'm lazy, and for kernel.org - since I'm a total air-head when it comes to system management ;) The only thing I can add to anything is just a 'thanks for doing it' to Peter and the other people involved."
There has been talk of possibly creating a formal staff for managing kernel.org, though for several reasons it hasn't happened yet. "We would have to find a sponsor to pay for it," Peter explained, "It's not impossible, but just going out and hunting for that is a big job in itself." He went on to add that part of the reason this hasn't happened "is that both HP and ISC have been so great to deal with. They've been very low demand on our time, and they've been very forthcoming with what we need without going through rigmarole."
Officially, the site is run by the Kernel Dot Org Organization, Inc., a nonprofit corporation formed in 2001. However, a whois search reveals that the domain name is still registered by Transmeta Corporation. Peter is currently working with Transmeta to get it re-registered under the corporate entity. This originally was happening back in 2001, but was stalled due to some turmoil with Transmeta at that time, and there really hasn't been a pressing need.
The idea behind the non-profit organization was that the bandwidth consumed by kernel.org is very expensive. "I wouldn't be surprised to learn if it amounts to 1 million dollars a year," Peter said. "That's a lot of money, and we were thinking if we had to get another ISP sponsor, they'd need to be able to deduct this as a charitable expense. Currently we're doing this under ISC's umbrella." He added that the plan is to continue working with ISC as long as possible, "it's been an incredibly good relationship for us."
Thanks to the kernel.org folks
They've been doing an amazing job and it's good to know the story behind it.
great read
It's not only informative and detailed, but it's a good story told well.
robins
Go, HP
I already own two HP printer products, and would offer even more word-of-mouth advertising if the GNU/Linux driver support were mo' betta.
Go
The small IT business I work for recently bought their first HP ProLiant (a mere 360DL G4), and we couldn't be happier with it. We've always bought bargain servers from Dell before and were shocked by the features that the ProLiant line sports. Automatic RAM chip failover?!?! I don't think my PowerEdge does that!
At any rate, I don't want this to become an advertisement, but it's nice to know that the server we're in love with is made by a FOSS-friendly company.
Hmm, that's exactly what I thought.
We have some PEs, but now I got my hands on a Proliant.
I think we will switch to HP...
It makes you wonder if there will ever be a hurd.org for Linux/HURD.
Linux/HURD at Kernel.org?
On the other hand, they already have our drivers, maybe we can host them on kernel.org. It's a 'kernel' server, right?
ooooh you're in trouble! RMS is going to kick your butt for using the "L-word" it's GNU/Hurd :-)
Get that right, GNU/Hurd.
Fine. GNU/Linux/Hurd
Stallmania is so prevalent about the GNU Software in Linux, so I think that they need to hold up their end of the bargain, and acknowledge that a severe amount of the drivers are from Linux. Enough so to be called 'Linux/HURD', or 'GNU/Linux/HURD'. Or does the whole 'prepended /' only apply to userspace?
It's "GNU Hurd", and I don't think the HURD owes much to Linux drivers.
Indeed the HURD is still rather picky about hardware it supports, unlike Linux.
The "/" denotes that Linux isn't part of the GNU project, the Hurd is part of the project, so it is properly called "GNU Hurd".
> I don't think the HURD owes
> I don't think the HURD owes much to Linux drivers.
Lots of kernel developers would dispute that...
But how about: Linux-GNU HURD
After all, Linux isn't userspace. It's kernel, and drivers, etc, etc.
> I don't think the HURD owes
Linux/GNU HURD
Funny how the credit goes one way.
It's funny how when asking for a share of the credit in calling the variant of GNU with the Linux kernel "GNU/Linux" the GNU Project is made fun of and accused of egotistical grandstanding.
But in this discussion "Linux-GNU HURD" is proposed as a viable name without any sense of humor or shame.
--J.B. Nicholson-Owens (jbn@forestfield.org)
nomenclature
Kernel = Hurd or GNU Hurd
System = GNU or GNU/Hurd
Excellent
An excellent read. Respect and kudos to all the organisations that have and are donating equipment, services, time and money.
Thanks
Thanks for your hard work, and I hope to see the bandwidth bar/uptime/etc. online again soon.
Kind regards!
Makes me wonder....
Yes, there are still these kind of people and companies in this world who think of others too ;-)
Nice to know that they still exist....
It is a really great article. Thanks for this, helped me a lot to understand some things.
sendfile and 2.6
He says their software uses sendfile() when transmitting files from filesystem to sockets.
I thought that feature was removed in 2.6?
FreeBSD feature
Just wanted to give a quick plug for the fact that sendfile() was originally a FreeBSD feature. Alan Cox (not that one) continues to optimize it with his ongoing VM work.
Heh
I continually find it a point of amusement that both Linux and FreeBSD have an Alan Cox. Watch, it'll become a prerequisite that all successful UNIX variants have an Alan Cox on their team. ;-)
blah blah blah, who the hell cares.
Interesting
Wow, what an interesting article! Thanks Jeremy :) I've been wondering a long time, just what's going on behind kernel.org. Now I know.
$1,000,000/yr bandwidth ?
The idea that two gigabit links would cost $1,000,000 per year is a bit excessive.
In reality, you can buy gigabit access at $15/megabit or even lower (Cogent, etc).
So, that's $15,000 per month, or $180,000 per year.
Now, if you are looking at two gigabit links, that's double.
But, obviously, the 95th percentile average is nowhere near 1000Mbps per month, so depending on how they buy it (with a minimum commit of 300Mbps or so, for instance), they could spend as little as $4500 per month, per 300Mbps commit.
Basically, bandwidth is cheap, folks.
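For anyone who wants to check the arithmetic in the comment above, here is a small Python sketch. The $15 per megabit per month price and the 300Mbps commit are the commenter's figures, not quoted rates from any provider; the function name monthly_cost and the variable names are made up for illustration.

```python
def monthly_cost(commit_mbps: int, usd_per_mbps: int) -> int:
    """Monthly bill in dollars for a committed rate at a flat per-megabit price."""
    return commit_mbps * usd_per_mbps


PRICE = 15  # dollars per megabit per month, the figure used in the comment

gigabit_month = monthly_cost(1000, PRICE)   # one full gigabit link, per month
gigabit_year = gigabit_month * 12           # one full gigabit link, per year
two_links_year = 2 * gigabit_year           # both links fully committed
commit_300_month = monthly_cost(300, PRICE) # a 300Mbps minimum commit, per month

if __name__ == "__main__":
    print(gigabit_month, gigabit_year, two_links_year, commit_300_month)
```

Under those assumptions even two fully committed gigabit links come to well under the $1 million per year mentioned in the article, though the sketch says nothing about support, redundancy, or colocation costs.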
You could buy those cheap ones, but do they offer much support, and are they trustworthy? Is the link up 24/7/365 with no packet loss, and do they offer 24/7/365 physical access to the servers?
Wonder what filesystem!
Any idea about the underlying filesystem? I guess it could be ext3.
I guarantee it is, what else would you expect?! reiser/xfs/jfs?! feh! Let's be real, they aren't stupid. And he mentioned RedHat, in that solution there is only one option. I really doubt they went through the trouble to format the disks to something else.
huh.
Read the comment below. Other than an educated guess, yours is funny. :)
ext3? guess it is
I'm afraid it's really ext3 and that has had *more* than its fair share in "high triples" of LA.
We're running a free software mirror in Ukraine (ftp.linux.kiev.ua, local IX only) and you know what? Two new and shiny identical ATA spindles (Barracuda 120Gb) were added and mkfs'ed using ext3 and xfs (Linux 2.4), and one beautiful day...
...ext3 literally killed off the performance under conditions as simple as a few DSL downloads, a bunch of modem downloads and ~50Mbps inbound (ISO set coming in).
Just out of curiosity, I moved the whole thing to xfs (the disks were mostly free then). It Just Worked, no LA. And ACLs came in handy later too. Got the ext3 partition copied and wiped over with mkfs.xfs.
One of the ext3 developers I know told me it was due to delayed allocation, which is missing in ext3 and implemented in xfs, and I'm sure part of it was inefficient directory storage in ext3/2.4 (better in 2.6.x; I don't remember when indexing was introduced).
The only trouble with that was power outages last fall -- when UPSes eventually went down too. xfs is quite harsh with suspicious files during recovery zeroing them out due to security concerns...
Maybe someone knowing Peter suggests to give xfs a test drive? (although for virtually in-memory file archive it's not that important)
ext3
I emailed H. Peter Anvin about the file system used. The answer is: ext3, as many guessed.
Additionally, I have confirmed the hard discs in the machines. As you know, each node is a DL585 with 2xMSA-30 hard drive enclosures.
DL585: 4x300GB
MSA-30: 14x300GB
MSA-30: 14x300GB
Giving a total of 32x300GB disks, for a total storage of 9600GB. That's rounded to 10TB for ease of consumption.
Oh, and for the poster below who complains of ext3's poor performance; it sounds like you're trying to do heavy I/O on a pair of desktop ATA drives. Rest assured that the performance on a 28 disk RAID array composed of 10k RPM SCSI drives is significantly better. It's pretty easy for a modern machine to be significantly faster than the disks, when you only have one or two. (though your comment that XFS was much faster kinda makes what I'm saying irrelevant)
Making history one server at a time
Hi there,
Thanks for this great article. These days finding good reading is getting harder, and I enjoyed this piece a lot.
It's great to see what's under the hood of these huge, history-making movements and technology, and the hardware they use to keep up with demand is impressive. Come on! 24 gigs of RAM!
Polo