The Linux Kernel Archives provides an assortment of methods for obtaining the Linux kernel source code. In an earlier article we spoke with H. Peter Anvin, who has maintained kernel.org since its inception in 1997. In the beginning it ran on a generic PC connected to the Internet through a shared T1 housed at Transmeta Corporation. Since those early days it has been upgraded several times, reaching its current configuration: two HP-donated ProLiant DL585 servers, each a 4-way dual-core Opteron machine with 24 gigabytes of RAM and 10 terabytes of disk space. Both servers have full gigabit connections to the Internet donated by Internet Systems Consortium, Inc.
At the time our earlier article was published, the new hardware hadn't yet seen much of a stress test. Peter had noted that with the release of Red Hat's Fedora Core 4 he expected to see the Internet links saturated. However, when the release finally came last week, link saturation didn't quite happen. Instead, Peter noted, "we peaked at 1600 Mbit/s for less than 12 hours or so." He suggested, "I think that was in part due to the fact that FC4 had leaked before release." He did rule out flaws in the network infrastructure as the reason they didn't reach a full 2 gigabits in download rate, saying, "we got quite a few reports saying that downloading from kernel.org was a lot faster than BitTorrent, so I'm quite sure it was *not* due to upstream bottlenecks."
I asked Peter how the new servers are working, to which he replied, "they've performed beautifully." However, he did note that the introduction of git as a source control system that's housed on the same servers has been a little problematic. "This changed our usage pattern fundamentally," Peter explained. "git is extremely file (inode) intensive; in fact, the total number of files on kernel.org (excluding mirrors) has septupled since April."
While the public portion of the Linux Kernel Archives is served by the hardware described above, there is a third server: a private master server to which data is originally written and from which it is then pushed out to the public servers. The sevenfold increase in the number of files being archived has caused two problems in the interaction between the master server and the public servers.
The first problem involves rsync, which is used to synchronize the two public servers with the master server. Peter explains, "simply using 'rsync' to synchronize takes too long, just because the file list to compare is hundreds of megabytes long." To solve this, Peter says, "we're working on a stateful sync program." Nathan Laredo is currently working on the replacement for rsync, which they hope to have ready as soon as next week.
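The core idea of a stateful sync is to remember what was transferred last time, so each run only examines files whose metadata has changed rather than rebuilding and comparing an enormous file list from scratch. As a rough illustration only, and not the actual kernel.org tool, the approach might be sketched like this in Python (all names here are hypothetical):

```python
import json
import os
import shutil


def stateful_sync(src, dst, state_path):
    """Copy from src to dst only the files whose (size, mtime) signature
    changed since the last run, recording signatures in a state file.
    Illustrative sketch only -- a real tool would also handle deletions,
    permissions, symlinks, and concurrent writers."""
    try:
        with open(state_path) as f:
            state = json.load(f)          # signatures from the previous run
    except FileNotFoundError:
        state = {}                        # first run: everything is "new"

    new_state = {}
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, src)
            st = os.stat(path)
            sig = [st.st_size, st.st_mtime]
            new_state[rel] = sig
            if state.get(rel) != sig:     # changed or never seen before
                target = os.path.join(dst, rel)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(path, target)
                copied.append(rel)

    with open(state_path, "w") as f:
        json.dump(new_state, f)           # remember signatures for next run
    return copied
```

The saved state is what plain rsync lacks: rsync must walk and compare both trees on every invocation, which is exactly the "hundreds of megabytes" of file-list work Peter describes.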
The second problem has to do with a hardware limitation: "master.kernel.org is still an i386 machine," Peter explained. "It's constantly hurting for lowmem since the dentry and inode caches can only live in lowmem." The solution to this problem is simpler: "we need a new master.kernel.org with a 64-bit CPU," Peter stated. Efforts are currently underway to obtain the new 64-bit hardware.
The introduction of git has changed the usage pattern of the Linux Kernel Archives; the new hardware, however, has performed very well. Peter stressed, "the new servers (zeus1 and zeus2) have not been the problem at all; they've performed beautifully. If we hadn't had those machines we would have been in much worse shape." Once the backend master server is upgraded to 64-bit hardware and rsync is replaced with more intelligent software, things should be running smoothly again. As it is, most users of the kernel archives aren't even aware that the system is being stressed, as kernel downloads are as fast as ever.
Learn more about the Linux Kernel Archives in our earlier feature, "The Linux Kernel Archives".