
Scaling Problems

October 24, 2007 - 10:54am
Submitted by Jeremy on October 24, 2007 - 10:54am.
KernelTrap

The past few days have been a little rough. To begin, all KernelTrap configuration files were moved into a configuration management system, but in the process many of them were inadvertently modified. These issues were fixed as they were noticed.

Then, on Monday, I rolled out the new Quotes feature, and the sudden surge of traffic to the mail archives was more than our server could handle. Because web and database traffic currently share the same server, MySQL is forced to run with minimal RAM, and the extra processing required by the mail archives stressed it to the breaking point.

We experimented with some MySQL tuning, aiming to give MySQL as much RAM as possible without causing the server to swap. This helps, but with so little RAM available it can only do so much.
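For a rough sense of what "as much RAM as possible without swapping" means, a commonly used back-of-the-envelope estimate adds MySQL's global buffers to its per-connection buffers multiplied by max_connections. The variable names below are real MySQL server settings (query_cache_size existed in MySQL 5.x but was removed in 8.0), but every value, including the RAM left over for MySQL, is a hypothetical placeholder and not KernelTrap's actual configuration. The buffer list is not exhaustive.

```python
from typing import Dict

MB = 1024 * 1024

def mysql_worst_case_bytes(global_buffers: Dict[str, int],
                           per_thread_buffers: Dict[str, int],
                           max_connections: int) -> int:
    """Global buffers are allocated once; per-thread buffers may be
    allocated by every connection at the same time, so in the worst
    case they scale with max_connections."""
    return sum(global_buffers.values()) + max_connections * sum(per_thread_buffers.values())

# Hypothetical values for illustration, not KernelTrap's settings.
global_buffers = {
    "key_buffer_size": 64 * MB,
    "innodb_buffer_pool_size": 256 * MB,
    "query_cache_size": 32 * MB,  # removed in MySQL 8.0
}
per_thread_buffers = {
    "sort_buffer_size": 2 * MB,
    "read_buffer_size": 1 * MB,
    "read_rnd_buffer_size": 1 * MB,
    "join_buffer_size": 1 * MB,
}
max_connections = 50
ram_for_mysql = 1024 * MB  # hypothetical RAM left after the web server's share

total = mysql_worst_case_bytes(global_buffers, per_thread_buffers, max_connections)
print(f"worst case: {total // MB} MB of {ram_for_mysql // MB} MB")  # worst case: 602 MB of 1024 MB
print("fits" if total <= ram_for_mysql else "may swap under full load")  # fits
```

If the estimate exceeds the available RAM, either the global buffers or max_connections has to come down, which is exactly the squeeze you get when the web server and database share one machine.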

I then quickly wrote a caching layer for the mail archives. Many of the queries are very expensive, but they shouldn't have to be performed repeatedly (especially for older threads that aren't changing). I deployed a first draft of the caching code this morning, and it noticeably reduced the load. Of course, it's still a little rough around the edges; in particular, I need to work on cache expiration. One step at a time.
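A minimal sketch of this kind of caching layer, assuming rendered thread pages are kept in memory and keyed by a thread ID. The class name, the TTL values, and the idea of picking the expiration time from the age of the thread's last post are illustrative assumptions, not the actual KernelTrap code. Explicit invalidation when a new message arrives is one way to keep long-lived entries for old threads from going stale.

```python
import time
from typing import Callable, Dict, Tuple

class ThreadPageCache:
    """Hypothetical in-memory cache for rendered mail-archive thread pages.

    Pages of recently active threads expire quickly; pages of quiet, older
    threads are kept much longer because their content rarely changes.
    """

    def __init__(self, recent_ttl: float = 300.0, old_ttl: float = 86400.0,
                 old_after: float = 7 * 86400.0,
                 clock: Callable[[], float] = time.time):
        self.recent_ttl = recent_ttl  # seconds to keep pages of active threads
        self.old_ttl = old_ttl        # seconds to keep pages of quiet threads
        self.old_after = old_after    # a thread counts as "old" after this long without new posts
        self.clock = clock
        self._entries: Dict[int, Tuple[float, str]] = {}  # thread_id -> (expires_at, html)

    def get(self, thread_id: int, last_post_time: float,
            render: Callable[[int], str]) -> str:
        """Return cached HTML, calling render() (the expensive queries) only on a miss."""
        now = self.clock()
        entry = self._entries.get(thread_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        html = render(thread_id)
        age = now - last_post_time
        ttl = self.old_ttl if age >= self.old_after else self.recent_ttl
        self._entries[thread_id] = (now + ttl, html)
        return html

    def invalidate(self, thread_id: int) -> None:
        """Drop a thread's cached page, e.g. when a new message arrives in it."""
        self._entries.pop(thread_id, None)


# Usage with a fake clock so the example runs instantly.
now = [1_000_000.0]
render_calls = []

def render(thread_id: int) -> str:
    render_calls.append(thread_id)  # stands in for the expensive database queries
    return f"<html>thread {thread_id}</html>"

thread_cache = ThreadPageCache(clock=lambda: now[0])
month_old = now[0] - 30 * 86400
thread_cache.get(1, month_old, render)  # miss: runs the queries
now[0] += 3600
thread_cache.get(1, month_old, render)  # hit: old thread, cached for a day
thread_cache.invalidate(1)
thread_cache.get(1, month_old, render)  # miss again after invalidation
print(render_calls)  # [1, 1]
```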

Finally, I temporarily disabled the most expensive parts of the mail archives: searching by subject and by From address. I will re-enable them once my caching layer has been updated to cache these queries too.
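Caching search results needs a key that identifies the query. A hedged sketch, assuming searches take a list name, a subject string, a From address, and a page number (all hypothetical parameters), and assuming the search is case-insensitive so that normalizing case and whitespace is safe:

```python
import hashlib
import json

def search_cache_key(list_name: str, subject: str = "", sender: str = "",
                     page: int = 1) -> str:
    """Build a deterministic cache key for a (hypothetical) archive search.

    Case and runs of whitespace are normalized so equivalent searches,
    e.g. "Linux  Kernel" and "linux kernel", share one cached result.
    """
    params = {
        "list": list_name,
        "subject": " ".join(subject.split()).lower(),
        "from": sender.strip().lower(),
        "page": page,
    }
    # sort_keys=True makes the serialized form independent of key order.
    blob = json.dumps(params, sort_keys=True)
    return "search:" + hashlib.sha256(blob.encode("utf-8")).hexdigest()

key = search_cache_key("linux-kernel", subject="Linux  Kernel")
print(key == search_cache_key("linux-kernel", subject="linux kernel"))  # True
```

Hashing keeps keys short and uniform no matter how long the search terms are, and the prefix keeps search entries distinguishable from cached thread pages.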

Unfortunately, this alone hasn't been enough, and we're still seeing some big hiccups. I'm continuing to dig in, trying to isolate what has changed and what is causing these continued failures. Oregon State University's Open Source Lab hosts the KernelTrap server, and they have also been quite helpful in this effort.

Sorry for the continued problems, but expect things to return to normal soon.

At one point I saw several

October 24, 2007 - 3:21pm

At one point I saw several spam messages that had made it through your spam filters, so I assumed the poor performance was caused by spam bots attacking KernelTrap.

Could this explain the high hit rate on the archives? If they really are spam bots, wouldn't blocking them from accessing the archives be a good enough solution?

Or can you rule this out as a cause?

lots adds up

October 24, 2007 - 3:39pm

KernelTrap has been under a more or less constant Distributed Comment Spam Attack for the past 2 or 3 years. They're actually relatively predictable, and I've been meaning to apply what I've learned to rewriting my spam filters to greatly reduce the number that slip through. Certainly this contributes to the load we're seeing, but in and of itself it's nothing unusual.

There are also several search engine spiders indexing our mail archives, and that causes overhead as well. However, we _want_ the search engines to index the mail archives, so obviously we're not doing anything to block them.

OSU's Open Source Lab has already allocated additional resources for KernelTrap. Tonight at 7:30pm EST we will have ~2 hours of downtime while we migrate to a new database server. That should resolve the immediate performance issues, and allow us to continue to grow.

I was just making the point

October 24, 2007 - 7:05pm

I was just making the point that the reason we're seeing problems now could be that these spam bots are also crawling mailing list archives in their futile attempt to find spammable forms. They simply didn't cause problems earlier because the number of pages on the site was small enough.

Search engines are usually less of a problem because their crawlers deliberately limit the number of requests per second they send to a single website.
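The kind of per-host throttling well-behaved crawlers apply can be sketched as a minimum delay between consecutive requests to the same host. This is an illustration under assumed parameters (a fixed delay, hosts given as plain strings); real crawlers use their own, often adaptive, policies. The clock and sleep functions are injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, Dict

class PoliteFetchScheduler:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_delay = min_delay  # seconds between hits on one host
        self.clock = clock
        self.sleep = sleep
        self._last_request: Dict[str, float] = {}  # host -> time of last request

    def wait_for(self, host: str) -> float:
        """Sleep until `host` may be requested again; return the time slept."""
        now = self.clock()
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, last + self.min_delay - now)
            if waited > 0:
                self.sleep(waited)
        self._last_request[host] = self.clock()
        return waited


# Usage with a fake clock: sleeping just advances fake time.
t = [0.0]

def fake_sleep(seconds: float) -> None:
    t[0] += seconds

sched = PoliteFetchScheduler(min_delay=1.0, clock=lambda: t[0], sleep=fake_sleep)
print(sched.wait_for("kerneltrap.org"))  # 0.0
print(sched.wait_for("kerneltrap.org"))  # 1.0
print(sched.wait_for("example.org"))     # 0.0
```

A spam bot has no reason to include anything like this, which is why a crawl by bots can hurt far more than a crawl by a search engine.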

nothing abnormal

October 24, 2007 - 8:18pm

I understand your suggestion, but reviewing the logs I haven't seen anything abnormal. Of course, one of the difficulties with blocking or tracing comment spam is that it's posted automatically by a distributed network: it doesn't come from a single IP or IP range, but is spread randomly among who knows how many zombied Windows computers all over the Internet. I do see repeat offenders, and my filters automatically block them, but that only goes so far.
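A minimal sketch of why repeat-offender blocking only goes so far, assuming a hypothetical filter that blocks an IP once it has submitted a fixed number of spam comments. The threshold and the addresses (taken from the reserved documentation ranges) are made up for illustration; the real KernelTrap filters aren't described in this post.

```python
from collections import defaultdict
from typing import DefaultDict, Set

class RepeatOffenderFilter:
    """Block an IP once it has submitted `threshold` comments flagged as spam."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._spam_counts: DefaultDict[str, int] = defaultdict(int)
        self.blocked: Set[str] = set()

    def record_spam(self, ip: str) -> None:
        self._spam_counts[ip] += 1
        if self._spam_counts[ip] >= self.threshold:
            self.blocked.add(ip)

    def is_blocked(self, ip: str) -> bool:
        return ip in self.blocked


f = RepeatOffenderFilter(threshold=3)
for _ in range(5):
    f.record_spam("192.0.2.7")        # one noisy source
for i in range(1, 101):
    f.record_spam(f"198.51.100.{i}")  # 100 zombies, one comment each
print(f.is_blocked("192.0.2.7"))      # True
print(sum(f.is_blocked(f"198.51.100.{i}") for i in range(1, 101)))  # 0
```

The single noisy source is caught, but a hundred machines posting once each never cross the threshold, so the same volume of spam gets through.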

In any case, the new server is now active, and the web pages feel much snappier. Hopefully this will be the last of our downtime for a long time to come.

Ok, I see. I'm still

October 25, 2007 - 9:47am
Anonymous (not verified)

Ok, I see.

I'm still experiencing occasional MySQL connect errors, but they seem to disappear much quicker now (refresh in a couple of seconds and it works again).

server issues

October 25, 2007 - 11:31am

Yeah, the database server experienced some issues this morning. There was a two-hour window during which you likely saw these errors. Things are good now, and the server is being examined to understand what went wrong and to prevent it from happening again.

Code rolled back?

October 25, 2007 - 4:57am

It looks like the code for displaying the mailing list archives has been rolled back to some earlier state. The "quoted text" expansion links sometimes don't work, and a recent feature you added is no longer there (the <link rel=next> links for easier navigation). Or maybe it's a side-effect of the caching or the database migration? Just a heads up.

caching

October 25, 2007 - 6:40am

This is a side-effect of the caching. It's on my todo list to get this working.

I was aware of the quoted-text issue, but not the link rel issue. Thanks, I hope to have time to resolve this today.

quotes fixed

October 25, 2007 - 7:06am

The quote issue was a simple fix, and it has been applied.

The link rel issue is more involved, I'm working on a fix now.

link rel fixed

October 25, 2007 - 9:50am

I've applied the link rel fix; however, it will take a while to propagate to pages that were already cached. For new (previously unviewed) content, it works now.
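Pages cached before a fix keep the old markup until their entries expire or are rebuilt. One common technique for making a rendering fix take effect immediately, shown here as a hypothetical sketch rather than what KernelTrap does, is to tag each cache entry with a render version and treat entries from an older version as misses:

```python
from typing import Callable, Dict, Tuple

class VersionedPageCache:
    """Cache whose entries are valid only for the render version that produced them."""

    def __init__(self, version: int = 1):
        self.version = version  # bump whenever the page-rendering code changes
        self._entries: Dict[str, Tuple[int, str]] = {}  # key -> (version, html)

    def get(self, key: str, render: Callable[[str], str]) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[0] == self.version:
            return entry[1]
        html = render(key)  # miss, or entry rendered by older code
        self._entries[key] = (self.version, html)
        return html


def old_render(key: str) -> str:
    return f"<page {key}>"

def new_render(key: str) -> str:
    return f'<link rel="next" href="#"><page {key}>'

page_cache = VersionedPageCache(version=1)
page_cache.get("thread-42", old_render)          # rendered and cached by the old code
stale = page_cache.get("thread-42", new_render)  # still the old page: served from cache
page_cache.version += 1                          # deploy the fix and bump the version
fresh = page_cache.get("thread-42", new_render)  # re-rendered with the fixed code
print(stale)  # <page thread-42>
print(fresh)  # <link rel="next" href="#"><page thread-42>
```

The trade-off is that bumping the version discards every cached page at once, briefly sending all traffic back to the expensive queries; letting entries age out spreads that cost over time.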

Re: link rel fixed

October 25, 2007 - 11:25am

Oh wow, thanks a lot!

Man, I walk away from the computer for a couple hours, and when I come back suddenly everything works... amazing.
