Ruby on Rails: December 2005 Archives

So Tired, or Late Night Database Hacking


I was up until 2:00 am last night working on the database. I wanted to wait until the regular 1:00 am full backup completed so there would be something to go back to if I really screwed something up badly.

Using my training in the scientific method that I gained as a physics student at one of the top research universities in the country, I decided that my strategy would be to blindly poke at things and see if anything happened. And I would have to poke hard. If a little is a good thing then surely a lot must be even better. I made some adjustments to three or four settings and restarted the database. Things seemed to be working better than ever. Of course, it's hard to judge because the site always performs well under the light night traffic.

I went to sleep confident that I had accomplished something good. The sound of the pager at 4 am suggested otherwise. With a strange calmness I was able to form a bit of a theory as to why the database was freaking out and back out those changes. The clues were there earlier but at that time I wasn't aware of what they meant.

I was perhaps a little more conservative than necessary, but I was able to leave in a change that I think will actually make the biggest difference. And with that, I've consumed pretty much all of the collective knowledge I could glean from the internet for tuning MySQL databases.

The only thing left that I think would make a difference is to sacrifice some up-to-the-second synchronization guarantees for reduced disk access. Dr. Brain may disagree, but I'm not convinced that an fsync() after each and every transaction is really necessary. But even this will become less of an issue when our new database server with its faster striped disks comes online.
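To make the trade-off concrete, here is a minimal Ruby sketch (not MySQL's implementation) that writes the same records to a temporary file twice: once with an fsync after every record, and once with an fsync after every batch. The helper names `write_records` and `timed_run` and the record format are made up for illustration. In MySQL with InnoDB this kind of choice is usually governed by `innodb_flush_log_at_trx_commit`; whether that is the setting in question here is an assumption.

```ruby
require 'tempfile'

# Write `count` small records to `io`, calling fsync after every
# `sync_every` records. Returns the number of fsync calls made.
def write_records(io, count, sync_every)
  syncs = 0
  count.times do |i|
    io.write("txn #{i}\n")
    if ((i + 1) % sync_every).zero?
      io.fsync
      syncs += 1
    end
  end
  # One final fsync covers records left over after the last full batch.
  unless (count % sync_every).zero?
    io.fsync
    syncs += 1
  end
  syncs
end

# Run write_records against a throwaway temp file.
# Returns [fsync_count, elapsed_seconds].
def timed_run(count, sync_every)
  Tempfile.create('fsync-demo') do |file|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    syncs = write_records(file, count, sync_every)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [syncs, elapsed]
  end
end

[1, 50].each do |sync_every|
  syncs, elapsed = timed_run(100, sync_every)
  puts format('fsync every %2d records: %3d fsyncs, %.1f ms',
              sync_every, syncs, elapsed * 1000)
end
```

The price of batching is the window between syncs: records written since the last fsync can be lost if the machine goes down, which is exactly the guarantee being given up.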

With things working once again, I headed back to bed, only to wake up too soon to the ringing of the alarm clock. I pressed snooze for as long as I could, but I needed to be awake and ready to meet Andrej for breakfast this morning. And so I'm not sure whether I will have enough energy to carry me through the new year or whether I will just crash sometime later this evening.

Memcache Mysteries


A little more digging into the memcache code revealed some interesting details. It looks like the root of the problem lies in the socket options the server sets. To get maximum network performance, the server tries to disable the default Nagle packet buffering algorithm. On systems that support the TCP_NOPUSH socket option, the server brackets its network writes within a no-push section and then lets the operating system send back the result as soon as all the data has been written. If the system doesn't support TCP_NOPUSH, memcache instead falls back to TCP_NODELAY.
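The selection logic described above can be sketched in Ruby roughly as follows. This is an illustration of the idea, not the memcache server's actual C code; the helper names `preferred_tcp_option`, `set_tcp_flag` and `tcp_flag_set?` are invented for the example. Ruby only defines `Socket::TCP_NOPUSH` on platforms that have the option, so checking for the constant stands in for the server's compile-time check.

```ruby
require 'socket'

# Name of the option a server following the strategy above would pick
# on this platform: TCP_NOPUSH where it exists, otherwise TCP_NODELAY.
def preferred_tcp_option
  Socket.const_defined?(:TCP_NOPUSH) ? :TCP_NOPUSH : :TCP_NODELAY
end

# Turn a TCP-level boolean socket option on or off.
def set_tcp_flag(sock, name, enabled)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(name), enabled ? 1 : 0)
end

# Read a TCP-level boolean socket option back from the kernel.
def tcp_flag_set?(sock, name)
  sock.getsockopt(Socket::IPPROTO_TCP, Socket.const_get(name)).int != 0
end

sock = Socket.new(:INET, :STREAM)
puts "this platform would use #{preferred_tcp_option}"

# Forcing TCP_NODELAY disables Nagle buffering on this socket regardless
# of whether TCP_NOPUSH is available.
set_tcp_flag(sock, :TCP_NODELAY, true)
puts "TCP_NODELAY enabled: #{tcp_flag_set?(sock, :TCP_NODELAY)}"
sock.close
```

With TCP_NOPUSH the flag would be set before a group of writes and cleared afterwards; with TCP_NODELAY it is set once and each write goes out as soon as possible.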

It looks like FreeBSD supports the TCP_NOPUSH option but it doesn't seem to work exactly the way you would want it to. Reading up on the newsgroups, it looks like there have been some proposed kernel patches to bring FreeBSD's handling more in line with what is found on Linux. I didn't really want to mess with the kernel, so I simply recompiled the memcache server to use TCP_NODELAY.

Initial testing looks good. The response that used to take 100 milliseconds is now processed by the Ruby client in just over one millisecond, which is definitely much better. I'll let the new server run on our staging machines for a while before trying to push it out to the live site.

Ruby & Memcache


My natural distrust of the Ruby programming language might have caused me to miss something important. On 43 Things we rely heavily on memcache to offload database reads. After a bit of work simplifying and tuning the networking code, we were able to get very good response times for data lookups. Occasionally, however, the timings drifted from their usual submillisecond range to close to 100 milliseconds. I just assumed that either the Ruby networking code occasionally freaked out or that we were caching complicated data structures that could take a while to parse.

Recently I've been working on some alternative algorithms to help solve some of the performance problems we've been seeing on a few of the more intensive pages. These solutions make extensive use of caching. While a single 100 ms lookup on a page might slip through without much notice, a handful of them will easily kill page serve times.

I dug into the code a little deeper and added some logging statements around cache access. Data marshalling costs of even the most complicated structures could account for no more than about 2 ms of the 100 ms. Something was wrong. Then I noticed that while most of the entries we read and write are fairly small--often only several hundred bytes--the one entry that seemed to perform consistently poorly was larger, at 22K. Now 22K isn't that big, but it was a clue.
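The kind of logging described above might look something like this sketch, which times the fetch and the unmarshalling separately and records the payload size. `FakeCache` is a hypothetical in-memory stand-in for a real memcache client, and `timed_get` is an invented helper, not part of any memcache library.

```ruby
# Hypothetical stand-in for a memcache client: keeps marshalled
# strings in a Hash so the example runs without a server.
class FakeCache
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = Marshal.dump(value)
  end

  # Returns the raw marshalled bytes, or nil on a miss.
  def get_raw(key)
    @store[key]
  end
end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Fetch a key and report how long the fetch and the unmarshal each took.
def timed_get(cache, key)
  t0 = monotonic_now
  raw = cache.get_raw(key)
  t1 = monotonic_now
  value = raw.nil? ? nil : Marshal.load(raw)
  t2 = monotonic_now
  {
    value: value,
    bytes: raw ? raw.bytesize : 0,
    fetch_ms: (t1 - t0) * 1000.0,
    unmarshal_ms: (t2 - t1) * 1000.0
  }
end

cache = FakeCache.new
cache.set('goal:1', { 'name' => 'run a marathon', 'tags' => %w[health fitness] })
result = timed_get(cache, 'goal:1')
puts format('%d bytes, fetch %.3f ms, unmarshal %.3f ms',
            result[:bytes], result[:fetch_ms], result[:unmarshal_ms])
```

Splitting the two timings is what makes it possible to rule out marshalling as the cause, and logging the byte count is what surfaces a size-related pattern.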

Last night I had downloaded the C language libmemcache client to think about whether it might make sense to ditch Ruby for a few resource intensive computations. The unit tests in the package include a benchmark app that can repeatedly make requests of a certain size to the cache server. With some trial and error I found that reading entries sized 14,304 bytes completed in about 130 microseconds, while reading entries 14,305 bytes or larger required 100 milliseconds. This is pure, untainted, wonderful C code so there's no way I can blame Ruby for these strange results.
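The trial and error above can be automated with a bisection over entry size, assuming latency behaves like a step function (fast below some size, slow at and above it). The sketch below is in Ruby rather than C and does not talk to a real server: `simulated_latency` is a made-up stand-in that simply replays the figures quoted above, and `find_slow_threshold` is an invented helper name.

```ruby
# Find the smallest size in (low, high] whose measured latency exceeds
# cutoff_seconds. Assumes `low` is fast, `high` is slow, and latency
# only steps up once in between. The block returns seconds for a size.
def find_slow_threshold(low, high, cutoff_seconds)
  while high - low > 1
    mid = (low + high) / 2
    if yield(mid) > cutoff_seconds
      high = mid # mid is already slow, so the threshold is at or below it
    else
      low = mid  # mid is still fast, so the threshold is above it
    end
  end
  high
end

# Stand-in for a real benchmark call: replays the numbers from the post
# (about 130 microseconds up to 14,304 bytes, 100 ms beyond that).
def simulated_latency(size_in_bytes)
  size_in_bytes <= 14_304 ? 0.000130 : 0.100
end

threshold = find_slow_threshold(1, 65_536, 0.010) { |size| simulated_latency(size) }
puts "first slow size: #{threshold} bytes" # prints 14305 for the simulated data
```

Against a live server, the block would issue a real read of the given size and return the measured time, ideally averaged over several requests to smooth out noise.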

Something strange is afoot at the memcache server....

About this Archive

This page is an archive of entries in the Ruby on Rails category from December 2005.
