My natural distrust of the Ruby programming language might have caused me to miss something important. On 43 Things we rely heavily on memcache to offload database reads. After a bit of work simplifying and tuning the networking code, we were able to get very good response times for data lookups. Occasionally, however, the timings drifted from their usual submillisecond range to close to 100 milliseconds. I just assumed that either the Ruby networking code occasionally freaked out or that we were caching complicated data structures that could take a while to parse.
Recently I've been working on some alternative algorithms to help solve some of the performance problems we've been seeing on a few of the more intensive pages. These solutions make extensive use of caching. While a single 100 ms lookup on a page might slip through without much notice, a handful of them will easily kill page serve times.
I dug into the code a little deeper and added some additional logging statements around cache access. Data marshalling costs of even the most complicated structures could account for no more than about 2 ms of the 100 ms. Something was wrong. Then I noticed that while most of the entries we read and write are fairly small--often only several hundred bytes--one entry that seemed to be performing consistently poorly was a larger 22K. Now 22K isn't that big, but it was a clue.
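To make the marshalling measurement concrete, here is a minimal sketch of how the unmarshal step can be timed on its own, away from the network. The payload is a made-up stand-in (not our real cached data), and the variable names are hypothetical.

```ruby
# Hypothetical stand-in for a cached value: a few hundred small records.
payload = Array.new(500) { |i| { id: i, name: "goal-#{i}", tags: %w[a b c] } }

# Serialize once up front so only the load step is timed.
blob = Marshal.dump(payload)

# Time the unmarshal step with a monotonic clock.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
restored = Marshal.load(blob)
load_seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format('%d bytes, unmarshal took %.3f ms', blob.bytesize, load_seconds * 1000)
```

If the number printed here is a small fraction of the total lookup time, the rest is being spent somewhere other than parsing.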
Last night I downloaded the C language libmemcache client to think about whether it might make sense to ditch Ruby for a few resource intensive computations. The unit tests in the package include a benchmark app that can repeatedly make requests of a certain size to the cache server. With some trial and error I found that reading entries of up to 14,304 bytes completed in about 130 microseconds, while reading entries of 14,305 bytes or larger required 100 milliseconds. This is pure, untainted, wonderful C code so there's no way I can blame Ruby for these strange results.
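The trial and error can be turned into a bisection over entry sizes. The sketch below is not the libmemcache benchmark; it is a Ruby illustration in which `simulated_read_time` is a hypothetical stand-in that simply replays the timings reported above. A real run would replace it with code that times an actual read of the given size, and it assumes reads stay slow once they turn slow.

```ruby
# Simulated lookup time in seconds, replaying the figures reported above.
# ASSUMPTION: a real run would time an actual cache read of `size` bytes.
simulated_read_time = lambda do |size|
  size <= 14_304 ? 0.000130 : 0.100
end

# Find the smallest size whose read time exceeds `cutoff` seconds,
# assuming reads are fast at `low` and slow at `high`.
def first_slow_size(low, high, cutoff, timer)
  while high - low > 1
    mid = (low + high) / 2
    if timer.call(mid) > cutoff
      high = mid # mid is already slow, so the boundary is at or below mid
    else
      low = mid  # mid is still fast, so the boundary is above mid
    end
  end
  high
end

threshold = first_slow_size(1, 32_768, 0.010, simulated_read_time)
puts "first slow size: #{threshold} bytes"
```

Against a live server the timings are noisy, so each probe would probably need to be repeated a few times and averaged before comparing with the cutoff.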
Something strange is afoot at the memcache server....

you rock, bob. you can save the world by fixing memcached to scale to 15k entries without the 1000x slowdown. That's super significant.
I don't know shit about memcached but that sounds like the difference between a disk hit and a memory hit, except the whole point of memcached is to read from memory only, right, so, uh, unless there's swapping, but i wouldn't think swapping--which would be an os thing, based on environmental crap--would be pinpointable to a one-byte boundary like that.
so I know you would have already done this but are there any weird hard-coded buffers that are that size? i mean 14304 is 14k minus 32 bytes which sounds like a header block to me. So, uh, does it use 14k buffers, and something really dingy for overflow?
It's almost like I'm still a computer scientist!
I ran a strace on the memcache server and it looks like the write completed in under a millisecond. If it's not a problem on the server then does that mean it's a client issue? But we're getting the same bad behavior on both the Ruby client and the C client.
The 14K - 32 bytes does seem awfully suspicious. And so the plot thickens....
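For the record, the arithmetic behind that observation checks out. The 14 KB buffer is only the guess from the comment above, not something confirmed in the memcached source.

```ruby
buffer_guess = 14 * 1024 # 14 KB, the guessed buffer size (an assumption)
last_fast    = 14_304    # largest entry size that still read quickly

gap = buffer_guess - last_fast
puts "14 KB is #{buffer_guess} bytes, leaving a gap of #{gap} bytes"
```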