Memcached and Ruby - Lukasz Wrobel

There comes a long-awaited moment after launching a web application when the technical means used so far are no longer sufficient. Long story short, the number of pageviews grows each day and response times grow rapidly along with it. One of the first things to consider in such a case is introducing data caching.

Alongside the filesystem and local memory, memcached is one of the most widely used storage systems for caching data in web applications. I’ll try to show different ways of integrating memcached with web applications in order to gain a performance boost. The described techniques can be applied regardless of the framework (Ruby on Rails/Merb/Sinatra) and model layer library (ActiveRecord/DataMapper/Sequel) being used. I assume that the reader has at least a basic understanding of what memcached is and how it can be used to store and retrieve data.

The Quick and Dirty Way

So you have fully functional but somewhat slow data retrieval methods? Try decorating them with cache logic and see what happens:

#
# Get all blog posts.
#
def get_posts
  cached = memcached.get('posts')
  return cached if cached

  posts = Post.all
  memcached.set('posts', posts)

  posts
end

Here’s a somewhat more civilized example:

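def get_posts
  memcached.get('posts')
rescue Memcached::NotFound
  # Cache miss (this client raises instead of returning nil):
  # load the posts from the database and populate the cache.
  posts = Post.all
  memcached.set('posts', posts)
  posts
end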

What’s the first impression like? Not too good, I hope. To be honest, the quick and dirty approach has only one advantage: it can be implemented quickly.

Let’s move on to the disadvantages. First, writing code in this manner causes cache logic to be spread across the whole application. Moreover, it tempts you to apply caching in an implementation-bound manner. What happens if cache keys (potentially repeated in a few places) change? What if one memcached client is replaced with another? Also, in the case of manual cache invalidation, there will be even more cache-related logic spread across the application code.

As you can see, this way of caching data in Ruby (or in nearly any programming language) can be implemented quickly, but it mixes application tiers and can turn later changes to the application code into a nightmare.

Caching Method Calls - Memoization

Instead of mixing cache logic into your methods, try to wrap method calls with a cache handler. This shouldn’t be difficult, especially given some of Ruby’s nice features.

Let’s wrap the method call with a block like this:

posts = cached('posts') do
  get_posts()
end

It looks much cleaner than the previous example, but when do caching and retrieving data from the cache take place? Let’s try to implement the cached() method now:

def cached(cache_key)
  if result = memcached.get(cache_key)
    return result
  else
    result = yield
    memcached.set(cache_key, result)

    return result
  end
end

Laziness is a virtue, remember? The block of code associated with the cached() method call is run only when necessary, i.e. when there is no cached data for the given key. It means that the underlying code is called only once per expiry period (provided there was no cache eviction). Of course, this simple cached() implementation can be further extended to handle method parameters and to support various types of storage, including the filesystem and local memory. Wrapping expensive code is easy in Ruby, as it is in any other programming language that supports anonymous functions.
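One way such an extension might look is sketched below. It is only an illustration: MemoryStore is a hypothetical in-memory stand-in for a memcached client, and the key format (name and arguments joined with colons) is an assumption, not a convention of any particular library.

```ruby
# Hypothetical in-memory stand-in for a memcached client. Any object that
# responds to get(key) and set(key, value) could be passed in instead.
class MemoryStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

# Builds the cache key from a name plus the call arguments, so that calls
# with different arguments never share a cache entry.
def cached(store, name, *args)
  key = ([name] + args).join(':')
  hit = store.get(key)
  return hit unless hit.nil?

  result = yield(*args)
  store.set(key, result)
  result
end

store = MemoryStore.new
calls = 0

2.times do
  cached(store, 'posts_by_author', 42) do |author_id|
    calls += 1 # counts how often the expensive block really runs
    ["post by author #{author_id}"]
  end
end

puts calls
```

Because the store is passed in as an argument, swapping the storage type means changing a single object rather than every call site.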

By the way, storing the results of function calls for future reference is often referred to as memoization. It is closely related to dynamic programming, which makes it possible to quickly solve some computationally complex problems.

There are some Ruby gems available which perform memoization, yet they only store results in local memory. This is what we should avoid in a distributed web environment: a situation where each machine holds its own, possibly stale copy of the data and the same call is made on many machines over and over again is clearly unacceptable. However, these gems can be handy for optimizing Ruby programs running on a local machine.
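For a program running on a single machine, local memoization needs no gem at all. The following is a minimal sketch using a plain Hash; the fib function is just an illustrative example, not something taken from any of those gems.

```ruby
# Memoized Fibonacci: each value is computed once, stored in the Hash,
# and read back from it on every later request for the same n.
def fib(n, memo = {})
  return n if n < 2

  memo[n] ||= fib(n - 1, memo) + fib(n - 2, memo)
end

puts fib(50)
```

Without the memo Hash the same call would recompute the same subproblems an exponential number of times.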

Memoization wrapped with blocks is at least a little bit cleaner than the “quick and dirty” approach and it’s not strictly bound to implementation details. However, if you wish to invalidate cache data on demand and not rely on TTL only, you should take into account that this technique suffers from spreading invalidation logic to the same extent as the previous one.

Cache Handled Entirely by a Library

Sometimes we have a clever library at our disposal. This library is clever enough to wrap all data layer-related calls (incl. all CRUD operations) and claim full responsibility for storing results in cache and invalidating them when necessary.

A level of encapsulation like this can only be achieved when all calls go through the library:

post = Post.create(:title => 'My New Ruby Post', :content => 'Some useless things...', :date => Date.today)

When we want to modify or delete a post, the library should know about this, too:

post.update(:title => 'Much Better Title')

post.delete()

When writing code this way, we tell the library about everything that happens to the object. The library then knows when to store data in the cache (create()), modify it (update()) and, last but not least, delete it (delete()).

In this case our code is absolutely free from cache-related logic, though we must write the program with the invisible cache layer in mind.
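To make the idea more concrete, here is a minimal sketch of how such a library might keep the cache in sync with the data store. Everything in it is hypothetical: CachedPost is not a real library class, and two plain Hashes stand in for memcached and the database.

```ruby
class CachedPost
  @cache = {}    # stand-in for memcached
  @database = {} # stand-in for the real data store
  @next_id = 0

  class << self
    attr_reader :cache, :database

    def create(attributes)
      @next_id += 1
      new(@next_id, attributes).save
    end

    # Reads through the cache and falls back to the database on a miss.
    def find(id)
      attributes = cache[key(id)]
      if attributes.nil?
        attributes = database[id]
        return nil if attributes.nil?

        cache[key(id)] = attributes
      end
      new(id, attributes)
    end

    def key(id)
      "post:#{id}"
    end
  end

  attr_reader :id, :attributes

  def initialize(id, attributes)
    @id = id
    @attributes = attributes
  end

  def update(changes)
    @attributes = attributes.merge(changes)
    save
  end

  # Removes the record from the database and drops its cache entry.
  def delete
    self.class.database.delete(id)
    self.class.cache.delete(self.class.key(id))
  end

  # Writes to the database and refreshes the cache entry in one place.
  def save
    self.class.database[id] = attributes
    self.class.cache[self.class.key(id)] = attributes
    self
  end
end

post = CachedPost.create(title: 'My New Ruby Post')
post.update(title: 'Much Better Title')
puts CachedPost.find(post.id).attributes[:title]
post.delete
puts CachedPost.find(post.id).inspect
```

The calling code never mentions the cache; every write path goes through save or delete, which is the only place where cache entries are touched.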

Having a “clever library” has many advantages, though it is far from perfect. First, we have to get this kind of library somehow, and in my experience libraries like this are usually home-baked solutions, made either by a single programmer for their own purposes or by a specific company for internal use only. That is to say, the choice of such libraries is limited.

I think you’ll agree this is not a big limitation; it should not be difficult to write a library like this on top of ActiveRecord or DataMapper. But there’s another problem: sometimes application logic is too complicated to be fully understood by the library, which is supposed to be general-purpose. When cache invalidation doesn’t directly result from deleting an object, there has to be another way to mark cached data as stale. This brings us back to the same problem we’ve already seen: manual cache invalidation.

To sum up, having a smart cache-handling library is convenient, because it lets you forget about memcached or any other place where cached data is stored. The cache will be completely transparent as long as you don’t have to invalidate it manually.

View Cache

View cache is like memoization from a bird’s-eye view. You don’t care how many methods need to be called to display a specific part of the application output, or how complicated they are. You just cache it all and don’t go into details.

View cache seems to be the most general-purpose method, since it doesn’t depend on the language or framework used. Of course, a convenient framework like Merb or Ruby on Rails will let you implement view cache in a simple and elegant way, but this is not a must.

As usual, there is at least one drawback. The main point is that view cache leads to caching the same data multiple times, which possibly means an increased number of cache evictions. Also, some computations are performed many times even though their results are already sitting in memcached, because they are formatted in a different way and can’t be reused. The same thing happens when you provide an API for your application: it usually returns data formatted as XML or JSON, not HTML.

But there is a huge benefit of view cache (apart from easy implementation): this technique allows you to gain a significant performance boost, especially when you use page cache. Sounds interesting?

Page Cache

The following examples are (loosely) based on Ruby on Rails, but their equivalents can easily be found in other web frameworks, such as Merb and Sinatra. Take a look at this simple controller:

class BlogController < ActionController::Base

  def about_me
    @info = Me.get_info()
  end

  def posts
    ...
  end

end

It’s a simple blog controller, able to display posts and an “about me” page. Since we know that the “about me” page rarely changes, we can page-cache it:

class BlogController < ActionController::Base

  caches_page :about_me

  def about_me
    @info = Me.get_info()
  end

  def posts
    ...
  end

end

The code of the about_me action didn’t change at all, but its result is now page cached. What does that mean in practice? On the first request, this action generates HTML output which Rails stores in a file located in the public directory. You might have guessed what happens next: the web server treats /about_me as a static page and serves it directly from disk, without even launching the Rails framework. I hope I don’t have to convince you this is significantly faster than firing up any Ruby-based web framework.

By the way, page cache doesn’t have anything to do with memcached, but it’s just too important to leave out.
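The mechanism can be illustrated with a small sketch. It is not how Rails implements page caching: in a real setup the web server serves the stored file on its own, whereas here a single hypothetical serve_page function plays both roles, and a temporary directory stands in for the public directory.

```ruby
require 'tmpdir'

# Renders the page on the first request and stores it as a static file.
# Later requests read the file and never call the rendering block.
def serve_page(public_dir, path)
  file = File.join(public_dir, "#{path}.html")
  return File.read(file) if File.exist?(file)

  html = yield
  File.write(file, html)
  html
end

renders = 0
responses = []

Dir.mktmpdir do |public_dir|
  2.times do
    responses << serve_page(public_dir, 'about_me') do
      renders += 1 # counts how often the "framework" had to render
      '<h1>About me</h1>'
    end
  end
end

puts renders
```

Invalidating such a cache comes down to deleting the generated file, so that the next request renders the page again.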

Action Cache

Action cache is similar to page cache, though the HTML output is not saved directly on disk. Instead, it is kept in a cache store and served only when the before filters allow it:

class SecretController < ActionController::Base

  before_filter :check_access
  caches_action :classified

  def classified
    @data = XFiles.secret_data
  end

end

When a user wants to display the “classified” page, the check_access before filter is always run. If it says everything is OK, Rails looks into the cache store (e.g. memcached) for the output of the classified action and sends it back. It’s a little bit more flexible than the page cache, but it’s also slower, since every request still goes through the framework.
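The order of operations can be sketched in plain Ruby. This is an illustration under assumptions rather than the Rails implementation: handle_request, check_access and the ACTION_CACHE Hash (a stand-in for a memcached-backed cache store) are hypothetical names.

```ruby
ACTION_CACHE = {} # hypothetical stand-in for a memcached-backed cache store

def check_access(user)
  user[:admin]
end

def handle_request(user, action_name)
  # The filter runs on every request, whether the output is cached or not.
  return '403 Forbidden' unless check_access(user)

  # The action body (the block) runs only when the cache has no entry yet.
  ACTION_CACHE[action_name] ||= yield
end

renders = 0
render = lambda do
  renders += 1
  '<h1>Classified</h1>'
end

admin = { admin: true }
guest = { admin: false }

first  = handle_request(admin, 'secret/classified', &render)
second = handle_request(admin, 'secret/classified', &render)
denied = handle_request(guest, 'secret/classified', &render)

puts renders
puts denied
```

Even though the output is already cached by the time the guest asks for it, the filter still rejects the request, which is exactly what page cache cannot do.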

Fragment Cache

Fragment cache is also referred to as partial cache. It’s useful when only a fragment of the response (e.g. a list of active users displayed in a block), rather than the whole response, can be cached. Take a look at this piece of Embedded Ruby (ERB) code:

<h1>Dashboard</h1>

<p>Your name: <strong>Łukasz Wróbel</strong></p>

<% cache do %>
  <ul>
    <% User.active.each do |u| %>
      <li><%= u.name %></li>
    <% end %>
  </ul>
<% end %>

Memcached will do just fine for storing fragment cache. As you can see, fragment cache simply boils down to wrapping method calls in blocks in order to cache their results, which is what we’ve already done before.

All the view caching methods discussed so far have one common disadvantage: they require you to invalidate the cache manually, either by writing Rake tasks that delete the generated files (in the case of page cache) or by writing so-called cache sweepers. When storing data in memcached, though, you can set a Time To Live (TTL) and just wait for the cache to expire. Of course, you can do this only if you can tolerate some inconsistencies.
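How TTL-based expiry behaves can be sketched with a small in-memory store. TtlStore is a hypothetical class written for this illustration, not a memcached client, and its clock is injectable only so that the example does not have to sleep.

```ruby
# Hypothetical in-memory store that forgets entries after their TTL.
class TtlStore
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @data = {}
  end

  # ttl is the lifetime of the entry in seconds.
  def set(key, value, ttl)
    @data[key] = [value, @clock.call + ttl]
  end

  def get(key)
    value, expires_at = @data[key]
    return nil if expires_at.nil?

    if @clock.call >= expires_at
      @data.delete(key) # expired: behave as if it had never been stored
      return nil
    end
    value
  end
end

now = 0.0
store = TtlStore.new(-> { now })
store.set('posts', ['first post'], 60)

now = 59.0
fresh = store.get('posts')   # still within the TTL

now = 60.0
expired = store.get('posts') # TTL has elapsed

puts fresh.inspect
puts expired.inspect
```

Between a change in the underlying data and the moment the entry expires, readers keep seeing the old value; that window is the inconsistency you have to accept.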

Conclusion

As you can see, there are many techniques available for developers who want to speed up their web applications written in Ruby. In most cases one can go without memcached and just store data in the filesystem or local memory, but of these only memcached combines speed with the ability to work well in a distributed environment.

There are even more powerful solutions available, including reverse proxy cache implementations like Squid and Varnish, which can serve thousands of requests per second on a single machine. There are also similar, Ruby-based solutions that communicate via the Rack interface. However, all of these solutions operate in a much different way than the techniques described in this article, and they require a detailed analysis, including an introduction to some of the lesser-known properties of the HTTP protocol.

To sum up, whether you’re using Rails, Merb, Sinatra or any other framework, sooner or later you will encounter efficiency problems. I hope my article is a good place to start. In conclusion, I can only wish you performance problems with your web applications: it usually means you’re on the right track.