Memcached has long been the answer to most questions containing the word "scale".
There are some spectacular memcached installations out there. Facebook is said to run 200 servers with 3TB of memory
solely for servicing memcached; Shopify, Twitter, Digg, Slashdot and just about every other public-facing application
depend on it. Facebook’s installation is said to deliver a 99% cache hit rate while servicing tens of thousands of
requests a second.
There are many ways to use this elaborate hash table, and many of them are more trouble than they are worth.
In our experience the key to using memcached effectively is to ask it for the exact thing you want, but I’m getting ahead of
myself.
A common pattern for using memcached is the following:
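class Product < ActiveRecord::Base
  # Expire the cached copy whenever the record changes or is removed.
  after_save    { Cache.expire(key) }
  after_destroy { Cache.expire(key) }

  # Return the cached product, or load it from the database and cache it.
  # Cache is the application's own memcached wrapper.
  def self.load(id)
    Cache.get(key_for(id), self) || Cache.set(key_for(id), find(id))
  end

  def self.key_for(id)
    "#{table_name}/#{id}"
  end

  def key
    self.class.key_for(id)
  end
end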
The issue is that this model only caches on a per-object basis, while the real database load usually comes from loading
collections. Storing a collection in memcached is harder because you have to start tracking the objects in the collection somewhere
so that you can efficiently expire the collection once one of its items changes. And that way, he knew, lay madness.
In Shopify’s case, what we really need is to cache all the data required to render a given public URL.
Two requests to the same URL should always yield a cache hit, provided all input parameters are equal.
In code this could look something like this:
cache params.values.sort.to_s do
  # ... load all data ...
end
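The post does not show the `cache` helper itself. As a minimal sketch, assuming a get-or-compute helper and using an in-memory `FakeStore` (a hypothetical stand-in for a real memcached client), it might look like this:

```ruby
# In-memory stand-in for a memcached client (assumption for illustration only).
class FakeStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

STORE = FakeStore.new

# Hypothetical `cache` helper: return the stored value for `key`,
# or run the block once, store its result, and return that.
def cache(key)
  cached = STORE.get(key)
  return cached unless cached.nil?

  value = yield
  STORE.set(key, value)
  value
end

params = { "page" => "2", "collection" => "snowboards" }
key = params.values.sort.to_s

loads = 0
first  = cache(key) { loads += 1; "rendered data" }
second = cache(key) { loads += 1; "rendered data" }  # served from the store
```

Sorting the parameter values before building the key means two requests with the same inputs map to the same entry regardless of the order in which the parameters arrive.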
Of course you have to keep track of all the keys you store in memcached now. A database table will do nicely here.
class CacheKey < ActiveRecord::Base
  # Destroying a row also expires the matching memcached entry.
  after_destroy { Cache.expire(key) }
end

key = params.values.sort.to_s
cache key do
  # ... load all data ...
  CacheKey.create :key => key
end

CacheKey.destroy_all # Sweep cache
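To make the bookkeeping concrete, here is a small self-contained sketch of the same idea. The `TrackedCache` class is hypothetical; a hash stands in for memcached and an array stands in for the `CacheKey` table.

```ruby
# Sketch of key tracking: every stored key is recorded so a sweep can expire it.
class TrackedCache
  attr_reader :tracked_keys

  def initialize
    @store = {}          # stand-in for memcached
    @tracked_keys = []   # stand-in for the CacheKey table
  end

  # Return the stored value, or compute it, store it, and record its key.
  def fetch(key)
    return @store[key] if @store.key?(key)

    value = yield
    @store[key] = value
    @tracked_keys << key   # like CacheKey.create :key => key
    value
  end

  # Like CacheKey.destroy_all: expire each tracked key, then drop the records.
  def sweep
    @tracked_keys.each { |key| @store.delete(key) }
    @tracked_keys.clear
  end
end

cache = TrackedCache.new
cache.fetch("index") { "page v1" }
cache.fetch("products/1") { "board v1" }
cache.sweep
cache.fetch("index") { "page v2" }   # the block runs again after the sweep
```

Note that the sweep has to touch every tracked key, so its cost grows with the number of cached entries.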
So far so good.
This has been the traditional approach, and it has worked somewhat.
I’m here to offer a better solution, though:
Ask for the thing you need, and be specific. The complexity of the above solution comes from the simple fact
that we formulated our question to memcached too vaguely. Ask yourself what you really require from memcached and then ask it for exactly that.
Consider this: when a product is updated, all currently cached URLs should be invalidated because they are outdated. Shopify allows
designers to reference a product from any page in the system, so we have to run a full sweep.
Unless memcached is told that its caches are stale, it will continue to deliver the stale data
and customers will continue to see the old version of the product. A clear misunderstanding between Shopify and memcached.
The solution is simple:
At the beginning of each request we load a shop object, which we pick depending on the incoming host name.
We use the fact that we always load this shop model anyway and add versioning to it.
This version column is incremented every time we want to sweep all caches.
Now we add the version number to the cache keys:
# Convert the version to a string first; adding an Integer to a String raises a TypeError.
cache shop.version.to_s + params.values.sort.to_s do
  # ... load all data ...
end
This means that we will never get an outdated version from the caches, because we ask them for a very specific thing. After
the version number is increased in the database, all incoming requests will miss the caches but will be re-cached quickly.
Memcached automatically gets rid of the stale keys once space is needed: least recently used keys are discarded first,
so there is no need for manual cleanup.
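The effect of the version bump can be shown with a small sketch. Everything here is an assumption for illustration: `Shop` is a plain struct rather than a database model, `VersionedCache` is a hypothetical name, and a hash stands in for memcached.

```ruby
# Hypothetical shop record with a version counter (a database column in the post).
Shop = Struct.new(:host, :version) do
  # Bumping the version makes every previously built key unreachable.
  def sweep_caches!
    self.version += 1
  end
end

class VersionedCache
  def initialize(shop)
    @shop = shop
    @store = {}  # in-memory stand-in for memcached
  end

  # The shop version is part of every key.
  def key_for(params)
    "#{@shop.version}/#{params.values.sort}"
  end

  # Return the stored value for the current version, or compute and store it.
  def fetch(params)
    key = key_for(params)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

shop = Shop.new("shop.example", 1)
cache = VersionedCache.new(shop)
params = { "page" => "index" }

cache.fetch(params) { "old product page" }
shop.sweep_caches!                                   # e.g. after a product update
fresh = cache.fetch(params) { "new product page" }   # miss: the key now starts with 2
```

The old entry is never deleted explicitly in this sketch; it simply stops being asked for. In a real memcached it would eventually be evicted as the least recently used data.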
At Shopify we use this technique for page caching. We keep the rendered HTML, the HTTP status code and the Content-Type in
memcached, and use all the differentiating input variables, such as the contents of the shopping cart, as keys. We keep the HTML because
this saves our server cluster valuable bandwidth by avoiding loading and compiling the Liquid templates from the NFS server.
Requests for cached documents are now rendered in under 10ms.
To summarize, Shopify asks memcached politely to: “Hand over version 55 of the index HTML for www.snowdevil.com the way it would look with one Draft 151cm snowboard in the cart”.
This is a very specific question for which there is only one valid answer: the exact data we want. Stale data can never
be returned, because everything that would make it stale also increases the version number.
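As a rough sketch of how such a question could be turned into a page cache lookup: the names `CachedPage` and `PageCache` and the key layout below are assumptions, not Shopify's actual code, and a hash stands in for memcached. The raw key is hashed because memcached keys may not contain whitespace and are limited to 250 bytes.

```ruby
require "digest/md5"

# One cached page: everything needed to answer the request without rendering.
CachedPage = Struct.new(:status, :content_type, :body)

class PageCache
  def initialize
    @store = {}  # in-memory stand-in for memcached
  end

  # Combine every input that changes the output, then hash the result
  # so the key contains no whitespace and has a fixed length.
  def key_for(version, host, path, cart_items)
    raw = "#{version}:#{host}#{path}:#{cart_items.sort.join(',')}"
    Digest::MD5.hexdigest(raw)
  end

  # Return the cached page, or render it once via the block and store it.
  def fetch(version, host, path, cart_items)
    key = key_for(version, host, path, cart_items)
    @store[key] ||= yield
  end
end

pages = PageCache.new
page = pages.fetch(55, "www.snowdevil.com", "/index", ["Draft 151cm"]) do
  # Placeholder for the real template rendering.
  CachedPage.new(200, "text/html", "<html>...</html>")
end
```

A request with a different cart, or any request made after the version changes to 56, produces a different key and is rendered afresh.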
A quick remark: when you use memcached in Ruby, make absolutely sure that you use memcache-client, as it’s the fastest and most widely used Ruby implementation of the protocol.