Delayed Job (DJ)

Posted by tobi — 04:04 PM Feb 17

I finally got a GitHub invitation and used the opportunity to release another Shopify extraction.

Delayed::Job, or DJ, is an asynchronous priority queue that relies only on a simple database table. It doesn’t require you to run a dedicated server like many other systems do.

We use it for a lot of longer-running tasks in Shopify, such as sending newsletters, uploading files to S3, downloading images from URLs, indexing products in Solr and so on.

There are two ways to add jobs to the queue:

Jobs are simple Ruby objects with a method called perform. Any object that responds to perform can be stuffed into the jobs table. Job objects are serialized to YAML so that they can later be resurrected by the job runner.


  class NewsletterJob < Struct.new(:text, :emails)
    def perform
      emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
    end    
  end  

  Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', Customers.find(:all).collect(&:email))

There is also a second way to get jobs in the queue: send_later.


  BatchImporter.new(Shop.find(1)).send_later(:import_massive_csv, massive_csv)                                                    

This will simply create a Delayed::PerformableMethod job in the jobs table which serializes all the parameters you pass to it. There are some special smarts for ActiveRecord objects, which are stored as their text representation and loaded fresh from the database when the job is actually run later.
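To make the mechanics concrete, here is a minimal, self-contained sketch of the idea behind send_later. It is not the plugin's actual code: the SketchPerformableMethod class, the in-memory QUEUE array, and the load_job and run_all helpers are hypothetical stand-ins for Delayed::PerformableMethod, the jobs table and the job runner, and the special handling of ActiveRecord objects is left out.

```ruby
require 'yaml'

# Hypothetical stand-in for Delayed::PerformableMethod: it remembers a
# receiver, a method name and the arguments, and replays the call later.
class SketchPerformableMethod
  attr_reader :object, :method_name, :args

  def initialize(object, method_name, args)
    @object = object
    @method_name = method_name
    @args = args
  end

  def perform
    object.send(method_name, *args)
  end
end

# Hypothetical in-memory "jobs table": each row is a YAML string.
QUEUE = []

def enqueue(job)
  raise ArgumentError, 'job must respond to perform' unless job.respond_to?(:perform)
  QUEUE << YAML.dump(job)
end

# Newer Psych versions refuse to load arbitrary classes with YAML.load,
# so use unsafe_load where it exists. Only do this with trusted data.
def load_job(yaml)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
end

# Hypothetical job runner: deserialize every row and call perform.
def run_all
  results = []
  results << load_job(QUEUE.shift).perform until QUEUE.empty?
  results
end

# Any plain object works as the receiver of the delayed call.
class Greeter
  def initialize(greeting)
    @greeting = greeting
  end

  def greet(name)
    "#{@greeting}, #{name}!"
  end
end

enqueue SketchPerformableMethod.new(Greeter.new('Hello'), :greet, ['tobi'])
RESULTS = run_all
```

The point of the round trip through YAML is that the process that enqueues the call and the process that runs it share nothing but the serialized row.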

The plugin can be found on GitHub.

6 comments (closed) Filed under: Code Rails

ActiveMerchant PDF

Posted by tobi — 06:07 PM Jan 29

If you are working on a ruby application that requires dealing with credit cards, you are probably using ActiveMerchant. If not, you probably didn’t know about ActiveMerchant.

ActiveMerchant is an extraction from Shopify. It’s a simple-to-use library which translates one common interface into the wire language of 30-40 different payment processors around the globe, with more added at a rapid pace. As long as your application can talk to ActiveMerchant, you can switch payment providers with a single line of code.

Treat yourself to Cody Fauser’s excellent ActiveMerchant PeepCode PDF, which is an in-depth discussion of the library and covers topics such as order pipelines, order state management and the appropriate unit testing that a financial application requires.

Cody is the main programmer of ActiveMerchant, which I originally started. Cody took the library further than anything I envisioned and it’s now one of the most competent libraries for Ruby.

0 comments (closed) Filed under: Code Rails

The Secret to Memcached

Posted by tobi — 11:50 AM May 22

Memcached has long been the answer to most questions containing the word scale. There are some spectacular memcached installations out there. Facebook is said to run a 200-server installation with 3TB of memory solely for servicing memcached; Shopify, Twitter, Digg, Slashdot and just about every other public-facing application depends on it. Facebook’s installation is said to deliver a 99% cache hit rate while servicing tens of thousands of requests a second.

There are many ways to use this elaborate hash table, and many of them are more trouble than they are worth. In our experience the key to using memcached effectively is to ask it for the exact thing you want, but I’m getting ahead of myself.

A common pattern for using memcached is the following:


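class Product < ActiveRecord::Base

  # Read-through lookup: try the cache first, fall back to the database.
  # Cache is the cache wrapper used in this post; set is assumed to
  # return the value it stores.
  def self.load(id)
    key = key_for(id)
    Cache.get(key, self) || Cache.set(key, find(id))
  end

  # Expire the cached copy whenever the record changes.
  def after_save; Cache.expire(key); end
  def after_destroy; Cache.expire(key); end

  def self.key_for(id)
    "#{table_name}/#{id}"
  end

  def key
    self.class.key_for(id)
  end
end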

The issue is that this model only caches on a per-object basis, but the real database load usually comes from loading collections. Storing a collection in memcached is harder because you have to start tracking the objects in the collection somewhere so that you can efficiently expire the collection once one of its items is changed. And that way, he knew, lay madness.

In Shopify’s case, what we really need is to cache all the data required to render a given public URL. Two requests to the same URL should always yield a cache hit, given that all input parameters are equal. In code this could look something like this:


cache params.values.sort.to_s do
  ... load all data ...
end

Of course you have to keep track of all the keys you store in memcached now. A database table will do nicely here.


class CacheKey < ActiveRecord::Base
  def after_destroy; Cache.expire(key); end
end

cache key = params.values.sort.to_s do
  ... load all data ...
  CacheKey.create :key => key
end

CacheKey.destroy_all # Sweep cache

So far so good.

This has been the traditional approach and has worked somewhat. I’m here to offer a better solution though:

Ask for the thing you need, be specific: The complexity of the above solution comes from the simple fact that we formulated our question to memcached too vaguely. Ask yourself what you really require from memcached and then ask it for exactly that. Consider this: When a product is updated, all current URLs should be invalidated because they are outdated. Shopify allows the designers to reference a product from any page in the system, so we have to run a full sweep. Unless we inform memcached that its caches are stale, it will continue to deliver this stale data and customers will continue to see the old version of the product. A clear misunderstanding between Shopify and memcached.

The solution is simple: At the beginning of each request we load a shop object which we pick depending on the incoming host name. We use the fact that we always load this shop model anyway and add versioning to it. This version column is incremented every time we want to sweep all caches.

Now we add the version number to the cache keys:


cache shop.version.to_s + params.values.sort.to_s do
  ... load all data ...
end

This means that we will never get an outdated version from the caches, because we ask them for a very specific thing. After the version number is increased in the database, all incoming requests will miss the caches but will be re-cached quickly.

Memcached will automatically get rid of the stale keys once space is needed: the least recently used keys are discarded first, so there is no need for manual cleanup.
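The effect of the version number can be shown with a small, self-contained sketch. Everything in it is an assumption made for illustration: FakeCache is a plain Hash standing in for a memcached client, Shop is a Struct standing in for the shop model, and render_page, the key layout and the handle parameter are made-up names, not Shopify code.

```ruby
# Hypothetical stand-in for a memcached client: a plain Hash.
class FakeCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, or run the block and store its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Hypothetical shop record; in the post this is the model loaded per request.
Shop = Struct.new(:host, :version)

CACHE = FakeCache.new
LOADS = [] # records the key every time the expensive path runs

def render_page(shop, params)
  # The version is part of the key, so bumping it changes every key at once.
  key = "#{shop.host}/v#{shop.version}/#{params.values.sort.join(',')}"
  CACHE.fetch(key) do
    LOADS << key
    "page for #{params.inspect} at version #{shop.version}"
  end
end

SHOP = Shop.new('www.snowdevil.com', 55)
PARAMS = { 'handle' => 'draft-151cm' }

render_page(SHOP, PARAMS) # miss: runs the block
render_page(SHOP, PARAMS) # hit: block is skipped
SHOP.version += 1         # "sweep" every cached page for this shop
render_page(SHOP, PARAMS) # miss again, under a new key
```

Nothing is ever deleted here: the entry stored under the old version simply stops being asked for, which is the situation memcached's least-recently-used eviction cleans up on its own.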

In Shopify we use this technique for page caching. We keep the rendered HTML, the HTTP status code and the Content-Type in memcached, and build the key from all the differentiating input variables, such as the contents of the shopping cart. We keep the HTML because this saves our server cluster valuable bandwidth: cached pages avoid loading the Liquid templates from the NFS server and compiling them. Requests for cached documents are now served in under 10ms.

To summarize, Shopify politely asks memcached to “hand over version 55 of the index HTML for www.snowdevil.com the way it would look with one Draft 151cm snowboard in the cart”. This is a very specific question for which there is only one valid answer: the exact data we want. Stale data can never be returned, because everything that would make it stale also increases the version number.
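As a rough illustration of such a specific question, the sketch below folds every input that changes the page into one key. The names page_cache_key and CachedPage, the argument list and the key layout are assumptions for this example only. The digest step is there because memcached keys are limited to 250 bytes and may not contain whitespace, so a raw key built from request data is not always safe to use directly.

```ruby
require 'digest/md5'

# Hypothetical cached entry: everything needed to replay the response.
CachedPage = Struct.new(:status, :content_type, :body)

# Build one key from every input that changes the rendered page.
# All argument names here are illustrative assumptions.
def page_cache_key(shop_version, host, path, cart_items)
  # Sort the cart so the same contents always produce the same key.
  cart_part = cart_items.sort.map { |variant, qty| "#{variant}x#{qty}" }.join(',')
  raw = [shop_version, host, path, cart_part].join('|')
  # Store under a fixed-length digest so the key always fits memcached's
  # 250-byte limit and never contains whitespace.
  "page/#{Digest::MD5.hexdigest(raw)}"
end

KEY_A = page_cache_key(55, 'www.snowdevil.com', '/', { 'draft-151cm' => 1 })
KEY_B = page_cache_key(55, 'www.snowdevil.com', '/', { 'draft-151cm' => 2 })
KEY_C = page_cache_key(56, 'www.snowdevil.com', '/', { 'draft-151cm' => 1 })

# The value stored under KEY_A would carry the full response.
PAGE = CachedPage.new(200, 'text/html', '<html>...</html>')
```

Changing either the cart contents or the shop version yields a different key, so each variation of the page gets its own cache entry.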

A quick remark: when you use memcached in Ruby, make absolutely sure that you use memcache-client, as it’s the fastest and most used Ruby implementation of the protocol.

32 comments (closed) Filed under: Code Rails

OpenID is taking hold

Posted by tobi — 08:27 PM Mar 08

Shopify now supports OpenID as a first-class login method. After our friends at 37signals figured out how to present a pleasant UI and David created the wonderful open_id_authentication plugin, it was just a matter of a single workday to bring OpenID to the thousands of Shopify stores out there.

Awesome technology for awesome products. OpenID’s future is as bright as can be :)

4 comments (closed) Filed under: Rails

Query Cache

Posted by tobi — 06:52 PM Feb 14

Rails 2.0 is getting its first bunch of new features. One of the first additions was a new set of clothes for the query cache. David started this feature quite a while ago but never finished it and it wasn’t activated.

Here is the catalyst which prompted the development:


 Blog.find(1).articles.each { |a| puts "#{a.blog.title} #{a.title}" }
   => Blog Load (0.000461)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   => Article Load (0.000521)   SELECT * FROM articles WHERE (articles.blog_id = 1)
   => Blog Load (0.000461)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   => Blog Load (0.000460)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   => Blog Load (0.000469)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   => Blog Load (0.000460)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   => Blog Load (0.000462)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   [..]

Painful stuff. This is because each belongs_to has its own cache. So for each article the blog object has to be loaded again even though we just received a perfectly capable set of data from the database a second earlier.

With a recent change you can now do the following:


Blog.cache do 
   Blog.find(1).articles.each { |a| puts "#{a.blog.title} #{a.title}" }
   => Blog Load (0.000461)   SELECT * FROM blogs WHERE (blogs.id = 1) 
   => Article Load (0.000521)   SELECT * FROM articles WHERE (articles.blog_id = 1)
end

For the duration of the cache block the same query will not be run twice (unless you run any INSERTs or UPDATEs, in which case the whole cache is cleared).
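The behaviour can be modelled in a few lines. This is a sketch of the idea only, not the Rails implementation: FakeConnection and its cache, select_all and update methods are hypothetical stand-ins that memoize results by SQL string inside the block and drop them on any write.

```ruby
# Hypothetical stand-in for a database connection that counts round trips.
class FakeConnection
  attr_reader :executed

  def initialize
    @executed = []     # every statement that actually hit the "database"
    @query_cache = nil # nil means caching is switched off
  end

  # Enable the cache for the duration of the block only.
  def cache
    @query_cache = {}
    yield
  ensure
    @query_cache = nil
  end

  # Identical SQL inside a cache block is answered from memory.
  def select_all(sql)
    return run(sql) unless @query_cache
    @query_cache[sql] ||= run(sql)
  end

  # Any write may change what a SELECT returns, so drop the cache.
  def update(sql)
    @query_cache.clear if @query_cache
    run(sql)
  end

  private

  def run(sql)
    @executed << sql
    "result of #{sql}"
  end
end

CONN = FakeConnection.new
CONN.cache do
  3.times { CONN.select_all('SELECT * FROM blogs WHERE (blogs.id = 1)') }
  CONN.update("UPDATE blogs SET title = 'x' WHERE (blogs.id = 1)")
  CONN.select_all('SELECT * FROM blogs WHERE (blogs.id = 1)')
end
```

Of the four SELECT calls only two reach the fake database: the first one, and the one issued after the UPDATE cleared the cache.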

For Rails 2.0 we hope to cultivate these humble beginnings into an automatic cache so that all read queries are only run once per Rails request if possible.

7 comments (closed) Filed under: Rails