Posted by tobi — 01:14 PM Nov 15
So there has been a lot of talk about Phusion Passenger lately and I feel the need to chime in. David pointed out that Shopify is running on Passenger, which is something I announced on Twitter a few months ago.
Some context on Shopify’s installation: We launched Shopify originally on Lighttpd with FastCGI and later migrated to nginx with Mongrels. Naturally we had to put HAProxy between nginx and the Mongrels to avoid the dreaded “queue behind long running process” problem. We also added Monit to the mix, which watched all the Mongrels to make sure that everything was running according to plan. After a process reaches 260 MB of memory we signal it to shut down after the next request so that a new one can start out with less memory. For this we added runit to the mix, which supervises the Mongrels and starts them up again quickly once they hit the ground.
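The memory-ceiling policy above can be sketched roughly as follows. This is an illustration only, not our actual Monit or runit configuration: the function names and the simulated memory readings are hypothetical, and only the 260 MB threshold comes from the setup described here.

```python
CEILING_MB = 260  # threshold from the post; everything else here is illustrative


def should_retire(rss_mb, ceiling_mb=CEILING_MB):
    """Return True once a worker's memory has reached the ceiling."""
    return rss_mb >= ceiling_mb


def serve(memory_after_each_request_mb, ceiling_mb=CEILING_MB):
    """Simulate one worker: finish each request, then check memory.

    Returns the number of requests served before the worker exits so that
    a supervisor (runit, in the setup above) can start a fresh process.
    """
    served = 0
    for rss_mb in memory_after_each_request_mb:
        served += 1  # the in-flight request always completes
        if should_retire(rss_mb, ceiling_mb):
            break  # exit between requests, never mid-request
    return served


if __name__ == "__main__":
    # Hypothetical readings: the heap jumps from 140 MB to 260 MB in one step.
    print(serve([120, 140, 260, 260]))
```

The point of checking only between requests is that no request gets cut off; the worker simply declines to take another one and lets the supervisor replace it.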
It’s important to note that we are not talking about a memory leak here. The reason for the 260 MB ceiling comes from two issues with Ruby’s garbage collector:
- It allocates memory in very large chunks once the available memory gets low. This means a 140 MB process can grow to 260 MB in a single go. It also never gives memory back to the operating system because Ruby’s GC is not able to move objects. Once an object is placed in the newly allocated space and remains alive, that space cannot be returned to the OS.
- Because Ruby’s garbage collector uses mark and sweep, it has to traverse the entire memory space in search of pointers. There are no generations to help with that. It means that GC cycles become longer and longer the more memory the process holds.
Rails mitigates these issues by moving a full GC run behind an HTTP response, into the time period when the process is waiting for a new request (Update: Rails doesn’t do this anymore), but performance monitoring tools such as New Relic clearly show that average response time is directly correlated with the amount of memory used across the server farm.
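To make the second GC issue concrete, here is a toy model of a non-generational mark-and-sweep collector. It is a deliberately simplified sketch, not Ruby's actual collector: the heap layout, `mark_and_sweep` and `build_heap` are all made up for illustration. What it shows is that the sweep has to look at every slot, so the work grows with heap size even when the number of live objects stays the same.

```python
def mark_and_sweep(heap, roots):
    """Toy non-generational mark-and-sweep over a fixed-size heap.

    heap  : list of slots; each slot is None (free) or a list of slot
            indexes that the object in that slot references.
    roots : indexes of objects that are directly reachable.
    Returns (live_indexes, slots_visited_by_sweep).
    """
    marked = set()
    stack = list(roots)
    while stack:  # mark phase: follow references from the roots
        index = stack.pop()
        if index in marked or heap[index] is None:
            continue
        marked.add(index)
        stack.extend(heap[index])

    visited = 0
    for index, slot in enumerate(heap):  # sweep phase: every slot is examined
        visited += 1
        if slot is not None and index not in marked:
            heap[index] = None  # free the unreachable object
    return marked, visited


def build_heap(size):
    """Heap with 3 live objects (0 -> 1 -> 2); every other slot is garbage."""
    heap = [[] for _ in range(size)]
    heap[0] = [1]
    heap[1] = [2]
    return heap


if __name__ == "__main__":
    for size in (1_000, 100_000):
        live, visited = mark_and_sweep(build_heap(size), roots=[0])
        print(size, len(live), visited)
```

In both runs only three objects survive, but the sweep visits 1,000 slots in one case and 100,000 in the other. A generational collector would avoid rescanning most of that space on every cycle.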
Now why did we switch to Passenger? Simple: the key idea is removing moving parts.
Every additional tool you add comes with its own bugs. Many people I talked to over the past years considered HAProxy to be the most solid piece of infrastructure in their stack, but even it had a really nasty bug recently (search for request queue handling).
We treat our server farm much like Shopify’s codebase. We are in this for the long haul and we cannot accept complex solutions when simple ones present themselves. Maintainability of our code and servers is paramount to the long-term success of our product. Yes, the Mongrel setup worked very well, but Passenger allowed us to remove nginx, HAProxy, runit and Monit. That’s a nice refactoring!
At the same time Passenger introduced some tangible improvements. We switched to Ruby Enterprise Edition to get the full benefit of its copy-on-write (COW) memory characteristics and we can absolutely confirm the memory savings of 30% that others have reported. This amounts to many thousands of dollars in savings even at today’s hardware prices. We allow Passenger to adaptively spawn more processes with demand, but most of the time our application servers are running about 40 processes to handle more than a million dynamic requests a day. In addition, because Passenger constantly despawns and respawns Rails processes, they always stay fresh, run short GC cycles and are generally a lot more responsive. All this means that the total amount of memory used by Shopify during normal operations went from an average of 9 GB to an average of 5 GB. We distributed the savings evenly between more Shopify processes and more memcached space, which moved our average response time from 210 ms to 130 ms while traffic grew 30% in the last few months.
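For a sense of scale, the figures above work out as follows. This is plain arithmetic on the numbers quoted in this post; the helper name `percent_drop` is just for illustration, and since we serve "more than a million" requests a day, the load figure is a lower bound on the daily average, not a peak.

```python
def percent_drop(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


memory_drop = percent_drop(9, 5)        # GB, average memory in use
latency_drop = percent_drop(210, 130)   # ms, average response time
avg_requests_per_second = 1_000_000 / (24 * 60 * 60)

if __name__ == "__main__":
    print(f"memory: {memory_drop:.1f}% lower")
    print(f"latency: {latency_drop:.1f}% lower")
    print(f"average load: {avg_requests_per_second:.1f} requests/s")
```

That is roughly 44% less memory and 38% lower average response time, at an average load of about 11.6 dynamic requests per second.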
In conclusion: I cannot see any reason to choose a different deployment strategy at this point. It’s simple, complete, fast and well documented.


