many many DJs

Posted by tobi — 10:16 AM Mar 23

I just checked in a new version of the delayed_job plugin, which handles background processing of long-running tasks in Shopify.

DJ is now fully parallelizable and does not require any global locking anymore. This means that you can run as many worker processes as you want across your server farm if you need to speed up the queue processing.

This became necessary when we kicked off a full search server reindex recently and realized that a single worker process would require 48 hours to complete the task. Such is the burden of success.

This feature is DB-independent and doesn’t rely on row-level locking. I found that row-level locking led to a lot of unnecessary lock timeout waits. If you are updating from the previous version of the plugin, please be advised that there are two new columns you have to add.

Grab the latest version from http://github.com/tobi/delayed_job/tree/master

Comments

  • Mazdak Rezvani 23 Mar 11:29

    Hi Tobias

    I am interested to see how your solution will work. Row level locking is a real pain (especially in MySQL).

    I think this solution might eventually hit a wall as the number of requests starts increasing and errors start piling up. We’ve been using database-based message queues on BubbleShare for over 2 years now, and it becomes painful when failed jobs start piling up due to an unforeseen error.

    We are just testing out a Starling-based system to see how it performs.

    Thanks.

  • tobi 23 Mar 18:11

    The plugin, like everything I do, is incrementally designed. It was a fine solution for what we needed in the beginning; as I said in the article, it ceased to be, so I added the needed features.

    Here is what I did (simplified):

    1. Select 1-5 jobs from the db, top of the queue
    2. Lock the row with an update statement: UPDATE delayed_jobs SET locked_by=workerpid, locked_until=5.minutes.from_now WHERE id=? AND (locked_by IS NULL OR locked_until < NOW)
    3. Check affected rows, if 1 we got the lock and yield it from the Delayed::Job.reserve method.
    4. If affected_rows != 1 we take the next job and try to lock this. If we cannot get a lock we get more jobs from the db.
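
The four steps above can be sketched without a database. This is only an illustration, not the plugin's code: `JobTable`, `candidates`, `try_lock` and the standalone `reserve` function are hypothetical names, the Mutex stands in for the atomicity of a single-row UPDATE, and the sketch assumes a row counts as available when it is unlocked or its lock has expired.

```ruby
# Hypothetical in-memory stand-in for the delayed_jobs table; the Mutex
# plays the role of the database's atomic single-row UPDATE.
Job = Struct.new(:id, :locked_by, :locked_until)

class JobTable
  def initialize(jobs)
    @jobs = jobs
    @mutex = Mutex.new
  end

  # Step 1: ids of up to `limit` jobs at the top of the queue that look
  # available right now. Nothing is locked by this read.
  def candidates(limit, now)
    @mutex.synchronize do
      @jobs.select { |j| available?(j, now) }.first(limit).map(&:id)
    end
  end

  # Step 2: conditional update. Returns the number of affected rows,
  # which is 1 only for the single worker that wins the race.
  def try_lock(id, worker, now, ttl)
    @mutex.synchronize do
      job = @jobs.find { |j| j.id == id }
      return 0 unless job && available?(job, now)
      job.locked_by = worker
      job.locked_until = now + ttl
      1
    end
  end

  private

  # Assumption: a job is available if unlocked or if its lock has expired.
  def available?(job, now)
    job.locked_by.nil? || job.locked_until < now
  end
end

# Steps 3 and 4: walk the batch until one lock succeeds; if every
# candidate was taken by someone else, fetch a fresh batch.
# Returns the reserved job id, or nil when nothing is available.
def reserve(table, worker, now: Time.now, batch: 5, ttl: 300)
  loop do
    ids = table.candidates(batch, now)
    return nil if ids.empty?
    ids.each do |id|
      return id if table.try_lock(id, worker, now, ttl) == 1
    end
  end
end
```

Because the read in step 1 takes no lock, two workers can see the same batch; the affected-rows check is what guarantees that only one of them ends up owning a given job.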

    Selecting batches of 5 jobs as raw material is a little optimization I added, which evens out the distribution of tasks amongst the workers. It's still not 50/50, but it doesn’t need to be for my uses.

    If there is a failure in the queue it will exponentially increase the run_at time and will retry the task later. We have the number of failed jobs in the queue very prominently in our Shopify internal admin interface so someone will notice quickly and look into it.
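
The retry behaviour described above could look roughly like the sketch below. The formula is an assumption made for illustration: the comment only says that run_at grows exponentially, so the base delay of 5 seconds, the doubling factor, and the `next_run_at` helper name are all hypothetical rather than taken from the plugin.

```ruby
# Hypothetical helper: when to retry a job that has already failed
# `attempts` times. Base delay and doubling factor are assumptions.
def next_run_at(now, attempts, base: 5)
  now + base * (2**attempts)
end

# Example: delays in seconds for the first few failures.
delays = (0..4).map { |n| (next_run_at(Time.at(0), n) - Time.at(0)).to_i }
puts delays.inspect
```

Whatever the exact curve, the point is that a job that keeps failing is retried less and less often, so a broken task does not crowd out healthy ones.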

    I hope this helps. Let me know if you have any advice.

  • Morten 31 Mar 10:18

    Hi Tobias.

    Thanks for DJ (and all your great plugins). I’m replacing BackgrounDRb with DJ for async processing of long running tasks – what bliss!

    Basically we create a job on a certain request, and then run that in the background and have an AJAX based polling mechanism. But I don’t want to enqueue the same job multiple times (separate users clicking the link around the same time).

    Thus, I’ve added an ‘identifier’ column to delayed_jobs, and a Job.enqueue_unique(identifier, object) method. The use case is to be able to determine if a given job has already been enqueued, without having to run through, and deserialize, all the jobs. Can I persuade you into supporting this use case? I really prefer to keep my plugins pristine… :-)

    Thanks.

  • Morten 31 Mar 10:30

    Hey, just an addendum to my last comment. I decided to drop the “identifier” and use a hash of the job object instead.

    http://pastie.caboo.se/173161

    Br,

    Morten
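
One way to picture the hash-based variant Morten describes is the sketch below. It is not the code from the pastie link: `UniqueQueue` and `ReindexJob` are hypothetical, an in-memory Hash stands in for the delayed_jobs table, and using a SHA1 digest of the YAML-serialized object is an assumption about what "a hash of the job object" means.

```ruby
require "digest/sha1"
require "yaml"

# Hypothetical job object; any YAML-serializable value would do.
ReindexJob = Struct.new(:shop_id)

# Hypothetical in-memory queue keyed by a digest of the serialized job.
class UniqueQueue
  def initialize
    @rows = {} # digest => serialized handler
  end

  # Enqueues the object unless an identical one is already queued.
  # Returns true if a new row was added, false if it was a duplicate.
  def enqueue_unique(object)
    handler = object.to_yaml
    digest = Digest::SHA1.hexdigest(handler)
    return false if @rows.key?(digest)
    @rows[digest] = handler
    true
  end

  def size
    @rows.size
  end
end

queue = UniqueQueue.new
queue.enqueue_unique(ReindexJob.new(1)) # new job, gets queued
queue.enqueue_unique(ReindexJob.new(1)) # same content, skipped
```

The digest lets the duplicate check run without deserializing any queued job. With a real table and several web processes, a unique index on the digest column would probably be needed as well, since two requests could otherwise pass the check at the same moment.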

Comments are now closed…