We had a problem at Korrelate: we needed to schedule heterogeneous jobs to run at irregular intervals. Some of the jobs took a long time to run, others were quick, and still others were usually quick but occasionally slow. They needed to run on several different servers. And trying to manage it all with cron was getting troublesome. In short, we needed a job queuing and scheduling system.
Possible solutions included commercial products, implementing our own solution from scratch, or building a solution from open source libraries. Commercial products looked good at first, but in the long run they are potentially limiting and expensive to maintain. In particular, if we want to do something that isn’t supported, can we? And if we can’t, how much will it cost to add the feature? Since we’re still small, we can’t afford to bet on an outside vendor controlling too much of our destiny, so we looked in the direction of rolling our own solution. However, we already work primarily with open source tools at Korrelate, so before starting from scratch we looked for something we might build our solution on. And, since we’re Ruby-centric, Resque was the obvious choice. Resque is open source, free, extensible, performant (because Redis is performant) and generally awesome.
Resque is a Ruby gem for queuing and running jobs, loosely based on the publish-subscribe design pattern. Say what?
We all know what this means, if not how to describe it. Send a page to a printer. Watch it print. Watch it print after other documents. Watch it print after you clear the 9th paper jam, wait behind your annoying co-worker’s 50-page thesis, replace the ink, and generally solve all of the world’s problems. You queue a job, and wait for it to run. When you’re ready, you can pick up the printer’s (or queue’s) results. But the printer/queue doesn’t get back to you. It doesn’t tap you on the shoulder and say, “I’m done now.”
This is publish-subscribe, or pub-sub, in action. You publish a message (a document to print) and a subscriber (the printer) picks it up and prints it.
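The printer analogy can be sketched in a few lines of plain Ruby. This is only an illustration of the pattern, not how Resque is implemented: it uses Ruby's in-memory `Queue` and a thread where Resque uses Redis and separate worker processes, and the method name `run_print_queue` and the document names are made up for the example.

```ruby
require "thread" # a no-op on modern Rubies, where Queue is built in

# Hypothetical stand-in for the printer: queue the documents, let a worker
# thread "print" them in order, and return whatever it produced.
def run_print_queue(documents)
  jobs = Queue.new   # thread-safe first-in, first-out queue
  printed = []       # results left behind for the caller to pick up

  # The subscriber: takes one job at a time until it sees the :done marker.
  printer = Thread.new do
    while (doc = jobs.pop) != :done
      printed << "printed #{doc}"
    end
  end

  # The publisher: drops the jobs on the queue without waiting for each one.
  documents.each { |doc| jobs << doc }
  jobs << :done      # sentinel telling the worker to stop

  printer.join       # wait here only so the results can be returned
  printed
end

puts run_print_queue(["thesis.pdf", "memo.txt"])
```

Note that the publisher never hears back from the worker. It has to go and look at the results itself, just like walking over to the printer.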
Back to Resque. Created by GitHub to solve their own backend processing needs after other solutions fell short, Resque is Redis-backed, relatively lightweight, and lets us define jobs as plain Ruby classes. This makes it easy to create and manage our jobs because we can write them almost identically to the way we would if we weren’t using job scheduling.
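To give a feel for how little ceremony is involved, here is a minimal sketch of what a job can look like. The class name, queue name, and arguments are hypothetical and are not taken from our codebase; the `@queue` variable and the class-level `perform` method are the conventions Resque looks for.

```ruby
# Hypothetical job: ImportFeedJob, :imports, and the feed id are illustrative.
class ImportFeedJob
  # Resque reads this class-level instance variable to pick the queue.
  @queue = :imports

  # Workers call this class method with the arguments given at enqueue time.
  # Arguments travel through Redis as JSON, so they should be simple values
  # (numbers, strings, arrays, hashes) rather than arbitrary objects.
  def self.perform(feed_id, format = "csv")
    "imported feed #{feed_id} as #{format}"
  end
end

# Without any queue involved, the job is just a method call:
puts ImportFeedJob.perform(42)

# With the resque gem loaded and Redis reachable, the same work would be
# queued for a worker instead of run inline:
#   Resque.enqueue(ImportFeedJob, 42)
```

Because the job is an ordinary class, it can be run and tested directly, with the queue only coming into play at the call site.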
Resque needs an external monitoring tool to notice when a worker process dies and to restart it. We decided to go with Monit, but we also considered God and a combination of Foreman and Upstart. As mentioned, Resque also requires Redis. Our initial solution was to use Redis to Go as our Redis server, but we’ve begun to run into various problems that we think may be solved by running our own Redis instance, and we’re in the process of setting that up.
So, on the whole, our current solution is to use Resque, backed by Redis to Go, to send jobs to queues on application and database servers, and to monitor the workers using Monit. However, based on our experience and research, our goal is to migrate to our own Redis server, run queues only on application servers, and monitor using Foreman and Upstart. In addition, we needed a few extensions to Resque, including resque-status and resque-scheduler.
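As a rough idea of what the scheduling side can look like, resque-scheduler takes a schedule that maps a name to a job class, a queue, and either a `cron` or an `every` timing. The sketch below writes one as a Ruby hash; the entry names, job classes, queues, and timings are invented for illustration and are not our actual schedule.

```ruby
# Hypothetical schedule: every name, class, queue, and timing here is made up.
SCHEDULE = {
  "nightly_import" => {
    "cron"        => "0 2 * * *",      # every day at 02:00
    "class"       => "ImportFeedJob",
    "queue"       => "imports",
    "args"        => [42],
    "description" => "Pull the nightly feed"
  },
  "refresh_stats" => {
    "every"       => "15m",            # interval-based instead of cron
    "class"       => "RefreshStatsJob",
    "queue"       => "stats",
    "description" => "Recompute dashboard numbers"
  }
}.freeze

# Print a summary so the schedule can be checked without Redis running.
SCHEDULE.each do |name, entry|
  timing = entry["cron"] || entry["every"]
  puts "#{name}: #{entry['class']} on #{entry['queue']} (#{timing})"
end

# With resque-scheduler loaded, the hash would be handed over with:
#   Resque.schedule = SCHEDULE
```

The same structure is commonly kept in a YAML file and loaded at boot, which keeps the schedule out of the application code.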
More details to come soon on our specific solution. We’re also planning to release an open-source project and submit some pull requests to Resque, resque-status, Monit, and others. We have some extensions, some bug fixes, and some nice hackery to open to the public.
Hopefully, Resque can rescue your company from cron middle management.