At Reflektive, we offer the option of sending reminder emails after kicking off performance review cycles in order to notify participants of that cycle that they should fill out their reviews. We used Delayed::Job, an asynchronous priority queue system pioneered by the Shopify engineering team, to handle the delivery of those emails.
However, we noticed that there was a huge bottleneck when it came to delivering emails to cycles with large numbers of participants. For instance, one company that had approximately 30,000 employees would clog up our worker queues for hours, preventing other companies from sending out their own emails while this large job was still being processed.
One solution might have been to just throw more hardware at it, to spin up more DJ workers to hammer through the job queues. However, we decided to explore the option of using Sidekiq as an alternative asynchronous job processing system.
One factor in choosing Sidekiq was concurrency. A single Sidekiq process runs many jobs at once on separate threads, whereas each Delayed::Job worker is a single process that handles one job at a time.
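To illustrate why threads help with I/O-bound work such as email delivery, here is a minimal sketch in plain Ruby. It uses neither Sidekiq nor Delayed::Job; the deliver_email method, the 50 ms delay, and the thread count of 4 are all assumptions made for the demonstration.

```ruby
require "benchmark"

# Stand-in for an I/O-bound job, e.g. handing one email to a mail server.
def deliver_email(recipient_id)
  sleep 0.05 # simulated network wait (assumed value)
  recipient_id
end

# Process jobs one at a time, as a single-job worker process would.
def deliver_sequentially(recipient_ids)
  recipient_ids.map { |id| deliver_email(id) }
end

# Process jobs on several threads that pull from one shared queue.
def deliver_with_threads(recipient_ids, thread_count)
  jobs = Queue.new
  recipient_ids.each { |id| jobs << id }
  jobs.close # pop returns nil once the closed queue is empty

  delivered = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      while (id = jobs.pop)
        delivered << deliver_email(id)
      end
    end
  end
  workers.each(&:join)

  Array.new(delivered.size) { delivered.pop }
end

ids = (1..8).to_a
sequential = Benchmark.realtime { deliver_sequentially(ids) }
threaded   = Benchmark.realtime { deliver_with_threads(ids, 4) }
puts format("sequential: %.2fs, 4 threads: %.2fs", sequential, threaded)
```

Because the threads spend most of their time waiting on I/O, they overlap that waiting instead of queueing behind one another. The gain would be much smaller for CPU-bound jobs.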
In addition, Delayed::Job stores its queued jobs in a database table (in our case, in Postgres) while they wait for a DJ worker to process them. Sidekiq, by contrast, keeps its queues in Redis, an in-memory datastore. This results in a much lower I/O cost because the threads processing Sidekiq jobs don’t have to query the database every time they fetch a new job.
Below are some of Sidekiq’s performance statistics (obtained from its GitHub page):
Converting a Delayed::Job to Sidekiq
The process for converting a Delayed::Job worker to Sidekiq was fairly straightforward. The general structure of our DJ email worker code looked something like this:
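A minimal sketch of that shape is shown here. The names ReminderMailer, ReminderEmailJob, and deliver_reminder are illustrative assumptions rather than our production code, and the mailer is a stand-in so the example runs without Rails or the delayed_job gem.

```ruby
# Hypothetical stand-in for an ActionMailer class; it only records what it
# would have sent so that this sketch runs without Rails.
module ReminderMailer
  SENT = []

  def self.deliver_reminder(cycle_id, user_id)
    SENT << [cycle_id, user_id]
  end
end

# A Delayed::Job custom job is any object that responds to #perform.
# The job object itself is serialized into the delayed_jobs table.
ReminderEmailJob = Struct.new(:cycle_id, :user_ids) do
  def perform
    user_ids.each do |user_id|
      ReminderMailer.deliver_reminder(cycle_id, user_id)
    end
  end
end

# In an app with the delayed_job gem loaded, enqueueing would look like:
#   Delayed::Job.enqueue(ReminderEmailJob.new(42, [1, 2, 3]))
# Here we call #perform directly, which is what a DJ worker eventually does.
ReminderEmailJob.new(42, [1, 2, 3]).perform
```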
The equivalent Sidekiq implementation of that email worker would look like this:
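As a rough sketch, assuming the same hypothetical ReminderMailer stand-in and a worker named ReminderEmailWorker (both illustrative), the Sidekiq version could take this form. The Sidekiq mixin is guarded so that the example also runs as a plain Ruby script when the gem is not loaded; in a real application the guard would be unnecessary.

```ruby
# Hypothetical stand-in for an ActionMailer class; it only records what it
# would have sent so that this sketch runs without Rails.
module ReminderMailer
  SENT = []

  def self.deliver_reminder(cycle_id, user_id)
    SENT << [cycle_id, user_id]
  end
end

class ReminderEmailWorker
  # Mix in Sidekiq only when the gem is loaded, so the sketch runs standalone.
  if defined?(Sidekiq::Worker)
    include Sidekiq::Worker
    sidekiq_options queue: "process_email"
  end

  # Arguments arrive as plain values (IDs), not as a serialized job object.
  def perform(cycle_id, user_ids)
    user_ids.each do |user_id|
      ReminderMailer.deliver_reminder(cycle_id, user_id)
    end
  end
end

# With Sidekiq loaded, enqueueing would look like:
#   ReminderEmailWorker.perform_async(42, [1, 2, 3])
# Calling #perform directly runs the same code a Sidekiq thread would run.
ReminderEmailWorker.new.perform(42, [1, 2, 3])
```

The main structural difference from the Delayed::Job version is that the arguments move from the object's constructor to the perform method.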
It’s worth noting that Sidekiq performs a JSON dump and load of the arguments passed to its workers, so it only accepts plain JSON data types (strings, numbers, booleans, nil, arrays, and hashes). In practice, this means passing record IDs rather than the records themselves and reloading those records inside the worker. Doing so improves data consistency in edge cases where a record has changed while the job was sitting in the queue, because the worker always reads the latest version, but it increases the total number of database calls made.
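The effect of that JSON round trip can be reproduced with Ruby’s standard json library. This is only an approximation of what happens to job arguments between enqueue and perform, and the round_trip helper is a name made up for this example.

```ruby
require "json"

# Roughly what happens to job arguments between enqueue and perform.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

args = [42, { status: :pending }]
restored = round_trip(args)

p restored         # symbols and symbol keys come back as strings
p restored == args # the round trip is lossy for non-JSON types

# Plain IDs survive unchanged, which is why workers usually take IDs and
# reload the records themselves.
p round_trip([42, [1, 2, 3]]) == [42, [1, 2, 3]]
```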
Testing Sidekiq Jobs
Of course, for every piece of code, we have to write corresponding tests. Below are various methods of testing Sidekiq jobs in Minitest. They are all different ways of testing the same thing.
Configuring Sidekiq Workers
Now that we have set up enqueueing Sidekiq jobs, it is a simple matter to spin up a process to start working on those jobs. The sidekiq command lets you configure the concurrency, the queues, and the queue weights for each process.
In our setup, the -c flag sets the concurrency of the worker to 20. The -q flags specify which queues this worker pulls jobs from, and the number after each queue name (such as ,1) indicates the weighted priority given to that queue. With our weights, jobs from the process_email queue are processed twice as frequently as those from the other queue.
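Based on that description, the invocation would be along the lines of bundle exec sidekiq -c 20 -q process_email,2 -q other_queue,1, where other_queue is a placeholder name and the weight of 2 is inferred from the "twice as frequently" behaviour. To build intuition for what the weights mean, the sketch below simulates weighted polling in plain Ruby. It is a simplified model and not Sidekiq’s actual fetch code, which may differ between versions.

```ruby
# Simplified model of weighted queues: each queue appears in the list once per
# unit of weight, and the list is shuffled before every fetch.
WEIGHTS = { "process_email" => 2, "other_queue" => 1 }.freeze

# Order in which the queues would be checked for one fetch.
def polling_order(weights, rng)
  expanded = weights.flat_map { |queue, weight| [queue] * weight }
  expanded.shuffle(random: rng).uniq
end

# Count how often each queue ends up first in the polling order.
def first_choice_counts(weights, fetches, rng)
  counts = Hash.new(0)
  fetches.times { counts[polling_order(weights, rng).first] += 1 }
  counts
end

counts = first_choice_counts(WEIGHTS, 30_000, Random.new(42))
ratio = counts["process_email"].to_f / counts["other_queue"]
puts format("process_email is checked first %.2fx as often", ratio)
```

Note that under this model a lower-weight queue is never starved. It is simply checked first less often when both queues have work waiting.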
We use NewRelic to monitor our application’s performance, and we wanted to be able to leverage NewRelic to provide further transparency and statistics for our new Sidekiq queues as well. To this end, we set up a Heroku dyno to continually publish our Sidekiq queue sizes as custom events to our NewRelic instance. This way, we can see whether or not a queue is backed up, investigate the causes of any bottlenecks that we observe, and determine whether or not we should allocate more workers to process jobs.
The code we used to publish custom NewRelic events looked something like this:
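A minimal sketch of such a publisher follows. The method name publish_queue_sizes, the event type SidekiqQueueSize, and the attribute names are assumptions for illustration. The queue list and the recorder are passed in so that the example runs without the sidekiq or newrelic_rpm gems; the commented call shows how the real dependencies would be wired in.

```ruby
# Publish one custom event per queue. `queues` is any collection of objects
# responding to #name and #size; `recorder` is any callable that accepts
# (event_type, attributes).
def publish_queue_sizes(queues, recorder, event_type: :SidekiqQueueSize)
  queues.each do |queue|
    recorder.call(event_type, { queue: queue.name, size: queue.size })
  end
end

# In an app with the sidekiq and newrelic_rpm gems loaded, the call would be:
#   require "sidekiq/api"
#   publish_queue_sizes(Sidekiq::Queue.all,
#                       NewRelic::Agent.method(:record_custom_event))
# typically inside a loop with a sleep between iterations.

# Standalone demonstration with stand-ins for both dependencies.
FakeQueue = Struct.new(:name, :size)
events = []
recorder = ->(type, attributes) { events << [type, attributes] }
publish_queue_sizes([FakeQueue.new("process_email", 120)], recorder)
p events
```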
And afterwards, we were able to query those custom events in NewRelic and compile a chart to visualize how our Sidekiq queue sizes vary over time.
Results and Next Steps
After some monitoring, we discovered that switching from Delayed::Job to Sidekiq resulted in about a 4.5x improvement in throughput! We were able to complete a 30,000-employee email campaign in under an hour. We did hit one snag: the concurrency we had set for our Sidekiq workers was a little too high, which resulted in a lot of database connections being open at once, so we ended up scaling it down.
Moving forward, we are pushing to convert more Delayed::Job work over to Sidekiq.