I mean it. Let me tell you why…
Basic websites have one job: respond to user requests. Over time, these requests are becoming more demanding and our latency requirements are becoming stricter. In this post, I'll share my preferred method of handling demanding requests more efficiently: don't complete the entire request.
Let’s look at a canonical example: Facebook likes. When a user first posts a photo on Facebook, they refresh the page over and over -- counting the number of likes and comments as if their lives depended on it. After the first hundred likes, most people know they’re in the clear -- they have achieved a socially acceptable level of engagement. Facebook knows this, and their engineers handle these requests accordingly. After the first few hundred likes, they begin deferring the work of updating the total number -- because it’s no longer important!
For the first hundred likes, Facebook handles the requests synchronously, meaning the entire request is processed before a response is sent. After the first few hundred likes, Facebook begins handling the requests asynchronously, meaning it responds to the request before all of the work is done.
In this post, I’ll walk you through how I do this at my company, my thought process for choosing specific technologies, and how cron jobs became my best friend.
When answering this question in a systems design interview, the immediate solution is: task queue. Simply push work onto the task queue and have a worker pool pick tasks off of it. You can even assign priorities and have the queue connect to third-party services. There are plenty of benefits to this approach, and I have no doubt you’ll sound smart when you rattle them off in your next interview.
In real life, I’m more cautious for one reason: it’s always more complex than you think. You’re never just using a queue -- you’ll also need some compute infrastructure, probably serverless functions. AWS Lambda and Google Cloud Functions are simple enough, but they add unnecessary complexity to your system. Setting them up has an upfront cost, but that’s nothing compared to the time it takes to manage the added complexity.
There’s another solution that people smarter than myself have devised: microservice architecture. It’s designed to handle this exact problem! But managing a microservice architecture requires a ton of engineering hours, which, as a startup, we don’t have. So, I turn to Martin Fowler: monolith first.
To explain the rest of our thinking, it’s important to understand a bit about what we do. In short, we’re a community-based learning platform. Every morning at 6am, our platform emails a lesson to every one of our users. When completing the lessons, our users discuss their learnings. They can see their progress and all that jazz, too!
If you’re following closely, you’ll notice we have to do a lot of one thing: email notifications. We send email notifications for daily lessons, community messaging, and progress updates. We send a TON of emails!
A year ago, all of these emails were sent synchronously. A user would send a message to the community, and our server would accept the request, post the message to the database, and send the email, all before responding to the user. To make matters worse, sending emails programmatically is complicated: they take a variable amount of time, and they fail regularly.
So, I wanted to isolate them -- put them in a queue and send the emails at a regular interval. How would I do this? You guessed it -- my friend Cron Job!
Previously, when a user submitted a message on the platform (we call them reflections), we’d send an email notification before responding to the user (a.k.a. synchronously). This is what the flow looked like:
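In simplified Python, the synchronous flow looked something like this. This is a minimal sketch with stubbed-out database and SendGrid calls -- the function names are hypothetical, not our actual code:

```python
import time

def save_reflection(message):
    # Persist the reflection to the database (stubbed out here).
    return {"id": 1, "message": message}

def send_email_notification(reflection):
    # In production this calls SendGrid's API, which takes a variable
    # amount of time and can fail. Simulated here with a short sleep.
    time.sleep(0.01)
    return True

def handle_reflection_request(message):
    # The old synchronous flow: the user waits for BOTH steps
    # before getting any response.
    reflection = save_reflection(message)
    send_email_notification(reflection)  # blocks the response
    return {"status": "ok", "reflection_id": reflection["id"]}
```

The key problem is on the second-to-last line: the response can't go out until the email provider returns.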
Our backend uses Django REST Framework, a Python framework. But Python doesn’t have the same concurrency model as a language like Node, where you can initiate an asynchronous task without being blocked until it returns. In Python, you have to wait for the emails to be sent (which takes anywhere from 2 to 5 seconds, based on Sendgrid’s API) before you’re able to give the user a confirmation. That’s a really long time.
My attempts at introducing threading and tuning gunicorn were previously unsuccessful (ping me if you’ve got any interesting low-lift hacks that I could use instead). Cron jobs introduce the perfect opportunity to improve this, since the sending of the notification email doesn’t need to be done immediately.
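Here’s a minimal sketch of the deferred version -- again with hypothetical names, and with an in-memory list standing in for a real database table. The request handler only queues the notification, and a cron-triggered function drains the queue later:

```python
# Stand-in for a "pending notifications" database table.
pending_emails = []

def handle_reflection_request(message):
    # The new flow: persist the reflection and queue the email,
    # then respond immediately. No waiting on the email provider.
    reflection = {"id": 1, "message": message}
    pending_emails.append({"to": "user@example.com",
                           "reflection": reflection})
    return {"status": "ok"}

def send_pending_emails():
    # Called by the cron job at a regular interval. Drains the queue
    # and sends each email; a failed send could be re-queued for the
    # next run instead of failing the user's original request.
    sent = 0
    while pending_emails:
        email = pending_emails.pop(0)
        # ... send via the email provider here ...
        sent += 1
    return sent
```

The user-facing request now does a single cheap database write, and the slow, failure-prone work happens on the cron job's schedule instead.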
There’s one final benefit of having set up cron jobs in this way -- testing. I’m not talking about the code that the cron job triggers, but rather, everything else. Having the actions and mutations performed by a cron job makes it much easier to test the “calculation” part of the code, which can now be written as pure functions.
A cron job is a way of scheduling commands to run at a specific time or at recurring intervals. Cron is usually found on unix-based OSes, but plenty of cloud services now provide solutions that can natively run code, send HTTP requests, or trigger actions on third-party services. We’re using Google App Engine (GAE), so I started looking for a way to create a cron job there, which was surprisingly easy to find and use.
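For reference, a classic unix crontab entry is five schedule fields followed by a command. This hypothetical line (not our setup) would run an email-sending script every five minutes:

```shell
# minute  hour  day-of-month  month  day-of-week  command
*/5 * * * * /usr/bin/python3 /opt/app/send_pending_emails.py
```

Cloud schedulers like GAE's use the same idea, just expressed in config instead of a crontab file.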
So all I had to do was specify a cron.yaml that periodically hits the endpoint tasked with sending emails -- all in less than a hundred lines.
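A GAE cron.yaml along these lines would do it -- the URL and schedule below are illustrative, not our actual values:

```yaml
cron:
- description: "send queued notification emails"
  url: /tasks/send-emails
  schedule: every 5 minutes
```

GAE then issues an HTTP GET to that URL on the given schedule, so the "worker" is just another endpoint in the monolith -- no extra infrastructure.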
I have one point to make here, and I hope I’ve made it: cron jobs are my best friend. Why? Because they made my site faster; because they make my users happier; and because they let me defer work until later.
How have you used cron jobs? Let me know! (Really, I’m interested!!!)
Thanks for reading! If you've read this far, I hope that you found it useful. I'd really appreciate if you'd share with your friends and colleagues.
We're growing a design for non-designers community where we curate free 5 minute lessons to help you get smarter every day. For more design tips, case studies, and principles, join our community!