How I learned of the importance of idempotency for work queues: a thread.
About 9 years ago, I was the owner and sole developer at a company which delivered scheduled reminders over phone calls. The logic was, essentially, “If someone needs a call today, and they haven’t been called, put a call on the queue to them.” We ran this check every 5 minutes.
One day, I made a change to an unrelated part of the queue workers, which caused them to block (not process any events, including calls). Importantly, a) they usually took 5s to run and b) we had never had something fail to execute within 5 min before.

See where this is going?
I discovered the bug late that day, because I was in the midst of an office/apartment move and my phone ended up in one of the boxes.

12 hours had elapsed and there were thousands of events on the queue. I reverted the commit that was blocking the queue and let it drain.
What I thought had happened:

OK, X,000 less important analytics events and a few dozen phone calls scheduled for this morning delivered within a half hour of scheduled time.

That was not what had happened.
What had actually happened was, every five minutes for 12 hours, an entry was added to the queue for each person who needed a phone call.

When I restarted the queue, they all fired within 2 minutes.
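A minimal sketch of that failure mode, under loud assumptions: the in-memory queue, `naive_check`, and `simulate_outage` are hypothetical names for illustration, not the code that actually ran. The only facts taken from the thread are the 5-minute check interval and the 12-hour outage. The check looks only at "has this person been called?", which never becomes true while the workers are blocked, so every pass adds another entry.

```python
from collections import deque

CHECK_INTERVAL_MIN = 5  # from the thread: the check ran every 5 minutes


def naive_check(queue, people_needing_call, already_called):
    """Hypothetical scheduler pass: enqueue a call for anyone not yet called."""
    for person in people_needing_call:
        if person not in already_called:
            queue.append(("call", person))


def simulate_outage(hours_blocked, people):
    """Run the check repeatedly while no worker drains the queue."""
    queue = deque()
    already_called = set()  # never updated, because the workers are blocked
    passes = hours_blocked * 60 // CHECK_INTERVAL_MIN
    for _ in range(passes):
        naive_check(queue, people, already_called)
    return queue


queue = simulate_outage(hours_blocked=12, people=["alice"])
print(len(queue))  # 144 entries for one person
```

144 is an upper bound for someone whose call was due for the whole 12 hours. The sketch illustrates the mechanism only and does not try to reproduce the per-phone counts reported in the thread.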
Now let’s talk about the scalability of cloud systems versus the scalability of a plain old telephone.

The telecom system is very capable of delivering 80 phone calls in 2 minutes.

Phones don’t tolerate that so well.
So this manifested for customers’ customers as their phones ringing off the hook. Answer a call. Hang up on it. That’s one. Phone rings immediately. Answer and hang up. That’s two.

Most people pulled the cord from the wall before counting to ~70.
When I realized what had happened it was after midnight in Japan. I was in a new apartment with no reliable phone or Internet.

I walked across town, in freezing rain, holding a laptop and wired phone, so that I could deliver apology calls to customers and customers’ customers.
I broke down crying after the first three, and called my dad, certain that I had just bankrupted the business.

We actually lost two customers. One came back after receiving an explanation.

Anyhow, that’s why you add to queues using an idempotency key.
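One way this could look, as a sketch rather than the fix that was actually shipped: derive a key from what the job means (here, "reminder call for this person on this day") and have the queue refuse a second job with the same key. `IdempotentQueue` and `reminder_key` are hypothetical names, and the in-memory set stands in for something durable, such as a unique constraint in a database.

```python
from datetime import date


class IdempotentQueue:
    """Hypothetical in-memory queue that ignores repeated idempotency keys."""

    def __init__(self):
        self._jobs = []
        self._seen_keys = set()  # assumption: would be durable storage in production

    def enqueue(self, key, payload):
        """Add the job unless this key was already enqueued; report which happened."""
        if key in self._seen_keys:
            return False
        self._seen_keys.add(key)
        self._jobs.append(payload)
        return True

    def __len__(self):
        return len(self._jobs)


def reminder_key(person_id, day):
    """One key per person per day, so repeated checks collapse into one job."""
    return f"reminder-call:{person_id}:{day.isoformat()}"


queue = IdempotentQueue()
today = date(2024, 1, 15)  # arbitrary example date
for _ in range(144):  # 12 hours of 5-minute checks with blocked workers
    queue.enqueue(reminder_key("alice", today), {"call": "alice"})
print(len(queue))  # 1
```

With the key in place, the scheduler can stay as simple as before. It may ask for the same call as many times as it likes, and the person still gets exactly one.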
Worth noting, since people often think that only noobs make mistakes: I was 10 years into my career at this point, had a CS degree, had run multiple businesses and written code used in production by Serious People at my day job, etc.
Keep Current with Patrick McKenzie
