In honor of someone’s bad bug today, I will retell a story of my worst bug:
Once upon a time I was the CEO and entire engineering team of a company which sent appointment reminders.
Each reminder was sent by a cron job draining a queue. That queue was filled by another cron job.
Reminders could fail, but the queue-draining job had always been bulletproof: it had never failed to execute or taken more than a few seconds to complete. It ran every 5 minutes.
So I had never noticed the queue *filling* job wasn’t idempotent.
Idempotent is a $10 word for a simple concept: an operation where you get the same result no matter how many times you run it.
Adding 2 + 2 is idempotent. Creating a new record in your database may not be; the number of rows in the DB goes up each time.
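To make the distinction concrete, here is a minimal sketch in Python. The function names and the in-memory "tables" are hypothetical and chosen purely for illustration; they are not from the original system.

```python
def create_record(rows: list[str], name: str) -> int:
    """Not idempotent: every call appends another row."""
    rows.append(name)
    return len(rows)


def ensure_record(rows: dict[str, str], key: str, name: str) -> int:
    """Idempotent: repeating the same call leaves the table unchanged."""
    rows[key] = name
    return len(rows)


appended: list[str] = []
keyed: dict[str, str] = {}
for _ in range(3):
    create_record(appended, "reminder for Friday")
    ensure_record(keyed, "appointment-437", "reminder for Friday")

print(len(appended))  # 3: the row count grew with every run
print(len(keyed))     # 1: same result no matter how many runs
```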
One day, for the first time ever, the queue draining job broke and could not be restarted. This was a result of a trivial code push I had made to an unrelated part of the code base late in the day, prior to an apartment move.
Of course, being responsible, I was paged immediately.
However, that was back during those pre-iPhone years where my cell phone was “a useful tool” rather than “extension of brain”, and like many useful tools it ended up in a box on the moving truck, trilling merrily for 13 hours.
Later that night, while unboxing things, I got the page, realized there were thousands of undrained events in the queue, and panicked. So I reverted the bugged deploy and restarted the queue workers. Queue quickly drained.
Crisis averted, right?
At 2 AM I woke from a nightmare caused by system engineer spideysense. “Wait wait wait there were THOUSANDS of events on the queue? Shouldn’t it have been a couple dozen at that hour?”
And then I realized with dawning horror what I had done.
In the 13 hours the queue had been broken, cronjob #2 had been dutifully asking “Have we called Client of Customer #437 about their Friday appointment?”, gotten a no from the DB, and then dutifully queued up a call.
Every five minutes.
Resulting in 13 * (60 / 5) = 156 calls.
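The failure mode can be reproduced in a few lines. This is a toy simulation, assuming (as described above) that only the stalled draining job would ever mark an appointment as called; the function and variable names are hypothetical.

```python
def simulate_outage(hours_down: int, interval_minutes: int) -> int:
    """Count the calls queued for ONE appointment while draining is down."""
    queue: list[str] = []
    already_called = False  # only the (stalled) draining job would flip this
    runs = hours_down * 60 // interval_minutes
    for _ in range(runs):
        # The filling job asks "have we called yet?" and always hears "no".
        if not already_called:
            queue.append("call client about Friday appointment")
    return len(queue)


print(simulate_outage(13, 5))  # 156
```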
We did not spam the heck out of one client’s inbox. Oh no.
We spammed the heck out of every client, of every customer, who had an appointment that day.
But worse, because a key feature of our service was that it didn’t just email; it would escalate to SMS and then phone.
And since the blissfully ignorant queued calls thought “OMG I am so late better urgently tell the person about their appointment” most of them chose to escalate to a call.
Now there is a word for what happens when many independent systems simultaneously try to restart.
We call it a “thundering herd.” It routinely brings down systems built for massive scalability like web tiers, APIs, databases, etc.
You know what is not designed for massive scalability? A residential plain old telephone.
Plausibly you might not even know what happens if, say, 50 people all call you at once. I’ll tell you. They keep ringing indefinitely while you have a conversation and hang up, at which point your phone will immediately start ringing again.
You can repro w/ 50 patient friends.
Computers are very, very patient, and so they dutifully lined up and let the phone ring until it was answered or went to answering machine, 50 times consecutively.
Or, as more commonly happened, the immensely frustrated person physically disconnected their phone line.
Now this would have been bad enough. But what did those 50 calls start with?
“This is Dr. Smith’s office. He’d like to remind you that...”
So you can imagine who got the call an hour later. From every patient they were supposed to see that day.
Then they sent me an email.
And so it was that at 2 AM local time I had a long stack of very irate emails from literally every client of my company and no working internet to answer them from (new apartment).
I still had the keys to old apartment, which had working Internet. Problem: across town, no way to call taxi because small town Japan and no functioning phone.
Solution: pack up laptop, landline, and heater into backpack, walk across town to old apartment. In freezing rain.
And so at 3 AM, wet and shivering in a room with no light, I started making apology calls.
After the first two I broke down and called my dad, convinced I had just bankrupted my company. He talked me down.
I worked through the night on apologies.
We ended up losing exactly two accounts. One reactivated the next day, impressed that they rated “a personal apology from the CEO.”
“Everyone makes mistakes. Go easy on your engineer.”
That’s good advice.
I fixed the idempotency issue and added MANY safeguards. We never unintentionally duplicated a call again for the life of the company.
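The thread does not say how the fix was implemented. One common approach, sketched here under assumptions, is to have the database enforce at most one queue entry per appointment and reminder. The table name, column names, and helper functions below are hypothetical, and SQLite stands in for whatever database was actually used.

```python
import sqlite3


def make_queue() -> sqlite3.Connection:
    """Create an in-memory queue table with a uniqueness guarantee."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reminder_queue ("
        " appointment_id INTEGER NOT NULL,"
        " reminder_kind TEXT NOT NULL,"
        " UNIQUE (appointment_id, reminder_kind))"
    )
    return conn


def enqueue_reminder(conn: sqlite3.Connection, appointment_id: int,
                     reminder_kind: str) -> bool:
    """Idempotent enqueue: returns True only when a new row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO reminder_queue (appointment_id, reminder_kind)"
        " VALUES (?, ?)",
        (appointment_id, reminder_kind),
    )
    conn.commit()
    return cur.rowcount == 1


def queue_depth(conn: sqlite3.Connection) -> int:
    """Number of rows currently in the queue."""
    return conn.execute("SELECT COUNT(*) FROM reminder_queue").fetchone()[0]


conn = make_queue()
# Replay the outage: the filling job runs 156 times with nothing draining.
added = [enqueue_reminder(conn, 437, "friday-appointment") for _ in range(156)]
print(added.count(True), queue_depth(conn))  # 1 1
```

This sketch assumes drained rows are marked as sent rather than deleted; otherwise the next filling run would simply re-add them. A check on queue depth before draining (a couple dozen expected, thousands found) is the kind of extra safeguard that would also have caught this.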
Nobody remembers this now except me, and engineers who I tell it to, when they think that they’ve just made the worst mistake of their career.
One thing that I non-ironically appreciate about the cryptocurrency community is that the transparency ethos from those projects that are not outright frauds will provide an unprecedented amount of historical content into e.g. collapse of financial institutions.
Years of Congressional inquiries, books, etc. later, most people (even very interested people) have a very poor mental model of what happened during the global financial crisis.
Meanwhile in crypto you can literally see the war room chat logs as they're losing tens of millions of dollars of depositors' money and trying frantically to stop it.
The most recent hiring class in Japan is larger than the entire Japan team from the day I joined.
We have our first product (which I’ll elide mentioning) where the ratio in usage between US and Japan is commensurate with economic size.
We’ve recently released convenience store (konbini) payments in Japan. Early users are loving it.
Our experience in many markets has been that users get jaw-dropping conversion lifts for letting customers pay in the way they want, versus standardizing on cards only worldwide.
A metacomment: This is not a unique insight about fashion, but it being phrased in terms of Searle’s Chinese Room would have given undergraduate me *permission to understand it.*
This effect creates ~infinite demand for teaching (and repackaging teaching).
People often wonder whether the market for teaching (or advice, or commentary, or...) is saturated, and it is remarkably underserved, both along the axis of which topics are covered and on ways-of-seeing-the-world they’re presented in.
(I particularly like the bit about how a $5k angel punched waaaaay above their weight due to introductions.)
Would like to underscore, as someone who did fundraising for the first time this year and has done high-volume prospecting before in e.g. hiring and sales: you will absolutely die if you do not have A System and in fundraising that probably means a spreadsheet.
You might think you have a good working memory. I have a good working memory. I am not capable of remembering which of 76 people lit up particularly when given the A anecdote or whether there are outstanding document requests from Prospect 37.
It was once observed to me that there are some communities where people who know each other only as avatars would “take a bullet for” each other, and while that is probably a level one does not need to model for an API response, allowing high-trust spaces is powerful.
In some ways the future is here but not evenly distributed; you can model, for example, companies as being notably high mutual trust islands in a sea of (presumably!) lower mutual trust relationships.
Which implies something about e.g. their Slack channels.
And on that spectrum / within that dimensional space there are likely forms of trust which exist but which we can’t conveniently see or reason about right now, and who knows, perhaps forms of trust that do not yet exist but should.
“What do we do with this dollar substitute?”
“Is it a dollar?”
“No. It is critically not a dollar.”
“So what is it?”
“I hope it is as close to a dollar as anything in the world can be without being a dollar.”
“So it is worth a dollar in all circumstances?”
“No. Not that dumb.”
“So when is it worth a dollar?”
“Almost all of the time.”
“Can you drill into *almost* a bit?”
“I’d prefer not to.”
“Have you considered the thing you think is not a dollar may in fact be similar to most members of the set ‘Things that are not a dollar.’”
“Not ideal.”
“And?”