Gergely Orosz
Mar 13, 2022 · 20 tweets · 5 min read
As it's been ~3 years, figured I'll answer "What caused the Uber Eats glitch that allowed ordering free food for a weekend in India?"

This was an outage on my watch. Given Quora is paywalled - can't post the answer w/o a sub - here's the story on idempotency & breaking changes:
1. What happened? One morning someone in India tried to order food via UberEats, using Paytm as a payment method. But they didn't have enough balance.

Got an error message.

Ordered again.

The order went through!! Without having money for it.

News spread quickly.
2. This was a payments-related bug. The problem with these is that the bug only shows up in the reconciliation flow. And Uber reconciled with Paytm maybe once a week.

How Uber discovered this: restaurants started going offline thanks to huge order quantities in very short times.
3. After it was clear something was up, Uber shut down Paytm as a payment method and started the investigation.

My team owned the Paytm payment method at the time, so this was me and my team.

We naturally looked at what code changes we'd made in the timeframe. None.
4. So if we made zero changes on our end, what happened?

Turns out the Paytm team did a change late on a Friday that looked innocent enough.

It silently changed an API endpoint from idempotent to non-idempotent behavior.

Why does idempotency matter?
5. Idempotency means that you can safely repeat requests as you get the same response every time.

I remember the endpoint was charge-related.

Before, it always returned the same error when trying to charge a wallet without enough credits. With the change, not anymore:
6. Before
1. "Try to charge wallet X without funds" -> Error1
2. "Try to charge wallet X without funds again" -> Error1

After
1. "Try to charge wallet X without funds" -> Error1
2. "Try to charge wallet X without funds again" -> A Brand New Error
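
The before/after above can be sketched in a few lines. This is a toy stand-in, not Paytm's real API: the class `ToyWalletApi`, the status strings, and the `RETRY_RECEIVED` name for the new reply are all made up for illustration. It only models the failed-charge path the thread describes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWalletApi:
    """Toy stand-in for a wallet provider. Not Paytm's real API."""

    balance: int = 0
    idempotent_errors: bool = True
    failed_attempts: int = field(default=0, init=False)

    def charge(self, amount: int) -> str:
        if self.balance >= amount:
            self.balance -= amount
            return "SUCCESS"
        self.failed_attempts += 1
        # "Before": every failed attempt gets the same reply.
        # "After": only the first failed attempt does.
        if self.idempotent_errors or self.failed_attempts == 1:
            return "INSUFFICIENT_FUNDS"  # "Error1" in the thread
        return "RETRY_RECEIVED"  # made-up name for the new, undocumented reply


def replies_for_two_attempts(api: ToyWalletApi, amount: int) -> tuple[str, str]:
    """Charge the same empty wallet twice and return both replies."""
    return api.charge(amount), api.charge(amount)


before = replies_for_two_attempts(ToyWalletApi(idempotent_errors=True), 500)
after = replies_for_two_attempts(ToyWalletApi(idempotent_errors=False), 500)
print(before)
print(after)
```

Comparing the two replies for an identical repeated request is also a cheap contract check a consumer could run against a provider sandbox, if one exists.
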
7. Now this might look like a small change, but on Uber's side, the assumption was the endpoint was idempotent, so there was no testing on getting anything else back. The new error was unknown and not mapped to anything.

Long story short it was interpreted as "success".
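
The thread does not show Uber's actual code, so the following is only one plausible shape of such a mapping, with hypothetical status names: anything that is not on the list of known errors is treated as a successful charge.

```python
KNOWN_ERRORS = {"INSUFFICIENT_FUNDS", "WALLET_BLOCKED"}  # hypothetical codes


def charge_succeeded_fail_open(status: str) -> bool:
    """Fail-open mapping: only replies on the known-error list block the order."""
    return status not in KNOWN_ERRORS


print(charge_succeeded_fail_open("INSUFFICIENT_FUNDS"))
print(charge_succeeded_fail_open("SOMETHING_NEW"))
```

The second call is the problem case: a reply nobody has seen before falls through the error list and the order is accepted.
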
8. So Paytm returned an error never documented before, without telling its partners. Some partners assumed that idempotency changes are breaking API changes that would be communicated, but this one was not. Uber was one of these partners.

The result? Free food until discovered.
9. So who paid for the free food?

Restaurants got paid and customers abusing this functionality were never pursued.

The responsible party needed to foot the bill. But who was responsible?
10. I can't share the settlement, so leaving a poll here to decide. Who do you think should have footed the cost for the bug?

The API provider changing their API to return a new error? The API consumer not parsing a new error introduced - but not communicated?

Who should pay?
Both parties were at fault here, which is why liability is tricky.

1. The API consumer should have coded more defensively & not assumed implicit API behaviors are deliberate.

2. The API provider should have communicated changes ahead of time, and not provided implicit idempotency.
Being in the middle of this outage, a few things I learned:

- Don't assume "unknown" means "good". Assume the opposite.

- The worst outages make for the best stories later.

- College students can eat SO MUCH. They were responsible for the majority of food orders during the outage!
Just to make things more gray, a correction. The new API behavior was not a clear-cut error, if my memory is correct:

1. "Try to charge wallet X without funds" -> Error1 (as before)
2. "Try to charge wallet X without funds again" -> A status that is not an error (also not success)
Lots of questions on “why did Uber not handle HTTP error codes?”

Because there were none. This API at the time returned only 200s, where the body had a message to be parsed which indicated success / status message / error.

Status codes would have made this trivial to catch.
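
A small sketch of why a generic HTTP check does not help with an API like this. The JSON shape and the `status` field are assumptions for illustration; the thread only says the body carried a message that had to be parsed.

```python
import json


def http_says_ok(http_status: int) -> bool:
    """The generic check most HTTP clients give you for free."""
    return 200 <= http_status < 300


def status_from_body(body: str) -> str:
    """Pull the real outcome out of the body (hypothetical 'status' field)."""
    payload = json.loads(body)
    return str(payload.get("status", "MISSING"))


# Both responses arrive as HTTP 200; only the body tells them apart.
responses = [
    (200, '{"status": "SUCCESS"}'),
    (200, '{"status": "INSUFFICIENT_FUNDS"}'),
]
for http_status, body in responses:
    print(http_says_ok(http_status), status_from_body(body))
```

With a 4xx code on the declined charge, the first check alone would have flagged it, whatever the body said.
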
“Did you have tests?”

Yes! As always the integration was unit tested with all possible API behaviours *at the time of building the integration*.

“Could you not have failed closed vs failing open?”

Of course we should have. It’s the moral of the story from the consumer side.
Why would you *ever* fail open when there’s something unknown?

Growth! You prefer to provide a great experience even if the provider has issues. Reconcile later.

This was the case in 2015, when the integration code was written. By 2019, the mentality changed. The code: not yet.
Lots of replies on the payments API design.

I don’t want to give Paytm a hard time: they were a lot better vs lots of other PSPs we worked with (my team owned ~15 PSP integrations). We integrated with *much* worse APIs & providers.

Paytm - unlike many - kept & keeps improving.
Ah, and Willem led writing the postmortem on our side (Uber). Here are takeaways we had (from memory):

One thing I *really* appreciated at Uber was how every outage was treated as a learning opportunity. It was a blameless culture and boy, did we learn.

Lots of people saying Uber should have just interpreted the unknown message as “unsuccessful”. Not quite.

Here’s a story from a startup that did just that… double and triple charging their customers.

Alerting on never-before-seen responses is key over just assuming yay or nay.
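
One way to sketch "alert instead of assuming yay or nay" is a three-way classification where an unrecognized reply is neither paid nor declined. The status names, the `Outcome` enum and the `alert` callback are hypothetical; a real integration would page someone or emit a metric rather than append to a list.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PAID = "paid"
    DECLINED = "declined"
    UNKNOWN = "unknown"


PAID_STATUSES = {"SUCCESS"}  # hypothetical
DECLINED_STATUSES = {"INSUFFICIENT_FUNDS"}  # hypothetical


def classify(status: str, alert: Callable[[str], None]) -> Outcome:
    """Map a provider reply to an outcome; alert on anything unrecognized."""
    if status in PAID_STATUSES:
        return Outcome.PAID
    if status in DECLINED_STATUSES:
        return Outcome.DECLINED
    alert(f"never-before-seen charge status: {status!r}")
    return Outcome.UNKNOWN


alerts: list[str] = []
print(classify("SUCCESS", alerts.append))
print(classify("INSUFFICIENT_FUNDS", alerts.append))
print(classify("SOMETHING_NEW", alerts.append))
print(alerts)
```

What the caller does with `UNKNOWN` (hold the order, check the charge with the provider before retrying) is a separate decision; the point of the sketch is only that the unknown case is visible instead of silently folded into success or failure.
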
