Gergely Orosz

@GergelyOrosz

Mar 13, 2022 • 20 tweets • 5 min read

As it's been ~3 years, figured I'll answer "What caused the Uber Eats glitch that allowed ordering free food for a weekend in India?"

This was an outage on my watch. Given Quora is paywalled - can't post the answer w/o a sub - here's the story on idempotency & breaking changes:

1. What happened? One morning someone in India tried to order food via UberEats, using Paytm as a payment method. But they didn't have enough balance.

Got an error message.

Ordered again.

The order went through!! Without having money for it.

News spread quick.

2. This was a payments-related bug. The problem with these is that the bug sat in the reconciliation flow, and Uber reconciled with Paytm maybe once a week.

How Uber discovered this: restaurants started going offline thanks to huge order quantities in very short times.

3. After it was clear something was up, Uber shut down Paytm as a payment method and started the investigation.

My team owned the Paytm payment method at the time, so this was me and my team.

We naturally looked at what code changes we'd made in the timeframe. None.

4. So if we made zero changes on our end, what happened?

Turns out the Paytm team did a change late on a Friday that looked innocent enough.

It silently changed an API endpoint from behaving idempotently to behaving non-idempotently.

Why does idempotency matter?

5. Idempotency means that you can safely repeat requests as you get the same response every time.

I remember the endpoint was charge-related.

Before, it always returned the same error when trying to charge a wallet without enough credits. With the change, not anymore:

6. Before
1. "Try to charge wallet X without funds" -> Error1
2. "Try to charge wallet X without funds again" -> Error1

After
1. "Try to charge wallet X without funds" -> Error1
2. "Try to charge wallet X without funds again" -> A Brand New Error
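A minimal sketch of the before/after behavior described above. Everything here is hypothetical: the `Wallet` class, the `stable_errors` flag and the status strings `INSUFFICIENT_FUNDS` and `RETRY_PENDING` are invented for illustration and are not Paytm's actual API.

```python
class Wallet:
    """Hypothetical stand-in for the provider's charge endpoint."""

    def __init__(self, balance, stable_errors):
        self.balance = balance
        # True models the old behavior: a failed charge always
        # returns the same error, however often it is repeated.
        self.stable_errors = stable_errors
        self.failed_attempts = 0

    def charge(self, amount):
        if amount <= self.balance:
            self.balance -= amount
            return "SUCCESS"
        self.failed_attempts += 1
        if self.stable_errors or self.failed_attempts == 1:
            return "INSUFFICIENT_FUNDS"  # "Error1" in the thread
        return "RETRY_PENDING"  # the new, undocumented response


before = Wallet(balance=0, stable_errors=True)
after = Wallet(balance=0, stable_errors=False)

before_responses = [before.charge(100), before.charge(100)]
after_responses = [after.charge(100), after.charge(100)]
```

The same two requests give two identical answers in the "before" case and two different answers in the "after" case, which is the whole change.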

7. Now this might look like a small change, but on Uber's side, the assumption was the endpoint was idempotent, so there was no testing on getting anything else back. The new error was unknown and not mapped to anything.

Long story short it was interpreted as "success".
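One way an unmapped response can end up as "success" is a consumer that only knows the errors it has seen. This is a sketch under that assumption, not Uber's actual code; `KNOWN_ERRORS`, `order_allowed_fail_open` and the status strings are made up.

```python
# Errors the consumer learned about when the integration was built
# (hypothetical list).
KNOWN_ERRORS = {"INSUFFICIENT_FUNDS"}


def order_allowed_fail_open(status):
    # Fail-open rule: anything that is not a known error
    # is treated as a successful charge.
    return status not in KNOWN_ERRORS


first_attempt = order_allowed_fail_open("INSUFFICIENT_FUNDS")
second_attempt = order_allowed_fail_open("SOME_NEW_STATUS")
```

Here `first_attempt` is rejected, while `second_attempt` goes through even though no money moved.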

8. So Paytm returned an error never documented before without telling its partners. Some partners assumed idempotency changes are breaking API changes to be communicated: but they were not. Uber was one of these partners.

The result? Free food until discovered.

9. So who paid for the free food?

Restaurants got paid and customers abusing this functionality were never pursued.

The responsible party needed to foot the bill. But who was responsible?

10. I can't share the settlement, so leaving a poll here to decide. Who do you think should have footed the cost for the bug?

The API provider changing their API to return a new error? The API consumer not parsing a new error introduced - but not communicated?

Who should pay?

Both parties were at fault here, which is why liability is tricky.

1. The API consumer should have coded more defensively & not assumed implicit API behaviors are deliberate.

2. The API provider should have communicated changes ahead of time, and not provided implicit idempotency.

Being in the middle of this outage, a few things I learned:

- Don't assume "unknown" means "good". Assume the opposite.

- The worst outages make for the best stories later.

- College students can eat SO MUCH. They were responsible for the majority of food orders during the outage!

Just to make things more gray, a correction. The new API behavior was not a clear-cut error if my memory is correct:

1. "Try to charge wallet X without funds" -> Error1 (as before)
2. "Try to charge wallet X without funds again" -> A status that is not an error (also not success)

Lots of questions on “why did Uber not handle HTTP error codes?”

Because there were none. This API at the time returned only 200s where the body had a message to be parsed which indicated success / status message / error.

Status codes would have made this trivial to catch.
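To illustrate why the HTTP layer could not help here, a small sketch of a response where the outcome lives only in the body. The JSON shape and the `status` field are assumptions for the example; the thread does not describe the real payload.

```python
import json


def read_charge_response(http_status, body):
    """Return (http_ok, provider_status) for a charge response."""
    http_ok = 200 <= http_status < 300
    payload = json.loads(body)
    # Hypothetical field name; fall back to UNKNOWN if it is missing.
    provider_status = payload.get("status", "UNKNOWN")
    return http_ok, provider_status


# Both responses arrive as HTTP 200; only the body tells them apart.
paid = read_charge_response(200, '{"status": "SUCCESS"}')
declined = read_charge_response(200, '{"status": "INSUFFICIENT_FUNDS"}')
```

A check on `http_ok` alone passes for both calls, so all of the decision logic has to sit in the body parsing.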

“Did you have tests?”

Yes! As always the integration was unit tested with all possible API behaviours *at the time of building the integration*.

“Could you not have failed closed vs failing open?”

Of course we should have. It’s the moral of the story from the consumer side.

Why would you *ever* fail open when there’s something unknown?

Growth! You prefer to provide a great experience even if the provider has issues. Reconcile later.

This was the case in 2015, when the integration code was written. By 2019, the mentality changed. The code: not yet.

Lots of replies on the payments API design.

I don’t want to give Paytm a hard time: they were a lot better vs lots of other PSPs we worked with (my team owned ~15 PSP integrations). We integrated with *much* worse APIs & providers.

Paytm - unlike many - kept & keeps improving.

Ah, and Willem led writing the postmortem on our side (Uber). Here are takeaways we had (from memory):

One thing I *really* appreciated at Uber was how every outage was treated as a learning opportunity. It was a blameless culture and boy, did we learn.

https://twitter.com/wspruijt/status/1503316486798168068?s=20&t=PxgmLuwKcqTzHBlyOrxxAg

Lots of people saying Uber should have just interpreted the unknown message as “unsuccessful”. Not quite.

Here’s a story from a startup that did just that… double and triple charging their customers.

Alerting on never-before-seen responses is key over just assuming yay or nay.
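A sketch of what "alert instead of assuming yay or nay" could look like: known responses map to paid or declined, and anything else becomes its own outcome and raises an alert. All names here (`Outcome`, `classify`, the status sets, the `alerts` list standing in for a paging system) are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    PAID = "paid"
    DECLINED = "declined"
    UNKNOWN = "unknown"  # neither charged nor safe to retry blindly


SUCCESS_STATUSES = {"SUCCESS"}
ERROR_STATUSES = {"INSUFFICIENT_FUNDS"}

alerts = []  # stand-in for a real alerting or paging hook


def classify(status):
    if status in SUCCESS_STATUSES:
        return Outcome.PAID
    if status in ERROR_STATUSES:
        return Outcome.DECLINED
    # Never-before-seen response: record it for a human to look at
    # rather than guessing success or failure.
    alerts.append(status)
    return Outcome.UNKNOWN


result = classify("RETRY_PENDING")
```

The caller can then hold the order while the unknown status is investigated, which avoids both free food and double charging.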

More from @GergelyOrosz

Gergely Orosz

@GergelyOrosz

Jul 27

Amusing use of LLMs at a more traditional company:

“A project with ~50 people got stuck. There are too many JIRA tickets, no clear specification, and anytime one team tries to make progress, the others shoot it down.

So a dev built an LLM to try and break the deadlock: (cont’d)

- Fed all JIRA tickets to the LLM. Built a basic RAG with vector DB

- Had it generate questions about the project, about topics not covered by the tickets

- Had the LLM attempt to answer the same questions

- Generated a report of what areas are not specified

- Tried to use this to stop teams rejecting suggestions “because this is not well specified”

A PM at this company told me this story. Asked him if this LLM helped break the deadlock? His response:

“No. We’re still stuck. But it was good fun to build it and an excuse to play around with vector databases!”

Ha.

Gergely Orosz

@GergelyOrosz

Jul 13

Regarding the Windsurf sale (part of the team acquihired by Google, prob a great exit, but not all the team):

I feel we’re forgetting well-funded startups today are NOT scrappy startups in the past where employees work for pennies, paid well under market.

It's a different game

What is true, and always has been true: founders and decision makers always have the biggest potential upside - for anything! Including negotiating an acquisition.

This is why so many accomplished employees eventually become founders - because it's how you have more control of your destiny

It still stings to have some people get much better outcomes during an acquisition.

It’s a reminder that as an employee, you really don’t have leverage beyond hoping founders look out for you… sometimes they do, sometimes they don’t

Gergely Orosz

@GergelyOrosz

Jul 9

There was this engineer on my team a while back who was: a good dev, but not the best dev. Got everything done. But had zero ego, a very nice personality, and got along with *everyone* on the team very well.

When he joined, the team became... better. Nicer. More balanced.

I just got a reference check about this dev, asking the usual questions ("what is an example where they delivered over and beyond," "how did they execute", "what are growth areas" etc)

He did fine on all of them, but I still think how much better he made my team. With stuff that's hard/impossible to measure!

Makes me realize how hiring is not focused on this stuff: "how would this person make the team better."

I guess, it is hard to be focused on this.

But this was one of the *very* rare devs who made every team much better. Nicer. More motivated. More a "team."

Still think of it

Gergely Orosz

@GergelyOrosz

Jun 30

So predictable that we’ll see an explosion of digital products selling “ideas for million dollar businesses” that you can “just vibe code quickly”.

Basically: “buy my digital product for $500, spend $1,500 on Lovable / Claude Code and become a millionaire.”

Another hype train

Ofc these products promoted by influencers will work just as well as crypto sh*tcoins launched by influencers in 2023.

We’ll see doctored evidence (“someone who built one of the ideas is at $5K MRR after 2 weeks”) and nontechnical people will spend thousands for $0 in return

The predictable winners: AI infra companies! Lovable, Vercel (with v0), Claude Code, Cursor, Replit, Gemini and any and all products that (at least partially) position themselves as “AI tools to build your idea that work even if you’re not a developer”

And it’s started. A gold rush where the surest winners are those selling the shovels!

Gergely Orosz

@GergelyOrosz

Jun 28

https://twitter.com/anthropicai/status/1938630310985601131

I generally like Anthropic: but the more they paint a dystopian future where AI “manages” people (“AI middle-managers”) the more I am starting to think they are losing their marbles.

LLMs are a tool humans should use. The tail should not wag the dog; Anthropic should know better

https://twitter.com/anthropicai/status/1938630312734474248

And frankly I’m getting tired of Anthropic being loud about how their AI will lead to mass unemployment, while claiming to be a responsible lab to develop AI.

If your master plan is to wipe out the labor market for profit: you’re not responsible.

You cannot have both.

I DO feel recently that Anthropic is the single least responsible lab out there.

Thanks to their CEO parroting how their AI will lead to massive job losses: not being concerned in the least, and seemingly *wanting* this outcome (even if it’s not realistic).

aimagazine.com/articles/white…

Gergely Orosz

@GergelyOrosz

May 28

Something I hear very little talk about:

How AI coding tools are so much LESS useful when used on existing, large codebases at work (with custom frameworks, conventions, coding style etc)

... compared to doing greenfield work or side projects

So common for me to hear: "yeah I love it on my side projects, but at work it's 'meh'"

I'm getting details talking with devs at the likes of eg Google, Meta, Microsoft: the companies building some of the best AI coding tools out there!

And yet, for their existing codebases, the usefulness is marginal. Mostly for autocomplete (that has a higher miss rate than for greenfield)

https://x.com/clemkeirua/status/1927717022424608923

And yes, surely there are workarounds. I just don't hear much of these used or successfully used!

Point is almost all success stories I hear are greenfield ones or small projects, or ones started with these tools

Using them on larger ones is a bigger challenge