Ben Kuhn
Care a lot and try hard • making language models safer @AnthropicAI • prev CTO @WaveSenegal 🐧❤️

Sep 6, 2020, 6 tweets

This week vindicated my "all infrastructure config is code that lives in a Git repo" paranoia more quickly than I ever expected 😬

Someone hit "delete project" in Firebase, expecting to delete a test app; turns out this actually deletes the GCP project, i.e. prod. (WTF WTF WTF)

(And no, the "are you sure?" confirmation modal did not clarify this at all.)

Fortunately "delete" actually means "shut down all instances + schedule data deletion for tomorrow," so we could immediately notice and turn things back on.

We were mostly recovered within ~45 min (most of which was waiting for Kube clusters to boot back up). The 2 machines that took longer were exactly the ones where we'd messed with iptables by hand and never verified that they could be rebuilt from scratch.

Anyway, the moral of the story for me is that there are more vectors than I thought for "I got confused about which resource I was operating on and deleted prod"—e.g., I did NOT expect a different web console w/ different UI and branding to have a "delete prod" button.
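One way to narrow that vector in tooling you control is a confirmation step that spells out what will go away and only accepts the exact project ID typed back. This is a minimal sketch, not anything Firebase or GCP provides: `deletion_prompt`, `confirmed`, and the project and resource names in the usage example are all hypothetical.

```python
def deletion_prompt(project_id: str, affected: list[str]) -> str:
    """Build a confirmation message that names every resource the delete would hit.

    `affected` is whatever inventory the caller already has; this sketch does not
    look anything up itself.
    """
    lines = [f"This will delete project '{project_id}' and everything in it:"]
    lines += [f"  - {name}" for name in affected]
    lines.append(f"Type the project ID ({project_id}) to continue.")
    return "\n".join(lines)


def confirmed(project_id: str, typed: str) -> bool:
    """Accept only an exact match on the project ID, never a bare 'y' or 'yes'."""
    return typed.strip() == project_id


if __name__ == "__main__":
    # Hypothetical project and resources, for illustration only.
    print(deletion_prompt("acme-prod", ["GKE cluster: main", "Cloud SQL: users-db"]))
    if confirmed("acme-prod", input("> ")):
        print("Confirmed; the real delete call would go here.")
    else:
        print("Aborted.")
```

Typing the ID back does not fix a misleading console, but it forces the operator to read which project they are about to remove.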

And so the "repayment period" for the investment of "all infra is code in a repo"—which I thought was mostly about improving long-term auditability/maintainability—is (on expectation) quicker than I thought!

(The other moral of the story is that if you, like me, hand-edit iptables rules on Wednesday and promise to turn it into checked-in code on Friday, then Thursday will be the day someone deletes prod and your coworker spends 1h debugging to realize they were reset 🤦‍♂️)
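A cheap guard against the Wednesday-to-Friday gap is a drift check that compares the rules checked into the repo with what `iptables-save` reports on the machine. The sketch below assumes the repo stores rules in `iptables-save` format; the function names are hypothetical, and it only detects drift rather than fixing it.

```python
import difflib
import re
import subprocess

# iptables-save prints packet/byte counters like [120:9000]; they change
# constantly and say nothing about the rules, so strip them before comparing.
_COUNTERS = re.compile(r"\[\d+:\d+\]")


def normalize(ruleset: str) -> list[str]:
    """Drop comments, blank lines, and counters so only the rules remain."""
    lines = []
    for raw in ruleset.splitlines():
        line = _COUNTERS.sub("", raw).strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def rule_drift(checked_in: str, live: str) -> list[str]:
    """Return unified-diff lines between repo and live rules; empty means no drift."""
    return list(
        difflib.unified_diff(
            normalize(checked_in),
            normalize(live),
            fromfile="repo",
            tofile="live",
            lineterm="",
        )
    )


def live_rules() -> str:
    """Dump the running rules. Needs root and iptables installed."""
    result = subprocess.run(
        ["iptables-save"], check=True, capture_output=True, text=True
    )
    return result.stdout
```

Running `rule_drift(open("rules.v4").read(), live_rules())` from cron or CI (where `rules.v4` is a placeholder path) and alerting on a non-empty result would surface a hand edit before the next rebuild silently drops it.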
