
Charity Majors (@mipsytipsy), 8 tweets, 2 min read


https://twitter.com/IMatt611/status/1203607952755585024


Yes, yes. Sorry. I did not mean "screw TDD, find every bug in production and ONLY production". That would be too obviously ridiculous, or so I thought ☺️, but many did read it that way.

It is cheap and easy to find known-unknowns, so we should check for them CONSTANTLY.


So much of running production systems comes down to your skill at turning unknown-unknowns into known-unknowns with some consistency.

Every unknown-unknown is an engineering problem. It requires creativity, novel thinking, and an open-ended amount of time to resolve.

Every known-unknown is a support problem. A pattern matching problem, an index lookup problem.

Unknown-unknowns are incredibly, catastrophically expensive to debug, especially for teams who do it rarely and/or have not built their systems with an eye to doing it well.

Unless you are a world-class SRE team, your team can handle *maybe* one per week and still ship some code.

More than that, and your roadmap would grind to a halt. Or you'd do what everyone else does and cut corners. Many, many corners.

If it recovered on its own somehow, you'd shrug uneasily and get back to work, never actually knowing what happened.

You'd file a bunch of tasks on how to fix it so the problem wouldn't happen again, and let the tasks rot away in Jira forever.

You'd document the symptoms and rely on people's memory, instead of instrumenting the system to be clearer and saner.

You'd talk, again, about adding Honeycomb. (Maybe next year, or whenever things are "less busy".)

And your system would sink a little further into the bog of unintelligibility every time. But I get it. I've been there. Doing things well takes time.

That's why it's of such shimmering importance that, when you turn an unknown-unknown into a known-unknown, you at the very least DO NOT LOSE IT.

Write tests to check for software regressions. Write monitoring checks to check for system regressions. Run them a lot.
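As a minimal sketch of "do not lose it", assume a hypothetical incident in which a config parser crashed on empty input and a work queue backed up. Once understood, that knowledge can be captured twice: as a regression test (software) and as a monitoring check (system). Every name, threshold, and metric below is hypothetical and for illustration only; none of it comes from the thread.

```python
# Hypothetical example: once an unknown-unknown is understood, encode it twice.
# 1) A regression test, so the software bug cannot silently come back.
# 2) A monitoring check, so the system-level symptom is caught if it recurs.
from dataclasses import dataclass


def parse_batch_size(raw: str) -> int:
    """Hypothetical function that once crashed in production on empty input."""
    raw = raw.strip()
    if not raw:
        # The fix for the (hypothetical) incident: default to 1 instead of raising ValueError.
        return 1
    return int(raw)


def test_parse_batch_size_handles_empty_input() -> None:
    # Regression test: pins down the known-unknown we discovered the hard way.
    assert parse_batch_size("") == 1
    assert parse_batch_size("   ") == 1
    assert parse_batch_size("32") == 32


@dataclass
class CheckResult:
    ok: bool
    message: str


def check_queue_depth(depth: int, threshold: int = 10_000) -> CheckResult:
    """Hypothetical monitoring check for the symptom seen during the incident."""
    if depth > threshold:
        return CheckResult(False, f"queue depth {depth} exceeds {threshold}")
    return CheckResult(True, f"queue depth {depth} ok")


if __name__ == "__main__":
    test_parse_batch_size_handles_empty_input()
    print(check_queue_depth(250))
    print(check_queue_depth(25_000))
```

In practice the test would run in CI on every change and the check would run on a schedule against live metrics; that repetition is the "run them a lot" part.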
