Charity Majors @mipsytipsy
One of the dirtiest secrets in systems engineering is just how many outages are never really fully explained or understood.

Or how many *can't* actually be explained or understood, given existing telemetry.
I was talking to a friend the other day -- a good engineer! -- and he confessed sheepishly that "most of the times the site goes down, we never actually understand why. It recovers on its own, and we don't really understand why it recovered either."
This is way, way more common than you'd think. We all do it. Most transient problems do resolve themselves before long, hours are limited, debugging appetite gets exhausted.

"Huh. Well if it happens again we will *definitely* need to figure out what's going on." ~everyone
But our tools are also a hindrance more than a help, all too often.

You're trying to debug why an event failed, and all you have are time series aggregates and metrics where the context has been stripped away and discarded.
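To make that concrete, here is a minimal sketch of the difference. Every event, field name, and value below is made up for illustration, and `error_rate` and `failures_by` are hypothetical helpers, not any real tool's API. The aggregate tells you *that* something failed; only the raw events, with their context still attached, let you ask *which* requests failed.

```python
from collections import Counter

# Hypothetical request events. All fields and values are invented for this sketch.
events = [
    {"endpoint": "/export", "status": 500, "customer": "acme", "build": "v42"},
    {"endpoint": "/export", "status": 500, "customer": "acme", "build": "v42"},
    {"endpoint": "/export", "status": 200, "customer": "globex", "build": "v41"},
    {"endpoint": "/home", "status": 200, "customer": "acme", "build": "v41"},
    {"endpoint": "/home", "status": 200, "customer": "initech", "build": "v41"},
    {"endpoint": "/home", "status": 200, "customer": "globex", "build": "v42"},
]


def error_rate(events):
    """The pre-aggregated metric: a single number with the context discarded."""
    failures = sum(1 for e in events if e["status"] >= 500)
    return failures / len(events)


def failures_by(events, *fields):
    """Count failed events grouped by any combination of fields kept on the event."""
    return Counter(
        tuple(e[f] for f in fields) for e in events if e["status"] >= 500
    )


rate = error_rate(events)
breakdown = failures_by(events, "customer", "build")
print(rate)
print(breakdown)
```

Once only `rate` has been stored, no amount of staring at it will tell you that the failures all share one customer and one build. That question can only be answered if the events were kept.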
... Whoops gotta go, will pick this rant up this evening 🐝💖😘
(on break, resuming rant)

Another reason we can't explain our outages is that usually all we have are dashboards, and the scars and memories of past outages.
A dashboard is not a debugger. You can't step through it. It's just a view of some facts about your system. All the relationships and flow between components exist only in your head.
So you stare at it, and you think real hard about how it might be working and what scenario might explain the data you see, and go looking for more facts to support or disprove the hypothesis.

Can you imagine if we debugged lines of code this way? By thinking hard??
Getting the events and relationships and flow out of your head and into something you can actually explore and play with is a good first step. Like @honeycombio.

For software you own and instrument yourself, this should bend asymptotically over time toward a debuggable system.
But then there's the shit you don't own and can't instrument, that you can only wave at and speculate about in their horrendous little black boxes.