Charity Majors @mipsytipsy

10 tweets, 2 min read

One of the dirtiest secrets in systems engineering is just how many outages are never really fully explained or understood.

Or how many *can't* actually be explained or understood, given existing telemetry.

I was talking to a friend the other day (a good engineer!) and he confessed sheepishly that "most of the time when the site goes down, we never actually understand why. It recovers on its own, and we don't really understand why it recovered either."

This is way, way more common than you'd think. We all do it. Most transient problems resolve themselves before long, hours are limited, and debugging appetite gets exhausted.

"Huh. Well if it happens again we will *definitely* need to figure out what's going on." ~everyone

But all too often, our tools are more of a hindrance than a help.

You're trying to debug why an event failed, and all you have are time series aggregates and metrics where the context has been stripped away and discarded.

... Whoops gotta go, will pick this rant up this evening 🐝💖😘

(on break, resuming rant)

Another reason we can't explain our outages is that usually all we have are dashboards, and the scars and memories of past outages.

A dashboard is not a debugger. You can't step through it. It's just a view of some facts about your system. All the relationships and flow between components exist only in your head.

So you stare at it, think real hard about how it might be working and what scenario might explain the data you see, and go looking for more facts to support or disprove the hypothesis.

Can you imagine if we debugged lines of code this way? By thinking hard??

Getting the events and relationships and flow out of your head and into something you can actually explore and play with is a good first step. Like @honeycombio.

For software you own and instrument yourself, this should bend asymptotically over time toward a debuggable system.

But then there's the shit you don't own and can't instrument, which you can only wave at and speculate about in their horrendous little black boxes.