i want to start a series called "That's It?" where we walk through horrendous outages that took many people many hours of debugging time to resolve, with all the metrics and clues they used

then show how they would have found it in honeycomb in 1-2 clicks, every goshdarn time.
a dear friend visited last week. they run a tight shop with good engineers, but have ... outgrown their tools.

he was describing this thundering herd problem, where thousands of workers would spin up and hammer the one redis cpu. it took a long time to discern this and why.
he was describing all the truly impressive heroics they underwent, and then he says skeptically "and honeycomb would help me with this ... how?"

me "oh god so easy. just sum up all the time spent by the workers, break down by backend or userid, either way it's *right there*."
me "and that's, like, the slow and manual old fashioned way! nowadays we'd say 'start with a heatmap of the latencies, then draw bubbleup around the thing you want to understand.' it computes ALL the dimensions and sifts the ones that differ to the top, no guessing necessary!"
which is super powerful since usually if you're just guessing and using your human brain, you might guess one or two of the causes, but not all of them.

e.g. when the errors are all a particular version of ios, device, language pack, region, hitting a certain endpoint, etc.
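a rough sketch of that "compute all the dimensions, sift the ones that differ to the top" idea, in plain python. this is not honeycomb's implementation or api, just the general shape: compare how often each field/value pair shows up in the events you selected versus everything else. the function name, field names, and sample events are all made up for illustration.

```python
from collections import Counter


def differing_dimensions(selected, baseline, top=5):
    """Rank (field, value) pairs by how over-represented they are in
    `selected` compared to `baseline`. Events are plain dicts with
    hashable values (an assumption of this sketch)."""

    def shares(events):
        # Fraction of events that carry each (field, value) pair.
        counts = Counter()
        for event in events:
            for field, value in event.items():
                counts[(field, value)] += 1
        total = len(events) or 1
        return {key: n / total for key, n in counts.items()}

    sel, base = shares(selected), shares(baseline)
    diffs = {
        key: sel.get(key, 0.0) - base.get(key, 0.0)
        for key in set(sel) | set(base)
    }
    # Biggest positive difference first: these are the values that make
    # the selected events stand out from the rest.
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical events: `selected` is the slow blob you drew a box around,
# `baseline` is everything outside it.
baseline = [
    {"os": "ios 12.1", "endpoint": "/home", "region": "us"},
    {"os": "android 9", "endpoint": "/home", "region": "us"},
    {"os": "ios 12.0", "endpoint": "/export", "region": "eu"},
    {"os": "android 9", "endpoint": "/home", "region": "eu"},
]
selected = [
    {"os": "ios 12.1", "endpoint": "/export", "region": "us"},
    {"os": "ios 12.1", "endpoint": "/export", "region": "eu"},
]

ranked = differing_dimensions(selected, baseline)
for (field, value), diff in ranked:
    print(f"{field}={value}: {diff:+.2f}")
```

in this toy data the os version and the endpoint both float to the top at once, while region stays flat. that is the point: you don't have to guess which dimension to check first.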
another edition of "That's It?" is what i think of as the @github problem. you have many users, and suddenly one of them gets hacked and starts emitting a stream of bot traffic.

...not enough to get up into your top 10 or 100 users, but enough to put strain on a shared service.
ok. so you can hire a team of ML or AI experts, and train them on massive datasets so they can write tools that "learn" what "normal" looks like, and then drive your ops team bananas with false positives every time a human confuses the AI...

(i said you *could*. i wouldn't.)
i would fucking sum up the resources used, then break down by user id.

"oh look, that guy's consuming 90% of the processing time and he pays us $20/month." block the fucker and go for a drink.

jesus people this isn't rocket science, just good clean fun with high cardinality.
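for the curious, "sum up the resources used, then break down by user id" is roughly this much code once every event carries the fields you care about. a minimal sketch, assuming each event is a dict with a hypothetical `duration_ms` measure plus `user_id` and `backend` fields; none of these names come from a real schema or from honeycomb itself.

```python
from collections import defaultdict


def total_by(events, dimension, measure="duration_ms"):
    """Sum `measure` per distinct value of `dimension`, largest first.
    Returns (value, total, share_of_grand_total) tuples."""
    totals = defaultdict(float)
    for event in events:
        totals[event[dimension]] += event[measure]
    grand_total = sum(totals.values()) or 1.0
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(key, total, total / grand_total) for key, total in ranked]


# Hypothetical request events; u2 is the noisy account.
events = [
    {"user_id": "u1", "backend": "redis-1", "duration_ms": 20},
    {"user_id": "u2", "backend": "redis-1", "duration_ms": 900},
    {"user_id": "u3", "backend": "mysql-1", "duration_ms": 30},
    {"user_id": "u2", "backend": "redis-1", "duration_ms": 50},
]

by_user = total_by(events, "user_id")
by_backend = total_by(events, "backend")

for user, total, share in by_user:
    print(f"{user}: {total:.0f} ms ({share:.0%} of all time)")
```

the same function answers the thundering herd question too: swap `"user_id"` for `"backend"` and the overloaded redis shows up at the top. the catch is that user id is a high-cardinality field, so this only works if your tooling keeps it on every event instead of aggregating it away.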
debugging doesn't have to be that hard. we have MADE it hard by scattering all the relevant detail to the four winds.

of course it's complicated if you're trying to hop from tool to tool to tool, just to recreate what happened from log spew and metrics and traces.
it's as though you're a detective, and your file folder has been shredded and deliberately scattered around the house. and before you can read the fucking folder you have to reconstruct it, oh and it's in invisible ink so you can't even carefully re-attach the shreds by pattern.
(metaphor note: the shreds with invisible ink are metrics, and the shreds with ink that you COULD use to re-assemble them are log lines, at least if you were disciplined enough while emitting them.)

if you just had the fucking folder of data, it probably wouldn't be that hard.
anyway i'm supposed to stop telling stories like this because it makes it sound like it's only useful for people who have SUPER HARD, big, sexy distributed systems or something.

nope. newsflash: your tools suck even if you're small, even if all you have is a monolith.
if your tools are working well for you, godspeed. you need honeycomb if:

* deploys are hard or scary or flaky
* it takes more than a few minutes to deploy
* things happen that are not well understood
* on call is scary and stressful
* you are afraid of your systems
actually, here is the biggest tell of all:

* if the "debugger of last resort" is always the person or people who have been there the longest

... then you have a system that is being propped up by tribal knowledge and tall tales and cargo culting, not true debugging.
i say this sheepishly, because i've *been* that person. i *love* being that person. i love glancing at a dashboard, intuiting the problem, pronouncing it to an astonished room -- and being right. FUCK i love that shit.

i love it, but it's sooo toxic and deadly. for everyone.
i eat my weight in ice cream each month, i think we've confirmed i like things that aren't good for me.

it's better if you move this shit out of our heads and put it in a tool, where it is accessible to everyone. democratize access to systems information.
i've seen a few teams now with solid observability, and one way i can tell is this: the best debuggers aren't the folks who have been there the longest.

they're the people who are persistent and curious, who regularly go poke around in production.
often this is actually the more junior people. the ones who get hooked on the dopamine hit of going and looking and understanding something new, or finding something surprising.

while senior folks often keep limping along with the thing they know that's good enough.
... until the junior people start debugging circles around them, and they make the jump out of professional embarrassment.

(yes yes, talking about myself again)
Keep Current with Charity Majors
