Charity Majors @mipsytipsy
10 tweets, 2 min read
Real talk, you should never have a paging alert on a system stats metric. Or a single-host anything metric. (Or an aggregate host metric, or an aggregate divided by host count, or ...)
Sometimes it’s the only hammer you have, and you gotta do what you do; sure.

But it smells terrible and it does not scale. This “quick fix” is gonna drain your ops team (or whoever carries the pager) of their life force faster than anything I can think of.
System statistics are fucking meaningless. They mean nothing. What does the number “10” mean? Sound and fury, signifying nothing. Except in the context of a struggling application.
There is no such thing as objective health, there is only “can my application get the resources it needs to do its job.”

So pay attention to that.
Any time I see people purchasing ML or AI to sift through thousands of useless system alerts for load averages or TCP stack stats, I want to scream at them. JUST DELETE THEM ALL. Every last motherloving one of them. They are actively harmful.
You’re generating the noise that’s making it impossible to locate the signal. Stop hurting yourself!

Declare bankruptcy. Delete all your paging alerts today. Liberate yourselves from the tyranny of a thousand small cuts.
If you trust your ability to introspect and debug your systems in real time, all you really need are the big four: latency, errors, request rate, and (possibly) saturation, plus a few well-curated end-to-end checks that traverse your critical paths. The ones that make you money.
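Here is roughly what paging on the big four could look like, as a minimal Python sketch rather than any particular product's API. The `Request` record, the helper names, and the thresholds (1% errors, 500 ms p99) are hypothetical assumptions; saturation is left out because the thread calls it optional, and the end-to-end checks are not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One served request, as seen by the application (hypothetical record)."""
    duration_ms: float
    ok: bool


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Request rate, error rate, and p99 latency over one time window."""
    if not requests:
        # No traffic at all; whether that should page is a separate decision.
        return {"request_rate": 0.0, "error_rate": 0.0, "p99_latency_ms": 0.0}
    errors = sum(1 for r in requests if not r.ok)
    return {
        "request_rate": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "p99_latency_ms": percentile([r.duration_ms for r in requests], 99),
    }


def should_page(signals: Dict[str, float],
                max_error_rate: float = 0.01,
                max_p99_ms: float = 500.0) -> bool:
    """Page only when users are likely hurting: too many errors or too slow.
    Thresholds are illustrative placeholders, not recommendations."""
    return (signals["error_rate"] > max_error_rate
            or signals["p99_latency_ms"] > max_p99_ms)


if __name__ == "__main__":
    window = [Request(120, True)] * 98 + [Request(900, False)] * 2
    signals = golden_signals(window, window_s=60)
    print(signals, should_page(signals))
```

With 2% of requests failing at 900 ms, both conditions trip and `should_page` returns True; a window of 100 fast successes would not page. Nothing here looks at CPU, load average, or any other host stat.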
If you don’t trust your ability to debug and explore your systems in real time, well, I’ve got a service you might like to try. 🐝

Sooner rather than later. The shift doesn’t happen overnight, and this shit is only going to get worse for you.
Circling back to my original point, though: the other reason you don’t page on system stats is that they are noisy af.

Things break. All the fucking time.
The anticipation of continuous, disastrous failure is what separates the dev from the ops more than anything. Everything fails, all the time. And it’s fine.
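To make the noise point concrete, here is a small Python sketch comparing a per-host error alert with a single fleet-wide, user-facing error rate. The host names, counts, thresholds, and the assumption that a load balancer has already drained most traffic away from the failing host are all invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical one-minute counters per host: (requests served, requests failed).
HostStats = Dict[str, Tuple[int, int]]


def per_host_alerts(stats: HostStats, max_host_error_rate: float = 0.05) -> List[str]:
    """Hosts a naive per-host alert rule would page on."""
    return sorted(
        host
        for host, (served, failed) in stats.items()
        if served and failed / served > max_host_error_rate
    )


def fleet_error_rate(stats: HostStats) -> float:
    """Fraction of all requests, across every host, that failed."""
    served = sum(s for s, _ in stats.values())
    failed = sum(f for _, f in stats.values())
    return failed / served if served else 0.0


def page_on_fleet(stats: HostStats, max_fleet_error_rate: float = 0.01) -> bool:
    """Page on what users experience, not on which box is unhappy."""
    return fleet_error_rate(stats) > max_fleet_error_rate


if __name__ == "__main__":
    # web-3 is failing, but (by assumption) the load balancer has mostly
    # drained it, so it serves only a handful of requests.
    stats = {"web-1": (5000, 3), "web-2": (5000, 4), "web-3": (20, 20)}
    print(per_host_alerts(stats))  # ['web-3']
    print(page_on_fleet(stats))    # False
```

The per-host rule wakes someone up about web-3; the fleet-wide rule stays quiet, because only 27 of 10,020 requests failed and users barely noticed.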

(This is why I’m so fucking giddy about this nascent “DesignOps” movement. Yesss let us infect them with our values)