raise your hand if "CPU usage" is still how you find outages and anomalous events on the regular

(my eyes are closed, i don't fucking want to know. i just want you to see yourself right now.)
Okay, that came out a bit harder edged than it sounded in my head. Sorry!

I think it's incredible that just about everyone still does this. 🥰 Some of the most impressive sites on the internet still rely on it. And I don't mean once in a while, I mean **RELY** on it.
Incidentally, if any #codefreeze folks are up and about, I'm sitting in Beacon with a truly excessive quantity of nachos. Send halp before I coma.
What does that mean?

I mean I know of multiple top-20 websites whose most reliable way of detecting runaway bots and bad actors is that the CPU on their MySQL primary gets pegged.

I mean, no shame. I'm not too proud for a quick, dirty and devastatingly efficient gross hack.
... if only this were one of them. 😬 This isn't so much "detection" as it is "detection that something is bad enough that it's spilling over to affect everyone else."

What about the bots that *aren't* greedy enough to slurp resources as fast as they can get them, for example? They go undetected forever.

And once you suspect you have a bot, you...reload the page a bunch. 😬😬😬
paying attention to the raw query in flight, hoping to catch the uuid as it is usually a high % of the queries.

(usually)

or letting your eyes glaze over slightly as the log file tail -f's, so you can try to pick out a pattern with your eyeballs. 😬
The problem is, hacks are heuristics. There's no guarantee your bad actor is issuing tons of queries. Maybe it's just one bad UPDATE WHERE?

What you actually want to do is sum up the usage of the constrained resource (probably a lock) and break down by who all is using it.
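Here is a rough sketch of what that sum-and-break-down looks like, assuming you already have per-query records of who held which lock and for how long. The record shape, the client ids and the lock names are all made up for illustration; none of it comes from a real MySQL API.

```python
from collections import defaultdict

# Hypothetical lock-usage records: (client_id, lock_name, seconds_held).
# In practice these would come from whatever instrumentation you already have.
SAMPLE_EVENTS = [
    ("user-a", "orders_table", 0.02),
    ("user-b", "orders_table", 0.01),
    ("bot-7f3c", "orders_table", 4.50),  # a single slow query, not a flood
    ("user-a", "orders_table", 0.03),
    ("user-c", "sessions_table", 0.20),
]


def usage_by_client(events, resource):
    """Sum time held on one resource per client, largest consumer first."""
    totals = defaultdict(float)
    for client_id, lock_name, seconds_held in events:
        if lock_name == resource:
            totals[client_id] += seconds_held
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def share_of_total(ranked):
    """Turn ranked (client, seconds) pairs into (client, fraction of total)."""
    total = sum(seconds for _, seconds in ranked)
    if total == 0:
        return []
    return [(client, seconds / total) for client, seconds in ranked]


ranked = usage_by_client(SAMPLE_EVENTS, "orders_table")
for client, fraction in share_of_total(ranked):
    print(f"{client}: {fraction:.1%} of lock time")
```

Note that the made-up bot issues exactly one query in this sample, so counting queries would never find it. Ranking by time held on the constrained resource puts it at the top anyway.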
The hacks work alright as long as the heuristics rarely change. They work for relatively fixed known unknowns.

And they work for reactive firefighting. You can keep your site from dying, for the most part, by paging yourself whenever it gets close to death and needs a massage.
But this makes people associate firefighting with safety and brinksmanship in some regrettable ways. If the only way to detect a problem is by driving your system to the brink of death, that is... not great. ☺️

Here is a thought experiment I have been toying with...
🌜"How would you run your systems if you had no paging alerts?"🌛

Assume that downtime is not a huge concern. What *is*? What would you need to do a better job of fixing? How would you know if your changes were successful?
A few paging alerts are essential to the prod toolkit.

A FEW.

Leaning on paging alerts lets you cut corners on quality, skimp on instrumentation...and never develop the habit of going to _look_ at your changes as you are shipping them, or get neck deep in your own telemetry.
All for the low low price of burning out anyone who has to carry a pager full of shit alerts with no relation to user pain, of course.

BUT WAIT IT GETS WORSE

This is how we build opaque systems of infinitely stacked brokenness that no one has ever understood.
Do you know how big and damaging a bug has to be to actually trigger a paging alert? PRETTY DAMN BIG. Unquestionably catastrophic.

This is by design. Flappy and borderline alerts are unacceptable.

So how many subtle bugs do you think you deploy but do *not* catch that same day?
Probably at least 10x as many as you do catch with a paging alert. ☺️

If you aren't instrumenting your code and going in every day to play with it, check on it, explore the world through it, and verify that it's doing what you want it to... 🌷you don't know🌷 how your systems work.
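For a sense of scale, "instrumenting your code" can start about this small. This is a minimal sketch, not any particular vendor's API: `instrumented`, the in-memory `EVENTS` list, and the `build_id` field are all hypothetical stand-ins for whatever telemetry pipeline you actually run.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; a real system would ship events somewhere queryable.
EVENTS = []


@contextmanager
def instrumented(name, **fields):
    """Record one structured event per unit of work, with duration and outcome."""
    event = {"name": name, **fields}
    start = time.perf_counter()
    try:
        yield event  # the caller can attach more fields while the work runs
        event["error"] = None
    except Exception as exc:
        event["error"] = type(exc).__name__
        raise
    finally:
        event["duration_ms"] = (time.perf_counter() - start) * 1000
        EVENTS.append(event)


def handle_request(user_id, build_id):
    # build_id is an assumed field that ties each event to the deploy that produced it.
    with instrumented("handle_request", user_id=user_id, build_id=build_id) as event:
        rows = [1, 2, 3]  # placeholder for the real work
        event["rows_returned"] = len(rows)
        return len(rows)


handle_request("user-a", "build-123")
```

Attaching something like a build id to every event is one way to go and look at a change right after you ship it, instead of waiting for a page.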
In conclusion, paging alerts are a crutch. Run your shit as though you don't get any, and craft your feedback loops to keep you in constant conversation with your code in prod, and everyone's life will be better.

Keep Current with Charity Majors
