I love this post on the evolving state of the art of being on call. It's neat to get the perspective of someone "oncall-adjacent", not neck deep and sleep deprived.
Also 💙'd @lizthegrey's pithy summary in Slack: "if the alert is not for user pain, thank it and let it go." Is Liz not *totally* the Marie Kondo of observability? ☺️🐝
This reminds me of something @chimeracoder was just asking about.

How does one reconcile the emerging consensus for alerting exclusively on user-visible outages with the undeniable need to learn about and react to things *before* users notice? Like a high cache eviction rate?
The answer, obviously, begins with training a very complicated machine language algorithm against a corpus of every build artifact for the past decade, synced to the cycle of the blood moon.

Oh and don't forget AIOps. Definitely some of that.
Back to reality. There's actually no conflict here! Let's recap. When I talk about "paging alerts," I mean the loud noisy kind, the ones you *want* to wake you night and day.

These alerts should be few, high level, and directly correlate to user experience.
But that's not all there is to alerting. Everyone needs at least two lanes for on call tasks: the wake-me-up-now lane and the get-to-this-soon lane.

A common intersecting variant is splitting by daytime hours and nighttime hours.
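To make the two lanes concrete, here is a minimal sketch of a router. Everything in it is an assumption for illustration: the `Alert` type, the `user_visible` flag, and the lane names are hypothetical, not any particular paging tool's API. The daytime/nighttime variant is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    PAGE = "wake-me-up-now"      # loud, few, tied to user experience
    QUEUE = "get-to-this-soon"   # can wait for the next triage block


@dataclass
class Alert:
    name: str
    user_visible: bool  # hypothetical flag: does this correlate with user pain?


def route(alert: Alert) -> Lane:
    """Page only for user pain; everything else goes to the non-urgent lane."""
    return Lane.PAGE if alert.user_visible else Lane.QUEUE


if __name__ == "__main__":
    for alert in (
        Alert("checkout error rate", user_visible=True),
        Alert("high cache eviction rate", user_visible=False),
    ):
        print(f"{alert.name}: {route(alert).value}")
```

The point of keeping the rule this blunt is that the hard work happens when you decide what counts as user-visible, not in the routing itself.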
There are plenty of states your system can get itself into that aren't *great*, but can also totally wait til morning. Or til you are back from the store. Or til the end of the day when you batched them all up to do at once. That's what your SLO is *for*.
And if this isn't true -- if your systems can't limp along perfectly well for a few hours in degraded mode -- then *that's* what you must fix.
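One way to read "that's what your SLO is for" is as an error budget: while there is budget left, a degraded state can sit in the non-urgent lane. The sketch below assumes a simple request-based SLO. The function names and the 25% floor are hypothetical choices for illustration, not something the thread prescribes.

```python
def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of the error budget left in the window.

    1.0 means untouched, 0.0 means fully spent, negative means the SLO is blown.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic, nothing spent
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def can_wait_until_morning(remaining: float, floor: float = 0.25) -> bool:
    """Assumed policy: leave it in the queue while budget stays above the floor."""
    return remaining > floor


if __name__ == "__main__":
    # Assumed numbers: 99.9% target, 1,000,000 requests, 500 of them failed.
    remaining = error_budget_remaining(0.999, good=999_500, total=1_000_000)
    print(f"budget remaining: {remaining:.2f}")
    print("can wait" if can_wait_until_morning(remaining) else "page someone")
```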

Making on call not suck is actually less about reducing the load to zero alerts, and more about making it so none of them are urgent.
Would you really hate being on call if you could set aside a block of time in the morning and evening, go through any alarms that had piled up since last time, and ship a couple of quick fixes for any interesting new problems? That actually sounds kinda fun, right?

(It is ☺️)
Oh... some people use email for the secondary/non-urgent lane, some use Slack, some use Jira tickets, etc. They're all mildly annoying so just pick your poison.

I strongly advise against having more than two sources of alerts.
Subscribe to Charity Majors