Something just clicked for me this week: how much the transition from monitoring to observability is about a shift in perspective, from third-person observer to first-person narrator.

And how much this says about why putting software engineers on call is hard.
Monitoring has traditionally been one piece of software keeping an eye on another piece of software, and describing its state. This goes for white box monitoring too, not just black box.
Whereas observability requires a different level of abstraction, one that practically demands it be an instrumentation game. This is a firsthand perspective -- the code explaining itself back to you, your future self, your future team.
Obviously it is going to be event based, because that's how requests traverse code paths -- certainly not in disconnected metrics and counters.
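To make that concrete, here is a minimal sketch of what "the code explaining itself" could look like: one wide, structured event per request, filled in as the request moves through the code path. Everything named here is hypothetical (`handle_request`, `emit`, the field names, the simulated timeout), and a plain list stands in for whatever event pipeline you actually ship to.

```python
import json
import time
import uuid


def emit(event, sink):
    """Serialize one event as a JSON line; `sink` is a stand-in for a real pipeline."""
    sink.append(json.dumps(event, sort_keys=True))


def handle_request(user_id, endpoint, sink):
    # Hypothetical handler: one event per request, accumulated along the code path.
    event = {
        "request_id": str(uuid.uuid4()),
        "endpoint": endpoint,
        "user_id": user_id,
    }
    start = time.monotonic()
    try:
        if endpoint == "/export":
            # Context is attached where the code knows it, in the code's own terms.
            event["export_format"] = "csv"
            raise TimeoutError("export backend timed out")  # simulated failure
        event["status"] = 200
    except TimeoutError as exc:
        event["status"] = 504
        event["error"] = str(exc)
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        emit(event, sink)  # exactly one event per request, success or failure
    return event["status"]


events = []
handle_request("user-42", "/home", events)
handle_request("user-42", "/export", events)
for line in events:
    print(line)
```

The point of the sketch is the shape, not the details: the failure, the user, the endpoint and the timing all stay attached to the same request instead of being scattered across separate counters.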
But what does this have to do with software engineers on call?

Well, we have 20+ years of building monitoring tools for ops eyes. Ops thinks in terms of system resources and /proc values. SWEs think in terms of lines of code and code paths.
Asking software engineers to look at an old school dashboard of system resources and draw any kind of coherent line back to the problem they are trying to investigate is... unreasonable.

The amount of knowledge, intuition and scar tissue required to interpret those is too large.
In my experience, asking software engineers to be on call when they have a debugging tool like Honeycomb -- one that speaks in variable names, request paths, and context broken down by event -- is supremely reasonable.
When they don't, then you start hearing the gripes about how you're asking them to do two jobs, or how helpless they feel when the pager goes off.
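As a rough illustration of what "speaking in variable names" buys you, here is a small sketch of the kind of question an engineer can ask once every request is an event: break the errors down by any field the code recorded. This is not Honeycomb's API; `break_down`, the sample events, and field names like `build_id` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sample events; in practice these would come from instrumentation.
events = [
    {"endpoint": "/home", "build_id": "a1", "status": 200, "duration_ms": 12.0},
    {"endpoint": "/export", "build_id": "a1", "status": 200, "duration_ms": 340.0},
    {"endpoint": "/export", "build_id": "b2", "status": 504, "duration_ms": 5000.0},
    {"endpoint": "/export", "build_id": "b2", "status": 504, "duration_ms": 5000.0},
    {"endpoint": "/home", "build_id": "b2", "status": 200, "duration_ms": 11.0},
]


def break_down(events, field):
    """Count total requests and errors (status >= 500) for each value of `field`."""
    groups = defaultdict(lambda: {"total": 0, "errors": 0})
    for event in events:
        bucket = groups[event.get(field)]
        bucket["total"] += 1
        if event.get("status", 0) >= 500:
            bucket["errors"] += 1
    return dict(groups)


by_endpoint = break_down(events, "endpoint")
by_build = break_down(events, "build_id")
print(by_endpoint)
print(by_build)
```

With this sample data, all the errors land on `/export` requests served by build `b2`, which is a line back to a code path and a deploy rather than to a CPU graph.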

It's not two jobs, but you *are* asking them to store all the debugging wisdom and conventions of two roles in their brain. Hard.
The solution isn't to stop letting engineers support their own code. You *need* those tight feedback loops to be a high performing team.

The solution is to empower them with tools to make their jobs possible for mere mortals. First-person, event-driven, instrumentation based.

• • •


Keep Current with Charity Majors


More from @mipsytipsy

Feb 19
Let's talk about OpenTelemetry, or "OTel", as the kids like to call it.

I remember emitting sooo many frustrated twitter rants back in 2017-2018 about how *behind* we were as an industry when it comes to standards for instrumentation and logging.

Then OTel shows up.
For those of you who have been living under a rock, OTel is an open standard for generating, collecting, and exporting telemetry in a vendor agnostic way.

Before OTel, every vendor had its own libraries, and switching (or trying out) new vendors was a *bitch*.
Yeah, it's a bit more complicated to set up than your standard printf or logging library, but it also adds more discipline and convenience around things like tracing and the sort of arbitrarily-wide structured data blobs (bundled per request, per event) that o11y requires.
Feb 17
I want to give this a slightly longer treatment. ☺️ (Gergely and I *just* talked about it, so it's all rustling around in my head.)

I think it's a ✨great✨ idea for every engineer to spend at least a couple years at both a big company and a startup (series B or earlier).
It's hard to formulate career goals in your first decade or so as an engineer; there is just SO MUCH to learn. Most of us just kinda wing it.

But this is a goal that I think will serve you well: do a tour of duty at a startup and another at a bigco, in your first 10y as an eng.
Besides the obvious benefits of knowing how to operate in two domains, it also prevents you from reaching premature seniority. (charity.wtf/2020/11/01/que…)

The best gift you can give your future self is the habit of regularly returning to the well to learn, feeling like a beginner.
Feb 10
Several people asked this. It's a good question! I will share my thoughts, but I am certainly not religious about this. You should do what works for you and your teams and their workflows. 📈🥂☺️
1) "assuming you have good deduplication"... can be a pretty big assumption. You never want to be in a situation where you spend more time tweaking dupe, retry, re-alert thresholds than fixing the problem.
2) having to remember to go futz with a ticket after every little thing feels like a lot of busywork. You've already committed some code, mentioned it in #ops or wherever, and now you have to go paste all that information into a task (or many tasks) too?
Feb 9
a caviar-quality rant on deployment, security, testing ... actually more like five rants stuffed into a single trenchcoat. via @beajammingh

mumble.org.uk/blog/2022/02/0…
@beajammingh the title particularly caught my eye. for the past month or two i've been sitting on a rant about how i no longer associate the term "devops"** with modern problems, but with fighting the last war.

** infinitely malleable as it may be
yes, if you have massive software engineering teams and operations teams and they are all siloed off from each other, then you should be breaking down (i can't even say it, the phrase is so annoying) ... stuff.

but this is a temporary stage, right? a bridge to a better world.
Feb 9
I've done a lot of yowling about high cardinality -- what it is, why you can't have observability without it.

I haven't made nearly as much noise about ✨high dimensionality✨. Which is unfortunate, because it is every bit as fundamental to true observability. Let's fix this!
If you accept my definition of observability (the ability to understand any unknown system state just by asking questions from the outside; it's all about the unknown-unknowns) then you understand why o11y is built on building blocks of arbitrarily-wide structured data blobs.
If you want to brush up on any of this, here are some links on observability:

* honeycomb.io/blog/so-you-wa…
* thenewstack.io/observability-…
* charity.wtf/2020/03/03/obs…

and on wide events:

* charity.wtf/2019/02/05/log…
* kislayverma.com/programming/pu…
Feb 6
Close! "If you're considering replacing $(working tool) with $(different tool for same function), don't do it unless you expect a 10x productivity improvement"

cvs to git? ✅
mysql to postgres? ❌
puppet to chef? ❌
redhat to ubuntu? ❌
The costs of ripping and replacing, training humans, updating references and docs, the overhead of managing two systems in the meantime, etc -- are so high that otherwise you are likely better off investing that time in making the existing solution work for you.
Of course, every situation is unique. And the interesting conversations are usually around where that 10x break-even point will be.

The big one of the past half-decade has been when to move from virtualization to containerization.
