hey man, you know me, I don't like talking smack about others, and I'm not sitting over here whittling and looking for excuses to litigate people's usage of the word observability.
they are literally describing monitoring. good ol', 30-year-old traditional monitoring.
* Notify
* Triage
* Understand
this is a company with a billion-dollar valuation and they literally don't know the difference between monitoring and observability
i mean, we can all argue over the subtleties of observability and that's relatively understandable, but doesn't fucking EVERYBODY know what *monitoring* is and does?
cause it hasn't changed. in like... ever
and no. the workflow with a good observability tool looks nothing like this, really. if you're practicing observability-driven development, you're instrumenting as you go, and checking prod regularly through the lens of your instrumentation, in a tight, virtuous feedback loop.
the overwhelming majority of the bugs you write and ship to production will never actually trigger an alert or a notification. only the really super bad ones rise to that level, the ones that consume way more resources.
and those are actually pretty rare.
you need to be looking at your code in production every day even if it's *not* paging you constantly. exploring production is an important part of understanding if your code is doing what you expected it to.
it's not just about the bugs and the outages and the downtime,
it's also about asking questions like, is anybody using this feature? what was the actual impact of my change on this other system? where is most of the time being spent?
you don't wait for your system to go down before you bother to begin to understand how your system works.
you want to be in a constant conversation with your code in production. like any responsible builder.
i like to think of newly deployed code as 'the fourth trimester'. it isn't fully baked yet; it can't fend for itself. you need to raise it up, gain confidence in it over time.
apropos of nothing but my need for a palate cleanser, here is a brilliant piece by @paulosman about how sociotechnical software systems really work.
good morning kittens, guess what honeycomb been up to? oh not much really, we've only just STAVED OFF OUR OWN INEVITABLE DEMISE AND DESTRUCTION, 🔥YET AGAIN🔥.
We can hardly even fail if we try for another two, three years now! Take that, heat death of the universe!🪐🌑 💜
I wonder if it will ever stop feeling so bizarre just to still exist. 🙃 The list of people we are grateful for and permanently indebted to gets longer and longer and longer with each passing year.
From our investors, who are principled, curious, endlessly thoughtful and helpful -- nothing like the stories and stereotypes about VCs that tend to filter down to eng circles -- to our family members, especially anyone who had to live with us those early few years 😬
I've been talking to lots of teams about their observability journey, or how they managed to dig themselves out of hell and get a handle on shit. Some patterns definitely emerge.
The first thing many teams look at is the on call rotation. (Smart; heading straight for the pain.)
Folks are worn out, product is upset whenever something unexpected comes up -- it's a bad scene, because feature work and operational work are too tightly coupled. ANY non-feature work means a deadline slips.
So the first thing they do is enact a simple rule: no product work during on call weeks. Period. Those weeks are for fixing and maintaining the system.
This forces leadership to plan for using 75-85% of full capacity as a steady state. Whew; now we have some flex in the system.
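One way to see where a number like 75-85% could come from is a back-of-the-envelope sketch, assuming (this is not stated in the thread) a team of n engineers with exactly one person on call each week and zero product work from that person. Product capacity is then (n - 1) / n, so rotations of roughly four to six people land in that range. The function name `product_capacity` is hypothetical, just for illustration.

```python
# Hypothetical model (an assumption, not from the thread): a team of n engineers,
# exactly one of them on call each week, doing no product work that week.

def product_capacity(team_size: int) -> float:
    """Fraction of weekly engineering capacity left over for product work."""
    if team_size < 2:
        raise ValueError("need at least two engineers to have a rotation")
    return (team_size - 1) / team_size


if __name__ == "__main__":
    for n in range(4, 8):
        # e.g. a team of 4 leaves 75% of capacity for product work
        print(f"team of {n}: {product_capacity(n):.0%} available for product work")
```

Larger teams with a single on-call slot leave more capacity, smaller ones less; the point is that leadership plans around the leftover fraction instead of pretending it's 100%.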
Yeah. This gets to a weakness of engineering leveling systems. We rightly encourage high-level engineers to seek out work that is a challenge at their level...
But there isn't always enough of that highly difficult or tech lead work to go around.
When level-appropriate work comprises a lot of your performance review, you get something very dangerous: roving bands of skilled, restless engineers competing for vanity projects and systems that should never, ever have been built, but which you now have to maintain. 😬
One way to prevent this is to *not* overhire, especially very senior engineers. Hire juniors and mid-levels with room to grow.
Most engineering work is not rocket science, and mid-levels in particular are often the most prolific and productive engineers you have.
Communication pathways are sooo hard to get right, and inspire such frothing, unreasonable rage when they get it wrong.
The last time I used Jira was well over a decade ago, and I thought it was impenetrable spaghetti at the time. I can't imagine it's gotten any simpler...
But it's kind of an impossible problem, of course it's going to turn into feature soup when you've been making bank on enterprise for this long.
Every team starts out trying to replicate and "improve" on how a squintillion people and teams interoperate,
I was just editing the o11y book chapter on build vs buy and ROI, and this sentence jumped out at me:
"High-performing organizations use great tools."
It's true, right? Behold all the FAANG engineers who leave their cushy perches and are shocked by the amount of tooling they had come to take for granted. It's almost like having to learn to engineer all over again
Big companies know how critical good tooling is, and pay for it.
I'm going to say two very contradictory things, both of which are true:
1) Tools are getting better and better, and you should try to keep up
2) Switching tooling is hard, and you should only do it when the new tool is ~an order of magnitude better than what you've got.
You don't owe it to your employer to fix all the ways they are fucked up. Before going to battle, ask yourself:
* how much power do I have here?
* is the problem within my domain of responsibility or influence?
* who are my allies?
* do I have a reasonable chance of success?
and also: are they worth it? Is your employer fundamentally worth you staying and fighting? Is their product a net good for the world? Are your leaders decent, ethical people who care a lot?
If so, sure, pick some battles. See what happens. ☺️