Charity Majors
Aug 12, 2024 · 25 tweets · 5 min read
I wrote a post about the emerging generational gulf in observability tools.

O11y 1.0: many pillars, many tools, lots (but not all) are metrics backed. APM, RUM, tracing, logs, etc

O11y 2.0: wide, structured canonical log events are your source of truth

charity.wtf/2024/08/07/is-…
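
For a rough idea of what a "wide, structured canonical log event" could look like, here is a minimal Python sketch. Every field name and value is invented for illustration, not a schema from the post; the point is only that one event per request carries all the context in one place.

```python
import json

# One event per request per service, with as much context as you can attach.
# All field names and values below are hypothetical.
event = {
    "timestamp": "2024-08-12T17:03:21Z",
    "service": "checkout",
    "endpoint": "/cart/submit",
    "status_code": 200,
    "duration_ms": 182.4,
    "user_id": "u_48213",
    "build_id": "a1b2c3d",
    "region": "us-east-1",
    "feature_flags": {"new_tax_calc": True},
    "db_query_count": 7,
    "cache_hit": False,
}


def emit(evt: dict) -> str:
    """Serialize the event as a single JSON line; a real system would ship it to storage."""
    line = json.dumps(evt, sort_keys=True)
    print(line)
    return line


line = emit(event)
```

Because everything lives on the same record, any field can later be combined with any other (build by region, flag by endpoint, and so on) without stitching data sources together.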
It's never been simpler to understand the generational divide -- no laundry list of features to memorize and reason about -- and never been easier to understand why it matters so much.

Either your time and attention are scattered across many irreconcilable data sources, or not.
The technical distinction between 1.0 and 2.0 is so minor that it's easy to underestimate the seismic importance, and the waves of sociotechnical transformation this unlocks.

There are two historical analogues I can't stop thinking about: virtualization and TDD.

Bear with me...
Virtualization is old, old tech. We've had VMs since the 70s! But it wasn't til VMware productized it in 1999 that shit got interesting. That opened the door to cloud computing and mainstream SaaS.

It's also what sparked 💥the DevOps movement.
Before VMs, ops teams were bound to hardware and data centers, operating systems and deps. Software engineers lived in application land, and the overlap between the two worlds was *minuscule*.

That all began to change once you could write code to manage your infrastructure.
It took us like 15 years to get from VM-as-a-product to, like, "stop hugging servers" and "cattle not pets". A decade or more of people figuring out better workflows, building better tools, hooking up feedback loops that let engineers move more and more swiftly, with confidence.
These days, I feel like we are in the waning days of devops. (Which is not the same as saying "devops is dead". 🙄) Rather, the problems devops was invented to solve have increasingly *been solved*.

We don't have siloed ops and dev teams who need to learn to work together.
Increasingly, we live in a post devops universe, where every engineer writes code, and every engineer owns the code they wrote. As God intended. 😉

This is a good thing!! Of course, now we have new problems. One of them is that our o11y tools are a relic of a siloed age,
...where front end, back end, and ops were all expected to look at different data and care about different things. And at the time hardware was insanely expensive, so we threw away alllll the context and kept just the summary metrics.

(Oooof.) Anyway, that's how we got o11y 1.0.
The other historical moment on my mind right now is also circa 1999, when @KentBeck (re)discovered TDD, test-driven development.

Up til that point, people basically just wrote code and crossed their fingers, waiting for users to tell us if anything broke or regressed, lol.
TDD is when programming started to become software engineering, imo. Not because you necessarily have to do TDD per se, but you do need to have *tests*. TDD made it clear to every developer that YOUR code is YOUR responsibility.

*You* own the quality of your code, not QA.
It's hard to overstate what a revolutionary concept that was. Or how utterly necessary it has turned out to be.

However, the effectiveness of TDD and tests in general has been losing ground steadily over the past ten or fifteen years.
I'm not saying, "don't write tests". You should still write tests! It's the fastest, cheapest way to make sure you didn't do any seriously dumb shit.

But the fact that your tests pass doesn't mean your code works. It means your code is probably logically correct, and that's all.
Back then, you were probably working on a monolith application in a simple architecture -- web, app, data. If your tests passed, and you had good tests, you could feel reasonably confident in it. The overwhelming majority of the complexity was bound up neatly in the app tier.
Now your code is running...where? Maybe one of your hundreds or thousands of microservices, maybe a storage service or proxy, maybe PaaS, SaaS, serverless, lambda jobs, client-side compute or cache on your phone or web or IoT device ... who the fuck knows?
And how is it going to perform? Lol if you think you can write a test to simulate the concurrency, emergent behavior, thundering herds, or flighty user whims of the day.

You can't. No one can. Staging is just a glorified laptop. Only prod is prod.
Twenty-five years ago, engineers began to realize that the only way to ship consistently good code was for devs to embrace tests.

That's how feedback loops work. You can't say "ok, THESE people are going to write the code and THOSE people are going to make sure it works" 👍😬
And now, in the waning days of the devops era, I think we're seeing a similar seismic shift around operability and production systems.

The only way to ship consistently good code is to embrace a development workflow that includes instrumentation, observability and prod.
You can't say "ok, THESE people are going to write the code and THOSE people are going to understand the code." Understanding is part of writing.

I think of this as ODD, or "observability driven development" (but @jessitron prefers "observability during development" 😉).
ODD means: you instrument your code as you go, asking yourself "how will I know if this is working or not?"

Then you deploy it. Then you inspect your code through the lens of the instrumentation you just wrote.

1, is it doing what you expected?
2, does anything else look weird?
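
A minimal sketch of what "instrument as you go" might look like in Python. The `wide_event` helper, the `apply_discount` function, and all field names are hypothetical, made up for this example rather than taken from any real library.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def wide_event(name: str, sink: list):
    """Collect fields while a unit of work runs, then emit one event when it ends."""
    event = {"name": name}
    start = time.perf_counter()
    try:
        yield event
        event["ok"] = True
    except Exception as exc:
        event["ok"] = False
        event["error"] = type(exc).__name__
        raise
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        sink.append(json.dumps(event))  # stand-in for shipping to your o11y backend


def apply_discount(cart_total: float, code: str, sink: list) -> float:
    # Hypothetical business logic; what matters is the fields added along the way.
    with wide_event("apply_discount", sink) as ev:
        ev["discount_code"] = code
        ev["cart_total"] = cart_total
        rate = {"SAVE10": 0.10}.get(code, 0.0)
        ev["discount_rate"] = rate          # "is it doing what I expected?"
        ev["code_recognized"] = rate > 0    # "does anything else look weird?"
        return round(cart_total * (1 - rate), 2)


events: list = []
total = apply_discount(50.0, "SAVE10", events)
```

After deploying something like this, you would look at the emitted events for the new code path and check both questions above against real traffic.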
Your job isn't done when you've merged your code and the tests pass. Your job is done once you've verified it in production, and maybe let it bake for a bit.

Or at least, that's the bare minimum. Here's where we finally circle back to the original topic, observability 1.0 vs 2.0.
Testing before production is great, and you should do it. But you run into diminishing returns really fucking fast, because it's all fake and controlled.

If you care a lot about the quality or stability of your code, or the user experience, you *have* to invest in production.
There's been an explosion of innovation in this space, and this is just the beginning.

- progressive deployments
- canaries
- feature flags
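
For readers who haven't built one of these, here is a rough sketch of how a percentage-based rollout (the mechanism behind many canaries and flags) can be done deterministically. The hashing scheme, function name, and flag name are assumptions for illustration; real flag services have their own implementations.

```python
import hashlib


def in_rollout(user_id: str, flag: str, percent: float) -> bool:
    """Map (flag, user) to a stable bucket and admit the lowest `percent` percent."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, so 0.01% granularity
    return bucket < percent * 100


users = [f"user-{i}" for i in range(10_000)]
at_1 = {u for u in users if in_rollout(u, "new_tax_calc", 1)}
at_5 = {u for u in users if in_rollout(u, "new_tax_calc", 5)}

# Promoting from 1% to 5% only adds users; nobody who had the flag loses it.
print(at_1 <= at_5)
```

Because the bucket depends only on the flag and user, the same user gets the same answer on every request, which is what makes it possible to compare "flag on" and "flag off" traffic afterwards.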

However, most companies are still using o11y 1.0 tools and this is ... more of a problem than you might think.
The beauty and joy of production tooling like this is all about taking a scalpel to production traffic.

"What happens when I flip this flag?"
"What happens when I promote this canary from 1% to 5%?"

But with metrics-backed tools, you only get averages and randoms. No scalpels.
By averages and randoms, I mean: you can get buckets (mean, median, p90, p99, p99.99), and you can get random exemplars.
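
To make the contrast concrete, here is a toy Python sketch with made-up numbers. It assumes the flag was never set up as a metric tag ahead of time, so the metrics side only has the pre-aggregated value, while the event side still has the flag on every record.

```python
from statistics import mean

# Hypothetical raw events: the flag state is recorded on every request.
events = (
    [{"flag_new_tax_calc": False, "duration_ms": 100.0} for _ in range(95)]
    + [{"flag_new_tax_calc": True, "duration_ms": 900.0} for _ in range(5)]
)

# The metrics view: one pre-aggregated number for the interval.
overall_mean = mean(e["duration_ms"] for e in events)


def mean_by(field: str, rows: list) -> dict:
    """Group raw events by any field after the fact and average the latency."""
    groups: dict = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row["duration_ms"])
    return {key: mean(vals) for key, vals in groups.items()}


by_flag = mean_by("flag_new_tax_calc", events)

print(overall_mean)  # 140.0
print(by_flag)       # {False: 100.0, True: 900.0}
```

The overall mean looks only mildly elevated, while the per-flag breakdown shows that the 5% of requests with the flag on are nine times slower.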

What you can NOT do is ask precise questions to go with your precise tooling. With o11y 2.0, you could see _exactly_ what happened with your flag or canary --
