This is 100% true. What always jumps out at me is that there's always "the tracing expert(s)" that everybody goes to for help on the rare occasions when they need a trace. Fluency rarely transcends the few to reach the many.

I have a different diagnosis, tho.
I don't think it's this so much as it is that ... tracing is *inherently* a niche use case. A tracing-first approach to observability turns the world on its head for no good reason.

What most people want is the ability to slice and dice their requests, and occasionally trace one.

A lot of the munging engineers get so good at doing with logs is to make up for the fact that they didn't capture them as events in the first place.

The fact that they inevitably end up implementing some form of rudimentary tracing with their logs shows that there *is* value in being able to visualize your events over time.

But when you ask people to hop from one tool to the next, you're going to lose 90% of the people. They'd SO much rather find a hacky way to do it all in the same tool.
Even the best tracing tools aren't exactly known for their friendliness or ease of use. People know how to get what they want out of logs, and the cognitive lift is so much less than shifting from one tool to another, harder tool.
So they keep solving for tracing in a locally-optimized way, over and over and over. Unless you're running like 200+ microservices, the costs probably won't outweigh the gains.

Don't forget that people are already jumping once, from metrics to logs.
You see a spike in your metrics. You jump to your logs to try and visually correlate by time to see what the errors meant (since you can't slice and dice or dive deeper into a metrics-based tool, having discarded all that connective tissue at write time).
And then you find a log line that looks suspicious. Then you have to copy the id over into your tracing tool, and hope it got sampled correctly so it appears there too?

That's a heavy lift. And most people don't need to do it many times a day; it's more like monthly.
That's just not often enough to give someone the chance to become fluent in it. So nobody knows how to use it well except the people who rolled it out.

When you're in the middle of an intense debugging session, stopping to wrestle with a new tool SUUUUUCKS. It destroys your flow.
Any time you're asking people to learn a new tool or change the way they're doing something, as a rule of thumb, the new thing needs to be an order of magnitude better or more valuable than what they currently have.

For most teams, standalone tracing doesn't meet this bar.
People have been doing the same janky jump from metrics to logs for their entire career, btw, which is why they rarely notice how shitty and unscientific it is.

It's pattern matching and intuitive leaps, not evidence.
The right solution is to start with arbitrarily-wide structured data blobs, capture hundreds of dimensions per request per service, & append request ids, trace ids, and span ids.
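To make that concrete, here's a minimal sketch of what one of those wide events might look like. Every name in it is an assumption for illustration (`build_event`, the field names, the example values); it's not any particular vendor's schema, just one flat blob per request per service with the ids attached.

```python
import json
import time
import uuid


def build_event(service, trace_id, parent_span_id=None, **fields):
    """Return one wide event (a flat dict) for one unit of work in one service.

    Hypothetical helper: the id fields are always present, and callers can
    attach as many extra dimensions as they like via keyword arguments.
    """
    event = {
        "timestamp": time.time(),
        "service": service,
        "request_id": fields.pop("request_id", uuid.uuid4().hex),
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent_span_id,
    }
    # Arbitrarily wide: nothing is dropped or pre-aggregated at write time.
    event.update(fields)
    return event


trace_id = uuid.uuid4().hex
event = build_event(
    "checkout",
    trace_id,
    endpoint="/cart/checkout",
    status_code=500,
    duration_ms=812.4,
    user_id="u_123",          # high-cardinality fields are the whole point
    build_id="2021-09-14.3",
    db_query_count=14,
    cache_hit=False,
)

# Emit it as one structured line instead of a string you have to munge later.
print(json.dumps(event, sort_keys=True))
```

In real life you'd have hundreds of fields here, not nine, but the shape is the same.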

Derive your metrics and SLOs from those raw events. Practice read-time aggregation, not write-time.
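Here's a rough sketch of what read-time aggregation means in practice, assuming the raw events are sitting in a plain list. The helper names (`error_rate`, `group_by`, `slo_met`) and the sample fields are made up for illustration; a real system would run these as queries against an event store, not a Python list.

```python
from collections import defaultdict

# Hypothetical raw events, kept as-is at write time.
EVENTS = [
    {"service": "checkout", "endpoint": "/cart/checkout", "status_code": 200, "duration_ms": 120, "build_id": "b1"},
    {"service": "checkout", "endpoint": "/cart/checkout", "status_code": 500, "duration_ms": 900, "build_id": "b2"},
    {"service": "checkout", "endpoint": "/cart/checkout", "status_code": 500, "duration_ms": 870, "build_id": "b2"},
    {"service": "search", "endpoint": "/search", "status_code": 200, "duration_ms": 45, "build_id": "b1"},
    {"service": "checkout", "endpoint": "/cart/checkout", "status_code": 200, "duration_ms": 130, "build_id": "b1"},
]


def is_error(event):
    return event["status_code"] >= 500


def error_rate(events, where=lambda e: True):
    """Metric computed at read time: fraction of matching events that errored."""
    matched = [e for e in events if where(e)]
    if not matched:
        return 0.0
    return sum(1 for e in matched if is_error(e)) / len(matched)


def group_by(events, key, where=lambda e: True):
    """Count matching events per value of any dimension, chosen at read time."""
    counts = defaultdict(int)
    for e in events:
        if where(e):
            counts[e.get(key)] += 1
    return dict(counts)


def slo_met(events, threshold_ms, target):
    """Latency SLO: did at least `target` of the requests finish within threshold_ms?"""
    fast = sum(1 for e in events if e["duration_ms"] <= threshold_ms)
    return fast / len(events) >= target


print(error_rate(EVENTS))                              # 0.4
print(group_by(EVENTS, "build_id", where=is_error))    # {'b2': 2}
print(slo_met(EVENTS, threshold_ms=500, target=0.9))   # False
```

The thing to notice is that `build_id` was never declared as a metric tag up front. Because the raw events survived, you get to pick the question after the fact.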
Then you can start at the top ("is there a problem?"), follow the trail of breadcrumbs to the answer ("the problem is requests doing xyz"), and click on any particular request to visualize it as a trace.

No jumping around, no guessing, just debugging.
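And the last hop, "click on a request and see it as a trace," is just a different view over the same events. A minimal sketch, assuming each event carries `trace_id`, `span_id`, `parent_span_id`, and a start offset (the `render_trace` helper, `start_ms`, and the sample data are hypothetical):

```python
# Hypothetical events from several services, all in the same store.
EVENTS = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None, "service": "api-gateway",
     "name": "POST /cart/checkout", "start_ms": 0, "duration_ms": 900, "status_code": 500},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "service": "checkout",
     "name": "charge_card", "start_ms": 10, "duration_ms": 850, "status_code": 500},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "b", "service": "payments",
     "name": "db.query", "start_ms": 20, "duration_ms": 800, "status_code": 500},
    {"trace_id": "t1", "span_id": "d", "parent_span_id": "a", "service": "checkout",
     "name": "send_receipt", "start_ms": 870, "duration_ms": 20, "status_code": 200},
    {"trace_id": "t2", "span_id": "e", "parent_span_id": None, "service": "api-gateway",
     "name": "GET /search", "start_ms": 0, "duration_ms": 40, "status_code": 200},
]


def failing_trace_ids(events):
    """Breadcrumb step: which requests contain at least one erroring span?"""
    return sorted({e["trace_id"] for e in events if e["status_code"] >= 500})


def render_trace(events, trace_id):
    """Turn the events for one trace into an indented parent/child outline."""
    children = {}
    for e in events:
        if e["trace_id"] == trace_id:
            children.setdefault(e["parent_span_id"], []).append(e)

    lines = []

    def walk(parent_id, depth):
        # Visit children in start order so the outline reads top to bottom in time.
        for span in sorted(children.get(parent_id, []), key=lambda s: s["start_ms"]):
            lines.append(f'{"  " * depth}{span["service"]} {span["name"]} {span["duration_ms"]}ms')
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


for trace_id in failing_trace_ids(EVENTS):
    print("\n".join(render_trace(EVENTS, trace_id)))
```

Same data, same tool, no copying ids between windows and no hoping the request got sampled on the other side.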
Relevant:
Also, the rest of this thread + its replies were super interesting and fun to read. Always fun when @copyconstruct throws a truth bomb and walks away. 💣🚶💅
