@swyx Hey!! A friend pointed me to your post; they said you are sharp and curious and eager to learn more, so I hope you don't mind if I weigh in. ☺️

Your post is a great intro to metrics, logs and traces. But it has nothing to do with -- and says nothing about -- observability.
@swyx The confusion is understandable. Several large, well funded vendors recently decided to redefine observability as "metrics, logs and traces", because interest in observability has been heating up, and - well - they have metrics, logging, and tracing tools to sell. 😑
@swyx Observability is a borrowed term from mech eng/control theory. It means "can you understand what is happening inside the system, can you understand ANY internal state the system may get itself into, simply by asking questions from the outside?"

Has nothing to do with datatypes.
@swyx In fact, metrics are all but irrelevant to observability, since they are pre-aggregated and detached from the request before they are ever written out.

Doesn't mean metrics aren't useful for some things! But mostly monitoring, not observability.
@swyx Ditto logs. The connotations we have for logs are mostly "unstructured strings, written to disk, sprinkled throughout your source code and emitted randomly at runtime", which again, have nothing to do with observability (though they can be quite useful).
@swyx Traces -- now we're getting closer. Tracing is absolutely key to o11y, but tracing is just a particular way of visualizing events by time.

So what *is* observability, applied to the software domain? Glad you asked 😄

(Plane taking off, sec! ✈️)
@swyx Observability applied to software systems means this: the ability to ask any question of your systems, or understand any user's (or users') behavior or subjective experience, without having to predict that question in advance.

Observability is about the unknown-unknowns.
@swyx (No WiFi on that flight. Thanks United 🙃)

Plenty of tools are great for asking the questions you could predict in advance you would need to ask. This is the easy part.
@swyx Monitoring tools do this -- you predefine some checks, then set thresholds that mean ERROR/WARN/OK.
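
To make "predefine some checks, then set thresholds" concrete, here is a minimal sketch. The function name, the metric and the threshold values are all made up for illustration; the point is only that the question (and what counts as bad) is fixed before anything goes wrong.

```python
def check_status(value, warn_at, error_at):
    """Map one measured value onto OK/WARN/ERROR using thresholds chosen ahead of time."""
    if value >= error_at:
        return "ERROR"
    if value >= warn_at:
        return "WARN"
    return "OK"


# Hypothetical check: error rate in percent, thresholds picked in advance.
for error_rate in (0.2, 2.5, 9.0):
    print(error_rate, check_status(error_rate, warn_at=1.0, error_at=5.0))
```

A check like this can only answer the one question it was written for.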

Logging tools let you define indexes and schemas so the questions you expect to ask can be answered efficiently. It's easy to gather, store and query the known unknowns. And for a long time that was plenty.
@swyx But if you *can't* predict all the questions you'll need to ask in advance, then you're in o11y territory.

Observability starts with gathering the data at the right level of abstraction, oriented around each request, to let you ask any question about the requests in the system.
@swyx To do this, we need to stop firing off metrics and log lines ad hoc-like and start issuing a single arbitrarily-wide event -- one per service, per request -- containing the *full context* of that request.

EVERYTHING you know about it, did in it, parameters passed in to it, etc.
@swyx Anything that could help us find and identify the request later. Think of that request like a needle in a haystack, and you need to be able to swiftly locate every damn needle in the stack.

You especially want to stuff in any kind of IDs. User IDs, app ID, shopping cart ID, etc.
@swyx Then, when the request is about to exit the service or error out, you ship all of that context off to your o11y store as one very wide blob.

(A maturely instrumented service usually has a few HUNDRED dimensions per event, and it will have one event for each service that the request hits.)
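
Here is a rough sketch of that pattern in Python, under stated assumptions: `WideEvent`, `handle_payment`, the field names and the payload limit are all hypothetical, and "shipping" is just a callback (here it prints one JSON line). It is not any particular vendor's API. The shape is what matters: one dict per request per service, filled in as you go, emitted exactly once at the end whether the request succeeded or failed.

```python
import json
import time
import uuid


class WideEvent:
    """Collects everything known about one request within one service."""

    def __init__(self, service, request_id=None):
        self._start = time.monotonic()
        self.fields = {
            "service": service,
            "request_id": request_id or str(uuid.uuid4()),
        }

    def add(self, **fields):
        # Keep stuffing context in as the request progresses.
        self.fields.update(fields)

    def finish(self, error=None):
        # Record timing and outcome, then hand back the full event.
        self.fields["duration_ms"] = (time.monotonic() - self._start) * 1000.0
        self.fields["error"] = repr(error) if error is not None else None
        return self.fields


def handle_payment(request, emit):
    """Hypothetical handler: builds ONE event and emits it exactly once."""
    event = WideEvent("payments", request.get("request_id"))
    event.add(
        endpoint="/payments",
        user_id=request["user_id"],
        cart_id=request["cart_id"],
        payload_bytes=len(request["body"]),
    )
    error = None
    status = 200
    try:
        if len(request["body"]) > 1024:  # made-up limit for the example
            raise ValueError("payload too large")
        event.add(charged=True)
    except ValueError as exc:
        error, status = exc, 400
    finally:
        # Success or failure, the event leaves the service as a single blob.
        event.add(status=status)
        emit(event.finish(error))
    return status


if __name__ == "__main__":
    handle_payment(
        {"user_id": "u-42", "cart_id": "c-7", "body": "x" * 10},
        emit=lambda e: print(json.dumps(e)),
    )
```

A real service would add far more fields than this; the sketch only shows where they go and when the event is sent.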
@swyx There are a few other requirements on the backend for how to achieve this, but let's skip to the good stuff. What this gets you is the ability to **ask any question** about what is happening with users running your code in prod, *without* having to ship new code to ask it.
@swyx For example. You see a spike of errors. In metrics land, you would have had to predict and create a custom metric for every possible dimension or combination of dimensions that could be erroring.

Now, you can simply poke and find out that all the errors are, for example..
@swyx ... requests that are hitting /payments from iOS 10 devices using the French language pack on Canadian installs with a particular out-of-date firmware version and an excessively large payload.

(Could you have predicted you were going to need that custom metric? 😬😬)
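
A toy version of that "poke and find out" step, assuming the wide events are already sitting somewhere queryable. Here that is just an in-memory list, and `count_by`, the field names and the sample data are invented for illustration. Nothing about the grouping was decided when the events were written; you pick the dimensions at question time.

```python
from collections import Counter


def count_by(events, dimensions, where=lambda event: True):
    """Count events matching `where`, grouped by any combination of fields."""
    counts = Counter()
    for event in events:
        if where(event):
            counts[tuple(event.get(d) for d in dimensions)] += 1
    return counts


# Made-up sample events; real ones would carry hundreds of fields each.
EVENTS = [
    {"endpoint": "/payments", "os": "ios10", "language": "fr", "country": "CA", "status": 500},
    {"endpoint": "/payments", "os": "ios10", "language": "fr", "country": "CA", "status": 500},
    {"endpoint": "/payments", "os": "ios10", "language": "en", "country": "CA", "status": 200},
    {"endpoint": "/payments", "os": "android", "language": "fr", "country": "FR", "status": 200},
    {"endpoint": "/cart", "os": "ios10", "language": "fr", "country": "CA", "status": 200},
]

# Ask a question nobody planned for: which combinations are erroring?
errors = count_by(
    EVENTS,
    ["endpoint", "os", "language", "country"],
    where=lambda event: event["status"] >= 500,
)
print(errors.most_common(3))
```

Swapping in a different list of dimensions or a different `where` is a new question, and it needs no new instrumentation.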
@swyx Being able to ask any question is key to observability, as is associating the full context with the request.

In order to deliver observability, your tool also needs to support *high cardinality and dimensionality*. Cardinality means the number of unique items in a set, and
@swyx dimensionality means how many details can describe your event.

Metrics cannot support high cardinality dimensions. It's a running joke, people trying to put userid or request id in a metrics tag ... and blowing up the world. 😕
@swyx Which is shitty since unique ids are always going to be the most useful debug info!

You NEED the ability to chain and filter as many high cardinality dimensions as you want if you're going to be able to ask arbitrary questions about your unknown unknowns.
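
Some back-of-the-envelope arithmetic on why a userid tag blows up the world. This assumes a metrics store that keeps one time series per unique combination of tag values, which is a simplifying assumption here and not a claim about any specific product. The tag names and counts are made up.

```python
from math import prod


def worst_case_series(tag_cardinalities):
    """Upper bound on time series for one metric: product of each tag's unique values."""
    return prod(tag_cardinalities.values())


# Low-cardinality tags: manageable.
low = {"endpoint": 50, "status": 5, "region": 10}
print(worst_case_series(low))  # 2500

# Add one high-cardinality tag (a million users) and the bound explodes.
with_user_id = dict(low, user_id=1_000_000)
print(worst_case_series(with_user_id))  # 2500000000
```

With wide events the same user_id is just one more field on a row, not a multiplier on the number of series.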
@swyx Alright, this is getting pretty long. Let me tell you why I care so much, and why I want people like you specifically to understand what's at stake in the observability term wars.

We are way behind where we ought to be as an industry. We are shipping code we don't understand,
@swyx to systems we have never understood. Some poor sap is on call for this mess, and it's killing them, which makes the software engineers averse to owning their own code in prod.

Meanwhile everyone says they waste >40% of their day doing bullshit that doesn't move the biz forward.
@swyx We all just accept this as "what it's like working on software", like it's inevitable. But it's fucking not.

If you instrument your code, watch it deploy, then ask "is it doing what I expect? does anything else look weird?" as a habit? It never has to become a shitty hairball!
@swyx And honestly, it's people like you who are best poised to learn the new ways and take advantage of them. Observability is easier than the old ways ... but only if you don't have decades of scar tissue and old habits to unlearn.

And you don't. 🥳
Keep Current with Charity Majors
