Charity Majors @mipsytipsy

16 tweets, 3 min read

in honor of #monitorama and all the awesome talks and tweets, here is a mini-rant on why the common mantra "three pillars of observability: metrics, logs, and traces" is wrong, wrong, and wrong (or at least woefully outdated). with love. 📈❤️📉

first, some definitions.
📈 "metrics" can either mean a generic synonym for telemetry data, or "statsd-style integers with tags appended for grouping". the first is uselessly broad in this context.
📈 "logs" usually means unstructured strings written out to disk.

"logs" can be interpreted so broadly as to mean "any data emitted by any piece of software", but this is also uselessly broad. to most people it means strings => disk.
📈 and "tracing" generally means distributed tracing.

ok. definitions in hand now.

i will super grant that these are three types of telemetry that are broadly grouped by the majority of types of software that were available for consuming telemetry data, esp ~5 years ago. but it's a weird, weird set of choices thru any other lens.

what about stack traces? what about event-oriented instrumentation? what about APM? what about magical APM vs intentional instrumentation? what about exception handlers? what about aggregation vs sampling? what about aggregated tracing? what about...(she drones on forever)

rather than draw lines around how we set bits on disk, it makes more sense to draw lines around the *use cases* we have. (&& if we were better at thinking that way, would we have less of a devil's patchwork of "solutions" to eyeball and correlate between today?)

one clear line you can draw is between the health of the *system*, where context-stripped aggregates and metrics are helpful, and the health of the *event*, where they are actively misleading -- only event-oriented instrumentation (incl tracing) and sampling will help you.
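
a rough sketch of that line, in python. everything here is made up for illustration (the field names `user_id`, `endpoint`, `duration_ms`, and the 1000ms threshold are all assumptions): the aggregate says the system is fine, while one user's request is on fire, and only the event view still has the context to tell you whose.

```python
from statistics import median

def system_health(events):
    """context-stripped aggregate: one number for the whole system."""
    return median(e["duration_ms"] for e in events)

def slow_events(events, threshold_ms):
    """event view: keep the full context of every request over the threshold."""
    return [e for e in events if e["duration_ms"] > threshold_ms]

# 99 boring requests plus one terrible one (hypothetical data)
events = [
    {"user_id": f"user-{i}", "endpoint": "/cart", "duration_ms": 40}
    for i in range(99)
]
events.append({"user_id": "user-99", "endpoint": "/cart", "duration_ms": 9000})

print("median latency (ms):", system_health(events))
print("slow requests:", slow_events(events, threshold_ms=1000))
```

the median is a perfectly good answer to "is the system ok?" and a useless answer to "is user-99 ok?".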

and let's stop thinking and talking about UNSTRUCTURED STRINGS THAT WRITE TO DISK as being a valid life choice in the year 2018, shall we?

fuck logs. fuck disks for that matter, as anything more than a backoff+retry cache.

"logs" in the traditional sense of the term are a firmly deprecated solution for operational data. support them only with the same distaste you reserve for, oh, JMX 1.0.

structure your fucking data like a fucking adult. and don't flush to disk except as a temp replay cache.

if you structure your data, you get to break down and group by all those tasty, tasty high cardinality fields -- raw query, uuid, shopping cart id, etc.

this is ACTUALLY UN POSSIBLE with either metrics OR logs. this is a quantum leap in o11y functionality -- seems worth a nod.
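
a minimal sketch of what "break down and group by" looks like once events are structured. this is not any particular tool's API: `group_by`, `summarize`, and the event fields (`uuid`, `cart_id`, `raw_query`, `duration_ms`) are hypothetical names picked for the example.

```python
from collections import defaultdict

def group_by(events, field):
    """bucket events by the value of one field, however many distinct values it has."""
    groups = defaultdict(list)
    for event in events:
        groups[event.get(field)].append(event)
    return groups

def summarize(events, field):
    """count and worst-case duration per distinct value of `field`."""
    return {
        key: {"count": len(group), "max_ms": max(e["duration_ms"] for e in group)}
        for key, group in group_by(events, field).items()
    }

# hypothetical structured events: one dict per unit of work
events = [
    {"uuid": "a1", "cart_id": "cart-17", "duration_ms": 12,
     "raw_query": "SELECT * FROM items WHERE cart_id = 17"},
    {"uuid": "b2", "cart_id": "cart-17", "duration_ms": 950,
     "raw_query": "UPDATE carts SET total = 40 WHERE id = 17"},
    {"uuid": "c3", "cart_id": "cart-42", "duration_ms": 9,
     "raw_query": "SELECT * FROM items WHERE cart_id = 42"},
]

by_cart = summarize(events, "cart_id")
print(by_cart)
```

swap `"cart_id"` for `"uuid"` or `"raw_query"` and the same two functions still work, because the fields were never flattened into a string or pre-aggregated away.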

and what is a trace, anyway? it's actually just another structured event, only depth-first rather than breadth-first like most events.
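
to make "just another structured event" concrete, here is a sketch under assumptions: each span is a plain dict, and the linking fields `trace_id`, `span_id`, and `parent_id` are a common convention rather than anything the thread specifies. `render_trace` is a hypothetical helper that walks one request depth-first.

```python
def render_trace(spans, trace_id):
    """walk the spans of one trace depth-first, returning indented lines."""
    children = {}
    for span in spans:
        if span["trace_id"] == trace_id:
            children.setdefault(span["parent_id"], []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f'{span["name"]} ({span["duration_ms"]}ms)')
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines

# hypothetical spans: ordinary structured events plus three linking fields
spans = [
    {"trace_id": "t1", "span_id": "s1", "parent_id": None, "name": "GET /checkout", "duration_ms": 120},
    {"trace_id": "t1", "span_id": "s2", "parent_id": "s1", "name": "auth", "duration_ms": 15},
    {"trace_id": "t1", "span_id": "s3", "parent_id": "s1", "name": "load cart", "duration_ms": 90},
    {"trace_id": "t1", "span_id": "s4", "parent_id": "s3", "name": "db query", "duration_ms": 80},
    {"trace_id": "t2", "span_id": "s9", "parent_id": None, "name": "GET /health", "duration_ms": 1},
]

trace_lines = render_trace(spans, "t1")
print("\n".join(trace_lines))
```

ignore the linking fields and the same list is just a pile of events you could slice across many requests at once, which is the breadth-first view.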

and what about exception trackers, e.g. @getsentry?? where do they fit in to these 3 pillars? (they don't, because it's not a sensible frame)

there's no quality shared by metrics, logs, and traces that breaks down into three pillars. they aren't all storage formats, or use cases, or instrumentation types, etc.

there's one storage format (metric), one use case (tracing), and one amorphous garbage heap (logs).

of the three: metrics are increasingly irrelevant/out of favor, logs are archaic and should be abandoned everywhere parents love their children, and traces are ... well traces are great. but only make sense as a pillar if the criteria are broken down by *use cases*.

anyway. in conclusion. if i had to propose a replacement Three Pillars, i'd start with use cases.

* health of the system (metrics, aggregation, time series)
* health of the event (structured log events, sampling, instrumentation)
* bugs in my code (exceptions, stack traces)

lots of different technologies, storage formats, and instrumentation techniques can be used to solve each of them, tho not every tool can be used to solve every problem.

this strikes me as the most reasonable and realistic accounting of modern tools & practices.

** oh yes ... tracing would fall under the "health of the event" bucket using that frame. tracing is *depth-first* event-driven debugging. something like honeycomb classic or scuba (and, i suspect, many competitive tools soon) is *breadth-first* event-driven debugging.
