0/ Now that organizations are building or buying observability, they are realizing that it can get really damned expensive. And not just “expensive,” but “expensive and out of control.”

This is a thread about *observability value:* both the benefits and the costs.
1/ You hear so much about observability because it *can* be awesome. :) Benefits roll up into at least one of the following:

- Reducing latencies or error rates (foreach service)
- Reducing MTTR (also foreach service)
- Improving velocity or communication (foreach team)
2/ But most observability vendors charge based on something that has literally no value on its own: *the telemetry.*

This is rough for customers, especially since these vendors provide no mechanism to scale or *control* the telemetry volume (why would they? it’s $$$!).
3/ Let’s dig into this a bit more. Really, there are two flavors of telemetry – statistics (i.e., “metrics”) and events (i.e., “traces and logs”), and they should be considered separately.
4/ For metrics telemetry, the cost driver is cardinality, especially around “custom metrics.” Per-metric cardinality is combinatorial (every unique combination of tag values is its own time series), so it can grow well into the millions, and customers pay accordingly.

It really doesn’t need to be that way!
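A minimal Python sketch of that combinatorial growth. The metric name and the per-tag value counts are hypothetical, chosen only to show how one unbounded tag multiplies everything else.

```python
from math import prod

def series_cardinality(tag_value_counts: dict[str, int]) -> int:
    """Worst-case number of distinct time series for a single metric.

    Each unique combination of tag values becomes its own series, so the
    upper bound is the product of the distinct-value counts per tag.
    """
    return prod(tag_value_counts.values())

# Hypothetical tags on one "request_latency" metric.
tags = {"service": 50, "endpoint": 40, "status_code": 5, "region": 10}
print(series_cardinality(tags))  # 100000

# One high-cardinality tag multiplies the total again.
with_customer = {**tags, "customer_id": 10_000}
print(series_cardinality(with_customer))  # 1000000000
```

Not every combination occurs in practice, so real cardinality is usually below this bound, but a single tag like `customer_id` still dominates it.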
5/ Most of that “cardinality budget” is spent on long-tail metrics that *literally never appear in query results*. Customers should have a simple slider to trade off cardinality, spend, and query result quality: schematically, one control running from low cardinality and low spend on the left to full fidelity and full spend on the right.
6/ Want perfect fidelity for a business-critical metric? Drag the slider all the way to the right. Want to degrade gracefully for a customer_id tag with millions of values? Drag the slider to the left, pay 99% less, and still have metric data for your largest customers.
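One way such a slider could work, sketched in Python under our own assumptions (this is not how any particular vendor implements it): rank a high-cardinality tag's values by volume, keep the top fraction as their own series, and fold the rest into a single overflow series. `fold_long_tail`, `OTHER`, and the customer names are hypothetical.

```python
from collections import Counter

OTHER = "__other__"  # hypothetical name for the folded long-tail series

def fold_long_tail(points: list[tuple[str, float]], keep_fraction: float) -> dict[str, float]:
    """Aggregate (tag_value, value) points, keeping only the heaviest tag values.

    keep_fraction acts as the slider: 1.0 keeps every tag value as its own
    series (full fidelity); smaller values keep only the top tag values by
    total volume and sum the rest into a single OTHER series.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    totals: Counter = Counter()
    for tag_value, value in points:
        totals[tag_value] += value
    n_keep = max(1, round(len(totals) * keep_fraction))
    kept = dict(totals.most_common(n_keep))
    tail = sum(v for k, v in totals.items() if k not in kept)
    if tail:
        kept[OTHER] = tail
    return kept

points = [("acme", 900.0), ("globex", 80.0), ("initech", 15.0), ("hooli", 5.0)]
print(fold_long_tail(points, 1.0))   # four series, full fidelity
print(fold_long_tail(points, 0.25))  # {'acme': 900.0, '__other__': 100.0}
```

Total volume is preserved because the overflow series absorbs the tail, while the series count drops from four to two and the largest customer keeps its own series.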
7/ Event data is interesting, too.

(An aside: 99% of “logging telemetry” is really just “tracing telemetry.” Any log *about a transaction* (i.e., “almost all of them”) should be attached to the trace context so it can benefit from trace analysis.)
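A minimal, stdlib-only Python sketch of attaching logs to trace context: a logging filter copies the active trace ID onto every record so the log can later be joined with its trace. The `current_trace_id` context variable is a hypothetical stand-in for whatever active-span context a real tracing library provides; it is not any library's actual API.

```python
import contextvars
import logging
import uuid

# HYPOTHETICAL stand-in for a tracing library's "active span" context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="none")

class TraceContextFilter(logging.Filter):
    """Copy the active trace ID onto every log record as it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop records; only annotate them

logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request() -> str:
    # A real tracer would continue the upstream trace; here we just mint an ID.
    trace_id = uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("charging card")  # emitted as "trace_id=<id> charging card"
    finally:
        current_trace_id.reset(token)
    return trace_id
```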
8/ In any case, there is a fundamental tradeoff between the *number of transactions recorded* (txns/sec – i.e., those that survive sampling at a given analytical stage) and the *level of detail* (bytes/txn) in those transactions.
9/ When we multiply “txns/sec” by “bytes/txn,” we end up with “bytes/sec.” To visualize the tradeoffs, we chart various telemetry throughput targets (fixed bytes/sec values) as hyperbolic lines against two axes: txns/sec and bytes/txn.
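The relationship behind those hyperbolic lines, as a small Python sketch. The 10 MB/s budget and the detail levels are made-up numbers for illustration.

```python
def max_txns_per_sec(budget_bytes_per_sec: float, bytes_per_txn: float) -> float:
    """Transactions/sec that fit under a fixed telemetry throughput budget.

    bytes/sec = txns/sec * bytes/txn, so for a fixed bytes/sec target the
    feasible (bytes/txn, txns/sec) pairs lie on the hyperbola
    txns/sec = budget / bytes/txn.
    """
    if bytes_per_txn <= 0:
        raise ValueError("bytes_per_txn must be positive")
    return budget_bytes_per_sec / bytes_per_txn

# Hypothetical 10 MB/s budget: more detail per transaction means fewer transactions.
budget = 10_000_000
for detail in (1_000, 10_000, 100_000):
    print(f"{detail:>7} bytes/txn -> {max_txns_per_sec(budget, detail):>8.0f} txns/sec")
```

Each tenfold increase in detail per transaction cuts the affordable transaction rate tenfold, which is exactly what moving along one of those lines means.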
10/ For tracing (or logging) data, we must trade off between *sampling* (either in the clients, in collection infra, or before hitting long-term durable storage) and *detail*. There is no “right answer” here; it is simply a tradeoff that should be considered carefully.
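A minimal sketch of the sampling side of that tradeoff, under our own assumptions: pick a keep-rate that fits a byte budget for the chosen detail level, then derive the keep/drop decision from a hash of the trace ID so that clients and collectors agree on which whole traces survive. This is one common approach (consistent, trace-ID-based head sampling), not the only one; `sample_rate` and `keep_trace` are hypothetical names.

```python
import hashlib

def sample_rate(budget_bytes_per_sec: float, txns_per_sec: float, bytes_per_txn: float) -> float:
    """Fraction of transactions to keep so the kept traffic fits the budget."""
    offered = txns_per_sec * bytes_per_txn
    return 1.0 if offered <= budget_bytes_per_sec else budget_bytes_per_sec / offered

def keep_trace(trace_id: str, rate: float) -> bool:
    """Deterministic sampling decision derived from the trace ID.

    Any client or collector that sees the same trace ID with the same rate
    makes the same decision, so traces are kept or dropped whole.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Take 53 bits so the value is exactly representable: uniform in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate

# Hypothetical: 5,000 txns/sec at 20 KB/txn against a 10 MB/s budget.
rate = sample_rate(10_000_000, 5_000, 20_000)  # 0.1
```

Raising `bytes_per_txn` (more detail) lowers the rate (more sampling) for the same budget.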
11/ But no matter where we draw that line, the reality is that there is *A LOT* of tracing/logging/event data, especially when microservices are involved, since data volume scales with transaction rate *multiplied by* microservice count!
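A back-of-the-envelope Python sketch of that multiplication. It simplifies “microservice count” to the number of services a transaction touches, and every number below is hypothetical.

```python
def event_bytes_per_sec(txns_per_sec: float, services_per_txn: int,
                        spans_per_service: float, bytes_per_span: float) -> float:
    """Rough tracing/logging volume before any sampling.

    Each transaction fans out across services, and each service emits its
    own spans and logs, so volume scales with txns/sec *times* service count.
    """
    return txns_per_sec * services_per_txn * spans_per_service * bytes_per_span

# Hypothetical numbers: same traffic, monolith vs. 30 microservices.
monolith = event_bytes_per_sec(1_000, 1, 5, 500)        # 2.5 MB/s
microservices = event_bytes_per_sec(1_000, 30, 5, 500)  # 75 MB/s
```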
12/ The elephant in the room around tracing data is where we *collect, store, and analyze* the data. The moment you send things over the WAN, you are paying a 100x cost penalty; but if you just keep things in a “dumb” collector, you can’t query over it dynamically.
13/ Architecturally, this means that computation *must* be distributed – close to the data – and that data *must* stay close to the services themselves. Otherwise, the network cost is simply too disruptive to overall observability ROI.
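A minimal Python sketch of “computation close to the data,” under our own assumptions: each region's collector keeps its raw spans local and evaluates a query (here, per-service error rate) into a small, mergeable partial result, and only those partials cross the WAN to be merged. `PartialAggregate`, `error_rates`, and the region names are hypothetical; a real system would ship richer mergeable structures such as latency histograms.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class PartialAggregate:
    """Small, mergeable summary a local collector ships instead of raw spans."""
    spans: Counter = field(default_factory=Counter)   # spans seen per service
    errors: Counter = field(default_factory=Counter)  # error spans per service

    def add_span(self, service: str, is_error: bool) -> None:
        self.spans[service] += 1
        if is_error:
            self.errors[service] += 1

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        # Counter addition sums per-key counts, so merge order does not matter.
        return PartialAggregate(self.spans + other.spans, self.errors + other.errors)

def error_rates(agg: PartialAggregate) -> dict[str, float]:
    return {svc: agg.errors[svc] / n for svc, n in agg.spans.items()}

# Each region's collector reduces its own raw spans locally...
us_east = PartialAggregate()
for service, is_error in [("api", False), ("api", True), ("db", False)]:
    us_east.add_span(service, is_error)
eu_west = PartialAggregate()
for service, is_error in [("api", False), ("db", True)]:
    eu_west.add_span(service, is_error)

# ...and only the compact summaries cross the WAN to be merged.
print(error_rates(us_east.merge(eu_west)))  # {'api': 0.3333333333333333, 'db': 0.5}
```

The summary grows with the number of services, not the number of spans, which is what keeps WAN traffic small.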
14/ So, in closing…

Observability can be _incredibly_ valuable!

*Do* build an ROI case around its primary beneficiaries: your services, your teams, and your brand.

But `Telemetry != Observability`: *don’t* pay vendors extra for a telemetry firehose you can't even control.
PS: Given all of the above, @lightstephq's approach to metrics cleanly separates telemetry from observability, and will provide explicit control over cardinality costs.

You can sign up for early access here: lightstep.com/metrics/
Thread by Ben Sigelman