Charity Majors @mipsytipsy

13 tweets, 3 min read

It’s this simple: if you don’t sample, you don’t scale.

If you think this is even a controversial statement, you have never dealt with observability at scale OR you have done it wastefully and poorly.

[Quoted tweet by @smol_computer]

Operational data is not like billing data or transactions: not every detail is equally important.

When things are normal, you care about the shapes and trends and representative samples. When they aren't, or might not be, you care about every gorram detail of every single thing.
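A minimal sketch of this idea in Python, usually called dynamic sampling: keep every event that looks interesting, keep only a fraction of the boring ones, and record the rate each kept event was sampled at. The "interesting" rule (a 5xx status or a duration over 1 second), the field names `status` and `duration_ms`, and the `sample` function are hypothetical choices for illustration, not Honeycomb's API.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # One wide, structured event describing a single unit of work.
    fields: dict
    # How many original events this kept event stands for.
    sample_rate: int = 1


def sample(event_fields: dict, normal_rate: int = 100,
           rng: Optional[random.Random] = None) -> Optional[Event]:
    """Keep every interesting event; keep roughly 1 in `normal_rate` boring ones.

    Hypothetical rule: a 5xx status or a duration over 1000 ms counts as
    interesting. Returns None when the event is dropped.
    """
    rng = rng or random.Random()
    status = event_fields.get("status", 200)
    duration_ms = event_fields.get("duration_ms", 0.0)
    if status >= 500 or duration_ms > 1000:
        # Things are not normal: keep every detail.
        return Event(event_fields, sample_rate=1)
    if rng.randrange(normal_rate) == 0:
        # Things look normal: a representative sample is enough.
        return Event(event_fields, sample_rate=normal_rate)
    return None
```

Storing `sample_rate` on each kept event is what lets counts be re-weighted later, so a 1-in-100 sample of healthy traffic still stands in for all of it.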

Think about it this way. Do you ever care how much free working memory you have, SPECIFICALLY? Hell no, you just care about whether you have a memory leak, whether you're running out or swapping to disk ... and maybe which events precipitate a sharp drop.

Think about your databases. If you're debugging a problem like "the queue is filling up", you do not need every single query in order to sum up the locks held and break them down by UUID.

You need a lot less than you think for confidently mapping patterns and trends.
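To make "a lot less than you think" concrete, here is a small Python simulation under made-up assumptions: 100,000 requests, three shards with a 70/25/5 traffic split, and a fixed 1-in-100 sample. Weighting each kept event by its sample rate recovers the shape of the traffic from about 1% of the data. The shard names and the `estimate_counts` helper are hypothetical.

```python
import random
from collections import Counter


def estimate_counts(sampled: list[tuple[str, int]]) -> Counter:
    """Estimate true per-key counts from (key, sample_rate) pairs.

    Each kept event stands in for `sample_rate` original events, so summing
    the sample rates per key gives an unbiased estimate of the true count.
    """
    totals = Counter()
    for key, rate in sampled:
        totals[key] += rate
    return totals


# Hypothetical traffic: 100,000 requests spread unevenly across three shards.
rng = random.Random(7)
shards = rng.choices(["shard-a", "shard-b", "shard-c"], weights=[70, 25, 5], k=100_000)
true_counts = Counter(shards)

# Keep roughly 1 in 100 requests, remembering the rate each was kept at.
RATE = 100
sampled = [(s, RATE) for s in shards if rng.randrange(RATE) == 0]
estimated = estimate_counts(sampled)

for shard in sorted(true_counts):
    print(shard, "true:", true_counts[shard], "estimated:", estimated[shard])
```

The estimate gets noisier for rare keys (shard-c here), which is exactly the case dynamic sampling handles by keeping rare or interesting events at a lower rate.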

Aren't some things easier with every event ever? Yeeeesss ... but it isn't free. Y'all bitch so much about a few hundred dollars a month, I *know* you don't want to pay for observability that costs as much as or more than your infrastructure.

So you think you’re capturing everything now. But you aren’t. Instead of sampling away some disposable events, you’re dropping the MOST IMPORTANT DATA OF ALL: all of your motherfucking context.

You've got all these counters that tick on every request, but you can't break them down by UUID, shopping cart, request ID, shard, instance type, query family, build ID, ip:port, browser type, arbitrary headers, etc.

High-cardinality dimensions like these are literally all that fucking matters. Yes, your metrics can tick on every request, but what's their upper bound on cardinality ... 100? 150? Cool, so that will work until you have 100 customers 👍
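A toy Python illustration of the difference, with hypothetical field names and values: a counter pre-aggregated on `(endpoint, status)` can tell you checkout is failing, but only raw events that carry the high-cardinality context can tell you it is one customer on one shard. The `break_down` helper is a made-up stand-in for an ad hoc query.

```python
from collections import Counter

# Hypothetical wide events: one per request, carrying high-cardinality context.
events = [
    {"endpoint": "/checkout", "status": 500, "customer_id": "cust-1842", "build_id": "b-77", "shard": "s3"},
    {"endpoint": "/checkout", "status": 200, "customer_id": "cust-0007", "build_id": "b-77", "shard": "s1"},
    {"endpoint": "/checkout", "status": 500, "customer_id": "cust-1842", "build_id": "b-78", "shard": "s3"},
    {"endpoint": "/search", "status": 200, "customer_id": "cust-0311", "build_id": "b-78", "shard": "s2"},
]

# A traditional metric: counts pre-aggregated on a few low-cardinality labels.
# Everything else is discarded at write time and can never be recovered.
metric = Counter((e["endpoint"], e["status"]) for e in events)


def break_down(events, where, by):
    """Ask a new question after the fact: filter raw events, group by any field."""
    return Counter(
        e[by] for e in events
        if all(e.get(k) == v for k, v in where.items())
    )


errors_by_customer = break_down(events, {"status": 500}, "customer_id")
errors_by_shard = break_down(events, {"status": 500}, "shard")
```

The metric answers only the question it was built for; the raw events can be grouped by a field nobody thought to ask about when the code shipped.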

You are also exerting heavy pressure on your developers to *limit the detail that they capture* if you store every request.

I cannot predict all the questions I may need to ask. So I have to store raw events, not rollups. Period.

Your choices are:

1) store aggregates (this is not observability, you can’t drill down)
2) limit horizontally (capture less context)
3) limit vertically (sample the boring stuff)
4) have infinite money and capacity and humans to burn

With Honeycomb, you get datasets with hundreds of dimensions and details.

That’s why debugging is so fun and easy. ☺️🐝 You don’t have to use your intuition, or predict which details will be relevant to a future outage or question. Got context? Gather it!

So, to sum up: it's not like you have all the data right now and I'm asking you to drop some of it via sampling.

No, you are implicitly choosing to drop all of a far more important type of data: literally all of your context.

At Facebook, I think we once calculated that 200 events were generated for every web or API request. At Parse, the median was like 35 per API request.

Are you going to pay for an o11y stack that costs 50x as much as prod? No?

Sample.
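Back-of-the-envelope arithmetic in Python for why this adds up fast. The 200 and 35 events-per-request figures come from the thread; the request rate (10,000 req/s), the event size (1 KB), and the `daily_event_volume_gb` helper are hypothetical assumptions for illustration.

```python
def daily_event_volume_gb(requests_per_sec, events_per_request, bytes_per_event, sample_rate=1):
    """Rough daily ingest in gigabytes (10**9 bytes) for instrumented traffic.

    sample_rate=N means roughly 1 in N events is kept.
    """
    seconds_per_day = 86_400
    events_per_day = requests_per_sec * events_per_request * seconds_per_day / sample_rate
    return events_per_day * bytes_per_event / 1e9


# Hypothetical inputs: 10,000 req/s and 1,000 bytes per event.
full = daily_event_volume_gb(10_000, 200, 1_000)                     # 172,800 GB/day
parse_like = daily_event_volume_gb(10_000, 35, 1_000)                # 30,240 GB/day
sampled = daily_event_volume_gb(10_000, 200, 1_000, sample_rate=100)  # 1,728 GB/day
```

Volume scales linearly with every input, so the sample rate is the one knob that divides the whole bill; with dynamic sampling, errors and slow requests would still be kept in full.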
