If you don't have observability -- in the technical sense, not the marketing sense ("three pillars") -- then you don't have chaos engineering, and you don't have chaos experiments.

You just have chaos.
Here's why.

(I've said all this a hundred times, but I still hear people say "oh! *this time* it clicked for me!" so I will keep explaining it. 🙃 )
If all you have are metrics, logs and traces, especially if they are in separate tools, then you do not have the ability to slice and dice and break down and aggregate by every combination of service, container, instance, node, build version, etc.

These are all high cardinality sets.
This means you don't have the ability to start at the edge with an aggregate error count and systematically follow the trail of breadcrumbs to the source of the chaos you injected, wherever it might be.
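To make "follow the trail of breadcrumbs" concrete, here is a minimal sketch in Python. It assumes you keep one raw event per request with arbitrary fields attached. The event shape, the field names, and the `error_counts` helper are all hypothetical, invented for illustration, and not any particular tool's API.

```python
from collections import Counter

# Hypothetical raw events: one dict per request. All fields are invented for illustration.
events = [
    {"service": "api", "node": "node-1", "build": "v41", "status": 200},
    {"service": "api", "node": "node-2", "build": "v42", "status": 500},
    {"service": "api", "node": "node-2", "build": "v42", "status": 500},
    {"service": "auth", "node": "node-3", "build": "v41", "status": 200},
    {"service": "api", "node": "node-1", "build": "v42", "status": 200},
]

def error_counts(events, *dims):
    """Count errored events (status >= 500), grouped by any combination of fields."""
    return Counter(
        tuple(e.get(d) for d in dims)
        for e in events
        if e["status"] >= 500
    )

# Start at the edge with an aggregate error count...
total_errors = sum(error_counts(events).values())

# ...then keep adding dimensions until the source stands out.
by_service = error_counts(events, "service")
by_service_node_build = error_counts(events, "service", "node", "build")
```

The point of the sketch is that the grouping keys are chosen at query time, not baked in when the data was written. Pre-aggregated metrics cannot do that, because the raw events are already gone.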
You also don't have the ability to look at a spike of errors and ask, "what n things are different about the errored requests compared to the baseline requests?", which is the fastest shortcut to an answer most of the time.
(Which is useful under any circumstances, but is downright WIZARDRY when n turns out to be like 12 different things. How many of those could you have caught by hand??)
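A rough sketch of that "what is different?" comparison, again in Python. It assumes raw events are plain dicts with hashable values. The `differing_dimensions` function, the `min_gap` threshold, and the sample data are hypothetical, and real tools do something more sophisticated than this.

```python
from collections import Counter

def differing_dimensions(errored, baseline, min_gap=0.5):
    """Return (field, value, error_share, baseline_share) for values that are
    much more common among errored events than among baseline events."""
    results = []
    fields = {k for e in errored for k in e}
    for field in sorted(fields):
        err_counts = Counter(e.get(field) for e in errored)
        base_counts = Counter(e.get(field) for e in baseline)
        for value, n in err_counts.items():
            err_share = n / len(errored)
            base_share = base_counts[value] / len(baseline) if baseline else 0.0
            if err_share - base_share >= min_gap:
                results.append((field, value, err_share, base_share))
    # Largest gap first: the most suspicious dimension leads the list.
    return sorted(results, key=lambda r: r[2] - r[3], reverse=True)

# Hypothetical sample data, invented for illustration.
errored = [
    {"build": "v42", "node": "node-2", "region": "us-east"},
    {"build": "v42", "node": "node-2", "region": "us-west"},
]
baseline = [
    {"build": "v41", "node": "node-1", "region": "us-east"},
    {"build": "v41", "node": "node-3", "region": "us-west"},
    {"build": "v42", "node": "node-1", "region": "us-east"},
    {"build": "v41", "node": "node-3", "region": "us-west"},
]

suspects = differing_dimensions(errored, baseline)
```

Here `region` drops out because it is spread the same way in both sets, while `node` and `build` float to the top. With 12 differing fields instead of 2, the loop does exactly the same work.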
Here, let me put it this way. Have you ever seen error rates look a little elevated, and figured out only much later that it was due to a process that couldn't talk to the database?

Observability lets you track down any variation of this problem swiftly and reliably.
Which is fortunate, since chaos experiments basically just sprinkle a random bunch of these throughout your systems.

Which is fine! As long as you have the ability to locate them. Which you do not, if all you have are aggregated metrics and logs.
If you can't get from aggregates to raw events, you don't have observability.

If you can't group by any element of a high cardinality dimension, you don't have o11y.

If you have metrics that are not grouped together by request id/service, you don't have o11y.
And if you don't have observability, if you cannot dependably identify the source of chaos, then your time would be better spent fixing that first.
P.S. I will send stickers to any active Honeycomb user or customer, including our free tier. Just DM me a mailing address on the Pollinators community Slack. 🐝🥰

Keep Current with Charity Majors
