0/ This is a thread about *Logging* and how – for decades – it’s been a needlessly “selfish” technology.

And how that should change.

I promise this eventually gets concrete and involves real examples from production. :)

👇
1/ First off, a crucial clarification: I don’t mean that the “loggers” – that is, the human operators – are selfish, of course! The problem has been that their (IMO primitive) toolchain needlessly localizes and *constrains* the value of the logging telemetry data.
2/ How? Well, traditional logging tech encourages the code-owner to add logs *in order to explain what’s happening with their code, in production, to themself.* Or, maybe, to other/future owners of that particular patch of code.

Useful? Sometimes. But both limited and limiting.
3/ It’s a paradox… the logging data itself seems so rich, and surely relevant to countless “unsolved mysteries” in production.

And yet: during any particular incident or investigation, the narrow sliver of relevant logging data can’t be separated from the rest of the firehose.
4/ So how do we realize the *actual* value of our logs? How do we rank and filter them in the context of an actual incident or investigation? How do we “multiply their value,” and make logging a “generous” discipline rather than a “selfish” one?
5/ Like this:

I) Automatically attach the logs to the trace context
II) Collect *everything*
III) Use tracing data and AI/ML to correlate critical symptoms in one service with the relevant logs from others

… and now the logs help *everyone,* not just the logger. ✨
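To make step I concrete, here’s a minimal sketch of what “attach the logs to the trace context” can look like, assuming a Python service instrumented with OpenTelemetry and the standard logging module (the filter and field names here are illustrative, not any particular vendor’s API):

import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Stamp every log record with the active trace and span IDs."""
    def filter(self, record):
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else "-"
        record.span_id = format(ctx.span_id, "016x") if ctx.is_valid else "-"
        return True

handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

Once every log line carries a trace_id, any log from any service can be joined to the traces that pass through it, which is what makes steps II and III possible.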
6/ These sorts of things are much easier to explain with a real-world example, so let’s make this more concrete. I’ll pick a recent issue from @LightstepHQ’s own live, multi-tenant SaaS in order to illustrate the concepts here.
7/ I heard that we had an elevated error ratio for HTTP POST in a central service – code-named “crouton” – a few nights ago. Here’s how it looked:

(PS: “crouton” is the oldest service in our system, and the name is my fault. It was a dad-joke about “breadcrumbs.”)
8/ In a conventional metrics or monitoring tool, this is where you’d start guessing and checking – a primitive workflow, and not in a good way.

What you’d rather do is just click on the spike and understand what’s changed. Here, you can:
9/ We need to understand “what changed.” To help with that, this view color-codes the regression state (red) and the baseline state (blue), and uses the root-cause-analysis engine to examine all of the regressed service’s upstream and downstream dependencies – including logs from those distant services!
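The idea behind that red/blue comparison can be sketched in a few lines: take the log patterns attached to traces in the regression window and in the baseline window, and surface the patterns that are disproportionately present in the regression. A toy illustration of the general idea, not Lightstep’s actual RCA engine:

from collections import Counter

def suspicious_log_patterns(regression_logs, baseline_logs, min_count=5):
    """Rank log patterns by how over-represented they are in the regression window.

    Both arguments are lists of pre-templated log patterns (strings),
    e.g. "invalid character at byte <N>". Purely illustrative scoring.
    """
    reg, base = Counter(regression_logs), Counter(baseline_logs)
    scores = {}
    for pattern, count in reg.items():
        if count < min_count:
            continue
        # +1 smoothing so patterns absent from the baseline don't blow up the ratio
        reg_rate = count / len(regression_logs)
        base_rate = (base[pattern] + 1) / (len(baseline_logs) + 1)
        scores[pattern] = reg_rate / base_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)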
10/ Here’s why this would be so painful without 1000s of traces to power the analysis. This is the dependency diagram *just for this HTTP POST regression*. No human has time to figure this out manually, much less analyze the relevant snippets of log data along the way.
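For a sense of why traces are the right raw material here: a service dependency diagram like that one falls out of the parent/child structure of the spans themselves. A minimal sketch, assuming each span carries its service name and parent span ID (simplified field names, not a real trace schema):

from collections import Counter

def service_dependency_edges(spans):
    """Derive service-to-service call edges from a batch of spans.

    `spans` is a list of dicts with 'span_id', 'parent_id', and 'service'
    keys - a simplified stand-in for real trace data.
    """
    by_id = {s["span_id"]: s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        if parent and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges

Run that over the thousands of traces touching the regressed HTTP POST and you get the diagram above, with the relevant log data already hanging off each span.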
11/ Thankfully, that sort of strenuous manual effort is no longer needed. An observability solution should be able to do that work for us, and should also provide quantitative impact data. Here’s what this view has to say about correlated logs:
12/ This is interesting!

We now have concrete evidence of an actual data integrity issue, including some specifics (“byte 22”, etc) – and we can follow up by looking at *many* distributed traces.
13/ Here’s the first of those traces, for example – it shows the connection between this deeply-nested log message and the HTTP POST that concerned us at the outset, and it gives us enough detail to reproduce and resolve the issue outside of production.
14/ So, logging tech *used* to be “selfish” – service logs only really helped that service’s owners. Enriched with context from tracing data, though, logs can help the entire org resolve mysteries across the stack.

This is what unified observability can – and should – deliver.