Charity Majors
Sep 18, 2020 · 22 tweets · 5 min read
"Log Everything" is:

a) logically impossible
b) a bottomless money pit
c) mostly junk (boring things being infinitely more common than interesting things)
d) heavily biased towards errors, and
e) a cause of good people lurching desperately towards bad ideas (like "AI")
a piece i wrote years ago, just as relevant today: honeycomb.io/blog/lies-my-p…
just a few minutes ago i was having a conversation with @paulosman about the future of our profession as systems engineers, and he pulled up a quote about four ways to solve problems: solution, resolution, absolution, and dissolution. (e.g. squiretothegiants.com/2016/06/16/so-…)
logs are such a canonical example of a problem that teams are used to either absolving (ignoring it and hoping it will go away) or resolving (reaching an outcome that is good enough to move on). "solving", by contrast, means doing the thing that yields the best possible outcome. CRITICAL DISTINCTION ☺️

if your problem is "do something with logs", a good enough answer is "pick any log vendor".
logs are a commodity now, a race to the bottom. you can ship them to any one of twenty or fifty vendors, pay a few cents per gig to not think about them, and sleep fine at night.

but what if the problem you are trying to solve is "understand my systems" or "understand my code"?
if what you're trying to do is *understand your systems*, logs are a shit-poor way of trying to do that. we fall back to logs when that's all we have. but what if you didn't log the detail you needed to understand the problem in front of you?
what if what you logged is misleading, or flat out wrong? what if the problem only manifests when you examine system behaviors in aggregate, and is invisible from the perspective of any given host? what if it doesn't match up with the telemetry from your monitoring systems?
what if your logs are filled with useless spew left over from the last engineer who was frantically stuffing in shit to help her "step" through the code during the last outage?

worst of all, what if you don't know what to look for? logs _only work_ when you know what to look for.
the problem most people face is that they've been using logs for so long that they've wrestled them into a state where they're "good enough" -- everyone is familiar enough with the local whimsies that they know how to get what they need out of them to get thru most situations.
and so they completely lose sight of the fact that the problem they're trying to solve is, "understand my systems".

and they start shoveling more and more and more engineering energy into the gaping maw of their logging "solution". the problem instead becomes "managing logs".
many companies end up sinking so much engineering talent into this problem, you'd think they had a hybrid mission -- whatever their company does, plus log management.

logs for logs' sake will swiftly become a millstone around the neck of any team that lets this happen.
there's so much emotional attachment around this topic that many teams (and leaders) seem to have imbued their logs with a kind of security-blanket magic.

(for which you can prob thank the millions of dollars the aforementioned log vendors have spent on marketing.)
but what is the problem you are trying to solve?

it is not "keep a record of everything that happened".
it is not "search our logs".
it is not "keep everything".

the problem you are trying to solve is understanding your systems and the code you write that makes your business.
if the goal is to "solve" it (find the best possible outcome) and, in some cases, "dissolve" it (reframing the question or redesigning the system so it no longer exists) ... what would you do?
well, first identify the minimal set of logs that you actually do need (usually for compliance reasons). pick any log vendor, stash and forget.

then, of course, spin up a prometheus instance or a datadog account and -- LOLJK, you know what i'm going to say about observability ☺️
but if you CAN'T start using honeycomb or lightstep or similar, if you're stuck with what you've got, what can you do to dig yourself out of logs hell?

at LEAST you can shift away from random acts of logging violence towards emitting arbitrarily-wide structured data blobs, one per service per request, containing the full context of the request, env, parameters and so forth. i've written about this extensively: charity.wtf/2019/02/05/log…

aws apparently does this, and stripe calls it "canonical log lines" (stripe.com/blog/canonical…), so it's not just my bullshit.
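a minimal sketch of what "one wide event per service per request" can look like in python. everything here (the `RequestEvent` class, `handle_checkout`, the field names and values) is hypothetical and for illustration only; it is not honeycomb's, stripe's or aws's implementation. the idea: context accumulates in one dict as the request runs, and a single structured line is emitted at the very end, success or failure.

```python
import json
import time
import uuid


class RequestEvent:
    """Accumulates context for ONE request and emits it exactly once.

    Hypothetical helper for illustration; not any vendor's API.
    """

    def __init__(self, service, **env):
        # Environment context (build, region, host...) goes in up front.
        self.fields = {"service": service, "request_id": str(uuid.uuid4()), **env}
        self._start = time.monotonic()

    def add(self, **fields):
        # Any code path can attach more context; no fixed schema, arbitrarily wide.
        self.fields.update(fields)

    def emit(self, sink):
        # Called once, at the end of the request, as a single structured line.
        self.fields["duration_ms"] = round((time.monotonic() - self._start) * 1000, 3)
        line = json.dumps(self.fields, sort_keys=True)
        sink(line)
        return line


def handle_checkout(user_id, cart, sink=print):
    """Hypothetical request handler that emits one wide event per request."""
    event = RequestEvent("checkout", region="us-east-1", build="abc123")
    event.add(endpoint="/checkout", user_id=user_id, cart_items=len(cart))
    try:
        total = sum(cart)
        event.add(cart_total=total, status=200)
        return total
    except Exception as exc:
        # Failures land in the same event instead of a separate error log line.
        event.add(status=500, error=repr(exc))
        raise
    finally:
        event.emit(sink)
```

calling `handle_checkout("u-42", [9.5, 0.5])` prints a single JSON line carrying service, request_id, region, build, endpoint, user_id, cart_items, cart_total, status and duration_ms. the design choice that matters is the `finally`: errors don't get their own special log line, they're just more fields on the same event, which also keeps you from building that heavy bias towards errors.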
then ship these events someplace where you can aggregate and slice and dice them. you can also tee them off to a honeycomb free account, just fyi ;) since that's basically what our client-side integrations do.
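the payoff of keeping full context on every event is that you can group by whatever dimension turns out to matter, after the fact. here's a toy sketch of that slicing over a handful of hypothetical in-memory events; in real life you'd do this in whatever store or tool you ship the events to. `slice_by`, the field names and the sample data are all made up for illustration.

```python
import json
from collections import defaultdict


def slice_by(events, key):
    """Group wide events by ANY field and summarize each group.

    Because each event carries the full request context, `key` can be
    chosen at query time (host, build, endpoint, user_id, ...).
    """
    groups = defaultdict(list)
    for ev in events:
        groups[ev.get(key, "<missing>")].append(ev)
    summary = {}
    for value, evs in groups.items():
        errors = sum(1 for e in evs if e.get("status", 200) >= 500)
        durations = [e["duration_ms"] for e in evs if "duration_ms" in e]
        summary[value] = {
            "count": len(evs),
            "error_rate": errors / len(evs),
            "max_duration_ms": max(durations) if durations else None,
        }
    return summary


# Hypothetical events, e.g. one JSON line per request.
lines = [
    '{"host": "a", "build": "v1", "status": 200, "duration_ms": 12}',
    '{"host": "a", "build": "v2", "status": 500, "duration_ms": 340}',
    '{"host": "b", "build": "v1", "status": 200, "duration_ms": 15}',
    '{"host": "b", "build": "v2", "status": 500, "duration_ms": 310}',
]
events = [json.loads(line) for line in lines]

print(slice_by(events, "host"))   # each host looks equally (un)healthy
print(slice_by(events, "build"))  # the failures all belong to build v2
```

grouped by host, both hosts show a 50% error rate, which looks like noise smeared across the fleet. grouped by build, v1 is at 0% and v2 at 100%. that's the "invisible from the perspective of any given host" problem in miniature, and you only get to ask the second question if the build was on the event to begin with.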

btw, if you haven't yet done this refactoring and if you shudder to think how much labor it would take, check out @cribl_io. it's by ex-splunk folks and it reconstitutes log spew into coherent events for you.

this is *not* a solution, but could be a bridge to your future. 😉
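to make "reconstitute log spew into coherent events" concrete: this is NOT how cribl works or is configured, just a toy sketch of the general idea. it assumes (hypothetically) that every log line is `key=value` formatted and carries a `request_id`; `parse_kv`, `reconstitute` and the sample lines are made up for illustration.

```python
def parse_kv(line):
    """Parse a hypothetical `key=value key=value` log line (no quoting support)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def reconstitute(lines):
    """Merge scattered log lines into one event per request_id.

    Later lines overwrite earlier values for the same key. Lines without a
    request_id can't be attributed to any request and are dropped.
    """
    events = {}  # dicts keep insertion order, so events come out in first-seen order
    for line in lines:
        fields = parse_kv(line)
        rid = fields.pop("request_id", None)
        if rid is None:
            continue
        event = events.setdefault(rid, {"request_id": rid, "log_lines": 0})
        event["log_lines"] += 1
        event.update(fields)
    return list(events.values())


spew = [
    "request_id=r1 msg=start endpoint=/checkout",
    "request_id=r2 msg=start endpoint=/search",
    "request_id=r1 db_ms=42",
    "heartbeat ok",
    "request_id=r1 msg=done status=200",
]
for event in reconstitute(spew):
    print(event)
```

here r1's three scattered lines collapse into one event with endpoint, db_ms and status side by side, and the heartbeat line is dropped. note the limits, which is why it's a bridge and not a solution: values come back as strings, later lines clobber earlier ones, and anything that was never logged (or never tagged with a request id) simply isn't there.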
in the end, if your goal is to *understand your systems* then you need to be aggregating loads of rich context around the request, because that is what most closely tracks your users' experience.

this is observability.
logs can (*can*) be pressed into the service of this goal, if gathered and formatted and expressed in the correct ways, but the presence (or lack) of logs is orthogonal to the goal.

use logs or don't use logs, but don't let "logs" become the problem you solve for. the end.

