It is staggering how incredibly durable the myth of "you can't afford events, use metrics" has proven to be. 🤔
I think there are several contributing factors. First of all, most people's frame of reference is logs. Shitty, spammy, undisciplined string-infested logs.
The median log line contains maybe 1-5 nouns of information, and repeats any/all correlating identifiers on every line. That's...not a lot of information density per write or buffer flush.
But it gets worse! The strings are often padded with sentences and human readable crap,
and the log lines themselves are virtually useless unless you reassemble the full context of the event in post processing.
Your write amplification is massive (could easily be tens or hundreds of writes per request), and a typo here can be fatal to disk space or budget.
Events, on the other hand, are set at one per request per service. A mature instrumented service tends to have 300-500 dimensions, most of which are populated.
Adding another dimension doesn't mean another write, just appending a few more chars to the existing one.
So structurally events are compact, dense and resistant to bloat -- and no post processing necessary to make them usable.
No printing out the unique ids and timestamps again and again, on every log line. No need to allocate the memory and set up TCP every time.
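To make the contrast concrete, here's a minimal sketch of the two shapes. Everything in it is an assumption for illustration (the `Event` class, the field names, the in-memory sinks); it is not any vendor's SDK. The log version writes once per message and repeats the identifiers every time; the event version collects context as the request runs and writes once at the end.

```python
import io
import json
import time
import uuid


class Event:
    """Hypothetical wide event: collects context for one request, written exactly once."""

    def __init__(self, sink):
        self.sink = sink
        self.fields = {"request_id": str(uuid.uuid4()), "timestamp": time.time()}

    def add(self, **fields):
        # A new dimension is another key on the same record, not another write.
        self.fields.update(fields)

    def send(self):
        # The single write for this request in this service.
        self.sink.write(json.dumps(self.fields) + "\n")


def handle_request_with_logs(sink, request_id):
    # Every line repeats the timestamp and request id so the pieces can be
    # stitched back together in post processing.
    for msg in ("starting request", "user authenticated", "querying db", "request finished"):
        sink.write(f"{time.time()} request_id={request_id} {msg}\n")


def handle_request_with_event(sink):
    ev = Event(sink)
    ev.add(endpoint="/checkout", method="POST")
    ev.add(user_id=42, plan="pro")
    ev.add(db_query_count=3, db_duration_ms=12.5)
    ev.add(status_code=200, duration_ms=48.1)
    ev.send()


log_sink, event_sink = io.StringIO(), io.StringIO()
handle_request_with_logs(log_sink, str(uuid.uuid4()))
handle_request_with_event(event_sink)

log_lines = log_sink.getvalue().splitlines()
event_lines = event_sink.getvalue().splitlines()
print(len(log_lines), "log writes vs", len(event_lines), "event write")
```

Going from 10 fields to 300 in the event version changes the size of one record, not the number of writes.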
And that's just what you save by aggregating context around the event. I know y'all don't have access to an efficient columnar store; the closest options are probably Elastic (built for text search) and Druid (lacks flexible schemas). Surely there's something in the works tho.
We've written extensively on some of the things we did to optimize storage costs, from compression to replacing repeated strings with pointers, to (most recently) aging the files out to S3 and moving the query planner to lambda jobs.
(Aka "We serverlessed our database 😍")
All that without even mentioning the loaded S-word: sampling.
To be clear: Honeycomb does not depend on sampling in ANY way. Many of our customers don't sample at all; it's completely up to you. But dynamic sampling is a fucking superpower. You ignore it at your own peril.
Any time a monitoring vendor tells you smugly that THEY don't throw away any data, ask them what time interval they aggregate on.
(That's called throwing data away too, btw, and it's way more fatal to observability than simply getting a representative sample.)
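A rough sketch of what per-key sampling can look like, assuming made-up rates and field names (`SAMPLE_RATES`, `status_class`, `sample_rate` are all hypothetical, and this is not a description of how Honeycomb implements it). Boring traffic is kept at 1 in N, errors are kept in full, and every stored event carries its rate so totals can be estimated afterwards. Unlike a pre-aggregated metric, every kept event still has all of its raw fields.

```python
import random
from collections import Counter

# Hypothetical per-key rates: keep 1 in N events for each status class.
SAMPLE_RATES = {"2xx": 100, "4xx": 10, "5xx": 1}


def status_class(status_code):
    return f"{status_code // 100}xx"


def sample(events, rng):
    """Keep roughly 1 in N events per key, stamping each kept event with its rate."""
    kept = []
    for ev in events:
        rate = SAMPLE_RATES.get(status_class(ev["status_code"]), 1)
        if rng.randrange(rate) == 0:
            kept.append({**ev, "sample_rate": rate})
    return kept


def estimated_counts(kept):
    """Each kept event stands in for sample_rate original events."""
    counts = Counter()
    for ev in kept:
        counts[status_class(ev["status_code"])] += ev["sample_rate"]
    return counts


rng = random.Random(0)  # seeded only so the sketch is repeatable
traffic = (
    [{"status_code": 200, "duration_ms": rng.uniform(5, 50)} for _ in range(100_000)]
    + [{"status_code": 404, "duration_ms": rng.uniform(5, 50)} for _ in range(2_000)]
    + [{"status_code": 500, "duration_ms": rng.uniform(100, 900)} for _ in range(50)]
)

kept = sample(traffic, rng)
print(len(traffic), "events in,", len(kept), "stored")
print(dict(estimated_counts(kept)))
```

The error count comes back exact because errors are kept at a rate of 1; the high-volume success count is an estimate, which is the trade you are choosing to make.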
Well, I for one am not past this bullshit by now. ☺️ EMs who do some hands on engineering are better EMs.
Forbidding EMs from touching code at all is almost as silly and counterproductive as telling EMs that writing and shipping code is a core function of their role.
I say "almost" because if I had to choose one or the other, I would choose the clarity of "EMs responsible for team outcomes, SWEs responsible for technical outcomes" over the muddle of holding EMs responsible for everything and splitting their focus between people and code.
But I don't have to choose! EMs who keep a hand in the code are better EMs. They have more empathy and understanding for their team. They are better equipped to evaluate their engineers, they have more credibility and context. Everyone wins.
I have a new piece up. It's a bit of a rant, even for me, so buckle in.
A lot of "thought leaders" have been making their mortgages lately off of bits on how AI is going to replace software engineers, particularly entry-level engineers.
This is a dumb idea. It bespeaks a wealth of misunderstanding about what it means to be an engineer and write code, and what is valuable and hard about software systems.
But even really dumb, damaging ideas can weasel into people's heads if you repeat them blindly enough times.
Generative AI has made it easier than ever to generate lots of code. @kentquirk says it's "like a junior engineer who types really fast". 🤣
But writing code has always been the easiest part of software engineering -- *always*. And it's getting easier by the day.
It felt, to me, like those participating were stepping very cautiously around a few of the third rails Jaana just tripped over. (💜)
"Work-life balance"
"Working hard vs working smart"
"Meritocracy"
The intersection of company tech cultures and expectations and performance.
These are hard, complicated topics, and there are some very good reasons for speaking carefully. People can pick up a sentence and run in the wrong direction with it, and do a lot of damage.
I have abandoned god only knows how many drafts on this topic, for that reason.
The question is, how can you interview and screen for engineers who care about the business and want to help build it, engineers who respect sales, marketing and other functions as their peers and equals?
It's a great question!! I have ideas, but would love to hear from others.
I said "question", but there are actually two: how to hire engineers who 1) are motivated by solving business problems and 2) aren't engineering supremacists.