Let's talk about OpenTelemetry, or "OTel", as the kids like to call it.
I remember emitting sooo many frustrated twitter rants back in 2017-2018 about how *behind* we were as an industry when it came to standards for instrumentation and logging.
Then OTel shows up.
For those of you who have been living under a rock, OTel is an open standard for generating, collecting, and exporting telemetry in a vendor-agnostic way.
Before OTel, every vendor had its own libraries, and switching (or trying out) new vendors was a *bitch*.
Yeah, it's a bit more complicated to set up than your standard printf or logging library, but it also adds more discipline and convenience around things like tracing and the sort of arbitrarily-wide structured data blobs (bundled per request, per event) that o11y requires.
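To make that concrete, here's roughly what the "more setup than printf" looks like -- a minimal sketch using the OpenTelemetry Python SDK with the OTLP exporter. The endpoint, service name, and attributes are all made up for illustration, not anybody's real config:

```python
# A minimal sketch, not any vendor's actual setup: instrument one request
# with the OpenTelemetry Python SDK and ship it over OTLP.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# Point the exporter at whichever backend you like -- that's the whole point.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # illustrative service name

def handle_request(user_id: str, cart_size: int) -> None:
    # One span per unit of work; attributes make it a wide, structured event.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("user.id", user_id)    # arbitrary key/value pairs
        span.set_attribute("cart.size", cart_size)
        # ... actual request handling goes here ...

handle_request("usr_1234", 3)
```

Swap the exporter endpoint and you've switched vendors. No reinstrumenting.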
In terms of investment, it's one of the best things any platform team can do to make their code maintainable over the long run.
And since you can switch from vendor to vendor without reinstrumenting (!), it forces vendors to compete on the merits instead of relying on lock-in.
Or that's the theory, at least. It's why Honeycomb went all-in on OTel. honeycomb.io/blog/all-in-on… Lightstep, New Relic, and many other vendors have put their chips on the table with OTel too.
It's better for everyone. In theory.
So what happens if one of the biggest players in the space decides to have it both ways, by treating OTel as a one-way street? They'll "support" it, but only as a bridge for new users to get data into their own walled garden.
DataDog has been telling users they can use OTel to get data in, but not get data out.
The DataDog OTel collector PR was silently killed. github.com/open-telemetry… The person who wrote it appears to have been pressured into closing it, and nothing has been proposed to replace it.
Behind the scenes, we've heard from our customers (who are also DataDog customers) that DataDog has been downplaying the maturity of OTel and discouraging people from using it.
They push you towards their locked-in proprietary integrations instead, saying OTel isn't ready for prime time.
They say it's "unstable, with breaking changes" and point to the fact that it isn't 1.0 yet.
Nonsense. We've been running it in prod for over a year now, and so have tons of customers. Go look for yourself -- the repo isn't exactly a backlog of hipri breaking tickets. 🙄
Not to mention, those integrations were **based on DataDog's integrations** in the first place (at their insistence)!
The reason the OTel collector isn't marked fully stable is that the team reserves the right to break INTERNAL APIs between collector components.
All that means is that vendors (like us) have to update some APIs when we build new collector distros. If you're a normal end user who uses the collector packages, you'll never notice.
And the language SDKs have been GA for over a year now. It's a total scare campaign.
Here is the DataDog post from 2.5 years ago, tooting their own horn about donating their tracing integrations to OTel. datadoghq.com/blog/opentelem…
Yes, these are the same integrations they are now darkly warning users to stay away from, because they "aren't stable."
I don't usually call competitors out by name. But this isn't a "honeycomb vs datadog" thing. There are dozens of vendors adopting or supporting OTel on your behalf.
As far as we can tell, DataDog is the only one pulling this kind of shady shit to keep users locked in.
And ultimately that's what matters. We support OTel because that's what's in the best interest of our users. It's in *your* best interest.
Even if you're a very happy DataDog customer who never plans to leave -- and there are lots of those! -- you should care about this.
I've got no argument with companies who want to continue to publish and promote non-OTel integrations.
But if you support the open standard to let people get their data in, you should support the open standard to let people get their data out. Or you're just an asshole.
It's hard to formulate career goals in your first decade or so as an engineer; there is just SO MUCH to learn. Most of us just kinda wing it.
But this is a goal that I think will serve you well: do a tour of duty at a startup and another at a bigco, in your first 10y as an eng.
Besides the obvious benefits of knowing how to operate in two domains, it also prevents you from reaching premature seniority. (charity.wtf/2020/11/01/que…)
The best gift you can give your future self is the habit of regularly returning to the well to learn, feeling like a beginner.
Several people asked this. It's a good question! I will share my thoughts, but I am certainly not religious about this. You should do what works for you and your teams and their workflows. 📈🥂☺️
1) "assuming you have good deduplication"... can a pretty big assumption. You never want to be in a situation where you spend more time tweaking dupe, retry, re-alert thresholds than fixing the problem.
2) having to remember to go futz with a ticket after every little thing feels like a lot of busywork. You've already committed some code, mentioned it in #ops or wherever, and now you have to go paste all that information into a task (or many tasks) too?
@beajammingh the title particularly caught my eye. for the past month or two i've been sitting on a rant about how i no longer associate the term "devops"** with modern problems, but with fighting the last war.
** infinitely malleable as it may be
yes, if you have massive software engineering teams and operations teams and they are all siloed off from each other, then you should be breaking down (i can't even say it, the phrase is so annoying) ... stuff.
but this is a temporary stage, right? a bridge to a better world.
I've done a lot of yowling about high cardinality -- what it is, why you can't have observability without it.
I haven't made nearly as much noise about ✨high dimensionality✨. Which is unfortunate, because it is every bit as fundamental to true observability. Let's fix this!
If you accept my definition of observability (the ability to understand any unknown system state just by asking questions from the outside; it's all about the unknown-unknowns) then you understand why o11y is built on building blocks of arbitrarily-wide structured data blobs.
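To make "arbitrarily wide" concrete: here's a sketch of one such per-request blob. Dimensionality is how many fields an event has; cardinality is how many distinct values a single field can take. All the field names and values below are made up for illustration, not any particular vendor's schema:

```python
# One event per request. Every key is a dimension you can query on later;
# real services routinely emit hundreds of fields per event.
event = {
    "timestamp": "2021-09-15T07:14:09Z",
    "duration_ms": 234,
    "service.name": "checkout",
    "http.method": "POST",
    "http.status_code": 503,
    "user.id": "usr_8f3a91",   # high cardinality: millions of possible values
    "build.id": "abc123",      # lets you correlate regressions with deploys
    "feature.flags": ["new-checkout", "dark-mode"],
    "db.query_count": 17,
    "cache.hit": False,
    # ...plus whatever else might matter when you hit the unknown-unknowns
}
```

High cardinality is what lets you zero in on one user or one build; high dimensionality is what lets you slice by fields you didn't know you'd need.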
If you want to brush up on any of this, here are some links on observability:
Close! "If you're considering replacing $(working tool) with $(different tool for same function), don't do it unless you expect a 10x productivity improvement"
cvs to git? ✅
mysql to postgres? ❌
puppet to chef? ❌
redhat to ubuntu? ❌
The costs -- ripping and replacing, retraining humans, updating references and docs, the overhead of managing two systems in the meantime, etc -- are so high that you're usually better off investing that time in making the existing solution work for you.
Of course, every situation is unique. And the interesting conversations are usually around where that 10x break-even point will be.
The big one of the past half-decade has been when to move from virtualization to containerization.
Maybe not "full transparency", but I think *lots* of engineers chafe at the level of detail they have access to, and wish they were looped in to decision-making processes much earlier.
One of the most common reasons people become managers is they want to know EVERYTHING. They are tired of feeling left out, or like information is being kept from them (true or no).
All they want is to be "in the room where it happens", every time it happens.
I mean, that's why I got into management. 🙃👋 And it works! It scratches the itch. Everything routes through you. It feels great...for you.
But you still have a team where people feel like they have to become managers in order to be included and heard.