... i miss you all! wish i was in portland with you! *blows kisses*
I first saw the talk three months ago, and it made my knees tremble. I learned *so much*, and been meaning to share ever since.
Without further ado.
Would that team have achieved observability?
Cultural practices need to shift in order to achieve and maximize the value of observability.
Alyson and Alaina witnessed this firsthand, in their transition from their former company to honeycomb.
* no SPOFs,
* democratizing knowledge,
* increasing thoughtful discussions about our systems.
These cultural practices kept them from growing as individuals, progressing as teams, or taking ownership over outcomes
At @honeycombio, they say, their lives are much better with a culture of observability. 🐝🌈
Curiosity is rewarded, because we can ask any question of our production data.
There is a lot less division between teams, and a lot more emphasis on ✨software ownership✨.
While we do have folks whose affinity is more towards frontend or backend, the lines are very blurry.
"determine the dataset sending us the most traffic in the last two hours, and figure out if it should be rate-limited or not. which customer is responsible for this dataset?"
"we forgot to instrument a new feature, but we know which endpoints are involved. figure out how many users tried out the new feature in the last week. who was using it most often?"
"find a dataset running slow queries. what qualities of the queries seem to make it slow?"
As you can see, these questions start to get at the beating hearts of both how our system works and how our users experience it, and how each affects the other.
(Sometimes new hires point out things that look weird or off, and then we get to investigate together!)
We talk a lot about bringing everyone up to the level of the best debugger in every corner of our systems, by leveraging each other's expertise.
At Honeycomb, A accidentally erased a field from the entire production db. Worse: a cron job runs every 10m and if that field didn't exist, it would erase all customer data.
Turned out, even a customer could have accidentally triggered this behavior via the UI. They restored the database together and improved the system.
A was already beating herself up, so having the senior engineers be cheerfully supportive & focused on actionable lessons was a balm and surprise.
Instrument once, understand everywhere. And learn from each other by watching our teammates interact with these systems, too.
People ❤️ finding answers to questions!!
If you'd like to watch for yourself: