It's a great talk, you should read the slides. schd.ws/hosted_files/k…
"But wait," I hear you asking. "Is Prometheus going to help with any of those problems, of complexity and end-to-end tracing and the request path? Does Prometheus even do observability? Isn't it metrics and preaggregated dashboards?"
That connective tissue is exactly what you needed to reason about the internal workings of your system as a developer. It tracks your user experience too.
Developers don't give a shit about system health. They care about the health of *each individual request*. Events.
I mean, ops doesn't give a shit about each and every request either, as long as the system is healthy and errors are below SLOs.
My problem with it is that they claim it's more than that. Which leads to a very bad experience for users with more honeycomb-ish shaped problems.
And yes, I am aware that some proprietary implementations of metrics-based systems do not have the same cardinality limitations, but that doesn't invalidate my point -- they still aren't oriented around the event.
It's even better than it sounds. You don't have to double your spend. Don't have to hop from tool to tool. It just works.
Which means you can flip back and forth between exploring ("find me an example of this bug"), tracing ("now trace it"), and exploring ("who else was affected?").
Holy grail ✋
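If that explore, trace, explore loop sounds abstract, here's a rough sketch of it over a handful of in-memory events. Everything in it is made up for illustration (the `EVENTS` records, the field names, the helpers `find_example`, `trace`, and `affected_users`); it is not any vendor's API, just the shape of the workflow when every event carries a trace ID.

```python
# Hypothetical events: one record per unit of work, each tagged with a trace ID.
# Field names and values are invented for this sketch.
EVENTS = [
    {"trace_id": "t1", "span": "api", "user_id": "u1", "status": 200, "duration_ms": 40},
    {"trace_id": "t2", "span": "api", "user_id": "u2", "status": 500, "duration_ms": 900},
    {"trace_id": "t2", "span": "db", "user_id": "u2", "status": 500, "duration_ms": 870},
    {"trace_id": "t3", "span": "api", "user_id": "u3", "status": 500, "duration_ms": 910},
    {"trace_id": "t3", "span": "db", "user_id": "u3", "status": 500, "duration_ms": 880},
]


def is_failure(event):
    """The 'bug' we are hunting: any event with a 5xx status."""
    return event["status"] >= 500


def find_example(events, predicate):
    """Explore: return the first event matching the predicate, or None."""
    return next((e for e in events if predicate(e)), None)


def trace(events, trace_id):
    """Trace: return every event belonging to one request, slowest first."""
    matching = [e for e in events if e["trace_id"] == trace_id]
    return sorted(matching, key=lambda e: e["duration_ms"], reverse=True)


def affected_users(events, predicate):
    """Explore again: return the sorted set of users who hit the same problem."""
    return sorted({e["user_id"] for e in events if predicate(e)})


example = find_example(EVENTS, is_failure)   # "find me an example of this bug"
spans = trace(EVENTS, example["trace_id"])   # "now trace it"
users = affected_users(EVENTS, is_failure)   # "who else was affected?"
```

The point is that all three questions run against the same raw events, so there is no hop between a metrics tool and a tracing tool.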
I'd expect better from our fellow vendors, except.. never mind, I guess this all makes sense. 🖕 just keep muddying the waters, bros