Making distributed tracing easier with more sophisticated visualizations - @YuriShkuro
The first is color coded by service graph. The second is a heat map #QConNYC
Now @YuriShkuro is talking about a tool that compares traces.
[ed - omg this: just yesterday I was talking to a vendor at QCon and was wondering if it’d be possible to compare traces. They said their product didn’t offer this. IMO this is the most important aspect of tracing]
And the diff tool deals with aggregate traces. You can then drill down into an individual trace. @YuriShkuro at #QConNYC
[ed - this. This so freaking much. Starting with a single trace is like being on a hiding to nothing. Need to begin with an aggregate view.]
I missed the first part of this talk, but everything I saw was 💯.
Tracing only becomes useful when you can surface the relevant information from a trace. That requires aggregate analysis.
But this isn’t without its flaws. Finding the right baseline might be hard.
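A rough sketch of what "compare aggregates first, then drill down" could look like. This is not the tool shown in the talk: the span shape (service, operation, duration in ms), the use of medians, and the 10 ms threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical span shape: (service, operation, duration_ms).
# A trace is a list of spans; a trace set is a list of traces.


def aggregate(traces):
    """Median per-trace time spent in each (service, operation) pair."""
    buckets = defaultdict(list)
    for trace in traces:
        totals = defaultdict(float)
        for service, operation, duration_ms in trace:
            totals[(service, operation)] += duration_ms
        for key, total in totals.items():
            buckets[key].append(total)
    # The median is taken only over traces in which the pair appears.
    return {key: median(values) for key, values in buckets.items()}


def diff(baseline, candidate, threshold_ms=10.0):
    """Pairs whose aggregate latency moved by at least threshold_ms."""
    base = aggregate(baseline)
    cand = aggregate(candidate)
    report = {}
    for key in base.keys() | cand.keys():
        delta = cand.get(key, 0.0) - base.get(key, 0.0)
        if abs(delta) >= threshold_ms:
            report[key] = delta
    return report


baseline = [
    [("api", "GET /ride", 5.0), ("db", "query", 20.0)],
    [("api", "GET /ride", 7.0), ("db", "query", 22.0)],
]
candidate = [
    [("api", "GET /ride", 6.0), ("db", "query", 60.0), ("cache", "get", 15.0)],
    [("api", "GET /ride", 6.0), ("db", "query", 64.0), ("cache", "get", 15.0)],
]
report = diff(baseline, candidate)
```

Here the report flags the db query (slower) and the cache call (new in the candidate set) and drops the api span, which did not move. Everything depends on which traces you put in `baseline`, which is exactly the hard part.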
Here’s a tool that’s internal to Uber that helps with “root cause” (sic) analysis to drill down from business metrics to app level telemetry. @YuriShkuro at #QConNYC
One of the challenges of taming microservices complexity is dealing with data.
Tracing can help here by making it possible to build lineage graphs h/t @palvaro
The hardest part of doing all this analysis is application instrumentation.
Doing the analysis is easy for us (Uber) - getting high fidelity data is the challenge. - @YuriShkuro at #QConNYC
In summary:
- tracing really becomes usable when you have creative visualizations
- engineers don’t really know how their services work. Tracing helps unlock unparalleled insights.
I agree that it’s misguided to suggest the only way for managers to “be technical” is by coding.
But boy, some management folks seem so virulently anti-coding in a way that’s just absurd.
Coding is definitely one way (though not the only way or even the best way) to “be technical”.
There are many ways “leadership” can contribute to the technical betterment of a project (and improve your own credibility) without writing production code:
- build small side projects using libraries your team authors, giving them feedback on code quality, testing, design etc.
There are many benefits to being able to read and use your team’s code.
People working very closely on projects given deadlines etc can have blind spots.
A strong technical leader who can comprehend code or systems can point these out.
Every time I say something - anything at all - about software quality or dev productivity, I have legions pontificating about unit tests and documentation in my mentions.
So here’s another (slightly contrarian) take: unit tests/docs aren’t always the best yardstick of “quality”
In other words, if you’re looking at a project with inadequate docs/test coverage, and immediately think the way to fix it or improve it is by adding more tests/docs, then it’s possible your immediate impact on the project or team productivity might be rather meagre.
I used to think this way years ago myself; when seeing some code that was ambiguous or where it wasn’t possible to test it easily, my immediate instinct was to “fix it”.
Learning how to work around these issues without “fixing” them was a far more valuable skill.
So many companies, large and small, end up solving all the wrong problems or the least important ones. Especially common when building infra tools/software.
I see this happen again and again, and we then wonder why the state of the art hasn’t improved in the past 5 years.
When there’s a problem space where frankly everything sucks at every layer, it’s common to think the way to tame this space is by taking a bottom-up approach.
This approach fails, time and time again, because it almost never provides any immediate value to users.
Most infra software is hard to use for even other infra engineers.
The UX almost universally sucks, whether it’s APIs, protos, yaml, UI, dashboards.
The thing is, a product doesn’t need to solve *all* of these problems to be great and provide immediate value to users.
The paper I've been looking forward to the most is now out: zero downtime deployments at Facebook.
Disruption free release of services that speak different protocols and serve different types of requests (long lived TCP/UDP sessions, requests involving huge chunks of data etc.)
"Socket Takeover" should be familiar to traffic nerds. Transferring the listening socket over a Unix Domain Socket with ancillary message (CMSG) + SCM_RIGHTS is *precisely* how HAProxy does seamless reloads.
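For anyone who hasn't seen fd passing before, here is a minimal single-process sketch of the mechanism, assuming Python 3.9+ on a Unix-like system. It is not the code from the paper or from HAProxy: `send_listener`, `recv_listener` and `demo` are hypothetical names, and a `socketpair` stands in for the Unix Domain Socket that two real processes would share.

```python
import socket


def send_listener(channel, listener):
    # socket.send_fds attaches the fd as an SCM_RIGHTS ancillary message.
    # At least one byte of ordinary data has to travel with it.
    socket.send_fds(channel, [b"L"], [listener.fileno()])


def recv_listener(channel):
    # The kernel installs a new fd pointing at the same open socket.
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    return socket.socket(fileno=fds[0])


def demo():
    # Stand-in for the Unix Domain Socket between old and new process.
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]

    send_listener(old_end, listener)
    taken_over = recv_listener(new_end)
    # The "old process" lets go; the listening socket stays open
    # because the received fd still references it.
    listener.close()

    client = socket.create_connection(("127.0.0.1", port))
    conn, _peer = taken_over.accept()
    client.sendall(b"ping")
    data = conn.recv(4)

    for s in (client, conn, taken_over, old_end, new_end):
        s.close()
    return data


result = demo()
```

The listening socket never closes from the kernel's point of view, so connections queued during the handover are not refused.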
What *is* novel is how they transfer UDP (QUIC) socket fds.
The second approach is one called Downstream Connection Reuse, used for long lived persistent connections. This involves rendering in-path proxies stateless and tunneling requests over H2.
- inexperienced managers probably aren’t the best suited to hire and mentor junior engineers, unless these managers themselves have mentorship/guidance from senior managers/leadership folks. A bad manager can be a horrendous formative experience.
- mentoring junior devs remotely presents unique challenges. A lot of what I learned as a junior engineer was via osmosis - listening to conversations other senior engineers were having, even if I wasn’t a part of the conversation. This is hard to replicate in a remote setting.
Still remember the days when @mattklein123 claimed “developer productivity was one of the highlights of modern C++” when introducing Envoy, and people raised eyebrows at this claim.