First talk of the day on the #QConLondon production track, by @glenathan and a challenge: can we build observable services without logs?
"We needed to build a new service in Go, without our usual existing scaffolding in Clojure... That led to some bikeshedding but also gave a chance for experimentation!" @glenathan#QConLondon
"Before this, we spent a lot of money to know what our applications were doing in production"
Giving up logs for an event-first approach. "Events are just good logs" @glenathan #QConLondon
Totally agree with @glenathan: it's ok to "clutter" your code with telemetry code, because it's right there in front of you, which encourages you to do it consistently. Other approaches, like aspect-oriented instrumentation, can hide it away, and then it gets forgotten.
Telemetry first! #QConLondon
It's not only about high cardinality (e.g. user id...): expressing more about units of work with good span names gives a solid high-level view of traces. @glenathan #QConLondon
... But still do the high-cardinality stuff! Throw in all the information, context and ids available at that stage (it doesn't cost anything extra with volume-based pricing!) @glenathan #QConLondon
Instrumenting with traces first doesn't mean we can't have logs, especially when developing locally or running tests: with a bit of magic, all of the JSON events get pretty-printed, and events are output when a test fails. @glenathan #QConLondon
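The pretty-printing part can be sketched with the Go standard library alone. This is an assumption about how it might be done, not the speaker's code; `Pretty` is a hypothetical helper built on `json.Indent`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Pretty re-indents one compact JSON event so it reads like a log
// entry in a local terminal. It returns an error on invalid JSON.
func Pretty(line string) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, []byte(line), "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := Pretty(`{"name":"checkout.charge_card","duration_ms":42}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In tests, the same events could be buffered and only printed when `t.Failed()` reports true, for example from a `t.Cleanup` callback.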
So that's logs and traces from events. Next is metrics... which can also be derived from the events, with the same source of truth. @glenathan #QConLondon
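Deriving metrics from events can be as simple as folding over them. A rough sketch, assuming a trimmed-down hypothetical `Event` and a made-up `Metric` shape (count, errors, total duration per span name):

```go
package main

import "fmt"

// Event is a trimmed-down hypothetical event: just what the metrics need.
type Event struct {
	Name       string
	DurationMS int64
	Failed     bool
}

// Metric aggregates all events that share a name.
type Metric struct {
	Count   int
	Errors  int
	TotalMS int64
}

// Derive folds a batch of events into per-name metrics, so the metrics
// share their source of truth with the logs and traces.
func Derive(events []Event) map[string]Metric {
	out := make(map[string]Metric)
	for _, e := range events {
		m := out[e.Name]
		m.Count++
		m.TotalMS += e.DurationMS
		if e.Failed {
			m.Errors++
		}
		out[e.Name] = m
	}
	return out
}

func main() {
	metrics := Derive([]Event{
		{Name: "checkout.charge_card", DurationMS: 40},
		{Name: "checkout.charge_card", DurationMS: 60, Failed: true},
	})
	m := metrics["checkout.charge_card"]
	fmt.Printf("count=%d errors=%d avg_ms=%d\n", m.Count, m.Errors, m.TotalMS/int64(m.Count))
}
```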
So the 3 pillars are actually all the same thing, just from a different angle. @glenathan #QConLondon
Here we go, last talk of the day on the production track at #QConLondon, with @rdelvira and "an entertaining outage story" (his own words) about when Slack rolled out DNSSEC
"Who here tried to rollout DNSSEC?, Ok one person... Now how failed when trying to rollout DNSSEC? Welcome to the club!" 😂 @rdelvira#QConLondon
"We planned DNSSEC carefully, with the necessary changes and replicated most of our DNS use cases... And you'll see later why I said 'most'..." @rdelvira#QConLondon
Continuing the #QConLondon production track with @yurynino, and using visual metaphors to understand our production data in a different way.
"In our field, observability is about humans and about how humans interact with technology" @yurynino#QConLondon
Collecting metrics and signals is only one part of the solution: observability has to come with good visualisation, and with engineering a solution for humans. @yurynino #QConLondon
"Observability is the capability to continuously generate and discover actionable insights based on signals from the system under observation with the goal to influence that system" and that's for both people (eg debugging) and automation (eg autoscaling) @mhausenblas#QConLondon
Observability can go beyond usual metrics, logs and traces: @mhausenblas introducing profiles and eBPF #QConLondon