"we used to guess all the time, but now we /know/"
"we used to guess. now we know."
"i used to run crazy sed/awk/histogram scripts; now i can just look"
"we used to sweep it under the rug or it would eat up all day; now we don't have to."
everyone always used to ask, can you tell me what to look at? and i would sigh wearily and say no.
until now.
1) formulate a hypothesis about what variable could be causing a behavior
2) break down by that variable to test its relevancy
3) check, repeat as necessary
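that loop, sketched in python against a made-up list of structured events. the field names and the "status >= 500 means error" rule are assumptions for illustration only: pick a variable, break down by it, and see whether the errors pile up on one value.

```python
from collections import Counter

# hypothetical structured events: one dict per request
events = [
    {"shard": "s1", "status": 500},
    {"shard": "s1", "status": 500},
    {"shard": "s1", "status": 200},
    {"shard": "s2", "status": 200},
    {"shard": "s2", "status": 200},
    {"shard": "s3", "status": 200},
]


def error_rate_by(events, field):
    """Break down by `field`: error rate (status >= 500) for each value seen."""
    totals = Counter(e.get(field) for e in events)
    errors = Counter(e.get(field) for e in events if e.get("status", 0) >= 500)
    # Counter returns 0 for values that never errored
    return {value: errors[value] / totals[value] for value in totals}


print(error_rate_by(events, "shard"))
```

if one shard lights up, the hypothesis holds; if the rates are flat, pick the next variable and repeat.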
awesome post on this by @FisherDanyel: honeycomb.io/blog/heatmaps-…
take a spike of errors. you want to know if they were mostly to a shard, a node, a pod, a user, an ip addr...
with structured events you can do this, either by intuition or brute force.
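the brute-force version, sketched in python: instead of guessing which variable matters, check every field and rank the values that show up far more often among errors than in traffic overall. the data and `rank_dimensions` are hypothetical; this illustrates the idea and is not how bubbleup is actually implemented.

```python
from collections import Counter


def rank_dimensions(events, is_error, skip=()):
    """Brute-force every field: for each one, find the value most
    over-represented among error events relative to all events.
    Returns (field, value, share_of_errors, share_of_all) tuples,
    biggest gap first."""
    errors = [e for e in events if is_error(e)]
    if not errors:
        return []
    fields = {k for e in events for k in e if k not in skip}
    results = []
    for field in fields:
        overall = Counter(e.get(field) for e in events)
        in_errors = Counter(e.get(field) for e in errors)

        def gap(value):
            # how much more often this value appears in errors than in all traffic
            return in_errors[value] / len(errors) - overall[value] / len(events)

        best = max(in_errors, key=gap)
        results.append(
            (field, best, in_errors[best] / len(errors), overall[best] / len(events))
        )
    results.sort(key=lambda r: r[2] - r[3], reverse=True)
    return results


# hypothetical events from a spike of 500s
events = [
    {"status": 500, "node": "n1", "user": "alice", "pod": "p1"},
    {"status": 500, "node": "n1", "user": "bob", "pod": "p2"},
    {"status": 500, "node": "n1", "user": "carol", "pod": "p1"},
    {"status": 200, "node": "n2", "user": "alice", "pod": "p2"},
    {"status": 200, "node": "n2", "user": "bob", "pod": "p1"},
    {"status": 200, "node": "n3", "user": "carol", "pod": "p2"},
]

for row in rank_dimensions(events, lambda e: e["status"] >= 500, skip={"status"}):
    print(row)
```

the field that defines "error" (here `status`) is skipped, otherwise it would trivially win every time.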
enter bubbleup. honeycomb.io/blog/diving-in…
your eye is *immediately* drawn to whatever outliers exist, however many of them there are. it's like fucking magic.
see how easy it is to spot the problems? pure sorcery.
this is why i keep banging on about using the beelines and structuring your goddamn data.
if you're not a honeycomb user, *you should still do this.*
*this still matters.*
if you are adopting microservices, know that you really cannot understand how your system works unless you pack the data up this way.
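a minimal sketch of what "packing the data up this way" can look like, using only the python stdlib rather than a beeline (this is not the beeline API): one wide event per request, every bit of context attached as a field, emitted exactly once when the work finishes. `handle_request`, the field names, and the 200/500 status convention are all hypothetical.

```python
import json
import time
import uuid


def handle_request(work, emit=print, **fields):
    """Run `work` and emit exactly one wide, structured event describing it.

    `fields` is whatever context you have (user, shard, node, pod, ip...);
    the more dimensions you attach, the more ways you can break down later."""
    event = dict(fields)
    event["trace_id"] = str(uuid.uuid4())
    start = time.monotonic()
    try:
        work()
        event["status"] = 200
    except Exception as exc:
        event["status"] = 500
        event["error"] = type(exc).__name__
        raise
    finally:
        # always emit, success or failure, as a single JSON line
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        emit(json.dumps(event, sort_keys=True))
    return event


handle_request(lambda: None, user_id="u42", shard="s1", node="n1", pod="p7")
```

emitting once at the end, instead of scattering log lines through the request, is what keeps every field on the same event so you can break down by any of them later.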
github.com/honeycombio/
is
lying
to
you.
and if you bought it, you were the sucker who knew better deep down and did it anyway, weren't you?
(i'm sorry)