And now, thanks to @jasonallen206, I have a third link: m.subbu.org/incidents-tren…
1) change is the trigger in 2/3 of outages
2) config drift is deadly
3) we don't know why things fail
4) infra changes are a shrinking %
5) certs lol
The distributedness of systems is increasingly their most salient characteristic.
Hardware isn't failing any less; it's just been successfully made into someone else's job. Ops is moving up the stack.
"..except that's hard? So invest in observability and spend more time ACTUALLY UNDERSTANDING your systems."
Or as I keep barking, invest real developer hours into instrumentation, your CI/CD pipeline, and deployment code, and practice observability-driven development.
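For a rough idea of what "instrumentation" can mean in practice, here's a minimal sketch in plain Python. It emits one wide, structured event per unit of work, carrying whatever context the code learns along the way. Everything in it is made up for illustration (request_event, handle_checkout, the field names, the build_id value), and it is not any vendor's API. The sink is just a callable that takes a JSON string.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def request_event(name, sink, **fields):
    """Collect context for one unit of work and emit it as a single wide event."""
    event = {"name": name, **fields}
    start = time.perf_counter()
    try:
        yield event  # the caller adds fields as it learns them
    except Exception as exc:
        # Record the failure on the event, then let it propagate unchanged.
        event["error"] = type(exc).__name__
        event["error_detail"] = str(exc)
        raise
    finally:
        # Always emit exactly one event, success or failure.
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        sink(json.dumps(event, sort_keys=True))


def handle_checkout(user_id, cart_size, sink=print):
    """Hypothetical handler; build_id is a placeholder for deploy metadata."""
    with request_event("checkout", sink, user_id=user_id,
                       build_id="hypothetical-build-123") as ev:
        ev["cart_size"] = cart_size
        if cart_size == 0:
            raise ValueError("empty cart")
        ev["status"] = 200
        return "ok"
```

Calling handle_checkout("u1", 2) prints a single JSON line with the user, cart size, status, build, and duration all on the same event. That's the thing you want when you're asking new questions later: one event per request with lots of fields, rather than a pile of disconnected counters.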
One last thing. You notice what isn't mentioned? Better monitoring. Monitoring is ~useless to developers shipping services and debugging code.
Only @honeycombio gives you that. That's why our customers give such breathless happy quotes. 💖🐝
That said, it's pretty damn easy -- just install the gem, or go get the package, or whatever. If you wanna try us, here are three links:
🌈Play in a sandbox, honeycomb.io/play
🌈Sign up for a trial, ui.honeycomb.io/signup
Onward, unto the undebuggable breaches of tomorrow. 🐝📈❤️