I'd like to sharpen up a couple of small points.
Tests are to software engineering as monitoring is to operations. Always cover your known unknowns/failures.
In prod.
You should not invest any time into chaos engineering, experimentation or anything else. Fix this first.
(Most of you have never actually had or seen this kind of observability, which is... more than a little terrifying to me.)
This is the other wildly underinvested area. What's the proximate cause of most outages? "We intentionally changed something / we shipped new code." And yet we still deploy with Capistrano lol
Write tests, run your tests, but don't delude yourself into thinking tests will save you.
Oops gotta go