@gphat "We are not going to define Observability today" 👍
@gphat The normal zone, the weird zone.
@gphat The robot needs to understand all of the states in the system.
@gphat Promote your work. Make sure people know about the work you are doing.
@gphat Teach basic skills. Your first interaction with tooling should not be at 3am under duress.
@gphat Stretch your systems. Resilience engineering. Chaos testing. Gamedays.
@gphat Measure usage. CapEx or OpEx, it costs you money either way. Observability will be one of the most expensive parts of your stack. Think about chargeback to teams. At a previous company, 99.8% of metrics written were never read.
@gphat Measure read usage. The metrics that get read are the ones people are actually using.
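A rough sketch of the write-versus-read measurement, assuming you can export the set of metric names written and the set of names that appeared in any query or dashboard over the same window. The function name `unread_fraction` and the sample metric names are made up for illustration, not taken from the talk or from any particular metrics product.

```python
def unread_fraction(written, read):
    """Return the fraction of written metric names that were never read."""
    written = set(written)
    if not written:
        return 0.0
    never_read = written - set(read)
    return len(never_read) / len(written)


# Hypothetical sample data: four metrics written, one of them queried.
written_names = {"api.latency", "api.errors", "db.connections", "cache.hits"}
read_names = {"api.latency"}

ratio = unread_fraction(written_names, read_names)
print(f"{ratio:.1%} of written metrics were never read")
```

Grouping the unread names by owning team would be one way to feed a chargeback conversation.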
@gphat Track need in detail. MTTR, MTTD. Big systems change a lot. The only way to learn about failure is to fail. Where are people asking "how does that work?" Measure the systems that succeed, too. What near misses are there?
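A minimal sketch of computing MTTD and MTTR from incident records. It assumes each incident has a start, detection, and resolution timestamp, and that both means are measured from the start of the incident; definitions of MTTR vary between teams, so treat that choice as an assumption. The `Incident` class and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime
    detected: datetime
    resolved: datetime


def mean_delta(deltas):
    """Average a non-empty sequence of timedeltas."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("need at least one incident")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    # Mean time to detect: from incident start to detection.
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    # Mean time to resolve, measured from incident start (an assumption).
    return mean_delta(i.resolved - i.started for i in incidents)


# Hypothetical sample incidents.
incidents = [
    Incident(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 10), datetime(2024, 1, 1, 1, 0)),
    Incident(datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 0, 20), datetime(2024, 1, 2, 1, 40)),
]
print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Near misses could be recorded the same way, with a flag marking that no customer impact occurred.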
@gphat Instrument change. Big boom in conference hall. Change management, feature flags. Humans become more important when you automate things. Avoid humans being out of the loop. Instrument automation.
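One possible shape for instrumenting change, sketched under the assumption that change events (deploys, feature flag flips, actions taken by automation) are appended to an in-memory list; a real setup would send them to whatever event store you already have. The names `record_change`, `instrumented_change`, `EVENTS`, and the sample flag and actor are all hypothetical.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for an event store (an assumption for this sketch).
EVENTS = []


def record_change(kind, target, actor, **details):
    """Append one change event, noting who or what made the change."""
    event = {
        "ts": time.time(),
        "kind": kind,
        "target": target,
        "actor": actor,
        "details": details,
    }
    EVENTS.append(event)
    return event


@contextmanager
def instrumented_change(kind, target, actor):
    """Record the start and the outcome of a change made inside the block."""
    record_change(kind, target, actor, phase="start")
    try:
        yield
    except Exception as exc:
        record_change(kind, target, actor, phase="failed", error=repr(exc))
        raise
    else:
        record_change(kind, target, actor, phase="done")


# Example: an automated actor flips a feature flag.
with instrumented_change("feature_flag", "new-checkout", actor="rollout-bot"):
    pass  # the actual flag change would happen here

for event in EVENTS:
    print(event["kind"], event["target"], event["actor"], event["details"]["phase"])
```

Recording the actor for automated changes as well as human ones is what keeps people in the loop when they later ask what changed.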
@gphat Stigmergy. Follow the cookies. Establish "your way". Better understanding through practice. Be prepared when finance asks about spend. Examine success as much as failure. Instrument change events.
Some or all of these should mean a more valuable Observability program.