What can we learn from ALL the incidents? @courtneynash at @verica_io compiles reports from lots of companies into the VOID: Verica Open Incident Database. #SREcon
While every incident and every company is different, the distributions have the same shape. They are “positively skewed”: more short incidents than long ones.
Here’s the thing:
If you don’t have a symmetric distribution (and incidents don’t),
then central measures like mean, median, mode
don’t represent your data. @courtneynash #SREcon
That’s why MTTR is at best useless.
If you improve your incident response times, you might lower the MTTR. Or not. That metric cannot represent this data.
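A minimal sketch of why the mean struggles here, using made-up incident durations (the numbers and the `summarize` helper are illustrative assumptions, not data from the VOID). Most incidents get 20% shorter, one extra long incident happens, and the "mean time" moves the wrong way:

```python
import statistics

# Hypothetical incident durations in minutes: many short, a few very long.
durations = [5, 7, 8, 10, 12, 15, 15, 20, 25, 30, 45, 60, 240, 960]


def summarize(values):
    """Return (mean, median) for a list of durations."""
    return statistics.mean(values), statistics.median(values)


mean_before, median_before = summarize(durations)

# Assume response improves: every incident under an hour gets 20% shorter.
# In the same period, one more 16-hour incident happens.
improved = [d * 0.8 if d < 60 else d for d in durations] + [960]
mean_after, median_after = summarize(improved)

print(f"before: mean={mean_before:.1f} median={median_before:.1f}")
print(f"after:  mean={mean_after:.1f} median={median_after:.1f}")
```

With these invented numbers the mean sits far above the median before any change, and after the change the mean goes up even though eleven of the original fourteen incidents were resolved faster. A single long incident dominates the average.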
Also, you’d think long = bad, but “long” doesn’t correlate with “high impact to customers.”
What can we do instead?
Focus on learning, with humans as solutions.
Learn from our successes! Near misses are a great opportunity to learn from incidents that didn’t happen. @courtneynash #SREcon
I laugh at people who talk about “exactly-once delivery”
The specs that claim it have been proven wrong.
But we have methods (like idempotency) to do things well. @mjpt777 #YowLondon
Make handover/resumption protocols.
“This is what I thought I sent to you last, did you get it?”
“Here’s what I got from you last, let’s work it out from there”
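One way to picture idempotency plus a resumption handshake, as a rough sketch only. The `Sender` and `Receiver` classes and the sequence-number scheme are assumptions made for illustration, not the protocol described in the talk:

```python
class Receiver:
    """Applies each message at most once by tracking the last sequence number."""

    def __init__(self):
        self.last_seq = 0   # highest sequence number applied so far
        self.applied = []   # payloads applied, in order

    def receive(self, seq, payload):
        # Idempotent: a redelivered message is ignored, not applied twice.
        if seq <= self.last_seq:
            return False
        if seq != self.last_seq + 1:
            raise ValueError(f"gap: expected {self.last_seq + 1}, got {seq}")
        self.applied.append(payload)
        self.last_seq = seq
        return True


class Sender:
    def __init__(self, messages):
        self.messages = list(messages)  # message i has sequence number i + 1
        self.last_acked = 0             # last message the sender knows arrived

    def send(self, receiver, seq, ack_lost=False):
        receiver.receive(seq, self.messages[seq - 1])
        if not ack_lost:
            self.last_acked = seq

    def resume(self, receiver):
        # "Here's what I got from you last, let's work it out from there."
        self.last_acked = receiver.last_seq
        for seq in range(self.last_acked + 1, len(self.messages) + 1):
            self.send(receiver, seq)


receiver = Receiver()
sender = Sender(["a", "b", "c", "d"])
sender.send(receiver, 1)
sender.send(receiver, 2, ack_lost=True)       # delivered, but the ack is lost
duplicate_applied = receiver.receive(2, "b")  # retry arrives: safely ignored
sender.resume(receiver)                       # agree on position, send the rest
print(receiver.applied, duplicate_applied)
```

The message may cross the wire more than once, but because the receiver ignores sequence numbers it has already applied, the effect happens once. The resume step lets both sides reconcile after a lost acknowledgement instead of guessing.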
If we go from Idea to Behavior change to new Idea…
how quickly we can do that depends on the structure. @kentbeck
If we go Idea to Behavior to Idea to Behavior
as fast as we can,
it’s gonna get slower and slower and then the developers will get frustrated and leave and the new developers will be even slower…
So sometimes, we make a structure change before the behavior change. @KentBeck
SREs in the audience? (Dozens of hands)
Experienced SREs? (Like 2.5 hands)
We @RedHat used to ship products. Build a thing, package it, send to customers. Then it was their problem. Customer hires a consultant or figures it out.
Now we mostly ship services. Now it’s our headache: reliability, uptime, etc. It’s different.
The team deserves someone
who wants to manage people.
who is not bitter about meetings
who is interested in sociotechnical systems and nurturing careers
whose technical skills are strong enough to evaluate their work.