Today at #srecon, @allspaw and @ri_cook give deep insight on real tools, incident timelines, and clumsy automation.
But not in person. 😭
Great tools (as opposed to machines) are near to hand and conform to the person who wields them. Like a hammer, or `top`. Yeah.
They are opinionated, but not prescriptive.
(machines do what they do, and you conform to them)
In software, tools like `top` help us see what’s going on in the digital space. @ri_cook et al see our work taking place on two sides of a divide. There’s meatspace (where we are) and digital space (where the software runs). You can’t reach out and feel digital stuffs.
Between human & digital space is the Line of Representation.
Screens, mice, keyboards.
“Everything we claim to know about the systems we run comes from inferences based on representations.” @allspaw @ri_cook #srecon
We have so many tools that work Below the Line (of Representation). `top`, @honeycombio, everything we look at to find out what’s going on down there in software-land.
Are there tools that help us keep track of what’s going on Above the Line, in the people?
(we’re talking about _tools_ here, not machines. We want something that we willingly reach for, that molds to our needs instead of molding us to its needs.
So no, your JIRA workflow is not it.)
There’s a lot more to an incident than the official [Declared Incident]. That’s a small part of a timeline that includes sources, buildup, [troubleshooting, recovery], sensemaking, reviews, etc.
Tools that help humans work could remind us: who can we call to help? who needs to be brought up to speed? what options do we have right now?
and later help us make sense of the whole affair. @ri_cook @allspaw #SREcon
But watch out!! Beware the clumsy automation:
any automation that requires more work at the busiest time in order to save work at a less-busy time is a _clumsy automation_.
Like if a plane thingie helps during cruising but asks for input during takeoff 😾
If an automation asks for attention during the crunch time of an incident, so that reporting is easier later — fail! Clumsy automation!
and the best part, from q&a, someone used the O-word in a question:
“There are no ‘objective’ incidents. They are defined by our desires and goals. Incidents are a social construction.” @ri_cook #srecon
“the efforts you all go to, to define a start and stop time for an incident, represent your need to project a degree of control.
“And the tortured language you use to define something called incident severity! 😱” @ri_cook #SREcon
• • •
I laugh at people who talk about “exactly-once delivery”
The specs that claim it have been proven wrong.
But we have methods (like idempotency) to do things well. @mjpt777 #YowLondon
Make handover/resumption protocols.
“This is what I thought I sent to you last, did you get it?”
“Here’s what I got from you last, let’s work it out from there”
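To make the idempotency and handover idea concrete, here is a minimal sketch in Python. It is not from the talk: the `Sender` and `Receiver` classes, the sequence-number scheme, and the in-process "network" are all assumptions made for illustration. The receiver ignores anything it has already applied (idempotent delivery), and on resumption the sender asks what the receiver got last and resends from there.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver: applies each message at most once, in order."""
    last_applied: int = 0                      # highest sequence number applied
    log: list = field(default_factory=list)    # payloads applied so far

    def deliver(self, seq: int, payload: str) -> int:
        # Idempotent: duplicates, stale messages, and gaps change nothing.
        if seq == self.last_applied + 1:
            self.log.append(payload)
            self.last_applied = seq
        # The return value is the ack: "here's what I got from you last".
        return self.last_applied


@dataclass
class Sender:
    """Hypothetical sender: keeps everything it sent so it can resend."""
    outbox: list = field(default_factory=list)

    def send(self, payload: str) -> int:
        self.outbox.append(payload)
        return len(self.outbox)                # sequence numbers start at 1

    def resume(self, receiver: Receiver) -> None:
        # Handover: find out what the receiver has, then work it out from there.
        start = receiver.last_applied
        for seq in range(start + 1, len(self.outbox) + 1):
            receiver.deliver(seq, self.outbox[seq - 1])


if __name__ == "__main__":
    sender, receiver = Sender(), Receiver()
    for payload in ["a", "b", "c"]:
        sender.send(payload)
    receiver.deliver(1, "a")
    receiver.deliver(2, "b")
    receiver.deliver(2, "b")   # duplicate delivery, safely ignored
    # message 3 is "lost" here; resumption repairs it
    sender.resume(receiver)
    print(receiver.log)
```

The point of the sketch is that nobody needs exactly-once delivery from the transport: resending is always safe because applying a message twice has the same effect as applying it once.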
If we go from Idea to Behavior change to new Idea…
how quickly we can do that depends on the structure. @kentbeck
If we go Idea to Behavior to Idea to Behavior
as fast as we can,
it’s gonna get slower and slower and then the developers will get frustrated and leave and the new developers will be even slower…
So sometimes, we make a structure change before the behavior change. @KentBeck
SREs in the audience? (Dozens of hands)
Experienced SREs? (Like 2.5 hands)
We @RedHat used to ship products. Build a thing, package it, send to customers. Then it was their problem. Customer hires a consultant or figures it out.
Now we mostly ship services. Now it’s our headache, reliability and uptime etc. It’s different.
The team deserves someone
who wants to manage people.
who is not bitter about meetings
who is interested in sociotechnical systems and nurturing careers
whose technical skills are strong enough to evaluate their work.