Jessica Joy Kerr

Mar 14 • 7 tweets • 7 min read

What can we learn from ALL the incidents?
@courtneynash at @verica_io compiles reports from lots of companies into the VOID: Verica Open Incident Database. #SREcon

“Software runs the world, and you run that software.”
#SREcon @courtneynash

While every incident and every company is different, the distributions have the same shape. They are “positively skewed:” more short incidents than long ones.

Here’s the thing:
If you don’t have a symmetric distribution (and incidents don’t),
then central measures like mean, median, mode
don’t represent your data.
@courtneynash #srecon

That’s why MTTR is at best useless.
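
To see the point with numbers, here is a minimal Go sketch. The ten durations are made up for illustration (they are not from the VOID), and the helper names `Mean`, `Median`, and `CountBelow` are my own. With one long incident in the mix, the mean lands far from where most incidents actually sit.

```go
package main

import (
	"fmt"
	"sort"
)

// Mean returns the arithmetic mean of xs.
func Mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// Median returns the middle value of xs, or the average of the
// two middle values when len(xs) is even. It does not modify xs.
func Median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// CountBelow reports how many values are strictly below limit.
func CountBelow(xs []float64, limit float64) int {
	count := 0
	for _, x := range xs {
		if x < limit {
			count++
		}
	}
	return count
}

func main() {
	// Hypothetical incident durations in minutes: many short, one very long.
	durations := []float64{5, 8, 10, 12, 15, 20, 30, 45, 90, 600}

	mean := Mean(durations)
	fmt.Printf("mean:   %.1f min\n", mean)              // mean:   83.5 min
	fmt.Printf("median: %.1f min\n", Median(durations)) // median: 17.5 min
	fmt.Printf("%d of %d incidents were shorter than the mean\n",
		CountBelow(durations, mean), len(durations)) // 8 of 10
}
```

Eight of the ten incidents are shorter than the "average" one, so the mean describes almost none of them.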

If you improve your incident response times, you might lower the MTTR. Or not. That metric cannot represent this data.
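
A small Go sketch of the "or not" case, using invented numbers (both quarters and the name `mttr` are hypothetical). Four of five incidents get resolved much faster in the second set, yet one long outlier pushes the mean up anyway.

```go
package main

import "fmt"

// mttr returns the mean time to resolve, in minutes.
func mttr(durations []float64) float64 {
	sum := 0.0
	for _, d := range durations {
		sum += d
	}
	return sum / float64(len(durations))
}

func main() {
	// Hypothetical data: incident durations in minutes.
	before := []float64{30, 40, 50, 60, 70}
	// After improving response: four incidents are far shorter,
	// but one unrelated long incident happens to occur.
	after := []float64{10, 15, 20, 25, 300}

	fmt.Printf("MTTR before: %.1f min\n", mttr(before)) // MTTR before: 50.0 min
	fmt.Printf("MTTR after:  %.1f min\n", mttr(after))  // MTTR after:  74.0 min
}
```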

Also, you’d think long = bad, but “long” doesn’t correlate with “high impact to customers.”

from @courtneynash #SREcon

Language matters in a sociotechnical system.
How we talk about our systems changes how we think about our systems,
and that changes our systems.

@courtneynash #SREcon

#NoRootCause

What can we do instead?
Focus on learning, with humans as solutions.
Learn from our successes! Near misses are a great opportunity to learn from incidents that didn’t happen.
@courtneynash #SREcon

More from @jessitron

Jessica Joy Kerr

@jessitron

Mar 15

This afternoon at #srecon, Adam Mckaig and Tahia Khan from @datadoghq talked about the evolution of their metrics backend.

The high-level architecture looks very familiar to me. The slightly more detailed view, less so: many parts!

For scale, break up incoming data and put it into Kafka.
hash(customer_id) -> partition_id
… but then one Kafka topic gets overloaded, so…
hash(customer_id) -> topic_id, partition_id
to send to topics in different clusters.
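
A rough Go sketch of those two routing schemes, to make the difference concrete. This is not Datadog's implementation: the FNV-1a hash, the way one hash is split into two indexes, the topic and partition counts, and every name here (`Route`, `PartitionFor`, `RouteFor`) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Route says where a customer's data goes.
type Route struct {
	Topic     int
	Partition int
}

// hashID hashes a customer ID with FNV-1a (an arbitrary choice here).
func hashID(customerID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(customerID))
	return h.Sum32()
}

// PartitionFor is the first scheme:
// hash(customer_id) -> partition_id, all within a single topic.
func PartitionFor(customerID string, numPartitions int) int {
	return int(hashID(customerID) % uint32(numPartitions))
}

// RouteFor is the second scheme:
// hash(customer_id) -> topic_id, partition_id.
// Spreading customers over several topics lets those topics
// live in different clusters.
func RouteFor(customerID string, numTopics, numPartitions int) Route {
	h := hashID(customerID)
	return Route{
		Topic:     int(h % uint32(numTopics)),
		Partition: int((h / uint32(numTopics)) % uint32(numPartitions)),
	}
}

func main() {
	for _, id := range []string{"customer-1", "customer-2", "customer-3"} {
		fmt.Printf("%s: partition %d, or route %+v\n",
			id, PartitionFor(id, 8), RouteFor(id, 4, 8))
	}
}
```

Either way, the same customer always hashes to the same place, so a customer's data stays together.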

Jessica Joy Kerr

@jessitron

Mar 15

Today at #srecon, @allspaw and @ri_cook give deep insight on real tools, incident timelines, and clumsy automation.
But not in person. 😭

Great tools (as opposed to machines) are near to hand and conform to the person who wields them. Like a hammer, or `top`. Yeah.
They are opinionated, but not prescriptive.

(machines do what they do, and you conform to them)

In software, tools like `top` help us see what’s going on in the digital space.
@ri_cook et al see our work taking place on two sides of a divide. There’s meatspace (where we are) and digital space (where the software runs). You can’t reach out and feel digital stuffs.

Jessica Joy Kerr

@jessitron

Mar 14

As SREs, we need to build the narrative of our work’s value.
The future of all software is at stake.

@caseyrosenthal builds a beautifully illustrated narrative at #SREcon

Like Security, SRE’s value hides in all the incidents that don’t happen.
So it’s easy to ignore. But people and legislative bodies value it.

The Success in SRE is Silent

and if our success remains silent, our profession (and software development in general) will go the way of security: regulation.

Jessica Joy Kerr

@jessitron

Feb 13

Today in #golang:

fmt.Println("What is truth?", true)

can output:

What is truth? false

because Go lets you do this:

`true := false`

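The local variable `true` shadows the built-in!?
(Strictly, `true` is a predeclared identifier in Go, not a keyword, which is why this compiles.)

Go obstinately protects you from declaring an unused variable or importing an unused package.

But shadow a predeclared identifier? ooookaaaaay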
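
For anyone who wants to try it, here is a minimal runnable version of the snippet above. The function name `truth` is mine; the shadowing line and the `Println` arguments are the ones from the tweet.

```go
package main

import "fmt"

// truth formats the same line the tweet prints, after shadowing `true`.
func truth() string {
	// Declares a new local variable named `true` with the value false.
	// This compiles because `true` is a predeclared identifier,
	// which Go allows to be redeclared in an inner scope.
	true := false
	return fmt.Sprintln("What is truth?", true)
}

func main() {
	fmt.Print(truth()) // What is truth? false
}
```

The compiler stays quiet here because the variable is used, so the unused-variable check does not fire.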

Jessica Joy Kerr

@jessitron

Dec 16, 2021

People.
There is a difference between a backend and an API.

Taking the endpoints that you wrote for your site, slapping some documentation on them and publishing it
does not make an API.

An API needs designing. It needs a conscious language and consistent conventions.
Standard auth.
Paging.
Careful error codes and messages.
Versioning.
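
As a loose illustration of what "designed" could look like, here is a small Go sketch. Everything in it is hypothetical and not taken from any real API: the `/v1` path prefix, the cursor-based `Page` shape, the `APIError` envelope, and the field names are just one possible set of conventions. Auth is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiVersion is part of every path, so breaking changes get a new prefix.
const apiVersion = "v1"

// APIError is a hypothetical error envelope: a stable machine-readable
// code plus a human-readable message.
type APIError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// Page is a hypothetical paged response. NextCursor is omitted
// from the JSON when there are no more results.
type Page struct {
	Items      []string `json:"items"`
	NextCursor string   `json:"next_cursor,omitempty"`
}

// path builds a versioned path for a resource.
func path(resource string) string {
	return "/" + apiVersion + "/" + resource
}

// encode marshals v to JSON, returning the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(path("widgets"))
	fmt.Println(encode(Page{Items: []string{"a", "b"}, NextCursor: "abc123"}))
	fmt.Println(encode(APIError{Code: "not_found", Message: "no such widget"}))
}
```

The point is only that every endpoint would follow the same shapes, which is what distinguishes this from endpoints that grew with one front end.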

A backend is whatever your front end needs. It should change when your front end needs it to change.
Don’t restrict it to historical behavior because other systems have grown dependent on it.

Jessica Joy Kerr

@jessitron

Nov 20, 2021

“so that we have a single process for the entire organization”
is a death knell for software.
This is how large organizations slow themselves.

Unifying process gives every change a wide impact, and that means change must be slow.

“But architecture is important! It affects the whole organization! We must take it seriously.”

Importance implies a single process; a single process implies slowness.

A single careful process does _not_ imply the best solution is reached.
Instead, it retards iteration.

A single careful process _does_ imply a defensible solution will be reached.
