Jessica Joy Kerr
Oct 26 19 tweets 10 min read
SRE teams try to keep toil under 50%.

So at most 50% of their time goes to work that has no enduring value...

@DivineOps #QConSF
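The 50% cap is simple arithmetic over how a team spends its time. Here is a minimal sketch of that arithmetic. It assumes a hypothetical weekly time log where each task is tagged as toil or not. The `Task` type, the task names, and the hours are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    hours: float
    is_toil: bool  # True for work with no enduring value

def toil_fraction(tasks: list[Task]) -> float:
    """Share of logged hours spent on toil, from 0.0 to 1.0."""
    total = sum(t.hours for t in tasks)
    if total == 0:
        return 0.0
    return sum(t.hours for t in tasks if t.is_toil) / total

def within_toil_cap(tasks: list[Task], cap: float = 0.5) -> bool:
    """True if toil stays under the cap (50% by default)."""
    return toil_fraction(tasks) < cap

# Hypothetical week for one SRE.
week = [
    Task("manual certificate rotation", 6, True),
    Task("restarting stuck jobs", 10, True),
    Task("automating the rollout script", 14, False),
    Task("postmortem follow-up", 10, False),
]
print(f"{toil_fraction(week):.0%} toil, under cap: {within_toil_cap(week)}")
# 40% toil, under cap: True
```

The hard part in practice is the `is_toil` tag, not the division. Deciding what has "no enduring value" is a judgment the team makes.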
SREs in the audience? (Dozens of hands)
Experienced SREs? (Like 2.5 hands)
We @RedHat used to ship products. Build a thing, package it, send to customers. Then it was their problem. Customer hires a consultant or figures it out.

Now we mostly ship services. Now reliability, uptime, etc. are our headache. It’s different.

@DivineOps #QConSF
SRE: “hopefully people get paid more for having this title”

The innovative part of SRE is explicit agreements that align incentives between dev, ops, and business.

@DivineOps #QConSF
SLA = financially backed availability.
The contract has a % of cost that is refunded if availability is lower than advertised.

This aligns incentives between vendor and customer.
(As much as a single metric can.)

@DivineOps #QConSF
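The sketch below makes the refund mechanism concrete with a tiered service-credit calculation. The tiers, the percentages, and the `service_credit` function are hypothetical. Real SLAs spell out their own schedules and measurement rules.

```python
# Hypothetical credit schedule for an SLA advertising 99.9% availability.
# Each entry: (availability floor in percent, refund as a fraction of monthly cost).
# Tiers are checked from the worst floor upward, so the first match wins.
CREDIT_TIERS = [
    (95.0, 1.00),  # below 95%: full refund
    (99.0, 0.25),  # below 99%: 25% refund
    (99.9, 0.10),  # below the advertised 99.9%: 10% refund
]

def service_credit(measured_availability: float, monthly_cost: float) -> float:
    """Refund owed for one month, given measured availability in percent."""
    for floor, refund_fraction in CREDIT_TIERS:
        if measured_availability < floor:
            return monthly_cost * refund_fraction
    return 0.0  # advertised availability was met: nothing refunded

print(service_credit(98.0, 1000.0))   # 250.0
print(service_credit(99.95, 1000.0))  # 0.0
```

Under these made-up tiers, a month at 98% missed 99% but not 95%, so a quarter of the cost comes back. The incentive is the point: every tier the vendor misses costs money.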
SLO = targeted reliability. What do we care about?
(Service level objective)

Example: availability from a customer’s perspective

@DivineOps #QConSF
SLI = actual reliability.
Without good monitoring, you don’t know whether the service does what the user expects it to do.

Monitoring improves as internal users catch stuff and move it into automation (which only covers problems seen before)

@DivineOps #QConSF
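An SLI is commonly computed as the fraction of events that were "good" from the user's point of view, then compared with the SLO target. This sketch assumes a hypothetical definition of "good": no 5xx status and latency at or under 300 ms. Choosing that definition is the real work, and it depends on the monitoring the thread says you need. The request data is made up.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status: int        # HTTP status code the user got back
    latency_ms: float  # how long the user waited

def is_good(req: Request, latency_threshold_ms: float = 300.0) -> bool:
    # Hypothetical "did what the user expected": no server error, and fast enough.
    return req.status < 500 and req.latency_ms <= latency_threshold_ms

def availability_sli(requests: list[Request]) -> float:
    """Actual reliability: the fraction of requests that were good."""
    if not requests:
        return 1.0  # no traffic in the window; counting that as 100% is a policy choice
    return sum(is_good(r) for r in requests) / len(requests)

def meets_slo(sli: float, slo_target: float) -> bool:
    """Compare actual reliability (SLI) against targeted reliability (SLO)."""
    return sli >= slo_target

# Made-up window: 996 fast successes, 2 server errors, 2 successes that were too slow.
window = [Request(200, 120.0)] * 996 + [Request(503, 40.0)] * 2 + [Request(200, 900.0)] * 2
sli = availability_sli(window)
print(f"SLI = {sli:.1%}, meets 99.9% SLO: {meets_slo(sli, 0.999)}")
# SLI = 99.6%, meets 99.9% SLO: False
```

The slow successes count against the SLI here. A status-code-only check would have reported 99.8% and hidden them.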
SLO = business-approved reliability.

Explicitly aligns incentives between Business & Engineering.

@DivineOps #QConSF
Error budget = acceptable level of unreliability

When it’s gone, developers shift focus from delivering features to improving reliability.
… which they never had the incentive to do before!

@DivineOps #QConSF
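The error budget falls out of the SLO. If the target is 99.9%, the budget is the remaining 0.1% of requests. This sketch rests on illustrative assumptions: the request counts, the `ErrorBudget` type, and the two-way policy in `next_priority` are not a prescribed process. The target is passed as a string so `Decimal` keeps it exact, which lets the boundary case (exactly zero budget left) compare correctly.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class ErrorBudget:
    slo_target: str       # e.g. "0.999"; a string so Decimal represents it exactly
    total_requests: int   # requests seen in the SLO window
    failed_requests: int  # requests that did not meet the SLI definition

    def allowed_failures(self) -> Decimal:
        # The budget is the unreliability the SLO permits: (1 - target) of all requests.
        return (1 - Decimal(self.slo_target)) * self.total_requests

    def remaining(self) -> Decimal:
        return self.allowed_failures() - self.failed_requests

    def exhausted(self) -> bool:
        return self.remaining() <= 0

def next_priority(budget: ErrorBudget) -> str:
    # Hypothetical policy following the thread: once the budget is gone,
    # developers shift from delivering features to improving reliability.
    return "reliability work" if budget.exhausted() else "feature work"

budget = ErrorBudget("0.999", 1_000_000, 400)
print(next_priority(budget))
```

With one million requests at a 99.9% target, the budget is 1,000 failed requests. After 400 failures, 600 remain and the policy still says feature work. At 1,000 failures the budget is gone and the focus flips.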
100% availability… no. Impossible, unnecessary, extremely expensive.
99.999% … will your users even notice?
The background error rate of the internet is 0.01% to 1% (depends on the ISP)

@DivineOps #QConSF
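Translating the nines into downtime shows why each extra nine is so expensive. This sketch assumes a 30-day window and treats availability as a fraction of time. The second function assumes service and network failures are independent, which is a simplification. The 0.1% network failure rate in the example is just one hypothetical point inside the 0.01% to 1% range quoted in the thread.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Minutes of downtime an availability target permits over the period."""
    return (1 - availability_pct / 100) * period_days * 24 * 60

def user_perceived_success(service_pct: float, network_pct: float) -> float:
    """Percent of requests that succeed end to end, assuming independent failures."""
    return service_pct * network_pct / 100

for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}%: {allowed_downtime_minutes(target):.2f} min per 30 days")
# 99.0%: 432.00 min per 30 days
# 99.9%: 43.20 min per 30 days
# 99.99%: 4.32 min per 30 days
# 99.999%: 0.43 min per 30 days

# Hypothetical 0.1% background failure rate on the user's side (99.9% network success):
# moving the service from 99.99% to 99.999% barely changes what users see.
for service in (99.99, 99.999):
    print(f"service {service}% -> user sees {user_perceived_success(service, 99.9):.3f}%")
```

Five nines leaves under half a minute a month. In the second loop, the network caps what users experience at about 99.9% regardless of which service target you pick.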
Things we got wrong about SRE:

The book says “it’s what you get when a software engineer designs an operations team” … no.

This led to:
Hire developers to do ops things, and get effective SRE? …no.

@DivineOps #QConSF
Why didn’t Ops automate themselves out of a job?
Because they didn’t have a software engineer? No.

They didn’t have APIs! The only way to update a registry was a human clicking.

@DivineOps #QConSF
Kudos to Jeffrey Snover, who fought for PowerShell automation for Windows administration.

Google explicitly built Borg to be automatable.

Then people got the message: infrastructure as code.
Puppet, Chef, etc.

@DivineOps #QConSF
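The shift described above depends on having an API to call. Infrastructure as code tools let you declare the desired state and then converge the real system toward it. Below is a minimal sketch of that idea. It assumes plain dictionaries stand in for real infrastructure APIs. `plan` and `converge` are hypothetical names, not Puppet's or Chef's interfaces.

```python
def plan(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """Diff declared state against observed state; return (action, resource) pairs."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

def converge(desired: dict[str, str], actual: dict[str, str]) -> dict[str, str]:
    """Apply the plan to `actual`. Here the "API" is just dict mutation."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:  # create or update
            actual[name] = desired[name]
    return actual

desired = {"web": "nginx-1.25", "db": "postgres-16"}
actual = {"web": "nginx-1.24", "cache": "redis-7"}
print(plan(desired, actual))  # [('update', 'web'), ('create', 'db'), ('delete', 'cache')]
converge(desired, actual)
print(plan(desired, actual))  # [] (nothing left to do)
```

Running `converge` a second time finds nothing to do. That idempotence is what makes the step safe to automate and rerun. It only works if every "create", "update", and "delete" has an API behind it instead of a human clicking.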
As an industry, we worked really hard to make the tools to make this automation happen.
THAT is what makes SRE possible.

Consistent APIs and reliable monitoring are prerequisites to automation.

@DivineOps #QConSF
Second thing we got wrong:

Toil is bad, it’s useless, eliminate it.

Are we striving for a human-less system?

@DivineOps #QConSF
Humanless systems don’t maintain themselves. No matter how automated, how well structured — they don’t maintain their structure.

@DivineOps #QConSF
People do troubleshooting, responding, adapting, noticing
to keep the system functioning: it looks a lot like toil.
Are people rewarded for that?

Some SREs do more automation, others more on-call keeping the system running and learning from it.

@DivineOps #QConSF
Cloud provides an industry standard for consistent infrastructure-level APIs
❤️☁️❤️

Also Kubernetes 🤩

@DivineOps #QConSF
Align your toil with where your business value is.

Below that, call a PaaS API.

@DivineOps


