Jessica Joy Kerr
Oct 25 15 tweets 7 min read
How to build reliable systems under unpredictable conditions

@MrBWilms is excited to be at #QConSF :-)
Normal testing is happy-path testing.

@MrBWilms #QConSF
Reliable: consistently good in quality or performance.
Reliable: able to be trusted.

@MrBWilms #QConSF

Note that he doesn’t say “perfect.” Good isn’t perfect, but it can be _real_.
And trust is not assuming perfection.
What are the conditions we are dealing with?
(8 fallacies of distributed computing)

The network is NOT reliable, secure, free, quick, unlimited, static, homogeneous, or the product of one admin.

So our conditions are NOT predictable.
Unit testing is testing the bubbles. Not the connections, and especially not the unpredictable connections.

@MrBWilms #QConSF
Production is not a beautiful place.

@MrBWilms #QConSF

(I was too slow to catch the beautiful picture of fire that said “Production hates you”)
“My customer is not calling me."

This is not a strong metric.

Let’s be aware of production problems before that.

@MrBWilms #QConSF
The cycle of death: incident -> fix -> deploy -> something else went wrong

“Welcome to my life."

@MrBWilms #QConSF
we need

1. Testing
2. Resilience

Resilience is an outcome... a system’s ability to recover from a fault and maintain dependability

@MrBWilms #QConSF
… then @MrBWilms does a demo of steadybit, which is
- injecting a “blackhole attack” of taking out a service, then taking out DNS
- making the outcome legible: which services went down? Which ones are getting socket timeouts?
Time for some timeouts, circuit breakers, fallbacks, bulkheads for isolation...

and retries. “Be careful with retries.” You can cause more problems with a retry storm.

@MrBWilms #QConSF
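
One way to be careful with retries, sketched here as an illustration rather than anything shown in the talk: cap the number of attempts, back off exponentially, and add random jitter so that many clients do not all retry at the same instant. The function name `call_with_retries` and its parameters are made up for this sketch.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(); on failure, retry with capped exponential backoff and full jitter.

    Hypothetical helper for illustration. `sleep` and `rng` are injectable so the
    behavior can be tested without real waiting or real randomness.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure instead of hammering on
            # The backoff ceiling doubles with each failed attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random fraction of the ceiling so clients spread out.
            sleep(rng() * ceiling)
```

The hard cap on attempts is what keeps a struggling service from being buried: once the budget is spent, the caller gives up and a fallback or circuit breaker can take over.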
“You have transformed an incident into an acceptance test.” by using chaos engineering to replay an incident.

using @steadybitHQ
There’s a cute graph of HTTP activity, and you can kind of see what kubernetes is doing. An insight into one of the universe’s unending mysteries.
Test under real conditions in a safe environment.
Do it continuously. Like unit testing, integration testing, load testing.
Don’t be haunted by production.

@MrBWilms #QConSF
Congrats to @steadybitHQ for their first big US conference sponsorship.
Contact them for a workshop on these principles in your company, and they can show how to

Express expectations about what your system should do under adverse conditions, and then verify that behavior.
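
To make "express expectations, then verify" concrete, here is a minimal in-process sketch. It is not steadybit's API and not from the talk: a real chaos experiment injects the fault at the network or infrastructure level, while this stand-in just raises a timeout. All names (`homepage`, `SlowRecommendations`, `FALLBACK`) are hypothetical.

```python
class SlowRecommendations:
    """Stand-in for a dependency under a simulated fault: every call times out."""

    def fetch(self, user_id):
        raise TimeoutError("simulated socket timeout")


class HealthyRecommendations:
    """Stand-in for the same dependency behaving normally."""

    def fetch(self, user_id):
        return ["book-42", "book-7"]


# Static content to serve when live recommendations are unavailable.
FALLBACK = ["bestseller-1", "bestseller-2"]


def homepage(user_id, recommendations):
    """Build the page; degrade to the static list if the dependency fails."""
    try:
        items = recommendations.fetch(user_id)
        degraded = False
    except (TimeoutError, ConnectionError):
        items = FALLBACK
        degraded = True
    return {"user": user_id, "items": items, "degraded": degraded}


def test_homepage_survives_recommendation_outage():
    # Expectation under adverse conditions: the page still renders, with fallback content.
    page = homepage("u1", SlowRecommendations())
    assert page["items"] == FALLBACK
    assert page["degraded"] is True


def test_homepage_uses_live_data_when_healthy():
    # Expectation under normal conditions: live data, not the fallback.
    page = homepage("u1", HealthyRecommendations())
    assert page["items"] == ["book-42", "book-7"]
    assert page["degraded"] is False
```

The first test is the interesting one: it writes down what "good enough" means when a dependency is gone, so the expectation can be checked continuously instead of discovered during an incident.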
