Tom Leaman @tleam
Final session of my day: "Applying Principles of Chaos Engineering to Serverless" presented by @theburningmonk. It's a great chaotic bookend to my day after a start with "Breaking Containers." I'm interested to see what differences (if any) there are in the serverless world!
Like all talks about chaos, we are starting with “what is chaos” from principlesofchaos.org @theburningmonk #reInvent
We're going down the analogy route this time - chaos testing is like a vaccine that inoculates against frailties in modern software. Not the first time I've heard the analogy but it's definitely my favorite!
@theburningmonk - "Chaos engineering has an image problem - there's a lot of focus on 'breaking things.'" Chaos isn't about breaking things... it's the opposite - it should be keeping things up. Opinion: Chaos isn't chaotic, it's methodical.
This is now my favorite meme. If you’re always breaking... you don’t have a steady state
@theburningmonk - *don't* start running experiments in prod unless you have built confidence in other less risky regions. Opinion: good correlation between this and the @adrianco & @Ana_M_Medina talk from earlier - start with a small blast radius!
Getting to containment of chaos: communicate, communicate, communicate. Especially when starting off with chaos testing, any potentially impacted teams should be notified. Let's not spend time investigating "issues" that are just FIT in progress.
Other containment key considerations:
- Run tests during regular hours so folks are around
- Avoid important dates
- Make smallest change to prove/disprove hypothesis
- Make rollback easy
Reiteration from earlier today, chaos can help improve:
- People, practices & process
- Applications
- Platforms
- Infrastructure

Credit goes out to @russmiles for the model
Now we're moving into the Serverless aspect of Chaos. If your background in Chaos is limited to some of the standard Simian Army (Chaos Monkey, Gorilla, Kong, etc.) you may think it's about 'killing' a container, AZ, or region but it's so much more!
@theburningmonk - resiliency for serverless is even harder than standard containers. *Every* function needs to be correctly configured and secured - each boundary must be solid and there are *a lot* more boundaries in serverless #reInvent
What exacerbates the problem is that serverless architectures tend to utilize large volumes of managed services with numerous failure modes.
Common weaknesses in serverless:
- Misconfigured timeout
- Misconfigured error handling
- Missing fallback
- Missing regional failover

Most common? Latency and performance.
Common metrics for steady state are your standard Golden Signals:
- Latency
- Error rate
- Yield
- Harvest* - not sure about this one, didn't pick up on the specifics
Request timeouts are tricky! Too short and you may not give your Lambda a good chance of success. Too long and the calling function may time out first.
Solution: set timeout based on remaining invocation time minus the recovery time
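A rough sketch of that rule in Python, as I understood it. The 500 ms recovery budget and the helper name `request_timeout_seconds` are my own assumptions, not numbers from the talk; `context.get_remaining_time_in_millis()` is the method the Lambda Python runtime exposes on the context object.

```python
def request_timeout_seconds(remaining_ms, recovery_ms=500):
    """Timeout for a downstream call: remaining invocation time minus recovery time.

    recovery_ms is an assumed budget reserved for fallback and error handling.
    Returns 0.0 when there is no time left, in which case the caller should
    skip the downstream call and go straight to its fallback.
    """
    budget_ms = max(remaining_ms - recovery_ms, 0)
    return budget_ms / 1000.0


def handler(event, context):
    # Ask the runtime how much time this invocation has left.
    timeout = request_timeout_seconds(context.get_remaining_time_in_millis())
    # Pass `timeout` to whatever HTTP or SDK client makes the downstream call.
    return {"timeout_seconds": timeout}
```

The point of computing this per request rather than hard-coding it is that the budget shrinks as the invocation runs, so a call made late in the handler gets a shorter timeout than one made at the start.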
A great way to validate this methodology? Chaos testing with latency injection. This shouldn't just be applied between homegrown Lambdas but in between managed services as well.
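For a feel of what latency injection can look like in code, here is a minimal Python sketch. It is not the tooling shown in the talk; the decorator name `inject_latency`, its parameters, and the `call_downstream` stand-in are all hypothetical.

```python
import functools
import random
import time


def inject_latency(min_ms, max_ms, rate=1.0, sleep=time.sleep):
    """Delay a fraction (`rate`) of calls by a random min_ms..max_ms pause.

    `sleep` is injectable so the delay can be observed without actually waiting.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < rate:
                sleep(random.uniform(min_ms, max_ms) / 1000.0)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(min_ms=200, max_ms=400, rate=0.5)
def call_downstream(payload):
    # Stand-in for a real HTTP or SDK call to another Lambda or managed service.
    return {"ok": True, "payload": payload}
```

Keeping `rate` low at first is one way to hold the blast radius down: only a fraction of calls see the extra delay.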
When dealing with a shared Lambda it can be very difficult to limit blast radius when performing chaos tests.
@theburningmonk - "One of our problems is that we don't prime developers to think about failures early in the development cycle."
Serverless chaos isn't just about injecting latency. It can also be valuable to inject 'thrown errors' like HTTP 5XXs or standard error states to ensure that Lambdas handle them appropriately.
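A small Python sketch of error injection, plus the kind of fallback it is meant to exercise. Everything here (`InjectedFailure`, `inject_error`, `with_fallback`) is a hypothetical name of mine for illustration, not something from the talk.

```python
import functools
import random


class InjectedFailure(Exception):
    """Stands in for a real downstream failure during an experiment."""


def inject_error(rate, exc_factory=lambda: InjectedFailure("injected failure")):
    """Make a fraction (`rate`) of calls raise instead of running."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < rate:
                raise exc_factory()
            return func(*args, **kwargs)
        return wrapper
    return decorator


def with_fallback(func, fallback):
    """Call func and return its result, or the fallback value if it raises."""
    try:
        return func()
    except Exception:
        return fallback
```

The experiment is then a question: with errors injected at some rate, does the caller still return something sensible, or does the failure leak through?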
Want to inject failure via Lambda throttling? Temporarily set reserved concurrency to 1. This will limit the number of concurrent executable Lambdas to 1... making it very difficult to get requests through!
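Since "temporarily" and "make rollback easy" go together, here is a hedged Python sketch that wraps the change in a context manager so the previous setting is restored afterwards. It assumes `lambda_client` is a boto3 Lambda client (`boto3.client("lambda")`); the helper name `throttled` is mine.

```python
from contextlib import contextmanager


@contextmanager
def throttled(lambda_client, function_name, limit=1):
    """Temporarily cap a function's reserved concurrency, then restore it."""
    # Remember the current setting; the key is absent when none is configured.
    previous = lambda_client.get_function_concurrency(
        FunctionName=function_name
    ).get("ReservedConcurrentExecutions")
    lambda_client.put_function_concurrency(
        FunctionName=function_name, ReservedConcurrentExecutions=limit
    )
    try:
        yield
    finally:
        # Roll back even if the experiment itself raised.
        if previous is None:
            lambda_client.delete_function_concurrency(FunctionName=function_name)
        else:
            lambda_client.put_function_concurrency(
                FunctionName=function_name, ReservedConcurrentExecutions=previous
            )


# Usage (not run here; "my-function" is a placeholder name):
#   import boto3
#   with throttled(boto3.client("lambda"), "my-function"):
#       ...send traffic and watch how callers cope with throttling...
```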