#ChaosDay19 @mipsytipsy up next with Closing the Loop on Chaos with Observability. [ed: anyone surprised that @mipsytipsy is discussing this?]
#chaosday19 @mipsytipsy is an engineer, founder of @honeycombio, big background in DB Reliability Engineering (she wrote a book)
#chaosday19 @mipsytipsy : "Chaos is a fancy marketing term for running tests later in the software dev lifecycle"
#chaosday19 @mipsytipsy : "I think this is really just 'let's test in production'"
#chaosday19 @mipsytipsy : You can test both in prod and nonprod! You can test the... out of it in non-prod. [ed: meme game is strong here. Lots of parallels to @nora_js traps from earlier today]
#chaosday19 @mipsytipsy : Deploying code is not a binary switch, deploying code is a process of increasing confidence in your code.
#chaosday19 @mipsytipsy : "Deploying is turning on the oven to start baking your production cookies" [ed: hopefully you're baking in small batches at first!]
#chaosday19 @mipsytipsy : Why does our mental model fall apart after a series of deployments and flags?
#chaosday19 @mipsytipsy : "Largely due to increasing systems complexity. How do we know this is true? Because I drew a graph to show that it is"
#ChaosDay19 @mipsytipsy : Differences in the complexity of systems within a decade
#chaosday19 @mipsytipsy : "We are all distributed systems engineers now. Our predictions of how the system will perform are always a lie... Whenever something truly unpredictable happens it's all hands on deck to try and figure out what's going on (normally networking)"
#chaosday19 @mipsytipsy : "Distributed systems are particularly hostile to being cloned or imitated (or monitored). Is anyone going to spin up a clone of Facebook to test it? Diminishing returns fall off dramatically"
#chaosday19 @mipsytipsy : "Distributed systems have an infinitely long list of almost-impossible failure scenarios that make staging environments *particularly* worthless" [ed: systems are in constant state of unique failure! Not sure I agree with 'particularly worthless' tho]
#chaosday19 @mipsytipsy : "Without observability you don't have chaos engineering, just chaos. If you can't see what's going on, you are not tall enough to ride this ride" [ed: preach!]
#chaosday19 @mipsytipsy : "Monitoring systems have not changed significantly in 20 years" - @grepory
#chaosday19 @mipsytipsy - getting into the definition of Observability from control theory: "a measure of how well internal states can be inferred just by asking questions from the outside" [ed: based on above/below line I'd argue this is the only way we have *any* understanding of the sys]
#chaosday19 @mipsytipsy : "You have an observable system when your team can quickly and reliably track down any new problem with no prior knowledge or scar tissue that you've built from historical incidents"
#chaosday19 @mipsytipsy : "#o11y takes the perspective of the software"
#chaosday19 @mipsytipsy : "You *have to test in production*. This means being able to run tests that you do not know the answer to" [ed: once we admit that we're in constant flux, I'd argue that most if not all 'touches' of production are in fact... a test]
#chaosday19 @mipsytipsy : "Every time an issue occurs you could technically add an alert/monitor for it. If it happens again you can catch it!" [ed: but then you have a tangled pile of alerts and monitors that rot away over time and you'll eventually deal with alert fatigue]
#chaosday19 @mipsytipsy : "Context is key when debugging things. Events provide more context than metrics including high cardinality (must have!)"
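[ed: to make "events with high cardinality" concrete, here is a minimal Python sketch. It was not shown in the talk; the function name build_event and every field name are hypothetical. The idea is one wide, structured record per request that carries high-cardinality fields such as a user ID and a request ID, which a pre-aggregated metric would normally drop.]

```python
import json
import time
import uuid


def build_event(user_id, endpoint, status, duration_ms, build_id):
    """Return one wide, structured event describing a single request.

    All field names here are illustrative assumptions, not a real schema.
    """
    return {
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),  # unique per request (high cardinality)
        "user_id": user_id,               # potentially millions of distinct values
        "endpoint": endpoint,
        "status": status,
        "duration_ms": duration_ms,
        "build_id": build_id,             # ties the request to a specific deploy
    }


# Emit the event as one JSON line; a real system would ship it to an event store.
event = build_event("user-8675309", "/checkout", 500, 1840.2, "build-42")
print(json.dumps(event, sort_keys=True))
```

[ed: because each field stays attached to the individual request, you can later ask questions nobody planned for, e.g. "which users hit 500s on /checkout after build-42?"]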
#chaosday19 @mipsytipsy : Non-interactive dashboards are garbage... maybe have one or two on a wall. These are just artifacts of past failures and don't enable debugging.
#chaosday19 @mipsytipsy : "Debugging requires getting to individual requests" Aggregates are a kiss of death in this space "Once you smush everything into an aggregate you can't unsmush it"
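[ed: a small Python sketch of why you "can't unsmush" an aggregate. The data and the helper slowest_user are made up for illustration and are not from the talk. The average latency is a single number that cannot tell you who is affected; the raw per-request events can still be regrouped by any field after the fact.]

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-request events; only one user is having a bad day.
events = [
    {"user_id": "u1", "duration_ms": 40},
    {"user_id": "u2", "duration_ms": 45},
    {"user_id": "u3", "duration_ms": 50},
    {"user_id": "u4", "duration_ms": 2000},
    {"user_id": "u4", "duration_ms": 2100},
]

# The aggregate: one number, with the per-user detail gone for good.
overall_mean_ms = mean(e["duration_ms"] for e in events)


def slowest_user(raw_events):
    """Group raw events by user and return (user_id, mean latency) of the slowest."""
    by_user = defaultdict(list)
    for e in raw_events:
        by_user[e["user_id"]].append(e["duration_ms"])
    per_user_mean = {user: mean(values) for user, values in by_user.items()}
    worst = max(per_user_mean, key=per_user_mean.get)
    return worst, per_user_mean[worst]


print("overall mean (ms):", overall_mean_ms)
print("slowest user:", slowest_user(events))
```

[ed: slowest_user only works because the individual requests were kept. Given just overall_mean_ms, there is no way to recover that a single user accounts for the slowness.]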
#chaosday19 @mipsytipsy : Giving developers observability tooling they need to become fluent in production and to debug their own systems is critical.
#chaosday19 @mipsytipsy : "We must watch our code run in production. If you don't watch your code on fire you won't know what you're doing"
#chaosday19 @mipsytipsy : "Tonight we test in prod!" /fin
Thread by Tom Leaman @ #chaosday19