Charity Majors @mipsytipsy

This is an AWESOME piece by @joab_jackson: a clear, crisp overview of the industry-wide lurch towards distributed systems, and some of the ripple effects as experienced by tooling and teams.

I'd like to sharpen up a couple small points.

https://twitter.com/thenewstack/status/1047229987529654272

Running untested code *is* asking for trouble. ☺️ You should always write and run local tests! I'm pointing out that they are *insufficient*, not useless.

The 80/20 rule applies here: you can catch most of the bugs with 20% of the effort, and you should absolutely *do that.*

The point I was making is that usually when people hear "test in prod" they assume you are ONLY testing in prod. That would be dangerous and stupid.

Tests are to software engineering as monitoring is to operations. Always cover your known unknowns/failures.

In a previous era when you could legitimately predict most of the failures, investing lots of eng effort into staging may have made sense.

Nowadays, it doesn't. Yet we haven't reallocated our engineering effort to reflect newer realities.

Companies need to reallocate some of these eng cycles away from pre-production to an area where returns on effort are much higher. Namely: democratizing access to prod, instrumentation and observability, tooling for experimentation, and guard rails for testing safely.

In prod.

If you don't have rich observability -- slicing and dicing raw events in real time, high cardinality, rich dimensionality -- you truly are flying blind.
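To make "slicing and dicing raw events" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the event fields (`user_id`, `build_id`, `endpoint`, `duration_ms`, `error`), the sample data, and the `slice_by` helper are all hypothetical, and a real system would query an event store rather than a list in memory. The idea it tries to show is that you keep one wide event per request, with the high-cardinality fields intact, and pick the grouping dimensions at query time.

```python
from collections import defaultdict

# Hypothetical raw events: one wide record per request. High-cardinality
# fields (user_id, build_id) are kept rather than pre-aggregated away.
EVENTS = [
    {"user_id": "u1", "build_id": "b41", "endpoint": "/export", "duration_ms": 120, "error": False},
    {"user_id": "u2", "build_id": "b42", "endpoint": "/export", "duration_ms": 950, "error": True},
    {"user_id": "u2", "build_id": "b42", "endpoint": "/home", "duration_ms": 30, "error": False},
    {"user_id": "u3", "build_id": "b42", "endpoint": "/export", "duration_ms": 870, "error": True},
    {"user_id": "u1", "build_id": "b41", "endpoint": "/home", "duration_ms": 25, "error": False},
]


def slice_by(events, *dimensions):
    """Group raw events by any combination of fields, chosen at query time."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        groups[key].append(event)
    # Summarise each group; the aggregates are computed from raw events,
    # so any other field could be used as a dimension just as easily.
    return {
        key: {
            "count": len(rows),
            "error_rate": sum(r["error"] for r in rows) / len(rows),
            "max_duration_ms": max(r["duration_ms"] for r in rows),
        }
        for key, rows in groups.items()
    }


by_build = slice_by(EVENTS, "build_id")
by_build_endpoint = slice_by(EVENTS, "build_id", "endpoint")
```

In this made-up data, slicing by `build_id` points at build `b42`, and adding `endpoint` narrows the errors to `/export` on that build. A pre-aggregated dashboard metric could not answer that second question after the fact.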

Until you have it, you should not invest any time in chaos engineering, experimentation or anything else. Fix this first.

It doesn't matter how good a surgeon you are if you can't see what you're cutting into.

(Most of you have never actually had or seen this kind of observability, which is... more than a little terrifying to me.)

And let's talk about deploy tooling. Canaries, rolling releases, post-deploy checks.

This is the other wildly underinvested area. What's the proximate cause of most outages? "We intentionally changed something/we shipped new code" and yet we still deploy with Capistrano lol
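As a rough illustration of what a canary or post-deploy check might look like, here is a small Python sketch. It is not taken from any particular deploy tool: the `canary_verdict` function, its thresholds (`max_ratio`, `min_requests`), and the 0.1% error-rate floor are all assumptions chosen for the example. It compares the canary's error rate against the baseline fleet and returns a verdict.

```python
def canary_verdict(baseline_errors, baseline_total, canary_errors, canary_total,
                   max_ratio=2.0, min_requests=100):
    """Return 'wait', 'promote', or 'rollback' for a canary deploy.

    All thresholds are illustrative assumptions, not recommended values.
    """
    # Not enough canary traffic yet to say anything meaningful.
    if canary_total < min_requests:
        return "wait"

    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total

    # Allow the canary up to max_ratio times the baseline error rate, with a
    # small absolute floor so a zero-error baseline does not make a single
    # canary error fatal.
    allowed = max(baseline_rate * max_ratio, 0.001)
    return "rollback" if canary_rate > allowed else "promote"


print(canary_verdict(10, 10_000, 0, 50))
print(canary_verdict(10, 10_000, 1, 1_000))
print(canary_verdict(10, 10_000, 50, 1_000))
```

A real check would likely look at latency and other signals too, and would lean on the same raw-event observability to explain why a canary failed. Even a crude gate like this, though, treats "we shipped new code" as the risky event it is.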

In summary.

Write tests, run your tests, but don't delude yourself into thinking tests will save you.

Oops gotta go
