12 tweets, 3 min read
Again, making it so things never break is NOT THE GOAL.

Making it so many things can break before users are impacted is the goal. Making it so that any user impact is glaringly obvious and easy to identify and confirm and mitigate is the goal.
We have spent decades getting engineers used to developing thru the lens of their test suite.

Now we just need to expand that a smidge...and develop thru the lens of their instrumentation in production. Build for reality, not a simulacrum.
Observability-driven development, not test-driven development. Because code is just the beginning.

Reality is code plus architecture and infrastructure, time and elapsed time, dependencies, method of deployment, user activity, and any other concurrent activity.
Running lots of tests can increase your confidence in some piece of code.

What they can't do is tell you how confident in your confidence you should be, or how easy it will be to find and validate any bugs, or how many users are impacted by a bug, and on and on. You need prod.
You need your engineers drilled in instrumenting to understand ✨every commit✨. How might it break or degrade? How will they know if and when it does?

The *overwhelming majority* of bugs are far too small and subtle to trip a monitoring check and page someone. (Thank God.)
And the answer to this mismatch is NOT, "ok so add a million more alerts to page on every edge case." Fuuuck. That.

The answer is to go and look at the shit you just deployed, thru the instrumentation you shipped with it, and verify it is working as you intended.
Say you just shipped a storage engine improvement to compact columns with lots of strings.

You might make sure your instrumentation is capturing column data type, before size and after size, a was_compressed flag, time elapsed compressing, compression format, user id, app id, datetime compression ran, any errors or warnings, why it wasn't eligible for compression if it wasn't attempted, any stats on fragmentation or physical layout, location, read/write access time and modification time, relevant indexes....
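To make that concrete, here is a minimal sketch of one wide event per compaction attempt. Everything in it is an assumption for illustration: the CompactionEvent fields, compact_column, emit, and the use of zlib are hypothetical stand-ins, not the actual storage engine or its telemetry API, and it covers only a subset of the fields listed above.

```python
import json
import time
import zlib
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CompactionEvent:
    # Hypothetical field names mirroring a subset of the list above.
    build_id: str
    user_id: str
    app_id: str
    column_type: str
    before_bytes: int
    after_bytes: int
    was_compressed: bool
    compression_format: Optional[str]
    elapsed_ms: float
    skip_reason: Optional[str]  # why compression was not applied, if it wasn't
    error: Optional[str]
    ran_at: float               # unix timestamp of the attempt


def compact_column(raw: bytes, column_type: str, *, build_id, user_id, app_id):
    """Try to compress one column's bytes; return (payload, event)."""
    start = time.perf_counter()
    payload, fmt, skip_reason, error = raw, None, None, None
    if column_type != "string":
        skip_reason = "not_a_string_column"
    else:
        try:
            candidate = zlib.compress(raw)
            if len(candidate) < len(raw):
                payload, fmt = candidate, "zlib"
            else:
                skip_reason = "no_space_saved"
        except zlib.error as exc:
            error = str(exc)
    event = CompactionEvent(
        build_id=build_id, user_id=user_id, app_id=app_id,
        column_type=column_type,
        before_bytes=len(raw), after_bytes=len(payload),
        was_compressed=fmt is not None,
        compression_format=fmt,
        elapsed_ms=(time.perf_counter() - start) * 1000.0,
        skip_reason=skip_reason, error=error, ran_at=time.time(),
    )
    return payload, event


def emit(event: CompactionEvent) -> str:
    """Serialize the event as one JSON line (stand-in for a telemetry client)."""
    line = json.dumps(asdict(event), sort_keys=True)
    print(line)
    return line
```

The point of the shape is that every attempt produces an event, including the skipped ones, so "why didn't it run?" is answerable later without redeploying.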
And as you're rolling it out, you might ship first to 1% (nothing scarier than storage format changes!) and then watch a graph showing old build_id vs new build_id errors and requests.
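A rough sketch of that rollout step, assuming events are plain dicts with a build_id and an optional error field. The helper names (in_rollout, error_rate_by_build, canary_looks_worse) and the 1-point tolerance are made up for illustration, not taken from any particular tool.

```python
import hashlib
from collections import defaultdict


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def error_rate_by_build(events):
    """Return {build_id: errors / requests} for a list of event dicts."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for e in events:
        totals[e["build_id"]] += 1
        if e.get("error"):
            errors[e["build_id"]] += 1
    return {build: errors[build] / totals[build] for build in totals}


def canary_looks_worse(events, old_build, new_build, tolerance=0.01):
    """True if the new build's error rate exceeds the old one's by more than tolerance."""
    rates = error_rate_by_build(events)
    return rates.get(new_build, 0.0) > rates.get(old_build, 0.0) + tolerance
```

Hashing the user id keeps the same users in the 1% across requests, which makes the old-vs-new comparison less noisy than random sampling per request.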

Obviously, you can watch for elevated errors in the newer version. But also:
1) is compaction running?
2) is it running only on the right data types?
3) is it reclaiming space?
4) what errors or warnings is it generating?
5) look for some data that should be skipped or bailed on. Is it?
6) go look at some data from the perspective of a user. Look ok?
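Checks 1 through 5 can be sketched as questions asked of the events you shipped. This assumes each event is a dict with hypothetical keys build_id, column_type, was_compressed, before_bytes, after_bytes, skip_reason, and error; review_rollout is an illustrative name, not a real API. Check 6 stays manual: go look at the data as a user would.

```python
def review_rollout(events, new_build):
    """Summarize what the new build's compaction actually did."""
    mine = [e for e in events if e["build_id"] == new_build]
    compressed = [e for e in mine if e["was_compressed"]]
    skipped = [e for e in mine if not e["was_compressed"]]
    return {
        # 1) is compaction running?
        "compaction_is_running": len(compressed) > 0,
        # 2) is it running only on the right data types?
        "only_string_columns": all(e["column_type"] == "string" for e in compressed),
        # 3) is it reclaiming space?
        "bytes_reclaimed": sum(e["before_bytes"] - e["after_bytes"] for e in compressed),
        # 4) what errors or warnings is it generating?
        "errors": sorted({e["error"] for e in mine if e.get("error")}),
        # 5) is data that should be skipped actually skipped, with a reason?
        "skipped_with_reason": sum(1 for e in skipped if e.get("skip_reason")),
        "skipped_unexplained": sum(
            1 for e in skipped if not e.get("skip_reason") and not e.get("error")
        ),
    }
```

A nonzero skipped_unexplained count is the interesting one: it means the code bailed somewhere you did not instrument.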
How much more confidence do you have in what you just shipped now? Quite a bit.

And if you know you can use feature flags to immediately enable/disable the code, and history tells you that most bugs are caught swiftly and trivially... Well.. This is a bad example 😬
...because I would never ship a literal data storage engine format change on a Friday, or without a lot more paranoia.

BUT! I would totally ship this instrumentation in "dry run" mode and let it run for the weekend to see what it WOULD do. 🥰
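One way "dry run" mode plus a kill switch might look, as a sketch only. The FLAGS dict stands in for whatever feature flag system you use, the flag names are invented, and zlib stands in for the real compaction.

```python
import zlib

# Hypothetical in-process flag store; a real setup would read these from a
# feature flag service so they can be flipped without a deploy.
FLAGS = {"compaction.enabled": True, "compaction.dry_run": True}


def maybe_compact(raw: bytes, flags=FLAGS):
    """Return (bytes_to_store, report). In dry run, measure but never write."""
    report = {
        "attempted": False,
        "dry_run": flags.get("compaction.dry_run", True),  # default to the safe side
        "before_bytes": len(raw),
        "would_be_bytes": len(raw),
        "written": False,
    }
    if not flags.get("compaction.enabled", False):
        return raw, report  # kill switch: do nothing at all

    candidate = zlib.compress(raw)
    report["attempted"] = True
    report["would_be_bytes"] = len(candidate)

    if report["dry_run"] or len(candidate) >= len(raw):
        return raw, report  # record what WOULD happen, store the original

    report["written"] = True
    return candidate, report
```

Run with dry_run on over the weekend, the reports tell you how much space it would reclaim and on what, while the stored bytes stay untouched.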
Keep Current with Charity Majors
