https://twitter.com/robbiedelprete/status/1482061871628763136

More from @mipsytipsy

Charity Majors

@mipsytipsy

15 Jan

@dangolant

LARGE SYSTEMS USUALLY OPERATE IN FAILURE MODE, via @dangolant

Or like I used to say, your distributed system exists in a continuous state of partial degradation. There are bugs and flakes and failures all the way down, and hardly any of them ever matter. Until they do.

This is why observability matters. SLOs make large multitenant systems tractable from the top down, but observability makes them comprehensible from the bottom up.

Maybe only .001% of all software system behaviors and bugs ever need to be closely inspected and understood, but that tiny percentage defines the success of your business and the happiness of your users.

And you CANNOT predict what will matter in advance.

Charity Majors

@mipsytipsy

13 Jan

bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have limits
bodies have l

I was homeschooled, and escaped to college when I was 15. I was a seething mess of pent-up rage and ambition (and undiagnosed ADHD) who had never done any sort of formal schooling. I had no idea what I wanted to do other than ALL OF IT. RIGHT NOW.

You're supposed to register for 12-15 credits, so I promptly registered for 24 (plus I had a piano performance scholarship I was supposed to maintain).

I didn't have any family support, money, or ability to take out loans, so I signed up for three local minimum wage jobs.

Charity Majors

@mipsytipsy

9 Nov 21

https://twitter.com/arclight/status/1454013188366798849

it's a bit counterintuitive, but the better-instrumented and the more mature your systems are, the fewer problems you'll find with automated alerting and the more you'll have to find by sifting around in production by hand.

https://twitter.com/arclight/status/1454014973659729921?s=20

Becoming well versed in exploring your systems via production tooling has never been a more important part of being a good engineer.

It's also never been *easier* to derive rich insights. (why, in MY day, all we had was sar and *stat AND WE LIKED IT)

Apologies to whoever originally made this awesome gif about testing in production, but it holds just as true for alerting and debugging. 🙃

Charity Majors

@mipsytipsy

29 Oct 21

hey man, you know me, I don't like talking smack about others, and I'm not sitting over here whittling and looking for excuses to litigate people's usage of the word observability.

but then there's this chronosphere.io/wp-content/upl…
and this chronosphere.io/learn/explain-…

and i go 🤯🥵😵‍💫🤯

they are literally describing monitoring. good ol', 30-year-old traditional monitoring.

* Notify
* Triage
* Understand

this is a company with a billion dollar valuation and they literally don't know the difference between monitoring and observability

i mean, we can all argue over the subtleties of observability and that's relatively understandable, but doesn't fucking EVERYBODY know what *monitoring* is and does?

cause it hasn't changed. in like.. ever

Charity Majors

@mipsytipsy

20 Oct 21

https://twitter.com/honeycombio/status/1450795843863334922

good morning kittens, guess what honeycomb been up to? oh not much really, we've only just STAVED OFF OUR OWN INEVITABLE DEMISE AND DESTRUCTION, 🔥YET AGAIN🔥.

We can hardly even fail if we try for another two, three years now! Take that, heat death of the universe!🪐🌑 💜

https://twitter.com/honeycombio/status/1450795843863334922

(There, second time's the charm. Sorry!)

I wonder if it will ever stop feeling so bizarre just to still exist. 🙃 The list of people we are grateful for and permanently indebted to gets longer and longer and longer with each passing year.

From our investors, who are principled, curious, endlessly thoughtful and helpful -- nothing like the stories and stereotypes about VCs that tend to filter down to eng circles -- to our family members, especially anyone who had to live with us those early few years 😬

Charity Majors

@mipsytipsy

27 Sep 21

I've been talking to lots of teams about their observability journey, or how they managed to dig themselves out of hell and get a handle on shit. Some patterns definitely emerge.

The first thing many teams look at is the on call rotation. (Smart; heading straight for the pain.)

Folks are worn out, product is upset whenever something unexpected comes up -- it's a bad scene, because they're too tightly coupled. ANY non feature work means a deadline slips.

So the first thing they do is enact a simple rule: no product work during on call weeks. Period. Those weeks are for fixing and maintaining the system.

This forces leadership to plan for using 75-85% of full capacity as a steady state. Whew; now we have some flex in the system.
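
To put rough numbers on that rule, here is a minimal sketch of the arithmetic. It assumes a weekly rotation with exactly one engineer on call at a time, and that everyone else does product work at full capacity. The function name product_capacity and the team sizes are illustrative assumptions, not figures from the thread.

```python
def product_capacity(team_size: int, on_call_engineers: int = 1) -> float:
    """Fraction of the team's time left for product work.

    Assumption (not from the thread): engineers on call do zero product
    work that week, and everyone else works on product at full capacity.
    """
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    if not 0 <= on_call_engineers <= team_size:
        raise ValueError("on_call_engineers must be between 0 and team_size")
    return (team_size - on_call_engineers) / team_size


if __name__ == "__main__":
    # Hypothetical team sizes, one person on call per week.
    for size in (4, 5, 6):
        share = product_capacity(size)
        print(f"{size} engineers, 1 on call: {share:.0%} of capacity for product work")
```

Under those assumptions, a team of four plans around 75% of capacity and a team of six around 83%, which lines up with the 75-85% range above.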
