Tom Leaman @tleam
Next up: “Building SRE from scratch during Hypergrowth at Coinbase” with @phrawzty and @niallohiggins. Should be a great overview of the practice and a case study of adoption. #reInvent
Awesome delineation of the purpose of an #SRE: it isn’t to perform the firefighting, pager calls and incident response. It’s in place to help prevent and reduce these. #reInvent
#SRE isn’t just building observability into technical systems but into your human and organizational systems as well. Toil measurement is critical. #reInvent
@niallohiggins - “Move away from reaction and waiting for the house to burn down to more proactive activities”
SREs provide a backpressure mechanism (via SLIs and SLOs) to organizations to help reprioritize resiliency work over feature development. Opinion: The data-driven aspect of this is *critical* - it makes priorities quantifiable. #reInvent
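One way to picture that backpressure is an error budget, a concept from the SRE literature that the talk notes above do not spell out. The sketch below is illustrative only: the function names, the 99.9% target, and the "budget spent means reliability first" policy are all assumptions, not anything Coinbase described.

```python
def budget_consumed(slo_target: float, total_requests: int, bad_requests: int) -> float:
    """Fraction of the error budget used so far (1.0 means fully spent).

    slo_target is the promised success ratio, e.g. 0.999 (assumed value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 0.0  # no traffic, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total_requests
    return bad_requests / allowed_bad


def next_priority(slo_target: float, total_requests: int, bad_requests: int) -> str:
    """Hypothetical policy: once the budget is spent, reliability work wins."""
    if budget_consumed(slo_target, total_requests, bad_requests) >= 1.0:
        return "reliability"
    return "features"


# Example: 99.9% SLO over one million requests allows roughly 1,000 bad ones.
print(next_priority(0.999, 1_000_000, 500))   # about half the budget used
print(next_priority(0.999, 1_000_000, 1500))  # budget overspent
```

The point of the sketch is that the prioritization call falls out of measured data rather than opinion.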
I’ve tweeted about this a few times: the #GoldenSignals and SLI/SLOs GREATLY simplify complex systems operations. There are millions of ops metrics to observe in cloud systems. Prioritize those that correlate to whether your client is having a bad day #SRE #o11y #reInvent
Where you instrument to measure SLIs is also crucial - generally the closer to clients, the better. Getting the data at every level is important for triage and analysis, but the closer to your client you measure, the better your understanding of their experience. #reInvent
For those of you interested in more after @niallohiggins and @phrawzty’s talk at #reInvent about SRE, I *highly* recommend reading @srebook and @SREWorkbook and watching videos from @SREcon. There’s a ton of great content out there on the topic.
@phrawzty - “You have to start somewhere! Start early and simple. You can even start with a spreadsheet.” Opinion: not sure I agree with this approach. You’re adding manual toil for something that’s simple to set up in most monitoring platforms... #reInvent
Gathering time-series metrics for each golden signal is great to get started and essential for observability. I’m a big fan of building a “yield” metric which evaluates request level health by response time and explicit errors. healthy reqs / total reqs = service health #reInvent
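A minimal sketch of that “yield” idea, assuming a request counts as healthy when it returns no server error and finishes within a latency threshold. The `Request` type, the 500 ms threshold, and treating only 5xx responses as explicit errors are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP-style status code (assumed)


def is_healthy(req: Request, latency_threshold_ms: float = 500.0) -> bool:
    """Healthy = no explicit server error and fast enough (assumed definition)."""
    return req.status < 500 and req.latency_ms <= latency_threshold_ms


def service_yield(requests: Sequence[Request],
                  latency_threshold_ms: float = 500.0) -> Optional[float]:
    """healthy reqs / total reqs; None when there was no traffic to judge."""
    total = len(requests)
    if total == 0:
        return None
    healthy = sum(1 for r in requests if is_healthy(r, latency_threshold_ms))
    return healthy / total


sample = [
    Request(latency_ms=120, status=200),  # healthy
    Request(latency_ms=900, status=200),  # too slow
    Request(latency_ms=80, status=503),   # explicit error
    Request(latency_ms=200, status=200),  # healthy
]
print(service_yield(sample))  # 2 healthy out of 4 requests
```

Returning `None` for an empty window keeps "no traffic" distinct from "0% healthy", which matters when the number feeds an alert.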
Next up: document your SLIs and SLOs and track your reasoning behind their thresholds. Your SLOs will change over time and you want to reference the rationale. #reInvent
Additionally - published SLIs and SLOs act as public promises. Promises (especially those driven by data) can be powerful tools for human-human, human-machine, and machine-machine systems. “Thinking in Promises” is a good resource on the power of the promise #reInvent
Good examples of promises in the #SRE world: a service promises it will respond in x time, and on-call promises to engage within 15 min. The latter is an interesting one - serious implications for alert design. Can’t respond in 15 min if alert logic only runs every 15 min. #reInvent
What does your incident response look like? What happens when promises are broken? Teams need to have a plan for when their SLIs go red. #reInvent
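The arithmetic behind that last point can be sketched as a simple worst-case budget. The model is an assumption for illustration: a problem that starts just after one alert evaluation is not seen until the next one, and notification delivery plus human acknowledgement add to that. The delay values are made up.

```python
def worst_case_engage_minutes(eval_interval_min: float,
                              notify_delay_min: float,
                              human_ack_min: float) -> float:
    """Worst-case minutes from problem start to a human engaging.

    Assumes the problem begins right after an evaluation, so detection
    can lag by a full evaluation interval.
    """
    return eval_interval_min + notify_delay_min + human_ack_min


def can_keep_promise(promise_min: float,
                     eval_interval_min: float,
                     notify_delay_min: float,
                     human_ack_min: float) -> bool:
    """True if the worst case still fits inside the on-call promise."""
    worst = worst_case_engage_minutes(eval_interval_min, notify_delay_min, human_ack_min)
    return worst <= promise_min


# Alert logic every 15 min, 1 min to deliver the page, 5 min to acknowledge.
print(worst_case_engage_minutes(15, 1, 5))  # 21
print(can_keep_promise(15, 15, 1, 5))       # False
print(can_keep_promise(15, 1, 1, 5))        # True
```

Under these assumed numbers, evaluating every minute leaves room for the human part of the promise; evaluating every 15 minutes spends the whole promise on detection.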
On to Blameless Post Mortems. @phrawzty “arguable whether you can really be ‘blameless’ in post-mortems, we take a data-driven approach” Opinion: there’s been plenty of research done on this outside of the tech sector and it is possible #reInvent