The next talk: @rmedranollamas, Google SRE, talking about Service Levels and Error Budgets. #DevOpsDays
First info: there’s a Google SRE book from O’Reilly now, available at g.co/srebook. #DevOpsDays [Slide: info about the book]
How good is your service? A Service Level Indicator is a quantitative measure of how good one attribute of the service is. This should be a property that users care about! #DevOpsDays @rmedranollamas
Example: able to log in within 300ms? Gather data, export it to your monitoring, and then pick a few metrics to carefully define indicators for. #DevOpsDays @rmedranollamas [Slide: process of defining indicators]
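To make the login example concrete, here is a minimal sketch of one way to turn raw measurements into an SLI: the fraction of logins that finish within the threshold. The function name and the sample latencies are hypothetical; the talk did not show code.

```python
def login_latency_sli(latencies_ms, threshold_ms=300.0):
    """Return the fraction of login requests that finished within threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one measurement")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)


# Hypothetical sample of login latencies in milliseconds (not real data).
sample = [120, 180, 250, 290, 310, 95, 1400, 220, 275, 301]

sli = login_latency_sli(sample)
print(f"{sli:.0%} of logins completed within 300 ms")  # prints "70% ..."
```

Expressing the indicator as "good events / all events" keeps it between 0 and 1, which makes it easy to compare against a percentage target later.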
Understand your metrics: averaging monitoring datapoints over time on a monitoring graph can hide big peaks! So: when you get an alert but your SLI looks okay, drill down. #devopsdays @rmedranollamas
Use histograms and percentiles in preference to averages. If you focus on having a good median performance, you’ll miss terrible values at the 95th percentile. #devopsdays @rmedranollamas
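A small sketch of why averages and medians can mislead, using Python's standard `statistics` module. The latency values are made up for illustration: 90 fast requests and 10 very slow ones.

```python
import statistics

# Hypothetical latencies in milliseconds: 90 fast requests, 10 very slow ones.
latencies_ms = [100] * 90 + [2000] * 10

mean = statistics.mean(latencies_ms)
median = statistics.median(latencies_ms)
# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
p95 = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]

print(f"mean={mean} ms, median={median} ms, p95={p95} ms")
```

Here the mean is 290 ms and the median is 100 ms, so both look healthy, while the 95th percentile is 2000 ms: one user in ten is waiting two seconds.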
How good *should* a service be? That’s a product question and always requires tradeoffs. The cost of marginal improvements grows exponentially. Ask product and dev management; they know the necessary info on what’s really required/worthwhile. #DevOpsDays @rmedranollamas
Your targets should be ambitious but achievable. “Latency always under 250ms” => you’re actually requiring 100% availability! Better: “under 250ms 99.9% of the time.” #devopsdays @rmedranollamas
There is one service (didn’t catch the name) that’s so reliable that they take it down for 15 minutes a month, just to be sure no one builds on it expecting 100% availability, in case they ever have an unexpected outage! #DevOpsDays @rmedranollamas
Service Level Objective (SLO): what you publish to users, e.g. five nines uptime (99.999%). Aim to be a bit better than your SLO, but not too much better: users will come to rely on it. Oh, and do chaos engineering, but when it’s convenient, not at random! #devopsdays
An SLO implies an acceptable level of errors... an error budget. You can use this budget to accommodate rolling out new versions, unpredictable failures, etc. Watch the budget over the last 30 days and plan updates when you can afford some downtime. #DevOpsDays [Slide: error budget concept, same as in tweet]
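The arithmetic behind an error budget can be sketched as follows, assuming an availability SLO measured as uptime over a 30-day window. The function names and the downtime figures in the example are hypothetical.

```python
def error_budget_minutes(slo, window_days=30):
    """Downtime allowed by an availability SLO over the window, in minutes."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def can_afford_rollout(slo, downtime_so_far_min, expected_downtime_min,
                       window_days=30):
    """True if the rollout's expected downtime still fits in the budget."""
    budget = error_budget_minutes(slo, window_days)
    return downtime_so_far_min + expected_downtime_min <= budget


print(f"99.9% SLO:   {error_budget_minutes(0.999):.1f} min per 30 days")
print(f"99.999% SLO: {error_budget_minutes(0.99999) * 60:.1f} s per 30 days")
print(can_afford_rollout(0.999, downtime_so_far_min=30, expected_downtime_min=10))
```

A 99.9% SLO leaves about 43 minutes of downtime per 30 days; five nines leaves about 26 seconds, which is why each extra nine is so much more expensive.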
Use alerts to keep track of your error budget exhaustion, e.g. if you’ve used 1% of the month’s budget in an hour. And keep a ‘rainy day fund’ for when you need to deploy a fix in an emergency. #DevOpsDays @rmedranollamas
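One possible shape for such an alert, as a minimal sketch: count errors in the last hour and compare them with the number of errors the SLO allows for the whole month. The request volume, the 20% reserve, and all function names are assumptions for illustration, not values from the talk.

```python
def fraction_of_budget_spent(errors_last_hour, monthly_requests, slo):
    """Share of the month's error budget consumed by the last hour's errors."""
    allowed_errors = (1.0 - slo) * monthly_requests
    return errors_last_hour / allowed_errors


def should_alert(errors_last_hour, monthly_requests, slo, threshold=0.01):
    """Alert when one hour has used at least `threshold` of the monthly budget."""
    spent = fraction_of_budget_spent(errors_last_hour, monthly_requests, slo)
    return spent >= threshold


def release_allowed(spent_fraction, release_cost_fraction, reserve_fraction=0.2):
    """Keep a 'rainy day fund': never plan spending into the reserve.

    reserve_fraction=0.2 is an assumed value, not one given in the talk.
    """
    return spent_fraction + release_cost_fraction <= 1.0 - reserve_fraction


# Hypothetical service: 10 million requests a month with a 99.9% SLO,
# so roughly 10,000 failed requests are allowed per month.
print(should_alert(errors_last_hour=150, monthly_requests=10_000_000, slo=0.999))
print(should_alert(errors_last_hour=50, monthly_requests=10_000_000, slo=0.999))
```

Spending 1% of the budget every hour would exhaust it in about 100 hours, far short of the 720 hours in a 30-day month, which is what makes it a useful alert threshold.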
SREs should work with devs to improve reliability *and* velocity. Build consistent systems for deployment, use load balancing and rollouts to reduce how much damage a bad update can do, create safe ways to roll back updates. #devopsdays @rmedranollamas
SREs also need the authority to stop releases if the error budget can’t accommodate them! #devopsdays @rmedranollamas
Finally, @rmedranollamas recommends doing the exercise to define an SLI even if you don’t plan to use it in production. :) #DevOpsDays
Q: do you also have SLOs about data quality, e.g. the same data returned in the USA, Asia and Europe?

A: yes—not Ramón’s own team, as authentication is quite binary, but other Google SRE teams do. #DevOpsDays
Thread by Rae Knowler.