Discover and read the best of Twitter Threads about #srecon

Most recents (24)

I'm excited to announce several talks:
(1) I'll be talking about how to cut your observability bills with a bit of statistical magic at #SREcon EMEA in October!…
(2) and about production excellence at @blamelesshq's August summit!…
@blamelesshq (3) also, on Thursday this week, I'll be speaking on @honeycombio's journey with infrastructure-as-code and how @HashiCorp Terraform Enterprise has helped us safely refactor our AWS environment for cost efficiency & maintainability.…
(4) I'm also delighted to announce that in October I'll be speaking at Velocity Berlin, bringing the Production Excellence talk to European attendees, alongside a new @opentelemetry workshop I'm collaborating across company boundaries to create.…
Read 6 tweets
Final speaker of #SREcon: @deniseyu21 on why distributed systems are so hard.
@deniseyu21 She's an engineer on PCF, and also an avid artist and live-doodler of talks.

Four things for today: a brief history lesson on distributed systems, the CAP theorem, why network partitions are hard/omnipresent, and how we can mitigate these risks. #SREcon
Once upon a time, everyone used a singular database that IT maintained.

But eventually IT became a business enabler we had to invest more in. Business analysts wanted to ask more complex questions, ML/NLP came along, and our requirements increased in complexity... #SREcon
Read 18 tweets
Next up: Pragmatic Automation, by @mluebbe #SREcon
"Automate yourself out of a job"

What if you're asked to automate something you have no idea how to do?
Backstory: Google has cloud regions. The original four:

- us-central1
- us-east1
- europe-west1
- asia-east1

Then we decided to build more. How?


Read 29 tweets
#srecon @randyshoup: Learning from Learnings: Anatomy of Three Incidents
#srecon @randyshoup: Outage 1: Google App Engine Outage. App Engine was down globally for 8 hours. The playbook failed and triggered a cascading failure.
#srecon @randyshoup: Resolutions: increased traffic routing capacity, but more importantly, created a program to reduce probability of the same problem happening again.
Read 28 tweets
Next up: Learning from Learnings: Anatomy of Three Incidents by @randyshoup #SREcon
Review three incidents from different companies with common themes, and then discuss what we can change to improve post incident response.
Read 33 tweets
Last day of #SRECon! Track One is kicking off with "Optimizing for Learning" by Logan McDonald (@_loganmcdonald) 🎉
Fun fact: all the art for @_loganmcdonald’s #SRECon talk was done by @emilywithcurls
#SRECon @_loganmcdonald
"expert intuition is achievable"
this helped logan onboard to new systems
Read 16 tweets
Running excellent retrospectives: talking for humans
Goal of this tutorial:
- Learn how to run a retrospective.
- Create a safe space

Job running a retro:
- Facilitation
- Having a productive conversation
- Don't make bad jokes.

Facilitation: a.k.a. creating psychological safety, servant leadership.

Let's talk about language. English is blame-y. "you". Starting with "you" creates a line between participants.
Read 42 tweets
Next up is “An Introduction to GraphQL” with @icco from @Google #SRECon
What is GraphQL?
Now let’s make it more real
Read 11 tweets
Track 1 today @ #SREcon :

SRE Classroom - How to Design a Distributed System in 3 Hours

Ryan Thomas, JC van Winkel, Phillip Tischler, and Jennifer Mace, Google

Identify SLIs and SLOs
▪️Data freshness

Sample SLO: 99%ile of queries returns valid result within 100ms

#SREcon #SREclassroom
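
[ed: a minimal sketch of checking the sample SLO above, not the SRE Classroom's own material. It assumes each query is recorded as a (latency_ms, valid) pair and reads "99%ile within 100ms" as "at least 99% of queries return a valid result within 100ms". The names slo_compliance and meets_slo are hypothetical.]

```python
from typing import Iterable, Tuple

Query = Tuple[float, bool]  # (latency in milliseconds, result was valid)


def slo_compliance(queries: Iterable[Query], threshold_ms: float = 100.0) -> float:
    """Return the fraction of queries that were both valid and fast enough."""
    queries = list(queries)
    if not queries:
        raise ValueError("need at least one query to compute compliance")
    # A query is "good" only if it returned a valid result within the threshold.
    good = sum(1 for latency_ms, valid in queries if valid and latency_ms <= threshold_ms)
    return good / len(queries)


def meets_slo(queries: Iterable[Query], target: float = 0.99,
              threshold_ms: float = 100.0) -> bool:
    """True when the good-query fraction reaches the SLO target."""
    return slo_compliance(queries, threshold_ms) >= target


if __name__ == "__main__":
    # 99 fast valid queries plus one slow one: exactly at the 99% target.
    sample = [(50.0, True)] * 99 + [(250.0, True)]
    print(slo_compliance(sample), meets_slo(sample))
```

Counting good events against a target ratio, rather than sorting latencies to find a percentile, lets invalid results count against the SLO the same way slow ones do.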
One way to scale is via microservices.

#SREcon #SREclassroom
Read 24 tweets
I am somehow surprised when I encounter folks with such strong beliefs in the "Humans-Are-Better-At/Machines-Are-Better-At" approach to designing software. At this point I shouldn't be surprised, but still am. 1/n
This "HABA-MABA" philosophy has been so ingrained that it almost flies under the radar as worthy of attention to some.

Who will argue with "make the computers do the easy/tedious stuff so humans can do the difficult/interesting stuff"? (apparently, I will) 2/n
This notion was first described in the late 1940s, and it was known as "Fitts' List" - original is shown here... 3/n
Read 12 tweets
Final talk of the day is my friend @ehashdn on monitoring k8s! #SREcon
She works as an SRE at @twosigma. She's also a k8s instrumentation SIG member.

Two Sigma runs one of the largest private cloud k8s deployments in the world.

And if you're interested in more on the k8s internals, attend @stillinbeta's workshop tomorrow! #SREcon
So, some information on observability, SLOs, collecting metrics, instrumenting k8s, and solving real problems in this talk.

Why are we trying to run our systems? We want to make users happy. We want to know when users might be getting unhappy before they tweet at us. #SREcon
Read 27 tweets
Next talk: @roguelynn on improving web perf through tracing! #SRECon
@roguelynn She's an SRE at Spotify working on backend infrastructure and tooling around processing pipelines.

She also does internal open source evangelism, getting Spotify employees to release more OSS. #SREcon
The tl;dr of every chapter in @srebook, according to @roguelynn, is "use distributed tracing". There are so many services you don't own that you need to know how your code fits into the grand scheme.

Perf debugging, capacity planning, and diagnosing problems. #SREcon
Read 26 tweets
And now we're with: @JarenGlover on "Tackling Kafka, with a small team" #SRECon
For 3 years, "small team" is me.

Robinhood: democratising access to the financial markets. Saved $$$$ in customer commission fees already. #SRECon
This is not an authoritative talk; it's about providing context so that hopefully you avoid some of the pitfalls I ran into.

Organisational changes, technical evolution. #SRECon
Read 22 tweets
If you are at #SREcon19 and would like to see a real @honeycombio demo, come to our booth around 3:25. ☺️🐝🌷

(May as well just bring by *all* your questions about observability, monitoring, distributed systems.)
(Oops, wrong tag -- thanks @lizthegrey!) Dear @SRECON kittens, if you want to see a demo of all the shit I'm always going on and on about wrt logging, data, debugging, collaboration etc etc.. come to our booth at 3:25. 📈📊🦠
ooohh and if you have a copy of mine and @LaineVCampbell's book, I have stickers to fix the cover: #srecon
Read 3 tweets
And now: Testing in Production at Scale, by Amit Gud. #srecon
Let's start talking about Alice.

She owns service A at a company. It gets requests from services B and C. B and C are upstream services.

Service A talks to D,E,F, which are downstream services. #SRECon
So, replace A with A'.

How do we test that? We need to check that B and C still work, and that we don't break D,E,F. Also, you need D,E,F to test A'.

So, we can set up an entire parallel stack, but that ends up very complicated. #SRECon
Read 21 tweets
I'll be livetweeting @mipsytipsy's talk after lunch on monitoring vs observability and more. #SREcon
@mipsytipsy "I don't get my speaker's notes because I arrived a bit late, but I do get a unicorn instead, and I'm happy with that." -- @mipsytipsy #SREcon
"In the beginning, people wrote software for users, and we were motivated to fix it, because they would complain to us." #SREcon
Read 39 tweets
Next up in Track 1 at #SREcon is @damonedwards on brownfield SRE in enterprises!
@damonedwards "You may think, 'I don't work in an enterprise', but you will eventually when your company becomes successful enough." --@damonedwards

You'll have multiple business lines, acquisitions, generations of tech debt... #SREcon
We see a lot of companies saying they're doing every buzzword. But when you look behind the scenes and talk to the ops folks, they're squeezed between the "DevOps/digital transformation" groups pushing to move faster, vs. locking things down from audit/security. #SREcon
Read 30 tweets
Closing out #QConNYC in the chaos track is @otterbook on "turning it off and on again". He's promised to make my life "interesting" as a livetweeter so let's see how this goes.
@otterbook .@otterbook also has the best pre-talk dramatic/heroic amp-up music, which I'm digging. #QConNYC
This is going to be a very high audience participation talk, and is an experimental talk, says @otterbook. #QConNYC
Read 48 tweets
Now: @jaqx0r on A theory and practice of alerting with service level objectives.

@jaqx0r this talk is based on Jamie's experiences being on a rotation that eventually burnt him out.

Showing a photo of his contribution to @alicegoldfuss oncall photo collection, where he looked happy, but didn't know he wasn't yet.

@jaqx0r context is the team had the lowest rating in Google's SRE reviews for two 6-month periods in a row. Not a great place to be.

So they got instructed to focus on fixing that and reducing oncall load.

Read 28 tweets
After lunch, I'll be livetweeting an #SREcon talk on productionizing machine learning services by Google SREs @salim and @villaviejac, in their professional capacities... [ed: and I saw some 🔥🔥🔥 slides about ML privacy & ethics in their dress rehearsal so this gonna be good]
They are SREs but not necessarily ML scientists/researchers.

Data about what can go wrong has been gathered from 40+ interviews with varying teams. #SREcon
Myths: "machine learning is a black box." "you rarely have to rollback." "ML based monitoring is like other alerting." All nope. #SREcon
Read 44 tweets
Operational Excellence in April Fool's pranks, by @yesthattom #srecon
2015, April 1, 1023 UTC: stackoverflow enabled an easter egg.
But we rolled back, and it was fine. Let's talk about reliable easter egg/April fool's features.
Read 15 tweets
Next up, following neatly from incident response: @wcgallego on "Architecting a Technical Postmortem"
I'm a Systems Engineer at Etsy. Run many postmortems.
- Database fall over
- Bad deploys
- The time everyone got sick
- Coffeemakers overflowed

Everything had something to learn. #srecon
- who has never done a postmortem before?
- why do we have postmortems? (think about this through the talk)

Ask these before every meeting. They are our story times. #srecon
Read 40 tweets
And lunch is done. 3 more talks before the closing plenaries.

Kicking off track 2 this afternoon, @jpaulreed on "Whispers in Chaos: Searching for Weak Signals in Incidents" #srecon

How do you know an incident is going on?

[ed: I get paged!]
Read 37 tweets
Up Now: @lizthegrey and @adam7mck on Resolving Outages Faster with Better Debugging Strategies #SREcon
I work with Liz, Adam is a top guy. There's some exciting stuff in this talk. Listen up! #SREcon
Liz - 8+ years at Google, now CRE. Excited about sharing things we've learnt with the world.

Adam - 18mths at Google, SRE/DevOps type-person for 8-10 years. Excited to share things that blew his socks off when joining Google.

Read 50 tweets
