Discover and read the best of Twitter Threads about #SREcon

Most recent (11)

This afternoon at #srecon, Adam Mckaig and Tahia Khan from @datadoghq talk about the evolution of their metrics backend.
The high-level architecture looks very familiar to me. The slightly more detailed view, less so: many parts!
For scale, break up incoming data and put it into Kafka:
`hash(customer_id) -> partition_id`
… but then one Kafka topic gets overloaded, so…
`hash(customer_id) -> topic_id, partition_id`
to send to topics in different clusters.
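A minimal Python sketch of the two routing schemes, assuming a stable CRC32 hash, 64 partitions per topic, and three hypothetical topic names (none of these numbers or names come from the talk). It shows only the hash-to-destination mapping, not Datadog's implementation or any Kafka client API.

```python
import zlib

# Hypothetical sizing and names; the talk did not give these values.
PARTITIONS_PER_TOPIC = 64
TOPICS = ["metrics-a", "metrics-b", "metrics-c"]  # hypothetical; may live on different clusters


def stable_hash(customer_id: str) -> int:
    # zlib.crc32 is deterministic across processes, unlike Python's
    # built-in hash() for str, which is randomized per process.
    return zlib.crc32(customer_id.encode("utf-8"))


def route_v1(customer_id: str) -> int:
    """First scheme: hash(customer_id) -> partition_id within one topic."""
    return stable_hash(customer_id) % PARTITIONS_PER_TOPIC


def route_v2(customer_id: str) -> tuple[str, int]:
    """Second scheme: hash(customer_id) -> (topic_id, partition_id) across topics."""
    slot = stable_hash(customer_id) % (len(TOPICS) * PARTITIONS_PER_TOPIC)
    topic_index, partition_id = divmod(slot, PARTITIONS_PER_TOPIC)
    return TOPICS[topic_index], partition_id


print(route_v1("customer-42"))
print(route_v2("customer-42"))
```

In both schemes every point for a given customer lands in the same place, so one customer's data stays together; the second scheme spreads that same hash over several topics so no single topic has to absorb all traffic. With plain modulo hashing, changing the number of topics or partitions remaps most customers, so a real system would need a plan for that.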
Read 6 tweets
Today at #srecon, @allspaw and @ri_cook give deep insight into real tools, incident timelines, and clumsy automation.
But not in person. 😭
Great tools (as opposed to machines) are near to hand and conform to the person who wields them. Like a hammer, or `top`. Yeah.
They are opinionated, but not prescriptive.

(machines do what they do, and you conform to them)
In software, tools like `top` help us see what’s going on in the digital space.
@ri_cook et al. see our work taking place on two sides of a divide. There’s meatspace (where we are) and digital space (where the software runs). You can’t reach out and feel digital stuff.
Read 13 tweets
What can we learn from ALL the incidents?
@courtneynash at @verica_io compiles reports from lots of companies into the VOID: Verica Open Incident Database. #SREcon
“Software runs the world, and you run that software.”
#SREcon @courtneynash
While every incident and every company is different, the distributions have the same shape. They are “positively skewed”: more short incidents than long ones.
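To make “positively skewed” concrete, here is a small Python sketch that computes the Fisher-Pearson skewness coefficient for a made-up list of incident durations (hypothetical numbers, not VOID data). A positive value, together with a mean well above the median, is the signature of many short incidents plus a long tail of long ones.

```python
from statistics import mean, median, pstdev


def skewness(values: list[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness (population form, g1)."""
    m = mean(values)
    s = pstdev(values)
    return mean((x - m) ** 3 for x in values) / s ** 3


# Hypothetical incident durations in minutes, for illustration only.
durations_min = [5, 8, 10, 12, 15, 20, 30, 45, 90, 240]

print(skewness(durations_min))
print(mean(durations_min), median(durations_min))  # mean 47.5, median 17.5
```

The single 240-minute incident pulls the mean far above the typical incident, which is one reason a plain average says little about a distribution shaped like this.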
Read 7 tweets
As SREs, we need to build the narrative of our work’s value.
The future of all software is at stake.

@caseyrosenthal builds a beautifully illustrated narrative at #SREcon
Like Security, SRE’s value hides in all the incidents that don’t happen.
So it's easy to ignore. But people and legislative bodies value it.

The Success in SRE is Silent
and if our success remains silent, our profession (and software development in general) will go the way of security: regulation.
Read 8 tweets
#srecon @randyshoup: Learning from Learnings: Anatomy of Three Incidents
#srecon @randyshoup: Outage 1: Google App Engine Outage. App Engine was down globally for 8 hours. The playbook failed and triggered a cascading failure.
#srecon @randyshoup: Resolutions: increased traffic routing capacity but, more importantly, created a program to reduce the probability of the same problem happening again.
Read 28 tweets
Last day of #SRECon! Track One is kicking off with "Optimizing for Learning" by Logan McDonald (@_loganmcdonald) 🎉
Fun fact: all the art for @_loganmcdonald’s #SRECon talk was done by @emilywithcurls
#SRECon @_loganmcdonald
"expert intuition is achievable"
This helped Logan onboard to new systems.
Read 16 tweets
Next up is “An Introduction to GraphQL” with @icco from @Google #SRECon
What is GraphQL?
Now let’s make it more real
Read 11 tweets
I am somehow surprised when I encounter folks with such strong beliefs in the "Humans-Are-Better-At/Machines-Are-Better-At" approach to designing software. At this point I shouldn't be surprised, but still am. 1/n
This "HABA-MABA" philosophy has been so ingrained that it almost flies under the radar as worthy of attention to some.

Who will argue with "make the computers do the easy/tedious stuff so humans can do the difficult/interesting stuff"? (apparently, I will) 2/n
This notion was first described in the late 1940s, and it was known as "Fitts' List" - original is shown here... 3/n
Read 12 tweets
If you are at #SREcon19 and would like to see a real @honeycombio demo, come to our booth around 3:25. ☺️🐝🌷

(May as well just bring by *all* your questions about observability, monitoring, and distributed systems.)
(Oops, wrong tag -- thanks @lizthegrey!) Dear @SRECON kittens, if you want to see a demo of all the shit I'm always going on and on about wrt logging, data, debugging, collaboration, etc. etc., come to our booth at 3:25. 📈📊🦠
ooohh and if you have a copy of mine and @LaineVCampbell's book, I have stickers to fix the cover: #srecon
Read 3 tweets
Now: @jaqx0r on "A theory and practice of alerting with service level objectives."

#SREcon
@jaqx0r this talk is based on Jamie's experiences being on a rotation that eventually burnt him out.

Showing a photo of his contribution to @alicegoldfuss's on-call photo collection, where he looked happy but didn't yet know he wasn't.

#SREcon
@jaqx0r Context: the team had the lowest rating in Google's SRE reviews for two six-month periods in a row. Not a great place to be.

So they were instructed to focus on fixing that and reducing on-call load.

#SREcon
Read 28 tweets
Up now: @lizthegrey and @adam7mck on Resolving Outages Faster with Better Debugging Strategies #SREcon
I work with Liz; Adam is a top guy. There's some exciting stuff in this talk. Listen up! #SREcon
Liz - 8+ years at Google, now CRE. Excited about sharing things we've learnt with the world.

Adam - 18 months at Google, SRE/DevOps type-person for 8-10 years. Excited to share things that blew his socks off when joining Google.

#SREcon
Read 50 tweets