Cloudflare had an outage today. That can't be good, right?! Au contraire... their response boosts my confidence in $NET. Yes, their edge network's resiliency had a failure, but their tech team and mgmt responded well. /1
A bad configuration change to a backbone pipe between Newark and Chicago caused huge issues on the internal part of their edge network. (Backbones are their giant pipes between data centers - their own private highway between major cities.) /2
The error made that route completely fail, so all the edge network traffic was re-routed through DC to Atlanta, the effect of which caused failures there, which then spiraled out. The net effect from the cascading error quickly caused ~50% traffic drop in their edge network. /3
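To make the cascade concrete, here is a minimal Python sketch of one route failing and its traffic overloading the next one. The route names, loads, capacities, and the `simulate_cascade` helper are all made up for illustration; this is not Cloudflare's topology, data, or tooling.

```python
def simulate_cascade(load, capacity, fallback, first_failure):
    """Toy model: a failed route hands its traffic to its fallback route.

    If the fallback ends up over capacity, it fails too. Traffic on a failed
    route with no healthy fallback is counted as dropped.
    """
    load = dict(load)          # copy so the caller's data is untouched
    failed = []                # routes in the order they failed
    dropped = 0
    pending = [first_failure]
    while pending:
        route = pending.pop(0)
        if route in failed:
            continue
        failed.append(route)
        traffic = load[route]
        load[route] = 0
        target = fallback.get(route)
        if target is None or target in failed:
            dropped += traffic  # nowhere left to send it
            continue
        load[target] += traffic
        if load[target] > capacity[target]:
            pending.append(target)  # overloaded, so it fails next
    return failed, dropped


# Hypothetical numbers, in arbitrary traffic units.
load = {"newark_chicago": 30, "dc_atlanta": 45, "elsewhere": 75}
capacity = {"newark_chicago": 100, "dc_atlanta": 70, "elsewhere": 100}
fallback = {"newark_chicago": "dc_atlanta"}  # other routes have no fallback here

failed, dropped = simulate_cascade(load, capacity, fallback, "newark_chicago")
drop_share = dropped / sum(load.values())
print(failed, dropped, drop_share)
```

In this toy setup a single bad route takes its fallback down with it, which is the spiral described above: the second failure is caused by the re-routing itself, not by a second mistake.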
Reading their response, they did a lot of things right...

* First and foremost - they responded very quickly. They had an investigation and then implemented an incident response plan that fixed it immediately. The outage lasted all of 23 minutes. /4
* They were clear about what it was. It was human error in their patching processes on systems that were not very frequently touched.

* It affected their backbone, so impact was enormous, and they got in front of the problem immediately. /5
* It was not a hack. $NET is now a cybersecurity company over their edge network. A hack or breach would be much more devastating news. This was a mistake, which can be learned from.

* Besides the immediate fixes, they immediately developed a longer-term response plan. /6
* Besides the immediate failure of the patching process on their backbone, they got a view of a major weakness in the resiliency of their backbone routing (the cascading nature of this failure). This has already been addressed, and will be fixed in an update on July 20. /7
Yes this was an outage, which never looks good at first blush. And certainly customers were impacted. Multiple errors happened in a way that worsened the net effect very quickly. /8
But Cloudflare did all the right things in the immediate aftermath, which I really like to see.

Luckily this is about as tame an impact as could be from a major backbone failure. They took many lessons from this and will emerge with a stronger edge network. /9
Epilog: I really like that they were so extremely open about this incident (and others that have happened recently), with the CEO tweeting and their follow-up tech blog post. This allows investors like me to see into their process. Kudos to $NET for that. /fin
Ok one more: We got a really good glimpse at how talented the IT team working on their edge network is. They ran into something that directly broke the resiliency of their system ... but they discovered and fixed it in 20-odd minutes. That is pretty amazing. /fin

Keep Current with hhhypergrowth
