So a question I posed in slack.lastweekinaws.com led to an unfortunate realization on my part:
@awscloud is too big, and has too many customers for the overall good of society.
"Well were things more reliable before @awscloud?" No! Good lord no! The difference is that I could have a bad day and take down a hospital. AWS has a bad day and takes down all the hospitals.
It's the simultaneous outage of everything that's the problem.
The worst part is that I don't even have the slightest clue how to fix it. You can plan and plan and plan around this. You can build out multi-region or multi-cloud until the cows come home.
And then one of your third parties did none of this and you're just as down.
A multi-day full outage of us-east-1 will have an observable effect on the world economy. That is not an exaggeration.
I don't know how to fix any of this. I just know that we should be talking about it.
And also in slack.lastweekinaws.com we confirmed the root cause of today's outage: the Managed NAT Gateways in us-east-1 overflowed and jammed up with money.
Yeah, this doesn't work. I assure you, no federal regulation or proposed penalty is going to make @awscloud say "oh, outages are bad, we should be more careful." They already say that! Constantly!
Today's event wasn't from a lack of care or diligence.
Here's an unexpected thread from me; I never expected to write one quite like it...
A while back I had @AjYawn on the podcast, where he talked about @bytechek with me. It was a great episode, and I came away impressed by what AJ was doing. buff.ly/3pYnCGO
When I saw this tweet from him, I reached out to AJ with a "sounds like someone raised a funding round, and congratulations are in order." This isn't my first rodeo when it comes to reading the tea leaves.
I was excited enough about what @bytechek does (it helps companies get to SOC2 compliance quickly, and I am a nerd as well as a former SOC2 control owner) and about @AjYawn as a person that I asked whether I could invest as well.
Because this is incredibly dense and technical, let me try to simplify it. I'm sure I will be condescendingly corrected if I get this wrong...
"We made a change internally that caused a bunch of internal things to become extremely chatty, like AWS employees defending the company if someone says something even slightly unflattering on Twitter."
Welcome to my first-ever livetweet of an @aselipsky #reinvent keynote as a part of my requinnvent.com coverage. It's 8:30, sarcastically loud, I haven't slept, and it's time to see what our @awscloud friends have worked on all year long.
A reminder: Snarking about companies is usually okay; snarking about people (presenters, etc) is not. Punch up, not down. Be kind.
The failure mode of "clever" is "asshole."
We begin with a jarring transition from "loud rock" to "easy listening combined with a Windows screensaver theme." Clearly the #reinvent graphics refresh was delayed due to... I dunno, not having a bad enough team name or something.
Staff: “Sorry, employees (orange lanyards) aren’t allowed in keynotes.”
Me: “It’s red. The lighting is super odd here.”
Once again I snuck past the “no Corey allowed in keynotes” rule!
And now I will livetweet the @awscloud partner keynote in this thread. #reInvent
And now @dougyeum takes the stage and thanks us all for being here in person even though he won't be. "I'm moving to a new role within Amazon. Later, hosers."