What happened?
It appears Facebook has inadvertently cut itself off from the rest of the Internet. More accurately, it mistakenly removed every "road sign" worldwide pointing at its network.
The "why" will make for a very interesting post-mortem. How hard is it to fix?
2/n
It depends on two factors: 1. Whether there's a *workable* backchannel for remote engineers to access not only the systems themselves but, crucially, the comms tools and documentation they need. 2. How well rehearsed their disaster recovery plan was for this kind of issue.
3/n
You can reasonably assume they had some sort of emergency out-of-band remote access set up.
But do they still have access to all the fancy internal comms + incident management tools + documentation right now?
This is less certain and could slow down remediation hugely.
4/n
How often did they run drills for a "network down" situation? Did they have contact numbers in their phones? Documentation printouts with them?
Honestly, an outage like this one is so far-fetched that it's unlikely they had every base covered.
5/n
WFH, of course, would have made this worse.
When the tools you rely on for daily comms with colleagues are unavailable, it adds an extra burden to the already sky-high cognitive load of troubleshooting a thorny, high-stakes technical issue.
6/n
Needless to say, this is an extremely stressful event for the engineers involved, but Site Reliability / Operations engineers are in it for the adrenaline. They will no doubt remember this day for the rest of their careers. Sparing a thought for my former colleagues!
7/7
OK, if the below is confirmed, and having also spoken to former colleagues, I'm clearly more pessimistic about their preparedness for the issues I described above 😓
[THREAD] Today @olivierveran and @EmmanuelMacron will make a historic choice on isolation measures for #stade3 in France. Continue the "proportionate response" or strike hard? A simulation sheds light on the impact of these choices on the epidemic #covid19france: 1/19
US simulation: without distancing measures, the epidemic's growth is clearly exponential. It is sharply reduced with 25% fewer contacts, but is only brought under control with 75% fewer, in other words with drastic measures like those Italy took (too late). 2/19
The whole challenge is to keep the number of active infections at any given time T below the health system's capacity to treat them: this is the famous #FlattenTheCurve that @olivierveran explained in plain terms to @Bruce_Toussaint on Monday on @BFMTV 3/19
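
To make the 0% / 25% / 75% comparison concrete, here is a minimal sketch of a discrete-time SIR model in Python. It is not the US simulation cited in the thread: the function name `simulate_peak` and every parameter value (transmission rate, recovery rate, population, initial cases) are illustrative assumptions, chosen only so that the baseline reproduction number is about 2.5.

```python
def simulate_peak(contact_reduction, beta=0.5, gamma=0.2, days=365,
                  population=1_000_000, initial_infected=10):
    """Return the peak number of simultaneously active infections.

    All defaults are hypothetical values for illustration:
    beta  = baseline transmission rate per day (assumed)
    gamma = recovery rate per day (assumed), so R0 = beta / gamma = 2.5
    contact_reduction = fraction of contacts removed (0.25 means 25% fewer)
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    peak = infected
    # Fewer contacts scale the transmission rate down proportionally.
    effective_beta = beta * (1.0 - contact_reduction)
    for _ in range(days):
        new_infections = effective_beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return peak


if __name__ == "__main__":
    for reduction in (0.0, 0.25, 0.75):
        peak = simulate_peak(reduction)
        print(f"{int(reduction * 100)}% fewer contacts: peak active infections ~ {peak:,.0f}")
```

Under these assumed values, 25% fewer contacts lowers the peak but the outbreak still grows, while 75% fewer contacts pushes the effective reproduction number below 1, so active infections never rise above the initial count. Flattening the curve means keeping that peak under the health system's capacity.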