The twist on the ongoing Facebook outage is that infra/oncall teams at FB likely use their own in-house chat service, built on top of fb.com or FB workspaces, to communicate while resolving outages.
Now this is also down. Good thing WhatsApp is still up. Oh wait…
When Uber had its own, self-hosted chat service, this was a major point of discussion. What happens if the chat service goes down while coordinating an outage? Or what if it has an outage?
Most companies now have a different problem. What if Slack/PagerDuty goes down at that time?
The good thing for most companies using third parties is that even if the third party goes down, you still have your internal systems to e.g. check who is oncall, and their phone number.
That also went down for Facebook. This is a horrible day to be oncall there, and a good one to be OOO.
Just in: coordination is happening via… IRC!
Facebook planned ahead for the unlikely case of their infra going down. Which it kind of did.
Wishing ppl oncall best of luck getting this resolved. It’s not the most fun to be in their seat, but it’s a story to tell for years.
Fun fact: if this type of global outage happened at Amazon, a top exec (e.g. VP, or maybe above) would likely be on that coordination call AFAIK. They'd drop any meeting they'd be in and jump in to see what's happening - it's the culture there.
Less common at other big tech.
After an Amazon exec joined Uber, in their first week there was this L5, low impact outage (the highest sev, meaning rides were impacted). It was a pretty "standard" outage for me back then.
This person N levels up the chain shows up on the incident Zoom with engineers. We all go:
Another twist: Facebook's buildings use the Facebook domain for badge authentication:
WOW. Just WOW. You can't fall back to IRC there...
The outage has been mitigated, and I'm hearing a special shoutout goes to:
Google Docs
For helping the oncall teams coordinate and use it as chat, while everything .fb.com was down, including Workspaces.
Given how epic this story has been by itself (and all the learnings about how BGP can go wrong), I would have had no problem believing the angle grinder part that surfaced (but was not true).
Perhaps in a parallel universe there was that as well.
The third founder with a remote team sharing the same story:
An engineer they hired did great on the interviews but worked inexplicably slowly day to day. When pairing: they worked fast. When they stopped pairing: they slowed down.
Turns out, the person had a second job in all cases.
With (experienced) software engineers being in demand, plus full-remote positions, it is *so* tempting to double one's income.
And some people are pulling it off. Especially when they have zero attachment to the new company.
Remote has upsides, but this is a real downside.
If you're a founder/manager and you think "surely this would never happen in my team", think again.
Places where this happened included a tightly knit team of 6 devs who had all known each other for years, and a startup paying top of the market with generous equity.
"Why are you looking to leave your current employer?" is a question I've asked hundreds of times when hiring at Uber. There are few things I've not heard.
Save for how Revolut treated engineers.
Here's how to build a culture where the hardest-working people voluntarily leave:
This was years back. I was doing the hiring manager interview. I ask the question and expect one of the common reasons - challenges, money, boredom etc.
"I'm pissed off, that's why. I put my heart and soul into this company, worked 80-hour weeks, and get slapped in the face."
Okay, this was new. "Can you give more details on what happened?"
"I didn't get a bonus."
"Well" - I think to myself - "that's not much to be pissed off about."
"So why did that tick you off?" - I ask. The person continues to explain this is not your usual bonus situation:
There is so little information written about equity for tech employees, and even less by software engineers who benefitted from it.
Uber and Square engineer @mcdickenson wrote the book Equity Compensation for Tech Employees, which fills this gap.
Here's the table of contents:
I reviewed the book before it was out, and it's a must-read if you are either based in the US or are/will be working for US-based companies issuing equity. Or if you want to know more on this important topic.
I dug deeper and the Amplitude founder is a legend.
They were the *first* company to introduce a 10-year post-termination exercise window. Their lawyers said it could not be done. They did it and open-sourced it, and other companies followed.
A 10-year post-termination exercise window means that an employee who joined in 2012 and left 4 years later *still* made $10M, without having to exercise any options until now.
While there are many posts criticizing how difficult hiring processes are, especially for entry-level roles, it's only in private you hear interviewers and hiring managers talk about the other side.
Here's a short thread on a few observations that also explains some criticisms:
1. Entry-level applicants are spread very widely skillset-wise. They range from people who can only follow a tutorial to those who are hands-on with production-ready code. It skews towards the former though.
This is one reason there are so many automated coding assessments.
2. Resumes don't tell you much about entry-level people. Among grads of the same bootcamp, there will be some strong candidates, and plenty of weak ones. Same with colleges. It's impossible to predict how well a person will do on a coding exercise.