Great session by @MarcJBrooker earlier on building technology standards at Amazon scale, with some interesting tidbits about the secret sauce behind Lambda and how they make technology choices, e.g. whether to use Rust for the stateful load balancer v2 for Lambda.
🧵
Nice shout out to some of the benefits of Rust: no GC (good for p99+ latency), memory safety with its ownership system theburningmonk.com/2015/05/rust-m…, and great support for multi-threading (which still works with the ownership system)
And why not to use Rust.
The interesting Q is how to balance technical strengths vs weaknesses that are more organizational.
And it all boils down to this..
which is basically the same question that organizations all over the world have to answer when they consider adopting #serverless technologies like Lambda.
And I love Marc's answer - to innovate (ie. try new things) with guard rails that mitigate the risks.
As a consultant, I often find myself being one of those guard rails for organizations that want to adopt #Serverless
(nice plug, self hi-five! ✋)
Ha, I have heard @heitor_lessa mention "tenets" many times.
This line about avoiding baking language-specific choices into your contract and data is so important. It gives you an easier path to back out of that language choice if it turns out to be wrong.
Which actually reminds me of what Bezos said in this article about the 2 types of decisions: one-way doors (aka "no coming back from this decision!") and two-way doors.
"Baking these tensions into tenets and making it really obvious to everyone means we're upfront about the conversation that we're really having"
👍👍👍
Standards: top-down decision, comes with risk (e.g. limits upside - losing ideas that are better than what's baked into the standards)
"We use standards very sparingly, only in areas where we deeply understand the context and innovation has little upside"
"It all starts with the right incentives"
This, so much this👆
Why? because incentives drive outcomes.
And then there's ownership - because the people making these decisions are on the hook for their long-term success.
That's why the ivory tower architect is such a bad model - they make all the decisions but you're on the hook for it.
Yup, 100% agree here. A leader's job is to provide the necessary context so that others can make the best decisions they can. A leader's job is NOT to make all the decisions for others.
And then Marc describes his job as enabling end-to-end understanding of the business and technology and getting teams talking to each other so they can make the best decisions without those technical standards.
So did they end up using Rust?
Yes!
"When you try new things and they turn out to be successful, then you double down on those. And take the learnings of what's great and make sure you can multiply that"
And that's how many organizations have adopted #serverless successfully, starting with one success story.
And that's also been the story of the adoption of Rust at AWS. Both Firecracker and Bottlerocket are built with Rust.
And great to see they're investing into the community itself, doubling down both internally and externally.
Would love to see more details on how formal methods are applied here.
btw, AWS uses formal methods all over the place, I hear that TLA+ is widely used by its service teams. Someone told me that they used it to find a bug in DynamoDB during design that would have resulted in data loss in extremely rare cases.
"Building technology standards is a short-term thing that limits a company's creativity. Setting up incentives and helping people understand the decisions they're making and giving them full ownership of those decisions is the way I like to think about tech standards."
Well, that was great!
Make sure to catch this on a replay or when it becomes available on-demand.
Great overview of permission management in AWS by @bjohnso5y (SEC308)
Lots of tools to secure your AWS environment (maybe that's why it's so hard to get right, lots of things to consider) but I love how it starts with "separate workloads using multiple accounts"
SCP for org-wide restrictions (e.g. Deny ec2:* 😉).
IAM perm boundary to stop ppl from creating permissions that exceed their own.
block S3 public access
These are the things that deny access to things (hence guardrails)
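To make the SCP guardrail a bit more concrete, here's a minimal sketch of what a "Deny ec2:*" policy document could look like, built as a Python dict. The helper name `deny_all_scp` and the `Sid` value are made up for illustration, and this only builds the JSON; it doesn't attach anything to an organization.

```python
import json


def deny_all_scp(service_prefix: str) -> dict:
    """Build a policy document that denies every action of one service.

    `deny_all_scp` is a hypothetical helper, not an AWS API.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyService",  # illustrative statement ID
                "Effect": "Deny",
                "Action": f"{service_prefix}:*",
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would paste into an SCP.
    print(json.dumps(deny_all_scp("ec2"), indent=2))
```

Because an explicit Deny wins over any Allow, a policy like this caps what the principal and resource policies in member accounts can grant.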
Use IAM principal and resource policies to grant perms
"You should be using roles so you can focus on temporary credentials" 👍
Shouldn't be using IAM users and groups anymore, go set up AWS SSO and throw away the password for the root user (and use the forgotten password mechanism if you need to recover access)
However, this might not mean much in practice for a lot of you because your Lambda bill is $5/month, so saving even 50% only buys you a cup of Starbucks coffee a month.
Given all the excitement over Lambda's per-ms billing change today, some of you might be thinking how much money you can save by shaving 10ms off your function.
Fight that temptation 🧘‍♂️ until you can prove the ROI on doing the optimization.
Assuming $50 (which is VERY conservative) per dev per hour, it would have taken them 40 months to break even on just having the meeting, before writing a single line of code!
With the per-ms billing, you're automatically saving on your Lambda cost already, by NOT having your invocation time rounded up to the next 100ms.
Unless you're invoking a function at very high frequency, those micro-optimizations won't be worth the eng time you have to invest.
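If you want to sanity-check the ROI yourself, here's a rough sketch of the arithmetic. The two prices are assumptions (roughly the published on-demand rates for x86 functions in us-east-1, which may have changed), and the function names and the example workload are hypothetical.

```python
import math

# Assumed prices; check the current Lambda pricing page before relying on them.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def monthly_cost(invocations: int, duration_ms: float, memory_mb: int,
                 granularity_ms: int = 1) -> float:
    """Estimate monthly Lambda cost, rounding duration up to the billing granularity."""
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


def months_to_break_even(eng_hours: float, hourly_rate: float,
                         monthly_saving: float) -> float:
    """Months until the monthly saving pays back the engineering time spent."""
    if monthly_saving <= 0:
        return math.inf
    return (eng_hours * hourly_rate) / monthly_saving


if __name__ == "__main__":
    # Hypothetical workload: 1M invocations/month, 50ms average, 1024MB.
    old = monthly_cost(1_000_000, 50, 1024, granularity_ms=100)  # 100ms rounding
    new = monthly_cost(1_000_000, 50, 1024)                      # per-ms billing
    tuned = monthly_cost(1_000_000, 40, 1024)                    # after shaving 10ms
    print(f"100ms billing: ${old:.2f}, per-ms: ${new:.2f}, tuned: ${tuned:.2f}")
    print(f"break-even (months): {months_to_break_even(4, 50, new - tuned):.0f}")
```

Plug in your own invocation count and hourly rate; at low volume, the break-even time tends to be measured in years.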
re:Invent starts tomorrow, so let me round up the biggest #serverless related announcements from the last 2 weeks (I know, crazy!) and share a few thoughts on what they mean for you.
Mega 🧵If this gets 168 retweets then I'll add another batch!
1. Lambda released Logs API which works with the Lambda Extensions mechanism that was released in Oct. This lets you subscribe to Lambda logs in a Lambda extension and ship them elsewhere WITHOUT going through CloudWatch Logs
a. it lets you side-step CloudWatch Logs, which often costs more (sometimes 10x more) than Lambda invocations in production apps.
b. it's possible (although not really feasible right now) to ship logs in real-time
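For a feel of what subscribing looks like, here's a minimal sketch of the Logs API subscription request an extension would send after it has registered. It assumes the extension already has its identifier from the register call and runs its own HTTP listener on port 8080. The function names, the port and the buffering values are illustrative choices, so double-check the request shape against the Lambda Logs API docs.

```python
import json
import urllib.request

LOGS_API_VERSION = "2020-08-15"


def build_logs_subscription(listener_port: int = 8080) -> dict:
    """Build the subscription body: which log types to receive and where to send them."""
    return {
        "schemaVersion": LOGS_API_VERSION,
        "types": ["platform", "function"],
        # Illustrative buffering settings, not recommendations.
        "buffering": {"maxItems": 1000, "maxBytes": 262144, "timeoutMs": 1000},
        # Logs are delivered to a listener inside the execution environment.
        "destination": {"protocol": "HTTP", "URI": f"http://sandbox:{listener_port}"},
    }


def subscribe(runtime_api: str, extension_id: str, body: dict) -> int:
    """PUT the subscription to the Logs API; only works inside a Lambda sandbox.

    `runtime_api` is the value of the AWS_LAMBDA_RUNTIME_API environment variable.
    """
    req = urllib.request.Request(
        url=f"http://{runtime_api}/{LOGS_API_VERSION}/logs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Lambda-Extension-Identifier": extension_id,
        },
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    print(json.dumps(build_logs_subscription(), indent=2))
```

The extension's listener then receives batches of log events and can forward them wherever you like, which is what lets you skip CloudWatch Logs.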
I sat down this weekend and had a look at my finances as I'm almost 5 months into my 2nd year as a full-time solo consultant, and noticed that my revenue streams have changed quite a bit over the last 3 years.
This is the result of a conscious effort to reduce my reliance on a few large clients, to offset seasonality and other factors that can affect revenue, and to create a healthy mix of active and passive income streams.
Overall revenue has grown over time, and my largest client now accounts for less than 20% of my revenue. And I haven't seen too much seasonality to my work yet - summer was quieter because Europeans went on holiday, but it was still OK.
X: in light of last week's #AWS outage, should I make my app multi-region?
me: it depends.
X: on what?
me: how much did the outage cost you in lost sales, reputation cost, etc.? And how much are you willing to invest in improving your uptime in case of another region-wide outage?
X: erm... I'm not sure...
me: don't get me wrong, if you're a large enterprise, I expect you to be multi-region already! Hell, I expect you to be doing chaos engineering and proactively finding weaknesses in your architecture before disasters strike and force you into reacting.
me: but as we can see from these AWS outages, modern systems are complex, and even for companies like AWS, which have invested heavily in resilience and are doing all the right things, 💩 still happens