I've gotten a few questions about the Aurora Serverless v2 preview, so here's what I've learnt so far. Please feel free to chime in if I've missed anything important or got any of the facts wrong.
Alright, here goes the 🧵...
Q: does it replace the existing Aurora Serverless offering?
A: no, it lives side-by-side with the existing Aurora Serverless, which will still be available to you as "v1".
Q: Aurora Serverless v1 takes a few seconds to scale up, which is too much for our use case where we get a lot of spikes. Is that the same with v2?
A: no, v2 scales up in milliseconds. During the preview the max capacity is only 32 ACUs, though
Q: is the cold start for Aurora Serverless v2 still a few seconds?
A: yes, unfortunately...
Q: so if you want to avoid cold starts, what's the minimum ACU you have to run?
A: minimum ACU with v2 is 0.5
Q: does v2 still scale up in doubling increments, e.g. 4 ACUs -> 8 ACUs?
A: no, it scales in increments of 0.5 ACU, so capacity fits your workload much more tightly and you waste less money on over-provisioned ACUs
Q: is there anything I can do with v2 that I can't do with v1?
A: yes, v2 supports all the Aurora features, including those that v1 is missing, such as global database, IAM auth and Lambda triggers
Q: wait, but it's twice as much per ACU!
A: yes, but v1 requires a lot of over-provisioning because it doubles ACUs each time and takes 15 mins to scale down, whereas v2 scales in 0.5 ACU increments and scales down in < 1 min. AND you get all the Aurora features!
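To make that cost argument concrete, here's a back-of-the-envelope sketch for a spiky workload. Every number below (prices, spike shape, scale-down timings) is an illustrative assumption, not an official figure:

```python
# Rough cost comparison for a spiky workload: a ~1 ACU baseline with one
# 5-minute spike per hour that needs ~6 ACUs of capacity.
V1_PRICE = 0.06   # assumed $ per ACU-hour for v1
V2_PRICE = 0.12   # assumed $ per ACU-hour for v2 (2x the v1 price)

# v1 doubles capacity (1 -> 2 -> 4 -> 8) and takes ~15 mins to scale back down,
# so the spike is effectively billed at 8 ACUs for ~20 minutes of every hour.
v1_acu_hours_per_hour = 8 * (20 / 60) + 1 * (40 / 60)

# v2 scales in 0.5 ACU steps and scales down in under a minute,
# so the spike is billed at ~6 ACUs for ~6 minutes of every hour.
v2_acu_hours_per_hour = 6 * (6 / 60) + 1 * (54 / 60)

HOURS_PER_MONTH = 730
v1_monthly = v1_acu_hours_per_hour * V1_PRICE * HOURS_PER_MONTH
v2_monthly = v2_acu_hours_per_hour * V2_PRICE * HOURS_PER_MONTH

print(f"v1: ~{v1_acu_hours_per_hour:.1f} ACU-hours/hour, ~${v1_monthly:.2f}/month")
print(f"v2: ~{v2_acu_hours_per_hour:.1f} ACU-hours/hour, ~${v2_monthly:.2f}/month")
```

With these made-up numbers v2 comes out slightly cheaper despite the 2x per-ACU price, mostly because you stop paying for the doubled-up capacity during v1's 15-minute scale-down window.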
Q: can you use "provisioned" and "serverless" instances in the same Aurora cluster?
A: yes you can! cool, right!?
Q: is data API supported on v2?
A: not in the preview; I'm guessing it'll be there by GA
Q: if using it from Lambda, do I need RDS Proxy to manage the connections to the cluster? The Data API kinda mitigated that for v1...
A: yes, you probably should, at least until the Data API is available on v2. Otherwise more connections = more ACUs, which can run you into trouble
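For reference, here's a minimal sketch of what going through RDS Proxy from a Lambda handler could look like, assuming a MySQL-compatible cluster, IAM auth enabled on the proxy, and the pymysql library. The environment variables and the CA bundle path are placeholders:

```python
import os

import boto3
import pymysql

# Placeholders: set these to your RDS Proxy endpoint, DB user and DB name.
PROXY_ENDPOINT = os.environ["DB_PROXY_ENDPOINT"]
DB_USER = os.environ["DB_USER"]
DB_NAME = os.environ["DB_NAME"]
REGION = os.environ["AWS_REGION"]

rds = boto3.client("rds")


def _connect():
    # Ask RDS for a short-lived IAM auth token instead of using a password.
    token = rds.generate_db_auth_token(
        DBHostname=PROXY_ENDPOINT, Port=3306, DBUsername=DB_USER, Region=REGION
    )
    return pymysql.connect(
        host=PROXY_ENDPOINT,
        user=DB_USER,
        password=token,
        database=DB_NAME,
        ssl={"ca": "/opt/rds-ca-bundle.pem"},  # placeholder path to the RDS CA bundle
        connect_timeout=5,
    )


def handler(event, context):
    # The proxy pools and reuses connections, so lots of concurrent Lambdas
    # don't translate into lots of connections (and therefore ACUs) on the cluster.
    conn = _connect()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return {"result": cur.fetchone()[0]}
    finally:
        conn.close()
```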
did I miss anything? plz feel free to chime in
@jeremy_daly has written a nice summary post on this too, with an experiment on the scaling behaviour of v2, so check it out if you haven't already: jeremydaly.com/aurora-serverl…
"For amazon.com we found the "above the fold" latency is what customers are the most sensitive to"
This is an interesting insight: not all service latencies are equal, and improving the overall page latency might actually end up hurting the user experience if it negatively impacts the "above the fold" latency. 💡
This is far more complex than the most complex CD pipeline I have ever had! Just because it's complex doesn't mean it's over-engineered, though. Given the blast radius, I'm glad they do releases carefully and safely.
If you look closely, beyond all the alpha, beta and gamma environments, it's one-box in a region first, then the rest of the region, presumably starting with the least risky regions first.
Great overview of permission management in AWS by @bjohnso5y (SEC308)
Lots of tools to secure your AWS environment (maybe that's why it's so hard to get right, there are lots of things to consider), but I love how it starts with "separate workloads using multiple accounts"
SCPs for org-wide restrictions (e.g. Deny ec2:* 😉), see the sketch below.
IAM permission boundaries to stop people from creating permissions that exceed their own.
Block S3 public access.
These are the controls that deny access (hence "guardrails").
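To illustrate the "Deny ec2:*" example above, here's a hedged sketch of creating and attaching an org-wide guardrail as an SCP with boto3. The policy name, description and OU id are made up:

```python
import json

import boto3

# Hypothetical SCP that denies all EC2 actions org-wide.
deny_ec2_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "DenyAllEc2", "Effect": "Deny", "Action": "ec2:*", "Resource": "*"}
    ],
}

orgs = boto3.client("organizations")

# Create the SCP and attach it to a (placeholder) organizational unit.
policy = orgs.create_policy(
    Content=json.dumps(deny_ec2_scp),
    Description="Guardrail: no EC2 anywhere in this OU",
    Name="deny-ec2",
    Type="SERVICE_CONTROL_POLICY",
)
orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-xxxxxxxx",  # placeholder OU id
)
```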
Use IAM principal and resource policies to grant permissions
"You should be using roles so you can focus on temporary credentials" 👍
You shouldn't be using IAM users and groups anymore: go set up AWS SSO, and throw away the root user's password (use the forgotten-password mechanism if you ever need to recover access)
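As a quick illustration of the "roles and temporary credentials" point, here's a minimal sketch of assuming a role via STS and using the short-lived credentials it hands back. The role ARN and account id are placeholders:

```python
import boto3

sts = boto3.client("sts")

# Assume a role to get short-lived credentials (placeholder ARN).
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ReadOnlyAnalyst",
    RoleSessionName="temp-creds-demo",
    DurationSeconds=3600,  # the credentials expire after an hour
)
creds = resp["Credentials"]

# Use the temporary credentials instead of long-lived IAM user access keys.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```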
Great session by @MarcJBrooker earlier on building technology standards at Amazon scale, with some interesting tidbits about the secret sauce behind Lambda and how they make technology choices - e.g. whether to use Rust for the v2 of Lambda's stateful load balancer.
🧵
Nice shout-out to some of the benefits of Rust - no GC (good for p99+ percentile latencies), memory safety with its ownership system theburningmonk.com/2015/05/rust-m… and great support for multi-threading (which still works with the ownership system)
And why not to use Rust.
The interesting Q is how to balance technical strengths vs weaknesses that are more organizational.
Given all the excitement over Lambda's per-ms billing change today, some of you might be thinking about how much money you can save by shaving 10ms off your function.
However, this might not mean much in practice for a lot of you, because your Lambda bill is $5/month, so saving even 50% only buys you a cup of Starbucks coffee a month.
Fight that temptation 🧘♂️ until you can prove the ROI of the optimization.
Assuming $50 (which is VERY conservative) per dev per hour, it could easily take 40 months to break even on just having the meeting about the optimization, before writing a single line of code!
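Here's the back-of-the-envelope math behind that kind of break-even claim. Every number below is an assumption, picked so the result roughly lands on the 40-month figure, so plug in your own:

```python
# Break-even maths for a latency micro-optimization (all inputs are assumptions).
devs_in_meeting = 4
meeting_hours = 1
dev_cost_per_hour = 50                  # the "VERY conservative" rate above

meeting_cost = devs_in_meeting * meeting_hours * dev_cost_per_hour  # $200

# Hypothetical win: shaving 10ms off a 1 GB function invoked 30M times a month.
invocations_per_month = 30_000_000
seconds_saved = 0.010
memory_gb = 1.0
price_per_gb_second = 0.0000166667      # illustrative Lambda compute price

monthly_saving = invocations_per_month * seconds_saved * memory_gb * price_per_gb_second
months_to_break_even = meeting_cost / monthly_saving

print(f"meeting cost ${meeting_cost}, saving ~${monthly_saving:.2f}/month")
print(f"break-even in ~{months_to_break_even:.0f} months, before writing any code")
```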
With the per-ms billing, you're automatically saving on your Lambda cost already, by NOT having your invocation time rounded up to the next 100ms.
Unless you're invoking a function at a really high frequency, those micro-optimizations won't be worth the engineering time you have to invest.