The no. 1 question I get about #serverless is around testing - how should I test these cloud-hosted functions? Should I use local simulators? How do I run these in my CI/CD pipeline?
Here are my thoughts on this 🧵
There's value in testing YOUR code locally, but don't bother simulating AWS locally - too much effort to set up and too brittle to maintain. I've seen many teams spend weeks trying to get LocalStack running, then waste even more time whenever it breaks in mysterious ways 😠
Much better to use temporary environments (e.g. for each feature, or even each commit). Remember, with serverless components you only pay for what you use, so these environments are essentially free 🤘
When you start a new feature, create a temp environment, e.g. "sls deploy -s my-feature" with @goserverless and then write tests that execute your function code against the real AWS services, DynamoDB tables and whatnot.
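The trick that makes temp environments work is parameterising resource names by stage, so each deploy gets its own isolated copy of everything. A minimal sketch (service and table names here are hypothetical):

```yaml
# Hypothetical serverless.yml fragment: resource names include the stage,
# so "sls deploy -s my-feature" provisions an isolated copy of everything.
service: my-service

provider:
  name: aws

resources:
  Resources:
    OrdersTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: orders-${sls:stage}   # e.g. orders-my-feature
        BillingMode: PAY_PER_REQUEST     # on-demand, so idle temp envs cost ~nothing
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
```

With on-demand billing, a dozen idle feature environments cost essentially nothing.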
These "integration tests" (or "sociable tests", as Martin Fowler calls them) exercise your code against real AWS services, catch integration problems as well as biz logic errors quickly, and give you fast feedback on code changes.
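The shape of such a test is simple: invoke your function code directly, pointed at the real resources in the temp environment. A minimal sketch (the `add_order` handler and table are hypothetical; in a temp environment you'd pass a real boto3 Table resolved from an env var, here an in-memory stand-in keeps the example self-contained):

```python
# "Sociable" test sketch: the handler code runs for real, talking to a
# DynamoDB-shaped table. Swap InMemoryTable for a boto3 Table resource
# pointed at the temp environment and the same code path is exercised
# against real AWS.
def add_order(table, order_id: str, amount: int) -> dict:
    """Biz logic under test: validate, then persist."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    item = {"id": order_id, "amount": amount}
    table.put_item(Item=item)
    return item

class InMemoryTable:
    """Minimal stand-in exposing the boto3 Table.put_item shape."""
    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        self.items[Item["id"]] = Item

# The test: call the handler code, then check the side effect landed.
table = InMemoryTable()
add_order(table, "order-1", 42)
assert table.items["order-1"]["amount"] == 42
```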
Sure, you have to deploy any infrastructure changes, like adding new DynamoDB tables, etc. before you can run these tests. But you don't have to redeploy the whole stack every time you make a code change.
I generally think "unit tests" (what Martin Fowler calls "solitary tests") don't have a great ROI, and I only write them if I have genuinely complex biz logic. Most of my functions are IO-heavy, do minimal data transformation and can be covered by integration tests.
When I am dealing with complex biz logic, I encapsulate it in modules and write unit tests for those modules, making sure the tests don't touch any external dependencies. They work exclusively with domain objects.
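What that looks like in practice: pure functions over domain objects, trivially testable without AWS. The discount rules below are made up, just to show the shape:

```python
# "Solitary" unit test sketch: complex biz logic lives in its own module
# with no external dependencies. The rules here are hypothetical.
from dataclasses import dataclass

@dataclass
class Order:
    total: float
    loyalty_years: int

def discount(order: Order) -> float:
    """Return the discount rate for an order (made-up rules)."""
    rate = 0.0
    if order.total > 100:
        rate += 0.05
    if order.loyalty_years >= 3:
        rate += 0.05
    return rate

# Unit tests need no mocks at all - just domain objects in, values out.
assert discount(Order(total=150, loyalty_years=4)) == 0.1
assert discount(Order(total=50, loyalty_years=0)) == 0.0
```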
Once I have good confidence that my code works, I write e2e tests to check the whole system works (without the frontend) by testing the system from its external-facing interface, which can be a REST API, or an EventBridge bus, or a Kinesis data stream, or whatever.
These e2e tests catch problems outside of my code - configurations, IAM permissions, etc. And a lot of the time, I write tests in such a way that I can reuse the same test case for both integration and e2e tests, so they're not as labour-intensive.
If I'm building APIs then these e2e tests would call the deployed API and check the response. For data pipelines, they'd push events into an EventBridge bus and wait for the expected side-effect (e.g. data written to a DynamoDB table).
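Because pipelines are async, the "wait for the side-effect" part usually means polling with a timeout. A small retry helper like this (my own sketch, not from any library) keeps that waiting logic out of individual tests:

```python
import time

def wait_for(condition, timeout=30.0, interval=1.0):
    """Poll `condition` until it returns a truthy value or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)

# Usage in an e2e test (sketch, hypothetical helpers):
#   put_event(bus, detail={"orderId": "o1"})
#   item = wait_for(lambda: get_item(table, "o1"))
#   assert item["status"] == "PROCESSED"
```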
Again, temporary environments really help here. You don't have to worry about pushing events to shared event buses and triggering lots of other stuff you didn't intend to.
If the side-effect you're looking for is "an event is published to Kinesis/EventBridge/SNS", it can be tricky to detect. Check out this old post of mine on a few ways to do it.
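One common pattern (shown here in-memory; the linked post covers the AWS-specific plumbing, e.g. a test-only rule forwarding events to a queue the test drains): capture published events somewhere the test can read, then search the captured events for a match.

```python
# Illustrative only: a stand-in event bus with a test-only capture list.
class CapturingBus:
    def __init__(self):
        self.captured = []

    def publish(self, event: dict):
        # In AWS this might be an EventBridge rule forwarding to SQS;
        # here we just append to a list the test can inspect.
        self.captured.append(event)

def find_event(bus, predicate):
    """Return the first captured event matching `predicate`, or None."""
    return next((e for e in bus.captured if predicate(e)), None)

bus = CapturingBus()
bus.publish({"type": "order_placed", "orderId": "o1"})
assert find_event(bus, lambda e: e.get("orderId") == "o1") is not None
```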
As part of the CI/CD pipeline, create a temporary environment, run your integration and e2e tests against it, then delete the environment afterwards. No need to clean up test data from shared environments. If the tests pass, deploy to the real environment.
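A hypothetical CI fragment for those steps (GitHub Actions shown, but any CI works; script names are assumptions):

```yaml
# Sketch: deploy a throwaway stage per run, test against it, tear it down.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx sls deploy -s ci-${{ github.run_id }}
      - run: npm run test:integration && npm run test:e2e
      - if: always()   # remove the temp environment even if tests fail
        run: npx sls remove -s ci-${{ github.run_id }}
```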
This approach is broadly in line with (and inspired by) the testing honeycomb, and I have been very happy with it. It gives me fast feedback on small code changes and the confidence I need to operate complex applications with lots of moving parts (and therefore configurations!)
Of course, testing doesn't stop there. There's the whole "testing in production" side, which includes observability, canary testing, smoke testing, load testing, chaos experiments and much more. You don't need to do all of them, but good observability is a must.
My go-to solution is @Lumigo: it takes a few mins to set up, requires no manual instrumentation, and gives me everything I need to troubleshoot issues I haven't seen before. And I love the built-in dashboard - it's designed by serverless users, for serverless users.
If you want to see these in action and learn how to apply them in practice then check out my upcoming workshop. We're gonna cover a lot more than testing and have something for both beginners and experienced serverless devs.
If you want to learn about the internal details of Lambda, then check out @MarcJBrooker's session "Deep dive into AWS Lambda security: Function isolation"
"For amazon.com we found the "above the fold" latency is what customers are the most sensitive to"
This is an interesting insight, that not all service latencies are equal and that improving the overall page latency might actually end up hurting the user experience if it negatively impacts the "above the fold" latency as a result. 💡
This is far more complex than the most complex CD pipeline I have ever had! Just cos it's complex, doesn't mean it's over-engineered though. Given the blast radius, I'm glad they do releases carefully and safely.
If you look closely, beyond all the alpha, beta, gamma environments, it's one-box in a region first, then the rest of the region - I assume starting with the least risky regions first.
I've gotten a few questions about Aurora Serverless v2 preview, so here's what I've learnt so far. Please feel free to chime in if I've missed anything important or got any of the facts wrong.
Alright, here goes the 🧵...
Q: does it replace the existing Aurora Serverless offering?
A: no, it lives side-by-side with the existing Aurora Serverless, which will still be available to you as "v1".
Q: Aurora Serverless v1 takes a few seconds to scale up, that's too much for our use case where we get a lot of spikes. Is that the same with v2?
A: no, v2 scales up in milliseconds. During the preview the max ACU is only 32, though.
Great overview of permission management in AWS by @bjohnso5y (SEC308)
Lots of tools to secure your AWS environment (maybe that's why it's so hard to get right, lots of things to consider) but I love how it starts with "separate workloads using multiple accounts"
SCP for org-wide restrictions (e.g. Deny ec2:* 😉).
IAM perm boundary to stop ppl from creating permissions that exceed their own.
Block S3 public access.
These are all controls that deny access (hence "guardrails").
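A minimal SCP along the lines of the ec2:* example might look like this (a sketch only; real SCPs usually scope the deny with conditions or resource ARNs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllEC2",
      "Effect": "Deny",
      "Action": "ec2:*",
      "Resource": "*"
    }
  ]
}
```

Attached to an OU, this blocks every EC2 action in those accounts regardless of what IAM policies grant.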
Use IAM principal and resource policies to grant perms
"You should be using roles so you can focus on temporary credentials" 👍
You shouldn't be using IAM users and groups anymore - go set up AWS SSO and throw away the password for the root user (use the forgotten-password mechanism if you ever need to recover access)
Great session by @MarcJBrooker earlier on building technology standards at Amazon scale, and some interesting tidbits about the secret sauce behind Lambda and how they make technology choices - e.g. in whether to use Rust for the stateful load balancer v2 for Lambda.
🧵
Nice shout-out to some of the benefits of Rust: no GC (good for p99+ percentile latency), memory safety via its ownership system theburningmonk.com/2015/05/rust-m… and great support for multi-threading (which still works with the ownership system)
And why not to use Rust.
The interesting Q is how to balance technical strengths against weaknesses that are more organizational.