Great question from my current cohort of students, paraphrased:
"Should you always use Step Functions to chain together a few Lambda functions? Are there patterns to simplify this? How about using SQS between the functions?"
Here are my thoughts 🧵
Firstly, on the broader topic of orchestration vs choreography, I've written my thoughts before. TL;DR is that I prefer orchestration for intra-service workflows, and use events for inter-service communication.
If you have a dead-simple workflow then Step Functions can be overkill, especially if you're new to it.
Simplest approach:
1. implement workflow inside a single function
2. use Lambda destinations to chain several functions together
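To make option 2 a bit more concrete, here's a rough sketch of the arguments you could pass to boto3's `put_function_event_invoke_config` so that a function's async result is routed to the next function. The function name and ARNs are made up for illustration, and the actual boto3 call is left commented out so the snippet runs without AWS credentials.

```python
def destination_config(function_name, on_success_arn, on_failure_arn):
    """Build kwargs for lambda_client.put_function_event_invoke_config.

    Destinations only apply to asynchronous invocations of the function.
    """
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": 2,  # Lambda's built-in retries for async invocations (0-2)
        "DestinationConfig": {
            "OnSuccess": {"Destination": on_success_arn},
            "OnFailure": {"Destination": on_failure_arn},
        },
    }


# Hypothetical names and ARNs, for illustration only.
config = destination_config(
    "step-one",
    "arn:aws:lambda:us-east-1:123456789012:function:step-two",
    "arn:aws:sqs:us-east-1:123456789012:step-one-failures",
)

# With credentials configured, this would apply the config:
# import boto3
# boto3.client("lambda").put_function_event_invoke_config(**config)
print(config["DestinationConfig"]["OnSuccess"]["Destination"])
```

Each function in the chain would get its own config like this, pointing at whatever comes next.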
Some considerations to think about...
❓ How is the workflow started?
Lambda destinations don't work with synchronous invocations. This is actually one of my #awswishlist items, to be able to use destinations with sync Lambda invocations.
❓ Does the caller need the end result of the workflow?
If so, then you're better off implementing the workflow in a function or using Step Functions' synchronous workflows. Otherwise, you'd need something like the decoupled invocation pattern.
❓ Do you need to be able to restart a failed workflow from where it failed?
Doing this usually requires you to implement the logic to skip/fast forward to the previously failed state yourself. But it's trivial when you chain functions via destinations or EB/SQS.
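For the case where you do have to write the skip/fast-forward logic yourself, a minimal sketch might look like the following. It assumes you checkpoint completed step names somewhere durable; a plain dict stands in for that store here, and the step names and the simulated failure are made up.

```python
def run_workflow(steps, state):
    """Run steps in order, skipping any that an earlier attempt already completed.

    `state` stands in for a persistent store (e.g. a database record keyed by
    workflow ID); a plain dict is used here so the sketch runs on its own.
    """
    completed = state.setdefault("completed", [])
    for name, step in steps:
        if name in completed:
            continue  # fast-forward past work that already succeeded
        step(state)
        completed.append(name)  # checkpoint only after the step succeeds
    return state


# Hypothetical steps: "charge" fails on the first attempt, then succeeds.
calls = []

def reserve(state):
    calls.append("reserve")

def charge(state):
    calls.append("charge")
    if calls.count("charge") == 1:
        raise RuntimeError("payment provider timed out")

steps = [("reserve", reserve), ("charge", charge)]
state = {}

try:
    run_workflow(steps, state)
except RuntimeError:
    pass  # first run fails at "charge"; "reserve" is already checkpointed

run_workflow(steps, state)  # restart: skips "reserve", retries "charge"
print(calls)
```

The restart only re-runs the failed step, which is the behaviour you get for free when each step is its own function behind a destination or a queue.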
While Step Functions adds complexity (for really simple workflows), it also adds value.
There's the obvious value in having visualization in both the design view and the execution view (great for customer support teams).
It also simplifies error handling. When you move the error handling and retry logic to the state machine definition, it removes the tension between having a reasonable timeout value (for Lambda) and allowing extra time for retries and exponential backoff.
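As an illustration of what moving retries into the state machine can look like, here's a small sketch that builds an Amazon States Language definition as a Python dict. The state names, the ARN, and the retry numbers are placeholders picked for the example, not recommendations.

```python
import json

# Hypothetical state machine with one Lambda task; the ARN is a placeholder.
definition = {
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            # Retries and backoff live here, outside the function's own timeout.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 1,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # Once retries are exhausted, route any error to a failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
            "Next": "Done",
        },
        "Failed": {"Type": "Fail", "Error": "ChargeFailed"},
        "Done": {"Type": "Succeed"},
    },
}

print(json.dumps(definition, indent=2))
```

The function itself can then keep a short timeout that only needs to cover a single attempt.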
When you implement workflows in a Lambda function, it's tricky to work out what timeout value to use. Each step can fail, and each step should have some retry with exponential backoff, so execution time can have a big range between the happy path and the worst-case scenario.
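To see how wide that range can get, here's a quick back-of-the-envelope calculation. The numbers are assumptions chosen for illustration (3 steps, a 3s timeout per attempt, 3 attempts per step, waits of 1s then 2s between attempts), not measurements from a real system.

```python
def worst_case_seconds(step_timeout, max_attempts, base_delay, backoff_rate):
    """Worst case for one step: every attempt runs to its timeout,
    with an exponentially growing wait between attempts."""
    waits = sum(base_delay * backoff_rate ** i for i in range(max_attempts - 1))
    return step_timeout * max_attempts + waits


# Illustrative numbers only (see the assumptions above).
per_step = worst_case_seconds(step_timeout=3, max_attempts=3, base_delay=1, backoff_rate=2)
workflow_worst_case = 3 * per_step
print(per_step, workflow_worst_case)
```

With these numbers each step can take up to 12s and the whole workflow up to 36s, so the function timeout would have to cover 36s even if the happy path needs only a fraction of that.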
What's wrong with just using a high timeout value?
Usually nothing.
But, if you make a mistake, or get attacked (e.g. regex DoS), then the high timeout value amplifies the cost of those Lambda invocations. I've seen a few clients get stung by this...
Finally, if you were to chain functions via X, should you use SQS or EventBridge?
It depends. Do you need batching, ordered delivery, archiving or replay of these messages? Those would heavily influence your decision here.
Also, there's a difference between events ("something happened") vs tasks ("do this thing").
My rule of thumb is to use SQS for tasks, and EventBridge for events. But for convenience's sake, I sometimes masquerade tasks as events to avoid adding lots of SQS queues...🙈
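To illustrate the tasks vs events distinction, here's a small sketch of how the two payloads might be shaped for boto3's SQS `send_message` and EventBridge `put_events`. The queue URL, source, and names are hypothetical, and the actual sends are commented out so the snippet runs without AWS credentials.

```python
import json

# Task: "do this thing". Addressed to one worker, named as a command.
task_message = {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/resize-image",  # placeholder
    "MessageBody": json.dumps({"task": "resize_image", "imageId": "img-123"}),
}

# Event: "something happened". Named in the past tense, any number of consumers.
event_entry = {
    "EventBusName": "default",
    "Source": "image-service",       # hypothetical source name
    "DetailType": "image_uploaded",  # hypothetical event name
    "Detail": json.dumps({"imageId": "img-123"}),
}

# With credentials configured, these would be sent like so:
# import boto3
# boto3.client("sqs").send_message(**task_message)
# boto3.client("events").put_events(Entries=[event_entry])
print(json.loads(task_message["MessageBody"])["task"], event_entry["DetailType"])
```

Masquerading a task as an event would mean publishing something like the first payload through the second channel.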
And that's a wrap!
As the Lambda service becomes more mature and fully featured, there's also more confusion around when to use these new features. Function URL came up in a conversation today, so let's talk about that!
🧵
Let me start by saying that I'm quite excited by its release and I think it's great that it's now an option.
But I also think it shouldn't be the default for most of the people who are using Lambda today.
The no. 1 question I get about #serverless is around testing - how should I test these cloud-hosted functions? Should I use local simulators? How do I run these in my CI/CD pipeline?
Here are my thoughts on this 🧵
There's value in testing YOUR code locally, but don't bother with simulating AWS locally. It's too much effort to set up and too brittle to maintain. I've seen many teams spend weeks trying to get LocalStack running and then waste even more time whenever it breaks in mysterious ways 😠
Much better to use temporary environments (e.g. for each feature, or even each commit). Remember, with serverless components you only pay for what you use, so these environments are essentially free 🤘
If you want to learn about the internal details of Lambda, then check out @MarcJBrooker's session "Deep dive into AWS Lambda security: Function isolation"
"For amazon.com we found the "above the fold" latency is what customers are the most sensitive to"
This is an interesting insight, that not all service latencies are equal and that improving the overall page latency might actually end up hurting the user experience if it negatively impacts the "above the fold" latency as a result. 💡
This is far more complex than the most complex CD pipeline I have ever had! Just cos it's complex, doesn't mean it's over-engineered though. Given the blast radius, I'm glad they do releases carefully and safely.
If you look closely, beyond all the alpha, beta, gamma environments, it's one-box in a region first then the rest of the region, I assume starting with the least risky regions first.
I've gotten a few questions about Aurora Serverless v2 preview, so here's what I've learnt so far. Please feel free to chime in if I've missed anything important or got any of the facts wrong.
Alright, here goes the 🧵...
Q: does it replace the existing Aurora Serverless offering?
A: no, it lives side-by-side with the existing Aurora Serverless, which will still be available to you as "v1".
Q: Aurora Serverless v1 takes a few seconds to scale up, that's too much for our use case where we get a lot of spikes. Is that the same with v2?
A: no, v2 scales up in milliseconds. During the preview, the max ACU is only 32 though.