X: in light of last week's #AWS outage, should I make my app multi-region?
me: it depends.
X: on what?
me: how much did the outage cost you in lost sales, reputation cost, etc.? And how much are you willing to invest in improving your uptime in case of another region-wide outage?
X: erm... I'm not sure...
me: don't get me wrong, if you're a large enterprise, I expect you to be multi-region already! Hell, I expect you to be doing chaos engineering and proactively finding weaknesses in your architecture before disasters strike and force you into reacting.
me: but as we can see from these AWS outages, modern systems are complex, and even for companies like AWS, who have invested heavily in resilience and are doing all the right things, 💩 still happens
me: resilience requires continuous investment just to stay the same because your system is also evolving and becoming more capable and more complex at the same time, so your multi-region strategies need to evolve with the architecture
me: if your stack is API Gateway/AppSync + Lambda + DynamoDB then going multi-region active-active is fairly straightforward - set up a DynamoDB global table and then set up fallback routing on Route53. This excellent post by @adhorn shows you how adhorn.medium.com/multi-region-s…
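To make the Route53 part a bit more concrete, here is a minimal sketch (not necessarily what the linked post does). It assumes latency-based routing with health checks, so traffic moves away from a region whose health check fails. The domain, endpoints and health check IDs are hypothetical placeholders. The dict it builds follows the shape of the ChangeBatch argument that boto3's route53 change_resource_record_sets call accepts, but nothing here talks to AWS.

```python
def latency_record_change(domain, region, regional_endpoint, health_check_id):
    """Build one UPSERT change for a latency-routed CNAME record.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": domain,
            "Type": "CNAME",
            # SetIdentifier must be unique among records sharing a name and type.
            "SetIdentifier": f"api-{region}",
            # Region switches on latency-based routing for this record.
            "Region": region,
            "TTL": 60,
            "ResourceRecords": [{"Value": regional_endpoint}],
            # Route53 stops answering with this record when the check is unhealthy.
            "HealthCheckId": health_check_id,
        },
    }


def build_change_batch(domain, endpoints):
    """endpoints maps region -> (regional endpoint, health check id)."""
    changes = [
        latency_record_change(domain, region, endpoint, health_check_id)
        for region, (endpoint, health_check_id) in sorted(endpoints.items())
    ]
    return {"Comment": "active-active API records (sketch)", "Changes": changes}


if __name__ == "__main__":
    batch = build_change_batch(
        "api.example.com",
        {
            "us-east-1": ("us.api.example.com", "hypothetical-health-check-1"),
            "eu-west-1": ("eu.api.example.com", "hypothetical-health-check-2"),
        },
    )
    print(batch)
```

The DynamoDB global table side is configuration on the table itself and is left out of this sketch.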
me: however, most real-world apps are much more than that, and you'd also need to extend that multi-region strategy to your data processing pipelines, and just about everything else you do too, and that has a cost (eng time, maintenance, aws bill, etc.) and impacts your velocity
X: so.. you're saying I shouldn't do it?
me: I'm saying you shouldn't make a knee-jerk reaction and should instead make an informed choice based on the cost of such an outage, its likelihood of happening, and the cost of moving YOUR architecture to a multi-region active-active setup.
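As a rough illustration of that trade-off, here is a back-of-the-envelope sketch. Every number in it is made up for the example, and the model is deliberately crude: it ignores reputation damage and assumes multi-region removes a fixed fraction of the outage cost. Plug in your own estimates.

```python
def expected_annual_outage_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly loss from region-wide outages (likelihood x duration x hourly cost)."""
    return outages_per_year * hours_per_outage * cost_per_hour


def multi_region_is_worth_it(annual_outage_cost, annual_multi_region_cost, fraction_avoided=1.0):
    """True when the outage cost you expect to avoid exceeds what multi-region costs per year.

    annual_multi_region_cost should include eng time, maintenance and the extra AWS bill.
    """
    return annual_outage_cost * fraction_avoided > annual_multi_region_cost


if __name__ == "__main__":
    # Hypothetical inputs: one region-wide outage every two years, lasting 6 hours,
    # costing 10,000 per hour in lost sales.
    outage_cost = expected_annual_outage_cost(0.5, 6, 10_000)
    print(outage_cost)
    # Hypothetical yearly cost of running active-active.
    print(multi_region_is_worth_it(outage_cost, 120_000))
```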
X: what about a compromise and multi-region, active-passive instead?
me: don't bother - it has all the complexities of active-active and the HUGE risk that the passive region is never used day-to-day and might not work the one time you need it
X: ok, so what should I do then?
me: do nothing if you can afford to wait out the outage, add a maintenance page so customers are not left hanging; go multi-region active-active if you must (because another outage like this would be too costly)
me: as I've said time and again, keep it simple, until you can't.
re:Invent starts tomorrow, so let me round up the biggest #serverless related announcements from the last 2 weeks (I know, crazy!) and share a few thoughts on what they mean for you.
Mega 🧵 If this gets 168 retweets then I'll add another batch!
1. Lambda released the Logs API, which works with the Lambda Extensions mechanism that was released in Oct. This lets you subscribe to Lambda logs from a Lambda extension and ship them elsewhere WITHOUT going through CloudWatch Logs
a. it lets you side-step CloudWatch Logs, which often costs more (sometimes 10x more) than Lambda invocations in production apps.
b. it's possible (although not really feasible right now) to ship logs in real-time
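For a feel of what subscribing looks like, here is a minimal sketch of the subscription request an extension would send, based on my reading of the Logs API docs. It assumes the extension has already registered through the Extensions API and received an extension identifier, and that it runs a local HTTP listener to receive log batches. The listener URI and identifier below are placeholders, and the buffering values are just example settings. The code only builds the request, it does not send it.

```python
import json
import urllib.request

LOGS_API_VERSION = "2020-08-15"


def build_logs_subscription(runtime_api, extension_id, listener_uri):
    """Build the PUT request that subscribes an extension to Lambda logs.

    runtime_api:   host:port from the AWS_LAMBDA_RUNTIME_API environment variable.
    extension_id:  identifier returned when the extension registered (assumed to exist).
    listener_uri:  where the extension's own HTTP listener receives log batches (placeholder).
    """
    body = {
        "schemaVersion": LOGS_API_VERSION,
        # Which log streams to receive: platform events and function output.
        "types": ["platform", "function"],
        # Example buffering settings; tune them for your log volume.
        "buffering": {"maxItems": 1000, "maxBytes": 262144, "timeoutMs": 1000},
        "destination": {"protocol": "HTTP", "URI": listener_uri},
    }
    return urllib.request.Request(
        url=f"http://{runtime_api}/{LOGS_API_VERSION}/logs",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Lambda-Extension-Identifier": extension_id,
        },
    )


if __name__ == "__main__":
    request = build_logs_subscription(
        "127.0.0.1:9001", "hypothetical-extension-id", "http://sandbox:8080"
    )
    print(request.get_method(), request.full_url)
```

Inside a real extension you would send this with urllib.request.urlopen(request) and then forward whatever arrives at the listener to your log vendor of choice.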
I sat down this weekend and had a look at my finances as I'm almost 5 months into my 2nd year as a full-time solo consultant, and noticed that my revenue streams have changed quite a bit over the last 3 years.
This is the result of a conscious effort to reduce my reliance on a few large clients, to offset seasonality and other factors that can affect revenue, and to create a healthy mix of active and passive income streams.
Overall revenue has grown over time, and my largest client now accounts for less than 20% of my revenue. And I haven't seen too much seasonality to my work yet - summer was quieter because Europeans went on holiday, but it was still OK.
X: when would you NOT use #AppSync?
me: AppSync gives you a managed #GraphQL server as a service, so if you need a REST API instead then you won't use AppSync. Also, you wouldn't use AppSync if you need GraphQL/Apollo features that are not supported by AppSync
X: what sorta features are you talking about?
me: you can't define custom scalar types (e.g. LatLon is a popular one), and you don't get implementation-specific features like Apollo Federation for schema stitching, or utilities like data loaders github.com/graphql/datalo…
X: ok, do you need them to build a production app?
me: no, you can absolutely build production apps without them, but these features can be very useful in some contexts, for example, Netflix uses federation heavily netflixtechblog.com/how-netflix-sc…
X: what's your opinion on VTL templates vs direct lambda invocations with AppSync?
me: you should use VTL templates (e.g. for DynamoDB) by default until it's either impossible or the VTL is getting too complex
X: but why?
me: well, let's see..
me: you can do whatever you want with Lambda, that gives you a lot of flexibility, but also all the drawbacks
X: such as?
me: cold starts, esp. for resolvers that don't see a lot of traffic, or the complexity overhead of mitigating them (Provisioned Concurrency or a Lambda warmer)
me: and you also have to consider the operational limits specific to Lambda, such as the *soft regional concurrency limit, and the *hard limit of 500 new concurrent executions per min after the init burst capacity (3000 concurrent executions)
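To put numbers on that scaling limit, here is a small arithmetic sketch. It uses the figures quoted above (an initial burst of 3000 concurrent executions, then 500 more per minute) as defaults, and assumes the regional concurrency limit has already been raised high enough that it is not the bottleneck. The function name is made up for the example.

```python
import math


def minutes_to_reach(target_concurrency, initial_burst=3000, per_minute=500):
    """Minutes of sustained scaling needed to reach target_concurrency.

    Assumes the burst capacity is available immediately and Lambda then adds
    per_minute concurrent executions each minute, as described above.
    """
    if target_concurrency <= initial_burst:
        return 0
    return math.ceil((target_concurrency - initial_burst) / per_minute)


if __name__ == "__main__":
    # A spike to 10,000 concurrent resolver invocations would take 14 minutes to absorb.
    print(minutes_to_reach(10_000))
```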
X: ok, fair point.. anything else?
Editing my conversation with @lajacobsson for @RealWorldSls and there's a nugget of insight that I wanted to share with you about implicit coupling that we often overlook when using SNS with SQS.
The topic is usually owned by the publisher and deployed in the publisher's stack, and the subscriber would reference its ARN via a CloudFormation stack output or something, creating an implicit coupling there.
2/
You need message attributes to do filtering, but the publisher has no idea what the subscriber cares about (nor should it, loose coupling and all). So teams that own the subscriber have to ask the publisher's team to add the message attributes they need.
3/
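Here is a small sketch of why the subscriber ends up depending on the publisher for filtering. It is a simplified local model of SNS filter policies that only covers exact string matching, not the real service, and the attribute names (eventType, country) are hypothetical.

```python
def matches_filter_policy(filter_policy, message_attributes):
    """Simplified model of SNS subscription filtering (exact string matches only).

    filter_policy:      attribute name -> list of accepted string values.
    message_attributes: attribute name -> {"DataType": "String", "StringValue": ...},
                        the shape a publisher passes when publishing.
    A message matches only if every attribute named in the policy is present
    on the message and carries one of the accepted values.
    """
    for name, accepted_values in filter_policy.items():
        attribute = message_attributes.get(name)
        if attribute is None:
            return False
        if attribute["StringValue"] not in accepted_values:
            return False
    return True


if __name__ == "__main__":
    # What the publisher's team happens to send today.
    published = {"eventType": {"DataType": "String", "StringValue": "order_placed"}}

    # The subscriber only wants UK orders, but the publisher sends no "country" attribute,
    # so this policy can never match until the publisher's team adds it.
    subscriber_policy = {"eventType": ["order_placed"], "country": ["UK"]}
    print(matches_filter_policy(subscriber_policy, published))

    # Once the publisher adds the attribute, the same policy starts matching.
    published["country"] = {"DataType": "String", "StringValue": "UK"}
    print(matches_filter_policy(subscriber_policy, published))
```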