Aidan W Steele
Sep 15, 2022
After using AWS for ~14 years, I've internalised a handful of design patterns that I try to apply to my own software. I'm keen to know if it's the same for other folks.

Roughly: tags, IDs (thrice), limits, pagination.

(I'm not going to use the thread emoji)
1: Tags.

A lot of software has support for tags, but it's usually a set of strings. This is useful in the case of "show me all resources tagged 'engineering'". Or even "show me all resources tagged 'engineering' and 'frontend'".
AWS resources go one step further and implement key-value tags. So in addition to the above, you can ask the question "show me all resources grouped by the 'department' tag key". This is super useful for power users, folks in finance, etc.
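To make the grouping idea concrete, here's a minimal Go sketch. The Resource type and groupByTagKey function are my own illustrative names, not any AWS API:

```go
// Hypothetical sketch: key-value tags let you group resources by a tag *key*,
// not just filter on whether a tag is present.
package main

import "fmt"

type Resource struct {
	ID   string
	Tags map[string]string // key-value tags, e.g. {"department": "finance"}
}

// groupByTagKey buckets resources by the value they hold for the given tag key.
func groupByTagKey(resources []Resource, key string) map[string][]Resource {
	groups := map[string][]Resource{}
	for _, r := range resources {
		if v, ok := r.Tags[key]; ok {
			groups[v] = append(groups[v], r)
		}
	}
	return groups
}

func main() {
	resources := []Resource{
		{ID: "i-abc123", Tags: map[string]string{"department": "engineering"}},
		{ID: "i-def456", Tags: map[string]string{"department": "finance"}},
	}
	for dept, rs := range groupByTagKey(resources, "department") {
		fmt.Println(dept, len(rs))
	}
}
```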
2: Resource ID prefixes

Almost all objects in a web service will have some kind of ID. It might be an auto-incrementing integer, or a GUID, a random string, etc.

When debugging, you will often ask a user for the ID of the misbehaving object. You look, but it's not in the DB??
Turns out they gave you a session ID instead of an order ID. Doh!
This is where resource ID *prefixes* are insanely useful. Think EC2 instance IDs: i-abc123 or EBS volumes: vol-def456.

Now when you ask a user for an ID, you'll know immediately if it's the wrong thing entirely.
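A rough sketch of what that looks like in practice; the prefixes and helper names are made up for illustration:

```go
// Hypothetical sketch: mint IDs with a per-type prefix so a misfiled ID is
// obvious at a glance ("ord-..." vs "sess-...").
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

func newID(prefix string) string {
	b := make([]byte, 8)
	rand.Read(b)
	return prefix + "-" + hex.EncodeToString(b)
}

// expectID fails fast when someone hands you the wrong kind of ID.
func expectID(id, prefix string) error {
	if !strings.HasPrefix(id, prefix+"-") {
		return fmt.Errorf("expected a %q ID, got %q", prefix, id)
	}
	return nil
}

func main() {
	order := newID("ord")
	fmt.Println(order)                       // e.g. ord-1a2b3c4d5e6f7a8b
	fmt.Println(expectID("sess-xyz", "ord")) // error: wrong resource type entirely
}
```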
2b: Fully qualified IDs

Sometimes an object ID by itself isn't enough. Say you store your data hierarchically on S3: an invoice ID alone isn't enough to locate its PDF; you also need the customer ID.

This is where AWS uses ARNs. An ARN should be sufficient to locate the stored object.
E.g. instead of passing around invoice IDs like inv-12345, you pass fully qualified IDs like eu:c-acd223:inv-12345. With that string, your code knows the PDF is in the EU partition, under the c-acd223 customer directory, file name inv-12345.pdf. Much quicker troubleshooting.
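Here's a small Go sketch of that idea, assuming the eu:c-acd223:inv-12345 format above; the parsing code and S3 key layout are hypothetical, not anything AWS-specific:

```go
// Hypothetical sketch: a fully qualified ID carries everything needed to locate
// the stored object. Assumed format: ${partition}:${customerID}:${invoiceID}.
package main

import (
	"fmt"
	"strings"
)

type InvoiceRef struct {
	Partition  string // e.g. "eu"
	CustomerID string // e.g. "c-acd223"
	InvoiceID  string // e.g. "inv-12345"
}

func parseInvoiceRef(fqid string) (InvoiceRef, error) {
	parts := strings.SplitN(fqid, ":", 3)
	if len(parts) != 3 {
		return InvoiceRef{}, fmt.Errorf("malformed invoice ref: %q", fqid)
	}
	return InvoiceRef{Partition: parts[0], CustomerID: parts[1], InvoiceID: parts[2]}, nil
}

// s3Key derives the object key directly from the ID, no DB lookup needed.
func (r InvoiceRef) s3Key() string {
	return fmt.Sprintf("%s/%s/%s.pdf", r.Partition, r.CustomerID, r.InvoiceID)
}

func main() {
	ref, _ := parseInvoiceRef("eu:c-acd223:inv-12345")
	fmt.Println(ref.s3Key()) // eu/c-acd223/inv-12345.pdf
}
```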
AWS doesn't always get this right! When ECS first launched, task ARNs were of the format arn:aws:ecs:${region}:${account}:task/${taskId}. But you could have multiple clusters in a single region - so which cluster do you look up?

They later migrated to :task/${cluster}/${taskId}.
2c: ULIDs > GUIDs >> Integers

This is a bonus one on the theme of IDs, because I don't think AWS does it.

GUIDs are better than auto-incrementing integer IDs because they don't reveal the number of objects in your DB and they don't have contention on the "next ID" counter.
But ULIDs are better than GUIDs because they encode creation timestamps in the prefix. This means they are naturally sortable by creation time. This is useful for databases like DynamoDB or Redshift where data can be stored sorted and you can easily return the latest N objects.
It's also useful in unexpected places. Maybe you think you don't need a creation timestamp in your customer IDs? I mean, how often are they created? And they rarely ever need to be sorted.
What about when you migrate data store? Your proxy (in the hot path of every request) can forward any customers created after date X to service B and older ones to service A -- all without needing a roundtrip to the DB to look up if they were created before/after the cutover.
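A minimal sketch of the sortable-prefix idea. This is not a real ULID implementation (no Crockford base32, shorter random part); it just shows how a fixed-width timestamp prefix makes IDs compare by creation time, including the cutover check above:

```go
// Hypothetical sketch: a fixed-width, big-endian timestamp prefix makes IDs
// sort lexicographically by creation time, the same property ULIDs give you.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// newSortableID returns <12 hex chars of unix-millis><16 hex chars of randomness>.
func newSortableID(t time.Time) string {
	suffix := make([]byte, 8)
	rand.Read(suffix)
	return fmt.Sprintf("%012x%s", t.UnixMilli(), hex.EncodeToString(suffix))
}

// createdAfter answers "was this customer created after the cutover?" from the
// ID alone, with no round trip to the DB.
func createdAfter(id string, cutover time.Time) bool {
	return id > fmt.Sprintf("%012x", cutover.UnixMilli())
}

func main() {
	id := newSortableID(time.Now())
	fmt.Println(id, createdAfter(id, time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)))
}
```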
3: Limits.

AWS publishes limits for just about every service. This serves a number of purposes.

* They can build a service with reliable performance because they know there will never be 10,000x more objects than anticipated during system design.
* Customers can be confident that the service can handle their use case, because it's well within the published limits. Their app isn't going to break because it will hit a pathological behaviour in the AWS service.
* Limits can always be raised over time, or in response to a customer request. This means AWS can provision an appropriate level of resourcing.

IMO, limits show a level of professionalism. It means that the service operator understands what their service can -- and cannot -- do.
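As a sketch of what that might look like inside a service, assuming a made-up "rules per account" quota with a published default that can be raised per account:

```go
// Hypothetical sketch: a published default limit, overridable per account, so
// the service never has to cope with 10,000x more objects than it was designed for.
package main

import "fmt"

const defaultMaxRulesPerAccount = 100 // the published default (number is made up)

type Quotas struct {
	overrides map[string]int // per-account limit increases granted on request
}

func (q Quotas) maxRules(accountID string) int {
	if n, ok := q.overrides[accountID]; ok {
		return n
	}
	return defaultMaxRulesPerAccount
}

func (q Quotas) checkCreateRule(accountID string, currentCount int) error {
	if currentCount >= q.maxRules(accountID) {
		return fmt.Errorf("limit exceeded: account %s may have at most %d rules",
			accountID, q.maxRules(accountID))
	}
	return nil
}

func main() {
	q := Quotas{overrides: map[string]int{"123456789012": 500}}
	fmt.Println(q.checkCreateRule("123456789012", 120)) // nil: raised limit
	fmt.Println(q.checkCreateRule("999999999999", 120)) // error: default limit is 100
}
```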
4: Pagination.

Every AWS service (that I can think of) uses "next page" cursors for pagination, not page indexes. Hell, maybe they *do* use page indexes under the hood, but that is an *implementation detail* that is not exposed in the opaque "next page" cursor.
This gives you the ability to implement pagination with ease. If your data store is DynamoDB, you can return a wrapped DynamoDB pagination token to your users.
You can also migrate data stores. Maybe you're moving from one that supports page indexes to one that doesn't. Now you don't need to break your API contract or introduce a new v2 API with different pagination.
Maybe your pages need an expensive query to a SQL DB. You don't want to have to repeat that query 10x for 10 pages and throw away 90% of the data each time. This makes your DBA sad.
Instead, on page 1 you retrieve the whole dataset and store it in Redis under a GUID. Now your pagination token can be that GUID + an offset into the Redis array. Now your pages are super speedy to load. You have a happy DBA.
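A rough Go sketch of that snapshot-plus-offset pattern. The token format and function names are mine, and the in-memory slice stands in for the Redis-cached result set:

```go
// Hypothetical sketch: run the expensive query once, stash the full result set
// in a cache (Redis in the thread) under a random snapshot ID, and make every
// later page just "snapshot ID + offset".
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type pageToken struct {
	SnapshotID string `json:"s"` // cache key holding the whole result set
	Offset     int    `json:"o"` // where the next page starts
}

func encodeToken(t pageToken) string {
	b, _ := json.Marshal(t)
	return base64.RawURLEncoding.EncodeToString(b) // opaque to callers
}

func decodeToken(s string) (t pageToken, err error) {
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err == nil {
		err = json.Unmarshal(b, &t)
	}
	return t, err
}

// nextPage serves one page from the cached snapshot and returns the token for
// the following page ("" once the snapshot is exhausted).
func nextPage(snapshot []string, token pageToken, pageSize int) ([]string, string) {
	end := token.Offset + pageSize
	if end > len(snapshot) {
		end = len(snapshot)
	}
	next := ""
	if end < len(snapshot) {
		next = encodeToken(pageToken{SnapshotID: token.SnapshotID, Offset: end})
	}
	return snapshot[token.Offset:end], next
}

func main() {
	// Pretend this came from the expensive SQL query on page 1 and was stored
	// in Redis under snapshot ID "a1b2c3".
	snapshot := []string{"row1", "row2", "row3", "row4", "row5"}
	rows, next := nextPage(snapshot, pageToken{SnapshotID: "a1b2c3", Offset: 0}, 2)
	fmt.Println(rows, next)
	tok, _ := decodeToken(next)
	rows, next = nextPage(snapshot, tok, 2)
	fmt.Println(rows, next)
}
```

Because the token is opaque to callers, you can swap this scheme out later without breaking the API contract.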
Also on pagination: your docs should make explicit that the number of results on a page can be ≤ ${pageSize}.

This means that you can have more predictable performance for your API. You can make sure a page is never more than 1MB of data.
Or that a page doesn't spill over to a second DB partition and take 15x longer to load than the previous page.
PS: don't forget to encrypt your pagination token, or users will reverse engineer the format and start crafting their own. Ideally using AEAD, so they can't tamper with it at all - and the encryption context should include the customer ID to avoid data leaks.
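Here's one way that could look, assuming AES-GCM as the AEAD and the customer ID as the additional authenticated data (the "encryption context"); key management is waved away:

```go
// Hypothetical sketch: seal the pagination token with AES-GCM (an AEAD) and
// bind it to the customer. A tampered token, or one replayed by a different
// customer, fails authentication at open time.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func sealToken(key, plaintext []byte, customerID string) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// The customer ID is the additional authenticated data: it isn't secret,
	// but it must match at open time, so one customer's token is useless to another.
	return gcm.Seal(nonce, nonce, plaintext, []byte(customerID)), nil
}

func openToken(key, sealed []byte, customerID string) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("token too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, []byte(customerID))
}

func main() {
	key := make([]byte, 32) // in real life this would be a managed key, e.g. from KMS
	rand.Read(key)
	sealed, _ := sealToken(key, []byte(`{"s":"a1b2c3","o":2}`), "c-acd223")
	_, err := openToken(key, sealed, "c-999999") // wrong customer: AEAD check fails
	fmt.Println(err)
}
```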
Anyway, that's all I can think of for now. Do you have any others that I missed?

I partially wrote this list for that sweet Twitter fame, but mostly so I have a reference for when my colleague @kmcquade3 asks why I'm implementing something a particular way.
Daniel had a really good bonus one: Access key IDs.

Having long-lived access key IDs in a regex-able format (e.g. AKIA[XX…]) lets you detect secret leaks.

GitHub even lets software vendors register their credential regexes and get webhooks on leaks == auto-revoke
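As a sketch of the detection side, here's a scan using the commonly cited AKIA pattern; treat the exact regex as an assumption rather than a spec:

```go
// Hypothetical sketch: long-lived credentials with a fixed, greppable prefix
// can be found with a simple regex over diffs, logs, pastes, etc.
package main

import (
	"fmt"
	"regexp"
)

// Commonly cited pattern for AWS access key IDs (assumed, not authoritative).
var accessKeyIDPattern = regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)

func findLeakedKeyIDs(text string) []string {
	return accessKeyIDPattern.FindAllString(text, -1)
}

func main() {
	diff := `+ aws_access_key_id = AKIAIOSFODNN7EXAMPLE`
	fmt.Println(findLeakedKeyIDs(diff)) // [AKIAIOSFODNN7EXAMPLE]
}
```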

