It's fully managed, highly available & scales on demand with low latency.
To get you hooked: at Prime Day 2021, DynamoDB served 89.2 million requests/second at its peak.
"I've learned other services by exploration and trial & error"
That's totally legit and often works out.
But DynamoDB is different.
You'll save yourself a lot of pain in the future if you dive deep in the beginning!
You can choose between two capacity modes, and you can also switch between them at any time:
• provisioned - you specify the capacity units for your table & are billed for them
• on-demand - you pay per request
Which one should you pick?
Go with On-Demand if your traffic is unpredictable, as it scales on demand and you only pay for what you actually use.
With a steady load or known patterns, pick Provisioned, as it's almost 7 times less expensive!
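A rough back-of-envelope check of that factor (the prices are assumptions based on typical us-east-1 list prices; verify against current AWS pricing):

```python
# Assumed us-east-1 prices - check the current AWS pricing page.
ON_DEMAND_PER_MILLION_WRITES = 1.25   # USD per 1M on-demand write requests
PROVISIONED_WCU_PER_HOUR = 0.00065    # USD per WCU-hour

# 1 WCU sustains 1 write/second. At full utilization, writing 1M items
# with a single WCU takes 1M seconds:
hours_needed = 1_000_000 / 3600
provisioned_cost = hours_needed * PROVISIONED_WCU_PER_HOUR
on_demand_cost = ON_DEMAND_PER_MILLION_WRITES

ratio = on_demand_cost / provisioned_cost
print(f"provisioned: ${provisioned_cost:.2f}, on-demand: ${on_demand_cost:.2f}, ratio: {ratio:.1f}x")
```

The catch: the provisioned figure assumes you actually use the capacity you pay for; idle provisioned capacity still costs money.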
Your traffic patterns can still vary with Provisioned capacity, as you can create an auto-scaling configuration based on CloudWatch metrics to increase/decrease your capacities!
If you're still on the free tier (your account was created less than 1 year ago) and you only have low traffic / few tables, stick to Provisioned, as the free tier includes 25 read & write capacity units!
In comparison to SQL, a document in DynamoDB doesn't have a fixed schema.
The table only defines the primary key, which uniquely identifies each document.
A document can also have other attributes with different types.
Primary Keys
It's your unique identifier & must be provided when inserting a new item.
There are two different types of primary keys:
• simple - a single field; also your partition key
• composite - built from your partition and range key
Internally, DynamoDB consists of different partitions where your items are stored.
Your partition key is run through a hash function whose result determines the partition.
A good partition key should distribute items equally.
Why is that important?
Your provisioned read & write capacity units will be distributed among partitions.
If your items are not well-distributed, it's much easier to get your requests throttled, as you'll end up with hot partitions (receiving high load).
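A small sketch of the idea, using MD5 as a stand-in for DynamoDB's internal (undisclosed) hash function:

```python
import hashlib
from collections import Counter

def partition_for(partition_key: str, num_partitions: int = 4) -> int:
    # DynamoDB uses its own internal hash; MD5 here is purely illustrative.
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# High-cardinality keys (e.g. user IDs) spread load across partitions...
spread = Counter(partition_for(f"user-{i}") for i in range(1000))

# ...while a low-cardinality key (e.g. a status flag) hits one partition:
hot = Counter(partition_for(s) for s in ["ACTIVE"] * 1000)
print(spread, hot)
```

The second case is exactly the hot-partition scenario: 1000 writes all land on the same partition and eat its share of the throughput.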
Range Keys
Besides having only a partition key as your primary key, you can use a composite key.
It additionally spans the range key (the combination of partition + range key has to be unique).
This brings a lot of benefits, as the range key can be used with expressions.
That's where it gets interesting and you see differences to SQL or other NoSQL solutions.
You can only query on indexes: your partition key & range key, if there is one.
Everything else requires a Scan.
How do queries and scans differ?
A scan just runs through your whole table, looking for items that match your expression.
You'll be billed for the items that are scanned, not the items that are retrieved.
With a query, you only pay for the retrieved items.
It only looks for items in a specific partition.
So generally speaking: a query is way faster and cheaper!
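To make the difference concrete, here's a sketch of the low-level request parameters for both operations, assuming a hypothetical Orders table with partition key customerId and range key orderDate:

```python
# Query: targets a single partition via the key condition -> fast & cheap.
query_params = {
    "TableName": "Orders",
    "KeyConditionExpression": "customerId = :c AND orderDate >= :d",
    "ExpressionAttributeValues": {
        ":c": {"S": "customer-123"},
        ":d": {"S": "2021-01-01"},
    },
}

# Scan: walks the whole table and filters afterwards -> you're billed for
# every item scanned, even the ones the filter drops.
scan_params = {
    "TableName": "Orders",
    "FilterExpression": "orderDate >= :d",
    "ExpressionAttributeValues": {":d": {"S": "2021-01-01"}},
}
```

These dicts can be passed to a boto3 DynamoDB client's `query` and `scan` calls respectively.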
Often, race conditions are possible when multiple processes access the same document concurrently.
Example:
• Process 1 reads Document A
• Process 2 reads Document A
• Process 1 writes Document A
• Process 2 writes Document A
We'll lose our first write!
DynamoDB has you covered via versions.
With a DynamoDB data mapper, you can dedicate a field as a version indicator.
Each update increments its value.
Internally, condition expressions are used to check that the version matches the expected one!
We're ensuring that there are no intermediate writes, which would increase our version number.
An intermediate write causes a ConditionalCheckFailedException.
We can catch it & then handle the conflict.
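A minimal in-memory sketch of this optimistic-locking pattern (the store and all names are made up for illustration; in real code the check runs inside DynamoDB via a condition expression):

```python
class ConditionalCheckFailedException(Exception):
    """Mimics the error DynamoDB raises when a condition expression fails."""

# Hypothetical in-memory "table" with a version attribute per document.
store = {"A": {"payload": "v0", "version": 1}}

def update(doc_id: str, payload: str, expected_version: int) -> None:
    current = store[doc_id]
    if current["version"] != expected_version:
        raise ConditionalCheckFailedException(f"expected version {expected_version}")
    # Write succeeds and bumps the version.
    store[doc_id] = {"payload": payload, "version": expected_version + 1}

# Both processes read version 1; the second write is rejected.
update("A", "from-process-1", expected_version=1)
try:
    update("A", "from-process-2", expected_version=1)
except ConditionalCheckFailedException:
    print("conflict detected - re-read the document and retry")
```

Instead of silently losing the first write, the second writer gets an exception and can re-read & retry.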
Example: Condition Expressions check for conditions that have to be met before an update is applied to a document.
Build these with the known comparators: equals (=), greater than (>), or greater than or equal (>=).
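For instance, a sketch of UpdateItem parameters that decrement stock only while stock remains, using the >= comparator (table and attribute names are assumptions):

```python
# Only succeeds while at least one item is in stock; otherwise DynamoDB
# rejects the update with a ConditionalCheckFailedException.
update_params = {
    "TableName": "Products",
    "Key": {"productId": {"S": "p-1"}},
    "UpdateExpression": "SET stock = stock - :one",
    "ConditionExpression": "stock >= :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}},
}
```

Passed to a boto3 client's `update_item`, this prevents the stock counter from ever going negative, even under concurrent writes.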
Indexes
As we've learned, you can only query on your partition and range keys.
But that can't be everything?
You're right: you can create indexes, which specify alternative key structures.
Those can also be used to query your items.
There are two different types of indexes:
• local (Local Secondary Index - LSI) - must have the same hash/partition key, but an alternative range key
• global (Global Secondary Index - GSI) - partition & range key can both be different
Both allow a more flexible query structure.
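Querying a secondary index looks like a normal query plus an IndexName, sketched here against a hypothetical statusIndex GSI:

```python
# Query a hypothetical GSI "statusIndex" (partition key: status) instead of
# the table's own primary key - only IndexName is new compared to a
# regular query.
gsi_query = {
    "TableName": "Orders",
    "IndexName": "statusIndex",
    "KeyConditionExpression": "#s = :open",
    # "status" is a DynamoDB reserved word, hence the alias.
    "ExpressionAttributeNames": {"#s": "status"},
    "ExpressionAttributeValues": {":open": {"S": "OPEN"}},
}
```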
More things to know about secondary indexes:
• there's no uniqueness requirement for the primary keys of your secondary indexes
• the attributes for your secondary index are optional
• the number of secondary indexes is limited per table:
  • LSI: 5
  • GSI: 20
Also, you can specify which attributes are projected into your secondary index:
• KEYS_ONLY - only the (underlying) keys
• ALL - the full item
• INCLUDE - only specific fields
Put thought into this.
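Here's what an INCLUDE projection could look like inside a GSI definition (all names and capacity numbers are hypothetical):

```python
# GSI definition fragment, as you'd pass it in a CreateTable /
# UpdateTable request via boto3.
gsi_definition = {
    "IndexName": "statusIndex",
    "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
    "Projection": {
        "ProjectionType": "INCLUDE",
        # Besides the keys, only these attributes are copied into the index:
        "NonKeyAttributes": ["customerId", "total"],
    },
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

Projecting fewer attributes keeps the index smaller and cheaper to write to, but a query needing an unprojected attribute has to fetch it from the base table.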
Streams
DynamoDB Streams are another great feature, allowing you to invoke other services when items are created, updated, or deleted.
Example: forward data to Elasticsearch via Lambda!
This also allows you to manipulate or filter items!
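A sketch of what such a stream-consuming Lambda handler might look like; `forward_to_search` stands in for whatever target you forward to, and the event shape mirrors DynamoDB stream records:

```python
# Forward only newly inserted items from a DynamoDB stream event.
def handler(event, forward_to_search):
    forwarded = []
    for record in event["Records"]:
        if record["eventName"] == "INSERT":       # filter: creations only
            item = record["dynamodb"]["NewImage"]
            forward_to_search(item)               # e.g. index into Elasticsearch
            forwarded.append(item)
    return forwarded

# Exercise the handler with a fake stream event (no AWS needed).
fake_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"pk": {"S": "1"}}}},
    {"eventName": "REMOVE", "dynamodb": {"OldImage": {"pk": {"S": "2"}}}},
]}
sent = handler(fake_event, forward_to_search=lambda item: None)
```

The REMOVE record is filtered out, which is exactly the manipulate/filter hook mentioned above.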
Security
DynamoDB tables are encrypted by default with KMS.
You can also choose a customer-managed key (CMK), which you are in control of.
As with other AWS services, access is fully covered via IAM.
Backups
DynamoDB is a managed service and brings its own redundancy, but this does not protect you from your own mistakes.
That's why you have to keep backups.
The easiest and cheapest option is to trigger on-demand backups regularly.
Just create a Lambda function that triggers backups on your table via the aws-sdk.
Add an EventBridge rule which invokes your function regularly.
Enabling PITR (point-in-time recovery) for your table allows you to restore it to any state within the last 35 days. It comes with additional costs.
The first two options do not protect you from table drops.
That's why you should also export your data to S3.
This is a feature directly offered by DynamoDB for PITR-enabled tables.
Automate it via Lambda & EventBridge rules.
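A sketch of such a backup routine; the DynamoDB client is passed in so the function works with any boto3 client (or a stub in tests), and all names are assumptions:

```python
from datetime import datetime, timezone

def backup_table(client, table_name: str) -> str:
    # Timestamped backup name so repeated runs don't collide.
    backup_name = f"{table_name}-{datetime.now(timezone.utc):%Y-%m-%d-%H%M}"
    # create_backup is the real boto3 DynamoDB API for on-demand backups.
    client.create_backup(TableName=table_name, BackupName=backup_name)
    return backup_name

# Stub client so the sketch runs without AWS credentials.
class StubClient:
    def __init__(self):
        self.calls = []
    def create_backup(self, **kwargs):
        self.calls.append(kwargs)

stub = StubClient()
name = backup_table(stub, "Orders")
```

Wire `backup_table` into a Lambda handler and let an EventBridge schedule rule invoke it, and you have the cheap backup loop described above.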
Global Tables
It's likely that you'll want your infrastructure distributed around the globe for redundancy and lower latencies.
With DynamoDB, you can have synchronized tables in different regions.
Regardless of whether you're using on-demand or provisioned capacity, you should always know what's going on: how much capacity is used, are there throttling events or spiking latencies, or is everything operating as expected?
CloudWatch offers a great set of metrics to get a glance at your tables.
You'll see:
• used read capacity units
• used write capacity units
• throttles
You can configure alerts on throttles or when certain RCU/WCU thresholds are crossed.
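Such an alert could be sketched as `put_metric_alarm` parameters like these (alarm name, table name, and threshold are assumptions):

```python
# Alarm on any write throttling on the hypothetical "Orders" table;
# pass this dict to a boto3 CloudWatch client's put_metric_alarm call.
alarm_params = {
    "AlarmName": "Orders-WriteThrottleEvents",
    "Namespace": "AWS/DynamoDB",
    "MetricName": "WriteThrottleEvents",   # real DynamoDB CloudWatch metric
    "Dimensions": [{"Name": "TableName", "Value": "Orders"}],
    "Statistic": "Sum",
    "Period": 60,                          # evaluate per minute
    "EvaluationPeriods": 1,
    "Threshold": 1,                        # fire on the first throttle
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
}
```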
Third-party tools like @thedashbird help you monitor your DynamoDB tables and give you well-architected recommendations.
This helps you find and fix errors & anomalies.
As part of the Dashbird team: send me a DM with any question!
Credits
The content is mostly inspired by The DynamoDB Guide by @alexbdebrie!
Have a look & deep dive into DynamoDB, you won't regret it.
Alex does a great job of explaining concepts in detail.
My recommended resource for working with DynamoDB in a professional context:
The DynamoDB Book, also by @alexbdebrie.
It will likely save you a lot more money than it costs.
I'm still in the early stages & have already learned a lot of lessons.
Launch early
Maybe you've got another dozen ideas for features you think are needed for your MVP.
But until you've launched and you've got actual (paying) users, you've got no guarantee that your business case is even valid.
This intersects with the previous point: don't chase the shiniest code, 100% code coverage, and the perfect architecture, as it requires way too much effort.
Don't over- or under-do it.
Make it work & keep it manageable.
It guarantees you won't miss out on new features or services, and it also contains interesting statistics and other insights from AWS itself.
It gets updated very regularly, sometimes several times a day.
If you're focusing on keeping up with the new capabilities AWS provides, it's your major source.
You'll learn about improvements to existing services, introductions of new ones, as well as region expansions.
A physical server, utilized only by you:
• you have to know or guess the CPU & memory capacity you need
• high risk of overpaying (underutilized server) or under-provisioning (too much load)
• you can run multiple apps, but need to make sure they don't conflict over shared resources
• you're solely responsible for security
• scaling up or down is tedious & not quickly possible
The concepts are crucial & being confident in them is a necessity.
From basics to advanced concepts.
If you're seriously working with AWS, there's no way around IAM.
Skipping over its core principles will bite you again and again in the future.
Take the time to do a deep dive, so you won't be frustrated later.