It's fully managed, highly available & scales on demand with low latency.
To get you hooked: at Prime Day 2021, DynamoDB served 89.2 million requests/second at its peak.
"I've learned other services by exploration and trial & error"
That's totally legit and often works out.
But DynamoDB is different.
You'll save yourself a lot of pain in the future if you dive deep in the beginning!
You can choose between two capacity modes, and you can also switch between them at any time:
• provisioned - you specify the capacity units for your table & are billed for them
• on-demand - you pay per request
Which one should you pick?
Go with On-Demand if your traffic is unpredictable, as it scales on demand and you only pay for what you actually use.
With a steady load or known patterns, pick Provisioned, as it's almost 7 times less expensive!
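A rough back-of-envelope check of that factor (the prices are assumptions based on typical us-east-1 list prices; verify against current AWS pricing):

```python
# Assumed us-east-1 prices - check the current AWS pricing page.
ON_DEMAND_PER_MILLION_WRITES = 1.25   # USD per 1M on-demand write requests
PROVISIONED_WCU_PER_HOUR = 0.00065    # USD per WCU-hour

# 1 WCU sustains 1 write/second. At full utilization, writing 1M items
# with a single WCU takes 1M seconds:
hours_needed = 1_000_000 / 3600
provisioned_cost = hours_needed * PROVISIONED_WCU_PER_HOUR
on_demand_cost = ON_DEMAND_PER_MILLION_WRITES

ratio = on_demand_cost / provisioned_cost
print(f"provisioned: ${provisioned_cost:.2f}, on-demand: ${on_demand_cost:.2f}, ratio: {ratio:.1f}x")
```

The catch: the provisioned figure assumes you actually use the capacity you pay for; idle provisioned capacity still costs money.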
Your traffic patterns can still vary with Provisioned capacity, as you can create an auto-scaling configuration based on CloudWatch metrics to increase/decrease your capacities!
If you're still on the free tier (your account was created less than 1 year ago) and you only have low traffic / few tables, stick to Provisioned, as the free tier includes 25 read & write capacity units!
In comparison to SQL, a document in DynamoDB doesn't have a fixed schema.
The table only defines the primary key, which uniquely identifies each document.
A document can also have other attributes with different types.
Primary Keys
It's your unique identifier & must be provided when inserting a new item.
There are two different types of primary keys:
• simple - a single field; also your partition key
• composite - built from your partition and range key
Internally, DynamoDB consists of different partitions where your items are stored.
Your partition key is run through a hash function whose result determines the partition.
A good partition key should distribute items equally.
Why is that important?
Your provisioned read & write capacity units will be distributed among partitions.
If your items are not well-distributed, it's much easier to get your requests throttled, as you'll end up with hot partitions (receiving high load).
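A small sketch of the idea, using MD5 as a stand-in for DynamoDB's internal (undisclosed) hash function:

```python
import hashlib
from collections import Counter

def partition_for(partition_key: str, num_partitions: int = 4) -> int:
    # DynamoDB uses its own internal hash; MD5 here is purely illustrative.
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# High-cardinality keys (e.g. user IDs) spread load across partitions...
spread = Counter(partition_for(f"user-{i}") for i in range(1000))

# ...while a low-cardinality key (e.g. a status flag) hits one partition:
hot = Counter(partition_for(s) for s in ["ACTIVE"] * 1000)
print(spread, hot)
```

The second case is exactly the hot-partition scenario: 1000 writes all land on the same partition and eat its share of the throughput.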
Range Keys
Besides having only a partition key as your primary key, you can use a composite key.
It additionally spans the range key (the combination of partition + range key has to be unique).
This brings a lot of benefits, as the range key can be used with expressions.
That's where it gets interesting and you see differences to SQL or other NoSQL solutions.
You can only query on indexes: your partition key & range key, if there is one.
Everything else requires a Scan.
How do queries and scans differ?
A scan just runs through your whole table, looking for items that match your expression.
You'll be billed for the items that are scanned, not the items that are retrieved.
With a query, you only pay for the retrieved items.
It only looks for items in a specific partition.
So generally speaking: a query is way faster and cheaper!
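To make the difference concrete, here's a sketch of the low-level request parameters for both operations, assuming a hypothetical Orders table with partition key customerId and range key orderDate:

```python
# Query: targets a single partition via the key condition -> fast & cheap.
query_params = {
    "TableName": "Orders",
    "KeyConditionExpression": "customerId = :c AND orderDate >= :d",
    "ExpressionAttributeValues": {
        ":c": {"S": "customer-123"},
        ":d": {"S": "2021-01-01"},
    },
}

# Scan: walks the whole table and filters afterwards -> you're billed for
# every item scanned, even the ones the filter drops.
scan_params = {
    "TableName": "Orders",
    "FilterExpression": "orderDate >= :d",
    "ExpressionAttributeValues": {":d": {"S": "2021-01-01"}},
}
```

These dicts can be passed to a boto3 DynamoDB client's `query` and `scan` calls respectively.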
Often, race conditions are possible when multiple processes access the same document concurrently.
Example:
• Process 1 reads Document A
• Process 2 reads Document A
• Process 1 writes Document A
• Process 2 writes Document A
We'll lose our first write!
DynamoDB has you covered via versions.
With a DynamoDB data mapper, you can dedicate a field as a version indicator.
Each update increments its value.
Internally, condition expressions are used to check that the version matches the expected one!
We're ensuring that there are no intermediate writes, which would increase our version number.
An intermediate write causes a ConditionalCheckFailedException.
We can catch it & then handle the conflict.
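A minimal in-memory sketch of this optimistic-locking pattern (the store and all names are made up for illustration; in real code the check runs inside DynamoDB via a condition expression):

```python
class ConditionalCheckFailedException(Exception):
    """Mimics the error DynamoDB raises when a condition expression fails."""

# Hypothetical in-memory "table" with a version attribute per document.
store = {"A": {"payload": "v0", "version": 1}}

def update(doc_id: str, payload: str, expected_version: int) -> None:
    current = store[doc_id]
    if current["version"] != expected_version:
        raise ConditionalCheckFailedException(f"expected version {expected_version}")
    # Write succeeds and bumps the version.
    store[doc_id] = {"payload": payload, "version": expected_version + 1}

# Both processes read version 1; the second write is rejected.
update("A", "from-process-1", expected_version=1)
try:
    update("A", "from-process-2", expected_version=1)
except ConditionalCheckFailedException:
    print("conflict detected - re-read the document and retry")
```

Instead of silently losing the first write, the second writer gets an exception and can re-read & retry.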
Example: Condition Expressions check for conditions that have to be met before an update is applied to a document.
Build these with the known comparators: equals (=), greater than (>), or greater than or equal (>=).
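For instance, a sketch of UpdateItem parameters that decrement stock only while stock remains, using the >= comparator (table and attribute names are assumptions):

```python
# Only succeeds while at least one item is in stock; otherwise DynamoDB
# rejects the update with a ConditionalCheckFailedException.
update_params = {
    "TableName": "Products",
    "Key": {"productId": {"S": "p-1"}},
    "UpdateExpression": "SET stock = stock - :one",
    "ConditionExpression": "stock >= :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}},
}
```

Passed to a boto3 client's `update_item`, this prevents the stock counter from ever going negative, even under concurrent writes.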
Indexes
As we've learned, you can only query on your partition and range keys.
But that can't be everything?
You're right: you can create indexes, which specify alternative key structures.
Those can also be used to query your items.
There are two different types of indexes:
• local (Local Secondary Index - LSI) - must have the same hash/partition key, but an alternative range key
• global (Global Secondary Index - GSI) - partition & range key can both be different
Both allow a more flexible query structure.
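Querying a secondary index looks like a normal query plus an IndexName, sketched here against a hypothetical statusIndex GSI:

```python
# Query a hypothetical GSI "statusIndex" (partition key: status) instead of
# the table's own primary key - only IndexName is new compared to a
# regular query.
gsi_query = {
    "TableName": "Orders",
    "IndexName": "statusIndex",
    "KeyConditionExpression": "#s = :open",
    # "status" is a DynamoDB reserved word, hence the alias.
    "ExpressionAttributeNames": {"#s": "status"},
    "ExpressionAttributeValues": {":open": {"S": "OPEN"}},
}
```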
More things to know about secondary indexes:
• there's no uniqueness requirement for the primary keys of your secondary indexes
• the attributes for your secondary index are optional
• the number of secondary indexes is limited per table:
  • LSI: 5
  • GSI: 20
Also, you can specify which attributes are projected into your secondary index:
• KEYS_ONLY - only the (underlying) keys
• ALL - the full item
• INCLUDE - only specific fields
Put thought into this.
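Here's what an INCLUDE projection could look like inside a GSI definition (all names and capacity numbers are hypothetical):

```python
# GSI definition fragment, as you'd pass it in a CreateTable /
# UpdateTable request via boto3.
gsi_definition = {
    "IndexName": "statusIndex",
    "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
    "Projection": {
        "ProjectionType": "INCLUDE",
        # Besides the keys, only these attributes are copied into the index:
        "NonKeyAttributes": ["customerId", "total"],
    },
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

Projecting fewer attributes keeps the index smaller and cheaper to write to, but a query needing an unprojected attribute has to fetch it from the base table.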
Streams
DynamoDB Streams are another great feature, allowing you to invoke other services when items are created, updated, or deleted.
Example: forward data to Elasticsearch via Lambda!
This also allows you to manipulate or filter items!
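A sketch of what such a stream-consuming Lambda handler might look like; `forward_to_search` stands in for whatever target you forward to, and the event shape mirrors DynamoDB stream records:

```python
# Forward only newly inserted items from a DynamoDB stream event.
def handler(event, forward_to_search):
    forwarded = []
    for record in event["Records"]:
        if record["eventName"] == "INSERT":       # filter: creations only
            item = record["dynamodb"]["NewImage"]
            forward_to_search(item)               # e.g. index into Elasticsearch
            forwarded.append(item)
    return forwarded

# Exercise the handler with a fake stream event (no AWS needed).
fake_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"pk": {"S": "1"}}}},
    {"eventName": "REMOVE", "dynamodb": {"OldImage": {"pk": {"S": "2"}}}},
]}
sent = handler(fake_event, forward_to_search=lambda item: None)
```

The REMOVE record is filtered out, which is exactly the manipulate/filter hook mentioned above.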
Security
DynamoDB tables are encrypted by default with KMS.
You can also choose a customer-managed key (CMK), which you are in control of.
As with other AWS services, access is fully covered via IAM.
Backups
DynamoDB is a managed service and brings its own redundancy, but this does not protect you from your own mistakes.
That's why you have to keep backups.
The easiest and cheapest option is to trigger on-demand backups regularly.
Just create a Lambda function that triggers backups on your table via the aws-sdk.
Add an EventBridge rule which invokes your function regularly.
Enabling PITR (point-in-time recovery) for your table allows you to restore it to any state within the last 35 days. It comes with additional costs.
The first two options do not protect you from table drops.
That's why you should also export your data to S3.
This is a feature directly offered by DynamoDB for PITR-enabled tables.
Automate it via Lambda & EventBridge rules.
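A sketch of such a backup routine; the DynamoDB client is passed in so the function works with any boto3 client (or a stub in tests), and all names are assumptions:

```python
from datetime import datetime, timezone

def backup_table(client, table_name: str) -> str:
    # Timestamped backup name so repeated runs don't collide.
    backup_name = f"{table_name}-{datetime.now(timezone.utc):%Y-%m-%d-%H%M}"
    # create_backup is the real boto3 DynamoDB API for on-demand backups.
    client.create_backup(TableName=table_name, BackupName=backup_name)
    return backup_name

# Stub client so the sketch runs without AWS credentials.
class StubClient:
    def __init__(self):
        self.calls = []
    def create_backup(self, **kwargs):
        self.calls.append(kwargs)

stub = StubClient()
name = backup_table(stub, "Orders")
```

Wire `backup_table` into a Lambda handler and let an EventBridge schedule rule invoke it, and you have the cheap backup loop described above.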
Global Tables
It's likely that you'll want your infrastructure distributed around the globe for redundancy and lower latencies.
With DynamoDB, you can have synchronized tables in different regions.
Regardless of whether you're using on-demand or provisioned capacity, you should always know what's going on: how much capacity is used, are there throttling events or spiking latencies, or is everything operating as expected?
CloudWatch offers a great set of metrics to get a glance at your tables.
You'll see:
• used read capacity units
• used write capacity units
• throttles
You can configure alerts on throttles or when certain RCU/WCU thresholds are crossed.
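Such an alert could be sketched as `put_metric_alarm` parameters like these (alarm name, table name, and threshold are assumptions):

```python
# Alarm on any write throttling on the hypothetical "Orders" table;
# pass this dict to a boto3 CloudWatch client's put_metric_alarm call.
alarm_params = {
    "AlarmName": "Orders-WriteThrottleEvents",
    "Namespace": "AWS/DynamoDB",
    "MetricName": "WriteThrottleEvents",   # real DynamoDB CloudWatch metric
    "Dimensions": [{"Name": "TableName", "Value": "Orders"}],
    "Statistic": "Sum",
    "Period": 60,                          # evaluate per minute
    "EvaluationPeriods": 1,
    "Threshold": 1,                        # fire on the first throttle
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
}
```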
Third-party tools like @thedashbird help you monitor your DynamoDB tables and give you well-architected recommendations.
This helps you find and fix errors & anomalies.
As part of the Dashbird team: send me a DM with any question!
Credits
The content is mostly inspired by The DynamoDB Guide by @alexbdebrie!
Have a look & deep dive into DynamoDB, you won't regret it.
Alex does a great job of explaining concepts in detail.
My recommended resource for working with DynamoDB in a professional context:
The DynamoDB Book, also by @alexbdebrie.
It will likely save you a lot more money than it costs.
I'm still in the early stages & have already learned a lot of lessons.
Launch early
Maybe you've got another dozen ideas for features you think are needed for your MVP.
But until you've launched and you've got actual (paying) users, you've got no guarantee that your business case is even valid.
This intersects with the previous point: don't chase the shiniest code, 100% code coverage, and the perfect architecture, as it requires way too much effort.
Don't over- or under-do it.
Make it work & keep it manageable.
It guarantees you won't miss out on new features or services, and it also contains interesting statistics and other insights from AWS itself.
It gets updated very regularly, sometimes several times a day.
If you're focusing on keeping up with the new capabilities AWS provides, it's your major source.
You'll learn about improvements to existing services, introductions of new ones, as well as region expansions.
A physical server, utilized only by you:
• you have to know or guess the CPU & memory capacity you need
• high risk of overpaying (underutilized server) or under-provisioning (too much load)
• you can run multiple apps, but need to make sure they don't conflict over shared resources
• you're solely responsible for security
• scaling up or down is tedious & not quickly possible
The concepts are crucial & being confident in them is a necessity.
From basics to advanced concepts.
If you're seriously working with AWS, there's no way around IAM.
Skipping over its core principles will bite you again and again in the future.
Take the time to do a deep dive, so you won't be frustrated later.