Alex Xu

@alexxubyte

Feb 16 • 11 tweets • 4 min read

Today, let’s design an S3-like object storage system.

Before we dive into the design, let’s define some terms. 1/11

𝐁𝐮𝐜𝐤𝐞𝐭. A logical container for objects. The bucket name is globally unique. To upload data to S3, we must first create a bucket. 2/11

𝐎𝐛𝐣𝐞𝐜𝐭. An object is an individual piece of data we store in a bucket. It contains object data (also called payload) and metadata. Object data can be any sequence of bytes we want to store. The metadata is a set of name-value pairs that describe the object. 3/11

An S3 object consists of (Figure 1):
🔹 Metadata. It is mutable and contains attributes such as ID, bucket name, object name, etc.
🔹 Object data. It is immutable and contains the actual data. 4/11

In S3, an object resides in a bucket. The path looks like this: /bucket-to-share/script.txt. The bucket only has metadata. The object has metadata and the actual data. 5/11
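The terms above can be sketched as a small data model. This is only an illustration: the class names `Bucket` and `S3Object` and the helper `split_path` are hypothetical and are not part of any S3 SDK. The payload is stored as `bytes`, which Python treats as immutable, while the metadata is an ordinary mutable dictionary of name-value pairs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Bucket:
    """A logical container. It carries metadata only, never object data."""
    name: str  # assumed to be globally unique
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class S3Object:
    """An object: an immutable payload plus mutable name-value metadata."""
    bucket_name: str
    object_name: str
    data: bytes  # a bytes value cannot be modified in place
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def path(self) -> str:
        # Path layout from the thread: /<bucket name>/<object name>
        return f"/{self.bucket_name}/{self.object_name}"


def split_path(path: str) -> Tuple[str, str]:
    """Split '/bucket/object' into (bucket_name, object_name)."""
    bucket_name, _, object_name = path.lstrip("/").partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"expected /<bucket>/<object>, got {path!r}")
    return bucket_name, object_name
```

Creating `S3Object("bucket-to-share", "script.txt", b"...")` and reading its `path` gives the path format shown above. Updating `metadata` afterwards leaves `data` untouched.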

The diagram below (Figure 2) illustrates how file uploading works. In this example, we first create a bucket named “bucket-to-share” and then upload a file named “script.txt” to the bucket. 6/11

1. The client sends an HTTP PUT request to create a bucket named “bucket-to-share.” The request is forwarded to the API service.

2. The API service calls Identity and Access Management (IAM) to ensure the user is authorized and has WRITE permission. 7/11

3. The API service calls the metadata store to create an entry with the bucket info in the metadata database. Once the entry is created, a success message is returned to the client. 8/11

4. After the bucket is created, the client sends an HTTP PUT request to create an object named “script.txt”.

5. The API service verifies the user’s identity and ensures the user has WRITE permission on the bucket. 9/11

6. Once validation succeeds, the API service sends the object data in the HTTP PUT payload to the data store. The data store persists the payload as an object and returns the UUID of the object. 10/11

7. The API service calls the metadata store to create a new entry in the metadata database. The entry contains important metadata such as the object_id (UUID), bucket_id (which bucket the object belongs to), object_name, etc. 11/11
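The seven steps above can be sketched with in-memory stand-ins for each component. Everything here is an assumption made for illustration: the class and method names (`IAM`, `DataStore`, `MetadataStore`, `APIService`, `put_bucket`, `put_object`) are hypothetical, permissions are granted per user rather than per bucket, and Python dictionaries stand in for the real metadata database and data store.

```python
import uuid
from typing import Dict, Set, Tuple


class PermissionDenied(Exception):
    """Raised when the toy IAM rejects a request."""


class IAM:
    """Toy IAM holding (user, permission) grants. Not the AWS API."""

    def __init__(self) -> None:
        self._grants: Set[Tuple[str, str]] = set()

    def grant(self, user: str, permission: str) -> None:
        self._grants.add((user, permission))

    def authorize(self, user: str, permission: str) -> None:
        if (user, permission) not in self._grants:
            raise PermissionDenied(f"{user} lacks {permission} permission")


class DataStore:
    """Persists each payload and returns the UUID assigned to it."""

    def __init__(self) -> None:
        self._payloads: Dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        object_id = str(uuid.uuid4())
        self._payloads[object_id] = payload
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._payloads[object_id]


class MetadataStore:
    """Holds bucket entries and object entries, but no payloads."""

    def __init__(self) -> None:
        self.buckets: Dict[str, str] = {}  # bucket_name -> bucket_id
        # (bucket_id, object_name) -> metadata entry
        self.objects: Dict[Tuple[str, str], Dict[str, str]] = {}

    def create_bucket(self, bucket_name: str) -> str:
        if bucket_name in self.buckets:
            raise ValueError(f"bucket {bucket_name!r} already exists")
        bucket_id = str(uuid.uuid4())
        self.buckets[bucket_name] = bucket_id
        return bucket_id

    def create_object(self, bucket_id: str, object_name: str, object_id: str) -> None:
        self.objects[(bucket_id, object_name)] = {
            "object_id": object_id,
            "bucket_id": bucket_id,
            "object_name": object_name,
        }


class APIService:
    """Orchestrates IAM, the metadata store, and the data store."""

    def __init__(self, iam: IAM, metadata: MetadataStore, data: DataStore) -> None:
        self.iam = iam
        self.metadata = metadata
        self.data = data

    def put_bucket(self, user: str, bucket_name: str) -> str:
        self.iam.authorize(user, "WRITE")                 # step 2
        return self.metadata.create_bucket(bucket_name)   # step 3

    def put_object(self, user: str, bucket_name: str,
                   object_name: str, payload: bytes) -> str:
        self.iam.authorize(user, "WRITE")                 # step 5
        bucket_id = self.metadata.buckets[bucket_name]    # KeyError if no such bucket
        object_id = self.data.put(payload)                # step 6
        self.metadata.create_object(bucket_id, object_name, object_id)  # step 7
        return object_id
```

In this sketch the payload is written to the data store before the metadata entry exists, which matches the order of steps 6 and 7. A real system would also need to handle a failure between those two steps, which is out of scope here.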
