Alex Xu
Feb 16 • 11 tweets • 4 min read
Today, let’s design an S3-like object storage system.

Before we dive into the design, let’s define some terms. 1/11
Bucket. A logical container for objects. The bucket name is globally unique. To upload data to S3, we must first create a bucket. 2/11
Object. An object is an individual piece of data we store in a bucket. It contains object data (also called payload) and metadata. Object data can be any sequence of bytes we want to store. The metadata is a set of name-value pairs that describe the object. 3/11
An S3 object consists of (Figure 1):
🔹 Metadata. It is mutable and contains attributes such as ID, bucket name, object name, etc.
🔹 Object data. It is immutable and contains the actual data. 4/11
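As a rough illustration of the split between mutable metadata and immutable object data, here is a minimal Python sketch. The class name `StoredObject` and the metadata keys are hypothetical, chosen only for this example; they are not S3's actual data model.

```python
from typing import Dict


class StoredObject:
    """Hypothetical in-memory model: read-only payload, editable metadata."""

    def __init__(self, data: bytes, metadata: Dict[str, str]) -> None:
        self._data = bytes(data)        # payload: any sequence of bytes
        self.metadata = dict(metadata)  # name-value pairs describing the object

    @property
    def data(self) -> bytes:
        # No setter is defined, so the payload cannot be reassigned.
        return self._data


obj = StoredObject(
    data=b"echo hello",
    metadata={"bucket_name": "bucket-to-share", "object_name": "script.txt"},
)
obj.metadata["content_type"] = "text/plain"  # metadata can change in place
```

Trying `obj.data = b"other"` raises `AttributeError`, which mirrors the idea that changing the payload means storing a new object rather than editing the existing one.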
In S3, an object resides in a bucket. The path looks like this: /bucket-to-share/script.txt. The bucket only has metadata. The object has metadata and the actual data. 5/11
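To make the path layout concrete, the sketch below splits a path such as /bucket-to-share/script.txt into its bucket name and object name. The helper `split_object_path` is a hypothetical name for this example, and treating everything after the first separator as the object name is an assumption of the sketch.

```python
from typing import Tuple


def split_object_path(path: str) -> Tuple[str, str]:
    """Split '/<bucket>/<object>' into (bucket_name, object_name)."""
    # Drop the leading slash, then cut at the first remaining slash.
    bucket_name, sep, object_name = path.lstrip("/").partition("/")
    if not bucket_name or not sep or not object_name:
        raise ValueError(f"expected /<bucket>/<object>, got {path!r}")
    return bucket_name, object_name


bucket, name = split_object_path("/bucket-to-share/script.txt")
print(bucket, name)  # bucket-to-share script.txt
```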
The diagram below (Figure 2) illustrates how file uploading works. In this example, we first create a bucket named “bucket-to-share” and then upload a file named “script.txt” to the bucket. 6/11
1. The client sends an HTTP PUT request to create a bucket named “bucket-to-share”. The request is forwarded to the API service.

2. The API service calls Identity and Access Management (IAM) to ensure the user is authorized and has WRITE permission. 7/11
3. The API service calls the metadata store to create an entry with the bucket info in the metadata database. Once the entry is created, a success message is returned to the client. 8/11
4. After the bucket is created, the client sends an HTTP PUT request to create an object named “script.txt”.

5. The API service verifies the user’s identity and ensures the user has WRITE permission on the bucket. 9/11
6. Once validation succeeds, the API service sends the object data in the HTTP PUT payload to the data store. The data store persists the payload as an object and returns the UUID of the object. 10/11
7. The API service calls the metadata store to create a new entry in the metadata database. It contains important metadata such as the object_id (UUID), bucket_id (which bucket the object belongs to), object_name, etc. 11/11
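The seven steps above can be sketched with in-memory stand-ins for each component. This is only an illustration under stated assumptions: the class names (`IAM`, `DataStore`, `MetadataStore`, `ApiService`) and method names are hypothetical, permissions are reduced to a single per-user WRITE flag, and dictionaries stand in for the real metadata database and data store.

```python
import uuid
from typing import Dict, Set, Tuple


class IAM:
    """Stand-in for Identity and Access Management (assumed per-user WRITE flag)."""

    def __init__(self) -> None:
        self._writers: Set[str] = set()

    def grant_write(self, user: str) -> None:
        self._writers.add(user)

    def can_write(self, user: str) -> bool:
        return user in self._writers


class DataStore:
    """Persists payloads and returns a UUID for each one (step 6)."""

    def __init__(self) -> None:
        self._payloads: Dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        object_id = str(uuid.uuid4())
        self._payloads[object_id] = payload
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._payloads[object_id]


class MetadataStore:
    """Holds bucket entries (step 3) and object entries (step 7)."""

    def __init__(self) -> None:
        self.buckets: Dict[str, str] = {}  # bucket_name -> bucket_id
        # (bucket_id, object_name) -> metadata entry
        self.objects: Dict[Tuple[str, str], Dict[str, str]] = {}


class ApiService:
    def __init__(self, iam: IAM, metadata: MetadataStore, data: DataStore) -> None:
        self.iam = iam
        self.metadata = metadata
        self.data = data

    def put_bucket(self, user: str, bucket_name: str) -> str:
        # Step 2: check authorization before touching any store.
        if not self.iam.can_write(user):
            raise PermissionError(f"{user} lacks WRITE permission")
        # Bucket names are globally unique, so reject duplicates.
        if bucket_name in self.metadata.buckets:
            raise ValueError(f"bucket name {bucket_name!r} is already taken")
        # Step 3: record the bucket in the metadata store.
        bucket_id = str(uuid.uuid4())
        self.metadata.buckets[bucket_name] = bucket_id
        return bucket_id

    def put_object(self, user: str, bucket_name: str,
                   object_name: str, payload: bytes) -> str:
        # Step 5: verify the user may write.
        if not self.iam.can_write(user):
            raise PermissionError(f"{user} lacks WRITE permission")
        bucket_id = self.metadata.buckets[bucket_name]  # KeyError if no such bucket
        # Step 6: persist the payload; the data store returns the object's UUID.
        object_id = self.data.put(payload)
        # Step 7: create the metadata entry that links name, bucket, and payload.
        self.metadata.objects[(bucket_id, object_name)] = {
            "object_id": object_id,
            "bucket_id": bucket_id,
            "object_name": object_name,
        }
        return object_id


iam = IAM()
iam.grant_write("alice")
api = ApiService(iam, MetadataStore(), DataStore())
api.put_bucket("alice", "bucket-to-share")               # steps 1-3
object_id = api.put_object("alice", "bucket-to-share",
                           "script.txt", b"echo hello")  # steps 4-7
```

Note that the payload is written to the data store before the metadata entry exists, matching the order of steps 6 and 7. A real system would also need to handle a failure between those two writes, which this sketch does not attempt.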

