A really cool technique that's commonly used in object storage such as S3 to improve durability is called Erasure Coding. Let's take a look at how it works. 1/7
Erasure coding deals with data durability differently from replication. It chunks data into smaller pieces and creates parities for redundancy. In the event of failures, we can use the surviving chunks and parities to reconstruct the lost data. 4 + 2 erasure coding is shown in Figure 1. 2/7
1️⃣ Data is broken up into four even-sized data chunks d1, d2, d3, and d4.
2️⃣ A mathematical formula is used to calculate the parities p1 and p2. To give a much simplified example, p1 = d1 + 2*d2 - d3 + 4*d4 and p2 = -d1 + 5*d2 + d3 - 3*d4. 3/7
3️⃣ Data chunks d3 and d4 are lost due to node crashes.
4️⃣ The same formula is used to reconstruct the lost chunks d3 and d4 from the known values of d1, d2, p1, and p2. 4/7
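The four steps above can be sketched in a few lines of Python. This is only a toy that uses the simplified integer formulas from the thread; production systems typically use Reed-Solomon codes over finite fields rather than plain integer arithmetic. The function names `encode_parities` and `recover_d3_d4` are made up for illustration.

```python
def encode_parities(d1, d2, d3, d4):
    """Compute the two parities with the simplified formulas from the thread."""
    p1 = d1 + 2 * d2 - d3 + 4 * d4
    p2 = -d1 + 5 * d2 + d3 - 3 * d4
    return p1, p2


def recover_d3_d4(d1, d2, p1, p2):
    """Rebuild d3 and d4 by solving the two parity equations for them."""
    a = p1 - d1 - 2 * d2   # equals -d3 + 4*d4
    b = p2 + d1 - 5 * d2   # equals  d3 - 3*d4
    d4 = a + b             # adding the two equations cancels d3
    d3 = b + 3 * d4
    return d3, d4


# Toy data: each "chunk" is a single integer.
d1, d2, d3, d4 = 1, 2, 3, 4
p1, p2 = encode_parities(d1, d2, d3, d4)   # (18, 0)
print(recover_d3_d4(d1, d2, p1, p2))       # (3, 4)
```

Two unknowns and two independent equations are what make the recovery possible; losing a third chunk would leave the system underdetermined.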
How much extra space does erasure coding need? For every two chunks of data, we need one parity block, so the storage overhead is 50% (Figure 2). In 3-copy replication, by contrast, the storage overhead is 200% (Figure 2). 5/7
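The overhead figures follow from simple ratios. A minimal sketch, assuming overhead is defined as extra storage divided by the size of the original data (the helper names are illustrative):

```python
def erasure_coding_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return parity_chunks / data_chunks


def replication_overhead(copies: int) -> float:
    """Every copy beyond the first is pure overhead."""
    return float(copies - 1)


print(f"{erasure_coding_overhead(4, 2):.0%}")  # 50%
print(f"{replication_overhead(3):.0%}")        # 200%
```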
Does erasure coding increase data durability? Let's assume a node has a 0.81% annual failure rate. According to the calculation done by Backblaze, erasure coding can achieve 11 nines of durability, whereas 3-copy replication achieves 6 nines. 6/7
What other techniques do you think are important to improve the scalability and durability of an object store such as S3? 7/7
Today, let's design an S3-like object storage system.
Before we dive into the design, let's define some terms. 1/11
Bucket. A logical container for objects. The bucket name is globally unique. To upload data to S3, we must first create a bucket. 2/11
Object. An object is an individual piece of data we store in a bucket. It contains object data (also called payload) and metadata. Object data can be any sequence of bytes we want to store. The metadata is a set of name-value pairs that describe the object. 3/11
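The two terms map naturally onto a small in-memory model. The sketch below is an assumption-laden illustration, not how S3 is implemented: the class names, the per-object `key`, and the dictionary-based storage are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    payload: bytes                                            # any sequence of bytes
    metadata: Dict[str, str] = field(default_factory=dict)   # name-value pairs


class BucketRegistry:
    """Toy registry that enforces globally unique bucket names."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, StoredObject]] = {}

    def create_bucket(self, name: str) -> None:
        if name in self._buckets:
            raise ValueError(f"bucket name already taken: {name}")
        self._buckets[name] = {}

    def put_object(self, bucket: str, key: str, payload: bytes,
                   metadata: Optional[Dict[str, str]] = None) -> None:
        # A bucket must exist before anything can be uploaded into it.
        if bucket not in self._buckets:
            raise KeyError(f"no such bucket: {bucket}")
        self._buckets[bucket][key] = StoredObject(payload, dict(metadata or {}))

    def get_object(self, bucket: str, key: str) -> StoredObject:
        return self._buckets[bucket][key]


registry = BucketRegistry()
registry.create_bucket("demo-bucket")
registry.put_object("demo-bucket", "hello.txt", b"hello",
                    {"content-type": "text/plain"})
```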
I'm the author of the best-selling book System Design Interview-An Insider's Guide. 11 days ago, two fraudsters hijacked the "Buy Now" button on Amazon, fulfilling all orders with a different book. I'm helpless to do anything. A sad story on self-publishing: a thread.
How do I know Amazon fulfills pirated copies? I clicked on the "Buy Now" button and bought them. One had similar content but a different layout, and was printed on inferior-quality paper. My book has 309 pages; the pirated one has only 276 pages and a completely different ISBN.
How bad is the issue? I estimate that 60% to 80% of the copies sold in the past 11 days are pirated books fulfilled by Amazon. You can see the "Buy Now" button hijacking in action here: amzn.to/3tX4r4b
One picture is worth more than a thousand words. In this thread, we will take a look at what happens when Alice sends an email to Bob. 1/5
1. Alice logs in to her Outlook client, composes an email, and presses "send". The email is sent to the Outlook mail server. The communication protocol between the Outlook client and mail server is SMTP. 2/5
2. The Outlook mail server queries the DNS (not shown in the diagram) to find the address of the recipient's SMTP server. In this case, it is Gmail's SMTP server. Next, it transfers the email to the Gmail mail server. The communication protocol between the mail servers is SMTP. 3/5
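Step 1, where Alice's client hands the message to its mail server over SMTP, might look roughly like this with Python's standard `smtplib` and `email` modules. The addresses, host, port, and credentials are placeholders, and the helper names `compose` and `submit` are invented for this sketch; `submit` is defined but never called here, since it needs a real server.

```python
import smtplib
from email.message import EmailMessage


def compose(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build the message that the client will hand to its mail server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg: EmailMessage, host: str, port: int,
           username: str, password: str) -> None:
    """Send the message to the sender's own mail server over SMTP."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()                  # upgrade the connection to TLS
        server.login(username, password)
        server.send_message(msg)


# Placeholder addresses; nothing is sent at this point.
message = compose("alice@outlook.example", "bob@gmail.example",
                  "Hello", "Hi Bob, this is Alice.")
```

The server-to-server hop in step 2 uses the same protocol; the difference is that the sending server first looks up the recipient domain's mail server in DNS.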
Metrics collection is a popular system design interview question. There are two ways metrics data can be collected, pull or push. It is a routine debate. In this post, we will take a look at the pull model. 1/8
Figure 1 shows data collection with a pull model over HTTP. We have dedicated metric collectors which pull metrics values from the running applications periodically. 2/8
In this approach, the metrics collector needs to know the complete list of service endpoints to pull data from. One naive approach is to use a file to hold DNS/IP information for every service endpoint on the "metric collector" servers. 3/8
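A minimal sketch of the naive file-based approach, assuming the file lists one endpoint URL per line and that each service exposes its metrics as plain text over HTTP. The file format, the `/metrics` path in the example, and all function names are assumptions for illustration.

```python
from typing import Callable, Dict, List
from urllib.request import urlopen


def load_endpoints(text: str) -> List[str]:
    """Parse one endpoint URL per line, skipping blank lines and # comments."""
    endpoints = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            endpoints.append(line)
    return endpoints


def http_fetch(url: str) -> str:
    """Pull the raw metrics text from one endpoint."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def pull_once(endpoints: List[str],
              fetch: Callable[[str], str] = http_fetch) -> Dict[str, str]:
    """Pull every endpoint once; record an error string for unreachable ones."""
    results = {}
    for url in endpoints:
        try:
            results[url] = fetch(url)
        except OSError as exc:             # URLError is a subclass of OSError
            results[url] = f"error: {exc}"
    return results


# Hypothetical endpoint file contents.
endpoint_file = """
# web tier
http://10.0.0.1:8080/metrics
http://10.0.0.2:8080/metrics
"""
endpoints = load_endpoints(endpoint_file)
```

A real collector would call `pull_once` on a timer. The weakness of the static file is visible here: every time a service instance is added or removed, the file on each collector has to be updated.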
In a payment system, it's very important to separate information flow and fund flow. 1/6
In the diagram below, we have three layers:
- Transaction layer: where the online purchases happen
- Payment and clearing layer: where the payment instructions and transaction netting happen
- Settlement layer: where the actual money movement happens 2/6
The first two layers are called information flow, and the settlement layer is called fund flow. 3/6
One picture is worth more than a thousand words. This is what happens under the hood when you buy a product using PayPal or a bank card. 1/8
To understand this, we need to digest two concepts: clearing & settlement. Clearing is a process that calculates who should pay whom, and how much; settlement is a process where real money moves between reserves in the settlement bank. 2/8
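The clearing calculation can be pictured as netting: summing up all payment instructions so that each party ends with a single amount it owes or is owed. The sketch below is a deliberately simplified view with hypothetical parties and amounts in cents; real clearing involves many more rules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def net_positions(payments: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """Net a list of (payer, payee, amount_cents) instructions.

    A negative result means the party owes money at settlement time;
    a positive result means it is owed money.
    """
    positions = defaultdict(int)
    for payer, payee, amount_cents in payments:
        positions[payer] -= amount_cents
        positions[payee] += amount_cents
    return dict(positions)


# Hypothetical instructions between three banks.
instructions = [
    ("bank_a", "bank_b", 100),
    ("bank_b", "bank_a", 30),
    ("bank_b", "bank_c", 50),
]
print(net_positions(instructions))
# {'bank_a': -70, 'bank_b': 20, 'bank_c': 50}
```

Only these net amounts need to move during settlement, which is why the information flow can be processed separately from the fund flow.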
Let's say Bob wants to buy an SDI book from Claire's shop on Amazon.
- Pay-in flow (Bob pays Amazon money): 1.1 Bob buys a book on Amazon using PayPal. 1.2 Amazon issues a money transfer request to PayPal. 3/8