In many large-scale applications, data is divided into partitions that can be accessed separately. There are two typical strategies for partitioning data.
๐น Vertical partitioning: it means some columns are moved to new tables. Each table contains the same number of rows but fewer columns (see diagram below).
Horizontal partitioning (often called sharding): divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows.
Horizontal partitioning is widely used so letโs take a closer look
๐๐จ๐ฎ๐ญ๐ข๐ง๐ ๐๐ฅ๐ ๐จ๐ซ๐ข๐ญ๐ก๐ฆ
The routing algorithm decides which partition (shard) stores the data.
๐น Range-based sharding. This algorithm uses ordered columns, such as integers, longs, timestamps, to separate the rows. For example, the diagram below uses the User ID column for range partition: User IDs 1 and 2 are in shard 1, User IDs 3 and 4 are in shard 2.
๐น Hash-based sharding. This algorithm applies a hash function to one column or several columns to decide which row goes to which table. For example, the diagram below uses ๐๐ฌ๐๐ซ ๐๐ ๐ฆ๐จ๐ 2 as a hash function. User IDs 1 and 3 are in shard 1, User IDs 2 and 4 are in shard 2
๐๐๐ง๐๐๐ข๐ญ๐ฌ
๐น Facilitate horizontal scaling. Sharding facilitates the possibility of adding more machines to spread out the load.
๐น Shorten response time. By sharding one table into multiple tables, queries go over fewer rows, and results are returned much more quickly.
๐๐ซ๐๐ฐ๐๐๐๐ค๐ฌ
๐น The order by operation is more complicated. Usually, we need to fetch data from different shards and sort the data in the application's code.
๐น Uneven distribution. Some shards may contain more data than others (this is also called the hotspot).
This topic is very big and Iโm sure I missed a lot of important details. What else do you think is important for data partitioning?
โข โข โข
Missing some Tweet in this thread? You can try to
force a refresh
You probably heard about ๐๐๐๐ ๐. What is SWIFT? What role does it play in cross-border payments? Let's take a look.
The Society for Worldwide Interbank Financial Telecommunication (SWIFT) is the main secure ๐ฆ๐๐ฌ๐ฌ๐๐ ๐ข๐ง๐ ๐ฌ๐ฒ๐ฌ๐ญ๐๐ฆ that links the worldโs banks. 1/9
The Belgium-based system is run by its member banks and handles millions of payment messages per day. The diagram below illustrates how payment messages are transmitted from Bank A (in New York) to Bank B (in London). 2/9
Step 1: Bank A sends a message with transfer details to Regional Processor A in New York. The destination is Bank B. 3/9
In modern architecture, systems are broken up into small and independent building blocks with well-defined interfaces between them. Message queues provide communication and coordination for those building blocks. Today, letโs discuss at-most once, at-least once, and exactly once.
๐๐ญ-๐ฆ๐จ๐ฌ๐ญ ๐จ๐ง๐๐
As the name suggests, at-most once means a message will be delivered not more than once. Messages may be lost but are not redelivered. This is how at-most once delivery works at the high level.
Use cases: It is suitable for use cases like monitoring metrics, where a small amount of data loss is acceptable.
๐๐ญ-๐ฅ๐๐๐ฌ๐ญ ๐จ๐ง๐๐
With this data delivery semantic, itโs acceptable to deliver a message more than once, but no message should be lost.
A really cool technique thatโs commonly used in object storage such as S3 to improve durability is called ๐๐ซ๐๐ฌ๐ฎ๐ซ๐ ๐๐จ๐๐ข๐ง๐ . Letโs take a look at how it works. 1/7
Erasure coding deals with data durability differently from replication. It chunks data into smaller pieces and creates parities for redundancy. In the event of failures, we can use chunk data and parities to reconstruct the data. 4 + 2 erasure coding is shown in Figure 1. 2/7
1๏ธโฃ Data is broken up into four even-sized data chunks d1, d2, d3, and d4.
2๏ธโฃ The mathematical formula is used to calculate the parities p1 and p2. To give a much simplified example, p1 = d1 + 2*d2 - d3 + 4*d4 and p2 = -d1 + 5*d2 + d3 - 3*d4. 3/7
Today, letโs design an S3 like object storage system.
Before we dive into the design, letโs define some terms. 1/11
๐๐ฎ๐๐ค๐๐ญ. A logical container for objects. The bucket name is globally unique. To upload data to S3, we must first create a bucket. 2/11
๐๐๐ฃ๐๐๐ญ. An object is an individual piece of data we store in a bucket. It contains object data (also called payload) and metadata. Object data can be any sequence of bytes we want to store. The metadata is a set of name-value pairs that describe the object. 3/11
I'm the author of the best-selling book System Design Interview-An Insiderโs Guide. 11 days ago, two fraudsters hijacked the "Buy Now" button on Amazon, fulfilling all orders with a different book. I'm helpless to do anything. A sad story on self-publishing: a thread.
How do I know Amazon fulfills pirated copies? I clicked on the โBuy Nowโ button and bought them. One had similar content but with a different layout and was printed on inferior quality paper. My book has 309 pages: the pirated one only 276 pages and a completely different ISBN.
How bad is the issue? I estimate between 60%-80% of the copies sold in the past 11 days are pirated books fulfilled by Amazon. You can see the โBuy Nowโ button hijacking in action here: amzn.to/3tX4r4b
One picture is worth more than a thousand words. In this thread, we will take a look at what happens when Alice sends an email to Bob.1/5
1. Alice logs in to her Outlook client, composes an email, and presses โsendโ. The email is sent to the Outlook mail server. The communication protocol between the Outlook client and mail server is SMTP.2/5
2. Outlook mail server queries the DNS (not shown in the diagram) to find the address of the recipientโs SMTP server. In this case, it is Gmailโs SMTP server. Next, it transfers the email to the Gmail mail server. The communication protocol between the mail servers is SMTP.3/5