In modern architecture, systems are broken up into small, independent building blocks with well-defined interfaces between them. Message queues provide communication and coordination for those building blocks. Today, let’s discuss three message delivery semantics: at-most once, at-least once, and exactly once.
𝐀𝐭-𝐦𝐨𝐬𝐭 𝐨𝐧𝐜𝐞
As the name suggests, at-most once means a message is delivered no more than once. Messages may be lost, but they are never redelivered. That is how at-most once delivery works at a high level.
Use cases: It is suitable for use cases like monitoring metrics, where a small amount of data loss is acceptable.
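A minimal sketch of at-most once in Python, assuming a simulated lossy network: the producer sends each message one time and never retries. The function name `send_at_most_once`, the `drop_rate` parameter, and the sample metric names are hypothetical, made up for illustration.

```python
import random

def send_at_most_once(messages, drop_rate, rng):
    """Send each message a single time; a dropped message is never retried."""
    delivered = []
    for msg in messages:
        # Simulated network: the message survives with probability 1 - drop_rate.
        if rng.random() >= drop_rate:
            delivered.append(msg)
        # No acknowledgement and no retry, so a dropped message is lost for good.
    return delivered

rng = random.Random(42)  # fixed seed so the simulation is repeatable
metrics = [f"cpu_sample_{i}" for i in range(10)]
received = send_at_most_once(metrics, drop_rate=0.3, rng=rng)
print(f"sent {len(metrics)}, received {len(received)}")
```

Some samples may go missing, but no sample ever shows up twice, which is the trade-off a metrics pipeline can usually live with.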
𝐀𝐭-𝐥𝐞𝐚𝐬𝐭 𝐨𝐧𝐜𝐞
With this data delivery semantic, it’s acceptable to deliver a message more than once, but no message should be lost.
Use cases: With at-least once, messages won’t be lost but the same message might be delivered multiple times.
While not ideal from a user perspective, at-least once delivery semantics are usually good enough for use cases where data duplication is not a big issue or deduplication is possible on the consumer side. For example, if each message carries a unique key, the consumer can detect a duplicate and reject it.
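Consumer-side deduplication could look roughly like this. It is a sketch under assumptions, not how any particular queue does it: the producer retries until it sees an acknowledgement, acks are lost at random, and the consumer remembers the unique keys it has already handled. `DedupConsumer`, `send_at_least_once`, and `ack_loss_rate` are hypothetical names.

```python
import random

class DedupConsumer:
    """Processes each unique message key once and rejects repeats."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.processed.append(payload)
        return True

def send_at_least_once(consumer, messages, ack_loss_rate, rng):
    """Redeliver each message until an ack arrives; returns total deliveries."""
    deliveries = 0
    for message_id, payload in messages:
        while True:
            consumer.handle(message_id, payload)
            deliveries += 1
            # Simulated ack loss: the consumer got the message, but the
            # producer cannot know that, so it sends the message again.
            if rng.random() >= ack_loss_rate:
                break
    return deliveries

rng = random.Random(7)
orders = [(f"order-{i}", f"payload-{i}") for i in range(5)]
consumer = DedupConsumer()
total = send_at_least_once(consumer, orders, ack_loss_rate=0.5, rng=rng)
print(f"{total} deliveries, {len(consumer.processed)} messages processed")
```

Nothing is lost, and the duplicates caused by retries are filtered out by the key check. In a real system the set of seen keys would need to be persisted and bounded, which this sketch skips.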
𝐄𝐱𝐚𝐜𝐭𝐥𝐲 𝐨𝐧𝐜𝐞
Exactly once is the most difficult delivery semantic to implement. It is friendly to users, but it has a high cost for the system’s performance and complexity.
Use cases: Financial-related use cases (payment, trading, accounting, etc.). Exactly once is especially important when duplication is not acceptable and the downstream service or third party doesn’t support idempotency.
Question: what is the difference between message queues and event streaming platforms such as Kafka, Apache Pulsar, etc.?
You have probably heard of 𝐒𝐖𝐈𝐅𝐓. What is SWIFT? What role does it play in cross-border payments? Let's take a look.
The Society for Worldwide Interbank Financial Telecommunication (SWIFT) is the main secure 𝐦𝐞𝐬𝐬𝐚𝐠𝐢𝐧𝐠 𝐬𝐲𝐬𝐭𝐞𝐦 that links the world’s banks. 1/9
The Belgium-based system is run by its member banks and handles millions of payment messages per day. The diagram below illustrates how payment messages are transmitted from Bank A (in New York) to Bank B (in London). 2/9
Step 1: Bank A sends a message with transfer details to Regional Processor A in New York. The destination is Bank B. 3/9
In many large-scale applications, data is divided into partitions that can be accessed separately. There are two typical strategies for partitioning data.
🔹 Vertical partitioning: it means some columns are moved to new tables. Each table contains the same number of rows but fewer columns (see diagram below).
🔹 Horizontal partitioning (often called sharding): it divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows.
Horizontal partitioning is widely used, so let’s take a closer look.
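To make the two strategies concrete, here is a small sketch on an in-memory table. The `users` rows, the column groups, and the modulo routing rule for shards are all assumptions chosen for illustration; real systems pick a partition key and routing scheme to fit their data.

```python
users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "bio": "Likes queues"},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "bio": "Likes shards"},
    {"id": 3, "name": "Carol", "email": "carol@example.com", "bio": "Likes parity"},
    {"id": 4, "name": "Dave", "email": "dave@example.com", "bio": "Likes email"},
]

def vertical_partition(rows, key, column_groups):
    """Move column groups into separate tables; every table keeps all rows."""
    return [
        [{col: row[col] for col in (key, *cols)} for row in rows]
        for cols in column_groups
    ]

def horizontal_partition(rows, key, num_shards):
    """Spread rows across shards; every shard keeps all columns."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        # Assumed routing rule: integer key modulo the number of shards.
        shards[row[key] % num_shards].append(row)
    return shards

core_table, profile_table = vertical_partition(users, "id", [("name", "email"), ("bio",)])
shards = horizontal_partition(users, "id", num_shards=2)

print(len(core_table), len(profile_table))  # same row count, fewer columns each
print([len(s) for s in shards])             # fewer rows each, same columns
```

The vertical split keeps the key column in both tables so rows can be joined back together. The horizontal split leaves every row intact and only decides which store it lives in.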
A really cool technique that’s commonly used in object storage such as S3 to improve durability is called 𝐄𝐫𝐚𝐬𝐮𝐫𝐞 𝐂𝐨𝐝𝐢𝐧𝐠. Let’s take a look at how it works. 1/7
Erasure coding deals with data durability differently from replication. It chunks data into smaller pieces and creates parities for redundancy. In the event of failures, we can use chunk data and parities to reconstruct the data. 4 + 2 erasure coding is shown in Figure 1. 2/7
1️⃣ Data is broken up into four even-sized data chunks d1, d2, d3, and d4.
2️⃣ A mathematical formula is used to calculate the parities p1 and p2. To give a much simplified example, p1 = d1 + 2*d2 - d3 + 4*d4 and p2 = -d1 + 5*d2 + d3 - 3*d4. 3/7
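The simplified formulas above can be tried out directly. The sketch below treats each chunk as a plain integer (an assumption for illustration; production erasure codes work over finite fields rather than ordinary integers) and shows one failure case: chunks d3 and d4 are lost and get rebuilt from d1, d2, p1, and p2. The function names are hypothetical.

```python
def encode(d1, d2, d3, d4):
    """Compute the two parities using the simplified formulas from the thread."""
    p1 = d1 + 2 * d2 - d3 + 4 * d4
    p2 = -d1 + 5 * d2 + d3 - 3 * d4
    return p1, p2

def recover_d3_d4(d1, d2, p1, p2):
    """Rebuild d3 and d4 from the surviving chunks and both parities."""
    # Move the known terms to the right-hand side of each parity equation:
    #   -d3 + 4*d4 = p1 - d1 - 2*d2   (a)
    #    d3 - 3*d4 = p2 + d1 - 5*d2   (b)
    a = p1 - d1 - 2 * d2
    b = p2 + d1 - 5 * d2
    d4 = a + b        # adding (a) and (b) cancels d3
    d3 = b + 3 * d4   # substitute d4 back into (b)
    return d3, d4

d1, d2, d3, d4 = 1, 2, 3, 4
p1, p2 = encode(d1, d2, d3, d4)
print(p1, p2)                          # 18 0
print(recover_d3_d4(d1, d2, p1, p2))   # (3, 4)
```

Two parities give two equations, so any two unknowns can be solved for. That is why a 4 + 2 scheme tolerates the loss of up to two pieces; the other loss combinations need their own, similar algebra.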
Today, let’s design an S3-like object storage system.
Before we dive into the design, let’s define some terms. 1/11
𝐁𝐮𝐜𝐤𝐞𝐭. A logical container for objects. The bucket name is globally unique. To upload data to S3, we must first create a bucket. 2/11
𝐎𝐛𝐣𝐞𝐜𝐭. An object is an individual piece of data we store in a bucket. It contains object data (also called payload) and metadata. Object data can be any sequence of bytes we want to store. The metadata is a set of name-value pairs that describe the object. 3/11
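As a rough mental model of these two terms, here is a tiny in-memory sketch. It is not the S3 API: `ObjectStore`, `StoredObject`, the method names, and the use of a string `key` to name an object inside a bucket are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes                                   # the payload: any sequence of bytes
    metadata: dict = field(default_factory=dict)  # name-value pairs describing it

class ObjectStore:
    """Toy store: bucket name -> {object key -> StoredObject}."""

    def __init__(self):
        self.buckets = {}

    def create_bucket(self, name):
        # Bucket names are unique across the whole store.
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self.buckets[name] = {}

    def put_object(self, bucket, key, data, metadata=None):
        # A bucket must exist before anything can be uploaded into it.
        if bucket not in self.buckets:
            raise KeyError(f"no such bucket: {bucket!r}")
        self.buckets[bucket][key] = StoredObject(data, dict(metadata or {}))

    def get_object(self, bucket, key):
        return self.buckets[bucket][key]

store = ObjectStore()
store.create_bucket("photos")
store.put_object("photos", "cat.jpg", b"\xff\xd8", {"content-type": "image/jpeg"})
print(store.get_object("photos", "cat.jpg").metadata)
```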
I'm the author of the best-selling book System Design Interview-An Insider’s Guide. 11 days ago, two fraudsters hijacked the "Buy Now" button on Amazon, fulfilling all orders with a different book. I'm helpless to do anything. A sad story on self-publishing: a thread.
How do I know Amazon fulfills pirated copies? I clicked on the “Buy Now” button and bought them. One had similar content but with a different layout and was printed on inferior quality paper. My book has 309 pages; the pirated one has only 276 pages and a completely different ISBN.
How bad is the issue? I estimate between 60%-80% of the copies sold in the past 11 days are pirated books fulfilled by Amazon. You can see the “Buy Now” button hijacking in action here: amzn.to/3tX4r4b
One picture is worth more than a thousand words. In this thread, we will take a look at what happens when Alice sends an email to Bob. 1/5
1. Alice logs in to her Outlook client, composes an email, and presses “send”. The email is sent to the Outlook mail server. The communication protocol between the Outlook client and mail server is SMTP. 2/5
2. The Outlook mail server queries the DNS (not shown in the diagram) to find the address of the recipient’s SMTP server. In this case, it is Gmail’s SMTP server. Next, it transfers the email to the Gmail mail server. The communication protocol between the mail servers is SMTP. 3/5
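The first part of step 2 can be sketched with Python's standard `email` package: build the message, then pull out the recipient's domain, which is what the sending server would look up in DNS (the mail exchanger, or MX, record). The addresses and helper names are hypothetical, and the DNS query and the SMTP transfer are left out because they need network access. A real server also works from the SMTP envelope recipient rather than the `To` header; the header is used here only to keep the sketch short.

```python
from email.message import EmailMessage
from email.utils import parseaddr

def build_email(sender, recipient, subject, body):
    """Compose a simple plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def recipient_domain(address):
    """Return the domain part of an address such as 'Bob <bob@gmail.com>'."""
    _, addr = parseaddr(address)
    return addr.rpartition("@")[2].lower()

msg = build_email("Alice <alice@outlook.com>", "Bob <bob@gmail.com>", "Hello", "Hi Bob!")
domain = recipient_domain(str(msg["To"]))
print(f"look up the mail server for: {domain}")
```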