Popular interview question: how to diagnose a mysterious process thatโs taking too much CPU, memory, IO, etc?
The diagram below illustrates helpful tools in a Linux system.
๐นโvmstatโ - reports information about processes, memory, paging, block IO, traps, and CPU activity.
๐นโiostatโ - reports CPU and input/output statistics of the system.
๐นโnetstatโ - displays statistical data related to IP, TCP, UDP, and ICMP protocols.
๐นโlsofโ - lists open files of the current system.
๐นโpidstatโ - monitors the utilization of system resources by all or specified processes, including CPU, memory, device IO, task switching, threads, etc.
My book โSystem Design Interview - Volume 2โ will be available on Amazon in about 10 days. There will be no pre-order. If you are interested, please subscribe to our mailing list and I will send you an email when the book is live. Thank you.
My new book System Design Interview - An Insiderโs Guide (Volume 2) will be available on Amazon soon! It is a continuation of the system design interview book series.
Some stats about the book:
๐น 13 NEW real system design interviews with detailed solutions.
๐น 300+ diagrams to explain how different systems work.
๐น 400+ pages.
๐น took 1.5 years to make.
My co-author @sahnlam and I have spent countless nights and weekends on the book. Our goal is to make complex systems easy to understand.
Design stock exchange. Letโs trace the life of an order through various components in the diagram to see how the pieces fit together.
First, we follow the order through the trading flow. This is the critical path with strict latency requirements. Everything has to happen fast in the flow:
Step 1: A client places an order via the brokerโs web or mobile app.
Step 2: The broker sends the order to the exchange.
Step 3: The order enters the exchange through the client gateway. The client gateway performs basic gatekeeping functions such as input validation, rate limiting, authentication, normalization, etc. The client gateway then forwards the order to the order manager.
You probably heard about ๐๐๐๐ ๐. What is SWIFT? What role does it play in cross-border payments? Let's take a look.
The Society for Worldwide Interbank Financial Telecommunication (SWIFT) is the main secure ๐ฆ๐๐ฌ๐ฌ๐๐ ๐ข๐ง๐ ๐ฌ๐ฒ๐ฌ๐ญ๐๐ฆ that links the worldโs banks. 1/9
The Belgium-based system is run by its member banks and handles millions of payment messages per day. The diagram below illustrates how payment messages are transmitted from Bank A (in New York) to Bank B (in London). 2/9
Step 1: Bank A sends a message with transfer details to Regional Processor A in New York. The destination is Bank B. 3/9
In modern architecture, systems are broken up into small and independent building blocks with well-defined interfaces between them. Message queues provide communication and coordination for those building blocks. Today, letโs discuss at-most once, at-least once, and exactly once.
๐๐ญ-๐ฆ๐จ๐ฌ๐ญ ๐จ๐ง๐๐
As the name suggests, at-most once means a message will be delivered not more than once. Messages may be lost but are not redelivered. This is how at-most once delivery works at the high level.
Use cases: It is suitable for use cases like monitoring metrics, where a small amount of data loss is acceptable.
๐๐ญ-๐ฅ๐๐๐ฌ๐ญ ๐จ๐ง๐๐
With this data delivery semantic, itโs acceptable to deliver a message more than once, but no message should be lost.
In many large-scale applications, data is divided into partitions that can be accessed separately. There are two typical strategies for partitioning data.
๐น Vertical partitioning: it means some columns are moved to new tables. Each table contains the same number of rows but fewer columns (see diagram below).
Horizontal partitioning (often called sharding): divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows.
Horizontal partitioning is widely used so letโs take a closer look
A really cool technique thatโs commonly used in object storage such as S3 to improve durability is called ๐๐ซ๐๐ฌ๐ฎ๐ซ๐ ๐๐จ๐๐ข๐ง๐ . Letโs take a look at how it works. 1/7
Erasure coding deals with data durability differently from replication. It chunks data into smaller pieces and creates parities for redundancy. In the event of failures, we can use chunk data and parities to reconstruct the data. 4 + 2 erasure coding is shown in Figure 1. 2/7
1๏ธโฃ Data is broken up into four even-sized data chunks d1, d2, d3, and d4.
2๏ธโฃ The mathematical formula is used to calculate the parities p1 and p2. To give a much simplified example, p1 = d1 + 2*d2 - d3 + 4*d4 and p2 = -d1 + 5*d2 + d3 - 3*d4. 3/7