/1 How do you decide which type of database to use?
There are hundreds or even thousands of databases and storage systems available today, such as Oracle, MySQL, MariaDB, SQLite, PostgreSQL, Redis, ClickHouse, MongoDB, S3, and Ceph. How do you choose the right one for your system's architecture?
/2 My short summary is as follows:
🔹Relational database. They can handle almost any workload.
🔹In-memory store. Their speed and limited data size make them ideal for fast operations.
🔹Time-series database. They store and manage time-stamped data.
/3 🔹Graph database. They are suitable for modeling complex relationships between unstructured objects.
🔹Document store. They are good for large immutable data.
🔹Wide column store. They are typically used for big data, analytics, and reporting workloads that need denormalized data.
/4 👉 Over to you: Obviously, I did not cover every type of database. Is there anything else you often use, and why do you choose it?
Elasticsearch is a search engine built on the Lucene library. It provides distributed, multitenant-capable full-text search behind an HTTP web interface. The diagram below shows the outline.
/2 Features of Elasticsearch:
🔹 Real-time full-text search
🔹 Analytics engine
🔹 Distributed Lucene
Elasticsearch use cases:
🔹 Product search on an eCommerce website
🔹 Log analysis
🔹 Autocomplete and spell checking
🔹 Business intelligence analysis
/3 🔹 Full-text search on Stack Overflow
The core of Elasticsearch lies in its data structures and indexing. It is important to understand how ES builds the 𝐭𝐞𝐫𝐦 𝐝𝐢𝐜𝐭𝐢𝐨𝐧𝐚𝐫𝐲 using an 𝐋𝐒𝐌 𝐓𝐫𝐞𝐞 (Log-Structured Merge Tree).
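To make the indexing idea concrete, here is a minimal Python sketch of an inverted index: a sorted term dictionary that maps each term to the IDs of the documents containing it. It is a toy under stated assumptions (whitespace tokenization, everything held in an in-memory dict), not how Lucene or an LSM tree is actually implemented. The names `tokenize`, `build_index`, and `search` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a real analyzer does far more)."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Keep terms in sorted order, as a term dictionary would.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Return IDs of documents containing every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical product catalog.
docs = {
    1: "red running shoes",
    2: "blue running jacket",
    3: "red rain jacket",
}
index = build_index(docs)
print(search(index, "red jacket"))  # [3]
print(search(index, "running"))     # [1, 2]
```

The lookup cost depends on the number of query terms and the size of their posting lists, not on the total number of documents, which is why this structure suits full-text search.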
/1 𝐓𝐢𝐦𝐞-𝐒𝐞𝐫𝐢𝐞𝐬 𝐃𝐁 (TSDB) in 20 lines. What is a 𝐓𝐢𝐦𝐞-𝐒𝐞𝐫𝐢𝐞𝐬 𝐃𝐁 (TSDB)? How is it 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 from a relational DB?
/2 A couple of weeks ago, I had a great discussion with the CEO of TDengine @jhtao about time-series databases. This sparked my interest in learning more about this topic. The diagram below shows the 𝐢𝐧𝐭𝐞𝐫𝐧𝐚𝐥 𝐝𝐚𝐭𝐚 𝐦𝐨𝐝𝐞𝐥 of a typical Time-Series DB.
/3 A TSDB is a database optimized for time series data.
🔹 From the users’ perspective, the data looks like a relational DB table. But behind the scenes, the weather table is stored in 4 TSMs (Time-Structured Merge Trees), each keyed in the format [Measurement, Tag, Field Name].
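Here is a rough Python sketch of that split: rows that look like one relational table get regrouped into separate time-ordered series keyed by (measurement, tag, field name). The sample rows, the `location` tag, and the `to_series` helper are my own assumptions for illustration, not the data from the diagram or any real TSDB's storage format.

```python
from collections import defaultdict

# Hypothetical rows of a "weather" table: (timestamp, location tag, fields).
rows = [
    (1000, "us-east", {"temperature": 21.5, "humidity": 0.40}),
    (1000, "us-west", {"temperature": 18.0, "humidity": 0.55}),
    (1060, "us-east", {"temperature": 21.7, "humidity": 0.41}),
]


def to_series(measurement, rows):
    """Group points by (measurement, tag, field name), ordered by time."""
    series = defaultdict(list)
    for timestamp, location, fields in rows:
        for field_name, value in fields.items():
            key = (measurement, f"location={location}", field_name)
            series[key].append((timestamp, value))
    for points in series.values():
        points.sort()  # time-ordered within each series
    return dict(series)


series = to_series("weather", rows)
for key, points in sorted(series.items()):
    print(key, points)
```

With two tag values and two fields, this sample produces four series. Each one holds only timestamps and values for a single field, which is what makes range scans and compression cheap.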
Kubernetes (k8s) is a container orchestration system used for container deployment and management. Its design was heavily influenced by Google’s internal system Borg.
/2 A k8s cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node. [1]
/3 The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. [1]
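The relationship between the control plane, nodes, and Pods can be sketched with a toy model in Python. This is only an illustration: the `Node` and `ControlPlane` classes and the "fewest Pods wins" rule are assumptions I made up, and the real scheduler weighs resource requests, affinity rules, and much more.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker machine that hosts Pods."""
    name: str
    pods: list = field(default_factory=list)


@dataclass
class ControlPlane:
    """Toy control plane that tracks nodes and places Pods on them."""
    nodes: list

    def schedule(self, pod_name):
        """Place a Pod on the node currently running the fewest Pods."""
        if not self.nodes:
            raise RuntimeError("a cluster needs at least one worker node")
        target = min(self.nodes, key=lambda node: len(node.pods))
        target.pods.append(pod_name)
        return target.name


cluster = ControlPlane(nodes=[Node("node-a"), Node("node-b")])
placements = {pod: cluster.schedule(pod) for pod in ["web-1", "web-2", "db-1"]}
print(placements)
```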
1/ What are the differences between monolithic and microservice architecture?
The diagram compares monolithic and microservice architecture in an ideal world.
2/ Suppose we have an eCommerce website that needs to handle the functions below:
🔹 User Management
🔹 Procurement Management
🔹 Order Management
🔹 Inventory Management
🔹 Payments
🔹 Logistics
3/ In a monolithic architecture, all the components are deployed in one single instance. Service calls happen within the same process, so no RPCs are needed. The data tables for each component are usually stored in the same database.
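A small Python sketch of the monolithic case: the order component calls the inventory component with a plain function call, and both write to the same database in one local transaction. The table names, the `reserve_stock` and `place_order` functions, and the use of an in-memory SQLite database are all assumptions for the sake of the example.

```python
import sqlite3

# One shared database for every component (in-memory for the sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
db.execute("INSERT INTO inventory VALUES ('book-1', 5)")


def reserve_stock(sku, qty):
    """Inventory component: decrement stock if enough is available."""
    cur = db.execute(
        "UPDATE inventory SET stock = stock - ? WHERE sku = ? AND stock >= ?",
        (qty, sku, qty),
    )
    return cur.rowcount == 1


def place_order(sku, qty):
    """Order component: calls inventory with a plain function call, no RPC."""
    with db:  # both tables change in one local transaction
        if not reserve_stock(sku, qty):
            return None
        cur = db.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
        return cur.lastrowid


print(place_order("book-1", 2))   # new order id
print(place_order("book-1", 10))  # None: not enough stock
```

In a microservice setup, the call to `reserve_stock` would cross a network boundary and the two tables would typically live in separate databases, so the single local transaction would no longer be available.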
1/10 Session, cookie, JWT, token, SSO, and OAuth 2.0 - what are they?
2/10 These terms are all related to user identity management. When you log into a website, you declare who you are (identification). Your identity is verified (authentication), and you are granted the necessary permissions (authorization).
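The three steps can be sketched in a few lines of Python. This is a simplified illustration, not a production login flow: the `USERS` store, the sample username and password, and the `authenticate` and `authorize` helpers are hypothetical, and a real system would add sessions or tokens on top.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user store: username -> (salt, password hash, permissions).
_salt = os.urandom(16)
USERS = {"alice": (_salt, hash_password("s3cret", _salt), {"read_orders"})}


def authenticate(username, password):
    """Verify the claimed identity; True only if the password matches."""
    record = USERS.get(username)  # identification: who the caller claims to be
    if record is None:
        return False
    salt, stored_hash, _ = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


def authorize(username, permission):
    """Check whether a known user holds the given permission."""
    record = USERS.get(username)
    return record is not None and permission in record[2]


print(authenticate("alice", "s3cret"))      # True
print(authenticate("alice", "wrong"))       # False
print(authorize("alice", "read_orders"))    # True
print(authorize("alice", "refund_orders"))  # False
```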
3/10 Many solutions have been proposed in the past, and the list keeps growing.
/1 Last week, Ticketmaster halted public ticket sales of Taylor Swift’s tour due to extraordinarily high demand on its ticketing systems.
/2 It’s an interesting problem, so we did some research on this topic. The diagram below shows the evolution of the online China Train ticket booking system.
/3 The China Train ticket booking system faces 𝐬𝐢𝐦𝐢𝐥𝐚𝐫 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞𝐬 to the Ticketmaster system:
1️⃣ Very high concurrent visits during peak hours.
2️⃣ The QPS for checking remaining tickets and orders is very high.
3️⃣ A lot of bots.