The diagram below shows the evolution of message storage at Discord:
MongoDB ➡️ Cassandra ➡️ ScyllaDB
/2 In 2015, the first version of Discord was built on top of a single MongoDB replica set. Around Nov 2015, with 100 million messages stored, the data and index no longer fit in RAM, and latency became unpredictable.
/3 In 2017, Discord had 12 Cassandra nodes and stored billions of messages.
At the beginning of 2022, it had 177 nodes with trillions of messages. At this point, latency was unpredictable, and maintenance operations became too expensive.
There are several reasons for the issue:
/4
- Cassandra stores data in an LSM tree, so reads are more expensive than writes. A server with hundreds of users can see many concurrent reads, resulting in hotspots.
- Maintaining clusters, such as compacting SSTables, impacts performance.
/5
- Garbage collection pauses caused significant latency spikes.
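To see why reads cost more than writes in an LSM tree, and why compaction matters, here is a minimal toy sketch. It is not how Cassandra is implemented: `ToyLSM`, `memtable_limit`, and the probe counter are hypothetical names for illustration, and real systems add bloom filters, on-disk files, and indexes.

```python
class ToyLSM:
    """Toy LSM tree: one in-memory memtable plus immutable sorted runs (SSTables)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write only touches the memtable, so it stays cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        """Return (value, probes), where probes counts the structures checked."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        # A read may have to check every SSTable, newest first.
        for table in reversed(self.sstables):
            probes += 1
            if key in table:
                return table[key], probes
        return None, probes

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


db = ToyLSM(memtable_limit=2)
for i in range(6):
    db.put(f"k{i}", i)

print(db.get("k0"))  # the oldest key needs the most probes
db.compact()
print(db.get("k0"))  # fewer probes after compaction
```

In this toy, compaction makes reads cheaper but has to rewrite every run, which hints at why it is a costly maintenance operation on a large cluster.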
ScyllaDB is a Cassandra-compatible database written in C++. Discord redesigned its architecture to have a monolithic API, a data service written in Rust, and ScyllaDB-based storage.
/6 The p99 read latency in ScyllaDB is 15ms compared to 40-125ms in Cassandra. The p99 write latency is 5ms compared to 5-70ms in Cassandra.
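For readers unfamiliar with the metric: p99 latency is the value that 99% of requests come in at or under. A minimal sketch using the nearest-rank method follows; the `percentile` helper and the sample latencies are made up for illustration and are not Discord's data.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: most requests are fast, two are slow.
latencies_ms = [5] * 98 + [40, 125]

print(percentile(latencies_ms, 50))  # median
print(percentile(latencies_ms, 99))  # p99 is dominated by the slow tail
```

The median hides the slow tail entirely, which is why tail percentiles such as p99 are the usual way to report this kind of latency.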
👉 Over to you: What kind of NoSQL database have you used? How do you like it?
/1 Want to know the secret to optimizing your SQL queries? Understanding the execution order is key.
/2 SQL statements are executed by the database system in several steps, including:
- Parsing the SQL statement and checking its validity
- Transforming the SQL into an internal representation, such as relational algebra
/3
- Optimizing the internal representation and creating an execution plan that utilizes index information
- Executing the plan and returning the results
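The steps above can be observed with a small experiment. This sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`; the `users` table, the `idx_users_email` index, and the `query_plan` helper are assumptions made up for the example, and other databases expose their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("Ann", "ann@example.com"), ("Bob", "bob@example.com")],
)


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite chose for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


# Parsing: an invalid statement is rejected before any plan exists.
try:
    conn.execute("SELEC name FROM users")
except sqlite3.OperationalError as err:
    print("parse error:", err)

# Optimizing: filtering on the indexed column lets the planner use the index.
print(query_plan(conn, "SELECT name FROM users WHERE email = ?", ("ann@example.com",)))

# Filtering on a column without an index falls back to a full table scan.
print(query_plan(conn, "SELECT email FROM users WHERE name = ?", ("Ann",)))

# Executing: run the statement and fetch the results.
print(conn.execute("SELECT name FROM users WHERE email = ?", ("ann@example.com",)).fetchall())
```

The exact wording of the plan text varies between SQLite versions, so treat it as diagnostic output rather than a stable format.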
/1 What distinguishes MVC, MVP, MVVM, MVVM-C, and VIPER architecture patterns from each other?
/2 These architecture patterns are among the most commonly used in app development, whether on iOS or Android platforms. Developers have introduced them to overcome the limitations of earlier patterns. So, how do they differ?
/3 🔹 MVC, the oldest pattern, dates back almost 50 years
🔹 Every pattern has a "view" (V) responsible for displaying content and receiving user input
🔹 Most patterns include a "model" (M) to manage business data
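As a rough illustration of the "view" and "model" roles, here is a framework-free MVC sketch. The counter example and all class names are hypothetical; real iOS or Android code would rely on the platform's UI toolkit, and the other patterns replace or split the controller's role differently.

```python
class CounterModel:
    """Model: owns the business data."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1


class CounterView:
    """View: turns data into something displayable."""

    def render(self, count):
        return f"Count: {count}"


class CounterController:
    """Controller: receives user input, updates the model, refreshes the view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def on_tap(self):
        self.model.increment()
        return self.view.render(self.model.count)


controller = CounterController(CounterModel(), CounterView())
print(controller.on_tap())
print(controller.on_tap())
```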
/1 Almost every software engineer has used Git before, but only a handful know how it works :) Let's dive in.
/2 To begin with, it's essential to identify where our code is stored. The common assumption is that there are only two locations - one on a remote server like GitHub and the other on our local machine.
/3 However, this isn't entirely accurate. Git maintains three local storages on our machine, which means that our code can be found in four places:
/1 I read something unbelievable today: Levels.fyi scaled to millions of users using 𝐆𝐨𝐨𝐠𝐥𝐞 𝐒𝐡𝐞𝐞𝐭𝐬 𝐚𝐬 𝐚 𝐛𝐚𝐜𝐤𝐞𝐧𝐝!
They started off on Google Forms and Sheets, which helped them reach millions of monthly active users before switching to a proper backend.
/2 To be fair, they do use serverless computing, but using Google Sheets as the database is an interesting choice.
Why do they use Google Sheets as a backend?
/3 Using their own words: "It seems like a pretty counterintuitive idea for a site with our traffic volume to not have a backend or any fancy infrastructure, but our philosophy to building products has always been, start simple and iterate.
/2 🔹 1. REST
Proposed in 2000, REST is the most used style. It is often used between front-end clients and back-end services. It is compliant with 6 architectural constraints. The payload format can be JSON, XML, HTML, or plain text.
/3 🔹 2. GraphQL
GraphQL was proposed in 2015 by Meta. It provides a schema and type system, suitable for complex systems where the relationships between entities are graph-like.