A really cool technique that's commonly used in object storage such as S3 to improve durability is called Erasure Coding. Let's take a look at how it works. 1/7
Erasure coding deals with data durability differently from replication. It chunks data into smaller pieces and creates parities for redundancy. In the event of failures, we can use chunk data and parities to reconstruct the data. 4 + 2 erasure coding is shown in Figure 1. 2/7
1️⃣ Data is broken up into four even-sized data chunks d1, d2, d3, and d4.
2️⃣ A mathematical formula is used to calculate the parities p1 and p2. To give a much simplified example, p1 = d1 + 2*d2 - d3 + 4*d4 and p2 = -d1 + 5*d2 + d3 - 3*d4. 3/7
3️⃣ Data d3 and d4 are lost due to node crashes.
4️⃣ The same formulas are used to reconstruct the lost data d3 and d4 from the known values of d1, d2, p1, and p2. 4/7
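To make the four steps concrete, here is a minimal sketch that uses the simplified parity formulas from the thread on plain integers. The function names and the sample values are assumptions made for illustration; production systems typically use Reed-Solomon codes over a finite field rather than ordinary integer arithmetic.

```python
def encode(d1, d2, d3, d4):
    """Compute the two parities with the simplified formulas from the thread."""
    p1 = d1 + 2 * d2 - d3 + 4 * d4
    p2 = -d1 + 5 * d2 + d3 - 3 * d4
    return p1, p2


def recover_d3_d4(d1, d2, p1, p2):
    """Rebuild d3 and d4 from the surviving chunks and both parities.

    Moving the known terms to the right-hand side gives two equations:
        -d3 + 4*d4 = p1 - d1 - 2*d2   (call it a)
         d3 - 3*d4 = p2 + d1 - 5*d2   (call it b)
    Adding them cancels d3, so d4 = a + b, and then d3 = b + 3*d4.
    """
    a = p1 - d1 - 2 * d2
    b = p2 + d1 - 5 * d2
    d4 = a + b
    d3 = b + 3 * d4
    return d3, d4


# Hypothetical chunk values, chosen only for the demonstration.
chunks = (7, 3, 5, 2)
p1, p2 = encode(*chunks)
restored = recover_d3_d4(chunks[0], chunks[1], p1, p2)
```

With these sample values, `restored` matches the original d3 and d4 even though only d1, d2, p1, and p2 were used to compute it.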
How much extra space does erasure coding need? For every two chunks of data, we need one parity block, so the storage overhead is 50% (Figure 2). With 3-copy replication, by contrast, the storage overhead is 200% (Figure 2). 5/7
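The overhead numbers follow from simple ratios. A small sketch of that arithmetic, with function names that are assumptions for illustration:

```python
def erasure_overhead(data_chunks, parity_chunks):
    """Extra storage as a fraction of the original data size."""
    return parity_chunks / data_chunks


def replication_overhead(copies):
    """Every copy beyond the first is pure overhead."""
    return copies - 1


ec_4_2 = erasure_overhead(4, 2)        # 0.5, i.e. 50%
three_copy = replication_overhead(3)   # 2, i.e. 200%
```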
Does erasure coding increase data durability? Let's assume a node has a 0.81% annual failure rate. According to the calculation done by Backblaze, erasure coding can achieve 11 nines of durability, whereas 3-copy replication achieves 6 nines. 6/7
What other techniques do you think are important to improve the scalability and durability of an object store such as S3? 7/7
Model Context Protocol (MCP) is a new system introduced by Anthropic to make AI models more powerful.
It is an open standard (also being run as an open-source project) that allows AI models (like Claude) to connect to databases, APIs, file systems, and other tools without needing custom code for each new integration.
MCP follows a client-server model with 3 key components:
1 - Host: AI applications like Claude that provide the environment for AI interactions so that different tools and data sources can be accessed. The host runs the MCP Client.
Kubernetes (K8S) is an open-source container orchestration platform originally developed by Google and now maintained by CNCF.
Here's how developers interact with Kubernetes:
1 - Developers create manifest files describing the application.
2 - Kubernetes takes these manifest files, validates them, and deploys the applications across its cluster of worker nodes.
3 - Kubernetes manages the entire lifecycle of the application.
Kubernetes is made up of two main components:
1 - Control Plane: It is like the brain of Kubernetes and consists of the following parts:
- API Server: It receives all incoming requests from users or the CLI.
1 - Collaboration Tools
Software development is a social activity. Learn to use collaboration tools like Jira, Confluence, Slack, MS Teams, Zoom, etc.
2 - Programming Languages
Pick and master one or two programming languages. Choose from options like Java, Python, JavaScript, C#, Go, etc.
3 - API Development
Learn the ins and outs of API Development approaches such as REST, GraphQL, and gRPC.
4 - Web Servers and Hosting
Know about web servers as well as cloud platforms and tools like AWS, Azure, GCP, and Kubernetes.
5 - Authentication and Testing
Learn how to secure your applications with authentication techniques such as JWTs, OAuth2, etc. Also, master testing techniques like TDD, E2E Testing, and Performance Testing.
6 - Databases
Learn to work with relational (Postgres, MySQL, and SQLite) and non-relational databases (MongoDB, Cassandra, and Redis).
Twitter has enforced very strict rate limiting. Some people cannot even see their own tweets.
Rate limiting is a very important yet often overlooked topic. Let's use this opportunity to take a look at what it is and the most popular algorithms.
A thread.
#RateLimitExceeded
What is rate limiting? Rate limiting controls the rate at which users or services can access a resource. Here are some examples:
- A user can send no more than 2 messages per second
- One can create a maximum of 10 accounts per day from the same IP address
Fixed Window Counter
The algorithm divides the timeline into fixed-size time windows and assigns a counter to each window. Each request increments the counter by some value. Once the counter reaches the threshold, subsequent requests are blocked until the next time window begins.
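A minimal in-memory sketch of the fixed window counter described above. The class name, the `cost` parameter, and the injectable `clock` are assumptions made for illustration. It tracks a single client and is not thread-safe; a real deployment would usually keep one counter per user or IP in a shared store.

```python
import time


class FixedWindowCounter:
    """Allow at most `limit` units of requests per fixed-size time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the example can be tested
        self.window_index = None    # which window the counter belongs to
        self.count = 0

    def allow(self, cost=1):
        # Identify the current window by integer-dividing the timeline.
        index = int(self.clock() // self.window_seconds)
        if index != self.window_index:
            # A new window has begun, so the counter starts over.
            self.window_index = index
            self.count = 0
        if self.count + cost > self.limit:
            return False            # threshold reached: block the request
        self.count += cost
        return True


# Usage with a fake clock: 2 requests per 1-second window.
now = [0.0]
limiter = FixedWindowCounter(limit=2, window_seconds=1, clock=lambda: now[0])
results = [limiter.allow(), limiter.allow(), limiter.allow()]
now[0] = 1.0                        # move into the next window
results.append(limiter.allow())
```

In this run the third request is rejected and the fourth is accepted once the next window starts. One known trade-off of the approach is that a burst straddling a window boundary can briefly let through up to twice the limit.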