How does Netflix scale push messaging for millions of devices?
This post draws from an article published on Netflix’s engineering blog. Here’s my understanding of how the online streaming giant’s system works.
𝐑𝐞𝐪𝐮𝐢𝐫𝐞𝐦𝐞𝐧𝐭𝐬 & 𝐬𝐜𝐚𝐥𝐞
- 220 million users
- Near real-time
- Backend systems need to send notifications to various clients
- Supported clients: iOS, Android, smart TVs, Roku, Amazon FireStick, web browser
𝐓𝐡𝐞 𝐥𝐢𝐟𝐞 𝐨𝐟 𝐚 𝐩𝐮𝐬𝐡 𝐧𝐨𝐭𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 1. Push notification events are triggered by the clock, user actions, or by systems.
2. Events are sent to the event management engine.
3. The event management engine listens to specific events and forward events to different queues. The queues are populated by priority-based event forwarding rules
4. The “event priority-based processing cluster” processes events and generates push notifications data for devices
5. A Cassandra database is used to store the notification data.
6. A push notification is sent to outbound messaging systems.
7. For Android, FCM is used to send push notifications. For Apple devices, APNs are used. For web, TV, and other streaming devices, Netflix’s homegrown solution called ‘Zuul Push’ is used.
Over to you: if you wanted to support every kind of device, which delivery model would work better, push or pull-based notifications?
Subscribe to our weekly newsletter to learn something new every week ⇩:
Model Context Protocol (MCP) is a new system introduced by Anthropic to make AI models more powerful.
It is an open standard (also being run as an open-source project) that allows AI models (like Claude) to connect to databases, APIs, file systems, and other tools without needing custom code for each new integration.
MCP follows a client-server model with 3 key components:
1 - Host: AI applications like Claude that provide the environment for AI interactions so that different tools and data sources can be accessed. The host runs the MCP Client.
Kubernetes (K8S) is an open-source container orchestration platform originally developed by Google and now maintained by CNCF.
Here’s how developers interact with Kubernetes:
1 - Developers create manifest files describing the application.
2 - Kubernetes takes these manifest files, validates them, and deploys the applications across its cluster of worker nodes.
3 - Kubernetes manages the entire lifecycle of the application.
Kubernetes is made up of two main components:
1 - Control Plane: It is like the brain of Kubernetes and consists of the following parts:
- API Server: It receives all incoming requests from users or CLI.
1 - Collaboration Tools
Software development is a social activity. Learn to use collaboration tools like Jira, Confluence, Slack, MS Teams, Zoom, etc.
2 - Programming Languages
Pick and master one or two programming languages. Choose from options like Java, Python, JavaScript, C#, Go, etc.
3 - API Development
Learn the ins and outs of API Development approaches such as REST, GraphQL, and gRPC.
4 - Web Servers and Hosting
Know about web servers as well as cloud platforms like AWS, Azure, GCP, and Kubernetes
5 - Authentication and Testing
Learn how to secure your applications with authentication techniques such as JWTs, OAuth2, etc. Also, master testing techniques like TDD, E2E Testing, and Performance Testing
6 - Databases
Learn to work with relational (Postgres, MySQL, and SQLite) and non-relational databases (MongoDB, Cassandra, and Redis).
Twitter has enforced very strict rate limiting. Some people cannot even see their own tweets.
Rate limiting is a very important yet often overlooked topic. Let's use this opportunity to take a look at what it is and the most popular algorithms.
A thread.
#RateLimitExceeded
What is rate limiting? Rate limiting controls the rate at which users or services can access a resource. Here are some examples:
- A user can send a message no more than 2 per second
- One can create a maximum of 10 accounts per day from the same IP address
Fixed Window Counter
The algorithm divides the timeline into fixed-size time windows and assigns a counter for each window. Each request increments the counter by some value. Once the counter reaches the threshold, subsequent requests are blocked until the new time window begins