THREAD: How does the scheduler work in Kubernetes?
The scheduler is in charge of deciding where your pods are deployed in the cluster.
It might sound like an easy job, but it's rather complicated!
Let's dive into it.
1/8
Every time a Pod is created, it is also added to the scheduler queue.
The scheduler processes Pods one by one through two phases:
1. Scheduling phase (which node should I pick?)
2. Binding phase (let's write to the database that this Pod belongs to that node)
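In (heavily simplified, hypothetical) Go, the loop looks roughly like this; the types and helpers are made up, the real scheduler works on the Kubernetes API objects and an internal priority queue:

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only.
type Pod struct{ Name string }
type Node struct{ Name string }

// schedulingPhase decides which node the Pod should land on
// (filtering + ranking, more on this in the next tweet).
func schedulingPhase(pod Pod, nodes []Node) (Node, error) {
	if len(nodes) == 0 {
		return Node{}, fmt.Errorf("no node fits pod %s", pod.Name)
	}
	return nodes[0], nil // placeholder decision
}

// bindingPhase records the Pod -> Node assignment. In the real scheduler this
// is a Binding written to the API server, which persists it.
func bindingPhase(pod Pod, node Node) {
	fmt.Printf("bound %s to %s\n", pod.Name, node.Name)
}

func main() {
	queue := []Pod{{Name: "web-1"}, {Name: "web-2"}} // Pods are queued as they are created
	nodes := []Node{{Name: "node-a"}, {Name: "node-b"}}

	// The scheduler picks Pods from the queue one by one and runs the two phases.
	for _, pod := range queue {
		node, err := schedulingPhase(pod, nodes)
		if err != nil {
			continue // the Pod stays Pending and is retried later
		}
		bindingPhase(pod, node)
	}
}
```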
2/8
The Scheduling phase is divided into two parts. The scheduler:
1. Filters the relevant nodes (using a list of functions called predicates)
2. Ranks the remaining nodes (using a list of functions called priorities)
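A minimal sketch of those two parts, with made-up types and a single predicate and priority each (the real ones live in the scheduler's codebase):

```go
package main

import "fmt"

// Hypothetical types for illustration.
type Node struct {
	Name    string
	FreeCPU float64 // CPU still available on the node
}
type Pod struct {
	Name   string
	CPUReq float64
}

// A predicate decides whether a node can host the Pod at all.
type Predicate func(Pod, Node) bool

// A priority scores a node that survived filtering (higher is better).
type Priority func(Pod, Node) float64

func pickNode(pod Pod, nodes []Node, predicates []Predicate, priorities []Priority) (Node, error) {
	// 1. Filtering: drop every node that fails any predicate.
	var feasible []Node
	for _, n := range nodes {
		ok := true
		for _, pred := range predicates {
			if !pred(pod, n) {
				ok = false
				break
			}
		}
		if ok {
			feasible = append(feasible, n)
		}
	}
	if len(feasible) == 0 {
		return Node{}, fmt.Errorf("no node fits pod %s", pod.Name)
	}

	// 2. Ranking: sum the priorities and keep the highest-scoring node.
	best, bestScore := feasible[0], -1.0
	for _, n := range feasible {
		score := 0.0
		for _, prio := range priorities {
			score += prio(pod, n)
		}
		if score > bestScore {
			best, bestScore = n, score
		}
	}
	return best, nil
}

func main() {
	nodes := []Node{{Name: "node-a", FreeCPU: 0.5}, {Name: "node-b", FreeCPU: 3.0}}
	pod := Pod{Name: "web-1", CPUReq: 1.0}

	predicates := []Predicate{
		func(p Pod, n Node) bool { return n.FreeCPU >= p.CPUReq }, // enough CPU left?
	}
	priorities := []Priority{
		func(p Pod, n Node) float64 { return n.FreeCPU }, // prefer the least utilised node
	}

	node, err := pickNode(pod, nodes, predicates, priorities)
	if err != nil {
		panic(err)
	}
	fmt.Println("picked", node.Name) // node-b: node-a doesn't have enough CPU left
}
```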
Let's look at an example.
3/8
You want to deploy a Pod that requires a GPU. You submit the Pod to the cluster and:
1. The scheduler filters out all the Nodes that don't have a GPU
2. The scheduler ranks the remaining nodes and picks the least utilised one
3. The Pod is scheduled on that node
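In the Pod spec, "requires a GPU" is just an extended resource request. A sketch with the client-go types (this assumes a device plugin that registers the nvidia.com/gpu resource is installed in the cluster; the image name is made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-pod"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "example.com/trainer:latest", // hypothetical image
				Resources: corev1.ResourceRequirements{
					// Extended resources like GPUs are requested via limits.
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}
	// Only nodes advertising at least one allocatable nvidia.com/gpu
	// survive the filtering step for this Pod.
	fmt.Println(pod.Spec.Containers[0].Resources.Limits)
}
```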
4/8
At the moment, the filtering phase has 13 predicates.
That's 13 functions to decide whether the scheduler should discard the node as a possible target for the Pod.
The scoring phase also has 13 priorities.
Those are 13 functions to decide how to score and rank nodes.
5/8
How can you influence the scheduler's decisions?
- nodeSelector
- Node affinity
- Pod affinity/anti-affinity
- Taints and tolerations
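For example, here's a sketch combining the first and the last of those knobs, using the client-go types (the label and the taint are made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	spec := corev1.PodSpec{
		// nodeSelector: the scheduler only considers nodes carrying this label.
		NodeSelector: map[string]string{"disktype": "ssd"},
		// toleration: lets the Pod land on nodes tainted with gpu=true:NoSchedule.
		Tolerations: []corev1.Toleration{{
			Key:      "gpu",
			Operator: corev1.TolerationOpEqual,
			Value:    "true",
			Effect:   corev1.TaintEffectNoSchedule,
		}},
	}
	fmt.Println(spec.NodeSelector, spec.Tolerations)
}
```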
And what if you want to customise the scheduler?
6/8
You can write your own plugins for the scheduler. You can customise any block in the Scheduling phase.
The binding phase doesn't expose any public API, though.
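Here's a rough sketch of a Filter plugin, assuming the scheduler framework in k8s.io/kubernetes/pkg/scheduler/framework; the interface has moved and changed between versions, so treat the signature as approximate rather than a definitive implementation:

```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// GPUFilter is a hypothetical Filter plugin: it rejects nodes that don't
// advertise the (made-up) "example.com/gpu" extended resource.
type GPUFilter struct{}

var _ framework.FilterPlugin = &GPUFilter{}

func (g *GPUFilter) Name() string { return "GPUFilter" }

func (g *GPUFilter) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	node := nodeInfo.Node()
	if node == nil {
		return framework.NewStatus(framework.Error, "node not found")
	}
	if _, ok := node.Status.Allocatable["example.com/gpu"]; !ok {
		// Unschedulable means: drop this node from the feasible list.
		return framework.NewStatus(framework.Unschedulable, "node has no GPU")
	}
	return nil // a nil Status counts as success: the node stays in
}

func main() {
	// In practice the plugin is compiled into a custom scheduler binary and
	// enabled through the scheduler configuration; omitted here.
	_ = &GPUFilter{}
}
```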
You probably know there are some iptables rules somewhere, but do you know the exact sequence of chains involved in routing traffic to a ClusterIP?
What about a NodePort? Is that different?
🧵
1/
Services rely on the Linux kernel's networking stack and the Netfilter framework to modify and redirect network traffic. The Netfilter framework provides hooks at different stages of the networking stack where rules can be inserted to filter, change, or redirect packets.
2/
The Netfilter framework offers five hooks to modify network traffic: PRE_ROUTING, INPUT, FORWARD, OUTPUT, and POST_ROUTING. These hooks represent different stages in the networking stack, allowing you to intercept and modify packets at various points in their journey.
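As a rough memory aid (this is the standard Netfilter packet flow, nothing Kubernetes-specific), a packet hits the hooks in this order depending on where it is headed:

```go
package main

import "fmt"

// The order in which a packet traverses the Netfilter hooks, per packet path.
type path struct {
	desc  string
	hooks []string
}

func main() {
	paths := []path{
		{"arriving, destined to a local process", []string{"PRE_ROUTING", "INPUT"}},
		{"arriving, forwarded elsewhere (e.g. to a Pod)", []string{"PRE_ROUTING", "FORWARD", "POST_ROUTING"}},
		{"generated locally", []string{"OUTPUT", "POST_ROUTING"}},
	}
	for _, p := range paths {
		fmt.Println(p.desc, "->", p.hooks)
	}
}
```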
How does Pod to Pod communication work in Kubernetes?
How does the traffic reach the pod?
Let's dive into how low-level networking works in Kubernetes.
1/
When you deploy a Pod, the following things happen:
➀ The pod gets its own network namespace
➁ An IP address is assigned
➂ All containers in the Pod share the same network namespace and can see each other on localhost
2/
To reach other Pods, a Pod must first have access to the node's root network namespace.
This is achieved with a virtual ethernet (veth) pair connecting the two namespaces: pod and root.
A bridge in the root namespace lets traffic flow between veth pairs and traverse the common root namespace.
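Here's a sketch of what a bridge-style CNI plugin roughly does, using the vishvananda/netlink library. The interface names are made up, it needs root privileges, and a real plugin would also move the Pod end into the Pod's namespace and assign the Pod's IP:

```go
package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	// 1. A Linux bridge living in the node's root network namespace.
	br := &netlink.Bridge{LinkAttrs: netlink.LinkAttrs{Name: "demo-br0"}}
	if err := netlink.LinkAdd(br); err != nil {
		log.Fatal(err)
	}

	// 2. A veth pair: "veth-host" stays in the root namespace,
	//    "veth-pod" is meant for the Pod's namespace.
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{Name: "veth-host"},
		PeerName:  "veth-pod",
	}
	if err := netlink.LinkAdd(veth); err != nil {
		log.Fatal(err)
	}

	// 3. Plug the host end into the bridge, so traffic from this veth pair
	//    can reach the other pairs attached to the same bridge.
	if err := netlink.LinkSetMaster(veth, br); err != nil {
		log.Fatal(err)
	}
	if err := netlink.LinkSetUp(veth); err != nil {
		log.Fatal(err)
	}

	// A real CNI plugin would now move "veth-pod" into the Pod's network
	// namespace (netlink.LinkSetNsFd) and configure the Pod's IP on it.
}
```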