THREAD: How does the scheduler work in Kubernetes?
The scheduler is in charge of deciding where your pods are deployed in the cluster.
It might sound like an easy job, but it's rather complicated!
Let's dive into it.
1/8
Every time a Pod is created, it is also added to the scheduler queue.
The scheduler processes Pods one by one through two phases:
1. Scheduling phase (which node should I pick?)
2. Binding phase (let's write to the database that this Pod belongs to that node)
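In (heavily simplified, hypothetical) Go, the loop looks roughly like this; the types and helpers are made up, the real scheduler works on the Kubernetes API objects and an internal priority queue:

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only.
type Pod struct{ Name string }
type Node struct{ Name string }

// schedulingPhase decides which node the Pod should land on
// (filtering + ranking, more on this in the next tweet).
func schedulingPhase(pod Pod, nodes []Node) (Node, error) {
	if len(nodes) == 0 {
		return Node{}, fmt.Errorf("no node fits pod %s", pod.Name)
	}
	return nodes[0], nil // placeholder decision
}

// bindingPhase records the Pod -> Node assignment. In the real scheduler this
// is a Binding written to the API server, which persists it.
func bindingPhase(pod Pod, node Node) {
	fmt.Printf("bound %s to %s\n", pod.Name, node.Name)
}

func main() {
	queue := []Pod{{Name: "web-1"}, {Name: "web-2"}} // Pods are queued as they are created
	nodes := []Node{{Name: "node-a"}, {Name: "node-b"}}

	// The scheduler picks Pods from the queue one by one and runs the two phases.
	for _, pod := range queue {
		node, err := schedulingPhase(pod, nodes)
		if err != nil {
			continue // the Pod stays Pending and is retried later
		}
		bindingPhase(pod, node)
	}
}
```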
2/8
The Scheduling phase is divided into two parts. The scheduler:
1. Filters the relevant nodes (using a list of functions called predicates)
2. Ranks the remaining nodes (using a list of functions called priorities)
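A minimal sketch of those two parts, with made-up types and a single predicate and priority each (the real ones live in the scheduler's codebase):

```go
package main

import "fmt"

// Hypothetical types for illustration.
type Node struct {
	Name    string
	FreeCPU float64 // CPU still available on the node
}
type Pod struct {
	Name   string
	CPUReq float64
}

// A predicate decides whether a node can host the Pod at all.
type Predicate func(Pod, Node) bool

// A priority scores a node that survived filtering (higher is better).
type Priority func(Pod, Node) float64

func pickNode(pod Pod, nodes []Node, predicates []Predicate, priorities []Priority) (Node, error) {
	// 1. Filtering: drop every node that fails any predicate.
	var feasible []Node
	for _, n := range nodes {
		ok := true
		for _, pred := range predicates {
			if !pred(pod, n) {
				ok = false
				break
			}
		}
		if ok {
			feasible = append(feasible, n)
		}
	}
	if len(feasible) == 0 {
		return Node{}, fmt.Errorf("no node fits pod %s", pod.Name)
	}

	// 2. Ranking: sum the priorities and keep the highest-scoring node.
	best, bestScore := feasible[0], -1.0
	for _, n := range feasible {
		score := 0.0
		for _, prio := range priorities {
			score += prio(pod, n)
		}
		if score > bestScore {
			best, bestScore = n, score
		}
	}
	return best, nil
}

func main() {
	nodes := []Node{{Name: "node-a", FreeCPU: 0.5}, {Name: "node-b", FreeCPU: 3.0}}
	pod := Pod{Name: "web-1", CPUReq: 1.0}

	predicates := []Predicate{
		func(p Pod, n Node) bool { return n.FreeCPU >= p.CPUReq }, // enough CPU left?
	}
	priorities := []Priority{
		func(p Pod, n Node) float64 { return n.FreeCPU }, // prefer the least utilised node
	}

	node, err := pickNode(pod, nodes, predicates, priorities)
	if err != nil {
		panic(err)
	}
	fmt.Println("picked", node.Name) // node-b: node-a doesn't have enough CPU left
}
```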
Let's look at an example.
3/8
You want to deploy a Pod that requires a GPU. You submit the Pod to the cluster and:
1. The scheduler filters out all the Nodes that don't have a GPU
2. The scheduler ranks the remaining nodes and picks the least utilised one
3. The Pod is scheduled on that node
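In the Pod spec, "requires a GPU" is just an extended resource request. A sketch with the client-go types (this assumes a device plugin that registers the nvidia.com/gpu resource is installed in the cluster; the image name is made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-pod"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "example.com/trainer:latest", // hypothetical image
				Resources: corev1.ResourceRequirements{
					// Extended resources like GPUs are requested via limits.
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}
	// Only nodes advertising at least one allocatable nvidia.com/gpu
	// survive the filtering step for this Pod.
	fmt.Println(pod.Spec.Containers[0].Resources.Limits)
}
```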
4/8
At the moment, the filtering phase has 13 predicates.
That's 13 functions to decide whether the scheduler should discard the node as a possible target for the Pod.
The scoring phase also has 13 priorities.
Those are 13 functions to decide how to score and rank nodes.
5/8
How can you influence the scheduler's decisions?
- nodeSelector
- Node affinity
- Pod affinity/anti-affinity
- Taints and tolerations
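For example, here's a sketch combining the first and the last of those knobs, using the client-go types (the label and the taint are made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	spec := corev1.PodSpec{
		// nodeSelector: the scheduler only considers nodes carrying this label.
		NodeSelector: map[string]string{"disktype": "ssd"},
		// toleration: lets the Pod land on nodes tainted with gpu=true:NoSchedule.
		Tolerations: []corev1.Toleration{{
			Key:      "gpu",
			Operator: corev1.TolerationOpEqual,
			Value:    "true",
			Effect:   corev1.TaintEffectNoSchedule,
		}},
	}
	fmt.Println(spec.NodeSelector, spec.Tolerations)
}
```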
And what if you want to customise the scheduler?
6/8
You can write your own plugins for the scheduler. You can customise any block in the Scheduling phase.
The binding phase doesn't expose any public API, though.
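Here's a rough sketch of a Filter plugin, assuming the scheduler framework in k8s.io/kubernetes/pkg/scheduler/framework; the interface has moved and changed between versions, so treat the signature as approximate rather than a definitive implementation:

```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// GPUFilter is a hypothetical Filter plugin: it rejects nodes that don't
// advertise the (made-up) "example.com/gpu" extended resource.
type GPUFilter struct{}

var _ framework.FilterPlugin = &GPUFilter{}

func (g *GPUFilter) Name() string { return "GPUFilter" }

func (g *GPUFilter) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	node := nodeInfo.Node()
	if node == nil {
		return framework.NewStatus(framework.Error, "node not found")
	}
	if _, ok := node.Status.Allocatable["example.com/gpu"]; !ok {
		// Unschedulable means: drop this node from the feasible list.
		return framework.NewStatus(framework.Unschedulable, "node has no GPU")
	}
	return nil // a nil Status counts as success: the node stays in
}

func main() {
	// In practice the plugin is compiled into a custom scheduler binary and
	// enabled through the scheduler configuration; omitted here.
	_ = &GPUFilter{}
}
```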
You probably know there are some iptables rules somewhere, but do you know the exact sequence of chains involved in routing traffic to a ClusterIP?
What about a NodePort? Is that different?
🧵
1/
Services rely on the Linux kernel's networking stack and the Netfilter framework to modify and redirect network traffic. The Netfilter framework provides hooks at different stages of the networking stack where rules can be inserted to filter, change, or redirect packets.
2/
The Netfilter framework offers five hooks to modify network traffic: PRE_ROUTING, INPUT, FORWARD, OUTPUT, and POST_ROUTING. These hooks represent different stages in the networking stack, allowing you to intercept and modify packets at various points in their journey.
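As a rough memory aid (this is the standard Netfilter packet flow, nothing Kubernetes-specific), a packet hits the hooks in this order depending on where it is headed:

```go
package main

import "fmt"

// The order in which a packet traverses the Netfilter hooks, per packet path.
type path struct {
	desc  string
	hooks []string
}

func main() {
	paths := []path{
		{"arriving, destined to a local process", []string{"PRE_ROUTING", "INPUT"}},
		{"arriving, forwarded elsewhere (e.g. to a Pod)", []string{"PRE_ROUTING", "FORWARD", "POST_ROUTING"}},
		{"generated locally", []string{"OUTPUT", "POST_ROUTING"}},
	}
	for _, p := range paths {
		fmt.Println(p.desc, "->", p.hooks)
	}
}
```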
How does Pod to Pod communication work in Kubernetes?
How does the traffic reach the pod?
Let's dive into how low-level networking works in Kubernetes.
1/
When you deploy a Pod, the following things happen:
➀ The pod gets its own network namespace
➁ An IP address is assigned
➂ All containers in the Pod share the same network namespace and can see each other on localhost
2/
To reach other Pods, a Pod must first have access to the node's root network namespace.
This is achieved with a virtual ethernet (veth) pair connecting the two namespaces: pod and root.
A bridge in the root namespace lets traffic flow between veth pairs and traverse the common root namespace.
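Here's a sketch of what a bridge-style CNI plugin roughly does, using the vishvananda/netlink library. The interface names are made up, it needs root privileges, and a real plugin would also move the Pod end into the Pod's namespace and assign the Pod's IP:

```go
package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	// 1. A Linux bridge living in the node's root network namespace.
	br := &netlink.Bridge{LinkAttrs: netlink.LinkAttrs{Name: "demo-br0"}}
	if err := netlink.LinkAdd(br); err != nil {
		log.Fatal(err)
	}

	// 2. A veth pair: "veth-host" stays in the root namespace,
	//    "veth-pod" is meant for the Pod's namespace.
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{Name: "veth-host"},
		PeerName:  "veth-pod",
	}
	if err := netlink.LinkAdd(veth); err != nil {
		log.Fatal(err)
	}

	// 3. Plug the host end into the bridge, so traffic from this veth pair
	//    can reach the other pairs attached to the same bridge.
	if err := netlink.LinkSetMaster(veth, br); err != nil {
		log.Fatal(err)
	}
	if err := netlink.LinkSetUp(veth); err != nil {
		log.Fatal(err)
	}

	// A real CNI plugin would now move "veth-pod" into the Pod's network
	// namespace (netlink.LinkSetNsFd) and configure the Pod's IP on it.
}
```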