Kubernetes Borg/Omega history topic 12: A follow-on to the PodDisruptionBudget topic: the descheduler (github.com/kubernetes-inc…). Descheduler is more appropriate than the original term "rescheduler", because its job is to decide which pods to kill, not to replace or schedule them
In Kubernetes running on a cloud provider, such as in GKE, when pending pods have no available space to be placed, either cluster autoscaling or even node autoprovisioning (github.com/kubernetes/aut…, cloud.google.com/kubernetes-eng…) can create new nodes for them
In Borg, the rescheduler was created to defragment nodes to make room. It selected tasks to evict so that the new tasks could schedule, while ensuring the replacements for the evicted tasks could find new homes too, so as not to cause unnecessary churn
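To make the defragmentation idea concrete, here is a minimal sketch, not Borg's actual algorithm. It assumes a single scalar resource per node, and the names `find_eviction`, `capacity`, and `placed` are hypothetical. It picks one task to evict only if that frees enough room for the pending task and the evicted task can fit on some other node.

```python
from typing import Dict, List, Optional, Tuple


def find_eviction(
    capacity: Dict[str, int],
    placed: Dict[str, List[Tuple[str, int]]],
    pending_size: int,
) -> Optional[Tuple[str, str, str]]:
    """Return (victim, from_node, to_node), or None if no single move works.

    Hypothetical model: each node has one integer capacity and each task
    has one integer size.
    """
    # Free space per node before any eviction.
    free = {n: capacity[n] - sum(size for _, size in placed[n]) for n in capacity}
    for node, tasks in placed.items():
        for name, size in tasks:
            if free[node] + size < pending_size:
                continue  # evicting this task would not make enough room
            # Only evict if the victim can find a new home elsewhere.
            for other, other_free in free.items():
                if other != node and other_free >= size:
                    return name, node, other
    return None


capacity = {"a": 4, "b": 4}
placed = {"a": [("t1", 2), ("t2", 1)], "b": [("t3", 3)]}
print(find_eviction(capacity, placed, pending_size=2))
```

Evicting a task that cannot be placed anywhere else is rejected, which is the "no unnecessary churn" condition in its simplest form.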
In K8s, the purpose of the descheduler is mainly to reshuffle pods to improve the overall distribution of pods across nodes. After some churn in a cluster from pod terminations caused by pod autoscaling, pod updates, pods for batch/CI tasks, etc., pod layout can become uneven
A simple example: Say the cluster autoscaler (github.com/kubernetes/aut…) added a new node for new pods. If those pods came from the creation of a new Deployment or ReplicaSet, they could all land on the new node if there wasn't enough space on existing nodes
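A rough sketch of how uneven layout like this could be detected and corrected, assuming pod count per node is the only signal. This is an illustration, not the descheduler's real policy, and `pick_evictions` is a hypothetical name. It only decides which pods to kill; placing the replacements is left to the workload controller and the scheduler.

```python
from typing import Dict, List


def pick_evictions(pods_by_node: Dict[str, List[str]]) -> List[str]:
    """Return pods to evict so no node keeps more than ceil(average) pods."""
    if not pods_by_node:
        return []
    total = sum(len(pods) for pods in pods_by_node.values())
    target = -(-total // len(pods_by_node))  # ceiling division
    victims: List[str] = []
    for pods in pods_by_node.values():
        # Anything beyond the target on a node is an eviction candidate.
        victims.extend(pods[target:])
    return victims


layout = {"old-1": ["a"], "old-2": ["b"], "new": ["c", "d", "e", "f"]}
print(pick_evictions(layout))
```

With six pods on three nodes the target is two per node, so only the two extra pods on the new node are selected.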
From the experience in Borg, we knew the descheduler would be needed from the beginning of the Kubernetes project. I think it was first mentioned when discussing the addition of liveness and readiness probes: github.com/kubernetes/kub…
This enabled us to establish a clear separation of concerns between pod creation and replacement by workload controllers, horizontal scaling by HPA, placement by the scheduler, and rebalancing across nodes and failure domains by the descheduler, which would respect PDB
That division was discussed when designing eviction for unresponsive nodes (github.com/kubernetes/kub…) and then in issues.k8s.io/12140. The design docs can be found at github.com/kubernetes/com… and github.com/kubernetes/com…
Note that if churn in the cluster is sufficiently high and eviction is highly constrained due to PodDisruptionBudgets, it may not be possible for the descheduler to keep up. This is one reason why it may not be possible to achieve an "optimal" layout
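A small sketch of why tight budgets slow rebalancing, assuming each application has a fixed number of disruptions currently allowed. The function `filter_by_budget` and its inputs are hypothetical; in a real cluster the budget is enforced by the eviction API rather than by client-side bookkeeping like this.

```python
from typing import Dict, List, Tuple


def filter_by_budget(
    candidates: List[Tuple[str, str]],
    allowed: Dict[str, int],
) -> Tuple[List[str], List[str]]:
    """Split (pod, app) candidates into evictable and blocked pods.

    `allowed` maps an app to how many disruptions its budget permits right
    now; apps missing from the map are treated as unconstrained.
    """
    remaining = dict(allowed)  # copy so the caller's budget is not mutated
    evict: List[str] = []
    blocked: List[str] = []
    for pod, app in candidates:
        left = remaining.get(app)
        if left is None:
            evict.append(pod)  # no budget covers this pod
        elif left > 0:
            remaining[app] = left - 1
            evict.append(pod)
        else:
            blocked.append(pod)  # budget exhausted, try again later
    return evict, blocked


candidates = [("web-1", "web"), ("web-2", "web"), ("job-1", "batch")]
print(filter_by_budget(candidates, {"web": 1}))
```

Blocked pods have to wait for a later pass, so when new imbalance arrives faster than budgets free up, the layout never converges.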
Keep Current with Brian Grant