Kubernetes Borg/Omega history topic 14: Computational Quality of Service (QoS) and oversubscription. What are they, why would you want them, and how is QoS different from priority? On the last point, the distinction is between importance (priority) and urgency (QoS).
QoS is something that wouldn't matter if just one process were running per host system, or if all the processes steadily used a constant amount of cpu, memory, and other resources. Because usage varies, reserving the maximum capacity needed by each one would leave systems poorly utilized
Oversubscription mitigates that by packing more applications onto a system than could all fit at their peak requirements. It's kind of like a bank: not everyone can withdraw at the same time. The question is then: what happens when apps demand more resources than they can get?
With time-division multiplexing, many CPU threads can be interleaved. They can be blocked and queued by the OS, typically at the cost of context switches and waiting a few time slices. Thus there is no fixed limit to how many can be packed onto a machine. CPU is compressible
OTOH, swapping memory pages, even to local SSD, is painfully expensive. This is why systems hosting services that need to respond with subsecond latency disable swap. Memory is considered an incompressible resource. For simplicity, I'll ignore resources other than CPU and memory
Compressible resources like CPU can be made available quickly by the kernel with low impact on the interrupted threads, provided it knows which threads urgently need the resources and which don't. We call these latency-sensitive and latency-tolerant, respectively
Borg used an explicit attribute to indicate this, called appclass, which is described by the Borg paper: ai.google/research/pubs/…. This was translated to scheduling latency in LMCTFY: github.com/google/lmctfy/…. In Kubernetes, it's inferred from resource requests and limits
In order to reallocate incompressible resources quickly, processes need to be killed, which is obviously not low impact. (For memory, Linux does this with the OOM killer.) This was why Borg used priority (production vs. non-production) to make memory oversubscription decisions
Borg's resource reclamation approach is described in the paper: reservations were computed from observed usage, and oversubscribed resources (latency-tolerant cpu and non-production memory) were tallied against those reservations, whereas guaranteed resources were tallied against limits. Complicated.
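To make that accounting concrete, here's a toy Go sketch (invented types and numbers, not Borg's actual code) of charging reclaimable work at a usage-derived reservation and guaranteed work at its declared limit:

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for a Borg task's accounting state.
// Guaranteed here means latency-sensitive cpu / production memory, charged at the
// declared limit; everything else is charged at a usage-derived reservation.
type Task struct {
	Name       string
	Limit      float64 // declared limit, in cores
	Usage      float64 // recent observed usage, in cores
	Guaranteed bool
}

// charge returns how much of the machine's capacity this task is counted against.
func charge(t Task, safetyMargin float64) float64 {
	if t.Guaranteed {
		return t.Limit // guaranteed resources are charged at their limit
	}
	r := t.Usage * (1 + safetyMargin) // reclaimable: observed usage plus padding
	if r > t.Limit {
		r = t.Limit // never charge more than the declared limit
	}
	return r
}

func main() {
	capacity := 16.0 // machine cpu capacity, in cores
	tasks := []Task{
		{Name: "prod-frontend", Limit: 8, Usage: 3.5, Guaranteed: true},
		{Name: "batch-job", Limit: 6, Usage: 2.0, Guaranteed: false},
	}

	charged := 0.0
	for _, t := range tasks {
		charged += charge(t, 0.25)
	}
	// The difference is headroom into which more reclaimable work can be packed.
	fmt.Printf("charged %.2f of %.2f cores; %.2f cores of reclaimable headroom\n",
		charged, capacity, capacity-charged)
}
```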
Vertical autoscaling (VA) added even more complexity. VA changed limits, but left its own padding to provide slack for reaction time and observation of demand. Ad hoc mechanisms were added to disable limit enforcement for each resource, creating a notion similar to request in K8s
In K8s, I wanted something simpler, to directly convey the desire for oversubscription and bursting flexibility. The discussion started way back in issues.k8s.io/147 and issues.k8s.io/168. The model we settled on was determined by looking at limits and requests
Request==Limit implies guaranteed resources (not oversubscribed). Request<Limit implies burstable (oversubscribed). Zero request implies best effort. Borg scheduled best-effort work using reservations, but no throughput guarantees could be made in practice
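As a rough sketch of that inference in Go (simplified to a single resource; the real kubelet logic aggregates over every resource of every container in the pod):

```go
package main

import "fmt"

// QOSClass mirrors the three Kubernetes QoS classes described above.
type QOSClass string

const (
	Guaranteed QOSClass = "Guaranteed"
	Burstable  QOSClass = "Burstable"
	BestEffort QOSClass = "BestEffort"
)

// qosForResource infers the class for a single resource; a request or limit of
// zero stands for "unset".
func qosForResource(request, limit int64) QOSClass {
	switch {
	case request == 0 && limit == 0:
		return BestEffort // nothing requested: fully oversubscribed
	case limit != 0 && request == limit:
		return Guaranteed // request == limit: not oversubscribed
	default:
		return Burstable // request < limit, or no limit at all: oversubscribed
	}
}

func main() {
	fmt.Println(qosForResource(500, 500)) // Guaranteed
	fmt.Println(qosForResource(250, 500)) // Burstable
	fmt.Println(qosForResource(0, 0))     // BestEffort
}
```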
This is described in the resource model design (github.com/kubernetes/com…) and the QoS proposal (github.com/kubernetes/com…), including the mapping to OOM scores. The mapping to cgroup cpu shares is described in the pod resource design (github.com/kubernetes/com…).
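For flavor, here's a hedged Go sketch of the shape of those mappings: the cpu.shares formula follows the conventional 1 CPU = 1024 shares scaling, while the OOM-score constants and clamping are illustrative rather than the kubelet's exact current values.

```go
package main

import "fmt"

// milliCPUToShares follows the conventional mapping of 1 CPU (1000 millicores)
// to 1024 cgroup cpu.shares, with the kernel's minimum of 2 as a floor.
func milliCPUToShares(milliCPU int64) int64 {
	shares := milliCPU * 1024 / 1000
	if shares < 2 {
		return 2
	}
	return shares
}

// oomScoreAdj shows the shape of the OOM-score mapping: Guaranteed pods get a
// strongly negative adjustment (killed last), BestEffort the maximum (killed
// first), and Burstable pods land in between, scaled by the fraction of node
// memory they request. Constants and clamping here are illustrative.
func oomScoreAdj(class string, memRequestBytes, memCapacityBytes int64) int {
	switch class {
	case "Guaranteed":
		return -998
	case "BestEffort":
		return 1000
	default: // Burstable
		adj := int(1000 - 1000*memRequestBytes/memCapacityBytes)
		if adj < 2 {
			adj = 2 // stay above Guaranteed pods
		}
		if adj > 999 {
			adj = 999 // stay below BestEffort pods
		}
		return adj
	}
}

func main() {
	fmt.Println(milliCPUToShares(250))                  // 256 shares
	fmt.Println(oomScoreAdj("Burstable", 2<<30, 8<<30)) // 2Gi requested of 8Gi -> 750
	fmt.Println(oomScoreAdj("Guaranteed", 0, 8<<30))    // -998
	fmt.Println(oomScoreAdj("BestEffort", 0, 8<<30))    // 1000
}
```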
Some work on Vertical Pod Autoscaling for Kubernetes has started: github.com/kubernetes/com…. There have also been proposals to implement oversubscription (github.com/kubernetes/enh…). As with horizontal scaling, resource monitoring infrastructure is a prerequisite
If you manage cluster-level sharing using ResourceQuota and LimitRange, oversubscription can be done at that level as well. The original designs were described in github.com/kubernetes/com… and github.com/kubernetes/com…, with improvements in github.com/kubernetes/com…
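For example, a namespace quota can cap total requests well below total limits, permitting roughly 2x oversubscription within the namespace. A sketch using the Go types from k8s.io/api (the requests.*/limits.* quota keys are today's names, which postdate the original designs linked above):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Cap the namespace's total requests at half of its total limits, so the
	// namespace as a whole can be oversubscribed by roughly 2x; individual
	// pods still pick their own QoS class via their requests and limits.
	quota := corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "oversubscribed", Namespace: "batch"},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				corev1.ResourceRequestsCPU:    resource.MustParse("10"),
				corev1.ResourceLimitsCPU:      resource.MustParse("20"),
				corev1.ResourceRequestsMemory: resource.MustParse("20Gi"),
				corev1.ResourceLimitsMemory:   resource.MustParse("40Gi"),
			},
		},
	}

	reqCPU := quota.Spec.Hard[corev1.ResourceRequestsCPU]
	limCPU := quota.Spec.Hard[corev1.ResourceLimitsCPU]
	fmt.Printf("quota %q: requests.cpu=%s, limits.cpu=%s\n",
		quota.Name, reqCPU.String(), limCPU.String())
}
```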
Ok, this topic doesn't fit into a Twitter form factor very well. Maybe some day I'll get around to writing this up more in long form. For now, that's about all I have time for, but questions are welcome