Kubernetes Borg/Omega history topic 4: Workload controllers. Before I get into the history, some conceptual background may be useful, since the underpinnings come up in many Cloud Native contexts. The key is to explicitly model state such that it can be externally manipulated.
Around the time Kubernetes was open sourced, there were a number of libraries and tools created to start containers on multiple machines. The original libswarm was one. The problem with such imperative, client-side approaches was that it was hard to add automation.
One could inject a scheduler in between, by emulating a single machine's remote API, but that would still lack an explicit model of what the user was trying to instantiate, the equivalent of the Pod template in Kubernetes.
That was added by some other tools, but the lack of also modeling the set of instances, with an explicit replica count, was an obstacle to higher-level automation, such as horizontal autoscaling and progressive rolling updates, which Kubernetes added in 1.1 and 1.2, respectively.
Kubernetes originally supported just one workload controller, the ReplicationController, which was designed for elastic stateless workloads with fungible replicas. Shortly after we open-sourced Kubernetes, we started to discuss how to add support for additional kinds of workloads.
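To make the ReplicationController model concrete, here is a minimal manifest as a sketch (the name, labels, and image are illustrative, not from the thread): both the replica count and the Pod template are declared explicitly, which is exactly the externally manipulable state that autoscalers and rolling updates build on.

```yaml
# Minimal sketch of a ReplicationController; names, labels, and image are illustrative.
apiVersion: v1
kind: ReplicationController
metadata:
  name: web
spec:
  replicas: 3            # explicit, externally adjustable replica count
  selector:
    app: web             # label selector grouping the set of instances
  template:              # Pod template: the prototype copied for each replica
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
```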
In issues.k8s.io/1518, we started to discuss what became DaemonSet. The key decision was whether to add more functionality to ReplicationController or to create new resource types. Users of other systems were concerned about the complexity of using multiple types.
Borg had supported just one workload "controller", the Job. (I'll address the differences between Borg's synchronous state machine and the Kubernetes async controllers later.) It's described well by the Borg paper: ai.google/research/pubs/…
Job, an array of Tasks, is used for elastic services, agents that run on every node, batch workloads, and stateful workloads. Consequently, it has a large number of settings, and additional, external controllers are needed in order to support these different workloads.
For instance, for the daemon use case, a special controller / autoscaler is needed to ensure that the Job has a sufficient number of Tasks to cover all the machines, and cases where machines are removed from the middle of the array require special handling.
And not only is Job the first-class primitive rather than the Task, but each Task has a stable identity, as with StatefulSet in Kubernetes. That is overly constraining not just for daemons, but also for autoscaled workloads, CI workloads, graceful termination, debugging, etc.
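For contrast with the single Borg Job type, here is a rough sketch of how Kubernetes models the daemon case (names and image are made up for illustration): a DaemonSet carries no replica count at all; one Pod is run per matching node, so no external autoscaler is needed and removing a node leaves no hole in an array.

```yaml
# Illustrative DaemonSet sketch; names and image are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  template:                 # note: no replicas field; one Pod per node
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: example.com/node-agent:1.0   # hypothetical image
```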
Job also includes published BNS records for Tasks, which are the rough equivalent of Endpoints in Kubernetes. BNS records are stored in Chubby, where they can be watched. (I'll cover watch in K8s more generally later.)
The decoupling of Pods, workload controllers, and Endpoints, and the precedent of multiple workload controllers in Kubernetes, have proven very flexible for supporting many, many types of workloads. There are now many application-specific workload controllers (aka Operators).
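As a hedged illustration of that decoupling (labels and names are made up): the Service below selects Pods purely by label, independently of which controller, built-in or application-specific, created them, and the corresponding Endpoints object is maintained from whatever Pods currently match.

```yaml
# Illustrative Service sketch; it targets Pods labeled app=web regardless of
# whether a ReplicationController, Deployment, or custom controller created them.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
```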
Explicitly representing the PodTemplate as a separate object, as proposed in issues.k8s.io/170, may also have been useful for these third-party controllers, but in practice the lack of support for that hasn't been a huge obstacle. (Well, the API exists, but is unused.)
I proposed the idea of modeling workload controllers as loosely coupled sets of instances grouped using a label selector in June 2013, based on an 11-page analysis of Borg Job use cases, around the same timeframe as the original labels proposal.
That also partly inspired replicapool.googleapis.com, though the lack of labels in GCE at the time made implementing the full model infeasible.
Aside on "template": a "template" is a pattern used to make copies of the same shape. I think the Kubernetes "Pod template" usage is true to that colloquial definition, but typical CS usage implies parameterization and/or macro expansion, so maybe "prototype" would have been a better name.
The idea of explicitly modeling state so that it can be externally controlled and observed is a key principle of Cloud Native. I originally included it in a longer form of the definition I wrote for CNCF: github.com/cncf/toc/blob/…
The principle can also be applied to workflow systems and configuration management (e.g., see github.com/kubernetes/com…). Embodying these as code is powerful, but with great power comes great responsibility, since it obstructs external tooling and automation.