Brian Grant
Original lead architect of Kubernetes. Emeritus Kubernetes Steering Committee and CNCF TOC. Former Chief Architect of PeakStream (HPC on GPUs).
Jun 10, 2022 7 tweets 2 min read
Trying a short talk via Twitter. :-) Systems are automated using APIs. Automation is interoperable with user interface surfaces, like GUIs and CLIs. Configuration variations are often expressed using templates, and configuration tools often assume exclusive actuation. They are not interoperable with user interface surfaces, except for reads.
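To make the contrast concrete, here is a hedged sketch (all names and values are hypothetical): the same variation expressed first as a template, then as plain data that other tools and UI surfaces could also read and modify.

```yaml
# Variation via a template (Go-template style). Only the templating tool can expand
# this; it generally isn't even parseable as plain YAML until rendered.
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: app
          image: "{{ .Values.image }}"
```

```yaml
# The same variation expressed as data: an overlay/patch that any tool, GUI, or CLI
# can read, validate, and edit without knowing how it was produced.
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          image: example/app:v2
```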
May 13, 2022 19 tweets 3 min read
We've wrapped API-driven systems with human-edited text files and scripts, and cumbersome human-in-the-loop workflows. We need to be able to operate on configuration as data. I agree with "infrastructure as software" approaches that generators for complex configuration are best built using general-purpose languages, but they don't need to be exclusive, non-interoperable, and monolithic.
Feb 18, 2022 9 tweets 5 min read
@danielbryantuk's thread about platform engineering got me reflecting on the tools available for building platforms on Kubernetes these days. Not the big components, but the scaffolding and surfaces. I don't normally bother with emojis, but:🧵 For a GUI/Console/Dashboard, one could build backstage.io plugins. For a CLI, one could build a kubectl plugin (krew.sigs.k8s.io/plugins/), though the fooctl pattern still appears to be prevalent.
Feb 4, 2022 5 tweets 1 min read
There are many config languages: HCL, Jsonnet, Starlark, Dhall, CUE, ... And GP languages used for config: Ruby, Python, TypeScript, Java, ... And templates: Go templates, Jinja, envsubst, ... And data formats: YAML, JSON, XML, ... xkcd.com/927/ The choice will always be dominated by familiarity, preference, legacy, tool and library support, and factors other than technical differences.
Jan 21, 2022 4 tweets 2 min read
Part 1 of the Kubernetes documentary is out! You can see what we open sourced in the first commit: github.com/kubernetes/kub…. It was a Docker-based orchestrator with a REST API, but it was not K8s as we know it today. It took another year to shape the API, kubectl, architecture, etc. One aspect that didn't really come out in the video was that we needed a container-based system inspired by Borg/Omega to fill the gap between IaaS / VMs (heavy, opaque) and PaaS (restrictive, opinionated), and we saw that Docker could be a good base for that.
Apr 1, 2020 10 tweets 3 min read
A lightweight tool to facilitate config reuse, designed for GitOps, config as data, and composition with other tools and formats. Configuration was noted as a hard, open problem in our Borg, Omega, and Kubernetes paper: research.google/pubs/pub44843/, and there continues to be demand for innovation in the realm of declarative config/deployment automation. There are now more than 120 tools: docs.google.com/spreadsheets/d…
Jul 22, 2019 16 tweets 5 min read
Kubernetes Borg/Omega history topic 14: Computational Quality of Service (QoS) and oversubscription. What are they, why would you want them, and how is QoS different from priority? On the last point, the distinction is between importance and urgency. QoS wouldn't matter if just one process were running per host system, or if all processes steadily used a constant amount of CPU, memory, and other resources. Because usage varies, reserving the maximum capacity needed for each would leave systems poorly utilized.
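The Kubernetes descendant of this is per-container requests and limits, from which a Pod's QoS class is derived. A minimal sketch (names and images are hypothetical): requesting less than the limit yields the Burstable class, which lets nodes be oversubscribed while the kubelet reclaims from lower-QoS pods under pressure.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example      # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:v1    # hypothetical image
      resources:
        requests:              # what the scheduler reserves on the node
          cpu: 100m
          memory: 128Mi
        limits:                # the ceiling the container may burst up to
          cpu: 500m
          memory: 512Mi
# requests < limits  => QoS class "Burstable"
# requests == limits for every container (CPU and memory) => "Guaranteed"
# no requests or limits at all => "BestEffort"
```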
Jul 1, 2019 14 tweets 4 min read
Kubernetes Borg/Omega history topic 13: Priority and preemption. Some work is more important and/or urgent than other work. Borg represented this as an integer value: priority. A higher value meant a task was more important than one with a lower value and should be able to displace it. When choosing a machine for a task, the scheduler ignored lower-priority tasks when determining whether/where the task would fit, but considered the number of tasks that would have to be preempted as part of the ranking function for choosing the best machine.
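Kubernetes eventually adopted the same idea via the PriorityClass API and scheduler preemption. A minimal sketch with hypothetical names:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low              # hypothetical name
value: 1000                    # higher value = more important
globalDefault: false
description: "Low-priority batch work that may be preempted."
---
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker           # hypothetical name
spec:
  priorityClassName: batch-low # the scheduler may preempt lower-priority pods to place higher-priority ones
  containers:
    - name: worker
      image: example/batch:v1  # hypothetical image
```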
Jun 17, 2019 9 tweets 4 min read
Kubernetes Borg/Omega history topic 12: A follow-on to the PodDisruptionBudget topic: the descheduler (github.com/kubernetes-inc…). Descheduler is more appropriate than the original term "rescheduler", because its job is to decide which pods to kill, not to replace or schedule them. In Kubernetes, when running on a cloud provider such as in GKE, if pods are pending and there is no available space to place them, cluster autoscaling or even node auto-provisioning (github.com/kubernetes/aut…, cloud.google.com/kubernetes-eng…) can create new nodes for them.
May 31, 2019 14 tweets 6 min read
Kubernetes Borg/Omega history topic 11: PodDisruptionBudget. Google constantly performs software and hardware maintenance in its datacenters: firmware updates, kernel and image updates, disk repairs, switch updates, battery tests, etc. etc. More and more kinds over time. Even though Borg tasks are designed to be resilient, this could get pretty disruptive. Rate-limiting maintenance tasks independently isn't efficient if you have dozens of them, and it's not always feasible to perform all types of maintenance at the same time.
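In Kubernetes this became the PodDisruptionBudget API: the eviction machinery used by node drains and upgrades refuses voluntary disruptions that would take an application below its budget. A minimal sketch with hypothetical names:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb          # hypothetical name
spec:
  minAvailable: 4             # or maxUnavailable; evictions that would drop below this are refused
  selector:
    matchLabels:
      app: frontend           # hypothetical label identifying the protected pods
```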
May 22, 2019 17 tweets 7 min read
Kubernetes Borg/Omega history topic 10: In honor of #KubeConEU and the 5th anniversary of open-sourcing Kubernetes, I’ll add more perspective from the Borg and Omega teams to the origin story. Internally, Google puts a lot of emphasis on both resource efficiency and engineering efficiency. For both reasons, back in June 2013, a few months before GCE was ready to GA (cloudplatform.googleblog.com/2013/12/google…), the Borg and GCE teams started to work more closely to improve both.
May 8, 2019 9 tweets 3 min read
Kubernetes Borg/Omega history topic 9: Scheduling constraints. I have volumes more to write about configuration, but will move on with history topics for now. Borg's set of constraints grew organically over time. It started with just required memory, before multicore and NPTL. Other resources were added: CPU, disks. Hard and soft constraints on key/value machine attributes, and “attribute limits” to limit the number of tasks per failure domain. Automatically injected anti-constraints were used to implement dedicated machines.
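Kubernetes grew analogous knobs: resource requests, node selectors and node affinity for hard and soft constraints on node labels, and topology spread constraints that play a role roughly like attribute limits. A hedged sketch with hypothetical names and labels:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod        # hypothetical name
  labels:
    app: web
spec:
  nodeSelector:                # hard constraint on node labels
    disktype: ssd
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   # soft constraint
        - weight: 50
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["zone-a"]
  topologySpreadConstraints:   # roughly analogous to Borg's "attribute limits"
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
  containers:
    - name: web
      image: example/web:v1    # hypothetical image
```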
May 1, 2019 20 tweets 7 min read
Kubernetes Borg/Omega history topic 8: Declarative configuration and Apply. Inside Google, the most used configuration approach for Borg is the Turing-complete Borg Configuration Language (BCL). You can see a snippet of BCL on slide 7 in this deck: inf.ed.ac.uk/teaching/cours… Millions of lines of BCL have been written. A fair amount of BCL was devoted to configuring application command-line flags, which was the most common way to configure server binaries, which is crazy IMO, but the practice sadly carried over to Kubernetes components.
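Kubernetes took the declarative route with data objects and `kubectl apply`: you edit a file of desired state and re-apply it, and your changes are merged with the live object rather than being issued as imperative mutations. A minimal sketch with hypothetical names and values:

```yaml
# app-config.yaml -- applied with: kubectl apply -f app-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config             # hypothetical name
data:                          # application settings carried as declarative data
  LOG_LEVEL: "info"
  FEATURE_X: "enabled"         # hypothetical keys/values
# Editing a value and re-running `kubectl apply -f app-config.yaml` declares the new
# desired state; apply merges the change instead of requiring an imperative update call.
```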
Apr 24, 2019 15 tweets 4 min read
Kubernetes Borg/Omega history topic 7: The Kubernetes Resource Model: why we (eventually) made it uniform and declarative. A topic even deeper than watch. More details can be found here: github.com/kubernetes/com… Like most internal Google services, Borgmaster had an imperative, unversioned, monolithic RPC API built using the precursor to grpc.io, Stubby. It exposed an ad hoc collection of operations, like CreateJob, LookupPackage, StartAllocUpdate, and SetMachineAttributes.
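What "uniform" bought us: every resource, built-in or custom, shares the same top-level shape and the same verbs (get/list/watch/create/update/patch/delete), instead of bespoke RPCs like those above. A schematic sketch (placeholders, not a literal manifest):

```yaml
apiVersion: <group>/<version>   # e.g. apps/v1
kind: <Kind>                    # e.g. Deployment
metadata:                       # uniform: name, namespace, labels, annotations, ...
  name: example
  labels: {}
  annotations: {}
spec: {}                        # desired state, written by users and tools
status: {}                      # observed state, written back by controllers
```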
Apr 16, 2019 11 tweets 3 min read
Kubernetes Borg/Omega history topic 6: Watch. This is a deep topic. It's a follow-up to the controller topic. I realized that I forgot to link to the doc about Kubernetes controllers: github.com/kubernetes/com… Borgmaster had 2 models: built-in logic used synchronous edge-triggered state machines, while external components were asynchronous and level-based. More on level vs. edge triggering: hackernoon.com/level-triggeri…
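In the Kubernetes API this became the list+watch pattern: clients list a collection, then watch from the returned resourceVersion. Each event carries the full object rather than a delta, and clients can always re-list to resynchronize, which is what makes level-triggered controllers practical. A sketch of the shape of the event stream (fields elided, values hypothetical):

```yaml
# Illustrative shape of watch events; the real stream is a sequence of such events,
# each carrying the complete current object, so consumers reconcile against the
# latest observed state instead of depending on having seen every intermediate edge.
- type: ADDED
  object: { kind: Pod, metadata: { name: web-0, resourceVersion: "1001" } }
- type: MODIFIED
  object: { kind: Pod, metadata: { name: web-0, resourceVersion: "1005" } }
- type: DELETED
  object: { kind: Pod, metadata: { name: web-0, resourceVersion: "1010" } }
```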
Mar 29, 2019 7 tweets 2 min read
Kubernetes Borg/Omega history topic 5: Asynchronous controllers. Borgmaster had synchronous, transactional, edge-triggered state machines. We had challenges scaling, evolving, and extending them. High-cardinality resource instances could exceed what could be done in a single transaction. Addition of new states broke clients. Unobserved changes could cause unexpected state transitions. Adding new resource types was hard, and would have had to be added to monolithic files.
Mar 22, 2019 18 tweets 4 min read
Kubernetes Borg/Omega history topic 4: Workload controllers. Before I get into the history, some conceptual background may be useful, since the underpinnings come up in many Cloud Native contexts. The key is to explicitly model state such that it can be externally manipulated. Around the time Kubernetes was open sourced, there were a number of libraries and tools created to start containers on multiple machines. The original libswarm was one. The problem with such imperative, client-side approaches was that it was hard to add automation.
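Concretely, "explicitly model state" means desired state lives as data in spec, which any client or automation can write, and observed state is reported back in status, so automation layers on by reading and writing data rather than wrapping an imperative client library. A sketch (hypothetical names; status abbreviated):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical name
spec:                          # desired state: externally manipulable data
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:v1   # hypothetical image
status:                        # observed state, written back by the controller
  observedGeneration: 7
  replicas: 3
  readyReplicas: 2
```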
Mar 15, 2019 5 tweets 2 min read
Kubernetes Borg/Omega history topic 3: Annotations. Borg's Job type had a single notes field. Like the DNS TXT record, that proved insufficient. For example, layers of client libraries and tools wanted to attach additional information. Some users found other creative places to carry information, such as scheduling preferences, in which arbitrary key/value strings could be stored. Arbitrary protobuf extensions were eventually supported.
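Kubernetes therefore gave every object metadata.annotations: an arbitrary key/value map for non-identifying metadata, so tools and client libraries have a sanctioned place to attach information instead of overloading other fields. A sketch with hypothetical keys and values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-0                                   # hypothetical name
  annotations:                                  # arbitrary non-identifying metadata
    example.com/build-commit: "9f2c1ab"         # hypothetical keys and values
    example.com/owner: "team-payments"
    tooling.example.com/last-checked: "2019-03-15T12:00:00Z"
spec:
  containers:
    - name: web
      image: example/web:v1                     # hypothetical image
```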
Mar 9, 2019 7 tweets 2 min read
Kubernetes Borg/Omega history topic 2: Borg had Machine key/value attributes that could be used in scheduling constraints. Borgmon had target labels to convey application topology, environment, and locale. But Jobs themselves didn't originally have k/v labels. So Borg users would embed attribute values in Job names, separated by dots and dashes, up to 180 characters long, and then parse them out in other systems and tools using complex regular expressions.
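In Kubernetes this became metadata.labels plus label selectors, so attributes are structured data rather than substrings of names. A sketch with hypothetical values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-7d4f9         # hypothetical; the name no longer needs to encode attributes
  labels:                      # structured key/value attributes
    app: frontend
    env: prod
    region: us-east1
spec:
  containers:
    - name: frontend
      image: example/frontend:v3   # hypothetical image
# Tools then select by labels rather than parsing names, e.g.:
#   kubectl get pods -l app=frontend,env=prod
```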
Mar 3, 2019 6 tweets 1 min read
Kubernetes Borg/Omega history topic 1: In Borg, Job Tasks were scheduled into Alloc instances, but almost everyone pinned groups of tasks into each instance. Often these were sidecars, such as for logging or caching. It was clear that using such groups as the explicit primitive would be simpler. We called these "Scheduling Units". They were prototyped in Borg, but it was too hard to introduce new concepts. They became "SUnits" in Omega, and then Pods, as in a pod of peas or of whales, in K8s.
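That primitive survives as the Pod: a group of containers scheduled together onto one node, sharing network identity and volumes, which is exactly what sidecars such as log shippers need. A sketch with hypothetical names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logger        # hypothetical name
spec:
  volumes:
    - name: logs
      emptyDir: {}             # scratch space shared by the co-scheduled containers
  containers:
    - name: app
      image: example/app:v1    # hypothetical image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper        # sidecar, placed and managed together with the app
      image: example/log-shipper:v1   # hypothetical image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```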