When I started using containers back in 2015, I thought they were tiny virtual machines with a subsecond startup time.
It was easy to follow tutorials from the Internet on how to put your Python or Node.js app into a container...
But thinking of containers as VMs is an extremely leaky abstraction. It doesn't allow you to judge:
- what's doable and what's not
- what's idiomatic and what's not
- what's safe enough and what's not
So, I started digging into Docker's implementation details.
Docker is a behemoth doing many different things. There are plenty of materials on Docker, but:
- they are either shallow, introductory container tutorials
- or deeply technical write-ups that are hard to follow for a newbie
So, it took me a while to structure things in my head.
I find the following learning order helpful:
1. Containers (low-level Linux impl)
2. Images (why do you need them)
3. Managers (many containers, one host)
4. Orchestrators (many hosts, one app)
1. Containers are not VMs.
A container is an isolated (namespaces) and restricted (cgroups, capabilities, seccomp) process.
To start a process you need to fork/exec an executable. But to start a containerized process, you need to prepare a box first - create namespaces, configure cgroups, etc.
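To make the "box" part concrete, here's a minimal Go sketch (Linux-only, typically needs root) that launches a shell inside a few fresh namespaces. A real runtime such as runc does far more: cgroups, capabilities, seccomp, pivot_root, and so on.

```go
// box.go - a minimal sketch of "preparing the box" before running a process.
// Linux-only; creating most of these namespaces requires root (or a user namespace).
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr

	// Ask the kernel to clone the child into fresh namespaces:
	// a new hostname (UTS), PID tree, mount table, and network stack.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS |
			syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNS |
			syscall.CLONE_NEWNET,
	}

	// A real runtime would also configure cgroups, drop capabilities,
	// apply a seccomp profile, and pivot_root into the container's fs here.
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```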
A Container Runtime is a special kind of (lower-level) software to run containers. The reference implementation, runc, takes a so-called bundle as input:
- a JSON file with container params (path to executable, env vars, etc)
- a folder with an executable and other files (if any) to put into a container fs
Oftentimes, a bundle contains files that closely resemble a typical Linux distribution (/var, /usr, /lib, /etc, ...)
When runc launches a container with such a bundle, the process inside sees its filesystem as a typical OS.
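As a rough illustration, here's a Go sketch that lays out such a bundle. The config.json below is a heavily stripped-down subset of the OCI runtime spec; a real one (e.g. produced by `runc spec`) carries many more fields.

```go
// makebundle.go - sketch of laying out a (very) stripped-down OCI bundle.
package main

import (
	"os"
	"path/filepath"
)

// A tiny subset of the OCI runtime spec fields - just enough to show the shape.
const config = `{
  "ociVersion": "1.0.2",
  "process": { "args": ["/bin/sh"], "cwd": "/" },
  "root":    { "path": "rootfs" }
}`

func main() {
	bundle := "./mybundle"

	// rootfs/ holds the container's filesystem (e.g. an extracted busybox).
	if err := os.MkdirAll(filepath.Join(bundle, "rootfs"), 0o755); err != nil {
		panic(err)
	}

	// config.json holds the container params the runtime will honor.
	if err := os.WriteFile(filepath.Join(bundle, "config.json"), []byte(config), 0o644); err != nil {
		panic(err)
	}

	// Once the rootfs is actually populated, something like
	// `runc run -b ./mybundle demo` could launch the container.
}
```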
Building images, distributing them, and running containers - Docker does it all within a single daemon. But there is a dedicated piece of software for each such task individually.
Check out:
- podman
- buildah
- skopeo
- kaniko
- etc
4. Coordinating containers running on different hosts is hard.
Remember Docker Swarm? Docker was already quite monstrous when multi-host container orchestration was added to the very same daemon.
One more responsibility for Docker...
Setting aside the bloated-daemon issue, Docker Swarm seemed nice.
But another orchestrator won the competition. Kubernetes!
Docker Swarm has effectively been in maintenance mode since ~2020.
Kubernetes joins multiple servers (nodes) into a coherent cluster.
Every such node has a container manager on it. It used to be dockerd, but Kubernetes deprecated (and eventually removed) that integration. containerd and cri-o are two popular choices of slimmer container managers nowadays.
There are a lot of tasks for a container orchestrator.
How to group containers into higher-level primitives (pods)?
How to interconnect nodes with running containers into a common network?
Grasping Kubernetes Pods, Deployments, and Services 🧵
...through the lens of "old school" Virtual Machines.
Before the rise of Cloud Native:
- A VM was a typical deployment unit (a box)
- A group of VMs would form a service
- Everyone would build their own Service Discovery
Then, Docker containers showed up.
A container attempted to become a new deployment unit...
However, Docker's one-process-per-container philosophy was too limiting. Many apps weren't built that way, and people needed more VM-ish boxes.
Kubernetes got the deployment unit right.
In Kubernetes, a minimal runnable thing is a Pod - a group of semi-fused containers.
Now, you can run (and scale!) the main app and its satellite daemons (sidecars) as a single unit.
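As an illustration, here's a sketch of such a Pod expressed with the Kubernetes Go API types (k8s.io/api); the image names and labels are made up for the example.

```go
// pod.go - sketch: a Pod that fuses the main app with a sidecar.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{Kind: "Pod", APIVersion: "v1"},
		ObjectMeta: metav1.ObjectMeta{Name: "app", Labels: map[string]string{"app": "demo"}},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				// The main application container.
				{Name: "app", Image: "example.com/app:1.0"},
				// A satellite daemon (sidecar) - scheduled, scaled,
				// and deleted together with the main app.
				{Name: "log-shipper", Image: "example.com/log-shipper:1.0"},
			},
		},
	}

	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```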
Docker relies on containerd, a lower-level container runtime, to run its containers. It is possible to use containerd from the command line directly, but the UX might be quite rough at times.
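For a taste of what dockerd delegates, here's a sketch that drives containerd programmatically via its Go client, loosely following containerd's getting-started example (containerd 1.x package layout; details may differ between versions). It needs a running containerd and root privileges.

```go
// ctrd.go - sketch: pull an image and run a container through containerd directly.
package main

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// containerd isolates its clients by namespace; dockerd uses "moby".
	ctx := namespaces.WithNamespace(context.Background(), "demo")

	// Pull an image and unpack it into a snapshot.
	image, err := client.Pull(ctx, "docker.io/library/alpine:latest", containerd.WithPullUnpack)
	if err != nil {
		panic(err)
	}

	// Create a container: metadata + snapshot + OCI spec...
	container, err := client.NewContainer(ctx, "demo",
		containerd.WithImage(image),
		containerd.WithNewSnapshot("demo-snapshot", image),
		containerd.WithNewSpec(oci.WithImageConfig(image)),
	)
	if err != nil {
		panic(err)
	}
	defer container.Delete(ctx, containerd.WithSnapshotCleanup)

	// ...and only then an actual running task (process) inside it.
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
	if err != nil {
		panic(err)
	}
	defer task.Delete(ctx)

	statusC, err := task.Wait(ctx) // subscribe to the exit status first
	if err != nil {
		panic(err)
	}
	if err := task.Start(ctx); err != nil {
		panic(err)
	}
	<-statusC // alpine's default /bin/sh exits once stdin closes
}
```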
1. Network namespaces - a Linux facility to virtualize network stacks.
Every container gets its own isolated network stack with (virtual) network devices, a dedicated routing table, a scratch set of iptables rules, and more.
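A quick way to see that "scratch" stack: the Go sketch below (Linux-only, needs root) runs `ip addr` inside a brand-new network namespace. The only device it should report is a DOWN loopback.

```go
// netns.go - sketch: inspect the empty network stack of a fresh netns.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("ip", "addr")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

	// CLONE_NEWNET gives the child its own network namespace.
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: syscall.CLONE_NEWNET}

	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```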
2. Virtual Ethernet Devices (veth) - a means to interconnect network namespaces.
A container's network interfaces are invisible from the host - the latter runs in its own (root) network namespace.
To punch through a network namespace, a special Virtual Ethernet Pair can be used.
3. The need for a (virtual) switch device.
When multiple containers run in the same IP network, leaving the host ends of the veth devices dangling in the root namespace makes the routes clash. So, you won't be able to reach (some of) the containers.
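The usual fix is a Linux bridge playing the role of a virtual switch, with the host ends of the veth pairs plugged into it. Here's a sketch of that host-side plumbing using the third-party github.com/vishvananda/netlink package (Linux-only, needs root; the br0/veth0/ceth0 names are arbitrary):

```go
// bridge.go - sketch: a bridge as the virtual switch, plus one veth pair per container.
package main

import "github.com/vishvananda/netlink"

func main() {
	// 1. A bridge device to act as the virtual switch.
	brAttrs := netlink.NewLinkAttrs()
	brAttrs.Name = "br0"
	br := &netlink.Bridge{LinkAttrs: brAttrs}
	if err := netlink.LinkAdd(br); err != nil {
		panic(err)
	}

	// 2. A veth pair: "veth0" stays in the root namespace,
	//    "ceth0" would later be moved into the container's netns.
	vethAttrs := netlink.NewLinkAttrs()
	vethAttrs.Name = "veth0"
	veth := &netlink.Veth{LinkAttrs: vethAttrs, PeerName: "ceth0"}
	if err := netlink.LinkAdd(veth); err != nil {
		panic(err)
	}

	// 3. Plug the host end into the bridge instead of leaving it dangling,
	//    so per-container routes don't clash in the root namespace.
	if err := netlink.LinkSetMaster(veth, br); err != nil {
		panic(err)
	}
	if err := netlink.LinkSetUp(br); err != nil {
		panic(err)
	}
	if err := netlink.LinkSetUp(veth); err != nil {
		panic(err)
	}
}
```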
What is Service Discovery - in general, and in Kubernetes 🧵
Services (in Kubernetes or not) tend to run in multiple instances (containers, pods, VMs). But from the client's standpoint, a service is usually just a single address.
How is this single point of entry achieved?
1⃣ Server-Side Service Discovery
A single load balancer (a.k.a. reverse proxy) in front of the service's instances is a common way to solve the Service Discovery problem.
It can be just one Nginx (or HAProxy) or a group of machines sharing the same address 👇
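Conceptually, that LB layer is just "one address in front, many instances behind". A minimal sketch in Go (standard library only; the instance addresses are made up):

```go
// lb.go - sketch of server-side discovery: a tiny round-robin reverse proxy.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"sync/atomic"
)

func main() {
	instances := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}
	var next uint64

	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			// Clients talk to one address; the proxy fans requests
			// out to the instances round-robin.
			i := atomic.AddUint64(&next, 1)
			req.URL.Scheme = "http"
			req.URL.Host = instances[i%uint64(len(instances))]
		},
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```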
2⃣ Client-Side Service Discovery
The centralized LB layer is relatively easy to provision, but it can become a bottleneck and a single point of failure.
An alternative solution is to distribute the rosters of service addresses to every client and let them pick an instance.
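A minimal Go sketch of that idea (the roster and addresses are made up; in practice the roster would come from a registry such as Consul, etcd, or Kubernetes Endpoints and be refreshed over time):

```go
// client.go - sketch of client-side discovery: the client holds the roster
// and picks an instance itself (random here; round-robin also works).
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// In a real setup this list would be fetched from a service registry.
var roster = []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}

func call(path string) (*http.Response, error) {
	instance := roster[rand.Intn(len(roster))] // the client picks the instance
	return http.Get("http://" + instance + path)
}

func main() {
	resp, err := call("/healthz")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```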