Cool post from GitLab on what they've learned 1y into their Kubernetes migration: about.gitlab.com/blog/2020/09/1…
As a Kube noob who's been cut by a few sharp edges, this type of battle report was super useful to me :) Some stuff I learned:
Their backend is a monolith, but they route different collections of endpoints to different node pools. That's a clever way to limit the blast radius of performance issues that I'd never thought of (not Kube-specific either, and may be a common practice I'd just never heard of!)
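To make the idea concrete, here's a rough sketch of "same monolith, different pools by endpoint". The path prefixes and pool names are made up for illustration (not GitLab's actual config), and in practice the split would probably live in an ingress or load balancer rather than in app code:

```python
# Hypothetical mapping from URL prefix to the node pool that serves it.
ROUTES = {
    "/api/": "api-pool",
    "/git/": "git-pool",
    "/ws/": "websockets-pool",
}
DEFAULT_POOL = "web-pool"  # everything else lands here


def pool_for(path):
    """Return the pool for a request path, using the longest matching prefix."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_POOL
    return ROUTES[max(matches, key=len)]
```

The point is that a slow endpoint under `/git/` can only saturate `git-pool`; requests under `/api/` keep their own capacity.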
GKE regional clusters incur big bandwidth charges for cross-AZ traffic; you can avoid them by using multiple zonal clusters
TBH it doesn't look *that* awful from the chart—the egress it shows costs <=5k/mo and I'd guess Git storage is near-pathological for this—but useful warning
They, like me, got resource requests vs limits wrong to start with.
After thinking for a bit, I came to the conclusion that (request != limit) is probably a bad idea for production traffic bc it leads to pods getting an unpredictable amount of resources that can change depending on where the pod is scheduled and who its "neighbors" are. That unpredictability makes it hard to size workloads well.
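For reference, Kubernetes itself draws this line via QoS classes: a pod whose containers all have CPU and memory requests equal to their limits is "Guaranteed", one with nothing set is "BestEffort", and everything in between is "Burstable". Here's a simplified sketch of that classification. The input shape (a list of dicts with `requests` and `limits`) is my own, and it compares quantities as plain strings, so `"500m"` and `"0.5"` would wrongly count as different:

```python
def qos_class(containers):
    """Simplified version of Kubernetes' pod QoS classification."""
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults a missing request to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

So "request == limit everywhere" is the same as asking for the Guaranteed class, and anything with request < limit is Burstable and depends on spare capacity on its node.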
But I couldn't find anyone else giving this advice; lots of blog posts explain what requests/limits *do*, but few cover good usage patterns!
Autoscaling + slow pod startup times + spiky workloads (+ overcommitted CPU?) = 😢
We also have slow startup times (fortunately <2m) so another useful warning!
(I think "reserved pod capacity" = using low-priority "pause" pods as described here: replex.io/blog/kubernete…)
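If that guess is right, the pattern looks roughly like the sketch below: a negative-priority PriorityClass plus a Deployment of "pause" containers that request resources but do nothing. When real pods need room, the scheduler can preempt the placeholders, so capacity is already warm. The name `overprovisioning`, the image tag, and the sizes are assumptions for illustration, not anything from the GitLab post:

```python
import json


def overprovisioning_manifests(replicas, cpu, memory):
    """Build a low-priority PriorityClass and a Deployment of placeholder pods."""
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "overprovisioning"},  # hypothetical name
        "value": -1,  # below the default pod priority of 0
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }
    labels = {"app": "overprovisioning"}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "overprovisioning"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": "overprovisioning",
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",  # assumed tag
                        # Requests are what actually reserve the capacity.
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
    return [priority_class, deployment]


if __name__ == "__main__":
    for manifest in overprovisioning_manifests(2, "1", "1Gi"):
        print(json.dumps(manifest, indent=2))
```

Each placeholder should probably request about as much as one real pod, so that evicting one frees a usable slot.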
btw, I would love other pointers to good posts about lessons learned using Kube in anger! Google seems to be inundated with tutorials, or maybe I just had bad keywords
