Next up: Wojciech Tyczyński of SIG Scalability on the necessary k8s and etcd changes to scale to clusters with 15k+ nodes.

First off, scalability isn't really about the number of nodes, but about a lot of factors that may scale with cluster size.

#KubeCon #CloudNativeCon
Starting with real-life use cases: @TwitterEng currently runs on Mesos clusters, which can handle 40-60k nodes in a single cluster

But scalability work affects everyone! This makes all clusters more reliable and performant.

#KubeCon #CloudNativeCon
You can't talk about K8s scalability without etcd. They minimized the number of blocking read/write operations (now available in @etcdio 3.4). Blocking can bite you in a smaller cluster too if you have a lot of custom resources

#KubeCon #CloudNativeCon
Another bottleneck is apimachinery: watch bookmarks let clients resume watches cheaply instead of re-listing, which helps with latency (GA in @kubernetesio 1.17)

#KubeCon #CloudNativeCon
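Side note, not from the talk: a rough way to see bookmark events yourself, assuming kubectl access and the "default" namespace (the allowWatchBookmarks parameter is standard; the rest is just illustration):

# stream a watch on pods and ask the apiserver to send BOOKMARK events
kubectl get --raw "/api/v1/namespaces/default/pods?watch=1&allowWatchBookmarks=true"

The periodic BOOKMARK events carry only a resourceVersion, so a client that disconnects can resume from there instead of re-listing everything.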
Networking: even in a medium-sized cluster, the apiserver has to send a lot of data because Endpoints store every pod of a Service in a single object, which quickly becomes an issue when deploying larger services. EndpointSlices were introduced to mitigate this (Beta in @kubernetesio 1.19)

#KubeCon #CloudNativeCon
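If you want to poke at EndpointSlices yourself, a quick sketch (the service name "my-service" is made up; the label is the standard one slices carry):

# each Service gets one or more EndpointSlice objects instead of one giant Endpoints blob
kubectl get endpointslices -l kubernetes.io/service-name=my-service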
Next up, storage: immutability was added to the Secret and ConfigMap APIs so that the kubelet does not need to watch them for changes, which reduces load on the control plane (also @kubernetesio 1.19; quick sketch below)

Golang memory allocation was also a bottleneck (!!!); fixing it benefits the whole ecosystem

#KubeCon #CloudNativeCon
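A tiny sketch of the immutable ConfigMap bit above (names and data are placeholders; immutable: true is the actual field, and it works the same way for Secrets):

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config   # placeholder name
data:
  LOG_LEVEL: info
immutable: true       # kubelet stops watching it for changes; edits require replacing the object
EOF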
Very important slide: K8s scalability is not isolated. The rest of your infrastructure needs to scale too (Note: I will also emphasize network integration!!!)

#KubeCon #CloudNativeCon
By the way, some of the K8s limits/thresholds are documented here: github.com/kubernetes/com…

#KubeCon #CloudNativeCon
Next up: SIG App-Delivery presents how to solve everyday problems with the landscape (CI/CD, image build/app def, scheduling/orchestration) @resouer @AloisReitbauer #CloudNativeCon #KubeCon
BTW I'm also keeping an eye on the Kata Containers Performance on ARM talk! I'm very excited by the recent developments in lightweight VMs like @katacontainers as well as ARM arriving in the cloud via Graviton at @awscloud

#CloudNativeCon #KubeCon
I love interactive demos like podtato-head (github.com/cncf/podtato-h…), because I am a kinesthetic learner; staring at slides does not work well for me! You can also get a feel for the different UX of each technology

@resouer @AloisReitbauer
#CloudNativeCon #KubeCon
"What if I want Heroku but on K8s?"

KubeVela is the recommendation, using the Open Application Model (OAM) with developer-centric primitives, but it is also highly extensible. These are the kinds of abstractions I like!

@resouer @AloisReitbauer
#CloudNativeCon #KubeCon
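For flavor, here is roughly what an OAM-style app looks like in KubeVela. Caveat: this matches a later KubeVela release than this talk, so treat the exact apiVersion and fields as assumptions; names and image are placeholders:

kubectl apply -f - <<EOF
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: my-app
spec:
  components:
    - name: web
      type: webservice   # developer-facing abstraction instead of raw Deployment/Service
      properties:
        image: nginx:1.21   # placeholder image
        port: 80
EOF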
Next, evaluating more complex use cases (where all the challenges are!): stateful workloads, databases, external dependencies, etc. all exist in the real world. Maybe I should check out SIG App-Delivery? 🤔

@resouer @AloisReitbauer
#CloudNativeCon #KubeCon
@lbernail gives more stories on K8s scalability issues and complex problems with "How the OOM-killer Deleted my Namespace". This is gonna be fun :)

#CloudNativeCon #KubeCon
`helm upgrade` with a resource removed from the chart deletes it from the cluster, but why did it take 4 days? 🤔

#CloudNativeCon #KubeCon @lbernail
If the Namespace controller can't discover all resources in the cluster, it does nothing.

Controller-manager throws an error: cannot find metrics-server

Metrics-server was OOM-killed, got restarted, and then things started working again.
#CloudNativeCon #KubeCon @lbernail
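The aggregated-API angle is easy to check yourself; a quick sketch (assumes cluster access):

# the namespace controller needs discovery across ALL API groups, including aggregated ones
kubectl get apiservices
# anything False in the AVAILABLE column (e.g. v1beta1.metrics.k8s.io while metrics-server is down) blocks it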
But why was metrics-server failing in the first place?

Metrics-server was deployed on a cluster with a mix of old and new nodes, and there was an issue with an image change; it got scheduled onto a "bad" node.

#CloudNativeCon #KubeCon @lbernail
Next issue: rolling out a new image, and nodes became NotReady after a few days. Weird, because the change wasn't related to the kubelet or the runtime.

Why? `kubectl describe nodes` shows the container runtime is down, but containerd is fine.

After looking at the kubelet logs, they found an error
@lbernail
They noticed that there is locking logic in the CNI code path and it is hanging.

Take a goroutine dump to find blocked goroutines.

They were able to reproduce the issue with a blocked Delete command and tracked it down to a bug in the netlink library used by the Lyft CNI plugin (wow, so low level!)

@lbernail
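If you have never taken a goroutine dump, two generic options for any Go daemon (the pprof port is an assumption; it only exists if the process exposes net/http/pprof):

# option 1: SIGQUIT makes the Go runtime print every goroutine's stack, then exit
kill -QUIT <pid-of-the-hung-process>
# option 2: scrape pprof without killing the process, if it is enabled
curl "http://127.0.0.1:6060/debug/pprof/goroutine?debug=2"

Blocked goroutines show how long they have been waiting on a lock or channel, which is exactly what you need here.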
How a single app can overwhelm the control plane (this one sounds familiar to me...)

It starts with apiservers unhealthy and unable to reach etcd, and etcd getting OOM-killed. There was a spike in requests, esp. list calls (very expensive)

@lbernail #KubeCon #CloudNativeCon
List calls with NO resourceVersion bypass the cache and read directly from etcd (o no...)

Setting RV=0 (much faster, served and filtered from the apiserver cache) vs RV="" (straight to etcd)

Note: Informers are much better than raw List calls b/c of the cache behavior

@lbernail #KubeCon #CloudNativeCon
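You can see the difference from the client side; a small sketch (listing pods cluster-wide is itself expensive, so don't try this on a huge cluster):

# no resourceVersion: quorum read, served straight from etcd
kubectl get --raw "/api/v1/pods"
# resourceVersion=0: served from the apiserver's watch cache, etcd never sees it
kubectl get --raw "/api/v1/pods?resourceVersion=0"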
Wow, maybe `kubectl get` should allow setting RV=0? 🤔

BACK TO THE INCIDENT: WHODUNNIT?

K8s audit logs show a single user is responsible: the nodegroup controller, with a recent upgrade that checks pods for deletion protection (with LIST calls...) 🙃

@lbernail #KubeCon #CloudNativeCon
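Rough sketch of the kind of audit-log digging this takes (the log path depends entirely on how your apiservers are configured; the jq fields match the standard audit Event schema):

# who is hammering the apiserver with LIST calls?
jq -r 'select(.verb == "list") | .user.username' /var/log/kube-apiserver/audit.log | sort | uniq -c | sort -rn | head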
Learnings:
- apiservice extensions are sus
- weird low-level issues are sus
- large clusters are sus

@lbernail #KubeCon #CloudNativeCon
I wasn't able to watch @robertjscott's talk on AZ-Aware routing but read the slides; this is Very Important if you are running in multiple zones/regions!

1. Standardized topology labels
2. Topology Aware Routing
3. Performance and Scalability (via EndpointSlice)

#KubeCon
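The standardized labels from point 1 are easy to check on your own nodes (nothing hypothetical here, though your provider has to actually set them):

kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone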
Next talk: 10 More Weird Ways to Blow Up Your Kubernetes @AirbnbEng with Jian Cheung and Joseph Kim

More on the theme of learning from incidents :)

First issue: 56k replicasets in one namespace

We were testing a new feature (related to zone topology)

#KubeCon
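Not the fix from the talk, just a quick way to spot this kind of pile-up (namespace name is a placeholder); also worth remembering that a Deployment's spec.revisionHistoryLimit caps how many old ReplicaSets it keeps around:

kubectl get replicasets -n my-namespace --no-headers | wc -l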
Mutating time bomb:

Something weird going on with HPA and replicasets

Back to the initial error: a patch bug with a mutating admission controller

#KubeCon
Zero to Autoscaling AKA Autoscaling Edge Cases:

When migrating to HPA, you can accidentally end up scaled down to zero (sketch below):
- when current replicas is 0 or 1
- when migrating deployment logic (e.g. to Spinnaker)
- when using alternative "deploy" methods (e.g. kubectl scale)

#KubeCon #CloudNativeCon
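A sketch of the replicas == 0 edge case (deployment and HPA names are made up): once something scales the target to zero, the HPA treats autoscaling as disabled and will not bring it back:

# e.g. an old deploy pipeline or a human does this...
kubectl scale deployment my-app --replicas=0
# ...and the HPA now skips the target until replicas > 0 again
kubectl describe hpa my-app   # look for the ScalingActive condition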
K8s Master Issues:
- Do NOT drain the masters, unless you want to play 52 card pickup
- Running out of memory -> alerts firing everywhere -> everyone loses ssh access to nodes (o no...)

A single large service deploy that is CrashLooping (esp. with a high maxSurge) is problematic (sketch below)
Meanwhile, keeping an eye on KubeVela; this looks really legit. Has anyone used it?

#KubeCon #CloudNativeCon
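On the maxSurge point above: a conservative rollout config keeps a crashlooping deploy from flooding the cluster with surge pods. A sketch, with a placeholder deployment name:

kubectl patch deployment my-big-service --type merge -p '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":"10%","maxUnavailable":0}}}}'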
Next Blow up: kube2iam blows us up, again

AWS Auth & Kubernetes Do Not Always Play Nice

#KubeCon #CloudNativeCon
Even if you use all the CPU cores on a machine, you can still get throttled 🙃

Doesn't affect performance too much, but we learned a lot about throttled periods and what to actually measure.

#KubeCon #CloudNativeCon
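What "actually measure" looks like in practice, roughly (cgroup v1 paths; cgroup v2 exposes the same counters in a different cpu.stat format):

# from inside the container: how many CFS periods happened vs. how many got throttled
cat /sys/fs/cgroup/cpu/cpu.stat
# or watch the cAdvisor metric container_cpu_cfs_throttled_periods_total in your monitoring stack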
One of my favorite outages: JVM garbage collection behavior & CPU throttling leading to catastrophic OOMs

Lots of learnings, from JVM tuning and adjusting CPU throttling behavior/metrics to adding backpressure (esp. in the service proxy / Envoy layer)

#KubeCon #CloudNativeCon
Summary slide!

#KubeCon #CloudNativeCon
Also @jiancheung is on Twitter now :)
