After 16 episodes of #Klustered and 35 broken @kubernetesio clusters, here are my top tips for anyone looking to sit their CKA or CKAD, or anyone operating Kubernetes.
If you can’t reach the API server, it’s always one of three things.

1. You’ve not exported KUBECONFIG
2. Your KUBECONFIG has the wrong URL for the API server
3. Your static pod manifests in /etc/kubernetes/manifests need fixing.
The first two are easy fixes (quick checks below); the third is a rabbit hole of potential problems. As we’re all working with kubeadm clusters these days, you can expect the static pod directory to contain manifests for the API server, etcd, the controller manager, and the scheduler.
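A quick sanity check for the first two, assuming a kubeadm cluster where the admin kubeconfig lives at /etc/kubernetes/admin.conf:

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl config view --minify | grep server   # does this URL actually point at your control plane?
kubectl cluster-info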
Let’s assume etcd is running, because that’s another huge rabbit hole we’ll cover next week. You need to ensure your API server is happy so you can use kubectl. Your cluster can be pretty happy even without a controller manager and scheduler, so focus on those last.
Getting your API server running is mostly checking for obvious flags within the manifest and ensuring port numbers are correct. Remember that all the container logs live in /var/log/containers, including those for static pods.
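When kubectl is down, the node itself still has what you need. A sketch, assuming a containerd-based node with crictl installed (the container ID is a placeholder):

ls /var/log/containers/ | grep kube-apiserver
tail -f /var/log/containers/kube-apiserver-*.log
crictl ps -a | grep kube-apiserver   # is the container even being created?
crictl logs <container-id>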
Tip: seeing “port: 0” is pretty common and you can ignore it 9 times out of 10, though it will catch your eye when debugging.
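If I remember the kubeadm manifests right, that usually comes from the scheduler and controller manager deliberately disabling their legacy insecure port, something like:

# kube-scheduler.yaml / kube-controller-manager.yaml
- --port=0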
Tip: Kubernetes loves to use “-” for negation. In these manifests it means “don’t run this controller”, “don’t accept this authentication method”, and so on. Watch out for them!
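One place this shows up is the kube-controller-manager’s --controllers flag, where a leading “-” disables a controller (the ttl controller here is just an illustration):

# kube-controller-manager.yaml: run every default controller except ttl
- --controllers=*,-ttl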
Admission controllers can screw you over, and they don’t always live within the cluster. Watch for static admission controllers in the API server manifest too.
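The in-manifest variety is controlled by flags on the kube-apiserver command line; something like this (the plugin names are only examples):

# kube-apiserver.yaml
- --enable-admission-plugins=NodeRestriction
- --disable-admission-plugins=ServiceAccount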
Watching for these things can get you kubectl access back and return you to the tools you’re more comfortable working with.

Best of luck!
Got more tips? I’d love to hear them!
I’ll cover containerd, controllers, and scheduling next week 😀
Maybe we can convince the @CloudNativeFdn to scrap the CKA and instead issue certificates to people that fix a cluster on #klustered 😂
If you do have access to the API server, you’re kind of lucky; but now you’ve got a much larger set of problems to debug.

A common attack vector on Klustered, though rarely a real production problem, is quota and resource management.
Remember that your pods can fail to schedule because:

- Limit Ranges
- Quotas
- Resource Saturation
- No scheduler 😅

Use the events, Luke. ALWAYS find the relevant event by describing your resource, so you know which component is blocking your workload.
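A couple of commands I reach for (names are placeholders):

kubectl describe pod <pod-name>   # check the Events section at the bottom
kubectl get events --sort-by=.metadata.creationTimestamp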
Resource saturation could be rogue pods or processes on the host, or your pods requesting far more than they actually use. Limit Ranges and quotas could be too strict. And if you’ve got no scheduler, try setting the node manually in the pod spec; this hack bypasses the scheduler entirely.
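A minimal sketch of that hack (pod, image, and node names are placeholders): with spec.nodeName set, the kubelet on that node picks the pod up directly and kube-scheduler never gets involved.

apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  nodeName: worker-1   # pin directly to a node, skipping the scheduler
  containers:
  - name: app
    image: nginx:1.25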
Oh, and remember this. If what you see from “kubectl get thing” isn’t what you expected, it’s ALWAYS …

kubectl get mutatingwebhookconfigurations

It’s debugging gold when you finally realise something is modifying your manifests 🏅
