After 16 episodes of #Klustered and 35 broken @kubernetesio clusters, here are my top tips for anyone looking to sit their CKA or CKAD, or anyone operating Kubernetes.
If you can't reach the API server, it's always one of three things:

1. You've not exported KUBECONFIG
2. Your KUBECONFIG has the wrong URL for the API server
3. Your static manifests, in /etc/kubernetes/manifests, need fixing
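For the first two, a quick sanity check looks something like this (the server URL will obviously differ per cluster):

echo $KUBECONFIG
# which API server address is kubectl actually pointing at?
kubectl config view --minify | grep server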
The first two are easy fixes; the third is a rabbit hole of potential problems. As we're all working with kubeadm clusters these days, you can expect the static pod directory to contain manifests for the API server, etcd, the controller manager, and the scheduler.
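On a kubeadm cluster that usually looks something like this (exact file names can vary):

ls /etc/kubernetes/manifests
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml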
Let's assume etcd is running, because that's another huge rabbit hole we'll cover next week. You need to get your API server happy so you can use kubectl. Your cluster can be pretty happy even without a controller manager and scheduler, so focus on those last.
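A quick way to check whether the API server is even answering, assuming the default port of 6443 and a containerd runtime on the control-plane node:

# /healthz is served to anonymous users by default
curl -k https://localhost:6443/healthz
# is the static pod's container actually running?
crictl ps | grep kube-apiserver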
Getting your API server running is mostly a case of checking for obviously wrong flags in the manifest and ensuring port numbers are correct. Remember that all container logs live in /var/log/containers, including those for static pods.
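For example, something like this on the control-plane node (the exact file names include pod hashes, so they'll differ):

ls /var/log/containers/ | grep kube-apiserver
tail -n 50 /var/log/containers/kube-apiserver-*.log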
Tip: seeing "port: 0" is pretty common and you can ignore it 9 times out of 10, though it will catch your eye when debugging.
Tip: Kubernetes loves to use "-" for negation. In these manifests it means "don't run this controller", "don't accept this authentication method", and so on. Watch out for them!
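For example, in kube-controller-manager's flags a leading "-" disables a single controller (the controller name here is just an illustration):

# run every controller except nodeipam
- --controllers=*,-nodeipam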
Admission controllers can screw you over, and they don't always live within the cluster. Watch for admission controllers configured statically in the API server manifest too.
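It's worth grepping the manifest directly, e.g.:

grep -i admission /etc/kubernetes/manifests/kube-apiserver.yaml
# look for --enable-admission-plugins and --disable-admission-plugins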
Working through these checks can get you access to kubectl again and back to the tools you're more comfortable working with.
Best of luck!
Got more tips? I'd love to hear them!
I’ll cover containerd, controllers, and scheduling next week 😀
Maybe we can convince the @CloudNativeFdn to scrap the CKA and instead issue certificates to people that fix a cluster on #klustered 😂
If you do have access to the API server, you're kind of lucky; but now you've got a much larger surface of problems to debug.
A common attack vector on Klustered, though rarely a real production problem, is quota and resource management.
Remember that your pods can fail to schedule for a few different reasons.
Use the events, Luke. ALWAYS find the relevant event by describing your resource, so you know which component is blocking your workload.
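Something like this (pod name is a placeholder):

kubectl describe pod <pod-name>   # Events section is at the bottom
kubectl get events -A --sort-by=.lastTimestamp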
Resource saturation could be rogue pods or processes on the host, or your pods requesting far more than they actually use. LimitRanges and ResourceQuotas could be too strict. And if you've got no scheduler at all, try manually setting the node in the pod spec; this hack bypasses the scheduler.
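A minimal sketch of that hack, with made-up names: setting .spec.nodeName places the pod directly on that node and the scheduler never gets involved.

apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  nodeName: worker-1   # pod lands here without the scheduler
  containers:
  - name: app
    image: nginx:1.25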
Oh, and remember this: if what you see from "kubectl get thing" isn't what you expected, it's ALWAYS …
kubectl get mutatingwebhookconfigurations
It's debugging gold when you finally realise something is modifying your manifests 🏅