In preparation for fixing broken Kubernetes clusters live on @rawkode's #Klustered event, I reminded myself of some of the core K8s debugging commands and techniques
Here are my top 10 tips for platform engineers debugging Kubernetes and the machinery underneath the covers 🧵 👇
First off, use kubectl to take a look at the cluster infra:
$ kubectl get nodes
$ kubectl cluster-info dump
These commands typically give you a good idea of where to start debugging, e.g. broken nodes, infra issues, or resource pressure
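If a node looks unhealthy, describing it shows its conditions, taints, and recent events (substitute one of your own node names here):
$ kubectl describe node <node-name>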
If kubectl doesn't work, it's time for some Linux debugging! SSH to the control plane node and run:
$ ps aux | grep kube
This will show you if core components such as the kubelet, kube-apiserver, and kube-scheduler are up and running
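On kubeadm-based clusters the control plane components run as static pods managed by the kubelet, so it's also worth checking that their manifests are still in place:
$ ls /etc/kubernetes/manifests/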
Checking out the kubelet logs will also generally provide pointers to underlying infra issues e.g. on a machine with systemd:
$ journalctl -xeu kubelet.service
In addition, you can typically find the logs for core components under /var/log/kube-apiserver.log or, on kubeadm-initialized clusters, under /var/log/containers/kube-...
Tail or less is your friend here:
$ tail /var/log/kube-apiserver.log
$ less /var/log/containers/kube-apiserver-...log
The @containerd command-line tool, ctr, can also be useful to see if the control plane containers are up and running, e.g.
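Assuming a containerd-backed cluster using the default k8s.io namespace that the kubelet uses:
$ ctr --namespace k8s.io containers ls
$ ctr --namespace k8s.io tasks ls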
Once you've got the control plane up and running, I like to look at everything running in the cluster (if my cluster workloads are small enough not to overwhelm my terminal). Look for CrashLoopBackOffs or pods not starting:
$ kubectl get pods -A
If you find anything, then it's time to go spelunking in the associated Pod or Deployment config and/or logs.
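For example (substituting your own pod name and namespace):
$ kubectl describe pod <pod-name> -n <namespace>
$ kubectl logs <pod-name> -n <namespace> --previous
The --previous flag shows logs from the last terminated container, which is handy for CrashLoopBackOffs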
Getting a list of events can also be useful (and you will have seen some events when using the describe commands above)
$ kubectl get events --sort-by=.metadata.creationTimestamp
Look for obvious issues (missing images, resource problems such as OOM kills, network connectivity failures)
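A trick I find useful for cutting through the noise is filtering events down to just the warnings:
$ kubectl get events -A --field-selector type=Warning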
If you have access to the original manifests, the diff command can be super useful to see if someone has been tampering with your config in the cluster (accidentally or otherwise)
$ kubectl diff -f ./my-manifest.yaml
I could write an entire other thread on debugging networking and storage issues, but I'll save this for another day
As a general reference, the Kubernetes docs have some very useful additional guidelines for debugging clusters, too: kubernetes.io/docs/tasks/deb…
And I found Ioannis Moustakis's Certified Kubernetes Administrator (CKA) exam cheatsheet super useful for commands (even if you're not studying for the exam!): faun.pub/cka-kubernetes…
"Platform Engineering" is rapidly becoming the new DevOps or SRE. Almost every day we hear about another org building an internal developer platform or control plane.
Want to know what platform engineering is, where the trends are going, and why you should care?
Read on 🧵👇
We've all been building application/web platforms for years
- On-premises: ticket-driven, bare-metal, long lead time
- First-gen PaaS: self-service, VM-based, one-size-fits-all, on-demand
- Next-gen PaaS/Custom platform: self-service, container-based, fast feedback, good UX/DevX
The rise of more ops-savvy developers (and SREs) and developer-friendly infra tooling has led to a boom in the creation of custom platforms
The attraction of building a custom platform is that you can craft the abstractions to match exactly what your org requires (in theory)