Kube360 Cluster Debugging

This page collects commands and procedures for debugging and troubleshooting the components of your Kube360 cluster.

K3s Service Troubleshooting

When investigating issues with the K3s control plane or node, checking the systemd service is a good starting point.

Check Service Status

You can check if the K3s service is active and running:

systemctl status k3s.service

Example Output:

● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-02-25 04:42:07 UTC; 2 weeks 6 days ago
       Docs: https://k3s.io
...

View Service Configuration

To see how the service is configured and which environment files it loads:

cat /etc/systemd/system/k3s.service

Inspect Environment Variables

To check the specific environment variables passed to the K3s service (like K3S_CONFIG_FILE):

sudo cat /etc/systemd/system/k3s.service.env

Example Output:

K3S_CONFIG_FILE='/home/ubuntu/k3s/k3s/replica/config-replica-two.yaml'

View Service Logs

To follow the K3s systemd journal in real time and watch for errors:

journalctl -u k3s.service -f

Network Troubleshooting

Testing API Server Connectivity

If a node is having trouble joining or communicating with the cluster, SSH into the node and verify that it can reach the Kubernetes API on port 6444:

nc -vz 192.168.1.10 6444
nc -vz 192.168.1.11 6444
nc -vz 192.168.1.12 6444

Example Output:

Connection to 192.168.1.10 6444 port [tcp/sge-qmaster] succeeded!
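If nc is not installed on the node, the same check can be sketched with Bash's built-in /dev/tcp pseudo-device. The IPs below are the example server addresses from above; substitute your own:

```shell
#!/usr/bin/env bash
# Probe the API port on each server node without relying on netcat.
# /dev/tcp is a Bash pseudo-device; timeout bounds each connection attempt.
for host in 192.168.1.10 192.168.1.11 192.168.1.12; do
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/6444" 2>/dev/null; then
    echo "${host}:6444 reachable"
  else
    echo "${host}:6444 UNREACHABLE"
  fi
done
```

Unreachable hosts are reported rather than aborting the loop, so you can see the status of all three servers in one pass.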

Cleaning Up the Flannel Network

The Flannel network state can occasionally become corrupted or misconfigured, most commonly after a node is renamed. Since K3s ships with a built-in Flannel CNI, you can reset its state manually.

Follow these steps to clean up the Flannel network:

1. Stop the K3s service

sudo systemctl stop k3s

2. Remove stale network state

sudo ip link delete cni0
sudo ip link delete flannel.1
sudo rm -rf /var/lib/cni/

3. Delete the old node from the cluster

(Run this from a node where kubectl is still functioning)

kubectl delete node <old-node-name>

4. Start the K3s service back up

sudo systemctl start k3s

Warning regarding Pod Networks: When running ip a, you may notice multiple veth... interfaces. These belong to pods currently running on the node. When you delete the cni0 interface in Step 2, those specific pods will lose their network connection.
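To see which veth interfaces are currently attached to the node (one per running pod), you can filter the link list instead of scanning the full ip a output:

```shell
# List only veth-type interfaces; each is the host side of a pod's network pair
ip -o link show type veth
```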

After starting K3s back up with the new node name, you will need to manually delete the affected pods (e.g., CoreDNS, Longhorn) so Kubernetes can recreate them and attach them to the new cni0 interface.
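As a sketch, assuming CoreDNS still carries its standard k8s-app=kube-dns label, the affected pods can be deleted by label selector from a node with a working kubectl. The helper function name here is hypothetical; adjust namespace and selector for other workloads such as Longhorn:

```shell
# Hypothetical helper: delete pods matching a label selector so the
# scheduler recreates them attached to the fresh cni0 bridge.
recreate_pods() {
  local ns="$1" selector="$2"
  kubectl -n "$ns" delete pod -l "$selector"
}

# Example: CoreDNS uses the standard k8s-app=kube-dns label
# recreate_pods kube-system k8s-app=kube-dns
```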

5. Restart services on other nodes

Finally, go to the other nodes in your cluster and restart their respective K3s services to ensure network routes are fully refreshed:

sudo systemctl restart k3s
# OR for agent nodes:
sudo systemctl restart k3s-agent
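The cleanup procedure above can be sketched as a single helper run on the renamed node. The function name and argument are hypothetical; it assumes sudo access and a working kubectl on that node:

```shell
#!/usr/bin/env bash
# Consolidates steps 1-4: stop K3s, clear stale Flannel state, remove the
# old node object, and start K3s again. Pass the OLD node name as argument.
flannel_reset() {
  local old_node="$1"
  sudo systemctl stop k3s
  sudo ip link delete cni0 2>/dev/null || true      # may already be absent
  sudo ip link delete flannel.1 2>/dev/null || true
  sudo rm -rf /var/lib/cni/
  kubectl delete node "$old_node"
  sudo systemctl start k3s
}

# Usage:
# flannel_reset old-worker-1
```

Step 5 still applies afterwards: restart k3s (or k3s-agent) on the remaining nodes so their routes are refreshed.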