Node Hardening
This page documents the steps that have been taken on the kube360 cluster to harden the nodes.
No Public Access
To verify that internal services are not exposed externally, you can check the active listening ports on the nodes:
```
ubuntu@replica-1:~$ sudo netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.168.1.12:6444       0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:2381          0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:2382          0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 192.168.1.12:2379       0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 192.168.1.12:2381       0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 192.168.1.12:2380       0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      1864383/systemd-res
tcp        0      0 127.0.0.1:6445          0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:6444          0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:10010         0.0.0.0:*               LISTEN      2826561/containerd
tcp        0      0 127.0.0.1:10256         0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:10258         0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 192.168.1.12:10259      0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 192.168.1.12:10257      0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 192.168.1.12:10250      0.0.0.0:*               LISTEN      2826533/k3s server
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 127.0.0.54:53           0.0.0.0:*               LISTEN      1864383/systemd-res
tcp        0      0 192.168.1.200:6443      0.0.0.0:*               LISTEN      2458184/haproxy
tcp6       0      0 ::1:6444                :::*                    LISTEN      2826533/k3s server
tcp6       0      0 :::22                   :::*                    LISTEN      1/systemd
```
Important: As the output above shows, no port other than SSH (22) should be exposed to the public Internet. All other services must bind to local interfaces (127.0.0.1) or private network IPs (e.g., 192.168.1.12).
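To spot accidental public exposure quickly, you can filter the socket list for wildcard binds. The helper below is a sketch (`flag_public` is a hypothetical name, and it assumes `ss` from iproute2 as a modern replacement for `netstat`); it reads `ss -lntp`-style output on stdin and prints only sockets bound to all interfaces:

```shell
# Sketch: flag listeners bound to 0.0.0.0 or [::], i.e. reachable from
# outside unless a firewall blocks them. Usage on a node:
#   sudo ss -lntp | flag_public
flag_public() {
  # In `ss -lntp` output, column 4 is "Local Address:Port" and
  # column 6 is the owning process; skip the header row.
  awk 'NR > 1 && ($4 ~ /^0\.0\.0\.0:/ || $4 ~ /^\[::\]:/) {print $4, $6}'
}
```

On a hardened node, the only line this prints should be the SSH listener.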
Hardening SSH Access
1. Disable SSH Access via Password
First, open the SSH daemon configuration file for editing:
```
sudo env TERM=xterm nano /etc/ssh/sshd_config
```
Apply the following changes to ensure only public key authentication is allowed and password authentication is disabled:
```
# Make sure public key authentication is enabled
PubkeyAuthentication yes

# This is the main setting to disable passwords
PasswordAuthentication no
```
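Beyond these two settings, a couple of related options are commonly tightened at the same time. Treat this as an optional sketch and verify it against your distribution's defaults before applying:

```
# Optional: disallow direct root logins over SSH
PermitRootLogin no
# Optional: disable keyboard-interactive (challenge-response) logins,
# which can otherwise act as a password fallback
KbdInteractiveAuthentication no
```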
2. Restart the SSH Service
After saving the configuration changes, restart the SSH service to apply them:
```
sudo systemctl restart ssh
```
Periodic System Updates
You must ensure that appropriate system updates (such as kernel updates and OS patches) are applied periodically. These updates often require a node reboot. Periodic reboots are a recommended best practice: treat your nodes as ephemeral.
Note: For Kube360, we do not use automated tools like kured because we require more manual control over the reboot process, particularly for Postgres data plane nodes.
Checking if a Reboot is Required
To see if a node requires a reboot, SSH into the node and run the following command:
```
ubuntu@ns5019222:~$ cat /var/run/reboot-required
*** System restart required ***
```
If the file does not exist, no reboot is currently required.
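If you script your maintenance checks, the same test can be wrapped in a small helper. This is a sketch (`reboot_required` is a hypothetical name; the flag path is parameterized only so the helper can be exercised against a test file):

```shell
# Sketch: report whether the Debian/Ubuntu reboot flag file exists.
reboot_required() {
  flag="${1:-/var/run/reboot-required}"
  if [ -f "$flag" ]; then
    echo "reboot required"
  else
    echo "no reboot required"
    return 1
  fi
}
```

The non-zero exit status in the "no reboot" case makes the helper easy to use in `if` conditions.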
Upgrade Procedure
The upgrade process differs slightly depending on whether the node is part of the control plane or the data plane. Always upgrade nodes one by one: start with the control plane nodes, then move on to the data plane nodes.
Control Plane Nodes
1. Verify whether a reboot is needed: check `/var/run/reboot-required` as shown above. If no reboot is required, you may skip the reboot step below.

2. Cordon and drain the node (from your local machine):

   ```
   kubectl cordon <node-name>
   kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
   ```

3. Install security and OS updates (via SSH on the node):

   ```
   sudo apt-get update
   sudo apt-get upgrade -y
   ```

4. Reboot the node (via SSH on the node):

   ```
   sudo reboot
   ```

5. Uncordon the node (from your local machine): to see when the node is back up, you can run `hwatch nc -vz <node-ip> 22` (e.g., `hwatch nc -vz 51.222.105.111 22`) from your local machine to check when the SSH port becomes accessible again. Once the node is up, wait for it to return to the `Ready` state, then uncordon it:

   ```
   kubectl get nodes -w
   kubectl uncordon <node-name>
   ```
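If `hwatch` is not installed locally, the same polling can be done with a plain shell loop. This is a sketch assuming bash (which provides the `/dev/tcp` pseudo-device); `wait_for_port` is a hypothetical helper name:

```shell
# Poll a TCP port until it accepts connections, or give up after N tries.
wait_for_port() {
  host="$1"; port="$2"; tries="${3:-120}"; interval="${4:-5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
    # the subshell closes it again immediately.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "${host}:${port} is reachable"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
# Example: wait_for_port 51.222.105.111 22
```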
Data Plane Nodes
Data plane nodes follow the exact same procedure as control plane nodes, but with an important prerequisite for database nodes.
Prerequisite for CloudNativePG nodes: Before draining a node that hosts Postgres, you must modify the Postgres cluster resources to disable Pod Disruption Budgets (PDB) and place them in node maintenance mode.
Update the cluster specification for all your application databases, as well as Kube360's internal k3dash database, to include:

```
enablePDB: false
```
Once this prerequisite is met, follow steps 1-5 from the Control Plane Nodes section above. After the node is successfully upgraded and uncordoned, remember to re-enable the PDBs by setting `enablePDB: true`.
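For reference, the relevant part of a CloudNativePG `Cluster` manifest could look like this during maintenance. This is a sketch: the cluster name is hypothetical, and `nodeMaintenanceWindow` is the CloudNativePG field corresponding to the node maintenance mode mentioned above; verify both fields against the operator version you run:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db              # hypothetical cluster name
spec:
  # ...rest of the existing spec stays unchanged...
  enablePDB: false          # disable Pod Disruption Budgets during maintenance
  nodeMaintenanceWindow:
    inProgress: true        # tell the operator node maintenance is ongoing
    reusePVC: true          # keep the PVC so the instance restarts on the same data
```

Revert both settings (`enablePDB: true`, `inProgress: false`) once the node is uncordoned.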