How to Safely Shut Down a Bare-Metal K8s Cluster
Unlike managed Kubernetes services (like EKS or GKE) where underlying infrastructure is abstracted, managing a bare-metal cluster means you are directly responsible for stateful workloads and hardware power states. Hard-powering off nodes can lead to corrupted etcd data or orphaned Persistent Volumes.
In our homelab setup, we use a "Two-Layer" graceful shutdown strategy automated via the safe-shutdown.sh script.
The Shutdown Strategy
The shutdown process operates in two distinct phases to ensure software state is saved before hardware state is altered.
Layer 1: Software Drain (kubectl)
Before turning off any machines, we must tell Kubernetes to safely evict running applications.
- Identify Workers: We query the Kubernetes API to list all nodes that are not the Control Plane.
- Cordon and Drain: For each worker node, we cordon it and then drain it. Draining marks the node unschedulable and evicts its pods, so each pod is terminated cleanly and has time to finish processing requests, rather than just having the plug pulled.
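The two steps above can be sketched roughly as follows. This is an illustration, not the contents of safe-shutdown.sh: the label selector assumes the control plane carries the standard node-role.kubernetes.io/control-plane label, and the drain flags and 120s timeout are assumed defaults you may need to adjust. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: selector, flags, and timeout are assumptions,
# not taken from safe-shutdown.sh.

# Print the name of every node that lacks the control-plane role label.
list_workers() {
  kubectl get nodes \
    --selector='!node-role.kubernetes.io/control-plane' \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Drain one node. "kubectl drain" cordons the node first, then evicts
# its pods, skipping DaemonSet-managed pods and discarding emptyDir data.
drain_node() {
  local node="$1" timeout="${2:-120s}"
  kubectl drain "$node" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout="$timeout"
}

# Drain every worker in turn; stop at the first failure.
drain_all_workers() {
  local workers node
  workers="$(list_workers)" || return 1
  for node in $workers; do
    echo "Draining ${node}..."
    drain_node "$node" || return 1
  done
}
```

Calling drain_all_workers with a valid kubeconfig would drain the workers one at a time. Stopping at the first failure is a deliberate choice here, so that a stuck pod is noticed before any machine is powered off.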
Layer 2: Hardware Power-Off (Ansible)
Once the cluster is safely drained, we issue hardware shutdown commands over SSH. Since we manage configuration via Ansible, we leverage its ad-hoc commands.
- Shutdown Workers: We target the workers group in our Ansible inventory.
- Shutdown Control Plane: We shut down the Control Plane node last. This ensures the Kubernetes API remains available to manage the worker drains up until the final moment.
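As a rough sketch of this ordering with Ansible ad-hoc commands: the workers group name comes from our inventory, while the control_plane group name, the inventory.ini path, and the function names are hypothetical placeholders. The one-minute delay is an assumption that gives the SSH session time to return cleanly before the host halts.

```shell
#!/usr/bin/env bash
# Sketch only: "control_plane" and "inventory.ini" are placeholder names.

# Schedule a halt on every host in one inventory group via an ad-hoc command.
shutdown_group() {
  local group="$1" inventory="${2:-inventory.ini}"
  ansible "$group" -i "$inventory" --become \
    -m ansible.builtin.command -a "shutdown -h +1"
}

# Workers first, control plane last, so the API stays up as long as possible.
power_off_cluster() {
  shutdown_group workers "$1" || return 1
  shutdown_group control_plane "$1"
}
```

Running power_off_cluster with no argument falls back to the placeholder inventory path; pass your own inventory file as the first argument.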
Running the Process
To safely turn off the Homelab cluster, navigate to the homelab repository on your dev-station and run the safe-shutdown.sh script.
You will be prompted to confirm the action. Once confirmed, the script checks for your kubeconfig, safely drains the workers, and issues the shutdown commands. Wait a few moments for the physical machines to halt before unplugging power.
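The confirmation prompt and kubeconfig check described above might look something like this. It is a minimal sketch rather than the script's actual code: the function names and prompt wording are hypothetical, and the check assumes KUBECONFIG holds a single file path (falling back to ~/.kube/config) rather than a colon-separated list.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and prompt text are illustrative assumptions.

# Ask for confirmation on stderr; succeed only on an explicit yes.
confirm() {
  local reply
  printf '%s [y/N] ' "$1" >&2
  read -r reply || return 1
  case "$reply" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if a kubeconfig file exists at the expected location.
have_kubeconfig() {
  [ -f "${KUBECONFIG:-$HOME/.kube/config}" ]
}
```

A script could then guard the destructive steps with something like `confirm "Shut down the cluster?" && have_kubeconfig || exit 1` before draining anything. Defaulting to "no" on any other input keeps an accidental Enter keypress from powering off the cluster.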