Kubernetes
Installation Process
Note on Infrastructure as Code: The steps below originally mapped to a bash script (scripts/install-k8s.sh). They have since been migrated to a declarative Ansible playbook (ansible/playbooks/02-install-k8s.yaml). The underlying theory remains exactly the same.
Here is a detailed breakdown of exactly what must happen to install Kubernetes components on Debian (homelab environment) from scratch:
1. Root User Check
Why: Installing packages and adding system-level repositories requires administrator privileges. This check ensures the script doesn't fail halfway through because it wasn't run with sudo.
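A minimal sketch of such a check, assuming a POSIX-compatible shell. The function name require_root is illustrative and not taken from the original script:

```shell
# Abort early unless the effective user is root (UID 0).
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "This script must be run as root (try: sudo $0)" >&2
        return 1
    fi
}
```

A script would typically call `require_root || exit 1` as its first step, before touching any package state.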
2. Bypassing Debian 13 Signature Policy (Sequoia v3)
The Root Cause: Debian 13 ships with Sequoia, a modern Rust-based OpenPGP implementation, as its default signature verifier (via sqv). In early 2026, Sequoia enforced a long-announced deprecation: OpenPGP v3 signature packets are no longer accepted as of 2026-02-01T00:00:00Z. The Kubernetes apt repository still signs its InRelease files with a v3 signature packet (the older, pre-RFC 4880 format), causing Sequoia to hard-reject it.
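# Restore the real binary on any exit; set the trap before touching anything.
trap '[ -f /usr/bin/sqv.real ] && mv -f /usr/bin/sqv.real /usr/bin/sqv' EXIT
if command -v sqv &>/dev/null; then
    # Move the real binary aside only once, so reruns never rename the wrapper.
    if [ ! -f /usr/bin/sqv.real ]; then
        mv /usr/bin/sqv /usr/bin/sqv.real
    fi
    # Wrapper: evaluate the signature policy as of a pre-deprecation date.
    cat > /usr/bin/sqv <<'EOF'
#!/usr/bin/env bash
exec /usr/bin/sqv.real --policy-as-of 2025-01-01T00:00:00Z "$@"
EOF
    chmod +x /usr/bin/sqv
fi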
How this wrapper fixes the issue:
- command -v sqv checks whether sqv is present (more portable than checking a config file path).
- The real binary is renamed to sqv.real (only on the first run, to avoid double-renaming on reruns).
- A shell wrapper is written in its place. It prepends --policy-as-of 2025-01-01T00:00:00Z to every invocation, which tells sqv to evaluate the policy as of a pre-deprecation date, and forwards all original arguments with "$@".
- exec replaces the shell process with sqv.real directly (no subshell overhead, clean process table).
- The trap ... EXIT runs on any exit (success, failure, or Ctrl+C). This ensures the real sqv binary is always restored to its original state so the system isn't left with a patched binary after the script finishes.
(Note: We use this wrapper because apt doesn't support passing custom flags to sqv directly, and we don't want to disable security entirely by falling back to [trusted=yes].)
3. Installing Prerequisites
Why: Out of the box, Debian's package manager (apt) might not be fully equipped to download packages securely over HTTPS or to verify custom digital signatures.
- apt-transport-https and ca-certificates allow apt to securely connect to the Kubernetes servers.
- curl is used to download the security keys.
- gpg is used to process those security keys.
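The prerequisite step could be sketched as follows. The run helper and the DRY_RUN variable are hypothetical additions that only exist so the commands can be previewed without changing the system; the package list is the one described above:

```shell
# Hypothetical dry-run helper: with DRY_RUN set, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Refresh the package index, then install the tools needed to add the repository.
install_prerequisites() {
    run apt-get update &&
    run apt-get install -y apt-transport-https ca-certificates curl gpg
}
```

Setting DRY_RUN=1 before calling install_prerequisites prints the two apt-get commands instead of running them.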
4. Adding the Official Kubernetes Repository
mkdir -p /etc/apt/keyrings
rm -f /etc/apt/keyrings/kubernetes-apt-keyring.gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg --yes
chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /" | tee /etc/apt/sources.list.d/kubernetes.list > /dev/null
Why: Debian's default software repositories do not include Kubernetes. We have to tell Debian exactly where to download it from.
- First, we download the GPG signing key published by the Kubernetes project at pkgs.k8s.io. This lets apt detect packages that have been tampered with by a malicious third party. We store it in /etc/apt/keyrings/, which is the modern secure location.
- Second, we add the actual URL for the v1.31 repository into Debian's list of software sources (sources.list.d) and pin it strictly to the downloaded key using signed-by=.
5. Installing the Core Components
Why: This installs the holy trinity of Kubernetes cluster building:
- kubelet: This is the primary "node agent" that runs on every single machine in the cluster. It talks to your container runtime (containerd from Phase 1) and makes sure your containers are actually running.
- kubeadm: This is the "bootstrap" tool. You will use this to run kubeadm init on the ROG (to create the cluster) and kubeadm join on the Dell (to connect it to the ROG).
- kubectl: This is the command-line interface. It's how you talk to the cluster once it's built to tell it to deploy applications, check logs, etc.
6. Pinning the Package Versions (Extremely Important)
Why: If you ever run apt-get upgrade on your servers in the future, Debian will automatically upgrade all installed software. You do not want Debian to automatically upgrade Kubernetes.
Upgrading a Kubernetes cluster must be done deliberately and carefully (one node at a time). If apt upgraded kubelet randomly in the background, it could break your cluster. apt-mark hold tells Debian: "Never upgrade these three packages unless I explicitly remove this hold."
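Steps 5 and 6 together might look like the sketch below. The run helper and DRY_RUN variable are hypothetical conveniences for previewing the commands; the package names are the three components described above:

```shell
# Hypothetical dry-run helper: with DRY_RUN set, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install the three components, then hold them so apt-get upgrade skips them.
install_kubernetes_packages() {
    run apt-get update &&
    run apt-get install -y kubelet kubeadm kubectl &&
    run apt-mark hold kubelet kubeadm kubectl
}
```

For a deliberate upgrade later, the hold can be lifted with apt-mark unhold kubelet kubeadm kubectl and re-applied once the node has been upgraded.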
7. Enabling the Kubelet Service
Why: This tells systemd (Debian's service manager) to ensure that the kubelet process starts automatically every time the server reboots. (Note: The kubelet will actually crash loop right now if you check its status, which is normal—it's waiting for you to run kubeadm to tell it what to do!).
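A sketch of this step, assuming the service unit is named kubelet as installed by the package. The run helper and DRY_RUN variable are hypothetical and only allow a preview:

```shell
# Hypothetical dry-run helper: with DRY_RUN set, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enable the kubelet at boot and start it immediately (--now).
enable_kubelet() {
    run systemctl enable --now kubelet
}
```

Afterwards, systemctl status kubelet should show the restart loop described above until kubeadm has been run.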
8. Verification
Once the script completes, you can verify that the client components were installed correctly:
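A minimal sketch of that check. The KUBECTL variable is a hypothetical override (it defaults to kubectl) so the function can be exercised without a real installation:

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Fail clearly if the binary is missing; otherwise print the client version only.
verify_client() {
    if ! command -v "$KUBECTL" >/dev/null 2>&1; then
        echo "$KUBECTL not found in PATH" >&2
        return 1
    fi
    "$KUBECTL" version --client
}
```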
Why: This confirms that kubectl is installed and in your system's PATH. (Note: It only checks the client version right now because the cluster control plane hasn't been initialized yet.)
Cluster Bootstrap (Homelab - Phase 2)
After installing the core components on the nodes, the next step is to initialize the Control Plane and prepare the cluster for workloads.
1. The kube-vip "Chicken and Egg" Problem
We use kube-vip to provide a highly-available Virtual IP (VIP) for the Kubernetes API server (192.168.1.50). However, starting with Kubernetes 1.29, stricter RBAC rules create a deadlock when deploying kube-vip as a static pod:
- kubeadm init needs to talk to the VIP to initialize the cluster.
- kube-vip needs kubeadm to finish so the RBAC super-admin rules exist before it can bind the VIP via leader election.
The Solution (init-control-plane.sh):
We solve this by manually binding the VIP to the network interface before running kubeadm init.
# 1. Manually add the VIP so kubeadm can reach the API server locally during bootstrap
ip addr add "192.168.1.50/32" dev "enp4s0" || true
# 2. Initialize cluster with kubeadm using the VIP
kubeadm init --control-plane-endpoint "192.168.1.50:6443" --upload-certs --pod-network-cidr "10.244.0.0/16"
# 3. Generate the kube-vip static pod manifest AFTER kubeadm init
#    (ctr run does not pull images implicitly, so pull the image first)
ctr image pull "ghcr.io/kube-vip/kube-vip:v0.8.0"
ctr run --rm --net-host "ghcr.io/kube-vip/kube-vip:v0.8.0" vip /kube-vip manifest pod \
    --interface "enp4s0" \
    --address "192.168.1.50" \
    --controlplane --services --arp --leaderElection > /etc/kubernetes/manifests/kube-vip.yaml
Once the static pod starts, kube-vip takes over management of the VIP automatically.
2. Node Labeling (label-nodes.sh)
Nodes should be semantically labeled so workloads can be scheduled intelligently (e.g., ensuring a database pod only runs on a node with an SSD).
The label-nodes.sh script applies labels using the --overwrite flag. This makes the script fully idempotent, meaning it can be run multiple times safely without throwing an error if the label already exists.
kubectl label node k8s-worker-01 node-role.kubernetes.io/worker=worker --overwrite
kubectl label node k8s-worker-01 disk=ssd --overwrite
3. Remote Management
You should rarely run kubectl directly from the cluster nodes. Instead, manage the cluster from your admin workstation (e.g., a MacBook).
- Ensure kubectl is installed on your workstation (e.g., via brew install kubectl).
- Copy the admin.conf from the Control Plane (/etc/kubernetes/admin.conf) to your local machine, for example to ~/.kube/config.
- Run kubectl get nodes from your workstation to verify connectivity to the Virtual IP.
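The copy step could be sketched like this. The SSH login root@k8s-cp-01 and the destination ~/.kube/config are assumptions for illustration, and the run helper with DRY_RUN is a hypothetical preview aid:

```shell
# Hypothetical dry-run helper: with DRY_RUN set, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy admin.conf from the control plane and restrict its permissions.
# $1: SSH target (assumed, e.g. root@k8s-cp-01); $2: optional destination file.
fetch_kubeconfig() {
    cp_host="$1"
    dest="${2:-$HOME/.kube/config}"
    mkdir -p "$(dirname "$dest")" &&
    run scp "$cp_host:/etc/kubernetes/admin.conf" "$dest" &&
    run chmod 600 "$dest"
}
```

Note that this overwrites any existing file at the destination, so a workstation that already manages other clusters would want a separate path plus the KUBECONFIG environment variable instead.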
4. Storage Prerequisites (Longhorn)
When deploying Longhorn for persistent storage, make sure all participating nodes are using SSDs. Longhorn on spinning disks is technically possible but practically painful. Plan for SSD-backed storage from the start rather than treating an HDD setup as a throwaway step to "migrate later".
Before committing to the installation, verify your disk types on each node (see how to check if a disk is an SSD):
- Worker-01 (Dell): SSD only — good.
- CP-01 (ROG): Two disks (e.g., sda is SSD, sdb is HDD). sda is likely the OS drive and sdb a secondary HDD, probably for bulk storage (media, etc.). Make sure Longhorn is configured to use the SSD (sda) path on the ROG, not the HDD. You can tell Longhorn which path to use per node when you set it up.
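One way to check the disk type is the kernel's rotational flag under /sys/block. The sketch below assumes that flag is reported accurately, which may not hold behind some RAID controllers, USB bridges, or hypervisors. SYSFS_ROOT is a hypothetical override that exists only to make the function testable:

```shell
# Print "ssd", "hdd", or "unknown" for a block device name such as sda.
disk_type() {
    flag_file="${SYSFS_ROOT:-/sys}/block/$1/queue/rotational"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
        return 1
    fi
    # 0 means non-rotational (SSD/NVMe); 1 means a spinning disk.
    if [ "$(cat "$flag_file")" = "0" ]; then
        echo "ssd"
    else
        echo "hdd"
    fi
}
```

On the ROG described above, disk_type sda and disk_type sdb would be expected to report different types. lsblk -d -o NAME,ROTA shows the same flag for all disks at once.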
Cluster Baseline (Homelab - Phase 3)
1. The CNI Path Mismatch (Flannel vs Containerd)
When installing a Container Network Interface (CNI) like Flannel on a Debian system, you may encounter an issue where pods become permanently stuck in the ContainerCreating state.
If you run kubectl describe pod <name>, you will see a Sandbox error from the kubelet:
failed to find plugin "flannel" in path [/usr/lib/cni]
The Root Cause: There is a strict path mismatch between the OS package manager and the upstream project.
- Debian's containerd package is compiled to look for CNI plugins in /usr/lib/cni/.
- The kube-flannel DaemonSet (and standard CNI networking plugins) install their binaries into /opt/cni/bin/.
The Solution (fix-cni-paths.sh):
Rather than modifying the global containerd configuration on every node (which can be overwritten during upgrades), we use a script that creates a symbolic link on each node, so that containerd's lookup in /usr/lib/cni/ resolves to the binaries installed under /opt/cni/bin/.
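The contents of fix-cni-paths.sh are not shown here, so the following is only one possible sketch of the idea. It assumes per-plugin symlinks (rather than a single directory link) and leaves any real files already present in the target directory untouched. The function name is illustrative:

```shell
# Link every plugin binary from src_dir into dst_dir.
# Defaults match the two paths described above.
link_cni_plugins() {
    src_dir="${1:-/opt/cni/bin}"
    dst_dir="${2:-/usr/lib/cni}"
    mkdir -p "$dst_dir" || return 1
    for plugin in "$src_dir"/*; do
        [ -e "$plugin" ] || continue              # source directory empty or missing
        target="$dst_dir/$(basename "$plugin")"
        # Never replace a real file shipped by a distribution package.
        if [ -e "$target" ] && [ ! -L "$target" ]; then
            continue
        fi
        ln -sfn "$plugin" "$target" || return 1   # -f makes reruns idempotent
    done
    return 0
}
```

The function would be run as root on every node; as the troubleshooting notes for NotReady nodes explain, containerd needs a restart afterwards before it notices the new plugins.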
Once containerd can follow the symlink to find the flannel executable, it successfully provisions the network namespace and the pod transitions to Running.
2. Verifying Overlay Networking
To definitively prove that your CNI is routing packets correctly across physical nodes, you can explicitly force two pods to run on two different nodes and test the connection.
1. Schedule a targeted pod on the Control Plane:
By default, standard pods aren't scheduled on the control plane. We can bypass the scheduler using a nodeName override to force a pod onto k8s-cp-01:
kubectl run test-ping --image=busybox --restart=Never --overrides='{"spec": { "nodeName": "k8s-cp-01" }}' -- sleep 3600
2. Ping a pod on the Worker Node:
If you have another pod (like test-nginx) running on k8s-worker-01 with IP 10.244.1.3, you can ping it directly from the control plane's test-ping pod:
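A sketch of that ping, using the pod name and IP from the example above. KUBECTL is a hypothetical override that defaults to kubectl:

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Send four ICMP echo requests from inside a pod to a target IP.
ping_from_pod() {
    "$KUBECTL" exec "$1" -- ping -c 4 "$2"
}
```

For the scenario described here the call would be ping_from_pod test-ping 10.244.1.3.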
If you see 0% packet loss, your Flannel overlay network is correctly encapsulating traffic, sending it out the physical enp4s0 interface, routing it over the 192.168.1.0/24 network to the worker node, and decapsulating it back to the pod.
3. SRE: Diagnosing NotReady and NodeStatusUnknown
When a node drops out of the cluster, Kubernetes reports its condition in kubectl describe node <name>. Two common conditions explain completely different failures:
NotReady (with NetworkPluginNotReady)
- Symptom: The node is alive, the kubelet is running, but the node refuses to accept pods. The reason given is cni plugin not initialized.
- Root Cause: The kubelet is waiting for the container runtime (containerd) to confirm that the network is ready. If containerd cannot find the CNI plugins (e.g., because of the Debian /usr/lib/cni path mismatch mentioned above), it reports NetworkReady=false.
- The Gotcha: containerd caches its CNI plugin paths on startup! Even if you fix the symlink, the node will stay NotReady forever until you explicitly restart containerd (sudo systemctl restart containerd) so it rescans the directory.
NodeStatusUnknown (with KubeletStopped)
- Symptom: The node was working, but suddenly the Control Plane reports NodeStatusUnknown and stops receiving heartbeat pings.
- Root Cause: The kubelet agent on the worker node has completely died or crash-looped.
- The Gotcha (The CRI Dependency): Often, the kubelet configuration is perfectly fine, but the container runtime (containerd) has crashed (perhaps due to a disk error or read-only filesystem lock). The kubelet strictly depends on the CRI (Container Runtime Interface) socket located at /var/run/containerd/containerd.sock. If containerd is dead, the socket disappears, and the kubelet intentionally crash-loops until containerd comes back online.
NotReady (with KubeletStopped due to Swap)
- Symptom: The kubelet crashes instantly on boot with the error running with swap on is not supported.
- Root Cause: Even if you previously disabled swap (swapoff -a), systemd's fstab generator will re-activate swap partitions (like /dev/sda3) on the next reboot if they are still listed in /etc/fstab. By default, the kubelet refuses to start while swap is enabled, so that pod resource accounting stays predictable.
- The Fix: You must explicitly remove the swap entry from /etc/fstab (e.g., sed -i '/swap/d' /etc/fstab) to ensure it stays dead across reboots. Note that this sed pattern deletes every line containing the word swap, including comments, so review the file before and after.
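A more conservative alternative to deleting lines is to comment out only the active entries whose filesystem type is swap. This is a sketch under that assumption, not the method used by the original scripts; the function name is illustrative:

```shell
# Comment out every active fstab entry whose 3rd field (filesystem type) is "swap".
# Existing comments and unrelated mounts that merely mention "swap" are left alone.
comment_out_swap() {
    fstab="${1:-/etc/fstab}"
    tmp="$(mktemp)" || return 1
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$fstab" > "$tmp" &&
    cat "$tmp" > "$fstab"      # rewrite in place, keeping owner and permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Running comment_out_swap with no argument edits /etc/fstab and therefore needs root; pairing it with swapoff -a covers both the running system and the next boot.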
4. Bare-Metal Load Balancing (MetalLB)
In a managed cloud environment (AWS, GCP), creating a Service of type: LoadBalancer automatically triggers a cloud API call to provision an external load balancer and assign a public IP to your cluster.
On bare-metal (like a homelab), this API does not exist. Out-of-the-box Kubernetes does not provide network load balancing. If you create a LoadBalancer service, it will remain in a Pending state indefinitely.
The Solution (MetalLB): MetalLB bridges the gap between Kubernetes and your physical network router.
- You allocate a pool of unused IP addresses on your local subnet (e.g., 192.168.1.200-250) that your router's DHCP server will never assign.
- MetalLB is configured with this IPAddressPool.
- When a LoadBalancer service is created, MetalLB claims an IP from the pool.
- Using an L2Advertisement, MetalLB broadcasts ARP packets to the local network, announcing that one of the physical cluster nodes "owns" that IP. The router then correctly forwards traffic to the bare-metal node.
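The two MetalLB resources could be generated as sketched below. The resource names homelab-pool and homelab-l2 are placeholders, and the metallb-system namespace assumes a default MetalLB installation. Note that the address range has to be written out in full rather than in the 200-250 shorthand:

```shell
# Print an IPAddressPool plus a matching L2Advertisement for an address range.
render_metallb_config() {
    range="${1:-192.168.1.200-192.168.1.250}"
    cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - ${range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
EOF
}
```

The output can be reviewed first and then piped into kubectl apply -f - once MetalLB itself is running.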
5. Advanced Networking Traps
When deploying complex multi-pod applications (like the Media Automation Stack), you will likely encounter these two common networking traps:
Trap 1: Internal vs. External DNS
- Symptom: Pod A (e.g., Radarr) tries to connect to Pod B (e.g., qBittorrent) using its external Ingress URL (http://qbittorrent.homelab.local). The connection fails with Unable to connect.
- Root Cause: External domains like .homelab.local are mapped via your workstation's /etc/hosts file or your router's DNS. Pods running inside the cluster do not read your workstation's hosts file.
- The Fix: Pods within the same cluster should always communicate using internal Kubernetes DNS. Instead of the external domain, simply use the name of the Kubernetes Service (e.g., http://qbittorrent:80). CoreDNS automatically resolves service names to their internal ClusterIPs. The short name only resolves from within the same namespace; from another namespace, use the <service>.<namespace> form.
Trap 2: The externalTrafficPolicy: Local Blackhole
- Symptom: You deploy an application with a LoadBalancer service, but when you navigate to the IP address from your browser, the connection times out. However, if you check kubectl get pods, the pod is perfectly healthy.
- Root Cause: When a Service is configured with externalTrafficPolicy: Local, it instructs the networking layer (kube-proxy and MetalLB) to only route traffic to a pod if it is running on the exact physical node that received the traffic. If the traffic hits worker-01, but the pod is running on worker-02, the packet is dropped immediately.
- The Fix: Change the policy to externalTrafficPolicy: Cluster. This restores default behavior, allowing the receiving node to forward the traffic across the overlay network (Flannel) to whichever node is actually hosting the pod.
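Switching an existing Service could be sketched as a merge patch. The service and namespace names passed in are whatever your deployment uses, and KUBECTL is a hypothetical override that defaults to kubectl:

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Set externalTrafficPolicy back to Cluster on an existing Service.
# $1: service name; $2: namespace (defaults to "default").
set_traffic_policy_cluster() {
    "$KUBECTL" patch service "$1" --namespace "${2:-default}" \
        --type merge -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
}
```

One trade-off to keep in mind: with the Cluster policy, traffic forwarded between nodes is source-NATed, so the application no longer sees the original client IP.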
Hardware and Storage Extensions
Out-of-the-box Kubernetes only understands CPU, RAM, and basic ephemeral disk space. To utilize advanced hardware and persistent storage, the cluster must be extended.
Hardware Accelerators (GPUs)
Kubernetes cannot natively schedule workloads onto physical GPUs. Instead, it relies on a Device Plugin architecture.
- You install a vendor-specific plugin (like the nvidia-container-toolkit) on the host OS and configure the container runtime (containerd) to use it.
- You deploy a Device Plugin DaemonSet (e.g., nvidia-device-plugin) into the Kubernetes cluster.
- The DaemonSet inspects the physical hardware on each node and advertises available resources back to the Kubernetes API server as extended resources (e.g., nvidia.com/gpu: 1).
- You can then request the GPU in your pod manifests much like CPU or RAM, by listing it under the container's resources.limits.
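A pod manifest requesting one GPU might look like the sketch below. The pod name and the container image tag are placeholders chosen for illustration; only the nvidia.com/gpu resource name comes from the text above:

```shell
# Print a one-shot pod manifest that requests a single GPU and runs nvidia-smi.
render_gpu_pod() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}
```

Piping the output into kubectl apply -f - and then reading the pod's logs gives a quick end-to-end check of the driver, the runtime configuration, and the device plugin.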
If a pod requests a GPU but none are available (or the plugin failed to load because the driver was missing), the pod will remain stuck in the Pending state with an Insufficient nvidia.com/gpu error.
Storage Classes and Provisioners
Kubernetes decouples storage from the pods using PersistentVolumes (PV) and PersistentVolumeClaims (PVC). A StorageClass defines how that storage is provisioned dynamically.
Local Path Provisioning
The fastest storage available is the physical SSD attached directly to the node. However, Kubernetes doesn't know how to dynamically provision folders on a node's disk out-of-the-box.
Using the Rancher Local Path Provisioner, you can create a StorageClass that intercepts PVC requests and automatically creates directories on the host's /opt/local-path-provisioner/ path.
- Pros: Blisteringly fast NVMe/SSD speeds, perfect for databases or media server config directories.
- Cons: The data is physically trapped on that specific node. If the pod is rescheduled to a different node, it loses access to the data.
Network File System (NFS)
To share data across the entire cluster so a pod can access it no matter which node it lands on, you need network-attached storage. A classic approach is deploying an NFS Server on a worker node with massive HDD capacity, and exporting it to the cluster subnet.
- Pros: Pods can be scheduled anywhere. Supports ReadWriteMany (multiple pods reading/writing the same files simultaneously).
- Cons: Significantly slower due to network latency and spinning disk physical limits.
Node Lifecycle Management
Managing a bare-metal Kubernetes cluster requires careful operational procedures, especially when you need to perform physical hardware maintenance.
Safely Evicting Workloads (Draining)
Before you ever pull the physical plug on a bare-metal node, you must safely evict its workloads. If you forcefully power off a node while a stateful pod (like a database) is actively writing to disk, you risk Ext4 filesystem corruption.
The kubectl drain command ensures that all pods are gracefully terminated and rescheduled onto other healthy nodes before the machine goes offline.
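- Cordoning: The drain command first cordons the node (marking it as SchedulingDisabled), preventing new pods from being scheduled there.
- Eviction: It then evicts the running pods through the Eviction API. The kubelet sends each container a SIGTERM and waits for the pod's termination grace period, giving it time to shut down gracefully. DaemonSet-managed pods are not evicted, which is why drain is usually run with --ignore-daemonsets.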
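A sketch of a drain and the matching uncordon after maintenance. The 300-second timeout is an arbitrary example value, and KUBECTL is a hypothetical override that defaults to kubectl:

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Cordon the node and evict its pods; DaemonSet pods are skipped and
# emptyDir data on the node is discarded.
drain_node() {
    "$KUBECTL" drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}

# Allow scheduling on the node again once maintenance is finished.
undrain_node() {
    "$KUBECTL" uncordon "$1"
}
```

For example, drain_node k8s-worker-01 before powering the Dell off, and undrain_node k8s-worker-01 once it has rejoined the cluster.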
Node Shutdown Procedures and Hangs
If you attempt to gracefully shut down a physical bare-metal node (e.g., using shutdown -h now or poweroff), the system may hang indefinitely and fail to power off. When you force a hard reboot, the node might boot into Emergency Mode with a corrupted or Read-Only Ext4 filesystem.
- The Cause: When systemd initiates a shutdown, it aggressively terminates network services. However, Kubernetes components (containerd and kubelet) often hang while trying to cleanly detach pod overlay network namespaces or CNI plugins (like Flannel) because the underlying network is already gone. This forces systemd to wait for its 90-second or 5-minute timeout. If the node loses power during this ungraceful wait, the Ext4 journal is not cleanly flushed, leaving a "dirty" flag on the filesystem.
- The Fix: To ensure a clean unmount of all container overlays and volumes, manually stop the Kubernetes services (kubelet first, then containerd) before issuing the halt command.
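The shutdown sequence could be sketched as follows. The ordering (kubelet before containerd) is an assumption based on the kubelet otherwise restarting containers, and the run helper with DRY_RUN is a hypothetical preview aid:

```shell
# Hypothetical dry-run helper: with DRY_RUN set, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the Kubernetes services while the network is still up, then halt.
safe_shutdown() {
    run systemctl stop kubelet &&
    run systemctl stop containerd &&
    run shutdown -h now
}
```

Draining the node beforehand, as described in the previous section, keeps the workloads themselves from being cut off mid-write.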
API Timeouts and Script Degradation
Any automation scripts that interact with your cluster rely heavily on the Kubernetes API Server (hosted on the Control Plane). If the Control Plane is offline or shutting down, kubectl commands can hang for a long time waiting for a response.
To ensure your scripts degrade gracefully when the API is unreachable, always include a timeout flag on non-critical queries:
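A minimal sketch of a query with a timeout and a fallback path. The 5-second value is an example, and KUBECTL is a hypothetical override that defaults to kubectl:

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Query the API with a short timeout; report failure instead of hanging.
nodes_or_fallback() {
    if "$KUBECTL" get nodes --request-timeout=5s; then
        return 0
    fi
    echo "API server unreachable; falling back to manual recovery" >&2
    return 1
}
```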
If the command fails, your script can catch the error and fall back to manual recovery or raw SSH commands instead of crashing completely.
API Automation with Python
While Bash is excellent for managing the raw Linux nodes and starting/stopping Kubernetes components, it falls short when you need to configure complex applications running inside those pods. Modern applications (like Jellyseerr or Prowlarr) use REST APIs to manage their internal state.
When building a zero-touch homelab, you eventually hit a wall where Kubernetes has successfully started the pod, but the app itself still requires you to open a web browser and click through a setup wizard to connect it to other apps.
Bridging Kubernetes and REST APIs
We use Python (with the requests library) to bridge the gap between the Kubernetes infrastructure API and the Application REST APIs.
Instead of hardcoding API keys in our scripts, Python can dynamically reach into the cluster, extract secrets directly from running pods using kubectl exec, and instantly inject them into another pod's REST API.
The Workflow:
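1. Extract State: Python calls subprocess.run(["kubectl", "exec", "-n", "media", "deploy/radarr", "--", "cat", "/config/config.xml"], capture_output=True, text=True) to steal the auto-generated API key from Radarr. The command is passed as a list because subprocess.run does not split a single string into arguments unless shell=True is set.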
2. Format Data: Python parses the XML/JSON to isolate the exact key string.
3. Inject State: Python uses requests.post() to send that API key directly to Prowlarr's REST API, instantly authenticating the two services without human intervention.
This pattern elevates the homelab from "automated deployment" to "automated configuration," allowing you to destroy and rebuild the entire media stack in minutes without ever opening a web browser.