Linux

Linux System Administration

The operating system is the foundation. Every setup task — from creating a user to configuring a container runtime — is a core Linux sysadmin skill.

Theory

The Linux filesystem hierarchy is standardized by the FHS. The directories you'll touch constantly:

Path	Purpose	Common usage
`/etc`	System-wide configuration files	hosts, fstab, sudoers, sysctl, modules, containerd
`/home`	User home directories	`leva`'s home, SSH keys
`/var`	Variable data (logs, runtime state)	containerd state
`/proc`, `/sys`	Virtual filesystems exposing kernel state	sysctl reads/writes

User and group management. Linux is a multi-user OS. Every process runs as a user. Key commands:

useradd / adduser — create a user
usermod -aG <group> <user> — add user to a group
groups <user> — list group memberships
id <user> — show UID, GID, and groups
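A quick way to use these commands in a script is to wrap `id -nG`, which prints a user's group names on one line. The sketch below is illustrative only; the function name `in_group` is an assumption, not something from the bootstrap playbook.

```shell
# in_group USER GROUP: succeed if USER is a member of GROUP.
# Relies on `id -nG USER`, which prints group names separated by spaces.
in_group() {
    user=$1
    group=$2
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Example: confirm that root belongs to the root group.
if in_group root root; then
    echo "root is in group root"
fi
```

After a `usermod -aG`, the new membership only shows up in sessions started afterwards, so run the check from a fresh login.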

The sudoers system. sudo is not built into the kernel — it's a package. The configuration lives in /etc/sudoers (edited via visudo) and drop-in files in /etc/sudoers.d/. The line:

leva ALL=(ALL) NOPASSWD:ALL

Means: user leva, on any host (the first ALL), may run commands as any user ((ALL)), without being asked for a password (NOPASSWD), for all commands (the final ALL).

Package management. Debian uses apt (and the lower-level dpkg):

apt-get update — refresh the package index from mirrors
apt-get install -y <pkg> — install without interactive confirmation
apt-mark hold <pkg> — prevent a package from being upgraded (important for K8s tooling later)
dpkg -l | grep <pkg> — check if a package is installed
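Grepping `dpkg -l` also matches packages that were removed but left config files behind. A stricter check, sketched here under the assumption that `dpkg-query` is available, reads the package's Status field directly. The helper names and the package names in the loop are illustrative.

```shell
# status_is_installed STATUS: dpkg reports a fully installed package as
# "install ok installed" (or "hold ok installed" after apt-mark hold).
status_is_installed() {
    case $1 in
        *" ok installed") return 0 ;;
        *) return 1 ;;
    esac
}

# pkg_installed NAME: query dpkg for a single package.
pkg_installed() {
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    status_is_installed "$status"
}

# Example: report on a couple of packages.
for pkg in sudo curl; do
    if pkg_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed"
    fi
done
```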

Systemd is the init system and service manager on modern Debian. Key commands:

systemctl start/stop/restart <service> — control a service
systemctl enable <service> — start on boot
systemctl status <service> — check health
journalctl -u <service> — read service logs

Hardware & Storage Checks. You often need to verify physical hardware attributes (e.g., confirming a disk is an SSD for database or Longhorn deployments):

lsblk -d -o NAME,ROTA — list block devices and show if they are rotational (0 = SSD, 1 = HDD). For more detail, see Checking if a Disk is an SSD or HDD.
lsblk -o NAME,SIZE,MOUNTPOINT — check disk sizes and mount points to identify which disk is which.
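The ROTA column comes from a per-device flag the kernel exposes under /sys/block. A minimal sketch that reads the flag directly, for hosts where lsblk is not installed; the function name `disk_kind` is illustrative. A value of 0 only means "non-rotational", so loop and RAM devices are reported as SSD as well.

```shell
# disk_kind FLAG: translate the kernel's rotational flag into a label.
disk_kind() {
    case $1 in
        0) echo "SSD" ;;
        1) echo "HDD" ;;
        *) echo "unknown" ;;
    esac
}

# Walk every block device the kernel knows about and print its kind.
for f in /sys/block/*/queue/rotational; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(disk_kind "$(cat "$f")")"
done
```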

Obstacles

Debian minimal doesn't include sudo. This is the first surprise on a netinst install. You must su - to root and install it manually. This was your first troubleshooting entry.
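visudo vs. drop-in files. Never edit /etc/sudoers directly with a plain text editor: a syntax error can lock you out of sudo. Use visudo, which validates the file before saving, or put rules in /etc/sudoers.d/ drop-ins, which are easier to manage. A broken drop-in can break sudo too, so check it with visudo -cf <file> before installing it.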
apt vs apt-get. apt is the user-friendly CLI (with progress bars). apt-get is the scriptable one (stable output, no prompts with -y). Use apt-get in scripts, apt interactively.
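The drop-in approach can be sketched as a small function that stages the rule in a temporary file, validates it, and only then installs it. This is a sketch under assumptions: the function name is made up, the target directory is a parameter so it can be tried outside /etc first, and validation is skipped when visudo is not on the PATH.

```shell
# stage_sudoers_dropin USER [DIR]: write a NOPASSWD rule for USER to DIR/USER.
# DIR defaults to /etc/sudoers.d (needs root); pass a scratch directory to test.
# Note: sudo ignores drop-in files whose names contain a "." or end in "~".
stage_sudoers_dropin() {
    user=$1
    dir=${2:-/etc/sudoers.d}
    tmp=$(mktemp) || return 1
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$user" > "$tmp"
    # Syntax-check the staged file before it can affect sudo.
    if command -v visudo >/dev/null 2>&1; then
        visudo -cf "$tmp" >/dev/null || { rm -f "$tmp"; return 1; }
    fi
    # sudoers files are conventionally mode 0440.
    install -m 0440 "$tmp" "$dir/$user" && rm -f "$tmp"
}

# Example (as root): stage_sudoers_dropin leva
```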

Implementation

troubleshooting.md — sudo: command not found
ansible/playbooks/00-bootstrap-debian.yaml — replaced manual prep-node.sh script to configure hostname and DNS.

Resources

Debian Administrator's Handbook
man sudoers, man apt-get, man systemctl

Kernel Preparation for K8s

Kubernetes makes demands on the Linux kernel that go beyond typical server administration. Preparing a node requires configuring three core kernel-level systems: swap, kernel modules, and sysctl parameters.

Theory

The Linux kernel acts as the core interface between the hardware and your container runtime. To run Kubernetes reliably, you must manually adjust how the kernel handles memory and network traffic.

Swap

Swap is disk space used as overflow when RAM is full. The kernel moves inactive memory pages to swap to free up RAM. This is useful for general-purpose servers, but by default the kubelet refuses to start on a node that has swap enabled.

Why? The scheduler places pods according to their memory requests, and the kubelet enforces the limits on each node. If a container requests 512 MB of RAM, the scheduler needs to know that 512 MB is physically available. If the OS silently swaps memory to disk, that accounting no longer reflects reality: pods appear to fit but actually thrash on slow disk I/O.

swapoff -a                                        # disable now
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab    # disable on reboot
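The sed one-liner matches only space-separated fields and adds another # each time it is re-run. A slightly more defensive variant is sketched below as a function that takes the file path as an argument, so it can be tried on a copy of /etc/fstab first. The function names are illustrative, and GNU sed is assumed for -i and -E.

```shell
# comment_swap_entries FILE: comment out active swap lines in an fstab-style
# file. Lines that are already comments are left alone, and fields may be
# separated by spaces or tabs.
comment_swap_entries() {
    sed -i -E '/^[[:space:]]*#/! s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$1"
}

# swap_active: /proc/swaps has a header line plus one line per active device.
swap_active() {
    [ -r /proc/swaps ] && [ "$(wc -l < /proc/swaps)" -gt 1 ]
}

# Example: verify the result of swapoff -a.
if swap_active; then
    echo "swap is still active"
else
    echo "no active swap"
fi
```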

Kernel Modules

Modules are pieces of kernel code loaded on demand. Two are required for container networking:

Module	Purpose
`overlay`	Enables OverlayFS — the filesystem driver that layers container images. Each container sees a merged view of read-only image layers + a writable top layer. Without this, containerd can't unpack images.
`br_netfilter`	Makes bridged network traffic (traffic between containers on the same host via a Linux bridge) visible to `iptables`. Without this, Kubernetes NetworkPolicies and service routing can't inspect or filter inter-container packets.

Load them immediately with modprobe, and persist them in /etc/modules-load.d/k8s.conf so they are loaded again on every boot.
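The two steps can be sketched as follows. The helper names are illustrative, and the target directory is a parameter so the file can be generated somewhere harmless before writing to /etc as root.

```shell
# write_modules_conf [DIR]: list both modules, one per line, in DIR/k8s.conf.
# DIR defaults to /etc/modules-load.d, which systemd reads at boot.
write_modules_conf() {
    printf '%s\n' overlay br_netfilter > "${1:-/etc/modules-load.d}/k8s.conf"
}

# module_loaded NAME: loaded modules get a directory under /sys/module
# (some built-in modules appear there too).
module_loaded() {
    [ -d "/sys/module/$1" ]
}

# On a real node, as root:
#   modprobe overlay && modprobe br_netfilter
#   write_modules_conf
for m in overlay br_netfilter; do
    if module_loaded "$m"; then
        echo "$m: loaded"
    else
        echo "$m: not loaded"
    fi
done
```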

Sysctl Parameters

sysctl exposes tunable kernel parameters via the /proc/sys/ virtual filesystem. Three parameters matter:

Parameter	Value	Why
`net.ipv4.ip_forward`	`1`	Allows the node to forward packets between network interfaces. Without this, pods on one node can't reach pods on another — the kernel drops the packets instead of routing them.
`net.bridge.bridge-nf-call-iptables`	`1`	Bridged IPv4 traffic passes through iptables rules. Required for Kubernetes Services (kube-proxy) and NetworkPolicies to work on bridged traffic.
`net.bridge.bridge-nf-call-ip6tables`	`1`	Same as above for IPv6.

These are persisted in /etc/sysctl.d/k8s.conf and applied with sysctl --system.
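A sketch of that file and of a check that the values took effect. Each dotted sysctl key maps to a path under /proc/sys with the dots turned into slashes, which is what the check reads. The function names are illustrative, and the directory is a parameter so the file can be generated outside /etc first.

```shell
# sysctl_key_to_path KEY: net.ipv4.ip_forward -> /proc/sys/net/ipv4/ip_forward
sysctl_key_to_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# write_sysctl_conf [DIR]: write the three settings to DIR/k8s.conf.
write_sysctl_conf() {
    cat > "${1:-/etc/sysctl.d}/k8s.conf" <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
}

# check_sysctl KEY EXPECTED: succeed if the live value equals EXPECTED.
# The net.bridge.* keys only exist once br_netfilter is loaded.
check_sysctl() {
    path=$(sysctl_key_to_path "$1")
    [ -r "$path" ] && [ "$(cat "$path")" = "$2" ]
}

# On a real node, as root: write_sysctl_conf && sysctl --system
if check_sysctl net.ipv4.ip_forward 1; then
    echo "ip_forward is enabled"
else
    echo "ip_forward is not enabled"
fi
```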

Obstacles

"Why does K8s care about the kernel?" — Because Kubernetes doesn't run in a VM. It shares the host kernel with every container. The kernel is the container runtime's execution environment.
Modules not persisting across reboot. modprobe loads a module now. /etc/modules-load.d/ makes it survive reboots. Missing the second step is a classic "it worked until I rebooted" failure.
sysctl changes lost on reboot. Same pattern: sysctl -w is temporary, /etc/sysctl.d/ is permanent.

Implementation

ansible/playbooks/00-bootstrap-debian.yaml — declarative Ansible tasks managing swap, modules, and sysctl, replacing the imperative prep-node.sh.

Resources

Kubernetes docs — Container Runtimes prerequisites
OverlayFS kernel docs
man sysctl, man modprobe

Proprietary Drivers and Kernels

While Kubernetes generally relies on standard kernel features, running hardware-accelerated workloads (like GPUs for transcoding or machine learning) requires proprietary kernel modules.

DKMS (Dynamic Kernel Module Support)

Windows drivers are precompiled binaries built against a stable driver interface. Linux has no stable in-kernel ABI, so drivers are either compiled directly into the kernel or built as modules (.ko files) that must match the exact version of the running kernel.

When you install a proprietary driver (like nvidia-driver on Debian), the package manager installs the source for the driver's kernel module rather than a ready-built .ko file. DKMS then compiles that source into a kernel module locally, against the kernel installed on your machine.

The Gotcha: DKMS cannot compile the module without the kernel headers (the C header files your specific kernel was built with). If you install the nvidia-driver package without the matching headers (linux-headers-amd64, or linux-headers-$(uname -r) for the running kernel), the DKMS build is skipped or fails while the package install still appears to succeed. dkms status will then show the module as added instead of installed, and the hardware will fail to initialize on boot.
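Because the failure is easy to miss, it helps to filter the `dkms status` output for anything that is not in the installed state. A minimal sketch, assuming the usual "name/version, kernel, arch: state" line format; the function name is illustrative.

```shell
# dkms_not_installed: read `dkms status` output on stdin and print every
# entry whose state (the text after the last ": ") is not "installed".
dkms_not_installed() {
    awk -F': ' 'NF >= 2 && $NF !~ /^installed/ { print }'
}

# Example: run against the real tool when it is present.
if command -v dkms >/dev/null 2>&1; then
    dkms status | dkms_not_installed
fi
```

Any line this prints points at a module that was registered but never built for the current kernel. Installing the headers and then running `dkms autoinstall` as root is the usual way to rebuild it.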

Secure Boot

Secure Boot is a UEFI firmware feature designed to prevent malicious rootkits from loading during the boot process. It works as a chain of trust: the firmware verifies the signature of the bootloader, the bootloader verifies the kernel, and the kernel in turn refuses to load any module that is not signed by a key it trusts.

The Problem: Debian's bootloader shim is signed by Microsoft, and its kernel and in-tree modules are signed by Debian, so the system boots fine with Secure Boot enabled. However, when DKMS compiles the proprietary nvidia.ko module locally, the result is either unsigned or signed with a machine-local key that the firmware has never been told to trust. The kernel refuses to load it, and the only trace is usually an error in the kernel log (dmesg) rather than anything visible at install time.
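The Fix: For homelab and bare-metal environments running proprietary drivers, the simplest option is to boot into the machine's UEFI setup (usually F2 or DEL) and disable Secure Boot. The alternative that keeps Secure Boot on is to sign the module with a Machine Owner Key (MOK) and enroll that key with mokutil --import.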
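Before rebooting into the firmware, it is worth confirming that Secure Boot is the actual cause. The sketch below assumes the mokutil and kmod packages are installed; the function names are illustrative. `mokutil --sb-state` reports whether Secure Boot is on, and `modinfo -F signer` prints the signer of a module, or nothing if it is unsigned.

```shell
# secure_boot_state: print the firmware's Secure Boot state, or "unknown"
# when mokutil is missing or the system did not boot via UEFI.
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state 2>/dev/null || echo "unknown"
    else
        echo "unknown (mokutil not installed)"
    fi
}

# module_signer NAME: print who signed the module; empty output means the
# module is unsigned or could not be found.
module_signer() {
    modinfo -F signer "$1" 2>/dev/null
}

echo "Secure Boot: $(secure_boot_state)"
echo "nvidia signer: $(module_signer nvidia)"
```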

Bare-Metal Hardware & SRE Recovery

When running your own bare-metal servers, you have to act as the hardware technician. Software doesn't just fail because of bad code; it fails because the physical silicon beneath it breaks.

The Read-Only Emergency Lockdown

If an SSD exhausts its write endurance, or the kernel hits I/O errors or filesystem metadata corruption, its first response is to protect your data by remounting the affected filesystem, often the entire / filesystem, as read-only. On Debian's default ext4 root this is the errors=remount-ro mount option at work.

Symptoms of a Read-Only Lockdown:

1. Ansible deployments fail with UNREACHABLE (Failed to create temporary directory).
2. The Kubernetes kubelet crash-loops because it can't write to /var/lib/kubelet.
3. Running a simple command like touch ~/test returns Read-only file system.
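All three symptoms share one cause, which can be confirmed directly from the mount table. A minimal sketch that reads /proc/mounts and checks the options of the root filesystem; the function names are illustrative.

```shell
# opts_are_readonly OPTS: succeed if the comma-separated mount options
# contain the standalone flag "ro" (errors=remount-ro does not count).
opts_are_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# root_mount_opts: print the mount options of /, taking the last matching
# entry in /proc/mounts since later mounts shadow earlier ones.
root_mount_opts() {
    awk '$2 == "/" { opts = $4 } END { print opts }' /proc/mounts
}

if [ -r /proc/mounts ] && opts_are_readonly "$(root_mount_opts)"; then
    echo "root filesystem is mounted read-only"
else
    echo "root filesystem is writable (or its state is unknown)"
fi
```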

How to Recover:

1. The Hard Reboot: Because the filesystem is read-only, a standard sudo reboot often fails because systemd can't write its shutdown logs to disk. If it does, hold the machine's physical power button to force it off.
2. Automated fsck: When the machine boots back up, the boot process runs fsck (filesystem check). If the corruption was minor, it repairs the journal and the system boots normally.
3. Manual fsck: If the machine boots but is still read-only, SSH in and repair the filesystem, for example with sudo fsck -y /dev/sda1 (substitute your own device). fsck is only safe on a filesystem that is unmounted or mounted read-only, so never run it on a read-write mount, and reboot afterwards if you repaired the root filesystem.
4. Hardware Death: If fsck reports the drive is clean, but a remount (sudo mount -o remount,rw /) fails with a write-protected error, the SSD controller has most likely locked the drive into read-only mode permanently, and the drive needs replacing.

Cryptographic Deprecations (The Sequoia Bug)

Modern Linux distributions frequently update their internal security policies, which can unexpectedly break legacy software repositories.

A prime example is the Debian 13 Sequoia Bug:

Debian 13 switched to a strict OpenPGP verifier called sqv.
sqv enforces modern cryptography, explicitly rejecting older "v3 signature packets".
The official Kubernetes apt repositories were still signing their release files with v3 signatures.
Result: apt-get update suddenly fails cluster-wide with Policy rejected packet type.

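The Shell Wrapper Bypass: When the package manager (apt) doesn't give you a flag to bypass a strict security tool, you can put a wrapper in front of the tool itself. Rename /usr/bin/sqv to sqv.real and place a small shell script at /usr/bin/sqv that takes the arguments it receives, appends --policy-as-of 2025-01-01T00:00:00Z, and passes them on to sqv.real. The verifier then applies the cryptographic policy of that earlier date and accepts the legacy signatures until the upstream repository updates its infrastructure. This deliberately weakens signature checking for every repository, and an upgrade of the sqv package will overwrite the wrapper, so treat it as a temporary measure and remove it once upstream is fixed.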
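The wrapper described above can be sketched as a function that performs the rename and writes the script. The function name is illustrative, and the directory is a parameter so the mechanics can be tried against a fake sqv in a scratch directory before touching /usr/bin as root. The flag and date are taken from the text above and are not checked against any particular sqv version.

```shell
# install_sqv_wrapper [BINDIR]: move BINDIR/sqv to BINDIR/sqv.real (once) and
# replace it with a wrapper that appends the --policy-as-of flag.
install_sqv_wrapper() {
    bindir=${1:-/usr/bin}
    # Keep the original binary; skip the move if a previous run already did it.
    [ -e "$bindir/sqv.real" ] || mv "$bindir/sqv" "$bindir/sqv.real" || return 1
    cat > "$bindir/sqv" <<EOF
#!/bin/sh
# Temporary workaround: forward all arguments to the real verifier and
# append a fixed policy date so legacy signatures are accepted.
exec "$bindir/sqv.real" "\$@" --policy-as-of 2025-01-01T00:00:00Z
EOF
    chmod 0755 "$bindir/sqv"
}

# To undo (as root): mv /usr/bin/sqv.real /usr/bin/sqv
```

Using `exec` means the wrapper's exit status is the real verifier's exit status, so apt still sees genuine verification failures.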