Filesystem Recovery & Emergency Mode
A fundamental responsibility of a Linux system administrator (or a bare-metal Kubernetes operator) is maintaining the integrity of the underlying filesystems. Ext4 is the default filesystem on many Linux distributions, including Debian and Ubuntu.
Unlike cloud virtual machines, where a corrupted instance can simply be terminated and replaced with a fresh image in seconds, bare-metal servers require manual filesystem recovery.
The Ext4 Journal
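Ext4 is a journaling filesystem. Before it applies a change to its final location on disk, it first records that change as a transaction in a dedicated area of the disk called the journal. In the default data=ordered mode only metadata passes through the journal; file data is written directly to disk before the related metadata is committed.
If the machine loses power in the middle of a write operation, the kernel reads the journal the next time the filesystem is mounted. It replays transactions that were fully committed and discards incomplete ones, which keeps the filesystem's metadata consistent.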
The "Dirty" Journal
If a node crashes, loses power, or undergoes an ungraceful shutdown (e.g., systemd hanging on network teardown and the user forcing a hard reboot), the journal is not flushed cleanly. The filesystem is marked with a "dirty" flag.
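On ext4, this flag is exposed as the needs_recovery feature in the superblock, which dumpe2fs -h prints on its "Filesystem features" line. The following is a minimal sketch, assuming e2fsprogs is installed; the function name journal_needs_recovery is hypothetical. A filesystem mounted read-write always carries this flag, so the check is only meaningful against an unmounted or read-only device.

```shell
# Succeeds (exit 0) if the dumpe2fs header output on stdin lists the
# needs_recovery feature, i.e. the journal still holds unreplayed transactions.
journal_needs_recovery() {
    grep -E '^Filesystem features:' | grep -qw 'needs_recovery'
}

# Usage (the device name is only an example; run as root):
#   dumpe2fs -h /dev/sda2 2>/dev/null | journal_needs_recovery && echo "journal is dirty"
```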
Read-Only Lockdowns & Emergency Mode
During boot, the health of the root (/) filesystem is checked. If the "dirty" flag or unhandled disk errors are detected, the system responds with one of two protective measures:
- Emergency Mode: The boot process halts completely, dropping you into a minimal root shell. It refuses to start networking, SSH, or Kubernetes components.
- Read-Only Remount: If it does manage to boot, the kernel forcefully remounts the root filesystem as Read-Only (ro). It does this to prevent any further writes from permanently destroying your data.
Symptoms of a Read-Only Filesystem
- You can SSH into the node, but commands like touch test.txt fail with Read-only file system.
- Ansible playbooks fail with UNREACHABLE errors complaining that it cannot create the temporary ~/.ansible/tmp directory.
- Ansible asynchronous ad-hoc tasks (like emergency shutdowns using -B 1) fail with [Errno 30] Read-only file system because they cannot write to /root/.ansible_async.
- The Kubernetes kubelet crash-loops because it cannot write state to /var/lib/kubelet.
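All of these symptoms trace back to the ro mount option, which can be confirmed directly instead of inferred from failing tools. The following is a minimal sketch that parses /proc/mounts; the function name mount_is_readonly and its optional second argument (an alternative mounts file, useful for testing) are illustrative assumptions.

```shell
# Succeeds (exit 0) if the given mount point is currently mounted read-only.
# $1 = mount point, $2 = mounts table to read (defaults to /proc/mounts).
mount_is_readonly() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4 }    # keep the last (topmost) entry for this mount point
        END {
            n = split(opts, o, ",")
            # Compare whole options so that errors=remount-ro does not count as ro.
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }
    ' "$mounts_file"
}

# Usage:
#   mount_is_readonly / && echo "root filesystem is read-only"
```

Where util-linux is available, findmnt -n -o OPTIONS / shows the same option string.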
Circumventing Read-Only mode with tmpfs
If your system is locked in Read-Only mode but you must execute an emergency automation script (like safely halting the cluster), you can route your temporary state files into a memory-backed tmpfs mount.
On many Linux distributions, the /tmp directory is a tmpfs mount held in RAM, but this is not universal, so confirm it first (for example with findmnt /tmp). On systemd-based systems, /run and /dev/shm are tmpfs mounts and can serve as alternatives. A tmpfs mount remains fully writable even when the root filesystem is read-only or the underlying SSD controller has hard-locked the physical disk.
For example, to execute an Ansible async task on a locked node, point Ansible's remote temporary directory and async directory at the writable /tmp directory instead of the locked /root directory.
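One way to do this is sketched below, assuming /tmp really is tmpfs on the target and that the installed Ansible version honours the ansible_remote_tmp and ansible_async_dir variables. The function name halt_node_via_tmp, the ANSIBLE_BIN override, and the shutdown command are illustrative choices, not taken from a specific playbook.

```shell
# Fire-and-forget shutdown of one host, keeping all Ansible state under /tmp.
# $1 = inventory host or pattern. ANSIBLE_BIN may be overridden (e.g. with echo)
# to preview the command line without contacting the node.
halt_node_via_tmp() {
    host_pattern="$1"
    "${ANSIBLE_BIN:-ansible}" "$host_pattern" \
        -m command -a "shutdown -h now" \
        -B 1 -P 0 \
        -e ansible_remote_tmp=/tmp/.ansible/tmp \
        -e ansible_async_dir=/tmp/.ansible_async
}

# Usage:
#   ANSIBLE_BIN=echo halt_node_via_tmp worker-01   # preview only
#   halt_node_via_tmp worker-01                    # actually run it
```

Here -B 1 starts the task asynchronously with a one-second time limit and -P 0 tells Ansible not to poll for the result, so the connection can drop as the node goes down.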
Recovery via fsck
To recover a node stuck in a Read-Only state, you must manually repair the filesystem using the File System Consistency Check (fsck) utility.
CRITICAL WARNING: You must never run fsck on a mounted, writable filesystem. Doing so can cause severe, potentially irreparable data corruption. Because a node locked in Emergency Mode mounts the filesystem as Read-Only, it is generally safe to run fsck.
Step-by-Step Recovery
- Identify the Partition: First, identify the root partition dynamically (for example with findmnt -n -o SOURCE /) instead of guessing /dev/sda1 or /dev/nvme0n1p2.
- Run fsck: Execute the repair tool against that partition with the -y flag (which automatically answers "yes" to all repair prompts, as thousands of inode errors might be found).
- Remount as Read-Write: Once fsck reports the filesystem is clean, you do not necessarily need to reboot, unless fsck itself asks for one. You can instruct the kernel to remount the filesystem dynamically with Read-Write permissions using mount -o remount,rw /.
- Verify: Test the filesystem by creating a temporary file, for example with touch.
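The four steps above can be sketched as a single shell function. This is an illustration under stated assumptions rather than a definitive procedure: the names run and recover_root, the DRY_RUN switch, and the /rw-test file are hypothetical, and findmnt is assumed to be available (it ships with util-linux).

```shell
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_root() {
    # Step 1: use the device given as $1, or ask findmnt which device backs /.
    dev="${1:-$(findmnt -n -o SOURCE /)}"
    [ -n "$dev" ] || { echo "could not determine root device" >&2; return 1; }

    # Step 2: repair, answering "yes" to every prompt.
    rc=0
    run fsck -y "$dev" || rc=$?
    # fsck's exit status is a bitmask: 1 = errors corrected, 2 = reboot advised,
    # 4 and above = errors left uncorrected or fsck itself failed.
    if [ "$rc" -ge 4 ]; then
        echo "fsck failed with status $rc" >&2
        return "$rc"
    elif [ "$rc" -ge 2 ]; then
        echo "fsck advises a reboot (status $rc)" >&2
        return "$rc"
    fi

    # Step 3: remount the root filesystem read-write.
    run mount -o remount,rw / || return 1

    # Step 4: verify with a throwaway file.
    run touch /rw-test && run rm -f /rw-test
}

# Usage (as root, from the emergency shell):
#   DRY_RUN=1 recover_root    # show what would be executed
#   recover_root              # perform the recovery
```

Stopping when fsck returns 2 or higher is a deliberately cautious choice: remounting read-write after fsck has asked for a reboot risks writing through stale in-memory metadata.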
Hardware Death
If fsck reports that the disk is clean, but your attempt to mount -o remount,rw / still fails with a write-protected error, the most likely cause is a hardware failure.
Modern SSDs tolerate a limited amount of writing, usually rated as TBW (Terabytes Written). When the SSD controller detects that its NAND flash memory is exhausted and can no longer reliably hold a charge, it may permanently lock the drive at the hardware level. The drive can still be read from, but never written to again. You must physically replace the drive.
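One way to corroborate this, offered as a sketch rather than a definitive diagnosis, is to ask the kernel whether it considers the whole block device read-only. Linux exposes this as a 0/1 value in /sys/block/<disk>/ro. The function name block_device_is_readonly and its optional second parameter are hypothetical, and a worn-out drive can reject writes without this flag being set, so SMART data and kernel logs remain the better evidence.

```shell
# Succeeds (exit 0) if the kernel reports the whole disk as read-only.
# $1 = disk name without /dev/ (e.g. sda or nvme0n1, not a partition)
# $2 = sysfs block directory (defaults to /sys/block)
block_device_is_readonly() {
    name="$1"
    sysfs="${2:-/sys/block}"
    [ "$(cat "$sysfs/$name/ro" 2>/dev/null)" = "1" ]
}

# Usage:
#   block_device_is_readonly sda && echo "kernel sees sda as write-protected"
```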
Rogue Swap Remounts (systemd-fstab-generator)
When recovering a node (especially a Kubernetes worker), you may find that the kubelet suddenly crash-loops on boot with the error "running with swap on is not supported".
This can happen even if you previously disabled swap in your running shell (for example with swapoff -a), because that change does not survive a reboot. On Debian and other systemd-based distributions, systemd-fstab-generator reads /etc/fstab during the boot sequence. If it finds a swap partition listed there, it automatically generates a systemd swap unit that re-activates the swap space, violating Kubernetes requirements.
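To confirm that swap has come back after a reboot, the kernel's list of active swap areas in /proc/swaps can be inspected. The following is a minimal sketch; the function name active_swap_devices and its optional file argument are illustrative assumptions.

```shell
# Print one active swap device or file per line; prints nothing if swap is off.
# $1 = swaps table to read (defaults to /proc/swaps). The first line is a header.
active_swap_devices() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

# Usage:
#   [ -z "$(active_swap_devices)" ] && echo "swap is off" || echo "swap is active"
```

On a systemd host, systemctl list-units --type=swap additionally shows the generated swap units responsible for the activation.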
The Fix: To permanently disable swap and survive reboots, you must explicitly remove it from the filesystem table (/etc/fstab).
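One possible way to do this is sketched below. Instead of deleting the entry outright, the hypothetical helper disable_swap_in_fstab comments it out and keeps a backup, which is easier to undo. It assumes a sed that supports -E and -i with a suffix (GNU sed on Debian does) and the standard fstab layout in which the third field is the filesystem type.

```shell
# Comment out every active swap entry in an fstab file, keeping a .bak copy.
# $1 = fstab path (defaults to /etc/fstab).
disable_swap_in_fstab() {
    fstab="${1:-/etc/fstab}"
    # Match lines that are not already comments and whose third field is "swap".
    sed -i.bak -E \
        '/^[[:space:]]*[^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap([[:space:]]|$)/ s/^/#/' \
        "$fstab"
}

# Usage (as root, on a writable root filesystem):
#   disable_swap_in_fstab && swapoff -a
```

Running swapoff -a afterwards turns swap off for the current boot, while the edited /etc/fstab keeps it from being re-activated on the next one.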