Filesystem Recovery & Emergency Mode

A fundamental responsibility of a Linux system administrator (or a bare-metal Kubernetes operator) is maintaining the integrity of the underlying filesystems. Ext4 is the default filesystem on many Linux distributions, including Debian and Ubuntu.

Unlike cloud virtual machines, where a corrupted instance can simply be terminated and replaced with a fresh image in seconds, bare-metal servers require manual filesystem recovery.

The Ext4 Journal

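Ext4 is a journaling filesystem. Before it modifies on-disk structures, it records the pending change as a transaction in a dedicated area of the disk called the journal. In the default data=ordered mode only metadata is journaled; file contents are written to their final location before the corresponding metadata transaction commits.

If the machine loses power in the middle of a write operation, the journal is replayed the next time the filesystem is mounted. Transactions that were fully committed to the journal are applied and incomplete ones are discarded, so the filesystem structure stays consistent even though the most recent writes may be lost.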

The "Dirty" Journal

If a node crashes, loses power, or undergoes an ungraceful shutdown (e.g., systemd hanging on network teardown and the user forcing a hard reboot), the journal is not flushed cleanly. The filesystem is left flagged as needing recovery (the needs_recovery feature flag in Ext4), which is what is commonly called a "dirty" journal.
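One way to see this flag for yourself is to read the superblock with dumpe2fs -h. The sketch below is only an illustration: the function name ext4_journal_state and its three output labels are made up for this example, and it assumes the usual "Filesystem features:" and "Filesystem state:" lines that dumpe2fs prints.

```shell
# Hypothetical helper: classify an Ext4 superblock dump read from stdin.
# Expects the text printed by `dumpe2fs -h <device>`.
ext4_journal_state() {
    local features state line
    features=""
    state=""
    while IFS= read -r line; do
        case "$line" in
            "Filesystem features:"*) features=${line#*:} ;;
            "Filesystem state:"*)    state=${line#*:} ;;
        esac
    done
    # needs_recovery is present while the journal still holds
    # transactions that have not been replayed.
    case " $features " in
        *" needs_recovery "*) echo "journal-needs-recovery"; return 0 ;;
    esac
    # dumpe2fs reports e.g. "clean", "not clean" or "clean with errors".
    case "$state" in
        *"not clean"*|*"errors"*) echo "not-clean-or-errors" ;;
        *"clean"*)                echo "clean" ;;
        *)                        echo "unknown" ;;
    esac
}
```

On a real node this would be fed from the device, for example: sudo dumpe2fs -h /dev/sdXN 2>/dev/null | ext4_journal_state (with /dev/sdXN replaced by the actual partition).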

Read-Only Lockdowns & Emergency Mode

During boot, the root (/) filesystem is checked before it is mounted read-write. A dirty journal on its own is normally replayed automatically; the protective lockdown happens when that check finds errors it cannot fix unattended, or when the kernel hits filesystem or disk errors at runtime:

Emergency Mode: The boot process halts and drops you into a minimal root shell. Networking, SSH, and Kubernetes components are not started.
Read-Only Remount: If the system does boot (or was already running), the kernel remounts the root filesystem as Read-Only (ro) when it detects an error. This is the errors=remount-ro behavior that many distributions configure for the root filesystem, and it prevents further writes from making the damage worse.

Symptoms of a Read-Only Filesystem

You can SSH into the node, but commands like touch test.txt fail with Read-only file system.
Ansible playbooks fail with UNREACHABLE errors because Ansible cannot create its temporary ~/.ansible/tmp directory on the node.
Ansible asynchronous ad-hoc tasks (like emergency shutdowns using -B 1) fail with [Errno 30] Read-only file system because they cannot write to /root/.ansible_async.
The Kubernetes kubelet crash-loops because it cannot write state to /var/lib/kubelet.
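These symptoms can be confirmed directly from the mount table. The following is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, type, options); the function name is_readonly_mount and its return codes are invented for this example.

```shell
# Hypothetical helper: is MOUNTPOINT currently mounted read-only?
# Usage: is_readonly_mount MOUNTPOINT [MOUNTS_FILE]
# Returns 0 if read-only, 1 if read-write, 2 if the mount point is not listed.
is_readonly_mount() {
    awk -v mp="$1" '
        $2 == mp { opts = $4; found = 1 }   # the last matching entry wins
        END {
            if (!found) exit 2
            # Split the option list so "errors=remount-ro" is not
            # mistaken for the plain "ro" option.
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "${2:-/proc/mounts}"
}
```

For example, is_readonly_mount / && echo "root is read-only" checks the live system, while passing a second argument lets you test the function against a saved copy of the mount table.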

Circumventing Read-Only mode with `tmpfs`

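If your system is locked in Read-Only mode but you must execute an emergency automation script (like safely halting the cluster), you can route your temporary state files into a memory-backed tmpfs mount.

On many Linux distributions the /tmp directory is a tmpfs held in RAM, but this is not universal: some Debian and Ubuntu releases keep /tmp on the root disk by default. Check with findmnt /tmp; if /tmp is not a tmpfs, /dev/shm or a directory under /run is a tmpfs on most systemd-based systems. A tmpfs remains fully writable even when the root filesystem is read-only or the underlying SSD controller has hard-locked the physical disk.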

For example, to execute an Ansible async task on a locked node, tell Ansible to use the writable /tmp directory instead of the locked /root directory:

ansible all -b -e ansible_async_dir=/tmp/.ansible_async -m shell -a "shutdown -h now" -B 1 -P 0
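Before relying on /tmp, it is worth confirming on the node that the directory really is a writable tmpfs. The sketch below is one possible check, assuming GNU stat is available; the function name pick_writable_tmpfs is hypothetical.

```shell
# Hypothetical helper: print the first candidate directory that is
# both on a tmpfs and writable by the current user.
# Usage: pick_writable_tmpfs DIR...
pick_writable_tmpfs() {
    for dir in "$@"; do
        # stat -f inspects the filesystem; %T prints its type name.
        fstype=$(stat -f -c %T "$dir" 2>/dev/null) || continue
        if [ "$fstype" = "tmpfs" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Running pick_writable_tmpfs /tmp /dev/shm /run on the locked node prints a directory that can then be used as the base of ansible_async_dir; if it prints nothing, none of the candidates qualified.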

Recovery via `fsck`

To recover a node stuck in a Read-Only state, you must manually repair the filesystem using the File System Consistency Check (fsck) utility.

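CRITICAL WARNING: Never run fsck on a filesystem that is mounted read-write. The kernel and fsck would be modifying the same on-disk structures at the same time, which can cause severe and unrecoverable corruption. A root filesystem that is mounted Read-Only, as it is in the lockdown states described above, can be checked in place, but if fsck changes anything you should reboot afterwards so the kernel does not keep using stale cached metadata.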

Step-by-Step Recovery

Identify the Partition: First, identify the root partition dynamically instead of guessing /dev/sda1 or /dev/nvme0n1p2:
```
ROOT_DEV=$(findmnt -n -o SOURCE /)
```
Run fsck: Execute the repair tool with the -y flag (which automatically answers "yes" to all repair prompts, as thousands of inode errors might be found). If fsck skips the filesystem because it is already marked clean, add -f to force a full check:
```
sudo fsck -y "$ROOT_DEV"
```
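Remount as Read-Write: Once fsck has finished, check its exit status. If the value includes 2 ("system should be rebooted"), fsck changed a mounted filesystem and you should reboot instead of remounting. Otherwise you do not need to reboot and can instruct the kernel to remount the filesystem dynamically with Read-Write permissions: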
```
sudo mount -o remount,rw /
```
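Verify: Test the filesystem by creating and removing a temporary file on the root filesystem itself (not in /tmp, which may be a RAM-backed tmpfs and would succeed regardless):
```
sudo touch /root/test-write && sudo rm /root/test-write
```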
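The exit status of fsck is a bitmask, so a single number can carry several meanings at once. As a rough aid for the steps above, the sketch below decodes it using the values listed in the fsck manual page; the function name decode_fsck_status and the wording of the labels are assumptions made for this example.

```shell
# Hypothetical helper: translate an fsck exit status into readable flags.
# Usage: decode_fsck_status STATUS
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    result=""
    # Each pair is "bit value:meaning", as documented in fsck(8).
    for pair in "1:errors corrected" "2:reboot required" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage error" "32:cancelled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        if [ $(( status & bit )) -ne 0 ]; then
            result="${result:+$result, }$text"
        fi
    done
    echo "$result"
}
```

A typical use is to run it immediately after the repair, for example: sudo fsck -y "$ROOT_DEV"; decode_fsck_status $?. Any result that mentions "reboot required" means the node should be rebooted rather than remounted.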

Hardware Death

If fsck reports that the disk is clean, but your attempt to mount -o remount,rw / still fails with a write-protected error, the drive itself is most likely refusing writes, which points to a hardware failure. Kernel messages from dmesg and the drive's SMART data can help confirm this.

NAND flash tolerates a limited number of write cycles, which manufacturers express as an endurance rating in TBW (Terabytes Written). When the SSD controller determines that its flash is worn out and can no longer reliably hold a charge, many drives lock themselves into a read-only state at the hardware level. The data can still be read and copied off, but the drive will not accept writes again. You must physically replace the drive.
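For NVMe drives, the controller reports this state through the "Critical Warning" field of its SMART/health log, where bit 3 (0x08) means the media has been placed in read-only mode. The sketch below only covers that NVMe case and assumes the "Critical Warning: 0x.." line format printed by smartctl -A; SATA SSDs expose wear through vendor-specific attributes instead. The function name nvme_media_readonly is hypothetical.

```shell
# Hypothetical helper: detect the NVMe "media is read-only" warning.
# Reads `smartctl -A <nvme device>` output on stdin.
# Returns 0 if the read-only bit is set, 1 if not, 2 if the field is missing.
nvme_media_readonly() {
    warn=$(awk -F: '/^Critical Warning/ {
        gsub(/[[:space:]]/, "", $2); print $2; exit }')
    case "$warn" in
        0x[0-9a-fA-F]*) ;;      # looks like a hex value, continue
        *) return 2 ;;          # field not found or unexpected format
    esac
    # Bit 3 (value 8) of the critical warning byte: media read-only.
    [ $(( $warn & 8 )) -ne 0 ]
}
```

On a node this would be used as sudo smartctl -A /dev/nvme0 | nvme_media_readonly && echo "drive is locked read-only", with /dev/nvme0 replaced by the actual device.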

Rogue Swap Remounts (`systemd-fstab-generator`)

When recovering a node (especially a Kubernetes worker), you may find that the kubelet suddenly crash-loops on boot with the error "running with swap on is not supported".

Even if you previously disabled swap memory in your shell:

sudo swapoff -a

systemd-fstab-generator reads /etc/fstab again on every boot, because swapoff -a only affects the running system. If it finds a swap entry there, it generates a systemd swap unit and re-activates the swap space, violating the kubelet's default requirement that swap be off. The generator is part of systemd itself, so this behavior is not specific to Debian.

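The Fix: To keep swap disabled across reboots, you must also disable its entry in the filesystem table. The command below comments out every active swap entry and saves a backup as /etc/fstab.bak, rather than deleting every line that merely contains the word "swap":

sudo sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab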
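To double-check the result, you can list whatever swap entries are still active in the filesystem table. This is a small sketch, assuming the standard whitespace-separated fstab fields (the third field is the filesystem type); the function name fstab_swap_entries is made up for this example.

```shell
# Hypothetical helper: print the device field of every uncommented
# swap entry in an fstab-style file.
# Usage: fstab_swap_entries [FILE]   (defaults to /etc/fstab)
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}
```

If fstab_swap_entries prints nothing, no swap entry remains for the next boot; swapon --show should likewise print nothing once swap is off in the running system.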