Container Runtimes and the CRI
When learning Kubernetes, it's easy to assume the kubelet is responsible for actually running the containers. It is not. The kubelet is an orchestration agent: it delegates the actual creation, starting, stopping, and destruction of containers to a container runtime.
The Container Runtime Interface (CRI)
Historically, Kubernetes was hard-coded to work with Docker. As the ecosystem evolved, this tight coupling became a bottleneck: supporting any other runtime meant changing the kubelet itself. Kubernetes therefore introduced the Container Runtime Interface (CRI), a standard gRPC API that lets the kubelet communicate with any container runtime that implements the CRI specification. The built-in Docker integration (dockershim) was later removed entirely in Kubernetes 1.24.
Today, the most common CRI-compatible runtime is containerd (which was originally spun out from Docker itself).
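The decoupling is easiest to see as an interface with interchangeable implementations. The toy Python model below borrows a few real CRI RuntimeService RPC names (RunPodSandbox, CreateContainer, StartContainer, StopContainer, RemoveContainer), but the classes, the simplified signatures, and the in-memory runtime are hypothetical illustrations. The real CRI is a protobuf/gRPC service, not a Python class.

```python
from abc import ABC, abstractmethod
import itertools


class RuntimeService(ABC):
    """Hypothetical, heavily simplified stand-in for the CRI RuntimeService.

    Method names echo real CRI RPCs; the signatures are illustrative only.
    """

    @abstractmethod
    def run_pod_sandbox(self, pod_name: str) -> str: ...

    @abstractmethod
    def create_container(self, sandbox_id: str, image: str) -> str: ...

    @abstractmethod
    def start_container(self, container_id: str) -> None: ...

    @abstractmethod
    def stop_container(self, container_id: str) -> None: ...

    @abstractmethod
    def remove_container(self, container_id: str) -> None: ...


class InMemoryRuntime(RuntimeService):
    """Toy runtime that only records container states in a dict."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.states: dict[str, str] = {}

    def run_pod_sandbox(self, pod_name: str) -> str:
        return f"sandbox-{pod_name}"

    def create_container(self, sandbox_id: str, image: str) -> str:
        container_id = f"{sandbox_id}/c{next(self._ids)}"
        self.states[container_id] = "CREATED"
        return container_id

    def start_container(self, container_id: str) -> None:
        self.states[container_id] = "RUNNING"

    def stop_container(self, container_id: str) -> None:
        self.states[container_id] = "EXITED"

    def remove_container(self, container_id: str) -> None:
        del self.states[container_id]


def kubelet_start_pod(runtime: RuntimeService, pod_name: str, images: list[str]) -> list[str]:
    """Kubelet-side logic: it depends only on the interface, never on a concrete runtime."""
    sandbox_id = runtime.run_pod_sandbox(pod_name)
    started = []
    for image in images:
        container_id = runtime.create_container(sandbox_id, image)
        runtime.start_container(container_id)
        started.append(container_id)
    return started
```

Swapping InMemoryRuntime for any other RuntimeService subclass requires no change to kubelet_start_pod, which is the property the CRI gives the real kubelet.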
The Kubelet-Containerd Relationship
- The Kubernetes control plane schedules a Pod onto a worker node.
- The kubelet on that node receives the Pod specification.
- The kubelet calls the CRI over a local UNIX socket, usually /var/run/containerd/containerd.sock.
- containerd receives the request.
- containerd talks to the Linux kernel (through its shim and an OCI runtime, runc by default) to provision the cgroups, network namespaces, and OverlayFS layers required to isolate the processes.
- containerd starts the container.
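The kubelet-to-containerd hop in these steps is an ordinary client connection to a UNIX socket. A minimal Python sketch that checks whether anything is listening at a CRI endpoint might look like this. The endpoint constant and helper names are assumptions (a real kubelet takes its endpoint from the --container-runtime-endpoint flag or its config file), and a successful connect does not prove containerd's CRI plugin is healthy; a CRI client such as crictl (for example, crictl info) makes a real gRPC call.

```python
import socket
from urllib.parse import urlparse

# Assumed endpoint; on a real node the kubelet's configured value may differ.
CRI_ENDPOINT = "unix:///var/run/containerd/containerd.sock"


def endpoint_to_path(endpoint: str) -> str:
    """Convert a unix:// endpoint URL into the filesystem path of the socket."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "unix":
        raise ValueError(f"expected a unix:// endpoint, got {endpoint!r}")
    return parsed.path


def cri_socket_reachable(endpoint: str = CRI_ENDPOINT, timeout: float = 2.0) -> bool:
    """Return True if some process accepts connections on the endpoint's UNIX socket.

    This only opens a stream connection; it does not speak gRPC or call any
    CRI method.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(endpoint_to_path(endpoint))
        except OSError:  # missing file, refused connection, timeout, permission denied
            return False
    return True
```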
The Dependency Death Spiral
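Because the kubelet depends entirely on containerd to create, inspect, and manage containers, it cannot do anything useful while containerd is offline. If it cannot reach the runtime when it starts, it exits, and systemd restarts it, so the kubelet ends up crash-looping.
If you ever see a node go NotReady with the condition reason NodeStatusUnknown (the kubelet has stopped posting status) and you find the kubelet service restarting every 10 seconds, check containerd first (for example, with systemctl status containerd). If containerd has crashed (e.g., because a disk error forced the filesystem into read-only mode), the CRI socket disappears or stops accepting connections, and the kubelet has no way to do its job.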
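The spiral is easier to reason about as a toy model. The sketch below is not kubelet or systemd code: toy_kubelet_start and supervise are hypothetical stand-ins, with supervise mimicking a systemd unit configured with Restart=always. It shows why restarting the kubelet changes nothing until the runtime check succeeds.

```python
import time
from typing import Callable


class RuntimeUnavailable(RuntimeError):
    """Raised by the toy kubelet when its container runtime cannot be reached."""


def toy_kubelet_start(runtime_is_up: Callable[[], bool]) -> str:
    # Hypothetical stand-in for the kubelet's startup check against the CRI endpoint.
    if not runtime_is_up():
        raise RuntimeUnavailable("cannot reach the container runtime endpoint")
    return "running"


def supervise(start: Callable[[], str], restart_sec: float, max_attempts: int,
              sleep: Callable[[float], None] = time.sleep) -> tuple[str, int]:
    """Rerun start() after each failure, like a unit with Restart=always.

    Returns the final state and the number of failed attempts.
    """
    failures = 0
    for _ in range(max_attempts):
        try:
            return start(), failures
        except RuntimeUnavailable:
            failures += 1
            sleep(restart_sec)  # e.g. 10 seconds between restarts
    return "crash-looping", failures
```

If the runtime never comes back, every attempt fails and the result is ("crash-looping", max_attempts); the loop only ends once the runtime check passes, which is why the fix lies with containerd rather than the kubelet.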
Configuration Caching and the Restart Trap
containerd relies on a central configuration file located at /etc/containerd/config.toml. This file defines critical system behaviors, such as:
- Where to look for CNI (Container Network Interface) plugins.
- Which OCI runtimes are available (e.g., standard runc vs. NVIDIA's nvidia-container-runtime).
The Trap: containerd reads config.toml entirely into memory when the systemd service starts. It does not hot-reload this file.
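Because the file is only read at startup, a heuristic check for a stale configuration is to compare config.toml's modification time with the moment the service last became active. The Python sketch below assumes a systemd host where systemctl show exposes the ActiveEnterTimestampMonotonic property for the containerd unit; the helper names are hypothetical, and mtime can mislead if a tool preserves timestamps when copying the file.

```python
import os
import subprocess
import time


def containerd_active_enter_monotonic_usec() -> int:
    """Read when the containerd unit last entered 'active' (microseconds, CLOCK_MONOTONIC)."""
    out = subprocess.run(
        ["systemctl", "show", "containerd",
         "--property=ActiveEnterTimestampMonotonic", "--value"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(out)


def service_start_wallclock(active_enter_monotonic_usec: int) -> float:
    """Convert a CLOCK_MONOTONIC timestamp into a Unix timestamp.

    Assumes time.monotonic() reads CLOCK_MONOTONIC, as it does on Linux.
    """
    seconds_since_start = time.monotonic() - active_enter_monotonic_usec / 1_000_000
    return time.time() - seconds_since_start


def config_changed_since_start(config_path: str, service_started_at: float) -> bool:
    """True if config_path was modified after the service started, meaning the
    running containerd is probably still using an older copy of the file."""
    return os.stat(config_path).st_mtime > service_started_at
```

On a node this would be called as config_changed_since_start("/etc/containerd/config.toml", service_start_wallclock(containerd_active_enter_monotonic_usec())).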
If you use automation (like Ansible) to patch config.toml, for example to register the NVIDIA Container Toolkit runtime, you must explicitly restart the containerd service afterwards (on systemd hosts, systemctl restart containerd).
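If you skip the restart, containerd keeps operating on its cached configuration. Pods that depend on the new settings will fail: a Pod whose RuntimeClass points at the new nvidia handler cannot have its sandbox created, and Pods requesting nvidia.com/gpu can stay stuck in Pending, typically because the GPU device plugin cannot start and advertise the resource without that runtime. In both cases the cause is the same: the running containerd does not know the new configuration exists.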
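Automation tools usually solve this by tying the restart to a "file changed" signal, as Ansible does with notify and handlers. A minimal Python sketch of the same pattern follows. It assumes containerd runs as the systemd unit "containerd" and that the script has root privileges; apply_containerd_config and restart_containerd are hypothetical names.

```python
import os
import subprocess
import tempfile
from typing import Callable


def restart_containerd() -> None:
    # Assumes a systemd unit named "containerd" and root privileges.
    subprocess.run(["systemctl", "restart", "containerd"], check=True)


def apply_containerd_config(path: str, new_text: str,
                            restart: Callable[[], None] = restart_containerd) -> bool:
    """Write new_text to path only if it differs, then restart containerd.

    Returns True if the file changed (and restart() was called), else False.
    """
    new_bytes = new_text.encode()
    try:
        with open(path, "rb") as f:
            if f.read() == new_bytes:
                return False  # unchanged, so the cached config is still accurate
    except FileNotFoundError:
        pass  # first write; fall through and create the file

    # Write atomically: temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_bytes)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; keep the usual config mode
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    restart()  # containerd only rereads config.toml when it starts
    return True
```

Skipping the write (and the restart) when nothing changed keeps repeated runs from bouncing containerd, and every running Pod on the node along with it, for no reason.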