Kubernetes Fundamentals

Pods: The Atomic Unit

In Kubernetes, every workload — whether a stateless API server, a database, or a batch job — eventually runs inside a Pod. A Pod is not a container. It is a thin wrapper that groups one or more containers into a single schedulable unit with a shared execution environment. Understanding Pod anatomy at this level is non-negotiable: every higher-level abstraction (Deployment, StatefulSet, Job) is ultimately a factory that creates and manages Pods.

Pod Anatomy

A Pod specification (the spec section of a manifest) describes everything the scheduler and kubelet need to run your workload:

  • containers — one or more container specs, each with an image, command, ports, environment variables, and resource requests/limits.
  • volumes — storage volumes that any container in the Pod can mount. Ephemeral volumes such as emptyDir are created with the Pod and deleted with it; volumes backed by a PersistentVolumeClaim keep their data after the Pod is gone.
  • initContainers — containers that run one at a time, in order, each to completion, before any regular container starts. Used for bootstrapping: running migrations, fetching secrets, waiting for dependencies.
  • restartPolicy — Always (default, for long-running services), OnFailure (for jobs), or Never.
  • serviceAccountName — the RBAC identity the Pod uses to call the Kubernetes API.
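  • securityContext — Pod-level security settings: run as non-root, user and group IDs (runAsUser, fsGroup), syscall filters (seccomp), AppArmor profiles. Some settings, such as a read-only root filesystem (readOnlyRootFilesystem), exist only in each container's own securityContext.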
  • affinity / tolerations / nodeSelector — scheduling constraints that control which Nodes the Pod may land on.
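
The restartPolicy values reduce to a small decision rule. The sketch below is a simplified model, not kubelet code. It ignores the back-off delay between restarts, and `should_restart` is a hypothetical helper name.

```python
from typing import Literal

RestartPolicy = Literal["Always", "OnFailure", "Never"]


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Simplified model: does restartPolicy restart a container that just exited?

    Hypothetical helper for illustration; the real kubelet also waits an
    exponential back-off delay between restarts.
    """
    if policy == "Always":
        return True              # long-running services: restart on any exit
    if policy == "OnFailure":
        return exit_code != 0    # batch work: retry only failed runs
    if policy == "Never":
        return False             # the exit is final
    raise ValueError(f"unknown restartPolicy: {policy!r}")


if __name__ == "__main__":
    for policy in ("Always", "OnFailure", "Never"):
        print(policy, should_restart(policy, 0), should_restart(policy, 1))
```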

The most important architectural fact about a Pod is its shared network and IPC namespace. Every container inside the same Pod sees the same loopback interface (localhost), the same IP address, and the same hostname. If container A binds port 8080, container B can reach it on localhost:8080. The flip side is that two containers in one Pod cannot both bind the same port. This is by design: tightly coupled helper processes (sidecars) can talk to the main container over localhost, with no Service, DNS lookup, or network hop in between.
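
You can see the same localhost mechanics outside Kubernetes. This is only an analogy that assumes a single machine. Two threads in one Python process stand in for two containers sharing a network namespace, and the "sidecar" thread reaches the "main container" on 127.0.0.1 with no Service or DNS lookup. Binding port 0 lets the OS pick a free port instead of hard-coding 8080.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """'Container A': accept one connection and answer it."""
    conn, _ = server.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"pong from A: " + request)


def main() -> bytes:
    # Container A listens on the loopback interface (in a Pod: localhost:8080).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Container B (the "sidecar") dials localhost directly: same namespace,
    # same loopback, no Service or Pod IP involved.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"ping")
        chunks = []
        while chunk := client.recv(1024):  # read until A closes the connection
            chunks.append(chunk)

    worker.join()
    server.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(main().decode())
```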

[Figure: Pod anatomy. A Pod (IP 10.244.1.7) with a shared network namespace (localhost, same IP, same ports). The init-migrate initContainer runs the DB migration and exits 0. Then the api-server main container (image my-api:v2, port 8080) and the log-shipper sidecar (tails /var/log/app, forwards to Loki) run side by side and share an emptyDir volume mounted at /var/log/app. The kubelet on the Node starts initContainers first, then all containers in parallel, and monitors liveness/readiness.]
Pod anatomy: the init container runs first, then the main container and the sidecar start in parallel, sharing a network namespace and a volume.

Writing a Real Pod Manifest

You rarely create bare Pods in production (Deployments do that for you), but you must be able to read and write Pod manifests, both to debug and to understand what higher-level objects generate. Here is a production-style Pod manifest, with one init container and one application container, showing the fields you will encounter in real clusters:

# pod-api.yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: production
  labels:
    app: api
    version: v2
    team: platform
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  serviceAccountName: api-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  initContainers:
    - name: init-migrate
      image: my-api:v2
      command: ["python", "manage.py", "migrate", "--noinput"]
      envFrom:
        - secretRef:
            name: db-credentials
  containers:
    - name: api-server
      image: my-api:v2
      ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: metrics
      envFrom:
        - secretRef:
            name: db-credentials
        - configMapRef:
            name: api-config
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "1000m"
          memory: "512Mi"
      readinessProbe:
        httpGet:
          path: /healthz/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: /healthz/live
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
        failureThreshold: 3
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
  volumes:
    - name: app-logs
      emptyDir: {}
  restartPolicy: Always

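Resource requests vs limits: requests is what the scheduler uses to place the Pod on a Node with enough unreserved capacity. limits is the hard ceiling enforced at runtime through cgroups. A container that exceeds its CPU limit is throttled, and one that exceeds its memory limit is OOM-killed. If you set limits but omit requests, Kubernetes defaults requests to equal limits. That is valid, but it makes the scheduler reserve each container's full ceiling. In production, set explicit requests that reflect typical usage so the scheduler can bin-pack the cluster efficiently.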
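
The defaulting rule, and the fact that scheduling looks only at requests, can be sketched in a few lines. This is a simplified model that rests on several assumptions. It parses only the quantity suffixes used in this lesson (`m` for CPU; `Ki`, `Mi`, `Gi` for memory). `fits_on_node` is a hypothetical helper that mimics the scheduler's resource-fit check for a single container against a Node's remaining capacity.

```python
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_millicores(quantity: str) -> int:
    """'250m' -> 250, '1' -> 1000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """'256Mi' -> 268435456. Only binary suffixes are handled in this sketch."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


def effective_requests(resources: dict) -> dict:
    """Defaulting rule: a missing request falls back to its limit."""
    requests = dict(resources.get("requests", {}))
    for name, limit in resources.get("limits", {}).items():
        requests.setdefault(name, limit)
    return requests


def fits_on_node(requests: dict, free_cpu_m: int, free_mem_bytes: int) -> bool:
    """Scheduler-style check: compare requests (never limits) to free capacity."""
    return (
        cpu_millicores(requests.get("cpu", "0")) <= free_cpu_m
        and memory_bytes(requests.get("memory", "0")) <= free_mem_bytes
    )


if __name__ == "__main__":
    with_requests = {"requests": {"cpu": "250m", "memory": "256Mi"},
                     "limits": {"cpu": "1000m", "memory": "512Mi"}}
    limits_only = {"limits": {"cpu": "1000m", "memory": "512Mi"}}

    # A Node with 500m CPU and 1Gi memory still unreserved:
    free_cpu, free_mem = 500, memory_bytes("1Gi")
    print(fits_on_node(effective_requests(with_requests), free_cpu, free_mem))  # True
    print(fits_on_node(effective_requests(limits_only), free_cpu, free_mem))    # False
```

Both containers have the same limits. Only the one with explicit requests fits on the Node, because the limits-only container is scheduled as if it always needed a full CPU.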

Multi-Container Pods and Sidecars

The single-container Pod is the common case, but Kubernetes explicitly supports multiple containers per Pod, and the most common multi-container pattern is the sidecar. A sidecar is a container that augments the main application container without modifying it. This is powerful because it respects the single-responsibility principle at the container level: your application image does one thing, and a separate team's image adds a capability (logging, metrics, mTLS) as an orthogonal concern.

Three common sidecar patterns you will meet in production clusters:

  • Log shipper — The app writes structured logs to a shared emptyDir volume. A Fluentd or Promtail sidecar tails that directory and forwards to a central log aggregator (Loki, Elasticsearch, Splunk). The app team owns the app image; the platform team owns the shipper image. Neither needs to know about the other's implementation.
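  • Proxy / service mesh — Istio uses a MutatingAdmissionWebhook to inject an Envoy proxy sidecar into Pods automatically (in namespaces where injection is enabled). Together, these proxies form the mesh's data plane. All inbound and outbound traffic flows through Envoy, giving you mTLS, retries, circuit breaking, and telemetry without changing application code. Distributed tracing is a partial exception: Envoy emits the spans, but the application must still propagate trace headers from inbound to outbound requests.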
  • Secret sync — A Vault Agent sidecar authenticates to HashiCorp Vault, retrieves secrets, and writes them to a shared tmpfs volume. The app reads secrets from files rather than environment variables — a security best practice because env vars can be leaked through /proc/PID/environ.
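
On the application side, reading secrets from files instead of environment variables is a small change. This is a minimal sketch under assumptions. The directory `/vault/secrets` and the file name `db-password` are hypothetical, and the real paths depend on how the Vault Agent sidecar is configured. The file is re-read on every call, so a secret that the sidecar rotates is picked up without restarting the container.

```python
import tempfile
from pathlib import Path

# Hypothetical mount point of the shared tmpfs volume written by the sidecar.
SECRETS_DIR = Path("/vault/secrets")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Read one secret from the shared volume.

    Re-reading on every call means a rotated value is seen on the next call.
    A trailing newline (common in templated files) is stripped.
    """
    path = secrets_dir / name
    try:
        return path.read_text(encoding="utf-8").rstrip("\n")
    except FileNotFoundError:
        raise RuntimeError(
            f"secret {name!r} not found at {path}; has the sidecar written it yet?"
        ) from None


if __name__ == "__main__":
    # Simulate the sidecar writing, then rotating, a secret in a shared directory.
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        (shared / "db-password").write_text("s3cret\n", encoding="utf-8")
        print(read_secret("db-password", shared))  # s3cret
        (shared / "db-password").write_text("rotated\n", encoding="utf-8")
        print(read_secret("db-password", shared))  # rotated
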
# multi-container pod: api + log-shipper sidecar
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar
  labels:
    app: api
spec:
  containers:
    - name: api-server
      image: my-api:v2
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-shipper
      image: grafana/promtail:2.9.0
      args:
        - -config.file=/etc/promtail/config.yaml
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
        - name: promtail-config
          mountPath: /etc/promtail
  volumes:
    - name: app-logs
      emptyDir: {}
    - name: promtail-config
      configMap:
        name: promtail-config

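
The log-shipper half of this pattern comes down to one rule: remember how far you have read, and forward anything new. This is a hypothetical, heavily simplified model of what a shipper such as Promtail does. The real tool also handles file rotation, batching, retries, and persisted read positions. `LogTailer` and the `forward` callback are illustrative names, and a temporary directory stands in for the emptyDir mounted at /var/log/app.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


class LogTailer:
    """Forward complete lines appended to *.log files in a shared directory."""

    def __init__(self, directory: Path, forward: Callable[[str, str], None]) -> None:
        self.directory = directory
        self.forward = forward              # hypothetical sink, e.g. a push to Loki
        self.offsets: Dict[Path, int] = {}  # bytes already forwarded, per file

    def poll(self) -> int:
        """Forward new complete lines from every log file; return how many were sent."""
        sent = 0
        for path in sorted(self.directory.glob("*.log")):
            offset = self.offsets.get(path, 0)
            with path.open("rb") as f:
                f.seek(offset)
                data = f.read()
            end = data.rfind(b"\n")
            if end == -1:
                continue  # nothing new, or only a partial line so far
            for line in data[:end].split(b"\n"):
                self.forward(path.name, line.decode("utf-8", errors="replace"))
                sent += 1
            # Advance past the last newline; a trailing partial line is re-read next poll.
            self.offsets[path] = offset + end + 1
        return sent


if __name__ == "__main__":
    received = []
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)  # stands in for the emptyDir at /var/log/app
        tailer = LogTailer(log_dir, lambda f, line: received.append((f, line)))
        app_log = log_dir / "app.log"
        with app_log.open("a", encoding="utf-8") as log:  # the "app container" writes
            log.write('{"level":"info","msg":"started"}\n')
        print(tailer.poll())  # 1
        with app_log.open("a", encoding="utf-8") as log:
            log.write('{"level":"error","msg":"db timeout"}\n{"level":"info"')
        print(tailer.poll())  # 1 (the last line is still incomplete)
    print(received)
```

The app only writes files and the shipper only reads them, so neither needs to know the other's implementation. The shared volume is the entire contract.
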
Prefer init containers over startup scripts. Running database migrations or config rendering inside the application's entrypoint script means a failure there kills the app container in a confusing crash loop. An initContainer makes the failure explicit: kubectl describe pod <name> will clearly show which init container failed and why. The main container never starts, so there is no ambiguity.
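
The ordering guarantee that makes init containers useful can be modeled in a few lines. This is a hypothetical sketch, not kubelet code. It runs init steps strictly one after another and starts the main containers only if every step exits 0. A real kubelet would then retry the failed init container according to the Pod's restartPolicy, which the sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StartupResult:
    completed_init: List[str] = field(default_factory=list)
    failed_init: Optional[str] = None
    started: List[str] = field(default_factory=list)


def start_pod(
    init_containers: List[Tuple[str, Callable[[], int]]],
    containers: List[str],
) -> StartupResult:
    """Run init containers in order; start main containers only if all exit 0."""
    result = StartupResult()
    for name, run in init_containers:
        if run() != 0:
            # The failure is pinned to a named init container, and the main
            # containers never start: no ambiguous crash loop in the app.
            result.failed_init = name
            return result
        result.completed_init.append(name)
    # Every init container succeeded. In a real Pod the regular containers now
    # start in parallel; here they are simply recorded.
    result.started = list(containers)
    return result


if __name__ == "__main__":
    print(start_pod([("init-migrate", lambda: 0)], ["api-server", "log-shipper"]))
    print(start_pod([("init-migrate", lambda: 1)], ["api-server", "log-shipper"]))
```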

Pod Lifecycle

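A Pod moves through a defined set of phases during its lifetime. The phase is reported in pod.status.phase. The STATUS column of kubectl get pods usually shows it, although when a container has a more specific reason (CrashLoopBackOff, for example), kubectl prints that reason instead: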

  • Pending — The Pod has been accepted by the API server but has not yet been scheduled to a Node, or is scheduled but its images are still being pulled.
  • Running — The Pod is bound to a Node, all containers have been created, and at least one container is still running (or is in the process of starting or restarting).
  • Succeeded — All containers in the Pod have exited with status code 0 and will not be restarted. This is the normal end state for a Pod run by a Job that completes successfully.
  • Failed — All containers have exited, and at least one exited with a non-zero status or was killed by the system.
  • Unknown — The state of the Pod cannot be determined, typically because communication with the Node's kubelet was lost. This is a signal of a Node failure or a network partition.
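
The phase definitions above can be read as a function from container states to a phase. This is a deliberately simplified, hypothetical model, not the kubelet's actual computation. It ignores init containers and encodes each regular container in one of four ways:

  • "creating": not yet created (for example, images still pulling).
  • "running".
  • "restarting": exited but will be restarted, which is where CrashLoopBackOff lives.
  • An exit code: exited for good.

It also treats a lost kubelet as the only path to Unknown.

```python
from typing import List, Union

# "creating", "running", "restarting", or an int exit code (terminated, not restarted).
ContainerState = Union[str, int]
KNOWN_STATES = ("creating", "running", "restarting")


def pod_phase(scheduled: bool, node_reachable: bool, containers: List[ContainerState]) -> str:
    for c in containers:
        if isinstance(c, str) and c not in KNOWN_STATES:
            raise ValueError(f"unknown container state: {c!r}")
    if not node_reachable:
        return "Unknown"     # kubelet unreachable: state cannot be determined
    if not scheduled or any(c == "creating" for c in containers):
        return "Pending"     # not bound to a Node, or containers not all created
    if any(c in ("running", "restarting") for c in containers):
        return "Running"     # at least one container running, starting, or restarting
    if all(c == 0 for c in containers):
        return "Succeeded"   # everything exited 0 and nothing will restart
    return "Failed"          # all exited, at least one with a non-zero code


if __name__ == "__main__":
    print(pod_phase(True, True, ["restarting"]))  # Running (e.g. CrashLoopBackOff)
    print(pod_phase(True, True, [0, 137]))        # Failed
```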

Independently of the Pod phase, each container has its own state: Waiting, Running, or Terminated. The reason field on a Waiting or Terminated state is the first place to look when debugging. It will tell you CrashLoopBackOff, OOMKilled, ImagePullBackOff, ContainerCreating, and so on.

CrashLoopBackOff is not a phase — it is a reason. Engineers new to Kubernetes often search for documentation on "the CrashLoopBackOff phase." It does not exist as a phase. It is the reason field on a container state of Waiting. It means the container has crashed repeatedly and kubelet is applying an exponential back-off delay (starting at 10s, capping at 5 minutes) before attempting to restart it again. Always run kubectl logs <pod> --previous to get the logs from the previous (crashed) container instance, not the currently-waiting one.
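
The back-off schedule described above (10 s, doubling, capped at 5 minutes) works out as follows. This sketch covers the arithmetic only. The constants come from the text, and the function name is hypothetical.

```python
from typing import List


def crashloop_delays(restarts: int, initial: int = 10, cap: int = 300) -> List[int]:
    """Seconds kubelet waits before each successive restart attempt."""
    return [min(initial * 2**i, cap) for i in range(restarts)]


if __name__ == "__main__":
    delays = crashloop_delays(8)
    print(delays)       # [10, 20, 40, 80, 160, 300, 300, 300]
    print(sum(delays))  # 1210
```

By the sixth restart the container waits the full five minutes between attempts. Eight crashes add up to about twenty minutes of back-off alone, which is why a crash-looping Pod can look "stuck" for a long time.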

Probes: Liveness, Readiness, and Startup

Kubernetes cannot read your application's mind — it needs explicit signals about health. Three probe types are available:

  • livenessProbe — "Is this container alive?" If it fails failureThreshold times in a row, kubelet kills and restarts the container. Use it to detect deadlocks: a process that is still running but is stuck and will never answer another request.
  • readinessProbe — "Is this container ready to serve traffic?" If it fails failureThreshold times in a row, the Pod's IP is removed from the endpoints (Endpoints and EndpointSlice objects) of every Service that selects it. Traffic stops flowing to that Pod, but the container is not killed. Use it to signal that the app is still warming up after start, or that an upstream dependency is temporarily down.
  • startupProbe — For slow-starting containers (JVM apps, ML model loading). Until the startup probe succeeds, liveness and readiness probes are disabled, which prevents premature restarts during initialization. If the startup probe fails failureThreshold times, kubelet kills the container, just as it does for a failed liveness probe.
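
The way failureThreshold turns individual probe results into an action can be sketched as a counter of consecutive results. This is a hypothetical, simplified model. It ignores timeouts and initialDelaySeconds, and it uses successThreshold (default 1) for readiness recovery. Kubernetes requires successThreshold to stay at 1 for liveness and startup probes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeTracker:
    failure_threshold: int = 3   # Kubernetes default
    success_threshold: int = 1   # Kubernetes default
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record(self, success: bool) -> None:
        if success:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0


def liveness_actions(results: List[bool], failure_threshold: int = 3) -> List[str]:
    """'ok' after each probe, or 'restart' when the failure threshold is reached."""
    tracker = ProbeTracker(failure_threshold)
    actions = []
    for ok in results:
        tracker.record(ok)
        if tracker.consecutive_failures >= tracker.failure_threshold:
            actions.append("restart")                 # kubelet kills the container
            tracker = ProbeTracker(failure_threshold)  # counters reset for the new instance
        else:
            actions.append("ok")
    return actions


def readiness_states(results: List[bool], failure_threshold: int = 3,
                     success_threshold: int = 1) -> List[bool]:
    """Ready flag after each probe; a not-ready Pod is removed from Service endpoints."""
    tracker = ProbeTracker(failure_threshold, success_threshold)
    ready = False  # containers start out not ready
    states = []
    for ok in results:
        tracker.record(ok)
        if tracker.consecutive_successes >= tracker.success_threshold:
            ready = True
        elif tracker.consecutive_failures >= tracker.failure_threshold:
            ready = False
        states.append(ready)
    return states


if __name__ == "__main__":
    probes = [True, False, False, False, True]
    print(liveness_actions(probes))  # ['ok', 'ok', 'ok', 'restart', 'ok']
    print(readiness_states(probes))  # [True, True, True, False, True]
```

The same failing sequence restarts the container under liveness but only takes the Pod out of rotation, temporarily, under readiness.
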
The readiness/liveness distinction is critical for zero-downtime deployments. During a rolling update, old Pods are scaled down only as new Pods pass their readiness probes and become available (within the Deployment's maxUnavailable budget). If your probes are too aggressive (low timeout, few retries), you will see failed requests or stalled rollouts during deploys even though your application is perfectly healthy. A readiness probe can flap a warming Pod in and out of rotation, and a liveness probe can restart a container that just needed 15 more seconds to warm up its connection pool.
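
How long a new Pod waits before it receives traffic is simple arithmetic over the probe settings. This sketch uses simplifying assumptions: every probe fails until the app has warmed up and succeeds afterwards, and probe duration and jitter are ignored. It uses the readiness settings from the api-server manifest earlier in this lesson (initialDelaySeconds: 5, periodSeconds: 10). The function name is hypothetical.

```python
import math


def first_ready_time(warmup_s: float, initial_delay_s: int, period_s: int) -> int:
    """Earliest probe time (seconds after container start) at which readiness passes.

    Probes fire at initial_delay, initial_delay + period, initial_delay + 2*period, ...
    and, by assumption, succeed once the app has been up for warmup_s seconds.
    """
    if warmup_s <= initial_delay_s:
        return initial_delay_s
    probes_needed = math.ceil((warmup_s - initial_delay_s) / period_s)
    return initial_delay_s + probes_needed * period_s


if __name__ == "__main__":
    # An app that needs 20 s to warm its connection pool:
    print(first_ready_time(20, initial_delay_s=5, period_s=10))  # 25
```

Because readiness is only checked once per period, a Pod can sit idle for up to a full periodSeconds after it is actually ready. A shorter period detects readiness sooner, at the cost of probing more often.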

Inspecting Pods in Practice

The commands every DevOps engineer runs dozens of times per day:

# List pods in all namespaces with their Node and IP
kubectl get pods -A -o wide

# Full spec + status dump for a Pod (the most useful debugging command)
kubectl describe pod api-server -n production

# Stream live logs from the main container
kubectl logs -f api-server -n production

# Logs from a specific sidecar container (the api-with-sidecar Pod has a log-shipper container)
kubectl logs -f api-with-sidecar -c log-shipper

# Logs from the PREVIOUS (crashed) container instance
kubectl logs api-server --previous -n production

# Open a shell inside a running container
kubectl exec -it api-server -n production -- /bin/sh

# Copy a file out of a Pod for offline inspection
kubectl cp production/api-server:/app/logs/error.log ./error.log

# Watch events in the namespace in real time (useful during a deploy)
kubectl get events -n production --sort-by='.lastTimestamp' -w
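
When many Pods are misbehaving, filter the container reason fields programmatically rather than reading describe output one Pod at a time. This is a minimal sketch under assumptions. It expects the output of `kubectl get pods -n production -o json`, either saved to a file or piped in with `-` as the argument. It reads only the standard status.containerStatuses fields (name, restartCount, state.waiting.reason). The script name and the chosen set of reasons are illustrative.

```python
import json
import sys
from typing import Iterator, List, Tuple

WAITING_REASONS = ("CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull")


def waiting_containers(pod_list: dict, reasons=WAITING_REASONS) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (pod, container, reason, restartCount) for containers waiting with a matching reason."""
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in reasons:
                yield pod_name, status["name"], waiting["reason"], status.get("restartCount", 0)


def main(argv: List[str]) -> None:
    # Usage: kubectl get pods -n production -o json | python3 find_stuck.py -
    if len(argv) != 1:
        print("usage: find_stuck.py <pods.json | ->")
        return
    if argv[0] == "-":
        data = json.load(sys.stdin)
    else:
        with open(argv[0], encoding="utf-8") as fh:
            data = json.load(fh)
    for pod, container, reason, restarts in waiting_containers(data):
        print(f"{pod}/{container}: {reason} (restarts: {restarts})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

OOMKilled is deliberately absent from the reason list. It is a Terminated reason, so it appears under lastState.terminated of a restarted container rather than under state.waiting.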