Kubernetes Fundamentals

Pods: The Atomic Unit

In Kubernetes, every workload — whether a stateless API server, a database, or a batch job — eventually runs inside a Pod. A Pod is not a container. It is a thin wrapper that groups one or more containers into a single schedulable unit with a shared execution environment. Understanding Pod anatomy at this level is non-negotiable: every higher-level abstraction (Deployment, StatefulSet, Job) is ultimately a factory that creates and manages Pods.

Pod Anatomy

A Pod specification (the spec section of a manifest) describes everything the scheduler and kubelet need to run your workload:

containers — one or more container specs, each with an image, command, ports, environment variables, and resource requests/limits.
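volumes — storage that any container in the Pod can mount. The volume definition belongs to the Pod; whether the data outlives the Pod depends on the volume type (an emptyDir is deleted with the Pod, while data behind a persistentVolumeClaim survives it).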
initContainers — containers that run to completion before any regular container starts. Used for bootstrapping: running migrations, fetching secrets, waiting for dependencies.
restartPolicy — Always (default, for long-running services), OnFailure (for jobs), or Never.
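serviceAccountName — the ServiceAccount the Pod uses as its identity when calling the Kubernetes API. RBAC rules bound to that account decide what it may do.
securityContext — security settings. At the Pod level: run as non-root, user and group IDs, fsGroup, and syscall filters (seccomp). At the container level: read-only root filesystem, Linux capabilities, and privilege escalation. AppArmor profiles are set through a securityContext field in recent Kubernetes versions and through annotations in older ones.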
affinity / tolerations / nodeSelector — scheduling constraints that control which Nodes the Pod may land on.

The most important architectural fact about a Pod is its shared network and IPC namespace. Every container inside the same Pod sees the exact same loopback interface (localhost), the same IP address, and the same hostname. If container A binds port 8080, container B can reach it on localhost:8080. This is by design — it enables tightly coupled helper processes (sidecars) without the overhead of a service mesh for intra-Pod communication.

Pod anatomy: init container runs first, then the main container and sidecar start in parallel sharing a network namespace and a volume.
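
The shared loopback interface can be seen with a two-container Pod. The following is a minimal sketch, not taken from a real deployment: the Pod name shared-net-demo, the container names, and the image tags (nginx:1.25, busybox:1.36) are assumptions chosen for illustration. The second container reaches the first over localhost with no Service in between.

```yaml
# shared-net-demo.yaml (hypothetical example)
apiVersion: v1
kind: Pod
metadata:
  name: shared-net-demo
spec:
  containers:
    - name: web
      image: nginx:1.25            # listens on port 80 inside the Pod
      ports:
        - containerPort: 80
    - name: checker
      image: busybox:1.36
      # Polls the web container over the shared loopback interface.
      command:
        - sh
        - -c
        - |
          while true; do
            wget -q -O /dev/null http://localhost:80/ && echo "web reachable via localhost"
            sleep 5
          done
```

After kubectl apply -f shared-net-demo.yaml, running kubectl logs shared-net-demo -c checker should show the message once nginx has started.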

Writing a Real Pod Manifest

You rarely create bare Pods in production (Deployments do that for you), but you must be able to read and write manifests to debug and to understand what higher-level objects generate. Here is a production-grade single-container Pod manifest with the fields you will encounter in real clusters:

# pod-api.yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: production
  labels:
    app: api
    version: v2
    team: platform
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  serviceAccountName: api-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  initContainers:
    - name: init-migrate
      image: my-api:v2
      command: ["python", "manage.py", "migrate", "--noinput"]
      envFrom:
        - secretRef:
            name: db-credentials
  containers:
    - name: api-server
      image: my-api:v2
      ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: metrics
      envFrom:
        - secretRef:
            name: db-credentials
        - configMapRef:
            name: api-config
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "1000m"
          memory: "512Mi"
      readinessProbe:
        httpGet:
          path: /healthz/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: /healthz/live
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
        failureThreshold: 3
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
  volumes:
    - name: app-logs
      emptyDir: {}
  restartPolicy: Always

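Resource requests vs limits: requests is what the scheduler uses to place the Pod on a Node with enough spare capacity. limits is the hard ceiling enforced through cgroups at runtime: a container that exceeds its CPU limit is throttled, and one that exceeds its memory limit is OOM-killed. If you set limits without requests, Kubernetes defaults requests to the same value as limits. That is valid, but the Pod then reserves its full ceiling on the Node. In production, set requests explicitly to typical usage; otherwise the scheduler cannot bin-pack the cluster efficiently.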
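
The effect on bin-packing is simple arithmetic. As a rough sketch, assume a Node with 4000m of allocatable CPU and count CPU requests only; memory, system reservations, and other workloads are ignored. The Node size and the function name pods_per_node are assumptions, while 250m and 1000m are the request and limit from the manifest above.

```shell
#!/bin/sh
# How many identical Pods fit on one Node, counting CPU requests only.
# Arguments: allocatable CPU of the Node and CPU request per Pod, both in millicores.
pods_per_node() {
  echo $(( $1 / $2 ))
}

# Assumed Node size: 4000m allocatable CPU.
echo "requests=250m:  $(pods_per_node 4000 250) Pods per Node"
echo "requests=1000m: $(pods_per_node 4000 1000) Pods per Node (requests defaulted to limits)"
```

Under these assumptions the same Node holds 16 Pods with explicit requests but only 4 when requests silently inherit the limit.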

Multi-Container Pods and Sidecars

The single-container Pod is the common case, but Kubernetes explicitly supports multiple containers per Pod. The pattern is called a sidecar. A sidecar is a container that augments the main application container without modifying it. This is powerful because it respects the single-responsibility principle at the container level: your application image does one thing, and a separate team's image adds a capability (logging, metrics, mTLS) as an orthogonal concern.

Three sidecar patterns are especially common in production clusters:

Log shipper — The app writes structured logs to a shared emptyDir volume. A Fluentd or Promtail sidecar tails that directory and forwards to a central log aggregator (Loki, Elasticsearch, Splunk). The app team owns the app image; the platform team owns the shipper image. Neither needs to know about the other's implementation.
Proxy / service mesh — Istio injects an Envoy sidecar into Pods automatically via a MutatingAdmissionWebhook, in every namespace (or Pod) where injection is enabled. Together these proxies form the mesh's data plane. All inbound and outbound traffic flows through Envoy, giving you mTLS, retries, circuit breaking, and distributed tracing with little or no change to application code.
Secret sync — A Vault Agent sidecar authenticates to HashiCorp Vault, retrieves secrets, and writes them to a shared tmpfs volume. The app reads secrets from files rather than environment variables — a security best practice because env vars can be leaked through /proc/PID/environ.

# multi-container pod: api + log-shipper sidecar
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar
  labels:
    app: api
spec:
  containers:
    - name: api-server
      image: my-api:v2
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app

    - name: log-shipper
      image: grafana/promtail:2.9.0
      args:
        - -config.file=/etc/promtail/config.yaml
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
        - name: promtail-config
          mountPath: /etc/promtail

  volumes:
    - name: app-logs
      emptyDir: {}
    - name: promtail-config
      configMap:
        name: promtail-config
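
The secret-sync pattern uses the same shared-volume mechanism, with the emptyDir backed by memory rather than disk. The manifest below is a sketch only: the Pod name, the mount path /var/run/secrets/app, and the secret-agent image are placeholders, and a real Vault Agent needs its own configuration, which is not shown.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-secret-agent               # hypothetical name
spec:
  containers:
    - name: api-server
      image: my-api:v2
      volumeMounts:
        - name: secrets
          mountPath: /var/run/secrets/app   # hypothetical path
          readOnly: true                    # the app only reads the files
    - name: secret-agent
      image: example.com/secret-agent:1.0   # placeholder image
      volumeMounts:
        - name: secrets
          mountPath: /var/run/secrets/app   # the agent writes secrets here
  volumes:
    - name: secrets
      emptyDir:
        medium: Memory                      # tmpfs-backed: kept in RAM, not on the Node's disk
```

Files in a memory-backed emptyDir count against the memory usage of the containers that write them, so size the limits accordingly.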

Prefer init containers over startup scripts. Running database migrations or config rendering inside the application's entrypoint script means a failure there kills the app container in a confusing crash loop. An initContainer makes the failure explicit: kubectl describe pod <name> will clearly show which init container failed and why. The main container never starts, so there is no ambiguity.
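
A typical "wait for a dependency" init container can be sketched as follows. The loop follows the pattern used in the Kubernetes documentation; the Service name db and the Pod name are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-wait-for-db            # hypothetical name
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.28
      # Retries until the (assumed) Service name "db" resolves in cluster DNS.
      command: ["sh", "-c", "until nslookup db; do echo waiting for db; sleep 2; done"]
  containers:
    - name: api-server             # starts only after wait-for-db exits with status 0
      image: my-api:v2
```

While the lookup keeps failing, kubectl get pods reports the Pod as Init:0/1, which points directly at the blocked step.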

Pod Lifecycle

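A Pod moves through a defined set of phases during its lifetime. The phase is reported in pod.status.phase. The STATUS column of kubectl get pods is derived from it but often shows a more specific container-level reason (such as CrashLoopBackOff or ImagePullBackOff) instead of the phase itself. The phases are: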

Pending — The Pod has been accepted by the API server but has not yet been scheduled to a Node, or is scheduled but its images are still being pulled.
Running — The Pod is bound to a Node, all containers have been created, and at least one container is still running (or is in the process of starting or restarting).
Succeeded — All containers in the Pod have exited with status code 0 and will not be restarted. This is the normal terminal phase for Pods created by a Job.
Failed — All containers have exited, and at least one exited with a non-zero status or was killed by the system.
Unknown — The state of the Pod cannot be determined, typically because communication with the Node's kubelet was lost. This is a signal of a Node failure or a network partition.

Independently of the Pod phase, each container has its own state: Waiting, Running, or Terminated. The reason field on a Waiting or Terminated state is the first place to look when debugging: it will tell you CrashLoopBackOff, OOMKilled, ImagePullBackOff, ContainerCreating, and so on. ContainerCreating and ImagePullBackOff, for example, appear while the Pod is still Pending.
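
Both levels can be read directly from the Pod's status. The helper functions below are a sketch: pod_phase and container_waiting_reasons are hypothetical names, while kubectl get -o jsonpath and the status fields they read (.status.phase and .status.containerStatuses[*].state.waiting.reason) are standard.

```shell
#!/bin/sh
# Print the Pod phase (Pending, Running, Succeeded, Failed, or Unknown).
# Arguments: Pod name, namespace.
pod_phase() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.status.phase}'
}

# Print one "<container>: <waiting reason>" line per container.
# Arguments: Pod name, namespace.
container_waiting_reasons() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{"\n"}{end}'
}
```

For example, pod_phase api-server production prints the phase, and container_waiting_reasons api-server production lists each container with its waiting reason; containers that are not waiting typically show an empty reason.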

CrashLoopBackOff is not a phase — it is a reason. Engineers new to Kubernetes often search for documentation on "the CrashLoopBackOff phase." It does not exist as a phase. It is the reason field on a container state of Waiting. It means the container has crashed repeatedly and kubelet is applying an exponential back-off delay (starting at 10s, capping at 5 minutes) before attempting to restart it again. Always run kubectl logs <pod> --previous to get the logs from the previous (crashed) container instance, not the currently-waiting one.
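
The back-off schedule described above can be sketched as a short calculation. This is an illustration of the doubling-with-cap rule (10s start, 300s ceiling), not kubelet's actual code; the function name backoff_delays is hypothetical.

```shell
#!/bin/sh
# Print the first N restart delays in seconds:
# start at 10s, double after each crash, cap at 300s (5 minutes).
backoff_delays() {
  n=$1
  delay=10
  i=0
  out=""
  while [ "$i" -lt "$n" ]; do
    out="$out${out:+ }$delay"      # append the current delay, space-separated
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_delays 7   # prints: 10 20 40 80 160 300 300
```

After the fifth crash the delay stays at the 5-minute cap, which is why a Pod in CrashLoopBackOff can look idle for minutes at a time.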

Probes: Liveness, Readiness, and Startup

Kubernetes cannot read your application's mind — it needs explicit signals about health. Three probe types are available:

livenessProbe — "Is this container alive?" If it fails failureThreshold times in a row, kubelet kills the container and restarts it according to the Pod's restartPolicy. Use it to detect deadlocks: a process that is still running but no longer responds to requests.
readinessProbe — "Is this container ready to serve traffic?" If it fails, the Pod's IP is removed from the endpoints of every Service that selects it. Traffic stops flowing to that Pod, but the container is not killed. Use it to hold back traffic during startup warm-up or while an upstream dependency is temporarily down.
startupProbe — For slow-starting containers (JVM apps, ML model loading). Until the startup probe succeeds, liveness and readiness probes are disabled, which prevents premature restarts during initialization. If it has not succeeded after failureThreshold × periodSeconds, kubelet kills the container.
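
A startup probe is usually paired with a liveness probe on the same endpoint. The fragment below is a sketch that reuses the /healthz/live path and port 8080 from the earlier manifest; the thresholds are illustrative values, not recommendations.

```yaml
# Fragment of a container spec (not a complete manifest)
startupProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  periodSeconds: 10
  failureThreshold: 30     # up to 30 x 10s = 300s to finish starting
livenessProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  periodSeconds: 20
  failureThreshold: 3      # takes over only after the startup probe succeeds
```

This gives a slow application a generous startup budget while keeping deadlock detection fast once it is running.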

The readiness/liveness distinction is critical for zero-downtime deployments. During a rolling update, a new Pod must pass its readiness probe before it receives traffic and before the rollout continues to replace old Pods. A readiness probe that passes too early (for example, one that only checks that the process is up) sends traffic to a Pod that still needs 15 more seconds to warm up its connection pool, so you see failed requests during deploys. One that is too aggressive (low timeout, few retries) marks a healthy Pod as not ready and stalls the rollout.

Inspecting Pods in Practice

The commands every DevOps engineer runs dozens of times per day:

# List pods in all namespaces with their Node and IP
kubectl get pods -A -o wide

# Summary of a Pod's spec, status, and recent events (the most useful debugging command)
kubectl describe pod api-server -n production

# Stream live logs from the main container
kubectl logs -f api-server -n production

# Logs from a specific sidecar container (the multi-container Pod defined earlier)
kubectl logs -f api-with-sidecar -c log-shipper

# Logs from the PREVIOUS (crashed) container instance
kubectl logs api-server --previous -n production

# Open a shell inside a running container
kubectl exec -it api-server -n production -- /bin/sh

# Copy a file out of a Pod for offline inspection (requires tar in the container image)
kubectl cp production/api-server:/var/log/app/error.log ./error.log

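# List recent events in chronological order
kubectl get events -n production --sort-by=.lastTimestamp

# Watch new events in real time (useful during a deploy); --sort-by is ignored with -w
kubectl get events -n production -w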