Kubernetes Hardening: Pod Security
Every workload running in Kubernetes executes inside a Pod. Its containers share the host kernel and, unless policy says otherwise, can run as root and mount arbitrary host paths, so a single compromised container can pivot into a full cluster takeover. Pod security is the practice of stripping those defaults away so that a container escape lands in a stripped-down sandbox rather than a root shell with access to every secret in the cluster.
This lesson covers a three-layer model widely used in production: Pod Security Standards (PSS) at the namespace level, securityContext at the Pod and container level, and runtime defaults that enforce secure-by-default behavior even when developers forget.
Pod Security Standards: Policy at the Namespace Level
Pod Security Standards (PSS) replaced PodSecurityPolicy (PSP), which was deprecated in Kubernetes 1.21 and removed in 1.25. PSS defines three named policy levels, enforced by the built-in Pod Security Admission controller: no CRDs, no webhooks, no external dependencies.
- Privileged: Completely unrestricted. Reserved for system namespaces like kube-system and CNI plugins that legitimately need host access.
- Baseline: Prevents the most dangerous escalations (privileged containers, hostPID, hostIPC, dangerous capabilities like NET_ADMIN) while allowing most legacy workloads to run unmodified.
- Restricted: Heavily constrained. Requires a non-root UID, drops all capabilities, disallows privilege escalation, and enforces a seccomp profile. The gold standard for application workloads.
Each level can be applied in three modes: enforce (reject violating Pods at admission), audit (allow, but annotate the audit log event with the policy violation), and warn (allow, but return a warning to the API client). The usual production migration pattern is to apply warn and audit at the Restricted level everywhere first, monitor violations for a sprint, fix workloads, then flip namespaces to enforce.
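As a minimal sketch of the first migration step, a namespace can carry warn and audit labels at the Restricted level while nothing is enforced yet. The namespace name is a hypothetical placeholder; the label keys are the ones Pod Security Admission reads.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-apps   # hypothetical namespace name
  labels:
    # No Pod is rejected at this stage: violations only produce
    # client warnings and audit log annotations.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Once the warnings and audit entries for the namespace stop appearing, the enforce label can be added in a separate, reviewable change.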
Pin the policy version rather than using latest. With enforce-version=latest, a Kubernetes upgrade can start rejecting previously compliant Pods the moment stricter checks are added to the Restricted profile. In production, pin to a specific version like v1.30 and upgrade deliberately during maintenance windows.
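A sketch of what a pinned, enforcing namespace might look like. The namespace name is hypothetical, and v1.30 is simply the version mentioned above; leaving warn on latest is one possible choice, made here so that upcoming stricter checks surface as warnings before the enforce version is raised.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-apps   # hypothetical namespace name
  labels:
    # Reject Pods that violate Restricted as defined in Kubernetes v1.30.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    # Warn against the newest definition to preview future rejections.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
```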
securityContext: Hardening Individual Pods and Containers
PSS sets the policy floor. securityContext is where you implement it per workload. Kubernetes exposes two levels: spec.securityContext (Pod-level, applies to all containers) and spec.containers[].securityContext (container-level, overrides Pod-level for that container).
Here is a production-ready deployment that satisfies the Restricted standard with explicit, documented intent on every field:
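The manifest itself is not present in this copy of the lesson, so what follows is a minimal sketch of a Deployment intended to pass the Restricted level, not the original example. The names, image, port, and UID/GID values are all hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api                 # hypothetical name
  namespace: example-apps           # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      automountServiceAccountToken: false  # no API token unless the app calls the API server
      securityContext:                     # Pod-level: applies to every container
        runAsNonRoot: true                 # kubelet refuses to start a container as UID 0
        runAsUser: 10001                   # assumed UID; must match file ownership in the image
        runAsGroup: 10001
        fsGroup: 10001                     # group ownership applied to mounted volumes
        seccompProfile:
          type: RuntimeDefault             # the container runtime's default syscall filter
      containers:
        - name: api
          image: registry.example.com/example-api:1.0.0  # placeholder image
          ports:
            - containerPort: 8080
          securityContext:                 # container-level: overrides Pod-level fields
            allowPrivilegeEscalation: false  # setuid binaries cannot gain privileges
            readOnlyRootFilesystem: true     # writable paths must be explicit volumes
            capabilities:
              drop: ["ALL"]                  # add back single capabilities only if proven necessary
          volumeMounts:
            - name: tmp
              mountPath: /tmp                # scratch space, since the root filesystem is read-only
      volumes:
        - name: tmp
          emptyDir: {}
```

Note that readOnlyRootFilesystem and automountServiceAccountToken go beyond what the Restricted level itself checks; they are included here because the lesson treats them as part of the hardened baseline.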
Runtime Defaults: Secure Without Developer Effort
Relying solely on developers remembering to set securityContext does not scale. Production platforms use two enforcement mechanisms to make the secure path the default path.
1. Admission Webhooks with OPA/Kyverno
A mutating admission webhook can inject a sane securityContext into every Pod that does not specify one. A validating webhook can then reject Pods that still violate policy after mutation. Kyverno is the simpler option for Kubernetes-native teams; OPA/Gatekeeper offers more flexibility for complex policies shared across clouds.
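To make the mutation idea concrete, here is a sketch of a Kyverno policy that fills in Pod-level defaults only where a Pod leaves them unset. The policy name and UID/GID values are hypothetical, and the field layout follows the kyverno.io/v1 ClusterPolicy schema as commonly documented, so it should be checked against the Kyverno version actually installed.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-securitycontext   # hypothetical policy name
spec:
  rules:
    - name: default-pod-securitycontext
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            securityContext:
              # "+(field)" adds the field only when the Pod does not already set it.
              +(runAsNonRoot): true
              +(runAsUser): 10001    # assumed UID; images that expect root will fail to start
              +(runAsGroup): 10001
```

Injecting a UID is the riskiest default of the three, because it changes runtime behavior for images built to run as root; some teams may prefer to mutate only runAsNonRoot and let violating workloads fail visibly.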
2. seccomp RuntimeDefault as a Global Default
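The kubelet can apply the RuntimeDefault seccomp profile to every Pod that does not specify one: start it with --seccomp-default, or set seccompDefault: true in the kubelet configuration file. The feature has been stable since Kubernetes 1.27; on releases where it was still alpha, the SeccompDefault feature gate also had to be enabled explicitly with --feature-gates=SeccompDefault=true. This is the closest Kubernetes comes to a global safe default at the node level. Activate it on new node groups before rolling it out to existing ones.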
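For nodes configured through a kubelet configuration file rather than command-line flags, the equivalent setting might look like the fragment below. It is a sketch showing only the relevant field; a real file would carry the node's other kubelet settings as well.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Apply the RuntimeDefault seccomp profile to any Pod or container
# that does not set its own seccompProfile.
seccompDefault: true
```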
RuntimeDefault over Unconfined, always. The RuntimeDefault seccomp profile (provided by containerd or CRI-O) blocks dozens of rarely needed syscalls; Docker's documented default profile, for comparison, disables around 44 of the 300+ available. Among them are calls such as mount and unshare, which are staples of container-escape exploit chains. The performance overhead is negligible for virtually all workloads. There is no good reason to run Unconfined in production except for dedicated security tooling.
Production Failure Modes and What They Cost You
Understanding why each field exists requires seeing what happens without it:
- Missing runAsNonRoot: true: An image built without a USER directive runs as UID 0 inside the container. Without user namespaces, UID 0 inside the container is UID 0 on the host, so a container runtime or kernel vulnerability can hand the attacker host root. CVE-2019-5736 (the runc binary overwrite) required root inside the container to exploit.
- Missing allowPrivilegeEscalation: false: Binaries with the setuid bit set (like sudo, newgrp, or pkexec) can escalate to root even when the container starts as a non-root user. CVE-2021-4034 (PwnKit) abused exactly such a binary, the setuid-root pkexec.
- Missing readOnlyRootFilesystem: true: An attacker who achieves code execution can write persistence tools, stage data for exfiltration, or drop a reverse shell into the container's writable layer. A read-only root filesystem limits the attacker to in-memory operations and explicitly mounted writable volumes.
- Missing capability drops: The default set of Linux capabilities granted to a container includes NET_RAW (craft raw packets, ARP spoofing), SYS_CHROOT, and MKNOD. Drop all and add back only what is provably needed.
- Missing automountServiceAccountToken: false: Every Pod gets a service account token mounted by default. In a compromised container, that token provides authenticated API server access and is a common pivot for lateral movement within the cluster.
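The "drop all, add back only what is needed" rule from the capability bullet can be sketched as follows. The Pod name and image are hypothetical, and NET_BIND_SERVICE is used because it is the only capability the Restricted level permits adding back.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-web                 # hypothetical name
spec:
  containers:
    - name: web
      image: registry.example.com/example-web:1.0.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]                 # start from zero capabilities
          add: ["NET_BIND_SERVICE"]     # permit binding to ports below 1024
          # Caveat: for a process running as a non-root UID, the added
          # capability usually takes effect only if the binary also carries
          # the matching file capability, so listening on a high port is
          # often the simpler choice.
```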
A common mistake is to set runAsUser: 1000 in the deployment manifest but forget to add a USER 1000 directive to the Dockerfile. Kubernetes will happily run a container as the UID you specify, but if the binary in the image was built expecting UID 0, it may crash on missing file permissions. Always build the image with a non-root USER (preferably a numeric UID, which runAsNonRoot can verify) and enforce it at the Pod level as well.
Verifying Your Hardening
After applying securityContext settings, validate them without guessing:
Combine these checks with a policy scanner like Trivy (trivy k8s --report summary cluster) or kube-bench for CIS Benchmark compliance. Running these in your CI pipeline — against the rendered manifests, before cluster admission — catches regressions before they reach production.