Advanced Kubernetes Operations

Admission Control & Webhooks

Every time you run kubectl apply, your request travels through several layers before a single Pod is created. The final gatekeeping layer — the one that lets Google, Stripe, and other large-scale operators enforce security policies, inject sidecars, and validate resource quotas in real time — is the admission control subsystem. Understanding it deeply means you can both rely on it for cluster safety and debug it when it silently rejects your workloads.

The API Request Lifecycle

Before diving into webhooks, trace the full path a request takes through the API server. This sequence is deterministic and non-negotiable — there are no shortcuts:

[Figure: Kubernetes API request path through admission control. (1) kubectl / REST client → (2) AuthN (identity) → (3) AuthZ (RBAC / ABAC) → (4) mutating admission webhooks → (5) schema validation → (6) validating admission webhooks → (7) etcd (persist). A rejection at any stage is returned to the client (400/403).]
The Kubernetes API request path: AuthN → AuthZ → Mutating webhooks → Schema validation → Validating webhooks → etcd.

Key insight: mutating webhooks run before validating webhooks. This ordering is intentional: mutators modify the object first (injecting sidecars, adding labels, defaulting fields), and validators then inspect the final shape. If validating webhooks fired before mutation, policies would reject objects that a mutator would have fixed.
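The ordering can be illustrated with a toy pipeline. This is a minimal sketch, not API server code: the `inject_sidecar` mutator, the `require_proxy` validator, and the image names are hypothetical and exist only for illustration.

```python
import copy

def inject_sidecar(pod):
    """Hypothetical mutator: append a proxy container if it is missing."""
    names = [c["name"] for c in pod["spec"]["containers"]]
    if "proxy" not in names:
        pod["spec"]["containers"].append(
            {"name": "proxy", "image": "example.com/proxy:1.0"}
        )
    return pod

def require_proxy(pod):
    """Hypothetical validator: allow only Pods that carry the proxy container."""
    return any(c["name"] == "proxy" for c in pod["spec"]["containers"])

def admit(pod, mutators, validators):
    """Run every mutator first, then validate the final object."""
    final = copy.deepcopy(pod)  # leave the submitted object untouched
    for mutate in mutators:
        final = mutate(final)
    allowed = all(validate(final) for validate in validators)
    return allowed, final

pod = {"spec": {"containers": [{"name": "app", "image": "example.com/app:1.0"}]}}
allowed, final = admit(pod, [inject_sidecar], [require_proxy])
print(allowed)             # True: the validator sees the mutated object
print(require_proxy(pod))  # False: the same check on the unmutated object fails
```

Validating the submitted object directly fails, while validating after mutation succeeds, which is the situation the ordering is designed for.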

Built-in Admission Controllers vs. Webhooks

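Kubernetes ships with compiled-in admission controllers, several of which are enabled by default (e.g., NamespaceLifecycle, LimitRanger, ResourceQuota, PodSecurity). They handle the most critical invariants and run in the same mutating and validating phases as webhooks; the two webhook types below are themselves dispatched by built-in controllers of the same names. You extend the system with dynamic admission, which comes in two webhook types: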

  • MutatingAdmissionWebhook — can modify the object via a JSON Patch response. Used for sidecar injection (Istio, Linkerd), secret encryption defaulting, label stamping.
  • ValidatingAdmissionWebhook — can only allow or deny. Used for policy enforcement (OPA/Gatekeeper, Kyverno), image registry allowlisting, required annotation checks.
Production rule: Webhooks must be highly available. If the webhook server is down and failurePolicy: Fail is set, every matching API request to your cluster will be rejected until the webhook recovers. This has caused major incidents at large companies. Always deploy webhook servers with at least 2 replicas and a PodDisruptionBudget.

Registering a Validating Webhook

Webhooks are registered with ValidatingWebhookConfiguration or MutatingWebhookConfiguration objects. The API server uses these to know which HTTPS endpoint to call, and which resource/operation combinations trigger it.

    # validating-webhook.yaml: reject Pods that do not set resource limits
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: require-resource-limits
    webhooks:
      - name: require-limits.example.com
        admissionReviewVersions: ["v1"]
        clientConfig:
          service:
            name: policy-webhook
            namespace: policy-system
            path: /validate-pods
          caBundle: <base64-encoded-CA-cert>
        rules:
          - apiGroups: [""]
            apiVersions: ["v1"]
            operations: ["CREATE", "UPDATE"]
            resources: ["pods"]
            scope: "Namespaced"
        namespaceSelector:
          matchExpressions:
            - key: webhook.policy/ignore
              operator: DoesNotExist
        failurePolicy: Fail     # Fail | Ignore
        sideEffects: None       # None | NoneOnDryRun
        timeoutSeconds: 5       # default 10, max 30

The caBundle field is critical — the API server uses it to verify the webhook's TLS certificate. In production, use cert-manager with a Certificate resource and the cainjector to automatically populate caBundle. Rotating this certificate manually is an operational trap.
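To make the rules and namespaceSelector fields of a webhook configuration concrete, the sketch below treats them as a predicate that decides whether the webhook is called. It is a deliberate simplification: it ignores apiGroups, apiVersions, and scope, and handles only the Exists and DoesNotExist selector operators. The function names are hypothetical and this is not the API server's matching code.

```python
def rule_matches(rule, operation, resource):
    """True when the operation and resource fall under one webhook rule."""
    ops = rule["operations"]
    resources = rule["resources"]
    return (operation in ops or "*" in ops) and (
        resource in resources or "*" in resources
    )

def namespace_selected(namespace_labels, match_expressions):
    """Evaluate only the Exists / DoesNotExist operators of a label selector."""
    for expr in match_expressions:
        present = expr["key"] in namespace_labels
        if expr["operator"] == "DoesNotExist" and present:
            return False
        if expr["operator"] == "Exists" and not present:
            return False
    return True

def webhook_called(rule, selector, operation, resource, namespace_labels):
    """The webhook fires only if both the rule and the selector match."""
    return rule_matches(rule, operation, resource) and namespace_selected(
        namespace_labels, selector
    )

rule = {"operations": ["CREATE", "UPDATE"], "resources": ["pods"]}
selector = [{"key": "webhook.policy/ignore", "operator": "DoesNotExist"}]

print(webhook_called(rule, selector, "CREATE", "pods", {}))  # True
print(webhook_called(rule, selector, "CREATE", "pods",
                     {"webhook.policy/ignore": "true"}))     # False
print(webhook_called(rule, selector, "DELETE", "pods", {}))  # False
```

Labeling a namespace with webhook.policy/ignore is therefore enough to opt it out, whatever value the label carries.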

What the Webhook Server Returns

Your webhook is a plain HTTPS server. The API server sends an AdmissionReview JSON body and expects one back. For a validating webhook, the response is simple:

    # Allowed response
    {
      "apiVersion": "admission.k8s.io/v1",
      "kind": "AdmissionReview",
      "response": {
        "uid": "<copy from request.uid>",
        "allowed": true
      }
    }

    # Denied response: shown to the user in kubectl output
    {
      "apiVersion": "admission.k8s.io/v1",
      "kind": "AdmissionReview",
      "response": {
        "uid": "<copy from request.uid>",
        "allowed": false,
        "status": {
          "code": 403,
          "message": "Pod must define resources.limits for all containers"
        }
      }
    }
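The decision logic behind those two responses might look like the following. This is a minimal sketch of the handler body only, assuming the incoming AdmissionReview has already been parsed into a dict; the HTTPS server, TLS setup, and the function name `review_pod` are left out or hypothetical.

```python
def review_pod(admission_review):
    """Build an AdmissionReview response that denies Pods lacking resource limits."""
    request = admission_review["request"]
    pod = request["object"]

    # Collect the names of containers that define no resources.limits.
    missing = [
        c.get("name", "<unnamed>")
        for c in pod["spec"].get("containers", [])
        if not (c.get("resources") or {}).get("limits")
    ]

    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        response["status"] = {
            "code": 403,
            "message": "Pod must define resources.limits for all containers "
                       "(missing: " + ", ".join(missing) + ")",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }

review = {
    "request": {
        "uid": "abc-123",
        "object": {"spec": {"containers": [
            {"name": "app", "resources": {"limits": {"cpu": "1", "memory": "1Gi"}}},
            {"name": "helper"},
        ]}},
    }
}
result = review_pod(review)
print(result["response"]["allowed"])  # False: "helper" has no limits
```

Echoing request.uid back in the response is what lets the API server pair the answer with the request it sent.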

A mutating webhook returns the same structure but also includes a patch field (base64-encoded JSON Patch) and "patchType": "JSONPatch". The patch can add, remove, or replace fields on the object.
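As an illustration of the patch encoding, the sketch below builds a mutating response that stamps one label onto the object. The label key `example.com/stamped` and the function name are hypothetical. The patch is an RFC 6902 JSON Patch document, serialized to JSON and then base64-encoded.

```python
import base64
import json

def label_patch_response(admission_review, key="example.com/stamped", value="true"):
    """Return an AdmissionReview response whose patch adds one label."""
    request = admission_review["request"]
    labels = request["object"].get("metadata", {}).get("labels")

    # JSON Pointer escaping (RFC 6901): "~" becomes "~0", then "/" becomes "~1".
    escaped = key.replace("~", "~0").replace("/", "~1")
    if labels is None:
        # No labels map yet: create it, otherwise the add would have no parent.
        ops = [{"op": "add", "path": "/metadata/labels", "value": {key: value}}]
    else:
        ops = [{"op": "add", "path": "/metadata/labels/" + escaped, "value": value}]

    patch = base64.b64encode(json.dumps(ops).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": patch,
        },
    }

review = {"request": {"uid": "abc-123",
                      "object": {"metadata": {"labels": {"app": "web"}}}}}
resp = label_patch_response(review)
decoded = json.loads(base64.b64decode(resp["response"]["patch"]))
print(decoded[0]["path"])  # /metadata/labels/example.com~1stamped
```

The two branches matter because adding a key under a labels map that does not exist is an invalid patch, which would surface as an admission error rather than a mutation.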

Kyverno and OPA Gatekeeper — Production Policy Engines

Writing raw webhook servers is error-prone at scale. Production clusters at big-tech companies use policy engines that implement webhooks internally and let you write declarative policies instead of Go/Python servers.

  • Kyverno — Kubernetes-native. Policies are CRDs (ClusterPolicy). Simpler YAML syntax, built-in auto-mutation support, audit mode.
  • OPA Gatekeeper — Uses ConstraintTemplate (Rego language). More expressive, better for complex cross-field validation. Standard at Google and large enterprises.
    # Kyverno ClusterPolicy: require all containers to have resource limits
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-resource-limits
    spec:
      validationFailureAction: Enforce   # Audit | Enforce
      background: true                   # also audit existing resources
      rules:
        - name: check-container-limits
          match:
            any:
              - resources:
                  kinds: [Pod]
          validate:
            message: "All containers must specify resources.limits"
            pattern:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: "?*"
                        cpu: "?*"
Start in Audit mode. When rolling out a new Kyverno policy, set validationFailureAction: Audit first. This logs violations without blocking workloads, letting you discover how many existing resources already violate the policy before enforcing it. Enforce blindly in a large cluster and you will block your own CD pipelines.

Failure Modes and Debugging

When a webhook rejects your request, kubectl prints the status.message directly. But webhooks can also fail in subtler ways:

  • Timeout — If the webhook server takes longer than timeoutSeconds, the API server treats it as a failure. With failurePolicy: Fail this blocks the request; with Ignore it silently skips. Either is dangerous.
  • TLS mismatch — An expired or rotated cert that was not propagated to caBundle causes all requests to fail with a TLS verification error, not a helpful policy message.
  • Webhook loop — A mutating webhook whose rules match the objects it creates or patches (or the Pods of its own server) can re-trigger itself or block its own Deployment from starting. Always add a namespaceSelector or objectSelector to exclude the webhook server's own namespace.
    # Inspect webhook configurations
    kubectl get validatingwebhookconfigurations
    kubectl get mutatingwebhookconfigurations

    # Describe a specific webhook to check caBundle, namespaceSelector, failurePolicy
    kubectl describe validatingwebhookconfiguration require-resource-limits

    # Search API server logs for admission errors
    # (works when the API server runs as a Pod in kube-system; audit logs are separate and require an audit policy)
    kubectl logs -n kube-system kube-apiserver-<node> | grep admission

    # Kyverno: see policy violations in audit mode
    kubectl get policyreport -A
    kubectl get clusterpolicyreport
Never set failurePolicy: Fail on a webhook that calls an external service (e.g., a remote OPA server, an external secret vault). External dependencies introduce latency and availability risk. If that external service has a 5-minute outage, the cluster cannot create any Pods the webhook matches for the duration. Use Ignore for externally-dependent webhooks, and compensate with post-admission audit tooling.
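How a failed webhook call interacts with failurePolicy can be summarized as a small decision function. This is a simplified model rather than API server code; the `call_result` values and the function name are hypothetical labels, with "error" standing for a timeout, TLS verification failure, or refused connection.

```python
def admission_outcome(call_result, failure_policy):
    """Return True if the request proceeds, False if it is rejected.

    call_result: "allow", "deny", or "error" (the webhook could not be reached
    or did not answer within timeoutSeconds).
    failure_policy: "Fail" or "Ignore", as set in the webhook configuration.
    """
    if call_result == "allow":
        return True
    if call_result == "deny":
        return False  # an explicit denial is final under either policy
    if call_result == "error":
        if failure_policy == "Ignore":
            return True   # the webhook is skipped; the policy is not enforced
        if failure_policy == "Fail":
            return False  # the request is blocked until the webhook recovers
        raise ValueError("unknown failurePolicy: " + repr(failure_policy))
    raise ValueError("unknown call result: " + repr(call_result))

for policy in ("Fail", "Ignore"):
    print(policy, admission_outcome("error", policy))
# Fail False
# Ignore True
```

The policy only changes the outcome in the error case: Fail trades availability for enforcement, Ignore trades enforcement for availability.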

Webhook Best Practices at Scale

  1. Scope narrowly — Use namespaceSelector, objectSelector, and specific rules to match only what you need. A webhook that fires on every object in the cluster multiplies API server latency.
  2. Use cert-manager — Never manage webhook TLS certificates by hand. The cainjector keeps caBundle in sync automatically.
  3. Set sideEffects: None — Required for dry-run support (kubectl apply --dry-run=server). If your webhook has side effects (writes to a DB, calls an API), you must declare NoneOnDryRun and implement dry-run detection.
  4. Exclude system namespaces — Always exclude kube-system and your webhook's own namespace from all webhook rules to avoid control-plane disruption.
  5. Monitor webhook latency — Expose a /metrics endpoint on your webhook server and alert on p99 latency > 2 seconds. The API server's own metrics expose apiserver_admission_webhook_admission_duration_seconds.
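For the dry-run detection mentioned in item 3, the AdmissionReview request carries a dryRun field that a webhook with side effects can check before acting. The sketch below assumes the request is already parsed into a dict; `record_event` is a hypothetical stand-in for whatever side effect the webhook performs (a database write, an external API call).

```python
def handle_with_side_effect(admission_review, record_event):
    """Allow the request, performing the side effect only on real (non-dry-run) calls."""
    request = admission_review["request"]
    dry_run = bool(request.get("dryRun"))  # absent or null is treated as False

    if not dry_run:
        record_event(request["uid"])  # skipped entirely for dry-run requests

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": request["uid"], "allowed": True},
    }

events = []
handle_with_side_effect({"request": {"uid": "real-1"}}, events.append)
handle_with_side_effect({"request": {"uid": "dry-1", "dryRun": True}}, events.append)
print(events)  # ['real-1']
```

A webhook written this way can declare sideEffects: NoneOnDryRun, since a dry-run request leaves no trace outside the response.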