Policy as Code Concepts
Every organization has rules: no container runs as root, every S3 bucket must have encryption, every Kubernetes Deployment must declare resource limits, every IAM role needs a usage justification tag. Traditionally these rules lived in PDFs, wikis, and tribal knowledge — enforced (or not) during quarterly audits. Policy as Code is the practice of expressing those rules as machine-readable, version-controlled, testable code that runs automatically in the same pipeline as your application. The result is compliance that operates at the speed of CI/CD instead of the speed of an audit cycle.
The Three Operational Modes: Prevent, Detect, Remediate
Any policy enforcement system operates in one or more of three modes. Understanding the tradeoffs of each is essential before you choose tooling or write a single rule.
Prevent (Admission Control)
The policy engine sits in the critical path of a write operation (a kubectl apply, a terraform apply, a pull request merge, a CloudFormation stack update) and rejects non-compliant resources before they ever exist. Nothing that violates the policy can be created. This is the highest-value mode: it eliminates entire classes of incidents by making the violation impossible.
The cost of prevention is latency and blast radius. A misconfigured policy that rejects valid resources causes failed deployments and on-call escalations. At Google and Meta, admission webhooks are deployed in dry-run mode first, with metrics on would-have-rejected counts, before being set to enforce. A single overly broad prevent rule can block every deployment in a cluster shared by thousands of teams.
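A minimal Python sketch of the dry-run-first rollout, assuming a hypothetical `AdmissionGate` wrapper and a made-up `missing_team_label` rule; neither is any admission controller's real API. In dry-run the gate never rejects anything; it only counts what enforcement would have blocked, which is the metric to watch before flipping the mode.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AdmissionGate:
    """Hypothetical admission gate with a dry-run switch (not any real engine's API)."""

    rule: Callable[[dict], bool]   # returns True when the resource violates policy
    mode: str = "dryrun"           # "dryrun" or "enforce"
    would_have_rejected: int = 0   # the metric to watch before enforcing
    rejected: int = 0

    def admit(self, resource: dict) -> bool:
        """Return True if the write may proceed."""
        if not self.rule(resource):
            return True
        if self.mode == "enforce":
            self.rejected += 1
            return False
        # Dry-run: count what enforcement would have blocked, but let it through.
        self.would_have_rejected += 1
        return True


def missing_team_label(resource: dict) -> bool:
    # Hypothetical rule used only for this example.
    return "team" not in resource.get("metadata", {}).get("labels", {})


gate = AdmissionGate(rule=missing_team_label)                   # starts in dry-run
print(gate.admit({"metadata": {"name": "web", "labels": {}}}))  # True: allowed
print(gate.would_have_rejected)                                  # 1
```

Switching `mode` to `"enforce"` only once `would_have_rejected` stays at zero for legitimate traffic mirrors the rollout described above. Real engines expose the same idea as configuration, for example Gatekeeper's `enforcementAction: dryrun` on a constraint.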
Detect (Continuous Audit)
The policy engine runs against the current state of your environment — periodically or on-demand — and reports violations without blocking anything. Resources that already exist and violate policy are surfaced in a dashboard or routed to an alert. Detection is appropriate for legacy environments where you cannot yet prevent violations (too many existing exceptions to enumerate) and for policies that are aspirational rather than hard requirements.
Detection without a remediation path is just expensive reporting. Every finding that sits unresolved for more than a sprint degrades the signal-to-noise ratio of your compliance dashboard until engineers stop looking at it. Detection must always feed into a workflow: every finding needs an owner, a ticket or pull request, and a due date.
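One way to make sure detection feeds a workflow is to route every finding to an owner and escalate anything older than a sprint. The sketch below assumes two-week sprints and a hypothetical `Finding` record whose `owner` field was filled in by some earlier step (for example from a team label); it illustrates the idea and is not any specific tool's data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SPRINT = timedelta(days=14)  # assumption: two-week sprints


@dataclass
class Finding:
    resource: str
    policy: str
    owner: str        # hypothetical: accountable team, e.g. taken from a label
    first_seen: date


def route(findings: list[Finding], today: date) -> tuple[dict[str, list[Finding]], list[Finding]]:
    """Group findings into per-owner queues and collect stale ones for escalation."""
    by_owner: dict[str, list[Finding]] = {}
    stale: list[Finding] = []
    for f in findings:
        by_owner.setdefault(f.owner, []).append(f)
        if today - f.first_seen > SPRINT:
            stale.append(f)  # unresolved for more than a sprint: escalate
    return by_owner, stale
```

The per-owner queues become tickets; the stale list is what goes to a manager or a weekly review instead of silently aging on the dashboard.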
Remediate (Auto-Correction)
When a violation is detected, the system automatically mutates the resource back into compliance — patching a missing label, enabling encryption on a storage bucket, quarantining a non-compliant node. This is the most powerful mode and the most dangerous. Automated mutation in production requires extraordinary confidence in the policy logic and a kill switch: a way to disable remediation when it starts a feedback loop.
A well-known failure mode: the remediation controller patches a resource, the application's own controller reconciles and reverts the patch, and the revert triggers the remediation controller again. The result is a tight, CPU-burning loop that floods the Kubernetes API server with writes. Put exponential backoff and a circuit breaker on every remediation controller.
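The backoff and circuit breaker can be sketched as a small guard that the controller consults before each patch. Everything here (`RemediationGuard`, its parameters, the injected `clock`) is hypothetical; injecting the clock only makes the timing testable. Each repeat remediation of the same resource doubles the wait, and after `max_attempts` the circuit opens so a human can find the controller it is fighting. The `enabled` flag doubles as the kill switch mentioned above.

```python
import time
from typing import Callable


class RemediationGuard:
    """Hypothetical backoff + circuit breaker for one remediation controller."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0,
                 max_attempts: int = 5, clock: Callable[[], float] = time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.max_attempts = max_attempts
        self.clock = clock
        self.enabled = True               # global kill switch
        self.open: set[str] = set()       # resources whose circuit is open
        self._attempts: dict[str, int] = {}
        self._next_allowed: dict[str, float] = {}

    def should_remediate(self, key: str) -> bool:
        if not self.enabled or key in self.open:
            return False
        return self.clock() >= self._next_allowed.get(key, float("-inf"))

    def record_attempt(self, key: str) -> None:
        n = self._attempts.get(key, 0) + 1
        self._attempts[key] = n
        if n >= self.max_attempts:
            self.open.add(key)  # probably fighting another controller: stop
            return
        # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay.
        delay = min(self.base_delay * 2 ** (n - 1), self.max_delay)
        self._next_allowed[key] = self.clock() + delay

    def reset(self, key: str) -> None:
        """Called by a human after investigating an open circuit."""
        self.open.discard(key)
        self._attempts.pop(key, None)
        self._next_allowed.pop(key, None)
```

For simplicity the attempt count never decays; a production controller would probably reset it after a quiet period so that one-off drift years apart does not eventually trip the breaker.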
Codifying Rules: What a Policy Actually Is
A policy is a boolean function over an input document. The input is the resource being evaluated — a Kubernetes manifest, a Terraform plan, a CloudFormation template, an IAM role definition, a container image layer list. The output is a decision: allow or deny, with a human-readable reason. Everything else — the runtime, the language, the integration point — is infrastructure for executing that function at the right moment.
A well-codified rule has four components:
- Scope: what resource types and namespaces does this rule apply to?
- Condition: what attribute or combination of attributes triggers a violation?
- Action: prevent, warn, or remediate?
- Message: a human-readable explanation that tells the engineer exactly what to fix and why.
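Treating a policy as a boolean function with those four components can be sketched in a few lines of Python. The `Rule` and `Decision` types and the resource-limits rule are illustrative assumptions, not the data model of OPA, Kyverno, or any other engine; the input is a Kubernetes manifest already parsed into a dict.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str = ""


@dataclass(frozen=True)
class Rule:
    name: str
    kinds: frozenset[str]                  # Scope: resource kinds this rule covers
    namespaces: Optional[frozenset[str]]   # Scope: None means every namespace
    violates: Callable[[dict], bool]       # Condition
    action: str                            # Action: "prevent", "warn", or "remediate"
    message: str                           # Message: what to fix and why

    def evaluate(self, resource: dict) -> Decision:
        kind = resource.get("kind")
        ns = resource.get("metadata", {}).get("namespace", "default")
        in_scope = kind in self.kinds and (self.namespaces is None or ns in self.namespaces)
        if in_scope and self.violates(resource):
            return Decision(False, f"[{self.name}] {self.message}")
        return Decision(True)


def missing_limits(deployment: dict) -> bool:
    # Condition: any container in the Pod template without resources.limits.
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    return any(not c.get("resources", {}).get("limits")
               for c in pod_spec.get("containers", []))


require_limits = Rule(
    name="require-resource-limits",
    kinds=frozenset({"Deployment"}),
    namespaces=None,
    violates=missing_limits,
    action="prevent",
    message="Set resources.limits on every container; "
            "unbounded containers can starve their neighbors on the node.",
)
```

Calling `require_limits.evaluate(manifest)` on a Deployment with an unbounded container returns a `Decision` with `allowed=False` and the rule name prefixed to the message, while any other kind falls outside the scope and is allowed.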
A Concrete Example: Prevent + Detect + Remediate for a Single Rule
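Consider the rule: "No Kubernetes Pod may run with privileged: true." Each mode enforces it at a different point. Prevent rejects any Pod whose containers set securityContext.privileged: true before it is created. Detect periodically lists running Pods that already set it. Remediate rewrites the setting to false on the offending workload.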
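The three modes can be sketched as three small functions sharing one condition. This is a Python illustration over Pod manifests parsed into dicts, not Gatekeeper or Kyverno code, and the function names are hypothetical. In a real cluster most fields of a running Pod, including its security context, are immutable, so a remediation controller would typically patch the owning workload's Pod template rather than the Pod itself.

```python
import copy


def privileged_containers(pod: dict) -> list[str]:
    """Condition: names of containers (including init containers) with privileged: true."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [c.get("name", "<unnamed>") for c in containers
            if c.get("securityContext", {}).get("privileged") is True]


def admit(pod: dict) -> tuple[bool, str]:
    """Prevent: decide at admission time; a False result blocks the write."""
    bad = privileged_containers(pod)
    if bad:
        return False, "privileged: true is not allowed (containers: " + ", ".join(bad) + ")"
    return True, ""


def audit(pods: list[dict]) -> list[str]:
    """Detect: report live Pods that violate the rule, without blocking anything."""
    return [p.get("metadata", {}).get("name", "<unnamed>")
            for p in pods if privileged_containers(p)]


def remediate(pod: dict) -> dict:
    """Remediate: return a corrected copy with privileged forced to False."""
    fixed = copy.deepcopy(pod)
    spec = fixed.get("spec", {})
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        if c.get("securityContext", {}).get("privileged") is True:
            c["securityContext"]["privileged"] = False
    return fixed
```

Keeping the condition in one function means the three modes cannot drift apart: a fix to `privileged_containers` changes what is blocked, reported, and corrected at the same time.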
The Policy Lifecycle: Shift Left Across the SDLC
The most effective policy implementations operate at multiple stages simultaneously, catching violations as early as possible. The further right a violation is caught (production vs. local IDE), the more expensive it is to fix — by orders of magnitude.
- IDE / pre-commit: conftest, kube-linter, and tfsec run locally and on every commit. Instant feedback, no infrastructure required. Catches the majority of common mistakes before they ever enter a PR.
- CI pipeline: The same tools run as required checks in GitHub Actions, GitLab CI, or Jenkins. The pipeline fails and the PR cannot merge. This is the primary enforcement gate for IaC.
- Admission webhook (runtime): Gatekeeper or Kyverno blocks non-compliant kubectl apply calls even if CI was bypassed (a manual kubectl by an SRE, a Helm chart installed directly). Defense in depth.
- Continuous audit: A reconciliation loop compares the live state of every resource against every policy and emits findings to a SIEM or compliance dashboard. Catches configuration drift, manual changes, and violations that predate the policy.
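Because a policy is just a function, the same check can serve both the pre-commit and CI stages; only the trigger differs, and the exit status is what blocks the commit or the merge. A minimal sketch, assuming a hypothetical runAsNonRoot rule and JSON manifests so that it needs only the standard library (real manifests are usually YAML and would need a YAML parser).

```python
import json
import sys
from pathlib import Path


def violations(manifest: dict) -> list[str]:
    """Hypothetical rule: every Pod must set spec.securityContext.runAsNonRoot: true.

    Container-level securityContext overrides are ignored in this sketch.
    """
    if manifest.get("kind") != "Pod":
        return []
    if manifest.get("spec", {}).get("securityContext", {}).get("runAsNonRoot") is True:
        return []
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    return [f"Pod {name}: set spec.securityContext.runAsNonRoot: true"]


def main(paths: list[str]) -> int:
    """Return 0 if every manifest passes, 1 otherwise, so a hook or CI job fails."""
    failed = False
    for path in paths:
        manifest = json.loads(Path(path).read_text())
        for msg in violations(manifest):
            print(f"{path}: {msg}", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

Saved as a script with `sys.exit(main(sys.argv[1:]))` at the bottom, it can be registered as a pre-commit hook and as a required CI step without changes.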
A common anti-pattern: teams deploy new policies in Audit mode to avoid disruption, then never graduate them to Enforce. Audit findings accumulate in a dashboard that nobody is accountable for. Treat every audit-mode policy as having a graduation deadline (30 or 60 days), after which it must either be enforced or explicitly deferred with a documented exception. Policy debt compounds exactly like technical debt.
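The graduation deadline is easy to turn into a check rather than a convention. A sketch, assuming a hypothetical inventory of policies that records the date each one entered audit mode and an optional link to a documented exception:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PolicyStatus:
    name: str
    mode: str                               # "audit" or "enforce"
    audit_since: date                       # when the policy entered audit mode
    exception_ticket: Optional[str] = None  # hypothetical link to a documented deferral


def overdue(policies: list[PolicyStatus], today: date, deadline_days: int = 30) -> list[str]:
    """Audit-mode policies past their graduation deadline with no documented exception."""
    limit = timedelta(days=deadline_days)
    return [p.name for p in policies
            if p.mode == "audit"
            and p.exception_ticket is None
            and today - p.audit_since > limit]
```

Run on a schedule, a non-empty result can fail a job or open a ticket, which puts the policy debt in front of someone accountable instead of leaving it on a dashboard.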
Policy as Code Requires Policy Testing
A policy that has never been tested is just structured documentation. Every policy file must have a companion test suite that exercises both the allow path (valid resources that must pass) and the deny path (invalid resources that must be rejected). OPA ships the opa test command; Kyverno has kyverno test. Run these in CI as part of your policy repository's pipeline — breaking a policy test is a build failure just like breaking an application unit test.
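A minimal sketch of such a companion suite, written with Python's built-in unittest rather than opa test or kyverno test, for a hypothetical rule that every IAM role carries a `justification` tag. The input shape is an assumption, not any cloud provider's format. The point is the structure: one case for the allow path and separate cases for each way of hitting the deny path.

```python
import unittest


def deny(role: dict) -> list[str]:
    """Hypothetical rule: every IAM role must carry a non-empty 'justification' tag.

    The input shape {"name": ..., "tags": {...}} is assumed for illustration and
    is not the format of any real cloud provider's API.
    """
    justification = (role.get("tags") or {}).get("justification") or ""
    if justification.strip():
        return []
    return [f"IAM role {role.get('name', '<unnamed>')} needs a non-empty 'justification' tag"]


class TestRequireJustificationTag(unittest.TestCase):
    # Allow path: compliant input must produce no denials.
    def test_allows_role_with_justification(self):
        role = {"name": "ci-deployer", "tags": {"justification": "pushes release images"}}
        self.assertEqual(deny(role), [])

    # Deny paths: each way of being non-compliant gets its own case.
    def test_denies_role_without_tags(self):
        self.assertEqual(len(deny({"name": "legacy-admin"})), 1)

    def test_denies_whitespace_only_justification(self):
        self.assertEqual(len(deny({"name": "tmp", "tags": {"justification": "   "}})), 1)

    def test_message_names_the_role(self):
        self.assertIn("legacy-admin", deny({"name": "legacy-admin"})[0])
```

`python -m unittest` discovers the test case automatically, so it can run as a required check in the policy repository's pipeline like any other unit test.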