Remote State & Backends
In the previous lesson you learned what Terraform state is and why it exists. By default, Terraform writes that state to a file named terraform.tfstate on your local disk. That works for learning — and for absolutely nothing else. The moment a second engineer touches the same infrastructure, or a CI runner executes a plan, local state causes split-brain: two operators each hold a different view of reality, and the next apply can silently destroy resources the other person created. Remote backends solve this by storing state in a shared, durable location and — critically — by adding a locking mechanism so that only one operation can mutate state at a time.
This lesson covers the two backends you will encounter in nearly every production organisation (S3 with DynamoDB locking, and HTTP backends), how state locking works and what happens when it fails, and how to handle the sensitive data that Terraform inevitably writes into state.
Why Remote State Is Non-Negotiable in Teams
Local state breaks in three distinct ways that each take a painful incident to learn:
- No sharing: A second engineer cloning the repo has no state file. Their first terraform plan shows every resource as "will be created", even though that infrastructure already exists in the cloud.
- No locking: Two CI jobs running simultaneously can both read the same state, both compute a plan, and then both write back, with the second write silently overwriting the first. Resources get orphaned with no record in state.
- No durability: A laptop drive failure or a corrupted .git repo that someone force-pushed state into means the state is gone. Reconciling what Terraform thinks exists versus what the cloud actually has is a multi-day forensic exercise.
The S3 + DynamoDB Backend
This is the de facto standard for AWS-based infrastructure. State is stored as a JSON object in an S3 bucket (with versioning and server-side encryption enabled). Locking is provided by a DynamoDB table whose partition key is a string attribute named LockID. When a Terraform operation starts, it writes a lock item to DynamoDB; when it finishes (success or failure), it deletes the item. Any concurrent operation that tries to write the same lock item gets a DynamoDB conditional-check failure, and Terraform exits with an error rather than proceeding without the lock.
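The bucket and lock table described above can be provisioned with Terraform itself, typically from a small bootstrap configuration. The following is a minimal sketch, assuming AWS provider v4 or later; the bucket and table names are hypothetical placeholders, not values from this lesson.

```hcl
# Hypothetical names; S3 bucket names must be globally unique.
resource "aws_s3_bucket" "tf_state" {
  bucket = "example-org-terraform-state"
}

# Versioning keeps every previous state revision recoverable.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Default server-side encryption for every object written to the bucket.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Lock table: the partition key must be a string attribute named LockID.
resource "aws_dynamodb_table" "tf_lock" {
  name         = "example-org-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

On-demand billing is chosen here only because lock traffic is tiny; provisioned capacity would work equally well.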
With the bucket and table provisioned, configure the backend in your Terraform root module. Backend configuration lives in a backend block nested inside the terraform {} block and cannot reference variables or locals; the values must be static. This is intentional: Terraform needs to resolve the backend before it can evaluate anything else in the configuration.
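A backend block for this setup might look like the sketch below. The bucket, table, region, and key values are illustrative assumptions; substitute the names of the resources you actually provisioned.

```hcl
terraform {
  backend "s3" {
    # Hypothetical bucket and table names.
    bucket         = "example-org-terraform-state"
    # Object path of this configuration's state inside the bucket.
    key            = "platform/payments/prod/terraform.tfstate"
    region         = "us-east-1"
    # DynamoDB table used for state locking.
    dynamodb_table = "example-org-terraform-locks"
    # Request server-side encryption for the state object.
    encrypt        = true
  }
}
```

After adding or changing this block, terraform init must be run again before any other command will work.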
The key parameter is the S3 object path within the bucket. A flat naming scheme (prod.tfstate) becomes unmaintainable at scale. Use a hierarchy that mirrors your service tree: <team>/<service>/<environment>/terraform.tfstate. Many organisations also separate network, compute, and data tiers into distinct state files so a broken compute module cannot corrupt the network state. This is the "state isolation" principle and it is one of the most impactful structural decisions you will make on a Terraform project.
State Locking: How It Works and What to Do When It Breaks
Every Terraform command that could modify state (apply, destroy, state mv, import) acquires a lock before starting. plan also acquires the lock by default, so that the state cannot change underneath it while it computes the diff; you can skip this with -lock=false, but that is rarely wise. Purely read-only commands such as output and show do not lock. The lock record stored in DynamoDB contains the operation type, the machine hostname, the Terraform version, and a timestamp.
If you run terraform force-unlock while another apply is genuinely in progress, you have removed the only concurrency guard. The next operation will read stale state, compute an incorrect plan, and could delete or recreate resources that the first operation was in the middle of modifying. Always confirm the locking process is dead (check CI job status, ping the engineer) before unlocking. At minimum, wait 10 minutes past the lock timestamp.
HTTP Backends (GitLab, Terraform Cloud, Custom)
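The HTTP backend is a generic interface: Terraform fetches state with GET, writes it with POST, and purges it with DELETE against any HTTP server that implements the protocol. Locking is optional and uses separate lock and unlock addresses, with LOCK and UNLOCK as the default request methods (both are configurable). GitLab has a built-in HTTP state backend that stores named states per project, which makes it the default choice for organisations already on GitLab. HCP Terraform (formerly Terraform Cloud) also stores state over HTTPS, but it exposes a richer API and is configured through its own cloud block rather than the generic http backend.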
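As a sketch of what this looks like for GitLab-managed state, the block below assumes a self-hosted instance at the hypothetical host gitlab.example.com; <PROJECT_ID> and <STATE_NAME> are placeholders you must fill in. GitLab expects POST and DELETE for locking and unlocking, so the default methods are overridden.

```hcl
terraform {
  backend "http" {
    # Hypothetical host; <PROJECT_ID> and <STATE_NAME> are placeholders.
    address        = "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/<STATE_NAME>"
    lock_address   = "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/<STATE_NAME>/lock"
    unlock_address = "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/<STATE_NAME>/lock"

    # GitLab's state API uses POST/DELETE instead of the LOCK/UNLOCK defaults.
    lock_method   = "POST"
    unlock_method = "DELETE"

    # Seconds to wait between retries of a failed request.
    retry_wait_min = 5
  }
}
```

Credentials are deliberately left out of the file; the http backend can read them from the TF_HTTP_USERNAME and TF_HTTP_PASSWORD environment variables, which keeps tokens out of version control.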
Sensitive Data in State: The Production Reality
Terraform state is not a simple inventory. It stores all attributes of every managed resource — including the ones your cloud provider marks as sensitive. A freshly created RDS instance writes the master password in plaintext to state. An IAM access key writes the secret in plaintext. A TLS certificate resource writes the private key. This is not a Terraform bug; it is an unavoidable consequence of idempotent infrastructure management: Terraform must know what the current value is to decide whether it needs to change.
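To make the RDS case concrete, here is a minimal sketch with hypothetical names and sizes. Marking the variable as sensitive hides the password in CLI output, but the value assigned to password is still recorded in the state file.

```hcl
# Hypothetical variable; supply the value from a secrets manager or CI variable.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "example" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app_admin"
  # Redacted in plan/apply output, but written to state in plaintext.
  password            = var.db_password
  skip_final_snapshot = true
}
```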
- Encrypt state at rest: Always enable S3 SSE (AES256 or AWS KMS with a customer-managed key). For highly regulated environments, use a KMS CMK so you can audit and rotate the encryption key independently.
- Restrict access with IAM: Only the roles that run Terraform should hold permissions on the state bucket and lock table: s3:ListBucket, s3:GetObject, and s3:PutObject on the bucket, and dynamodb:GetItem, dynamodb:PutItem, and dynamodb:DeleteItem on the table. Engineers should NOT have direct S3 access to production state; they should interact via CI pipelines only.
- Never commit state to git: Add *.tfstate and *.tfstate.backup to .gitignore on every Terraform project. Use git-secrets or a pre-commit hook to block accidental commits.
- Use sensitive = true on outputs: Mark any output that contains a secret value as sensitive. Terraform will redact it from plan and apply CLI output, but the value is still stored in state (and in saved plan files). Sensitivity in Terraform is a UX guard, not a security boundary.
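The IAM restriction above can be expressed as a policy document attached only to the role that runs Terraform. This is a sketch under assumptions: the bucket name, state key, table name, and region are hypothetical, and the action list covers what the S3 backend needs for reading, writing, and locking state.

```hcl
data "aws_iam_policy_document" "tf_state_access" {
  # Listing is scoped to the bucket itself.
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::example-org-terraform-state"]
  }

  # Read/write is scoped to a single state object, not the whole bucket.
  statement {
    sid       = "ReadWriteStateObject"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::example-org-terraform-state/platform/payments/prod/terraform.tfstate"]
  }

  # Acquire, inspect, and release lock items.
  statement {
    sid = "ManageLockItems"
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = ["arn:aws:dynamodb:us-east-1:123456789012:table/example-org-terraform-locks"]
  }
}
```

Scoping the object statement to one key per pipeline means a compromised job for one service cannot read another service's state.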
For production workloads, go beyond plain encrypt = true (AES256) and use a KMS CMK: add kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/mrk-..." to your backend config. This gives you CloudTrail-audited key usage, key rotation, and the ability to revoke access to all historical state by disabling the key, a control that AES256 with AWS-managed keys cannot provide.
Migrating Between Backends
When you change the backend configuration (for example, moving from local to S3, or changing the S3 key), Terraform detects the change on the next terraform init and asks you to migrate existing state; run terraform init -migrate-state to copy it to the new location. Always run terraform plan immediately after migration to confirm the migrated state matches what the cloud actually has. A successful migration shows zero planned changes.
Summary
Remote state backends are the operational foundation of any team-based Terraform workflow. The S3 + DynamoDB combination provides object storage durability, versioned history, encryption at rest, and atomic locking, covering every failure mode of local state. HTTP-based backends (GitLab, Terraform Cloud) offer comparable guarantees, provided the server implements locking. Understanding state locking (how it is acquired, how to safely recover from stale locks, and why concurrent operations without locking cause data corruption) is knowledge that separates engineers who use Terraform from engineers who operate it safely at scale. Finally, treating state as a sensitive artifact (encrypting it, restricting access, never committing it to git) is not optional: your state file is a partial dump of every secret your infrastructure holds.