Custom Resources & Operators
Kubernetes ships with built-in resource types such as Deployments, Services, and ConfigMaps, but its real power is extensibility. Custom Resource Definitions (CRDs) let you teach Kubernetes about your own domain objects: a Database, a TLSCertificate, a KafkaTopic. The Operator pattern wraps a CRD with a control loop that acts on it, encoding the operational knowledge that would otherwise live in a runbook. This is how Datadog, Confluent, MongoDB, and many other data infrastructure companies deliver Kubernetes-native products.
What Is a Custom Resource Definition?
A CRD is itself a Kubernetes API object, stored in etcd and managed by the API server, that registers a new REST endpoint for a given group, version, and kind. Once you apply a CRD, you can create objects of that kind with kubectl apply -f just like Pods or Deployments, and the API server validates them against the OpenAPI v3 schema you declared.
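A CRD's group, version, and plural name determine the URL path the API server serves for it, in the same /apis/<group>/<version>/... layout used by built-in grouped resources. The Go helper below is a hypothetical illustration (resourcePath, example.com, and databases are made-up names, not part of any client library) of how that path is assembled for a namespaced resource:

```go
package main

import "fmt"

// resourcePath builds the REST path the API server serves for a namespaced
// custom resource. group, version, and plural correspond to the CRD's
// spec.group, spec.versions[].name, and spec.names.plural fields.
// An empty namespace or name yields the broader collection path.
func resourcePath(group, version, namespace, plural, name string) string {
	path := fmt.Sprintf("/apis/%s/%s", group, version)
	if namespace != "" {
		path += "/namespaces/" + namespace
	}
	path += "/" + plural
	if name != "" {
		path += "/" + name
	}
	return path
}

func main() {
	// Hypothetical CRD: group example.com, version v1alpha1, plural databases.
	fmt.Println(resourcePath("example.com", "v1alpha1", "prod", "databases", "orders-db"))
	// Without a namespace: the list across all namespaces.
	fmt.Println(resourcePath("example.com", "v1alpha1", "", "databases", ""))
}
```

You can query such a path directly with kubectl get --raw, for example kubectl get --raw /apis/example.com/v1alpha1/databases, assuming that hypothetical CRD is installed.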
The lifecycle is simple: install the CRD once (usually via a Helm chart or an operator install manifest), then create Custom Resources (CRs) — instances of that kind. The CRD defines the shape; the CR is the data.
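Define an OpenAPI v3 schema for spec (the apiextensions.k8s.io/v1 CRD API requires a structural schema for every version anyway). A schema gives you server-side validation, auto-complete in IDE plugins, and protection against fat-finger mistakes in production.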
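This validation runs inside the API server, driven entirely by the CRD's schema; no operator code is involved. Purely to illustrate what schema constraints such as enum, minLength, minimum, and maximum enforce, here is a hand-written Go equivalent for a hypothetical Database spec (the fields and limits are assumptions, not taken from any real CRD):

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec is a hypothetical spec for a Database custom resource.
// The comments show the schema constraint each check below stands in for.
type DatabaseSpec struct {
	Engine   string // schema: enum [postgres, mysql]
	Version  string // schema: minLength 1
	Replicas int    // schema: minimum 1, maximum 5
}

// validate mimics by hand the checks an OpenAPI v3 schema would make.
// It collects every violation, as the API server reports all field errors.
// errors.Join requires Go 1.20 or later.
func validate(s DatabaseSpec) error {
	var errs []error
	if s.Engine != "postgres" && s.Engine != "mysql" {
		errs = append(errs, fmt.Errorf("spec.engine: unsupported value %q", s.Engine))
	}
	if s.Version == "" {
		errs = append(errs, errors.New("spec.version: must not be empty"))
	}
	if s.Replicas < 1 || s.Replicas > 5 {
		errs = append(errs, fmt.Errorf("spec.replicas: %d out of range [1, 5]", s.Replicas))
	}
	return errors.Join(errs...) // nil when there are no violations
}

func main() {
	// A fat-fingered replica count is rejected before it ever reaches etcd.
	fmt.Println(validate(DatabaseSpec{Engine: "postgres", Version: "16", Replicas: 30}))
	fmt.Println(validate(DatabaseSpec{Engine: "postgres", Version: "16", Replicas: 3}))
}
```

In a real CRD you express these rules declaratively in the schema, and the API server rejects an invalid object with a 422 error instead of storing it.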
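Once the CRD is applied, creating a CR is identical to creating any other Kubernetes object: write a manifest with the CRD's apiVersion (<group>/<version>) and kind, fill in spec, and kubectl apply it.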
The Operator Pattern
A CRD alone is inert: the API server stores and validates the custom resources, but nothing acts on them. An Operator is a controller that watches CRs and drives the cluster toward the desired state declared in them. This is the same reconciliation loop that powers built-in controllers (the Deployment controller, the StatefulSet controller), but written by you or a vendor to encode domain knowledge.
Reconciliation: Observe, Diff, Act
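In controller-runtime, every controller implements a Reconcile function that receives a request containing only a namespace/name pair, not the event that triggered it. The function must therefore be idempotent: it may be called thousands of times for the same object, including when nothing has changed. The canonical flow is: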
- Fetch the CR from the API server (Get).
- Compute what the cluster should look like given spec.
- Compare against what actually exists (owned resources).
- Act: create, update, or delete owned resources.
- Update status on the CR to reflect real-world state.
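The five steps can be modeled without any cluster at all. In the Go sketch below, a toy in-memory Cluster stands in for the API server, and a hypothetical Database CR owns a set of "pods" represented only by name. Real controllers go through controller-runtime's client and informer cache rather than plain maps; this models the reconcile logic, not the library.

```go
package main

import "fmt"

// DatabaseSpec is the desired state declared by a hypothetical Database CR.
type DatabaseSpec struct{ Replicas int }

// DatabaseStatus is the observed state the operator reports back.
type DatabaseStatus struct{ ReadyReplicas int }

type Database struct {
	Name   string
	Spec   DatabaseSpec
	Status DatabaseStatus
}

// Cluster is a toy in-memory stand-in for the API server. Pods maps a pod
// name to the name of the CR that owns it (a stand-in for ownerReferences).
type Cluster struct {
	CRs  map[string]*Database
	Pods map[string]string
}

// Reconcile runs the five steps for one CR and returns how many
// create/delete actions it took.
func Reconcile(c *Cluster, name string) int {
	// 1. Fetch the CR. If it is gone, cleaning up owned pods is the
	// garbage collector's job, so there is nothing to do.
	db, ok := c.CRs[name]
	if !ok {
		return 0
	}

	// 2. Compute the desired owned pods from spec.
	desired := map[string]bool{}
	for i := 0; i < db.Spec.Replicas; i++ {
		desired[fmt.Sprintf("%s-%d", name, i)] = true
	}

	// 3 and 4. Compare with what exists and act on the difference.
	actions := 0
	for pod := range desired {
		if _, exists := c.Pods[pod]; !exists {
			c.Pods[pod] = name // create a missing pod
			actions++
		}
	}
	for pod, owner := range c.Pods {
		if owner == name && !desired[pod] {
			delete(c.Pods, pod) // delete a surplus pod (safe during range)
			actions++
		}
	}

	// 5. Write observed state back to status.
	ready := 0
	for _, owner := range c.Pods {
		if owner == name {
			ready++
		}
	}
	db.Status.ReadyReplicas = ready
	return actions
}

func main() {
	c := &Cluster{
		CRs:  map[string]*Database{"orders-db": {Name: "orders-db", Spec: DatabaseSpec{Replicas: 3}}},
		Pods: map[string]string{},
	}
	fmt.Println(Reconcile(c, "orders-db")) // 3: creates orders-db-0..2
	fmt.Println(Reconcile(c, "orders-db")) // 0: already converged
	c.CRs["orders-db"].Spec.Replicas = 1
	fmt.Println(Reconcile(c, "orders-db")) // 2: deletes the surplus pods
}
```

The second call returns 0 because Reconcile compares desired against actual state instead of reacting to a specific event, so repeated or duplicated calls are harmless. That is the practical meaning of idempotency here.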
Frameworks like controller-runtime (Go, used by kubebuilder and Operator SDK) handle the watch infrastructure, informer caches, work queues, and leader election. You write only the reconcile logic in the five steps above.
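Set an ownerReference on every resource the operator creates, pointing back to the CR; controller-runtime's controllerutil.SetControllerReference helper does this. When the CR is deleted, the Kubernetes garbage collector cascades deletion to all owned resources, so most operators need no custom finalizer logic. Finalizers are still required for resources outside the cluster, such as a cloud database or a DNS record, which the garbage collector cannot see.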
Building an Operator: Kubebuilder Quickstart
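Kubebuilder is the canonical Go-based scaffolding tool, a kubernetes-sigs project sponsored by SIG API Machinery. It scaffolds the project layout, API types, controller skeleton, and webhooks, and uses controller-gen to generate CRD manifests and RBAC rules from markers in your Go code. The basic workflow is kubebuilder init to create the project, kubebuilder create api to add a kind, and make manifests to regenerate the CRD YAML.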
When to Build an Operator (and When Not To)
Operators introduce real operational overhead: you own the code, the CRD versioning and conversion, and the upgrade path. A practical decision framework:
- Build an operator when you have a stateful component with complex lifecycle logic (failover, backup, schema migrations, rolling restarts) that cannot be expressed as a Helm chart or standard Kubernetes objects alone.
- Use an existing operator for well-known infrastructure such as databases, message queues, certificates, autoscaling, and observability stacks. Prometheus Operator, cert-manager, CloudNativePG, Strimzi (Kafka), and KEDA cover most common needs.
- Do not build an operator for stateless workloads. A Deployment + HPA + ConfigMap is almost always sufficient. The complexity budget is real.
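Once users have created objects at a version such as v1alpha1, you cannot simply remove that version: if the schemas differ you need a conversion webhook, plus a migration window in which stored objects are rewritten to the new storage version. Plan your API surface carefully before GA, and model it the same way you would a public REST API.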
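Conversion is needed because, when two served versions differ in schema, the API server must translate objects between them on reads and writes, and the translation has to be lossless in both directions. The sketch below uses two hypothetical versions of a Database type (the field names, the annotation key, and both conversion functions are illustrative assumptions) and the common trick of stashing a newer-only field in an annotation, so a round trip through the older version preserves it. In a real operator, logic like this would sit behind controller-runtime's conversion webhook support rather than run standalone.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical v1alpha1 shape: the replica count was called Replicas.
type DatabaseV1alpha1 struct {
	Annotations map[string]string
	Replicas    int
}

// Hypothetical v1 shape: Replicas renamed to Instances, StorageGiB added.
type DatabaseV1 struct {
	Annotations map[string]string
	Instances   int
	StorageGiB  int
}

// Hypothetical annotation key that carries the v1-only field through
// v1alpha1, so v1 -> v1alpha1 -> v1 loses no data.
const storageAnnotation = "example.com/storage-gib"

// toV1 upgrades an object, restoring any field stashed in annotations.
func toV1(in DatabaseV1alpha1) DatabaseV1 {
	out := DatabaseV1{Annotations: map[string]string{}, Instances: in.Replicas}
	for k, v := range in.Annotations {
		if k == storageAnnotation {
			out.StorageGiB, _ = strconv.Atoi(v) // restore the stashed field
			continue
		}
		out.Annotations[k] = v
	}
	return out
}

// toV1alpha1 downgrades an object, stashing fields v1alpha1 cannot express.
func toV1alpha1(in DatabaseV1) DatabaseV1alpha1 {
	out := DatabaseV1alpha1{Annotations: map[string]string{}, Replicas: in.Instances}
	for k, v := range in.Annotations {
		out.Annotations[k] = v
	}
	if in.StorageGiB != 0 {
		out.Annotations[storageAnnotation] = strconv.Itoa(in.StorageGiB)
	}
	return out
}

func main() {
	orig := DatabaseV1{Annotations: map[string]string{"team": "payments"}, Instances: 3, StorageGiB: 100}
	old := toV1alpha1(orig)
	back := toV1(old)
	fmt.Println(old.Replicas, old.Annotations[storageAnnotation])          // 3 100
	fmt.Println(back.Instances, back.StorageGiB, back.Annotations["team"]) // 3 100 payments
}
```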
Status Subresource and Conditions
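The status subresource (enabled via subresources: status: {} on each CRD version) makes status a separate API endpoint. RBAC can grant access to /status (for example a databases/status resource) independently of the main resource, updates to the main resource ignore changes to status, and updates to /status ignore everything except status, so spec edits and controller status writes cannot overwrite each other. Always model status using the Conditions pattern (an array of entries with type, status, reason, message, and lastTransitionTime, plus observedGeneration in metav1.Condition): this is what kubectl wait --for=condition=Ready checks and what SRE dashboards can query.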
Mastering CRDs and the operator pattern unlocks the full extensibility of Kubernetes. Many major cloud-native projects, including cert-manager, Argo CD, Istio, Tekton, and KEDA, are built around CRDs and controllers at their core. Understanding how they work internally makes you far more effective when debugging them in production and positions you to build your own when the problem genuinely demands it.