Advanced Kubernetes Operations

Custom Resources & Operators

22 min Lesson 3 of 30

Kubernetes ships with built-in resource types (Deployments, Services, ConfigMaps), but its real power is extensibility. Custom Resource Definitions (CRDs) let you teach Kubernetes about your own domain objects: a Database, a TLSCertificate, a KafkaTopic. The Operator pattern wraps a CRD with a control loop that acts on it, encoding the operational knowledge that would otherwise live in a runbook. This is how Datadog, Confluent, MongoDB, and many other infrastructure vendors deliver Kubernetes-native products.

What Is a Custom Resource Definition?

A CRD is itself a Kubernetes API object (stored in etcd, managed by the API server) that registers a new REST endpoint for a given group, version, and resource (the plural name), serving objects of the kind you declare. Once you apply a CRD, you can kubectl apply -f objects of that kind just like Pods or Deployments, and the API server validates them against the OpenAPI v3 schema you declared.
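To make the "new REST endpoint" concrete, here is a small sketch that assembles the URL path the API server serves for a custom resource, following the /apis/<group>/<version>/namespaces/<namespace>/<plural>/<name> layout, along with the <plural>.<group> name a CRD must carry. The helper names crdName and resourcePath are illustrative only, not part of any Kubernetes client library.

```go
package main

import (
	"fmt"
	"strings"
)

// crdName returns the metadata.name a CRD must use: <plural>.<group>.
// (Hypothetical helper for illustration.)
func crdName(plural, group string) string {
	return plural + "." + group
}

// resourcePath builds the REST path for a custom resource. Pass an empty
// namespace for cluster-scoped kinds (or to list across all namespaces) and
// an empty name to address the whole collection. (Hypothetical helper.)
func resourcePath(group, version, namespace, plural, name string) string {
	parts := []string{"/apis", group, version}
	if namespace != "" {
		parts = append(parts, "namespaces", namespace)
	}
	parts = append(parts, plural)
	if name != "" {
		parts = append(parts, name)
	}
	return strings.Join(parts, "/")
}

func main() {
	fmt.Println(crdName("databases", "dba.example.com"))
	fmt.Println(resourcePath("dba.example.com", "v1alpha1", "commerce", "databases", "orders-db"))
}
```

Against a live cluster you can request the same path directly with kubectl get --raw /apis/dba.example.com/v1alpha1/namespaces/commerce/databases/orders-db.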

The lifecycle is simple: install the CRD once (usually via a Helm chart or an operator install manifest), then create Custom Resources (CRs) — instances of that kind. The CRD defines the shape; the CR is the data.

```yaml
# A minimal CRD that introduces a "Database" kind in group dba.example.com
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.dba.example.com   # must be <plural>.<group>
spec:
  group: dba.example.com
  scope: Namespaced                 # or Cluster
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
  versions:
    - name: v1alpha1
      served: true
      storage: true                 # only ONE version is storage=true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [engine, storage]
              properties:
                engine:
                  type: string
                  enum: [postgres, mysql, mariadb]
                storage:
                  type: string      # e.g. "50Gi"
                replicas:
                  type: integer
                  minimum: 1
                  default: 1
            status:
              type: object
              properties:
                phase:
                  type: string
                endpoint:
                  type: string
                conditions:         # declared so they are not pruned; read by kubectl wait
                  type: array
                  items:
                    type: object
                    required: [type, status]
                    properties:
                      type: {type: string}
                      status: {type: string}
                      reason: {type: string}
                      message: {type: string}
                      lastTransitionTime: {type: string, format: date-time}
      subresources:
        status: {}                  # enables /status subresource (important!)
      additionalPrinterColumns:
        - name: Engine
          type: string
          jsonPath: .spec.engine
        - name: Phase
          type: string
          jsonPath: .status.phase
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
```
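Always define a precise OpenAPI v3 schema. apiextensions.k8s.io/v1 requires a structural schema for every version, but a loose one (for example, a spec marked x-kubernetes-preserve-unknown-fields: true) still lets the API server accept arbitrary content in spec. A tight schema gives you server-side validation, defaulting, pruning of unknown fields, auto-complete in IDE plugins, and protection against fat-finger mistakes in production.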
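As a rough model of what the schema above asks the API server to do, the sketch below applies the default for replicas and then enforces required, enum, and minimum. This is a hand-written illustration, not how the API server works internally (it evaluates the OpenAPI schema generically). It assumes an empty Go string stands in for an absent field, and its error messages are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec mirrors spec in the CRD schema. Replicas is a pointer so that
// "unset" (nil) can be told apart from an explicit 0. Simplification: an empty
// string stands in for an absent required field.
type DatabaseSpec struct {
	Engine   string
	Storage  string
	Replicas *int
}

var allowedEngines = map[string]bool{"postgres": true, "mysql": true, "mariadb": true}

// applyDefaults mimics `default: 1` on replicas.
func applyDefaults(s *DatabaseSpec) {
	if s.Replicas == nil {
		one := 1
		s.Replicas = &one
	}
}

// validate mimics `required`, `enum`, and `minimum: 1`, reporting every violation.
func validate(s DatabaseSpec) error {
	var errs []error
	if s.Engine == "" {
		errs = append(errs, errors.New("spec.engine: required"))
	} else if !allowedEngines[s.Engine] {
		errs = append(errs, fmt.Errorf("spec.engine: unsupported value %q", s.Engine))
	}
	if s.Storage == "" {
		errs = append(errs, errors.New("spec.storage: required"))
	}
	if s.Replicas != nil && *s.Replicas < 1 {
		errs = append(errs, fmt.Errorf("spec.replicas: %d is less than minimum 1", *s.Replicas))
	}
	return errors.Join(errs...) // nil when there are no violations
}

func main() {
	spec := DatabaseSpec{Engine: "postgres", Storage: "100Gi"}
	applyDefaults(&spec)
	fmt.Println("replicas:", *spec.Replicas, "err:", validate(spec))

	zero := 0
	bad := DatabaseSpec{Engine: "oracle", Replicas: &zero}
	fmt.Println(validate(bad)) // three violations, one per line
}
```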

Once the CRD is applied, creating a CR is identical to any other Kubernetes object:

```yaml
# Custom Resource instance: a "Database" object
apiVersion: dba.example.com/v1alpha1
kind: Database
metadata:
  name: orders-db
  namespace: commerce
spec:
  engine: postgres
  storage: 100Gi
  replicas: 3
```

```shell
# Inspect it like any native resource
kubectl get db -n commerce
kubectl describe db orders-db -n commerce
kubectl get db orders-db -n commerce -o jsonpath='{.status.phase}'
```
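kubectl get db shows the Engine, Phase, and Age columns declared under additionalPrinterColumns because the API server evaluates each column's jsonPath against the stored object. The sketch below resolves only plain dotted paths such as .spec.engine over a decoded JSON object. The lookup helper is hypothetical and ignores the filters, wildcards, and indexes that real JSONPath supports.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup resolves a plain dotted path such as ".spec.engine" against a
// decoded JSON object. Hypothetical helper: no filters, wildcards, or indexes.
// It returns ok=false when a field is missing or the final value is not a string.
func lookup(obj map[string]any, path string) (string, bool) {
	var cur any = obj
	for _, field := range strings.Split(strings.TrimPrefix(path, "."), ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		if cur, ok = m[field]; !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

func main() {
	// A decoded Database object; the status value is made up for illustration.
	db := map[string]any{
		"spec":   map[string]any{"engine": "postgres", "storage": "100Gi"},
		"status": map[string]any{"phase": "Provisioning"},
	}
	for _, col := range []string{".spec.engine", ".status.phase"} {
		v, ok := lookup(db, col)
		fmt.Printf("%-14s %q found=%v\n", col, v, ok)
	}
}
```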

The Operator Pattern

A CRD alone is inert — the API server stores it, but nothing acts on it. An Operator is a controller that watches CRs and drives the cluster toward the desired state declared in them. This is the same reconciliation loop that powers built-in controllers (the Deployment controller, the StatefulSet controller), but written by you or a vendor to encode domain knowledge.

[Figure: Operator reconciliation loop. Engineer → kubectl apply → API server (validates schema, stores CR in etcd, emits watch event) → operator (observe, diff desired vs actual, act, update status) → cluster resources (StatefulSet, Service, Secret, PVC, etc.), reconciled again on drift.]
Operator reconciliation loop: the operator watches CRs, computes a diff, acts on the cluster, and updates status, indefinitely.

Reconciliation: Observe, Diff, Act

Every operator implements a Reconcile function that receives a request (a namespace/name pair). The function must be idempotent — it may be called thousands of times. The canonical flow is:

  1. Fetch the CR from the API server (Get).
  2. Compute what the cluster should look like given spec.
  3. Compare against what actually exists (owned resources).
  4. Act — create, update, or delete owned resources.
  5. Update status on the CR to reflect real-world state.
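The five steps above can be sketched without any framework. In this model, a map-backed fakeCluster stands in for the API server and a bare StatefulSet struct stands in for the real child object. Both are hypothetical simplifications. What matters is the shape: observe, compare, act only on drift, then record status, so repeated calls are harmless.

```go
package main

import "fmt"

// Database is a stripped-down stand-in for the CR (hypothetical).
type Database struct {
	Name     string
	Replicas int    // spec.replicas (desired)
	Phase    string // status.phase (observed)
}

// StatefulSet is a stand-in for the child object the operator owns (hypothetical).
type StatefulSet struct {
	Replicas      int // desired pod count, written by the operator
	ReadyReplicas int // pods actually ready, reported by Kubernetes
}

// fakeCluster is an in-memory stand-in for the API server, keyed by "namespace/name".
type fakeCluster struct {
	databases    map[string]*Database
	statefulSets map[string]*StatefulSet
	writes       int // counts mutating calls, to make idempotency visible
}

// Reconcile runs one observe/diff/act/update-status pass for a single key.
func (c *fakeCluster) Reconcile(key string) {
	// 1. Observe: fetch the CR. If it is gone, there is nothing to reconcile.
	db, ok := c.databases[key]
	if !ok {
		return
	}
	// 2. Desired state computed from spec.
	desired := db.Replicas
	// 3. Actual state: the owned StatefulSet, if any.
	sts, exists := c.statefulSets[key]
	// 4. Act only on drift, so a repeated call changes nothing.
	switch {
	case !exists:
		sts = &StatefulSet{Replicas: desired}
		c.statefulSets[key] = sts
		c.writes++
	case sts.Replicas != desired:
		sts.Replicas = desired
		c.writes++
	}
	// 5. Update status from what was actually observed.
	phase := "Provisioning"
	if sts.ReadyReplicas == sts.Replicas {
		phase = "Ready"
	}
	if db.Phase != phase {
		db.Phase = phase
		c.writes++
	}
}

func main() {
	c := &fakeCluster{
		databases:    map[string]*Database{"commerce/orders-db": {Name: "orders-db", Replicas: 3}},
		statefulSets: map[string]*StatefulSet{},
	}
	for i := 0; i < 3; i++ {
		c.Reconcile("commerce/orders-db") // safe to repeat: only the first pass writes
	}
	fmt.Println("writes:", c.writes, "phase:", c.databases["commerce/orders-db"].Phase)

	// Simulate the pods becoming ready; the next pass flips status.
	c.statefulSets["commerce/orders-db"].ReadyReplicas = 3
	c.Reconcile("commerce/orders-db")
	fmt.Println("phase:", c.databases["commerce/orders-db"].Phase)
}
```

A real controller-runtime Reconcile also returns a ctrl.Result and an error so the framework can requeue the key, and it reads the CR from an informer cache rather than a plain map.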

Frameworks like controller-runtime (the Go library underneath kubebuilder and the Operator SDK) handle the watch infrastructure, work queues, and leader election, so you write only steps 1 to 5.
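One thing the framework's work queue buys you is deduplication: several watch events for the same object collapse into a single reconcile, and an object is never handed to two workers at once. The single-goroutine model below imitates that behavior of client-go's workqueue, which controller-runtime builds on. The dedupQueue type is hypothetical; the real queue is concurrency-safe and adds rate-limited retries.

```go
package main

import "fmt"

// dedupQueue is a hypothetical, single-goroutine model of a controller work
// queue: a key added several times before it is processed is processed once,
// and a key is never handed out while it is already being processed.
type dedupQueue struct {
	queue      []string
	dirty      map[string]bool // keys waiting to be processed
	processing map[string]bool // keys currently being reconciled
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add enqueues key unless it is already waiting.
func (q *dedupQueue) Add(key string) {
	if q.dirty[key] {
		return // already queued: collapse the duplicate event
	}
	q.dirty[key] = true
	if q.processing[key] {
		return // in flight: Done will re-queue it
	}
	q.queue = append(q.queue, key)
}

// Get pops the next key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.queue) == 0 {
		return "", false
	}
	key, q.queue = q.queue[0], q.queue[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key, true
}

// Done marks key as finished and re-queues it if it changed meanwhile.
func (q *dedupQueue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.queue = append(q.queue, key)
	}
}

func main() {
	q := newDedupQueue()
	// Four watch events (three for orders-db) arrive before the worker runs.
	for _, key := range []string{"commerce/orders-db", "commerce/orders-db", "commerce/users-db", "commerce/orders-db"} {
		q.Add(key)
	}
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println("reconcile", key) // each key is reconciled once
		q.Done(key)
	}
}
```

This is also why Reconcile receives only a namespace/name key rather than the event itself: by the time it runs, several events may have been merged.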

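Use owner references for garbage collection. When the operator creates child resources (StatefulSets, Services, Secrets), add an ownerReference on each child that points at the CR. When the CR is deleted, the Kubernetes garbage collector cascades the deletion to all owned resources, so most operators need no custom cleanup logic; finalizers are typically needed only for cleanup outside the cluster. Note that a namespaced owner can only own objects in its own namespace.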
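In controller-runtime, the helper controllerutil.SetControllerReference adds that ownerReference for you. The effect on deletion can be modeled as below: deleting an owner removes, transitively, every dependent whose owners are now all gone. The store type and UIDs are hypothetical, and the real garbage collector runs asynchronously in kube-controller-manager with background, foreground, or orphan propagation policies.

```go
package main

import "fmt"

// object is a minimal, hypothetical stand-in for Kubernetes object metadata.
type object struct {
	Kind, Name, UID string
	OwnerUIDs       []string // UIDs taken from metadata.ownerReferences
}

type store map[string]*object // keyed by UID

// ownedBy reports whether obj lists ownerUID among its owners.
func ownedBy(obj *object, ownerUID string) bool {
	for _, u := range obj.OwnerUIDs {
		if u == ownerUID {
			return true
		}
	}
	return false
}

// hasLiveOwner reports whether any of obj's owners still exists.
func (s store) hasLiveOwner(obj *object) bool {
	for _, u := range obj.OwnerUIDs {
		if _, ok := s[u]; ok {
			return true
		}
	}
	return false
}

// deleteWithCascade removes uid and then, recursively, every dependent whose
// owners are now all gone.
func (s store) deleteWithCascade(uid string) {
	if _, ok := s[uid]; !ok {
		return
	}
	delete(s, uid)
	for childUID, obj := range s {
		if ownedBy(obj, uid) && !s.hasLiveOwner(obj) {
			s.deleteWithCascade(childUID)
		}
	}
}

func main() {
	s := store{
		"db-1":  {Kind: "Database", Name: "orders-db", UID: "db-1"},
		"db-2":  {Kind: "Database", Name: "users-db", UID: "db-2"},
		"sts-1": {Kind: "StatefulSet", Name: "orders-db", UID: "sts-1", OwnerUIDs: []string{"db-1"}},
		"pod-0": {Kind: "Pod", Name: "orders-db-0", UID: "pod-0", OwnerUIDs: []string{"sts-1"}},
		"sec-1": {Kind: "Secret", Name: "shared-creds", UID: "sec-1", OwnerUIDs: []string{"db-1", "db-2"}},
	}
	s.deleteWithCascade("db-1")
	for _, uid := range []string{"db-1", "sts-1", "pod-0", "sec-1", "db-2"} {
		_, alive := s[uid]
		fmt.Printf("%-6s alive=%v\n", uid, alive)
	}
}
```

The StatefulSet and its Pod disappear with the CR, while the Secret survives because a second owner still exists.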

Building an Operator: Kubebuilder Quickstart

Kubebuilder is the canonical Go-based scaffolding tool, maintained by the Kubernetes SIG API Machinery. It generates the CRD manifest, controller skeleton, webhook scaffolding, and RBAC markers from a single CLI workflow.

```shell
# Install the kubebuilder CLI (requires Go 1.21+)
curl -L -o kubebuilder \
  "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

# Bootstrap a new operator project.
# kubebuilder forms the API group as <group>.<domain>, so --domain example.com
# combined with --group dba below yields the dba.example.com group.
mkdir database-operator && cd database-operator
kubebuilder init --domain example.com --repo github.com/acme/database-operator

# Scaffold an API (generates the CRD Go types + controller)
kubebuilder create api \
  --group dba \
  --version v1alpha1 \
  --kind Database \
  --resource \
  --controller

# Generate CRD manifests from Go type annotations
make manifests

# Install the CRDs, then run the controller locally against the cluster in your kubeconfig
make install
make run

# Deploy to the cluster (build and push the image, then install CRDs and the controller)
make docker-build docker-push IMG=ghcr.io/acme/database-operator:v0.1.0
make deploy IMG=ghcr.io/acme/database-operator:v0.1.0
```
Operator SDK vs Kubebuilder vs Helm-based operators. The Operator SDK builds on kubebuilder (same result, extra tooling). Helm-based operators wrap a Helm chart — they are quick to build but cannot express complex reconciliation logic. Python/Kopf operators are easy to prototype but rarely reach production at scale. For anything stateful or operationally complex, use Go + kubebuilder.

When to Build an Operator (and When Not To)

Operators introduce real operational overhead: you own the code, the CRD versioning and conversion, and the upgrade path. A practical decision framework:

  • Build an operator when you have a stateful component with complex lifecycle logic (failover, backup, schema migrations, rolling restarts) that cannot be expressed as a Helm chart or standard Kubernetes objects alone.
  • Use an existing operator for well-known databases, message queues, observability stacks, and platform concerns: Prometheus Operator (monitoring), cert-manager (TLS certificates), CloudNativePG (PostgreSQL), Strimzi (Kafka), and KEDA (event-driven autoscaling) cover most common needs.
  • Do not build an operator for stateless workloads. A Deployment + HPA + ConfigMap is almost always sufficient. The complexity budget is real.
CRD versioning is permanent. Once users deploy CRs of version v1alpha1, you cannot simply drop that version: every stored object must first be migrated to the new storage version, which typically requires a conversion webhook (when the schemas differ) and a migration window. Plan your API surface carefully before GA, and model it the same way you would a public REST API.
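Conversion must be lossless in both directions, which gets hard once a new version adds fields the old one cannot represent. The sketch below assumes a hypothetical v1 that nests storage settings and adds a storage class, and parks that class in a hypothetical annotation when converting down to v1alpha1 so nothing is lost on the way back. Kubebuilder expresses the same idea through the Hub and Convertible interfaces in controller-runtime's conversion package; plain functions keep this example standalone.

```go
package main

import "fmt"

// DatabaseV1alpha1 is the shape existing users already have stored.
type DatabaseV1alpha1 struct {
	Annotations map[string]string // stands in for metadata.annotations
	Engine      string
	Storage     string // e.g. "50Gi"
	Replicas    int32
}

// StorageV1 and DatabaseV1 describe a HYPOTHETICAL future v1 API.
type StorageV1 struct {
	Size  string
	Class string // new in v1: no v1alpha1 field can hold it
}

type DatabaseV1 struct {
	Engine   string
	Replicas int32
	Storage  StorageV1
}

// Hypothetical annotation used to carry v1-only data through v1alpha1.
const storageClassAnnotation = "dba.example.com/storage-class"

// toV1 upgrades a v1alpha1 object, restoring any parked v1-only data.
// Reading from a nil map is safe in Go and yields "".
func toV1(in DatabaseV1alpha1) DatabaseV1 {
	return DatabaseV1{
		Engine:   in.Engine,
		Replicas: in.Replicas,
		Storage:  StorageV1{Size: in.Storage, Class: in.Annotations[storageClassAnnotation]},
	}
}

// toV1alpha1 downgrades a v1 object, parking fields v1alpha1 cannot express.
func toV1alpha1(in DatabaseV1) DatabaseV1alpha1 {
	out := DatabaseV1alpha1{Engine: in.Engine, Storage: in.Storage.Size, Replicas: in.Replicas}
	if in.Storage.Class != "" {
		out.Annotations = map[string]string{storageClassAnnotation: in.Storage.Class}
	}
	return out
}

func main() {
	v1 := DatabaseV1{Engine: "postgres", Replicas: 3, Storage: StorageV1{Size: "100Gi", Class: "fast-ssd"}}
	old := toV1alpha1(v1)
	fmt.Println(old.Storage, old.Annotations[storageClassAnnotation])
	fmt.Println(toV1(old) == v1) // lossless round trip
}
```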

Status Subresource and Conditions

The status subresource (enabled via subresources: status: {} in the CRD) makes status a separate API endpoint, /status, with its own RBAC resource (databases/status). Writes to the main resource ignore changes to status, and writes to /status ignore changes to everything except status, so a spec update cannot accidentally overwrite what the controller reported, and you can let the controller write status without granting that right to users. Always model status using the Conditions pattern (an array of entries with type, status, reason, message, and lastTransitionTime): this is what kubectl wait --for=condition=Ready reads and what SRE dashboards can query.

```shell
# Wait for an operator-managed resource to become Ready
kubectl wait db/orders-db \
  --for=condition=Ready \
  --timeout=300s \
  -n commerce

# Check conditions directly
kubectl get db orders-db -n commerce \
  -o jsonpath='{.status.conditions[*].type}{"\n"}{.status.conditions[*].status}'

# View operator controller logs
kubectl logs -n database-operator-system \
  -l control-plane=controller-manager \
  --tail=100 -f
```

Mastering CRDs and the operator pattern unlocks the full extensibility of Kubernetes. Major cloud-native projects such as cert-manager, Argo CD, Istio, Tekton, and KEDA are built around CRDs and the controllers that reconcile them. Understanding how they work internally makes you far more effective when debugging them in production and positions you to build your own when the problem genuinely demands it.