Helm & Kubernetes Packaging

Hooks & Tests

18 min Lesson 7 of 28

A Helm release is more than a batch of applied YAML — it is a lifecycle event. Database schemas must be migrated before new pods start. Secrets must be seeded into a vault before the application boots. Smoke tests must pass after an upgrade before traffic is re-routed. Helm exposes all of these integration points through two mechanisms: lifecycle hooks and helm test. Understanding both is what separates teams that confidently ship to production from teams that script around Helm's edges with brittle shell wrappers.

How Hooks Work

A hook is a standard Kubernetes manifest — most often a Job — that carries the annotation helm.sh/hook. Helm intercepts the manifest before the normal apply phase, executes it at the annotated lifecycle point, waits for it to complete, and then continues. The hook manifest is not tracked as a regular release resource; it has its own deletion policy controlled by helm.sh/hook-delete-policy.
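
As a minimal sketch of what this looks like in a chart, the manifest below turns an ordinary Job into a post-install hook purely through annotations. The file path, the Job name, and the echo command are hypothetical placeholders chosen for illustration.

```yaml
# templates/hooks/hello-hook.yaml (hypothetical file name)
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-hello-hook
  annotations:
    # This annotation is what makes Helm treat the Job as a hook
    "helm.sh/hook": post-install
    # Remove the Job once it has completed successfully
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: hello
          image: busybox:1.36
          # Placeholder command; a real hook would do useful work here
          command: ["sh", "-c", "echo post-install hook ran for {{ .Release.Name }}"]
```

Without the helm.sh/hook annotation, the same Job would be a regular release resource that Helm applies together with everything else.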

The most important hooks in day-to-day production use:

  • pre-install: runs after templates are rendered, before any release resources are created. Use for: initialising a database schema, seeding an admin user, creating a namespace-scoped RBAC binding that the app depends on.
  • post-install: runs after all release resources have been created in the cluster (and, if --wait is set, after they are ready). Use for: sending a Slack notification, populating a cache with warm data, registering the service in an external service registry.
  • pre-upgrade: runs before the upgrade is applied. Use for: running database migrations (the most common production use), taking a pre-upgrade snapshot of a stateful resource, draining a queue before the consumer pod is replaced.
  • post-upgrade: runs after all release resources have been upgraded. Use for: running smoke tests inline, invalidating a CDN cache, triggering a downstream pipeline.
  • pre-delete: runs on a delete request, before any release resources are removed. Use for: gracefully terminating long-running workers, archiving audit logs, revoking service credentials.
  • pre-rollback / post-rollback: mirror the upgrade hooks. Rarely needed, but valuable for stateful apps that require schema downgrade scripts.

Figure: Helm hook execution order for helm install / helm upgrade. The pre-install / pre-upgrade hooks run first, then Helm applies the release resources, waits for readiness (with --wait), and runs the post-install / post-upgrade hooks. For a pre-upgrade Job such as db-migrate, exit code 0 unblocks the release, while a non-zero exit fails the release; Helm rolls back automatically only when --atomic is set. helm test runs post-deploy validation Pods (annotation helm.sh/hook: test).

Writing a pre-upgrade Migration Hook

The most production-critical hook pattern is a database migration Job that runs before new application pods are scheduled. Here is a production-grade template:

    # templates/hooks/db-migrate.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: {{ include "myapp.fullname" . }}-db-migrate-{{ .Release.Revision }}
      labels:
        {{- include "myapp.labels" . | nindent 4 }}
      annotations:
        # Run before the upgrade applies new Deployment pods
        "helm.sh/hook": pre-upgrade,pre-install
        # Weight controls ordering when multiple hooks exist; lower runs first
        "helm.sh/hook-weight": "-5"
        # Delete the Job after it succeeds (keeps the namespace clean)
        "helm.sh/hook-delete-policy": hook-succeeded
    spec:
      # Retry up to 3 times before failing the release
      backoffLimit: 3
      activeDeadlineSeconds: 300
      template:
        metadata:
          name: {{ include "myapp.fullname" . }}-db-migrate
        spec:
          restartPolicy: Never
          # NOTE: on pre-install, regular chart resources do not exist yet.
          # The ServiceAccount and Secret used here must already be present
          # or be created by hooks with a lower weight.
          serviceAccountName: {{ include "myapp.serviceAccountName" . }}
          containers:
            - name: migrate
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
              command: ["python", "manage.py", "migrate", "--noinput"]
              env:
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: {{ include "myapp.fullname" . }}-db-secret
                      key: url
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 500m
                  memory: 512Mi

Why append .Release.Revision to the Job name? Job names must be unique within a namespace, and a Job's Pod template cannot be changed once the Job exists. With the hook-succeeded policy a failed Job is kept for debugging, so a fixed name would make the next upgrade attempt fail because the earlier Job still exists. Every upgrade attempt, including a failed one, gets a new revision number, so the suffix keeps the names distinct. Include the revision in hook Job names, or use a fixed name together with the before-hook-creation delete policy.

Hook Weights and Ordering

When a release has multiple hooks of the same type, Helm sorts them by helm.sh/hook-weight in ascending order, so the lowest weight runs first. Weights may be negative, default to 0, and must be written as quoted strings because annotation values are strings. Hooks with equal weight are ordered by resource kind and then by name. Helm runs hooks one at a time and waits for each to finish before starting the next. This matters for sequences such as: (1) create a temporary admin DB user, (2) run migrations, (3) revoke the admin user. Model these as three separate hooks with weights -10, 0, and 10.
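
The three-step sequence could be sketched as three Jobs in one template file, as below. This is an illustration under stated assumptions: the file name, the Job names, and the two /scripts/*.sh commands are hypothetical, and the image is assumed to contain them.

```yaml
# templates/hooks/migrate-sequence.yaml (hypothetical file; scripts are placeholders)
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-grant-admin-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-10"   # step 1: lowest weight, runs first
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: grant-admin
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["/scripts/create-temp-admin.sh"]   # hypothetical script
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-migrate-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "0"     # step 2: runs after step 1 has completed
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["python", "manage.py", "migrate", "--noinput"]
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-revoke-admin-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "10"    # step 3: highest weight, runs last
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: revoke-admin
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["/scripts/revoke-temp-admin.sh"]   # hypothetical script
```

One design consequence to keep in mind: Helm stops at the first failing hook, so if the migration fails, the revoke step never runs and the temporary user needs separate cleanup.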

Hook Delete Policies

Three values control when Helm deletes a hook resource after it runs:

  • hook-succeeded: delete the resource only after the hook completes successfully. Failed Jobs are preserved so you can inspect logs. This is the recommended setting.
  • hook-failed: delete the resource only if the hook fails. Use when hook resources contain sensitive data (migration credentials) that must not persist after a failure. To remove the resource regardless of outcome, list both values: hook-succeeded,hook-failed.
  • before-hook-creation: delete any previous hook resource of the same name before creating a new one. This solves the duplicate-name problem for teams that prefer a fixed Job name.
Production pitfall — missing delete policy: If you omit helm.sh/hook-delete-policy, Helm uses before-hook-creation by default. That means a failed migration Job is deleted silently on the next deploy attempt, taking its logs with it. Always set hook-succeeded explicitly so failed Jobs stick around for debugging, and drain logs to your centralised log aggregator (Loki, CloudWatch, Datadog) so they survive Pod deletion.
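
For teams that prefer a fixed Job name, the policies can be combined as a comma-separated list. The fragment below is a sketch of only the metadata section of a migration Job, reusing the myapp.fullname helper from the earlier template; the rest of the Job spec is assumed unchanged.

```yaml
metadata:
  # Fixed name: no revision suffix needed with before-hook-creation
  name: {{ include "myapp.fullname" . }}-db-migrate
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"
    # before-hook-creation: remove a leftover Job of the same name before creating the new one
    # hook-succeeded: remove the Job after a successful run
    # A failed Job therefore stays in place until the next deploy attempt replaces it
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
```

The trade-off is that a failed Job and its logs survive only until the next attempt, which is why shipping logs to a central aggregator still matters with this combination.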

helm test — Post-Deploy Smoke Tests

helm test is a first-class command that runs a set of special hook Pods annotated with "helm.sh/hook": test. These are not unit tests — they are release validation tests that run against the live deployed release to verify it is actually working. Think of them as automated smoke tests that anyone can trigger: helm test <release-name>.

A canonical test Pod checks the HTTP health endpoint of the deployed service:

    # templates/tests/test-connection.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: {{ include "myapp.fullname" . }}-test-connection
      labels:
        {{- include "myapp.labels" . | nindent 4 }}
      annotations:
        "helm.sh/hook": test
        "helm.sh/hook-delete-policy": hook-succeeded
    spec:
      restartPolicy: Never
      containers:
        - name: wget
          image: busybox:1.36
          command:
            - sh
            - -c
            - |
              set -e
              echo "Testing HTTP health endpoint..."
              wget -qO- http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/healthz
              echo "Testing API liveness..."
              response=$(wget -qO- http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/v1/status)
              echo "$response" | grep -q '"status":"ok"'
              echo "All checks passed."
          resources:
            requests:
              cpu: 50m
              memory: 32Mi

    # Run tests against a deployed release
    helm test myapp-prod -n production

    # Print the logs of the test pods once the tests have completed
    helm test myapp-prod -n production --logs

    # Example output:
    # NAME: myapp-prod
    # LAST DEPLOYED: Wed Jun 11 14:22:31 2025
    # STATUS: deployed
    # TEST SUITE: myapp-prod-test-connection
    # Last Started: Wed Jun 11 14:25:00 2025
    # Last Completed: Wed Jun 11 14:25:03 2025
    # Phase: Succeeded

Testing Patterns at Scale

A production-grade test suite in a Helm chart covers more than a single HTTP check. Organise multiple test Pods by concern, each annotated with "helm.sh/hook": test:

  • Connectivity test: can a Pod in the namespace resolve DNS and reach the database? Run psql -c "\l" or redis-cli PING from a test Pod whose image ships the matching client.
  • Auth test: does the API return 401 on unauthenticated requests and 200 with a valid token? A curl-based Pod with an injected test credential (from a Secret) exercises this path.
  • Data integrity test: after a migration, does querying a known row return the expected schema? A migration hook that writes a _schema_version sentinel row, paired with a test Pod that reads it, verifies on every deploy that the schema and the application version are aligned.
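
The auth test could look roughly like the Pod below. Treat it as a sketch: the /api/v1/me endpoint, the -test-token Secret and its token key, the Bearer scheme, and the curlimages/curl image tag are all assumptions made for illustration and need to be replaced with whatever your service actually exposes.

```yaml
# templates/tests/test-auth.yaml (hypothetical file name)
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-auth
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:8.8.0   # assumed tag; pin to one you have verified
      env:
        - name: TEST_TOKEN
          valueFrom:
            secretKeyRef:
              name: {{ include "myapp.fullname" . }}-test-token   # hypothetical Secret
              key: token
      command:
        - sh
        - -c
        - |
          set -eu
          # Hypothetical endpoint that requires authentication
          url="http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/v1/me"

          # Without credentials the API should reject the request
          code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
          [ "$code" = "401" ] || { echo "expected 401 without token, got $code"; exit 1; }

          # With the test token the same request should succeed
          code=$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $TEST_TOKEN" "$url")
          [ "$code" = "200" ] || { echo "expected 200 with token, got $code"; exit 1; }

          echo "Auth checks passed."
```

Because curl is run without -f, an HTTP 401 is not treated as a command failure; the script compares status codes explicitly, and set -e still aborts the test if the service cannot be reached at all.
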
Big-tech practice: A pipeline shape widely used in large engineering organisations is: build image → push to registry → helm upgrade --install --wait --atomic to a staging namespace → helm test (typically 10–30 seconds) → gate production promotion on the test exit code. If helm test fails, the pipeline aborts before the production upgrade even starts. The test stage adds well under a minute per deploy and catches regressions before they reach production.

CRD Hooks and the Special pre-install Pattern

One of the most common reasons charts fail on fresh installs is that a Custom Resource Definition (CRD) is applied after the resources that depend on it. Helm has a dedicated crds/ directory that auto-installs CRDs before everything else, but when you are composing third-party CRDs as a dependency, a pre-install hook Job can apply the CRD manifest explicitly and wait for the API server to register it before proceeding:

    # templates/hooks/install-crds.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: {{ include "myapp.fullname" . }}-install-crds
      annotations:
        "helm.sh/hook": pre-install
        "helm.sh/hook-weight": "-10"
        "helm.sh/hook-delete-policy": hook-succeeded
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: Never
          serviceAccountName: crd-installer-sa  # needs cluster-scoped RBAC
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.30
              command:
                - sh
                - -c
                - |
                  # Stop at the first failing command so the Job fails visibly
                  set -e
                  kubectl apply -f https://raw.githubusercontent.com/org/repo/main/config/crd/bases/myoperator.io_myresource.yaml
                  kubectl wait --for=condition=Established crd/myresources.myoperator.io --timeout=60s
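
The Job above depends on crd-installer-sa having cluster-scoped permissions before the Job runs. One way to arrange that, sketched here under assumptions, is to create the ServiceAccount and its RBAC as pre-install hooks with a lower weight than the Job. The file name and the -crd-installer resource names are hypothetical, and the verb list is the minimum that kubectl apply and kubectl wait are assumed to need.

```yaml
# templates/hooks/crd-installer-rbac.yaml (hypothetical file name)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: crd-installer-sa
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-20"   # lower than the Job's -10, so it is created first
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "myapp.fullname" . }}-crd-installer
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-20"
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    # apply reads, creates, and patches; wait lists and watches
    verbs: ["get", "list", "watch", "create", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "myapp.fullname" . }}-crd-installer
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-20"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "myapp.fullname" . }}-crd-installer
subjects:
  - kind: ServiceAccount
    name: crd-installer-sa
    namespace: {{ .Release.Namespace }}
```

Hook resources are not managed as part of the release, so these objects remain in the cluster after the install and are not removed by helm uninstall. Plan a manual or scripted cleanup if the lingering cluster-wide permission is a concern.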

Debugging Hooks

When a hook fails, Helm marks the release as failed and (with --atomic) triggers a rollback. The hook Job remains in the namespace (if you used hook-succeeded as the delete policy). Inspect it like any other failed Job:

    # List all hook Jobs in a namespace
    kubectl get jobs -n production -l "helm.sh/chart=myapp-1.4.2"

    # Get the full logs from the failed migration pod
    # (with a label selector, kubectl prints only the last 10 lines unless --tail is set)
    kubectl logs -n production -l "job-name=myapp-prod-db-migrate-7" --tail=-1

    # Describe the Job to see event history and exit codes
    kubectl describe job myapp-prod-db-migrate-7 -n production

    # If you need to re-run the hook without a full upgrade:
    # 1. Delete the failed Job manually
    kubectl delete job myapp-prod-db-migrate-7 -n production

    # 2. Run helm upgrade again; the hook will re-execute
    helm upgrade myapp-prod ./mychart -n production --reuse-values

Hooks and tests transform a Helm release from a static manifest apply into a choreographed, self-validating deployment pipeline. The next lesson — lesson 8 — covers publishing and versioning your charts so that other teams can consume them reliably from a registry.