Helm & Kubernetes Packaging

Hooks & Tests

18 min Lesson 7 of 28

A Helm release is more than a batch of applied YAML — it is a lifecycle event. Database schemas must be migrated before new pods start. Secrets must be seeded into a vault before the application boots. Smoke tests must pass after an upgrade before traffic is re-routed. Helm exposes all of these integration points through two mechanisms: lifecycle hooks and helm test. Understanding both is what separates teams that confidently ship to production from teams that script around Helm's edges with brittle shell wrappers.

How Hooks Work

A hook is a standard Kubernetes manifest, most often a Job, that carries the annotation helm.sh/hook. Helm separates hook manifests from the regular release resources, creates each one at the annotated lifecycle point, waits for it to finish (for a Job or Pod, until it runs to completion), and then continues. A hook is not managed as part of the release, so helm uninstall does not remove it; cleanup is governed by helm.sh/hook-delete-policy.

The most important hooks in day-to-day production use:

pre-install: runs after templates are rendered, before any release resources are created. Use for: initialising a database schema, seeding an admin user, creating a namespace-scoped RBAC binding that the app depends on.
post-install: runs after all release resources have been loaded into Kubernetes (and, if --wait is set, after they are ready). Use for: sending a Slack notification, populating a cache with warm data, registering the service in an external service registry.
pre-upgrade: runs after templates are rendered, before any resources are updated. Use for: running database migrations (the most common production use), taking a pre-upgrade snapshot of a stateful resource, draining a queue before the consumer pod is replaced.
post-upgrade: runs after all resources have been upgraded. Use for: running smoke tests inline, invalidating a CDN cache, triggering a downstream pipeline.
pre-delete: runs before any release resources are deleted. Use for: gracefully terminating long-running workers, archiving audit logs, revoking service credentials.
pre-rollback / post-rollback: the rollback counterparts of the upgrade hooks. Rarely needed, but valuable for stateful apps that require schema downgrade scripts.

Figure: Helm hook execution order for install/upgrade, showing how a pre-upgrade Job gates the release and how a hook failure fails the release (and triggers an automatic rollback when --atomic is set).

Writing a pre-upgrade Migration Hook

The most production-critical hook pattern is a database migration Job that runs before new application pods are scheduled. Here is a production-grade template:

# templates/hooks/db-migrate.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-migrate-{{ .Release.Revision }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    # Run before the upgrade applies new Deployment pods
    "helm.sh/hook": pre-upgrade,pre-install
    # Weight controls ordering when multiple hooks exist; lower runs first
    "helm.sh/hook-weight": "-5"
    # Delete the Job after it succeeds (keeps the namespace clean)
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  # Retry up to 3 times before failing the release
  backoffLimit: 3
  activeDeadlineSeconds: 300
  template:
    metadata:
      name: {{ include "myapp.fullname" . }}-db-migrate
    spec:
      restartPolicy: Never
      serviceAccountName: {{ include "myapp.serviceAccountName" . }}
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["python", "manage.py", "migrate", "--noinput"]
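          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  # This Secret (and the ServiceAccount above) must already exist when the hook runs.
                  # On pre-install, resources created by the chart itself are not there yet.
                  name: {{ include "myapp.fullname" . }}-db-secret
                  key: url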
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

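Why append .Release.Revision to the Job name? Job names must be unique within a namespace, and this template sets hook-succeeded, so a failed Job is left in place. Without the revision suffix, the next upgrade attempt fails immediately with an "already exists" error because the Job from the previous attempt is still there. Include the revision or a timestamp in hook Job names, or keep a fixed name and add before-hook-creation to the delete policy.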

Hook Weights and Ordering

When a release has multiple hooks of the same type, Helm sorts them by the integer value of helm.sh/hook-weight, lowest first. Weights may be negative or positive, must be written as quoted strings, and default to 0. Hooks with the same weight are then ordered by resource kind and finally by name, ascending. Ordering matters for sequences such as: (1) create a temporary admin DB user, (2) run migrations, (3) revoke the admin user. That is three separate hooks with weights -10, 0, and 10.
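The three-step sequence described above could be laid out roughly as follows. This is a minimal sketch, not a production template: the file name, the Job name suffixes (create-admin, migrate, revoke-admin) and the echo commands are hypothetical placeholders, and only the weight annotations carry the point. It reuses the myapp.fullname helper from the migration template.

```yaml
# templates/hooks/ordered-hooks.yaml (hypothetical file name)
# Helm runs these pre-upgrade hooks in ascending weight order and waits
# for each Job to finish before creating the next one.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-create-admin-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-10"   # lowest weight, runs first
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: create-admin
          image: busybox:1.36
          # Placeholder: replace with the real user-creation command
          command: ["sh", "-c", "echo 'create temporary admin user'"]
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-migrate-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "0"     # runs second
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: busybox:1.36
          # Placeholder: replace with the real migration command
          command: ["sh", "-c", "echo 'run migrations'"]
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-revoke-admin-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "10"    # highest weight, runs last
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: revoke-admin
          image: busybox:1.36
          # Placeholder: replace with the real revocation command
          command: ["sh", "-c", "echo 'revoke temporary admin user'"]
```

Note that if the migration Job fails, Helm stops processing hooks, so the revoke step never runs. A real setup would need another way to expire the temporary credentials.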

Hook Delete Policies

Three values control when Helm deletes a hook resource after it runs:

hook-succeeded: delete the hook resource after it completes successfully. Failed Jobs are preserved so you can inspect logs. This is the recommended choice for migration Jobs.
hook-failed: delete the hook resource if it fails. Combine it with hook-succeeded (hook-succeeded,hook-failed) when hook resources contain sensitive data (migration credentials) that must not persist in either outcome.
before-hook-creation: delete any previous hook resource of the same name before creating a new one. This solves the duplicate-name problem for teams that prefer a fixed Job name.

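Production pitfall: missing delete policy. If you omit helm.sh/hook-delete-policy, Helm uses before-hook-creation by default. Nothing is deleted when the hook finishes; instead, any existing hook resource with the same name is deleted just before the next run. For a fixed-name Job, that means a failed migration Job is deleted silently on the next deploy attempt, taking its logs with it, while successful Jobs linger until then. Set the policy explicitly (hook-succeeded, as in the migration template above) so successful Jobs are cleaned up and failed Jobs stick around for debugging, and ship logs to your centralised log aggregator (Loki, CloudWatch, Datadog) so they survive Pod deletion.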
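Delete policies can be combined as a comma-separated list. The sketch below is one possible fixed-name variant of the migration Job, assuming you prefer a stable name over the revision suffix. The file placement is hypothetical, and the env and resources sections of the full template are omitted for brevity.

```yaml
# templates/hooks/db-migrate.yaml (fixed-name variant, hypothetical)
apiVersion: batch/v1
kind: Job
metadata:
  # No revision suffix: the name is the same on every upgrade
  name: {{ include "myapp.fullname" . }}-db-migrate
  annotations:
    "helm.sh/hook": pre-upgrade
    # before-hook-creation: remove a leftover Job of the same name before this run
    # hook-succeeded:       remove this Job as soon as it completes successfully
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["python", "manage.py", "migrate", "--noinput"]
```

With this combination a failed Job stays available for inspection until the next upgrade attempt, at which point it is removed so the new Job can be created under the same name.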

helm test — Post-Deploy Smoke Tests

helm test is a first-class command that runs a set of special hook Pods annotated with "helm.sh/hook": test. These are not unit tests — they are release validation tests that run against the live deployed release to verify it is actually working. Think of them as automated smoke tests that anyone can trigger: helm test <release-name>.

A canonical test Pod checks the HTTP health endpoint of the deployed service:

# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-connection
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          set -e
          echo "Testing HTTP health endpoint..."
          wget -qO- http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/healthz
          echo "Testing API liveness..."
          response=$(wget -qO- http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/v1/status)
          echo "$response" | grep -q '"status":"ok"'
          echo "All checks passed."
      resources:
        requests:
          cpu: 50m
          memory: 32Mi

# Run tests against a deployed release
helm test myapp-prod -n production

# Print the logs from the test pods once all tests have completed
helm test myapp-prod -n production --logs

# Example output:
# NAME: myapp-prod
# LAST DEPLOYED: Wed Jun 11 14:22:31 2025
# STATUS: deployed
# TEST SUITE:     myapp-prod-test-connection
# Last Started:   Wed Jun 11 14:25:00 2025
# Last Completed: Wed Jun 11 14:25:03 2025
# Phase:          Succeeded

Testing Patterns at Scale

A production-grade test suite in a Helm chart covers more than a single HTTP check. Organise multiple test Pods by concern, each annotated with "helm.sh/hook": test:

Connectivity test: can a Pod in the release namespace resolve DNS and reach the database or cache? Run psql -c "\l" or redis-cli PING from a test Pod built on the matching client image.
Auth test: does the API return 401 on unauthenticated requests and 200 with a valid token? A curl-based Pod with an injected test credential (from a Secret) exercises this path.
Data integrity test: after a migration, does querying a known row return the expected schema? A migration hook that writes a _schema_version sentinel row, paired with a test Pod that reads it, verifies on every deploy that the schema and the application version are aligned.
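As an illustration of the connectivity test, here is a minimal sketch of a test Pod that pings Redis. The file name, the Service name suffix (-redis) and the port 6379 are assumptions about how the chart exposes Redis; adjust them to your own chart. The myapp.fullname and myapp.labels helpers are the ones used in the earlier templates.

```yaml
# templates/tests/test-redis.yaml (hypothetical file name)
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-redis
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: redis-ping
      image: redis:7   # the official Redis image ships redis-cli
      command:
        - sh
        - -c
        - |
          # Assumption: Redis is reachable at <fullname>-redis:6379.
          # grep -q makes the Pod fail unless the reply contains PONG.
          redis-cli -h {{ include "myapp.fullname" . }}-redis -p 6379 PING | grep -q PONG
      resources:
        requests:
          cpu: 50m
          memory: 32Mi
```

No delete policy is set here, so the default before-hook-creation applies: the Pod stays around for log inspection and is replaced on the next helm test run.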

Common CI/CD practice: a typical pipeline is: build image → push to registry → helm upgrade --install --wait --atomic to a staging namespace → helm test (often 10–30 seconds for simple checks) → gate production promotion on the test exit code. If helm test fails, the pipeline aborts before the production upgrade even starts. The cost is well under a minute per deploy, in exchange for catching regressions that would otherwise reach production.
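The gating step of that pipeline might look like the sketch below, written as a GitHub Actions workflow purely for illustration; any CI system that stops on a non-zero exit code works the same way. Everything specific here is an assumption: the workflow file name, the release names, the ./mychart path, the use of the commit SHA as the image tag (which presumes an earlier step pushed an image with that tag), and that the runner already has helm installed and cluster credentials configured.

```yaml
# .github/workflows/deploy.yaml (hypothetical)
name: deploy
on:
  push:
    branches: [main]
jobs:
  staging:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumption: kubeconfig for the staging cluster is already set up
      - name: Deploy to staging
        run: |
          helm upgrade --install myapp-staging ./mychart \
            --namespace staging \
            --set image.tag="${GITHUB_SHA}" \
            --wait --atomic --timeout 5m
      - name: Run release tests
        # A failing test Pod makes helm test exit non-zero, which fails this job
        run: helm test myapp-staging --namespace staging --logs
  production:
    needs: staging   # skipped when the staging job fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: |
          helm upgrade --install myapp-prod ./mychart \
            --namespace production \
            --set image.tag="${GITHUB_SHA}" \
            --wait --atomic --timeout 5m
```

The design choice is that promotion depends on the job result rather than on parsing helm test output, so adding more test Pods to the chart tightens the gate without touching the pipeline.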

CRD Hooks and the Special pre-install Pattern

One of the most common reasons charts fail on fresh installs is that a Custom Resource Definition (CRD) is applied after the resources that depend on it. Helm has a dedicated crds/ directory that auto-installs CRDs before everything else, but when you are composing third-party CRDs as a dependency, a pre-install hook Job can apply the CRD manifest explicitly and wait for the API server to register it before proceeding:

# templates/hooks/install-crds.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-install-crds
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-10"
    # Fixed Job name: clear any leftover failed Job before creating a new one
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: crd-installer-sa   # needs cluster-scoped RBAC
      containers:
        - name: kubectl
          image: bitnami/kubectl:1.30
          command:
            - sh
            - -c
            - |
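              # Stop at the first failing command so the Job reports the failure
              set -e
              kubectl apply -f https://raw.githubusercontent.com/org/repo/main/config/crd/bases/myoperator.io_myresource.yaml
              # Block until the API server reports the CRD as Established
              kubectl wait --for=condition=Established crd/myresources.myoperator.io --timeout=60s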

Debugging Hooks

When a hook fails, Helm marks the release as failed and (with --atomic) triggers a rollback. The hook Job remains in the namespace (if you used hook-succeeded as the delete policy). Inspect it like any other failed Job:

# List Jobs carrying the chart label (this includes hook Jobs that use the chart's common labels)
kubectl get jobs -n production -l "helm.sh/chart=myapp-1.4.2"

# Get full logs from the failed migration pod(s)
# (no --previous: with restartPolicy: Never the container is never restarted in place)
kubectl logs -n production -l "job-name=myapp-prod-db-migrate-7" --tail=-1

# Describe the Job to see event history and exit codes
kubectl describe job myapp-prod-db-migrate-7 -n production

# To re-run the hook, run the upgrade again:
# 1. Delete the failed Job once you have its logs (required only if the Job name is fixed)
kubectl delete job myapp-prod-db-migrate-7 -n production
# 2. Run helm upgrade again; the hook re-executes under the new revision
helm upgrade myapp-prod ./mychart -n production --reuse-values

Hooks and tests transform a Helm release from a static manifest apply into a choreographed, self-validating deployment pipeline. The next lesson — lesson 8 — covers publishing and versioning your charts so that other teams can consume them reliably from a registry.