GitOps with ArgoCD & Flux

Flux Architecture

Flux is the second major GitOps engine for Kubernetes, created by Weaveworks (who also coined the term "GitOps") and now a CNCF Graduated project. Where ArgoCD presents a more integrated control plane with a rich UI, Flux takes a different approach: a composable toolkit of small, focused controllers, each responsible for a single concern, all communicating through Kubernetes custom resources defined by Custom Resource Definitions (CRDs). Understanding Flux's architecture is essential because it underpins large-scale GitOps pipelines at companies like Weaveworks and Grafana Labs, and it is the engine behind Microsoft Azure's own GitOps service.

This lesson covers the three controller families that make up the Flux control loop — source, kustomize, and helm — and the two CRDs you will write daily: GitRepository and Kustomization.

The Toolkit Philosophy

Flux is not a single binary. It is a set of Kubernetes controllers — called the GitOps Toolkit — each installed as a Deployment in the flux-system namespace. The key controllers are:

  • source-controller: watches external sources (Git repos, OCI registries, Helm repos, S3-compatible buckets) and fetches their content. It packages that content as artifacts and serves them over an in-cluster HTTP endpoint for other controllers to consume.
  • kustomize-controller: reads a Kustomization object, pulls the manifests artifact from source-controller, runs kustomize build, and applies the result to the cluster. Despite the name, it can apply plain YAML too; a kustomization.yaml file is optional.
  • helm-controller: reads a HelmRelease object, pulls a HelmChart artifact from source-controller, and performs Helm install/upgrade/rollback operations. It removes the need to run helm imperatively in CI pipelines.
  • notification-controller: routes events from the other controllers to external systems (Slack, Teams, PagerDuty, GitHub commit statuses, webhooks). This keeps notification logic out of the reconciliation controllers.
  • image-reflector-controller + image-automation-controller: optional components, not part of the default install, that scan image registries for new tags and write updated image references back to the Git repository. This closes the loop for fully automated delivery pipelines.

Key architectural insight: Flux controllers coordinate only through custom resources and Kubernetes events. Desired and observed state lives in the Kubernetes API (etcd), and the artifacts cached by source-controller can be re-fetched from the original source. As a result, each controller can be upgraded or restarted independently without corrupting the others, although anything that depends on a stopped controller pauses until it returns.
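
As an illustration of how notification-controller is driven by custom resources like every other controller, the sketch below routes reconciliation errors to Slack. It assumes a recent Flux release that serves the notification.toolkit.fluxcd.io/v1beta3 API (older releases use v1beta2); the names slack-alerts, slack-webhook, and reconcile-failures and the channel are hypothetical.

```yaml
# Provider: where events are sent (names and channel are illustrative)
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack-alerts
  namespace: flux-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook        # Secret with an `address` key holding the webhook URL
---
# Alert: which events are forwarded to the Provider
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: reconcile-failures
  namespace: flux-system
spec:
  providerRef:
    name: slack-alerts
  eventSeverity: error         # forward only error events
  eventSources:
    - kind: GitRepository
      name: "*"                # all GitRepository objects in flux-system
    - kind: Kustomization
      name: "*"
    - kind: HelmRelease
      name: "*"
      namespace: infrastructure  # sources outside the Alert's namespace must be named explicitly
```

The reconciliation controllers need no changes for this to work: they emit events, and notification-controller decides where those events go.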

Installing Flux: The Bootstrap Command

Flux is bootstrapped with the flux CLI, which creates the flux-system namespace, installs the default controller Deployments, and commits their manifests to your Git repository, so Flux manages itself via GitOps from the first reconciliation.

    # Install the Flux CLI (macOS / Linux)
    curl -s https://fluxcd.io/install.sh | sudo bash

    # Verify prerequisites: checks Kubernetes version, API server access, etc.
    flux check --pre

    # Bootstrap Flux to a GitHub repository (creates a flux-system/ directory under --path in the repo)
    # Requires a GitHub personal access token exported as GITHUB_TOKEN
    # Flux creates a Deploy Key on the repo and stores it as a Kubernetes Secret
    flux bootstrap github \
      --owner=my-org \
      --repository=gitops-fleet \
      --branch=main \
      --path=clusters/production \
      --personal=false \
      --token-auth=false

    # After bootstrap, verify all controllers are running
    kubectl -n flux-system get pods
    # NAME                                       READY   STATUS    RESTARTS
    # helm-controller-5d8d5fc6fd-xk7t2           1/1     Running   0
    # kustomize-controller-7b7b47f5d9-pl9qr      1/1     Running   0
    # notification-controller-79f6d9b4d8-r7kms   1/1     Running   0
    # source-controller-6b8d8f7f6c-m2n4p         1/1     Running   0

    # Inspect Flux status across all CRDs
    flux get all

The GitRepository CRD: Source Controller in Depth

A GitRepository object tells source-controller where to fetch content and how often to check for changes. It is the entry point for any Git-backed GitOps pipeline. When source-controller successfully fetches and archives a revision, it updates the object's .status.artifact field with a URL pointing to a tarball on its built-in HTTP server — which other controllers reference to download manifests without needing Git credentials themselves.

    # gitrepository.yaml: annotated production-grade example
    apiVersion: source.toolkit.fluxcd.io/v1
    kind: GitRepository
    metadata:
      name: gitops-fleet
      namespace: flux-system
    spec:
      interval: 1m                 # poll the remote every 60 seconds
      url: ssh://git@github.com/my-org/gitops-fleet   # SSH URL, to match the SSH key Secret below
      ref:
        branch: main               # track a branch
        # tag: v1.2.3              # or pin to a specific tag
        # semver: ">=1.0.0"        # or a SemVer range across tags
      secretRef:
        name: gitops-fleet-auth    # SSH key (or basic-auth credentials when the URL is https://)
      timeout: 60s
      ignore: |
        # Ignore files that change frequently but do not affect cluster state
        # Uses .gitignore syntax
        /docs/
        /tests/
        **/*.md
    ---
    # The matching Secret (SSH private key approach)
    apiVersion: v1
    kind: Secret
    metadata:
      name: gitops-fleet-auth
      namespace: flux-system
    type: Opaque
    data:
      identity: <base64-encoded-ssh-private-key>
      identity.pub: <base64-encoded-ssh-public-key>
      known_hosts: <base64-encoded-known-hosts>
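
The Secret above uses the SSH key layout (identity, identity.pub, known_hosts), which pairs with an ssh:// URL. For an https:// URL, source-controller instead expects basic-auth keys. The following is a minimal sketch of that variant; the object names gitops-fleet-https and gitops-fleet-https-auth are hypothetical, and the credential values are placeholders.

```yaml
# HTTPS variant of the Git source (names are illustrative)
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gitops-fleet-https
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/my-org/gitops-fleet
  ref:
    branch: main
  secretRef:
    name: gitops-fleet-https-auth
---
# Basic-auth Secret: `username` and `password` are the keys read for HTTPS
apiVersion: v1
kind: Secret
metadata:
  name: gitops-fleet-https-auth
  namespace: flux-system
type: Opaque
stringData:                          # stringData avoids manual base64 encoding
  username: <git-username>
  password: <personal-access-token>
```

Whichever transport is used, the URL scheme and the keys in the Secret have to agree; a mismatch surfaces as an authentication failure in the GitRepository's Conditions.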

After applying, you can inspect the source with flux get source git gitops-fleet. A healthy object shows READY True and a revision such as main@sha1:<commit> in the REVISION column. If READY is False or the revision is empty, check kubectl -n flux-system describe gitrepository gitops-fleet: the Conditions section explains what failed (auth error, SSH host key mismatch, network timeout, etc.).

The Kustomization CRD: Kustomize Controller in Depth

The Kustomization CRD (not to be confused with the Kustomize kustomization.yaml file) instructs kustomize-controller to take an artifact from a source, optionally run it through Kustomize, and apply the result to the cluster. It is the primary reconciliation object you will create for every workload.

    # kustomization.yaml: production-grade annotated example
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: apps-production
      namespace: flux-system
    spec:
      interval: 5m                     # re-reconcile every 5 minutes (drift correction)
      retryInterval: 30s               # retry interval after a failed reconciliation
      timeout: 3m                      # fail fast if apply takes longer than 3 minutes
      sourceRef:
        kind: GitRepository
        name: gitops-fleet             # reference the GitRepository defined above
      path: ./environments/production  # sub-path in the repo to process
      prune: true                      # delete cluster resources removed from Git (critical)
      wait: true                       # health-check all applied resources before marking Ready
      force: false                     # do NOT force-apply (would delete and recreate resources when immutable fields change)
      healthChecks:                    # explicit list; ignored while wait: true, used when wait is false
        - apiVersion: apps/v1
          kind: Deployment
          name: api-gateway
          namespace: production
      postBuild:
        substitute:
          ENVIRONMENT: production
          REPLICA_COUNT: "3"
        substituteFrom:
          - kind: ConfigMap
            name: cluster-vars         # inject cluster-specific vars without duplicating YAMLs
      decryption:
        provider: sops                 # decrypt SOPS-encrypted secrets inline during apply
        secretRef:
          name: sops-age-key

Production pitfall (prune: true): Without prune: true, resources deleted from Git remain in the cluster indefinitely. This is the most common source of "ghost" resources in Flux-managed clusters: old Deployments, Services, and CronJobs that nobody remembers but that still consume resources and occasionally break things. Always set prune: true in production. An accidental deletion can be undone by reverting the Git commit, which recreates the objects; data held by deleted resources such as PersistentVolumeClaims may not come back with them, so review deletions of stateful resources with care.
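
For resources that should survive even if their manifest disappears from Git, kustomize-controller honors an opt-out annotation on the object itself. The sketch below applies it to a PersistentVolumeClaim; the claim name orders-db-data and its size are hypothetical.

```yaml
# A stateful resource excluded from pruning (name and size are illustrative)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data
  namespace: production
  annotations:
    kustomize.toolkit.fluxcd.io/prune: disabled   # kustomize-controller will not garbage-collect this object
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

This keeps prune: true as the default for the Kustomization while carving out the few objects whose deletion would lose data.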

Figure: Flux Control Loop: Source, Kustomize, and Helm Controllers. source-controller polls and clones the Git remote (GitHub / GitLab) and exposes artifacts over its HTTP server; kustomize-controller and helm-controller consume the artifact URLs and apply manifests to the Kubernetes API server (Deployments, Services, ConfigMaps, ...); notification-controller receives events from the controllers and routes them to external systems (Slack, PagerDuty, webhooks).

The Helm Controller and HelmRelease CRD

The helm-controller removes the need to run helm upgrade --install in CI pipelines, an approach that leaves no declared desired state in the cluster and is difficult to observe. Instead, you declare the desired release state in a HelmRelease object and the controller owns the full release lifecycle (install, upgrade, test, rollback, and uninstall), recording history in Helm's release Secrets just as the helm CLI would.

    # HelmRepository CRD: tell source-controller where the Helm chart index lives
    apiVersion: source.toolkit.fluxcd.io/v1
    kind: HelmRepository
    metadata:
      name: ingress-nginx
      namespace: flux-system
    spec:
      interval: 10m
      url: https://kubernetes.github.io/ingress-nginx
    ---
    # HelmRelease CRD: declare the desired release state
    apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    metadata:
      name: ingress-nginx
      namespace: infrastructure
    spec:
      interval: 10m
      chart:
        spec:
          chart: ingress-nginx
          version: "4.x"               # SemVer constraint: auto-upgrades within range
          sourceRef:
            kind: HelmRepository
            name: ingress-nginx
            namespace: flux-system
      values:
        controller:
          replicaCount: 3
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
            limits:
              cpu: 500m
              memory: 256Mi
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
      install:
        remediation:
          retries: 3                   # retry failed install up to 3 times before giving up
      upgrade:
        remediation:
          retries: 3
          remediateLastFailure: true   # roll back if upgrade fails
        cleanupOnFail: true
      rollback:
        timeout: 5m
        cleanupOnFail: true

Pro practice (values override pattern): In large organisations, base Helm values live in a shared HelmRelease manifest while environment-specific overrides come from a ConfigMap or Secret referenced by spec.valuesFrom. This avoids duplicating entire values blocks across staging and production HelmRelease objects, a common cause of configuration drift when teams copy-paste manifests.
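
A minimal sketch of the override pattern, assuming a hypothetical ConfigMap named ingress-nginx-values-production that lives in the same namespace as the HelmRelease:

```yaml
# Environment-specific values, kept outside the shared HelmRelease (name is illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-values-production
  namespace: infrastructure
data:
  values.yaml: |
    controller:
      replicaCount: 5
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: infrastructure
spec:
  interval: 10m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.x"
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: ingress-nginx-values-production
      valuesKey: values.yaml           # key inside the ConfigMap that holds the values document
  values:                              # shared values; keep per-environment keys out of this block
    controller:
      metrics:
        enabled: true
```

Entries in valuesFrom are merged in order, and inline spec.values is merged last, so an inline key wins over the same key from the ConfigMap. For that reason the sketch leaves replicaCount out of the inline block and sets it only in the per-environment ConfigMap.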

Observing Flux Reconciliation in Real Time

Understanding reconciliation status is critical for debugging deployments. Flux provides first-class observability through the CLI and Kubernetes events.

    # Show reconciliation status of all Flux objects
    flux get all -A

    # Watch a specific Kustomization's status change in real time
    flux get kustomizations apps-production --watch

    # Force an immediate reconciliation (bypass the interval timer)
    flux reconcile source git gitops-fleet
    flux reconcile kustomization apps-production
    # Or refresh the source and reconcile the Kustomization in one step
    flux reconcile kustomization apps-production --with-source

    # Inspect a failing object: conditions tell the full story
    kubectl -n flux-system describe kustomization apps-production
    # Look for Conditions with type=Ready, status=False, reason=ReconciliationFailed

    # Tail controller logs (include --since=5m to reduce noise)
    kubectl -n flux-system logs deploy/kustomize-controller --since=5m | tail -50
    kubectl -n flux-system logs deploy/source-controller --since=5m | tail -50

    # Check events in the flux-system namespace for recent activity
    kubectl -n flux-system get events --sort-by='.lastTimestamp' | tail -20

Production Failure Modes and How to Diagnose Them

The most common Flux failure patterns in production, and how to resolve each:

  • GitRepository stuck on "GitOperationFailed": Usually an SSH key rotation or GitHub token expiration. Check the Secret referenced by secretRef, recreate it with the new credential, then run flux reconcile source git <name>.
  • Kustomization "health check timeout": A resource applied by Flux failed its readiness check (e.g., a Deployment that never reaches its desired replica count). The manifests applied cleanly; it is the underlying workload that is unhealthy. Debug the workload directly: kubectl describe pod, kubectl logs.
  • HelmRelease stuck in "upgrade retries exhausted": Often the chart values are invalid or the chart has a bug. Flux will have rolled back automatically if remediateLastFailure: true. Inspect the Helm history with helm history ingress-nginx -n infrastructure to see which revision failed and what error Helm returned.
  • Drift not being corrected: Verify that interval has elapsed, then compare revisions. If the GitRepository's .status.artifact.revision lags behind the Git HEAD, source-controller has not fetched the new commit: check source-controller logs. If the Kustomization's .status.lastAppliedRevision lags behind the GitRepository's revision, kustomize-controller has not yet applied it: check kustomize-controller logs.

Flux vs. ArgoCD architectural trade-off: Flux is operationally lighter (no UI server, no Redis) and fits naturally into a CLI/IaC-first workflow. ArgoCD provides a visual UI and built-in multi-tenant RBAC that many operations teams rely on. Some organisations run both: Flux for automated infrastructure and cluster-level resources (CRDs, operators, cert-manager, Prometheus), ArgoCD for application team workflows where the UI and RBAC delegation matter. Choose based on your team's operational model, not brand preference.

Summary

Flux's toolkit architecture separates fetching (source-controller), rendering (kustomize-controller), and releasing (helm-controller) into independent, replaceable components. The GitRepository CRD is your entry point for any Git-backed source, and the Kustomization CRD is the reconciliation object that translates Git content into live cluster state. Mastering these two CRDs — and understanding how source-controller's artifact model decouples credential handling from reconciliation — is the foundation for building production-grade Flux pipelines covered in the next lessons.