GitOps with ArgoCD & Flux

Flux Architecture

18 min Lesson 6 of 30

Flux is the second major GitOps engine for Kubernetes, created by Weaveworks (who also coined the term "GitOps") and now a CNCF Graduated project. Where ArgoCD presents a monolithic control plane with a rich UI, Flux takes a different approach: a composable toolkit of small, focused controllers, each responsible for a single concern, all communicating through custom resources defined by Kubernetes Custom Resource Definitions (CRDs). Understanding Flux's architecture is essential because it underpins large-scale GitOps pipelines at companies such as Weaveworks and Grafana Labs, and it is the engine behind Microsoft Azure's own GitOps service.

This lesson covers the three controller families that make up the Flux control loop — source, kustomize, and helm — and the two CRDs you will write daily: GitRepository and Kustomization.

The Toolkit Philosophy

Flux is not a single binary. It is a set of Kubernetes controllers — called the GitOps Toolkit — each installed as a Deployment in the flux-system namespace. The key controllers are:

source-controller — watches external sources (Git repos, OCI registries, Helm repos, S3 buckets) and fetches their content. It packages that content as artifacts and serves them from an in-cluster HTTP server for other controllers to consume.
kustomize-controller — reads a Kustomization CRD, pulls the manifests artifact from source-controller, runs kustomize build, and applies the result to the cluster. Despite the name, it can apply plain YAML too — Kustomize is optional.
helm-controller — reads a HelmRelease CRD, pulls a HelmChart artifact from source-controller, and performs Helm install/upgrade/rollback operations. It replaces the need to run helm imperatively in CI pipelines.
notification-controller — routes events from all other controllers to external systems (Slack, Teams, PagerDuty, GitHub commit statuses, webhooks). This enables rich observability without polluting the reconciliation controllers with notification logic.
image-reflector-controller + image-automation-controller — optional components that are not part of the default install. The first scans image registries for new tags; the second commits updated image references back to the Git repository. Together they close the loop for fully automated delivery pipelines.

Key architectural insight: Every controller in Flux communicates through custom resources and Kubernetes events, and the desired state lives in the Kubernetes API (backed by etcd) rather than in a shared database owned by Flux. Each controller can therefore be upgraded, restarted, or even temporarily removed without corrupting the others, a reliability property that is much harder to achieve in a monolithic design.
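As a rough illustration of a controller being driven purely by custom resources, the sketch below configures notification-controller to forward error events to Slack. The resource names (slack-ops, on-call-alerts), the channel, and the slack-webhook Secret are hypothetical, and the v1beta3 API version is an assumption that depends on the Flux release installed in your cluster.

```yaml
# Provider: where notifications are sent (hypothetical names throughout)
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack-ops
  namespace: flux-system
spec:
  type: slack
  channel: gitops-alerts
  secretRef:
    name: slack-webhook   # Secret with an "address" key holding the webhook URL
---
# Alert: which events are forwarded to the Provider
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: on-call-alerts
  namespace: flux-system
spec:
  providerRef:
    name: slack-ops
  eventSeverity: error     # forward only error events
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'
```

Nothing in the source or kustomize controllers changes when these objects are added; notification-controller simply starts reacting to the events they already emit.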

Installing Flux: The Bootstrap Command

Flux is bootstrapped with the flux CLI, which creates the flux-system namespace, installs the default controller Deployments, and commits the resulting manifests to your Git repository, so that Flux manages its own installation via GitOps from the start.

# Install the Flux CLI (macOS / Linux)
curl -s https://fluxcd.io/install.sh | sudo bash

# Verify prerequisites — checks Kubernetes version, API server access, etc.
flux check --pre

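# Bootstrap Flux to a GitHub repository (commits a flux-system/ directory under --path)
# Requires a GitHub token in the environment: export GITHUB_TOKEN=<your-token>
# With --token-auth=false, Flux creates a deploy key on the repo and stores it as a Kubernetes Secret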
flux bootstrap github \
  --owner=my-org \
  --repository=gitops-fleet \
  --branch=main \
  --path=clusters/production \
  --personal=false \
  --token-auth=false

# After bootstrap, verify all controllers are running
kubectl -n flux-system get pods
# NAME                                       READY   STATUS    RESTARTS
# helm-controller-5d8d5fc6fd-xk7t2           1/1     Running   0
# kustomize-controller-7b7b47f5d9-pl9qr      1/1     Running   0
# notification-controller-79f6d9b4d8-r7kms   1/1     Running   0
# source-controller-6b8d8f7f6c-m2n4p         1/1     Running   0

# Inspect Flux status across all CRDs
flux get all
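To make "Flux manages itself" concrete, the sketch below approximates the sync manifest that bootstrap commits under the chosen path (typically clusters/production/flux-system/gotk-sync.yaml). The exact field values are an assumption based on the bootstrap flags used above and can differ between Flux versions.

```yaml
# Approximate shape of the sync manifest generated by bootstrap
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system        # deploy key created during bootstrap
  url: ssh://git@github.com/my-org/gitops-fleet
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/production   # everything under this path is reconciled
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Because the controllers' own manifests live under the same path, upgrading Flux becomes a Git commit like any other change.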

The GitRepository CRD: Source Controller in Depth

A GitRepository object tells source-controller where to fetch content and how often to check for changes. It is the entry point for any Git-backed GitOps pipeline. When source-controller successfully fetches and archives a revision, it updates the object's .status.artifact field with a URL pointing to a tarball on its built-in HTTP server — which other controllers reference to download manifests without needing Git credentials themselves.

# gitrepository.yaml — annotated production-grade example
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gitops-fleet
  namespace: flux-system
spec:
  interval: 1m          # poll the remote every 60 seconds
  url: ssh://git@github.com/my-org/gitops-fleet   # SSH URL, matching the SSH key Secret below
  ref:
    branch: main        # track a branch
    # tag: v1.2.3       # or pin to a specific tag
    # semver: ">=1.0.0" # or a SemVer range across tags
  secretRef:
    name: gitops-fleet-auth  # SSH key for ssh:// URLs; username/password keys for https:// URLs
  timeout: 60s
  ignore: |
    # Ignore files that change frequently but do not affect cluster state
    # Uses .gitignore syntax
    /docs/
    /tests/
    **/*.md
---
# The matching Secret (SSH private key approach)
apiVersion: v1
kind: Secret
metadata:
  name: gitops-fleet-auth
  namespace: flux-system
type: Opaque
data:
  identity: <base64-encoded-ssh-private-key>
  identity.pub: <base64-encoded-ssh-public-key>
  known_hosts: <base64-encoded-known-hosts>

After applying, inspect the source with flux get sources git gitops-fleet. A healthy object shows READY as True and a revision such as main@sha1:<commit> in the REVISION column. If READY is False or the revision is empty, run kubectl -n flux-system describe gitrepository gitops-fleet; the Conditions section reports what failed (auth error, SSH host key mismatch, network timeout, and so on).
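For orientation, this is roughly what the status of a healthy GitRepository looks like when fetched with kubectl -n flux-system get gitrepository gitops-fleet -o yaml. It is an illustrative excerpt rather than captured output: the angle-bracket values are placeholders, and the exact URL layout is an assumption about source-controller's defaults.

```yaml
# Illustrative status excerpt; placeholder values, not real output
status:
  artifact:
    revision: main@sha1:<commit-sha>
    digest: sha256:<artifact-digest>
    # Tarball served by source-controller; consumers download it without Git credentials
    url: http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/gitops-fleet/<commit-sha>.tar.gz
  conditions:
    - type: Ready
      status: "True"
      reason: Succeeded
```

The artifact URL is what kustomize-controller and helm-controller actually consume, which is why only source-controller needs access to the Git credentials.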

The Kustomization CRD: Kustomize Controller in Depth

The Kustomization CRD (not to be confused with the Kustomize kustomization.yaml file) instructs kustomize-controller to take an artifact from a source, optionally run it through Kustomize, and apply the result to the cluster. It is the primary reconciliation object you will create for every workload.

# kustomization.yaml — production-grade annotated example
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-production
  namespace: flux-system
spec:
  interval: 5m           # re-reconcile every 5 minutes (drift correction)
  retryInterval: 30s     # retry a failed reconciliation after 30s instead of waiting for interval
  timeout: 3m            # fail fast if apply takes longer than 3 minutes
  sourceRef:
    kind: GitRepository
    name: gitops-fleet   # reference the GitRepository defined above
  path: ./environments/production   # sub-path in the repo to process
  prune: true            # delete cluster resources removed from Git (critical)
  wait: true             # wait for ALL applied resources to become ready before marking Ready
  force: false           # do NOT force-apply (force recreates resources when immutable fields change)
  healthChecks:          # ignored while wait is true; only evaluated when wait is false
    - apiVersion: apps/v1
      kind: Deployment
      name: api-gateway
      namespace: production
  postBuild:
    substitute:
      ENVIRONMENT: production
      REPLICA_COUNT: "3"
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars   # inject cluster-specific vars without duplicating YAMLs
  decryption:
    provider: sops          # decrypt SOPS-encrypted secrets inline during apply
    secretRef:
      name: sops-age-key
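The postBuild block above only has an effect if the manifests under the path contain variable references. The sketch below shows one way this could look; the CLUSTER_NAME and LOG_LEVEL keys and the app-settings ConfigMap are hypothetical and exist only for illustration.

```yaml
# The ConfigMap referenced by substituteFrom (same namespace as the Kustomization)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-vars
  namespace: flux-system
data:
  CLUSTER_NAME: prod-eu-1        # hypothetical cluster-specific value
---
# A manifest under ./environments/production that consumes the variables
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings
  namespace: production
  labels:
    environment: ${ENVIRONMENT}   # from spec.postBuild.substitute
data:
  cluster: ${CLUSTER_NAME}        # from the cluster-vars ConfigMap
  log_level: ${LOG_LEVEL:=info}   # default applies when LOG_LEVEL is not defined
```

Substitution runs after the Kustomize build and before the apply, so the same manifests can be reused across clusters with only the variable sources changing.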

Production pitfall — prune: true: Without prune: true, resources deleted from Git will remain in the cluster indefinitely. This is the most common source of "ghost" resources in Flux-managed clusters — old Deployments, Services, and CronJobs that nobody remembers but that still consume resources and occasionally break things. Always set prune: true in production. The risk of accidental deletion is mitigated by the fact that the deletion itself is a git commit that can be reverted.
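For the few resources where an accidental deletion would be costly, such as volumes holding data, pruning can be switched off per object with an annotation. The sketch below assumes a hypothetical PersistentVolumeClaim named orders-db-data; the annotation is the part that matters.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data            # hypothetical name
  namespace: production
  annotations:
    # kustomize-controller skips this object when garbage-collecting
    kustomize.toolkit.fluxcd.io/prune: disabled
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

With this in place, removing the manifest from Git leaves the claim in the cluster, so it has to be deleted by hand when it is really no longer needed.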

Figure: Flux GitOps Toolkit. source-controller fetches Git content and exposes artifacts; kustomize-controller and helm-controller consume those artifacts and apply manifests to the Kubernetes API Server; notification-controller routes events to external systems.

The Helm Controller and HelmRelease CRD

The helm-controller removes the need to run helm upgrade --install in CI pipelines, which is stateless and difficult to observe. Instead, you declare the desired Helm release state in a HelmRelease CRD and the controller owns the full release lifecycle — install, upgrade, test, rollback, and uninstall — recording history in Helm's release secrets just as a manual helm CLI would.

# HelmRepository CRD: tell source-controller where the Helm chart index lives
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 10m
  url: https://kubernetes.github.io/ingress-nginx

---
# HelmRelease CRD: declare the desired release state
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: infrastructure
spec:
  interval: 10m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.x"          # SemVer constraint — auto-upgrades within range
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
  values:
    controller:
      replicaCount: 3
      resources:
        requests:
          cpu: 100m
          memory: 90Mi
        limits:
          cpu: 500m
          memory: 256Mi
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
  install:
    remediation:
      retries: 3             # retry failed install up to 3 times before giving up
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: true   # also roll back when the final retry fails
    cleanupOnFail: true
  rollback:
    timeout: 5m
    cleanupOnFail: true

Pro practice — values override pattern: In large organisations, values shared by every environment live in a common HelmRelease manifest, while environment-specific values come from a ConfigMap or Secret referenced by spec.valuesFrom. This avoids duplicating entire values blocks across staging and production HelmRelease objects, a common cause of configuration drift when teams copy-paste manifests. Keep in mind that inline spec.values are merged last and take precedence over valuesFrom, so keys that differ per environment should not also be set inline.
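A minimal sketch of this pattern, reusing the ingress-nginx release from the example above. The ConfigMap name ingress-nginx-env-values and the replica count are hypothetical; each environment would ship its own copy of the ConfigMap.

```yaml
# Per-environment values, one ConfigMap per cluster (hypothetical name)
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-env-values
  namespace: infrastructure       # must be in the HelmRelease's namespace
data:
  values.yaml: |
    controller:
      replicaCount: 5
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: infrastructure
spec:
  interval: 10m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.x"
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: ingress-nginx-env-values
      valuesKey: values.yaml      # key inside the ConfigMap holding the values
  values:                         # shared values; merged last, so avoid per-environment keys here
    controller:
      metrics:
        enabled: true
```

The HelmRelease itself is then identical in staging and production, and only the small ConfigMap differs.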

Observing Flux Reconciliation in Real Time

Understanding reconciliation status is critical for debugging deployments. Flux provides first-class observability through the CLI and Kubernetes events.

# Show reconciliation status of all Flux objects
flux get all -A

# Reconcile a Kustomization together with its source, then watch its status
flux reconcile kustomization apps-production --with-source
flux get kustomizations apps-production --watch

# Force an immediate reconciliation (bypass the interval timer)
flux reconcile source git gitops-fleet
flux reconcile kustomization apps-production

# Inspect a failing object — conditions tell the full story
kubectl -n flux-system describe kustomization apps-production
# Look for Conditions with type=Ready, status=False, reason=ReconciliationFailed

# Tail controller logs (include --since=5m to reduce noise)
kubectl -n flux-system logs deploy/kustomize-controller --since=5m | tail -50
kubectl -n flux-system logs deploy/source-controller --since=5m | tail -50

# Check events in the flux-system namespace for recent activity
kubectl -n flux-system get events --sort-by='.lastTimestamp' | tail -20

Production Failure Modes and How to Diagnose Them

The most common Flux failure patterns in production, and how to resolve each:

GitRepository stuck on "GitOperationFailed": Almost always an SSH key rotation or GitHub token expiration. Check the Secret referenced by secretRef, recreate it with the new credential, then flux reconcile source git <name>.
Kustomization "health check timeout": A resource applied by Flux failed its own readiness check (e.g., a Deployment that never reaches its desired replica count). The Kustomization itself is healthy — the underlying workload is not. Debug the workload directly: kubectl describe pod, kubectl logs.
HelmRelease stuck in "upgrade retries exhausted": The chart values are invalid or the chart has a bug. Flux will have rolled back automatically if remediateLastFailure: true. Inspect the Helm history: helm history ingress-nginx -n infrastructure to see which revision failed and what error Helm returned.
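Drift not being corrected: Confirm that the object is not suspended and that interval has elapsed, then compare revisions. If the GitRepository's .status.artifact.revision lags the Git HEAD, source-controller is not fetching, so check its logs. If the Kustomization's .status.lastAppliedRevision lags the GitRepository revision, source-controller has new content that kustomize-controller has not yet applied, so check the Kustomization's conditions and the kustomize-controller logs.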
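When working through these cases, the status block of the failing object is usually the quickest signal. The excerpt below is an illustrative sketch of a Kustomization that has seen a new revision but failed to apply it; the angle-bracket values are placeholders, not real output.

```yaml
# Illustrative excerpt of: kubectl -n flux-system get kustomization apps-production -o yaml
status:
  lastAttemptedRevision: main@sha1:<newer-commit-sha>   # what the controller last tried to apply
  lastAppliedRevision: main@sha1:<older-commit-sha>     # what is actually live in the cluster
  conditions:
    - type: Ready
      status: "False"
      reason: ReconciliationFailed
      message: <error reported by kustomize-controller>
```

A gap between the two revisions means the fetch succeeded and the apply did not, which points at the manifests or the cluster rather than at Git access.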

Flux vs. ArgoCD architectural trade-off: Flux is operationally lighter (no UI server, no Redis, no application database) and integrates more naturally into a CLI/IaC-first workflow. ArgoCD provides richer multi-tenant RBAC and a visual UI that operations teams find indispensable. At scale, many organisations run both: Flux for automated infrastructure and cluster-level resources (CRDs, operators, cert-manager, Prometheus), ArgoCD for application team workflows where the UI and RBAC delegation matter. Choose based on your team's operational model, not brand preference.

Summary

Flux's toolkit architecture separates fetching (source-controller), rendering (kustomize-controller), and releasing (helm-controller) into independent, replaceable components. The GitRepository CRD is your entry point for any Git-backed source, and the Kustomization CRD is the reconciliation object that translates Git content into live cluster state. Mastering these two CRDs — and understanding how source-controller's artifact model decouples credential handling from reconciliation — is the foundation for building production-grade Flux pipelines covered in the next lessons.