Artifact Management & Release Engineering

Artifact Repositories

An artifact repository is the single authoritative store for every binary your organization builds — Docker images, npm packages, Maven JARs, PyPI wheels, Helm charts, raw tarballs. Getting this layer right is not an infrastructure detail; it is the foundation of your entire supply-chain security, reproducibility, and release-promotion model. At Netflix, Uber, and Shopify, a build that cannot push to the artifact repository does not exist in any meaningful operational sense.

Why you cannot skip this layer: Without a repository, production pipelines pull straight from the public internet. That practice introduces supply-chain risk (tags can be repointed, and packages can disappear or be hijacked), breaks air-gapped environments, and makes builds impossible to reproduce. A private artifact repository with immutable storage and a proxy cache is table stakes for any organization beyond a single-person side project.

The Major Repository Managers

Three platforms dominate the enterprise space, each with distinct trade-offs:

JFrog Artifactory: The most feature-complete solution. Supports over 30 package types natively, including Docker, Helm, npm, Maven, PyPI, NuGet, Go, Debian, and RPM. First-class support for remote repository proxying (caching public registries), virtual repositories (aggregating multiple repos under one URL), and replication between instances across regions. Enterprise licenses add Xray (deep SCA scanning) and Distribution (secure CDN-backed release delivery). Used by Google, Amazon, and most large banks.
Sonatype Nexus Repository: The open-source-friendly option. The OSS edition is free and supports Docker, Maven, npm, PyPI, NuGet, and Helm. The Pro edition adds staging repositories for promotion workflows and integrates with Sonatype Repository Firewall (policy enforcement on inbound packages) and IQ Server (component intelligence). Widely used in Java-heavy shops because of its Maven Central proxy and staging/release workflow.
Cloud-Native Registries: AWS ECR, GCP Artifact Registry, and Azure ACR are the lowest-friction choice when you are already committed to one cloud. They integrate natively with IAM/RBAC, lifecycle policies, and the CI services of the respective cloud. GCP Artifact Registry is the most general-purpose (Docker, npm, Python, Maven, Go). AWS ECR and Azure ACR store only OCI content (container images plus OCI-packaged artifacts such as Helm charts); ECR is deeply integrated with ECS, EKS, and CodePipeline, and AWS serves npm, PyPI, and Maven packages through the separate CodeArtifact service.

Repository Structures: Separating Concerns at Scale

A flat "one repo for everything" approach breaks down fast. Production-grade organizations create a structured set of repositories that reflect the lifecycle of an artifact — not just its type.

The canonical structure, which mirrors what Netflix and Spotify use internally:

local-dev repos: Developers push snapshot/prerelease builds here from feature branches. Artifacts are mutable. Retention is short (7–14 days). Nothing in production ever pulls from here.
ci-staging repos: The CI pipeline pushes artifacts from the main branch. Artifacts are immutable: once a version tag (typically the commit SHA) has been pushed, it can never be overwritten. This is the first tier where automated integration tests run against real binaries.
release repos: Artifacts are promoted here only after passing all quality gates. This repo is the only permitted source for production deployments. Deletion is disabled: you may deprecate a released version but never delete it.
remote-proxy repos: Transparently proxy public registries (Docker Hub, npm registry, PyPI, Maven Central). Your build never touches the public internet — it talks to this proxy, which caches the response. If the upstream is down or a package is yanked, your cached copy still works.
virtual repos: A single URL that aggregates local + remote repos in a defined resolution order. Developers configure their tooling to point at one URL and the repository manager handles routing.

Artifact promotion flow: builds move left-to-right through increasingly strict repos; a remote proxy cache insulates all builds from the public internet; a virtual repo gives tooling one stable URL.
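The resolution order described for virtual repos can be modelled in a few lines of shell. This is a toy sketch, not how any repository manager is implemented: each member repository is assumed to be a local directory, and the hypothetical `resolve_artifact` helper returns the first repository, in declared order, that holds the requested path.

```shell
# Toy model of virtual-repo resolution: member repos are plain directories,
# checked in the order given; the first one holding the artifact wins.
# Usage: resolve_artifact <artifact-path> <repo-dir>...
resolve_artifact() {
  artifact="$1"
  shift
  for repo in "$@"; do
    if [ -f "$repo/$artifact" ]; then
      printf '%s\n' "$repo"   # report which member repo served the request
      return 0
    fi
  done
  echo "not found in any member repo: $artifact" >&2
  return 1
}
```

Order matters: when the same version exists in both release and staging, the release copy wins. That is why the Artifactory virtual repo configured below lists release before staging and puts the Docker Hub proxy last.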

Configuring a Production Artifactory Setup

The following Terraform snippet provisions the core repository structure for a Docker-based platform on Artifactory Cloud using the official provider. This is the pattern used at Spotify and Zalando for their multi-team setups:

# artifactory.tf — Terraform provider: registry.terraform.io/jfrog/artifactory
terraform {
  required_providers {
    artifactory = {
      source  = "jfrog/artifactory"
      version = "~> 10.0"
    }
  }
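}

variable "artifactory_url" {
  type        = string
  description = "Base URL of the JFrog Platform instance"
}

variable "artifactory_access_token" {
  type      = string
  sensitive = true
}

provider "artifactory" {
  url          = var.artifactory_url
  access_token = var.artifactory_access_token
}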

# --- Local repos (one per environment tier) ---
resource "artifactory_local_docker_v2_repository" "dev" {
  key             = "docker-dev-local"
  description     = "Developer snapshot images: mutable, short retention"
  tag_retention   = 1     # overwritten versions of a tag kept by digest (1 = no history)
  max_unique_tags = 10    # keep only the 10 newest tags per image; older tags are removed
  xray_index      = true  # index in Xray for scanning; blocking needs an Xray policy + watch
}

resource "artifactory_local_docker_v2_repository" "staging" {
  key             = "docker-staging-local"
  description     = "CI-built images: immutable, SHA-tagged"
  # Immutability comes from permissions: grant CI deploy rights on this repo
  # but not Delete/Overwrite, so an existing tag cannot be replaced.
  tag_retention   = 1
  max_unique_tags = 500
  xray_index      = true
}

resource "artifactory_local_docker_v2_repository" "release" {
  key             = "docker-release-local"
  description     = "Production-grade released images: deletion blocked"
  # Deletion is blocked the same way: no principal holds Delete/Overwrite here.
  max_unique_tags = 0     # 0 = no limit (tags are never auto-removed)
  xray_index      = true
}

# --- Remote proxy for Docker Hub ---
resource "artifactory_remote_docker_repository" "dockerhub" {
  key                             = "docker-hub-remote"
  url                             = "https://registry-1.docker.io"
  description                     = "Proxy cache for Docker Hub"
  external_dependencies_enabled   = false
  enable_token_authentication     = true
  block_pushing_schema1           = true
  retrieval_cache_period_seconds  = 3600  # metadata (tag list) cache TTL: 1 hour
}

# --- Virtual repo: single pull URL for all build tooling ---
resource "artifactory_virtual_docker_repository" "all" {
  key             = "docker"
  description     = "Virtual: release > staging > hub-proxy"
  repositories    = [
    artifactory_local_docker_v2_repository.release.key,
    artifactory_local_docker_v2_repository.staging.key,
    artifactory_remote_docker_repository.dockerhub.key,
  ]
  default_deployment_repo = artifactory_local_docker_v2_repository.staging.key
}

Use the virtual repository URL everywhere in your CI pipelines. Point every docker pull, npm install --registry, and pip install --index-url at a virtual repo endpoint (Artifactory virtual repos are per package type, so npm and PyPI each need their own alongside the Docker one defined above). When you need to change resolution order or add a new upstream, you update one Terraform resource, not dozens of pipeline YAML files. This is the key operational benefit of the virtual-repo pattern.
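For non-Docker tooling, the single-URL rule can be applied by generating client config from one base URL. In the sketch below the virtual repo keys `npm` and `pypi` and the base URL are hypothetical (the Terraform above only creates the Docker virtual repo `docker`); the `/artifactory/api/npm/<key>/` and `/artifactory/api/pypi/<key>/simple` paths follow Artifactory's usual URL layout but should be checked against your instance.

```shell
# Write registry config that points npm and pip at virtual repositories.
# ASSUMPTION: virtual repos keyed "npm" and "pypi" exist (hypothetical).
# Usage: write_registry_config <artifactory-base-url> <output-dir>
write_registry_config() {
  base_url="${1%/}"   # drop a trailing slash, if present
  out_dir="$2"
  mkdir -p "$out_dir"
  # npm reads its default registry from the "registry" key in .npmrc
  printf 'registry=%s/artifactory/api/npm/npm/\n' "$base_url" > "$out_dir/.npmrc"
  # pip reads index-url from the [global] section of pip.conf
  printf '[global]\nindex-url = %s/artifactory/api/pypi/pypi/simple\n' \
    "$base_url" > "$out_dir/pip.conf"
}
```

npm can be pointed at the generated file with `--userconfig <dir>/.npmrc`, and pip via the `PIP_CONFIG_FILE` environment variable. Changing the base URL then means regenerating two files rather than editing every pipeline.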

Cloud Registry Deep Dive: AWS ECR and GCP Artifact Registry

For cloud-native shops, managed registries reduce operational burden at the cost of vendor lock-in and less flexibility on package types. The most important configuration decisions:

# ECR lifecycle policy, applied per repository via the AWS CLI
# Rule 1 expires untagged images after 1 day; rule 2 keeps only the 30 most
# recently pushed images whose tag starts with "v" and expires older ones
aws ecr put-lifecycle-policy \
  --repository-name myapp/api \
  --lifecycle-policy-text '{
    "rules": [
      {
        "rulePriority": 1,
        "description": "Expire untagged images after 1 day",
        "selection": {
          "tagStatus": "untagged",
          "countType": "sinceImagePushed",
          "countUnit": "days",
          "countNumber": 1
        },
        "action": { "type": "expire" }
      },
      {
        "rulePriority": 2,
        "description": "Keep last 30 v-prefixed images",
        "selection": {
          "tagStatus": "tagged",
          "tagPrefixList": ["v"],
          "countType": "imageCountMoreThan",
          "countNumber": 30
        },
        "action": { "type": "expire" }
      }
    ]
  }'

# Authenticate Docker to ECR (token valid 12 hours)
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.us-east-1.amazonaws.com

# Enable immutable tags (critical: prevents tag overwrite in release repos)
aws ecr put-image-tag-mutability \
  --repository-name myapp/api \
  --image-tag-mutability IMMUTABLE
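Before applying a lifecycle policy, it is worth predicting what rule 2 would remove. The sketch below is a local approximation, not ECR's evaluator, and the `expired_by_count_rule` helper is hypothetical. It assumes one tag per image and input lines of the form `<pushed-at-epoch> <tag>`; `aws ecr describe-images` reports `imagePushedAt` and `imageTags`, but converting that output into this format is left out.

```shell
# Local approximation of an ECR "imageCountMoreThan" rule with a tag prefix.
# Input (stdin): one "<pushed-at-epoch> <tag>" pair per line.
# Output: tags that would be expired, newest first.
# Usage: expired_by_count_rule <tag-prefix> <count> < images.txt
expired_by_count_rule() {
  prefix="$1"
  keep="$2"
  # Keep rows whose tag starts with the prefix, sort newest push first,
  # skip the first $keep rows, and print the tag of every remaining row.
  awk -v p="$prefix" 'index($2, p) == 1' |
    sort -k1,1nr |
    awk -v n="$keep" 'NR > n { print $2 }'
}
```

ECR ranks images by push time when applying `imageCountMoreThan`, which is why the sketch sorts on the timestamp rather than on the version string.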

Retention Policies: What to Keep, What to Expire

Unmanaged artifact storage becomes a cost and compliance problem at scale. At Uber, a single monorepo's build system was generating over 500 GB of Docker layers per week before they introduced tiered retention. The production-grade retention model:

Dev / snapshot repos: Expire everything older than 14 days OR keep only the last N tags (N = 10–25). Untagged layers expire after 1 day. No exceptions — developers should never rely on a 3-month-old snapshot.
CI staging repos: Retain artifacts for the length of your release cycle plus a safety margin. For weekly releases, 30 days is typical. Retain any artifact that is currently deployed anywhere (query your deployment platform to know what is live before expiring).
Release repos: Retain indefinitely for compliance. Many regulated industries (finance, healthcare) require 7 years of build provenance. Mark older releases as deprecated in your catalog but never delete them. Storage is cheap; the inability to reproduce a build for an audit is not.
Remote proxy caches: Set a short cache TTL (1–4 hours) for metadata (package manifests, tag lists) and a longer TTL (7–30 days) for immutable content (layer blobs, tarball content by digest). This balances upstream freshness against cache utility.
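The age-based tier rules above reduce to a small decision function. The sketch assumes the thresholds quoted in the list (14 days for dev, 30 days for staging under weekly releases) and a hypothetical `deployed` flag that your pipeline would fill in by querying the deployment platform; it covers only age-based expiry, not tag-count limits or proxy cache TTLs.

```shell
# Decide whether an artifact may be expired under the tiered model.
# Prints "expire" or "keep"; returns 2 for an unknown tier.
# Usage: retention_decision <tier> <age-days> <deployed: yes|no>
retention_decision() {
  tier="$1"
  age_days="$2"
  deployed="$3"
  # Anything currently running somewhere is never expired, whatever its tier.
  if [ "$deployed" = "yes" ]; then
    echo keep
    return 0
  fi
  case "$tier" in
    dev)     max_age=14 ;;                  # snapshot ceiling
    staging) max_age=30 ;;                  # weekly cycle plus safety margin
    release) echo keep; return 0 ;;         # released versions are never deleted
    *)       echo "unknown tier: $tier" >&2; return 2 ;;
  esac
  if [ "$age_days" -gt "$max_age" ]; then
    echo expire
  else
    echo keep
  fi
}
```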

Never set tag mutability to MUTABLE on a release repository. If tags are mutable, a bad actor (or an accidental pipeline re-run) can overwrite v2.3.0 with a completely different binary. Your production cluster may then pull a silently different image on the next pod restart. This is one of the most common supply-chain vulnerability vectors. Release repositories must have immutable tags enforced at the registry level, not just by convention.
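Registry-level immutability boils down to one rule: once a tag is bound to a digest, it can never be rebound. The toy model below stores each tag as a file holding its digest, and the hypothetical `push_tag` helper refuses any push that would rebind an existing tag. Accepting a re-push of the identical digest is a simplification made here; check how your registry treats that case.

```shell
# Toy model of immutable tags: <registry-dir>/<tag> holds the tag's digest.
# Usage: push_tag <registry-dir> <tag> <digest>
push_tag() {
  reg="$1"
  tag="$2"
  digest="$3"
  mkdir -p "$reg"
  if [ -f "$reg/$tag" ]; then
    existing=$(cat "$reg/$tag")
    if [ "$existing" = "$digest" ]; then
      return 0   # same content re-pushed: the binding does not change
    fi
    echo "refusing to overwrite $tag ($existing) with $digest" >&2
    return 1
  fi
  printf '%s\n' "$digest" > "$reg/$tag"   # first push binds the tag
}
```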

Security: Scanning, Signing, and Access Control

An artifact repository is not just storage — it is a policy enforcement point. Three security mechanisms every production setup must implement:

Vulnerability scanning on push: Artifactory Xray, ECR enhanced scanning (powered by Amazon Inspector), and GCP Artifact Analysis all scan images on push, and each can feed a gate that blocks promotion when CRITICAL or HIGH CVEs are found. Block at the CI stage, not at deployment: a vulnerability found in a container already running in production costs a patch, a rebuild, and an urgent redeploy, whereas a blocked build costs one failed pipeline run.
Image signing (Cosign / Notary): Sign every artifact before it enters the release repository. In your Kubernetes admission controller (Kyverno or OPA Gatekeeper), enforce that only signed images from your release repository can run in production. This closes the "pull from staging and forget to promote" failure mode entirely.
RBAC and network scoping: CI service accounts get write access to staging only. Release promotion is a separate token with write access to the release repo — held only by the release pipeline, not individual developers. Production nodes get read-only pull credentials. No human ever has write access to the release repository without a change-management approval.
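The three mechanisms combine into a single promotion check. In this sketch the principal name `release-pipeline`, the input format (one severity per line, as a flattened scanner summary), and the `signed` flag are all hypothetical; a real pipeline would derive them from its identity provider, Xray or Inspector findings, and `cosign verify`.

```shell
# Gate for promoting an artifact into the release repo.
# Reads one vulnerability severity per line on stdin (e.g. LOW, HIGH).
# Usage: can_promote <principal> <signed: yes|no> < severities.txt
can_promote() {
  principal="$1"
  signed="$2"
  # RBAC: only the dedicated release identity may write to release.
  if [ "$principal" != "release-pipeline" ]; then
    echo "denied: $principal has no write access to release" >&2
    return 1
  fi
  # Signing: unsigned artifacts never enter the release repo.
  if [ "$signed" != "yes" ]; then
    echo "denied: artifact is not signed" >&2
    return 1
  fi
  # Scanning: any CRITICAL or HIGH finding blocks promotion.
  # grep -c prints 0 and exits 1 on no match; "|| true" keeps that 0.
  blocking=$(grep -cE '^(CRITICAL|HIGH)$' || true)
  if [ "$blocking" -gt 0 ]; then
    echo "denied: $blocking CRITICAL/HIGH findings" >&2
    return 1
  fi
  return 0
}
```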

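# Sign a container image with Cosign after building.
# With no --key, Cosign signs keylessly via Sigstore OIDC: no long-lived keys to rotate.
# --yes skips the confirmation prompt Cosign v2 shows before uploading to the
# public transparency log, which a non-interactive CI job cannot answer.
# Signing by digest binds the signature to exact content; IMAGE_DIGEST
# (e.g. sha256:...) is assumed to be exported by the build step.
cosign sign --yes \
  --oidc-issuer=https://token.actions.githubusercontent.com \
  "ghcr.io/acme/api@${IMAGE_DIGEST}"

# Verify the signature before deploying (also enforced by admission controller)
cosign verify \
  --certificate-identity-regexp='^https://github\.com/acme/api/\.github/workflows/' \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
  ghcr.io/acme/api:v2.3.0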

# Kyverno policy: only allow signed images from the release registry
# (in cluster-policy.yaml, applied via kubectl apply -f)
# spec.rules[].verifyImages[].attestors[].entries[].keyless.subject:
#   "https://github.com/acme/api/.github/workflows/release.yml@refs/heads/main"
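The registry-scoping half of that admission rule amounts to a check on the image reference. The sketch below is not a Kyverno policy but a shell approximation: it accepts an image only if it comes from a hypothetical release path `registry.acme.example/docker-release-local/` and is pinned by a sha256 digest, since a Cosign signature is bound to a digest rather than to a tag. Signature verification itself stays with `cosign verify` or the admission controller.

```shell
# Accept only images from the release repository, pinned by digest.
# ASSUMPTION: the release registry path below is hypothetical.
RELEASE_PREFIX="registry.acme.example/docker-release-local/"

# Usage: image_allowed <image-ref>
image_allowed() {
  ref="$1"
  case "$ref" in
    "$RELEASE_PREFIX"*) ;;   # comes from the release repo: continue
    *) echo "rejected: not from release repo: $ref" >&2; return 1 ;;
  esac
  # Require "@sha256:" followed by exactly 64 lowercase hex characters.
  if ! printf '%s\n' "$ref" | grep -Eq '@sha256:[0-9a-f]{64}$'; then
    echo "rejected: not pinned by digest: $ref" >&2
    return 1
  fi
  return 0
}
```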

Treat your artifact repository like a production database. Back it up (Artifactory supports replication to a secondary instance or S3 export; ECR supports cross-region replication). Monitor its storage growth weekly. Set cost alerts. An artifact repository that goes down stops all your deployments — it is on the critical path of your release process, not an optional caching layer.
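A weekly storage check needs little more than a threshold comparison and a headroom estimate. The sketch uses the 70% alert level recommended in the failure-modes list; the usage, capacity, and weekly growth figures (in GiB) are assumed to come from your registry's storage metrics, and `storage_status` is a hypothetical helper.

```shell
# Classify repository storage usage and estimate weeks of headroom left.
# Usage: storage_status <used-gib> <capacity-gib> <growth-gib-per-week>
# Prints "<OK|ALERT> <percent-used>% <weeks-until-full|inf>"
storage_status() {
  used="$1"
  cap="$2"
  growth="$3"
  pct=$(( used * 100 / cap ))   # integer percent, rounded down
  if [ "$pct" -ge 70 ]; then state=ALERT; else state=OK; fi
  if [ "$growth" -gt 0 ]; then
    weeks=$(( (cap - used) / growth ))   # whole weeks until the quota is hit
  else
    weeks=inf                            # no growth: never fills up
  fi
  printf '%s %s%% %s\n' "$state" "$pct" "$weeks"
}
```

For example, a repository growing by 500 GiB per week, the rate quoted for Uber before tiered retention, gives `storage_status 400 1000 500` one week of headroom even though it is still below the alert threshold, which is why growth deserves a weekly look alongside the percentage.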

Common Failure Modes in Production

Repository misconfigurations cause a disproportionate number of production incidents:

Tag overwrite on release: A pipeline re-runs and overwrites a released image tag with a new binary. Kubernetes nodes on the next pod restart pull the new binary without any deployment event. Fix: immutable tags on release repos, always.
Proxy cache stale after upstream deletion: A package is yanked from npm (as happened with left-pad in 2016). If your proxy cached only metadata but not the tarball content, your build breaks. Fix: configure your proxy to cache the full tarball content by digest, not just metadata.
Storage quota exhaustion: The artifact repository hits its storage limit. New pushes fail silently or with confusing errors. CI pipelines go red. Fix: set retention policies, monitor storage weekly, and set hard alerts at 70% capacity.
Missing cross-region replication: Your artifact repository is single-region. A regional AWS outage stops all deploys across every region. Fix: replicate release repos across at least two regions; production clusters pull from the nearest replica.
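For ECR, the replication fix is a registry-level setting. The hypothetical helper below builds the JSON document that `aws ecr put-replication-configuration` accepts, replicating every repository to one or more destination regions in the same account; the account ID and regions used are placeholders.

```shell
# Build an ECR replication configuration that copies all repositories in
# this registry to one or more destination regions of the same account.
# Usage: replication_config <account-id> <region>...
replication_config() {
  account="$1"
  shift
  dests=""
  for region in "$@"; do
    entry=$(printf '{"region":"%s","registryId":"%s"}' "$region" "$account")
    dests="${dests:+$dests,}$entry"   # comma-join destination entries
  done
  printf '{"rules":[{"destinations":[%s]}]}\n' "$dests"
}
```

Apply it with `aws ecr put-replication-configuration --replication-configuration "$(replication_config 123456789012 us-west-2)"`. Only images pushed after the configuration is in place are replicated, and a rule can add `repositoryFilters` with `PREFIX_MATCH` to limit replication to release repositories.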