Kubernetes Cost Management
Kubernetes solved resource scheduling beautifully but shipped with almost no cost accountability. A cluster is a shared pool: compute, memory, and network are consumed by hundreds of pods across dozens of namespaces, yet the cloud bill arrives as a single line item — the node pool. Recovering per-team, per-service cost from that shared pool is the central challenge of Kubernetes FinOps, and at big-tech scale it determines whether engineering leadership can make rational investment decisions or is operating in the dark.
Why Kubernetes Cost Is Hard
Three structural properties make Kubernetes cost uniquely difficult compared to VM-based infrastructure.
- Bin packing obscures ownership. The scheduler places pods from different teams on the same node. The node cost is real, but attributing it to a workload requires knowing both the resource requested by each pod and the idle capacity on the node — and deciding who pays for that idle capacity.
- Requests vs. limits vs. actual usage diverge. A pod can request 2 CPU and 4 GiB, be limited to 4 CPU and 8 GiB, and actually consume 0.3 CPU and 900 MiB. Cost tools must decide which number to use. Charging on requests is conservative and predictable; charging on actual usage is accurate but volatile and harder to budget against.
- Shared infrastructure has no single owner. Cluster add-ons — CoreDNS, kube-proxy, metrics-server, the ingress controller, the CNI daemonset — consume real resources but cannot be attributed to any team. This overhead typically runs 8–15% of total cluster capacity and must be socialised across consumers.
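To make the gap between the three numbers concrete, the sketch below prices the pod from the second bullet on each basis. The per-core and per-GiB hourly rates are hypothetical placeholders, not real cloud prices; only the relative sizes of the results matter.

```python
# Hypothetical unit prices (assumptions for illustration only).
CPU_RATE_PER_CORE_HOUR = 0.03   # assumed $/vCPU-hour
MEM_RATE_PER_GIB_HOUR = 0.004   # assumed $/GiB-hour
HOURS_PER_MONTH = 730


def monthly_cost(cpu_cores: float, mem_gib: float) -> float:
    """Cost of holding cpu_cores and mem_gib for one month at the assumed rates."""
    hourly = cpu_cores * CPU_RATE_PER_CORE_HOUR + mem_gib * MEM_RATE_PER_GIB_HOUR
    return round(hourly * HOURS_PER_MONTH, 2)


# The pod from the text: requests 2 CPU / 4 GiB, limits 4 CPU / 8 GiB,
# actual usage 0.3 CPU / 900 MiB.
by_request = monthly_cost(2, 4)
by_limit = monthly_cost(4, 8)
by_usage = monthly_cost(0.3, 900 / 1024)

for basis, cost in [("requests", by_request), ("limits", by_limit), ("usage", by_usage)]:
    print(f"charged on {basis}: ${cost}/month")
```

Whatever rates are plugged in, the request-based figure for this pod is several times the usage-based one, which is the gap a chargeback policy has to take a position on.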
The Namespace-as-Team Model
The standard big-tech approach is to map cost boundaries to Kubernetes namespaces and enforce them with labels. Every team owns one or more namespaces, and every namespace carries team, env, and cost-centre labels, either injected by a MutatingAdmissionWebhook or required by OPA or Kyverno policies at the namespace level. Cost is then aggregated per namespace and reported weekly to team leads as showback, and monthly to finance as chargeback.
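The check such a policy performs is simple. The following is a minimal sketch of that logic in Python, not an actual webhook or Kyverno policy; the function name and the sample namespace are hypothetical, and only the three label keys come from the text above.

```python
# Label keys named in the text; everything else here is illustrative.
REQUIRED_LABELS = ("team", "env", "cost-centre")


def missing_cost_labels(namespace: dict) -> list:
    """Return the required cost labels that are absent or empty on a Namespace object."""
    labels = (namespace.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


# Hypothetical Namespace manifest, shaped like the Kubernetes API object.
ns = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "name": "payments-prod",
        "labels": {"team": "payments", "env": "prod"},
    },
}

print(missing_cost_labels(ns))  # ['cost-centre']
```

A validating policy would reject the namespace when the returned list is non-empty; a mutating webhook would instead fill in defaults where it can derive them.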
Resource quotas make the model operational: without them, a single misconfigured deployment in one namespace can consume the entire node pool and starve other teams. Every namespace that participates in chargeback should have a ResourceQuota and a LimitRange.
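As an illustration, the sketch below builds a ResourceQuota and a LimitRange for one namespace and prints them as JSON, which kubectl accepts in place of YAML. The object names, the namespace, and every quantity are placeholder assumptions to be replaced with values that fit the team's budget.

```python
import json


def quota_manifests(namespace: str, cpu: str, memory: str) -> dict:
    """Build a v1 List holding a ResourceQuota and a LimitRange for one namespace."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},  # hypothetical name
        # Cap the sum of all pod requests in the namespace.
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
    }
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},  # hypothetical name
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Applied to containers that omit requests or limits (assumed values).
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                    "default": {"cpu": "500m", "memory": "512Mi"},
                }
            ]
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [quota, limit_range]}


manifests = quota_manifests("payments-prod", cpu="40", memory="160Gi")
print(json.dumps(manifests, indent=2))
```

The two objects work as a pair: once a quota constrains requests.cpu or requests.memory, pods that omit those requests are rejected, so the LimitRange defaults keep unannotated workloads deployable while still counting them against the quota.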
Bin Packing: Turning Idle Capacity Into Savings
The gap between what nodes provide and what pods request is called cluster slack. At 1,000 nodes × $0.50/node-hour the fleet costs $12,000/day, so 25% slack costs $3,000/day. There are three levers to reduce it.
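The slack figure is plain arithmetic and easy to rerun for your own fleet. A minimal sketch using the numbers from the text (the function name is illustrative):

```python
def daily_slack_cost(nodes: int, rate_per_node_hour: float, slack_fraction: float) -> float:
    """Daily spend on capacity that nodes provide but no pod has requested."""
    fleet_cost_per_day = nodes * rate_per_node_hour * 24
    return fleet_cost_per_day * slack_fraction


daily = daily_slack_cost(nodes=1000, rate_per_node_hour=0.50, slack_fraction=0.25)
print(f"slack: ${daily:,.0f}/day, ${daily * 365:,.0f}/year")
```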
1. Vertical Pod Autoscaler (VPA) in recommendation mode. Run VPA in Off mode first — it emits recommendations without acting — so you can audit request accuracy before enabling auto-updates. After 7 days of data, sort by the ratio of requested CPU to recommended CPU; workloads with a ratio above 4x are the highest-priority right-sizing targets.
2. Node consolidation via Karpenter (or Cluster Autoscaler's scale-down of underutilised nodes). Karpenter's disruption.consolidationPolicy: WhenUnderutilized (renamed WhenEmptyOrUnderutilized in the v1 NodePool API) drains underutilised nodes and repacks workloads onto fewer, fuller nodes. At steady state this typically improves bin-packing efficiency from 55–65% to 75–85% within a few hours of enabling it.
3. Node instance family selection. A Karpenter NodePool lets you list the instance families it is allowed to provision from. Allowing m7i, m7g (Graviton), and c7g lets Karpenter pick the cheapest instance type that fits the pending pods. Graviton instances are typically around 20% cheaper per vCPU than x86 equivalents for the same workload, but validate with your own benchmarks: memory-intensive workloads may not benefit, and images built only for x86 will not run on Graviton nodes at all.
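The ranking step in the first lever can be sketched as follows. The workload names and values are invented for illustration, and the input shape is an assumption: in practice the recommended figure would be read from each VPA object's recommendation and the requested figure from the pod spec.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores.

    Handles only the millicore suffix and plain numbers, which covers
    typical request values; other suffixes are out of scope for this sketch.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def rightsizing_targets(workloads: list, threshold: float = 4.0) -> list:
    """Return (name, ratio) pairs with requested/recommended CPU above threshold, worst first."""
    ranked = []
    for w in workloads:
        ratio = parse_cpu(w["requested"]) / parse_cpu(w["recommended"])
        if ratio > threshold:
            ranked.append((w["name"], round(ratio, 1)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data.
workloads = [
    {"name": "search", "requested": "1", "recommended": "500m"},
    {"name": "reports", "requested": "4", "recommended": "800m"},
    {"name": "checkout", "requested": "2", "recommended": "250m"},
]

targets = rightsizing_targets(workloads)
print(targets)
```

With this sample data, checkout (8x over-requested) and reports (5x) clear the 4x threshold, while search (2x) does not.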
Kubecost-Style Visibility
Kubecost (and its OSS core, OpenCost) runs inside the cluster and continuously models the cost of every pod, deployment, namespace, and label combination by combining Prometheus resource metrics with cloud provider pricing APIs. This gives you sub-hour cost attribution without writing any custom tooling.
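The core cost model is request-based. The node's hourly rate is first split into a CPU portion and a memory portion. A pod's CPU cost is (CPU requested / node CPU) × CPU portion of the node hourly rate × hours running; its memory cost is the same calculation for memory, and the two are summed. Applying the full node rate to both terms would bill the node twice. Idle node capacity is split across all pods in proportion to their requests, and because billing is on requests rather than usage, teams that over-request pay for their own waste rather than socialising it to neighbours.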
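A minimal sketch of this allocation for a single node follows. It is not Kubecost's or OpenCost's actual code. It assumes the node's hourly rate is divided between CPU and memory by a hypothetical cpu_weight, so that the CPU and memory terms together never exceed the node's cost, and it assumes at least one pod with non-zero requests.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request: float  # cores
    mem_request: float  # GiB


def allocate_node_cost(pods, node_cpu, node_mem, node_hourly_rate, hours, cpu_weight=0.5):
    """Split one node's cost across pods by requests, including idle capacity.

    cpu_weight is an assumed fraction of the node price attributed to CPU;
    the remainder is attributed to memory.
    """
    cpu_pool = node_hourly_rate * hours * cpu_weight
    mem_pool = node_hourly_rate * hours * (1 - cpu_weight)
    total_cpu = sum(p.cpu_request for p in pods)
    total_mem = sum(p.mem_request for p in pods)

    # Cost of capacity nobody requested.
    idle_cpu_cost = cpu_pool * (node_cpu - total_cpu) / node_cpu
    idle_mem_cost = mem_pool * (node_mem - total_mem) / node_mem

    costs = {}
    for p in pods:
        # Direct cost: the pod's requested share of node capacity.
        direct = cpu_pool * p.cpu_request / node_cpu + mem_pool * p.mem_request / node_mem
        # Idle cost: split in proportion to each pod's share of total requests.
        idle = (idle_cpu_cost * p.cpu_request / total_cpu
                + idle_mem_cost * p.mem_request / total_mem)
        costs[p.name] = direct + idle
    return costs


# Hypothetical node: 8 cores, 32 GiB, $0.50/hour, billed for 24 hours.
pods = [Pod("api", cpu_request=2, mem_request=8), Pod("worker", cpu_request=4, mem_request=8)]
costs = allocate_node_cost(pods, node_cpu=8, node_mem=32, node_hourly_rate=0.50, hours=24)
print(costs)
```

Because idle capacity is distributed in proportion to requests, the per-pod costs always sum to the full node cost, so nothing is left unallocated on the bill.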
Rightsizing at Scale: The Automated Feedback Loop
Manual right-sizing does not scale past 50 services. The production pattern is an automated weekly pipeline: VPA recommendations are collected, filtered for statistical significance (a workload must have at least 7 days of data and variance below 40%), and surfaced as pull requests against the team's Helm values file with the current and recommended request values side by side. The PR is auto-approved if the change reduces requests by more than 20% and the service has an HPA configured (so it can scale out if needed).
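The filtering and approval rules above can be sketched as a small triage function. The record type and field names are hypothetical, and "variance below 40%" is interpreted here as a coefficient of variation below 0.40, which is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    service: str
    days_of_data: int
    variation: float        # assumed: coefficient of variation of usage (0.40 == 40%)
    current_cpu: float      # cores currently requested
    recommended_cpu: float  # cores recommended by VPA
    has_hpa: bool


def is_significant(r: Recommendation) -> bool:
    """At least 7 days of data and variation below 40%."""
    return r.days_of_data >= 7 and r.variation < 0.40


def triage(r: Recommendation) -> str:
    """Classify a recommendation as 'skip', 'auto-approve', or 'needs-review'."""
    if not is_significant(r):
        return "skip"  # not enough signal to open a PR at all
    reduction = (r.current_cpu - r.recommended_cpu) / r.current_cpu
    if reduction > 0.20 and r.has_hpa:
        return "auto-approve"  # large cut, and the HPA can scale out if needed
    return "needs-review"


# Hypothetical examples.
print(triage(Recommendation("checkout", 14, 0.15, 2.0, 0.5, True)))
print(triage(Recommendation("search", 14, 0.15, 2.0, 0.5, False)))
print(triage(Recommendation("reports", 3, 0.15, 2.0, 0.5, True)))
```

Recommendations that increase requests yield a negative reduction and therefore always land in "needs-review", which keeps a human in the loop for anything that raises cost.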
Shared Cluster vs. Dedicated Clusters: The Architecture Trade-off
A shared multi-tenant cluster has the best bin-packing efficiency — overhead is amortised, nodes are fuller, and Karpenter can repack across all workloads. A per-team dedicated cluster eliminates noisy-neighbour risk, simplifies cost attribution (one cloud bill = one team), and allows independent upgrade schedules, but multiplies overhead: every cluster needs its own control plane, add-ons, and operations staff.
A common pattern at large engineering organisations is a tiered model: one shared platform cluster per environment (dev/staging/prod) for the majority of workloads, plus opt-in dedicated clusters for workloads with strict compliance, network isolation, or hardware requirements (GPU training jobs, PCI-scoped payment processors). As a rule of thumb this retains roughly 80% of the bin-packing benefit of full sharing while isolating the minority of workloads whose blast radius must be contained.
Surfacing Cost in the Developer Workflow
The most effective behavioural change is making cost visible at the moment a developer makes a sizing decision, not after the bill arrives. Three integration points matter most:
- CI cost estimates. A step in the deploy pipeline (post-helm-diff) calls the OpenCost API to project the monthly cost of the new deployment spec and posts it as a PR comment: "This deployment will cost approximately $1,240/month (+18% vs. current). CPU request increased from 500m to 800m across 10 replicas." Engineers cannot act on what they cannot see.
- Weekly Slack digest. An automated message every Monday to each team's channel: top 5 most expensive workloads, efficiency score, week-over-week change, and a link to the Kubecost dashboard. Teams that see their cost go up without a matching business justification investigate; teams that do not see it do not.
- Grafana cost panel on every service dashboard. A standard Grafana panel (using OpenCost's Prometheus metrics) showing cost/day and efficiency % next to latency and error rate. Cost becomes a first-class operational signal, not a finance artifact.
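The PR comment from the first integration point is mostly string formatting once the two cost figures exist. The sketch below assumes the current and projected monthly costs have already been obtained by the pipeline; it does not call any cost API, and the function name is illustrative.

```python
def cost_comment(current_monthly: float, projected_monthly: float) -> str:
    """Format a PR comment comparing projected monthly cost with the current one."""
    delta = (projected_monthly - current_monthly) / current_monthly
    return (
        f"This deployment will cost approximately ${projected_monthly:,.0f}/month "
        f"({delta:+.0%} vs. current)."
    )


# Hypothetical inputs chosen to match the example in the text.
comment = cost_comment(current_monthly=1050, projected_monthly=1240)
print(comment)
```

Appending the specific change that drives the difference, such as the CPU request and replica count, makes the comment actionable rather than merely informative.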
Kubernetes cost management is not a tool purchase — it is an operating model change. The tooling (OpenCost, VPA, Karpenter) is mature and largely free. The hard work is the labelling taxonomy, the quota enforcement, and the cultural shift that makes every team accountable for their own cloud spend.