Why FinOps?
Cloud computing promised economics that matched spend to usage. The reality, for most organisations that scaled past a few dozen engineers, has been the opposite: bills that grew faster than headcount, surprise charges that showed up weeks late, and engineering teams that had no idea what their services actually cost. FinOps (Financial Operations) is the discipline that closes that gap. Before you learn to optimise anything, you need to understand why the problem exists in the first place and what the practice framework looks like at big-tech scale.
The Cloud Spend Problem
Three structural forces combine to make cloud cost ungovernable without deliberate effort.
- Decentralised provisioning with centralised billing. Any developer with the right IAM role can terraform apply a GPU cluster or a NAT gateway. The charge shows up on one consolidated invoice 30 days later, attributed only to an account or a project, not to the service or the team that created it. By the time finance flags the anomaly, the resource may already be gone or the engineer who created it may have moved on.
- Pricing complexity. AWS alone publishes over 2 million SKU price points. EC2 pricing varies by region, OS, tenancy, purchase option, and network placement. Data transfer fees are particularly opaque: an egress charge from one AZ to another is billed differently from the same transfer crossing a VPC peering link, and both differ from transfer leaving the region. Nobody memorises this; most engineers have a vague intuition that is usually wrong by a factor of two to ten.
- The lag between action and signal. Cloud bills are not real-time. Cost Explorer data is typically 24 hours stale. Committed-use discount analysis requires months of historical usage data. A team that ships a new feature with an architectural mistake — say, a fanout that reads millions of S3 objects per request — will not see the cost impact until two or three billing cycles later, by which point the feature is deeply embedded in production.
The result at scale is predictable: a SaaS company growing 3x year-over-year often finds that cloud spend grows 5–8x in the same period because of accumulated waste — oversized instances nobody right-sized, dev environments running 24/7, cross-region data copies that nobody decommissioned, and on-demand pricing on workloads that have been stable for 18 months. Gartner and McKinsey both estimate that 30–35% of enterprise cloud spend is waste. At $10M/month of cloud spend that is $3–3.5M walking out the door every month.
The FinOps Foundation Framework
The FinOps Foundation (finops.org), now a Linux Foundation project, standardised the practice around three phases that form a continuous loop: Inform, Optimize, and Operate. This is not a one-time project; it is an operating model that runs permanently in parallel with your engineering delivery cycles.
Phase 1 — Inform
You cannot optimise what you cannot see. The Inform phase is about achieving cost visibility at the granularity of a team, a service, and eventually a single unit of business value (a user, a transaction, a request). The key deliverables are:
- Tagging taxonomy. Every resource tagged with env, team, service, and cost-centre. Without this, cost allocation is guesswork. AWS Config rules, Azure Policy, and GCP Organisation Policies enforce mandatory tags at resource creation time. Enforce before you scale: retrofitting tags onto 50,000 resources is a multi-quarter project.
- Showback and chargeback. At minimum, weekly reports emailed to team leads showing what their services cost. Chargeback, which actually debits the team's budget, follows once showback data is trusted. The psychological effect of seeing your service cost on your OKR dashboard is enormous.
- Anomaly detection. AWS Cost Anomaly Detection, GCP Budget alerts, and Azure Cost Alerts provide automated signals when spend deviates from a rolling baseline. A 20% day-over-day spike in one service is worth investigating before the month closes.
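The three Inform deliverables above can be sketched in a few lines of Python. This is a minimal illustration, not the logic used by AWS Cost Anomaly Detection or any other managed service. The input shapes (a list of line items with `cost` and `tags` keys, and a list of daily totals) and all function names are assumptions made for this example, and the spike check is a plain day-over-day comparison rather than a rolling baseline.

```python
from collections import defaultdict

# Required tag keys, taken from the tagging taxonomy described above.
REQUIRED_TAGS = ("env", "team", "service", "cost-centre")


def missing_tags(resource_tags):
    """Return the required tag keys that are absent or empty on one resource."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key)]


def showback_by_team(line_items):
    """Sum cost per team. Untagged spend stays visible as 'UNALLOCATED'."""
    totals = defaultdict(float)
    for item in line_items:
        team = item.get("tags", {}).get("team") or "UNALLOCATED"
        totals[team] += item["cost"]
    return dict(totals)


def day_over_day_spikes(daily_costs, threshold=0.20):
    """Return (day_index, fractional_change) pairs where cost rose by more
    than `threshold` compared with the previous day."""
    spikes = []
    for i in range(1, len(daily_costs)):
        previous, current = daily_costs[i - 1], daily_costs[i]
        if previous > 0:
            change = (current - previous) / previous
            if change > threshold:
                spikes.append((i, change))
    return spikes


if __name__ == "__main__":
    # Hypothetical billing line items, for illustration only.
    items = [
        {"cost": 120.0, "tags": {"env": "prod", "team": "payments",
                                 "service": "api", "cost-centre": "cc-101"}},
        {"cost": 45.5, "tags": {"env": "dev", "team": "search"}},
        {"cost": 30.0, "tags": {}},
    ]
    for item in items:
        print(missing_tags(item["tags"]))
    print(showback_by_team(items))
    print(day_over_day_spikes([100.0, 104.0, 130.0, 128.0]))
```

Keeping untagged spend in an explicit UNALLOCATED bucket is a deliberate choice in this sketch: the size of that bucket is itself a measure of how far the tagging taxonomy has been adopted.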
Phase 2 — Optimize
Once you can see your costs, you can reduce them. Optimization is a portfolio of interventions, each with a different time horizon and complexity:
- Immediate wins (days): delete idle resources such as unattached EBS volumes, unused Elastic IPs, stopped EC2 instances that have not run in 30 days, and orphaned load balancers. AWS Trusted Advisor and the open-source cloud-nuke tool automate discovery. Most organisations find 5–15% of spend here on the first sweep.
- Medium-term wins (weeks): right-sizing instances, moving workloads to Graviton/ARM, enabling S3 Intelligent-Tiering, setting lifecycle policies on CloudWatch Logs (default retention is forever; change it to 30 days unless compliance requires longer).
- Structural wins (months): commitment discounts (Reserved Instances, Savings Plans, GCP CUDs) typically deliver 40–70% off on-demand for stable baseline workloads. Spot or preemptible capacity suits fault-tolerant batch jobs. Architectural changes also belong here, such as replacing a polling pattern with an event-driven fanout, or routing S3 and DynamoDB traffic through a VPC endpoint instead of a NAT gateway.
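The "stable baseline" qualifier on commitment discounts matters because committed capacity is paid for whether or not it is used. The sketch below shows one way to reason about that trade-off. It assumes a simplified pricing model (a flat on-demand rate, a single discount, integer units of usage per hour); real Reserved Instance and Savings Plan pricing has more dimensions, and the function names are hypothetical.

```python
def blended_cost(hourly_usage, committed_units, on_demand_rate, discount):
    """Total cost when `committed_units` are paid for every hour at the
    discounted rate and any usage above that is billed on demand."""
    committed_rate = on_demand_rate * (1.0 - discount)
    total = 0.0
    for used in hourly_usage:
        overflow = max(0, used - committed_units)
        total += committed_units * committed_rate + overflow * on_demand_rate
    return total


def best_commitment(hourly_usage, on_demand_rate, discount):
    """Try every integer commitment from 0 up to peak usage and return the
    cheapest one. `hourly_usage` must contain integers."""
    candidates = range(0, max(hourly_usage) + 1)
    return min(
        candidates,
        key=lambda c: blended_cost(hourly_usage, c, on_demand_rate, discount),
    )


if __name__ == "__main__":
    # Hypothetical day: 10 units for 20 hours, a 16-unit peak for 4 hours.
    usage = [10] * 20 + [16] * 4
    level = best_commitment(usage, on_demand_rate=1.0, discount=0.40)
    print("commit to", level, "units")
    print("on-demand only:", blended_cost(usage, 0, 1.0, 0.40))
    print("with commitment:", blended_cost(usage, level, 1.0, 0.40))
```

In this toy model an extra committed unit pays off only if it is used for more than (1 - discount) of the hours, which is why the search settles on the baseline and leaves the short peak on demand.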
Phase 3 — Operate
Operate is where FinOps becomes culture rather than a project. The goal is to embed cost awareness into every engineering workflow so that optimisation happens continuously rather than in quarterly fire drills. The mechanisms are:
- Cost gates in CI/CD. Tools like Infracost integrate into Terraform PRs and post a cost diff comment before merge. An engineer adding a new RDS Multi-AZ instance sees the $400/month impact before the code ships. This is the FinOps equivalent of shifting security left.
- Per-team budget alerts at 80% and 100%. Alerts go to the team Slack channel, not just to finance. The team owns the response.
- Unit economics tracking. Cost per transaction, cost per active user, cost per API call — tracked in the same dashboards as latency and error rate. When cost-per-user starts climbing while user count is flat, something architectural has changed and the team sees it immediately.
- Regular FinOps reviews. Monthly reviews at the team level, quarterly reviews at the VP/CTO level. Review what was committed, what was optimised, and what the next quarter's target is.
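Two of the mechanisms above, threshold-based budget alerts and unit economics tracking, reduce to simple arithmetic. The following sketch assumes the 80% and 100% thresholds from the list; the function names and the 10% drift tolerance are illustrative choices, not part of any FinOps Foundation specification. Delivering the alert to Slack is left out.

```python
def budget_alert_level(spend_to_date, monthly_budget, thresholds=(0.8, 1.0)):
    """Return the highest threshold crossed (for example 0.8 or 1.0),
    or None if spend is still below every threshold."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    ratio = spend_to_date / monthly_budget
    crossed = [t for t in sorted(thresholds) if ratio >= t]
    return crossed[-1] if crossed else None


def cost_per_unit(total_cost, units):
    """Unit cost, such as cost per active user. None when there are no units."""
    return total_cost / units if units else None


def unit_cost_drift(previous, current, tolerance=0.10):
    """True when unit cost rose by more than `tolerance` between two periods."""
    if previous is None or current is None or previous <= 0:
        return False
    return (current - previous) / previous > tolerance


if __name__ == "__main__":
    print(budget_alert_level(8_500, 10_000))
    last_month = cost_per_unit(5_000, 10_000)
    this_month = cost_per_unit(6_000, 10_000)  # same users, higher spend
    print(unit_cost_drift(last_month, this_month))
```

Comparing unit cost rather than total cost is what separates growth from waste: a rising bill with a flat cost per user is the business scaling, while a flat user count with a rising cost per user points at an architectural change.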
Personas: Who Does FinOps?
The FinOps Foundation identifies three personas that must collaborate for the framework to work:
- Engineering — makes the architectural and provisioning decisions that determine cost. Responsible for tagging, right-sizing, and implementing Spot/Savings Plans.
- Finance — owns the budget, does chargeback, and tracks cloud spend against forecasts. Needs granular, trusted allocation data from Engineering.
- Product/Business — owns unit economics. Determines which cost targets are acceptable given the business model (a low-margin SaaS has a very different cost tolerance than a high-margin enterprise product).
At a company like Netflix or Spotify, FinOps is a dedicated team of 10–20 engineers and analysts. At a mid-size SaaS, it is typically a shared responsibility across a platform team and a finance business partner, meeting weekly. At a startup, it is one person looking at the Cost Explorer dashboard once a month. The tooling and cadence scale; the principles do not change.
Getting Started: The First 30 Days
If you are the engineer tasked with starting a FinOps programme from scratch, the first 30 days should produce three things: a tagging standard that is enforced by policy, a Slack channel or dashboard that shows each team's weekly cost, and a list of the top 10 idle resources with owners identified. Nothing else. Resist the temptation to buy a FinOps platform (Apptio Cloudability, CloudHealth, Spot.io) on day one — you do not yet understand your own data well enough to configure it correctly. Start with native tools (Cost Explorer, BigQuery Billing Export, Azure Cost Management), build intuition, then evaluate third-party platforms from a position of knowledge.
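The third deliverable, a ranked list of idle resources with owners, can start as a small script over whatever inventory export you already have. The sketch below assumes a hypothetical record shape (`id`, `monthly_cost`, an `idle` flag computed elsewhere, and a `tags` dictionary); how idleness is determined depends on the resource type and is not shown.

```python
def top_idle_resources(resources, limit=10):
    """Return the `limit` most expensive idle resources, each with an owner.

    The owner is read from the 'team' tag; resources without one are
    reported as 'UNKNOWN' so that someone has to claim them.
    """
    idle = [r for r in resources if r.get("idle")]
    idle.sort(key=lambda r: r["monthly_cost"], reverse=True)
    return [
        {
            "id": r["id"],
            "monthly_cost": r["monthly_cost"],
            "owner": r.get("tags", {}).get("team") or "UNKNOWN",
        }
        for r in idle[:limit]
    ]


if __name__ == "__main__":
    # Hypothetical inventory rows, for illustration only.
    inventory = [
        {"id": "vol-a", "monthly_cost": 80.0, "idle": True, "tags": {"team": "search"}},
        {"id": "i-b", "monthly_cost": 900.0, "idle": False, "tags": {"team": "payments"}},
        {"id": "eip-c", "monthly_cost": 3.6, "idle": True, "tags": {}},
        {"id": "lb-d", "monthly_cost": 220.0, "idle": True, "tags": {"team": "platform"}},
    ]
    for row in top_idle_resources(inventory, limit=10):
        print(row)
```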