Architecting for Cost
Every architectural decision carries a price tag that compounds over the lifetime of the system. A senior engineer who chooses a synchronous cross-region call over an async queue, or stores warm analytics data in S3 Standard instead of Intelligent-Tiering, is making a cost decision — usually without realising it. At $5M/month of cloud spend, architecture-driven waste routinely accounts for 20–40% of the bill, dwarfing the gains from instance right-sizing and reservation coverage. This lesson covers the three highest-leverage architectural levers: egress-aware design, storage tiering, and serverless economics.
Egress-Aware Design
Data transfer fees are the most widely underestimated line on a cloud bill. AWS charges nothing for ingress, but charges $0.09/GB for data leaving a region to the internet, $0.02/GB for cross-region transfer, and $0.01/GB in each direction for cross-AZ transfer. GCP and Azure follow similar structures. These numbers look small until you run the math: a microservices architecture producing 50 TB/day of inter-service traffic that crosses AZ boundaries moves about 1,500 TB/month, which costs roughly $15,000/month at $0.01/GB and roughly $30,000/month once the charge is counted on both the sending and the receiving side.
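The arithmetic behind the cross-AZ figure can be sketched in a few lines of Python. The $0.01/GB rate comes from the paragraph above; the function name, the 30-day month, and the decimal TB-to-GB conversion are assumptions made for illustration, and real bills depend on region and negotiated discounts.

```python
GB_PER_TB = 1_000  # assumption: decimal units (1 TB = 1,000 GB)


def monthly_transfer_cost(tb_per_day: float, usd_per_gb: float, days: int = 30) -> float:
    """Return the monthly transfer charge in USD for a steady daily volume."""
    return tb_per_day * GB_PER_TB * days * usd_per_gb


# 50 TB/day of cross-AZ traffic, billed on one side only ($0.01/GB).
one_side = monthly_transfer_cost(50, 0.01)

# The same traffic when both the sending and receiving side are billed.
both_sides = monthly_transfer_cost(50, 0.02)

print(f"one side:   ${one_side:,.0f}/month")
print(f"both sides: ${both_sides:,.0f}/month")
```

Running the same function against per-service traffic metrics, rather than a single aggregate, is one way to find which call paths account for most of the charge.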
- Co-locate data and compute in the same AZ. An EC2 instance reading from an RDS replica in the same AZ pays nothing. The same read crossing an AZ boundary costs $0.01/GB each direction. Use AZ-affinity routing (Kubernetes topologyKey: topology.kubernetes.io/zone) to keep hot paths local.
- Replace NAT gateway with VPC endpoints. S3 and DynamoDB have free Gateway Endpoints that route traffic privately inside AWS, with no NAT gateway charge ($0.045/GB processed). A single NAT gateway handling 100 TB/month of S3 traffic costs $4,500/month; the same traffic over a Gateway Endpoint costs $0.
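- Push data to the edge, not the origin. CloudFront cache hit rates of 80–95% mean the origin never serves that data. At the $0.09/GB internet egress rate above, a service streaming 1 PB/month directly from S3 pays on the order of $90,000 before volume discounts. Behind CloudFront at an 85% hit rate the origin serves only about 150 TB of that, transfer from S3 to CloudFront is not billed as internet egress, and CloudFront's own per-GB rates fall with volume and commitment, so the delivered cost is typically well below the direct figure.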
- Shrink inter-service payloads. Fanout patterns — one event triggering 10 downstream calls each returning 200 KB of JSON — multiply egress silently. Use Protobuf or Avro for internal communication (5–10x smaller than JSON) and return projected fields rather than full resource documents.
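Two of the levers above reduce to simple arithmetic, sketched here in Python. The $0.045/GB NAT processing rate, the 100 TB/month volume, and the 10 x 200 KB fanout come from the bullets; the function names, decimal units, and the 5x encoding ratio passed in the example are illustrative assumptions, and the NAT gateway's hourly charge is left out.

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # per-GB processing rate quoted above
GB_PER_TB = 1_000                  # assumption: decimal units


def nat_processing_cost(tb_per_month: float) -> float:
    """Monthly NAT gateway data-processing charge in USD (hourly charge excluded)."""
    return tb_per_month * GB_PER_TB * NAT_PROCESSING_USD_PER_GB


def fanout_bytes_per_event(downstream_calls: int, response_kb: float,
                           size_reduction: float = 1.0) -> float:
    """Response bytes moved per triggering event.

    size_reduction is how many times smaller the encoded payload is than
    the JSON baseline (1.0 means plain JSON).
    """
    return downstream_calls * response_kb * 1_000 / size_reduction


nat_cost = nat_processing_cost(100)                            # S3 traffic via NAT
json_bytes = fanout_bytes_per_event(10, 200)                   # JSON responses
binary_bytes = fanout_bytes_per_event(10, 200, size_reduction=5.0)  # assumed 5x smaller

print(f"NAT processing: ${nat_cost:,.0f}/month")
print(f"fanout per event: {json_bytes / 1e6:.1f} MB as JSON, "
      f"{binary_bytes / 1e6:.1f} MB at an assumed 5x reduction")
```

Multiplying the per-event figure by the event rate gives the inter-service volume to feed into a transfer-cost estimate.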
Storage Tiering
Object storage is priced by tier, and most teams use exactly one tier, Standard, for everything. The result is significant overpayment for data that is accessed rarely or never. AWS S3 storage pricing ranges from $0.023/GB/month (Standard) down to $0.00099/GB/month (Glacier Deep Archive), a roughly 23x spread. A 1 PB dataset that has not been accessed in 180 days but sits in Standard costs $23,000/month; in Glacier Deep Archive the same data would cost about $990/month.
The main tiers and the access patterns they suit:
- S3 Standard: $0.023/GB. Active data accessed multiple times per month. Never expire from here automatically — the data is being used.
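- S3 Intelligent-Tiering: $0.023/GB in the frequent-access tier, plus a monitoring fee of $0.0025 per 1,000 objects per month. Automatically moves objects to a cheaper infrequent-access tier after 30 consecutive days without access, and back on the next access. Use this as the default for any data whose access pattern is uncertain. Objects smaller than 128 KB are not monitored or auto-tiered, so the savings come from larger objects, where the monitoring fee is negligible relative to storage cost.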
- S3 Standard-IA (Infrequent Access): $0.0125/GB storage but $0.01/GB retrieval. Right for backups and disaster-recovery data accessed a handful of times per year. Wrong for data accessed daily — the retrieval fee exceeds the storage savings.
- S3 Glacier Instant Retrieval: $0.004/GB. Millisecond retrieval latency. Right for compliance archives accessed once a quarter.
- S3 Glacier Deep Archive: $0.00099/GB. 12-hour retrieval. Right for regulatory archives that must be retained for 7 years and will almost never be accessed.
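The trade-off between storage price and retrieval price can be made concrete with a small Python sketch. The per-GB prices are the ones listed above; the function names are hypothetical, and the model deliberately ignores request fees, minimum storage durations, and minimum billable object sizes, all of which shift the break-even point in practice.

```python
# Monthly storage prices in USD per GB, as listed above.
STORAGE_USD_PER_GB = {
    "Standard": 0.023,
    "Standard-IA": 0.0125,
    "Glacier Instant Retrieval": 0.004,
    "Glacier Deep Archive": 0.00099,
}
IA_RETRIEVAL_USD_PER_GB = 0.01  # Standard-IA retrieval fee quoted above


def monthly_storage_cost(gb: float, tier: str) -> float:
    """Storage-only monthly cost in USD for a dataset held in one tier."""
    return gb * STORAGE_USD_PER_GB[tier]


def standard_ia_monthly_cost(gb: float, full_reads_per_month: float) -> float:
    """Standard-IA storage plus retrieval, where each 'read' fetches the whole dataset."""
    per_gb = STORAGE_USD_PER_GB["Standard-IA"] + IA_RETRIEVAL_USD_PER_GB * full_reads_per_month
    return gb * per_gb


def ia_break_even_reads() -> float:
    """Full reads per month at which Standard-IA stops being cheaper than Standard."""
    saving = STORAGE_USD_PER_GB["Standard"] - STORAGE_USD_PER_GB["Standard-IA"]
    return saving / IA_RETRIEVAL_USD_PER_GB


ONE_PB_IN_GB = 1_000_000  # assumption: decimal units
for tier in STORAGE_USD_PER_GB:
    print(f"{tier:<26} ${monthly_storage_cost(ONE_PB_IN_GB, tier):>10,.0f}/month")
print(f"Standard-IA break-even: {ia_break_even_reads():.2f} full reads/month")
```

Under these simplified assumptions, data read in full more than about once a month is cheaper in Standard than in Standard-IA, which is why daily-access data does not belong there.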
Beyond object storage, apply the same tiering mindset to block and database storage. EBS gp3 is 20% cheaper per GB than gp2 and includes a 3,000 IOPS baseline regardless of volume size, so there is no reason to use gp2 for new volumes. RDS and Aurora offer storage auto-scaling that prevents over-provisioning. ElastiCache Redis clusters should be sized for the hot working set only: a cache.r7g.large for the hot data, backed by S3 or DynamoDB for the cold tail, is almost always cheaper than a cluster sized for the full dataset.
Serverless Economics
Serverless (Lambda, Cloud Functions, Azure Functions) and managed containers (Fargate, Cloud Run) turn the cost model from capacity to consumption. This is a fundamentally different economic structure, and it cuts both ways: at low and spiky utilisation serverless is dramatically cheaper than reserved instances; at high sustained utilisation it becomes dramatically more expensive.
The break-even point for Lambda vs. a reserved EC2 instance is roughly 20–30% utilisation. Below that threshold, Lambda wins on cost. Above it, a Reserved Instance or Savings Plan wins. The critical analysis questions are:
- What is the p50 and p99 invocation rate over a 24-hour period? Workloads with a 10x day/night ratio are classic Lambda candidates; workloads with flat 24/7 traffic are not.
- What is the function duration? Lambda charges per GB-second of execution. A function using 512 MB for 200 ms costs $0.000001667 per invocation. At 1 billion invocations/month (a large-scale API), that is $1,667/month in compute charges, plus about $200 in request fees at $0.20 per million requests, versus a 10-node ECS Fargate cluster for the same workload at ~$800/month. Duration discipline matters: trim unused memory allocations, and measure actual memory usage with Lambda Power Tuning rather than guessing.
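- What does cold-start latency cost you in user experience? Lambda cold starts range from 100 ms (Python/Node) to 1–3 seconds (JVM with large classpaths). Provisioned Concurrency keeps execution environments initialised and so avoids cold starts up to the configured level, but it is billed at ~$0.015 per GB-hour of provisioned memory whether or not the function is invoked (about $0.0075/hour for each 512 MB environment). Use it only for latency-critical paths.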
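The duration question above can be turned into a quick estimate. The sketch below uses the GB-second rate implied by the per-invocation figure in the list and the ~$800/month fixed-capacity figure from the same bullet; the function names and the 30-day month are assumptions, and per-request fees and free-tier allowances are left out.

```python
LAMBDA_USD_PER_GB_SECOND = 0.0000166667  # rate implied by the per-invocation figure above
SECONDS_PER_MONTH = 30 * 24 * 3600       # assumption: 30-day month


def cost_per_invocation(memory_mb: float, duration_ms: float) -> float:
    """Compute charge in USD for one invocation (request fee excluded)."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * LAMBDA_USD_PER_GB_SECOND


def monthly_compute_cost(invocations: float, memory_mb: float, duration_ms: float) -> float:
    """Monthly compute charge in USD for a given invocation count."""
    return invocations * cost_per_invocation(memory_mb, duration_ms)


def break_even_invocations(fixed_monthly_usd: float, memory_mb: float, duration_ms: float) -> float:
    """Invocations per month at which Lambda compute cost equals a fixed monthly bill."""
    return fixed_monthly_usd / cost_per_invocation(memory_mb, duration_ms)


def average_concurrency(invocations: float, duration_ms: float) -> float:
    """Mean number of simultaneously running executions over the month."""
    return invocations * (duration_ms / 1000) / SECONDS_PER_MONTH


monthly = monthly_compute_cost(1_000_000_000, 512, 200)
crossover = break_even_invocations(800, 512, 200)
concurrency = average_concurrency(1_000_000_000, 200)

print(f"monthly compute: ${monthly:,.0f}")
print(f"break-even vs $800/month fixed capacity: {crossover / 1e6:,.0f}M invocations")
print(f"average concurrency at 1B invocations: {concurrency:.0f}")
```

With these inputs the crossover lands at roughly 480 million invocations per month; halving memory or duration doubles it, which is the practical payoff of memory profiling.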
For sustained workloads migrating off Lambda, Fargate Spot offers up to 70% savings over standard Fargate while remaining fully serverless (no instance management), at the price of tasks that can be interrupted on short notice. Combine it with AWS Graviton2/3 task definitions (ARM64): Graviton tasks on Fargate are 20% cheaper than x86 at identical vCPU and memory, and typically 15–30% faster for CPU-bound workloads. The two discounts multiply: 0.30 × 0.80 = 0.24, so a 1 vCPU / 2 GB task on Graviton Fargate Spot costs at best about a quarter of the same task on standard Fargate x86, roughly $0.012/hour against roughly $0.05/hour at us-east-1 list prices.
Synthesis: Architectural Cost Review Checklist
Before any significant architecture review or pre-production readiness check, run through these questions:
- Does any cross-service call on a hot path cross an AZ or region boundary? If so, is that justified by resilience requirements, or is it accidental topology?
- Is there a NAT gateway in the path for traffic that could use a VPC endpoint instead?
- Is any data stored in S3 Standard that has not been accessed in 30 days? Is there a lifecycle rule?
- Are Lambda functions memory-profiled? Are log groups set to expire?
- Is serverless being used for sustained-high-throughput workloads where reserved compute would be cheaper?
- Are CDN cache hit rates being measured? Is the cache TTL set deliberately or left at framework default?
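The lifecycle-rule item in the checklist can be expressed as configuration. The sketch below builds the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts; the rule ID, the day thresholds, and the bucket name in the comment are hypothetical choices for illustration. Note that lifecycle transitions act on object age, not on last access, so data with unpredictable access is usually better served by Intelligent-Tiering.

```python
def build_lifecycle_rules(prefix: str = "") -> dict:
    """Return an S3 lifecycle configuration that tiers objects down as they age.

    The thresholds (30/90/180 days) are illustrative assumptions, not
    recommendations; choose them from measured access patterns.
    """
    return {
        "Rules": [
            {
                "ID": "tier-down-cold-data",   # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = build_lifecycle_rules("logs/")
transition_days = [t["Days"] for t in config["Rules"][0]["Transitions"]]
print(transition_days)

# Applying it requires boto3 and AWS credentials, for example:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```

Keeping the rule in code or infrastructure-as-code rather than the console makes the "is there a lifecycle rule?" question answerable by reviewing a repository.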