Cost Anti-Patterns
The leaks every cloud bill has.
Every cloud bill has the same half-dozen line items leaking money, and engineers find them by staring at Cost Explorer for an hour. This lesson is that hour, compressed.
Analogy
Think of a big old house at the end of a long month when the utility bills are all higher than expected. The same small problems show up in every house: a tap dripping in the upstairs bathroom, an unused fridge humming in the garage, a window left ajar under the thermostat, lights on in the basement nobody visits. None of them is dramatic on its own, but together they are half the bill. A walk-through with a torch and a checklist finds them in an hour — and it's the same leaks every time.
The top five leaks
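- NAT Gateway data processing. $0.045/GB through the NAT, on top of egress. For internet-bound traffic that is ~50% more than egress alone, and traffic to in-region AWS services such as S3 pays it even though it never leaves AWS. A chatty private-subnet workload can run this up without anyone noticing.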
- Cross-AZ data transfer. $0.01/GB each way between AZs in the same region. Microservices in different AZs yell at each other 24/7; the bill scales linearly.
- Egress to the internet. First 100 GB/month free, then $0.09/GB (dropping at volume). At TB scale, egress is often the single biggest line item.
- Unbounded log ingestion. CloudWatch Logs is $0.50/GB ingested + storage + queries. A DEBUG-verbose microservice at scale can hit five figures in a month.
- Idle resources. Unattached EBS volumes, idle NAT Gateways, unused ALBs, oversized RDS in non-prod. Pure waste that compounds.
NAT Gateway is the loudest leak
A NAT Gateway costs three ways:
- Hourly, per gateway (typically one per AZ): ~$0.045/hr ≈ $33/month just for existing.
- Per-GB data processing: $0.045/GB of traffic through the gateway.
- Plus regular internet egress on top.
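A private-subnet app pushing 1 TB/month of logs to a third-party SaaS pays ~$45 in NAT processing plus ~$90 in internet egress, about $135/month before the gateway's hourly charge. Most of that evaporates when traffic can bypass the NAT: a Gateway VPC endpoint carries S3 and DynamoDB traffic for free, an Interface VPC endpoint (PrivateLink) reaches most other AWS services more cheaply than NAT at scale, and if the SaaS vendor offers a PrivateLink endpoint service, its traffic can skip the NAT too.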
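To see how the three charges stack, here is a rough calculator. It is a sketch that assumes the list prices quoted above, AWS's usual 730-hour month, and a flat $0.09/GB egress rate (no free tier, no volume discounts); check current pricing for your region before trusting the numbers.

```python
# Prices quoted in this lesson; treat them as assumptions for your region.
NAT_HOURLY_USD = 0.045     # per NAT Gateway, per hour
NAT_PER_GB_USD = 0.045     # data processing through the gateway
EGRESS_PER_GB_USD = 0.09   # internet egress, first pricing tier
HOURS_PER_MONTH = 730      # AWS's usual monthly-hours convention


def nat_monthly_cost(gb_per_month: float, gateways: int = 1,
                     internet_bound: bool = True) -> dict[str, float]:
    """Rough monthly cost of traffic flowing through NAT Gateways.

    Ignores the 100 GB/month egress free tier and volume discounts.
    """
    hourly = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    processing = gb_per_month * NAT_PER_GB_USD
    # Traffic to in-region AWS services pays NAT processing but no internet egress.
    egress = gb_per_month * EGRESS_PER_GB_USD if internet_bound else 0.0
    return {
        "hourly": round(hourly, 2),
        "processing": round(processing, 2),
        "egress": round(egress, 2),
        "total": round(hourly + processing + egress, 2),
    }
```

For the 1 TB example, `nat_monthly_cost(1000)` gives $45 processing, $90 egress and about $32.85 of hourly charge for one gateway, roughly $168 in total; the $135 figure above leaves out the hourly part.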
Cross-AZ is the sneaky leak
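It's $0.01/GB, billed on both the sending and the receiving side, so effectively $0.02/GB crossed. Tiny per request, but microservices chatting at scale look like this:
- 500 req/s × 2 KB per request (request + response) ≈ 1 MB/s ≈ 86 GB/day ≈ 2.6 TB/month ≈ $26 on each side of the AZ boundary, ~$52/month per cross-AZ service pair.
You can easily have 30 of those pairs: over $1,500/month, gone. Fixes: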
- Deploy services zonally (one deployment per AZ) and prefer same-AZ traffic.
- Use VPC Lattice or service mesh zone-aware load balancing so in-zone endpoints win the routing decision.
- When multi-AZ HA matters, accept the cost intentionally — but measure it.
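The cross-AZ arithmetic fits in a few lines. This sketch assumes decimal units (1 GB = 1,000,000 KB), a 30-day month, and that `kb_per_request` is the total crossing the boundary per request, request and response combined. Because AWS bills $0.01/GB on both the sending and the receiving side, each GB is counted twice.

```python
CROSS_AZ_PER_GB_USD = 0.01  # billed once leaving one AZ and once entering the other
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def cross_az_monthly_cost(req_per_sec: float, kb_per_request: float) -> float:
    """Monthly cross-AZ transfer cost for one chatty service pair."""
    gb_per_month = (req_per_sec * kb_per_request * SECONDS_PER_DAY
                    * DAYS_PER_MONTH / 1_000_000)
    # Sender side and receiver side are each charged for the same bytes.
    return gb_per_month * CROSS_AZ_PER_GB_USD * 2


per_pair = cross_az_monthly_cost(500, 2)
print(f"per pair: ${per_pair:.2f}/month")       # per pair: $51.84/month
print(f"30 pairs: ${30 * per_pair:.2f}/month")  # 30 pairs: $1555.20/month
```

Plugging in your own request rates and payload sizes per service pair is usually enough to tell whether zonal deployment is worth the operational effort.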
Egress to the internet
Much of what leaves a workload is headed to other AWS accounts or services in the same cloud, and that traffic does not have to pay internet egress rates; internet egress is the expensive path. Reduce it by:
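- Serving through CloudFront. Transfer from AWS origins to CloudFront is free, and CloudFront's per-GB delivery rates are lower than origin internet egress, so every cache hit is cheaper.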
- Using VPC peering / Transit Gateway instead of going out and back in through public IPs.
- Compressing responses (gzip/brotli) — easy 3–5× savings on JSON APIs.
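To check the compression claim against your own payloads, measure it. This sketch uses gzip from the Python standard library (Brotli needs a third-party package, so it is not shown), and the records below are a hypothetical, fairly repetitive API response; real traffic will compress somewhat differently.

```python
import gzip
import json


def gzip_ratio(payload: bytes, level: int = 6) -> float:
    """Original size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload, compresslevel=level))


# Hypothetical JSON API response: many similar records, as list endpoints often return.
records = [
    {"id": i, "status": "active", "region": "us-east-1", "plan": "standard"}
    for i in range(500)
]
body = json.dumps(records).encode("utf-8")
print(f"{len(body)} bytes, compression ratio {gzip_ratio(body):.1f}x")
```

Swap `body` for a captured production response to get a realistic ratio before estimating egress savings.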
Log ingestion: the silent killer
CloudWatch Logs is very expensive at scale ($0.50/GB ingested). Tactics:
- Sample INFO and DEBUG in production.
- Ship high-volume logs to S3, directly or through Firehose, and query them with Athena: cheap retention with queryable access.
- Keep CloudWatch for operational logs only; archive the long tail elsewhere.
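Sampling low-severity logs can happen in the application before anything is shipped. This is a minimal sketch using the standard `logging.Filter` hook: it keeps every WARNING-and-above record and a random fraction of DEBUG/INFO. The `rate` value and the logger name `checkout` are hypothetical.

```python
import logging
import random
from typing import Optional


class SampleLowSeverity(logging.Filter):
    """Pass every WARNING-or-higher record; pass only `rate` of DEBUG/INFO records."""

    def __init__(self, rate: float, rng: Optional[random.Random] = None) -> None:
        super().__init__()
        self.rate = rate  # e.g. 0.01 keeps roughly 1 in 100 low-severity lines
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return self.rng.random() < self.rate


# Wiring: attach the filter to the handler that ships logs off the box.
handler = logging.StreamHandler()
handler.addFilter(SampleLowSeverity(rate=0.01))
logger = logging.getLogger("checkout")  # hypothetical service logger
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
```

Per-record random sampling leaves gaps inside a single request; if you need complete traces, sampling by request or trace ID keeps or drops all lines for a request together.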
Idle and oversized resources
Run these checks weekly:
- Unattached EBS volumes. Snapshot, then delete.
- Idle NAT Gateways in AZs with no workloads.
- Load balancers with zero targets.
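- Elastic IPs not attached to a running instance. They bill hourly (since February 2024 AWS charges for every public IPv4 address, attached or not), and an unattached one buys you nothing.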
- Non-prod databases running 24/7 at production size.
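- Previous-generation volume and instance types. gp3 is ~20% cheaper per GB than gp2 and gets a 3,000 IOPS baseline regardless of size; newer instance families such as m7i generally deliver more performance per dollar than m5, even where the hourly rate is similar or slightly higher.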
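Two of these checks script easily. The sketch below lists unattached EBS volumes and unassociated Elastic IPs using boto3's EC2 `describe_volumes` (via its paginator) and `describe_addresses`; it only reads, it does not snapshot or delete anything. The helper names are hypothetical, and the pure functions are split out so they can be tested without AWS credentials.

```python
from typing import Any


def unattached_volume_ids(describe_volumes_page: dict[str, Any]) -> list[str]:
    """EBS volumes in the 'available' state are not attached to any instance."""
    return [
        vol["VolumeId"]
        for vol in describe_volumes_page.get("Volumes", [])
        if vol.get("State") == "available"
    ]


def unassociated_eip_allocations(describe_addresses_resp: dict[str, Any]) -> list[str]:
    """Elastic IPs without an AssociationId are allocated but attached to nothing."""
    return [
        addr["AllocationId"]
        for addr in describe_addresses_resp.get("Addresses", [])
        if "AssociationId" not in addr
    ]


def scan_region(region: str) -> dict[str, list[str]]:
    """Read-only scan of one region: lists candidates, deletes nothing."""
    import boto3  # lazy import so the pure helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    volumes: list[str] = []
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
        volumes.extend(unattached_volume_ids(page))
    eips = unassociated_eip_allocations(ec2.describe_addresses())
    return {"unattached_volumes": volumes, "unassociated_eips": eips}
```

Running `scan_region` per region on a weekly schedule turns the checklist into a report; the delete step stays a human decision.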
The one chart to watch
In Cost Explorer, group by Usage Type, not Service. That surfaces the granular line items that Service grouping hides, such as DataTransfer-Regional-Bytes, NatGateway-Bytes, DataProcessing-Bytes and LogsIngestion (usually prefixed with a region code, e.g. USE1-).
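The same view is available programmatically through the Cost Explorer `get_cost_and_usage` API with a `USAGE_TYPE` dimension group. A sketch, assuming boto3 and credentials with Cost Explorer access; note that each Cost Explorer API request is billed (currently $0.01), so run it on a schedule rather than in a tight loop. The helper names are hypothetical.

```python
from typing import Any


def cost_by_usage_type(response: dict[str, Any]) -> dict[str, float]:
    """Sum UnblendedCost per usage type across all time periods in one response."""
    totals: dict[str, float] = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[usage_type] = totals.get(usage_type, 0.0) + amount
    return totals


def top_n(totals: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Largest line items first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_usage_types(start: str, end: str, n: int = 15) -> list[tuple[str, float]]:
    """start/end are 'YYYY-MM-DD'; end is exclusive, as in Cost Explorer."""
    import boto3  # lazy import so the pure helpers above work without boto3

    ce = boto3.client("ce")
    kwargs: dict[str, Any] = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }
    totals: dict[str, float] = {}
    while True:
        resp = ce.get_cost_and_usage(**kwargs)
        for usage_type, amount in cost_by_usage_type(resp).items():
            totals[usage_type] = totals.get(usage_type, 0.0) + amount
        token = resp.get("NextPageToken")
        if not token:
            break
        kwargs["NextPageToken"] = token
    return top_n(totals, n)
```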
The right order of work
1. Delete waste. Idle resources. Pure profit, zero risk.
2. Right-size. Compute Optimizer + Trusted Advisor have concrete recommendations.
3. Fix architecture leaks. NAT processing, cross-AZ, egress: the structural stuff.
4. Commit to discounts. Only after steps 1–3. Reserved Instances and Savings Plans lock you in, so make sure what you're committing to is right-shaped first.
Tag everything with an owner. Untagged resources are where waste hides because nobody feels responsible for them.