terraform · level 6

Data Sources

Read existing infra without owning it.

A data block reads existing infrastructure WITHOUT managing it. The opposite of a resource — Terraform queries, gets attributes back, and never creates / updates / destroys anything.

The shape

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]   # Canonical's AWS account

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id      # use the result
  instance_type = "t3.micro"
}

data "aws_ami" "ubuntu" reads the latest matching AMI. Reference attributes with data.<type>.<name>.<attr>.

Why use them

Three common cases:

  1. Look up something the cloud assigned. AMI IDs change per region and per release; pinning a specific ID in code is brittle. A data source picks the right one at plan time.
  2. Read state from another team's Terraform. terraform_remote_state reads outputs from a separate state file — the way two configs hand off values without a shared module.
  3. Reference resources you don't own. A bucket created manually, or a hosted zone someone set up through the console: read its attributes without taking it over.
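
The third case can be sketched as follows, assuming a bucket named legacy-uploads (a hypothetical name) was created by hand outside this config. The data block reads the bucket's ARN and a policy document grants read access to it; nothing here creates or modifies the bucket itself.

```hcl
# Assumption: a bucket called "legacy-uploads" already exists and is NOT managed here.
data "aws_s3_bucket" "uploads" {
  bucket = "legacy-uploads"
}

# Build a read-only policy that points at the existing bucket's ARN.
data "aws_iam_policy_document" "read_uploads" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${data.aws_s3_bucket.uploads.arn}/*"]
  }
}

# The only thing this config owns is the IAM policy.
resource "aws_iam_policy" "read_uploads" {
  name   = "read-legacy-uploads" # hypothetical policy name
  policy = data.aws_iam_policy_document.read_uploads.json
}
```

Running terraform destroy against this config would remove the IAM policy and leave the bucket untouched, because the bucket was only ever read, never owned.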

Common data sources

# Read the current AWS account ID + region.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Read your default VPC.
data "aws_vpc" "default" {
  default = true
}

# Read a hosted zone you didn't create here.
data "aws_route53_zone" "main" {
  name = "example.com."
}

# Read another stack's outputs.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "tf-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id   # from the aws_ami lookup shown earlier
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.public_subnet_ids[0]
}
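
For the terraform_remote_state lookup above to resolve, the producer config has to declare public_subnet_ids as an output of its root module; outputs from nested modules are not exposed. A minimal sketch of what that producer side might look like, where the resource names and CIDR ranges are assumptions for illustration only:

```hcl
# Producer config (the "network" stack). All names and ranges are hypothetical.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  count      = 2
  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}

# Root-module output: this is what the consumer reads via
# data.terraform_remote_state.network.outputs.public_subnet_ids
output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}
```

If the producer renames or removes this output, the consumer's next plan fails on the missing attribute, which is the coupling to keep in mind with this pattern.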

Data sources run on every plan

Data blocks are re-read on every terraform plan and terraform apply. That's how the latest AMI ID stays current. The cost is a provider API call (or, for terraform_remote_state, a state read) every time. Most data sources are cheap; some, such as lookups that filter thousands of items, are slow, and the cost multiplies when the block sits under count or for_each.
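
One consequence of this re-evaluation: with most_recent = true, a newly published AMI changes data.aws_ami.ubuntu.id, and the next plan proposes replacing the instance. If that is not wanted, one common option is to ignore later changes to ami. The sketch below reuses the aws_ami data block from the first example and shows one possible approach, not the only one.

```hcl
# Relies on the data "aws_ami" "ubuntu" block from the first example.
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    # Use the looked-up AMI at creation time, then ignore newer AMIs
    # on later plans so the instance is not replaced unexpectedly.
    ignore_changes = [ami]
  }
}
```

The trade-off is that existing instances stay on the older image until they are replaced deliberately.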

data vs resource

The mental model:

                                             resource   data
Creates / updates / destroys?                Yes        No
Reads attributes?                            Yes        Yes
Lives in state?                              Yes        A short-lived attribute snapshot
Required for Terraform to "own" the thing?   Yes        Never

Use data when you want attribute lookup without ownership. Use resource when Terraform is the source of truth.

Don't fight ownership

The most common bug: defining a resource for something a different team already manages. Two configs apply against the same cloud object, both think they own it, and each apply overwrites or reverts the other's changes. The fix is a data source on one side and a clear ownership boundary.
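
A sketch of that boundary, assuming another team owns a security group named shared-egress (a hypothetical name). The owning config keeps the resource block; every other config only reads it. The ami_id variable is also an assumption, added so the example stands on its own.

```hcl
variable "ami_id" {
  type        = string
  description = "AMI for the consumer's instance (hypothetical input)."
}

# Consumer side: the security group is owned elsewhere, so only read it.
# Assumption: exactly one group named "shared-egress" exists in this region;
# the lookup fails if zero or several groups match.
data "aws_security_group" "shared" {
  name = "shared-egress"
}

resource "aws_instance" "worker" {
  ami                    = var.ami_id
  instance_type          = "t3.micro"
  vpc_security_group_ids = [data.aws_security_group.shared.id]
}
```

The consumer attaches the group to its own instance but has no way to change the group's rules from this config, so only one config ever writes to it.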

terraform_remote_state vs data sources

If you're handing off values between two configs in the same organisation, there are two patterns:

  • terraform_remote_state — reads outputs from another state file. Tight coupling: consumer needs to know the producer's state location. Useful for tightly-related stacks.
  • A pure data source on the underlying cloud resource (data "aws_vpc" looking up a tagged VPC). Looser coupling: consumer queries the cloud, not the producer's state. Survives a producer state move/migration.

Prefer the data-source approach when it's available. terraform_remote_state couples your stacks together more than is usually wise.
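
A minimal sketch of the data-source approach, assuming the producer tags its VPC with Name = "core-network" and its public subnets with Tier = "public". Both tag values are hypothetical; the real ones are whatever the two teams agree on.

```hcl
# Find the VPC by tag instead of reading the producer's state file.
data "aws_vpc" "core" {
  tags = {
    Name = "core-network" # assumed tag value
  }
}

# Find the public subnets inside that VPC.
data "aws_subnets" "public" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.core.id]
  }

  tags = {
    Tier = "public" # assumed tag value
  }
}

output "public_subnet_ids" {
  value = data.aws_subnets.public.ids
}
```

Here the tags are the contract between the two stacks. The consumer needs no access to the producer's state bucket, and the lookup keeps working if the producer moves or restructures its state.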