Data Sources
Read existing infra without owning it.
A data block reads existing infrastructure WITHOUT managing it. The opposite of a resource — Terraform queries, gets attributes back, and never creates / updates / destroys anything.
The shape
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id # use the result
  instance_type = "t3.micro"
}
data "aws_ami" "ubuntu" reads the latest matching AMI. Reference attributes with data.<type>.<name>.<attr>.
Why use them
Three common cases:
- Look up something the cloud assigned. AMI IDs change per region and per release; pinning a specific ID in code is brittle. A data source picks the right one at plan time.
- Read state from another team's Terraform. terraform_remote_state reads outputs from a separate state file; it is how two configs hand off values without a shared module.
- Reference resources you don't own. A bucket created manually, a hosted zone you bought through the console: read its attributes without taking it over.
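As a rough sketch of the third case, assume a bucket was created by hand outside this configuration. The bucket name and the output name below are hypothetical placeholders:

```hcl
# Hypothetical bucket name; assumed to already exist outside this config.
data "aws_s3_bucket" "logs" {
  bucket = "example-manual-logs"
}

# Use the bucket's attributes without managing the bucket itself.
output "logs_bucket_arn" {
  value = data.aws_s3_bucket.logs.arn
}
```

Because the bucket is only read here, destroying this configuration leaves the bucket in place.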
Common data sources
# Read the current AWS account ID + region.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Read your default VPC.
data "aws_vpc" "default" {
  default = true
}

# Read a hosted zone you didn't create here.
data "aws_route53_zone" "main" {
  name = "example.com."
}

# Read another stack's outputs.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "tf-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  # ami, instance_type and other arguments omitted for brevity.
  subnet_id = data.terraform_remote_state.network.outputs.public_subnet_ids[0]
}
Data sources run on every plan
Data blocks are re-evaluated on every terraform plan and terraform apply. That's how the latest AMI ID stays current. The cost is a provider API call (or, for terraform_remote_state, a state read) every time. Most data sources are cheap; some (filtering thousands of items) aren't, so watch for data blocks multiplied by count or for_each.
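One way to keep that cost down, sketched under the assumption that subnets are identified by a Name tag (the variable and the tag scheme are hypothetical): a data block under for_each performs a lookup per element on every plan, while a plural data source can often fetch the same set in a single lookup.

```hcl
# Hypothetical input: the Name tags of the subnets to look up.
variable "subnet_names" {
  type = set(string)
}

# One lookup per element, repeated on every plan and apply.
data "aws_subnet" "by_name" {
  for_each = var.subnet_names

  filter {
    name   = "tag:Name"
    values = [each.value]
  }
}

# Alternative: a single lookup that returns all matching subnet IDs.
data "aws_subnets" "by_name" {
  filter {
    name   = "tag:Name"
    values = var.subnet_names
  }
}
```

The two blocks are shown side by side for comparison; a real config would normally keep only one. The plural form exposes just the IDs (data.aws_subnets.by_name.ids), so the per-element form is still needed when other subnet attributes are required.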
data vs resource
The mental model:
| | resource | data |
|---|---|---|
| Creates / updates / destroys? | Yes | No |
| Reads attributes? | Yes | Yes |
| Lives in state? | Yes | Only as an attribute snapshot, re-read on every plan |
| Required for Terraform to "own" the thing? | Yes | Never |
Use data when you want attribute lookup without ownership. Use resource when Terraform is the source of truth.
Don't fight ownership
The most common bug: defining a resource for something a different team already manages. Two configs apply against the same cloud object, both think they own it, and each apply overwrites the other's changes. The fix is a data source on one side and a clear ownership boundary.
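A minimal sketch of that boundary, assuming the hosted zone belongs to another team and this config should own only a single record (the record name and IP address are placeholders):

```hcl
# The zone is managed elsewhere: read it, don't declare it as a resource.
data "aws_route53_zone" "main" {
  name = "example.com."
}

# This config owns only the record, not the zone.
resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "app.example.com" # hypothetical record name
  type    = "A"
  ttl     = 300
  records = ["203.0.113.10"] # placeholder address from a documentation range
}
```

Destroying this config removes the record and leaves the zone alone, which is the ownership split described above.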
terraform_remote_state vs data sources
If you're handing off values between two configs in the same organisation, there are two patterns:
- terraform_remote_state reads outputs from another state file. Tight coupling: consumer needs to know the producer's state location. Useful for tightly-related stacks.
- A pure data source on the underlying cloud resource (data "aws_vpc" looking up a tagged VPC). Looser coupling: consumer queries the cloud, not the producer's state. Survives a producer state move/migration.
Prefer the data-source approach when it's available. terraform_remote_state couples your stacks together more than is usually wise.
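The data-source pattern might look like the sketch below. It assumes the producer stack tags its VPC and subnets; the tag keys and values (Name = "shared-network", Tier = "public") are hypothetical and would need to match whatever the producer actually applies.

```hcl
# Find the producer's VPC by tag instead of reading the producer's state.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-network" # hypothetical tag value
  }
}

# Find the public subnets inside that VPC, again by tag.
data "aws_subnets" "public" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "public" # hypothetical tag
  }
}

output "public_subnet_ids" {
  value = data.aws_subnets.public.ids
}
```

With this approach the tags become the contract between the two stacks: the producer can move or migrate its state freely as long as the tags stay stable.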