practice · level 8

Defense in Depth

Layered controls, least privilege, network segmentation, and the honeypot in your VPC.

The single biggest mistake in security architecture is making one control load-bearing (the WAF, the perimeter firewall, the bcrypt hash), so that compromise of that one control means compromise of the whole system. Real-world breaches almost always involve a single weak link plus the absence of secondary defenses. Defense in depth is the discipline of building those secondary defenses on purpose.

The principle

Multiple independent controls, each failing closed, each operating at a different layer. An attacker who breaks one still has to break the next, and the one after that. Each layer:

Is independent — its failure mode is uncorrelated with other layers.
Fails closed — when broken, it denies access rather than allowing it.
Has a detection signal — you know when it has been touched, even if it holds.

A common mental model:

         ┌─────────────────────────────────────────┐
         │  Public Internet                         │
         └──────────┬──────────────────────────────┘
                    │
         ┌──────────▼─────────┐    Edge: WAF + DDoS + bot mgmt
         │  CDN / WAF         │
         └──────────┬─────────┘
                    │
         ┌──────────▼─────────┐    DMZ: ALB + auth gate
         │  Public subnet     │
         │  (DMZ)             │
         └──────────┬─────────┘
                    │
         ┌──────────▼─────────┐    App tier: AuthZ checks
         │  Private app       │    + audit log + rate limits
         │  subnet            │
         └──────────┬─────────┘
                    │
         ┌──────────▼─────────┐    Data tier: encryption + IAM
         │  Private data      │    + per-row ACLs
         │  subnet            │
         └────────────────────┘

The attacker faces four (or more) gates between them and your data: edge, DMZ, app tier, and data tier. Ideally each gate has its own administrator, its own log, and its own failure mode.

Least privilege — the bedrock principle

Every identity (user, role, service, machine) should have exactly the permissions it needs and nothing more.

The web app role can read its own DB tables. It cannot read other apps' tables.
The build pipeline role can push images to its own ECR repo. It cannot read S3 buckets.
The IAM admin can create roles. They cannot read customer PII.
The CI/CD's deploy role gets short-lived credentials, scoped to one environment.

Specific patterns:

Per-app IAM roles, not shared. If app A is compromised, app B's data is still safe.
Per-environment isolation: prod role can't touch staging, staging can't touch prod, and dev can't touch either.
No long-lived * permissions in production. Audit them quarterly. Use OPA / Conftest / IAM Access Analyzer to surface over-broad policies.
Just-in-time access for humans: an SRE who needs prod-shell access requests it, and it is auto-revoked after 4 hours.

Network segmentation

A flat network is the attacker's dream. Once they're in anywhere, they're in everywhere. Segmentation imposes structure:

DMZ (public-facing subnet): only the bare minimum that needs internet ingress lives here. Load balancers, edge proxies. Nothing with persistent state.
App subnet (private): app servers, autoscaling groups. Egress to the internet via NAT (so they can call third-party APIs); no ingress except from the DMZ.
Data subnet (private): RDS, ElastiCache, Elasticsearch. No internet route at all. Ingress only from the app subnet, on the specific DB port.

Inside each tier, use security groups (or k8s NetworkPolicies) to limit lateral movement. App pod A can call app pod B only if there's a rule saying so. The SRE-bastion ENI is the only path from human → prod, and it's logged.

Segmentation is the cheapest layer in your stack and one of the highest-impact: define it once in Terraform and it keeps working for as long as the rules stay in place.
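
The tier rules above amount to a default-deny allow-list. The sketch below models them as data so the intent can be tested. Tier names, port 8080 for the app tier, and the helper names are assumptions made for illustration; real enforcement lives in security groups or NetworkPolicies, not in application code.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    port: int


# Hypothetical allow-list mirroring the three tiers described above.
ALLOWED_FLOWS = {
    Flow("internet", "dmz", 443),
    Flow("dmz", "app", 8080),      # assumed app port
    Flow("app", "data", 5432),     # Postgres
    Flow("app", "internet", 443),  # egress via NAT for third-party APIs
}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default deny: a flow passes only if an explicit rule exists."""
    return Flow(src, dst, port) in ALLOWED_FLOWS


def reachable_from(start: str) -> set[str]:
    """Tiers reachable by chaining allowed flows, starting from `start`."""
    seen: set[str] = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for flow in ALLOWED_FLOWS:
            if flow.src == node and flow.dst not in seen:
                seen.add(flow.dst)
                frontier.append(flow.dst)
    return seen
```

With this rule set, `reachable_from("data")` is empty: the data tier can receive connections but cannot initiate any, which is the property the next section depends on.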

The data-tier rule

Your database should not have a route to the internet. There is no legitimate reason for prod-postgres-1 to be able to make outbound HTTP calls. If it can, an attacker who gets code running in the data tier (or who pivots there from your app tier) can exfiltrate data to anywhere they want.

Concretely in AWS:

- RDS lives in a "data" subnet group.
- The data subnet has NO route to an Internet Gateway.
- The data subnet has NO NAT.
- The RDS security group allows ingress only from the app tier's security group, on the DB port.

The cost of getting this wrong is total data exfiltration after a single CVE. The cost of getting it right is close to zero: a few Terraform changes that add no real complexity.
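
One way to keep the "no internet route" rule from drifting is to assert it in a test. The sketch below inspects a route table shaped like one entry of the `RouteTables` list returned by EC2 DescribeRouteTables and reports any route through an Internet Gateway or NAT gateway. The route table IDs and CIDRs are made up, and the check deliberately ignores other egress paths (egress-only gateways, NAT instances, peering), so treat it as a starting point.

```python
def internet_routes(route_table: dict) -> list[dict]:
    """Return routes that point at an Internet Gateway or a NAT gateway."""
    offending = []
    for route in route_table.get("Routes", []):
        via_igw = (route.get("GatewayId") or "").startswith("igw-")
        via_nat = bool(route.get("NatGatewayId"))
        if via_igw or via_nat:
            offending.append(route)
    return offending


# Hypothetical route tables, for illustration only.
DATA_ROUTE_TABLE = {
    "RouteTableId": "rtb-data-example",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    ],
}
APP_ROUTE_TABLE = {
    "RouteTableId": "rtb-app-example",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example"},
    ],
}

if __name__ == "__main__":
    for table in (DATA_ROUTE_TABLE, APP_ROUTE_TABLE):
        bad = internet_routes(table)
        status = "has internet route" if bad else "isolated"
        print(table["RouteTableId"], status)
```

Run against the data subnet's route table, an empty result is the passing condition; anything else should fail the build.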

The honeypot in your VPC

Detection is half of defense. The most underrated detection layer in 2024 is canary tokens — fake credentials, fake DBs, fake endpoints that exist only to be touched.

Plant them in places attackers look:

A fake aws-credentials file in your repo with a working-looking key. The key is registered with AWS but has no permissions. The moment it's used anywhere on Earth, you get an alert.
A fake ~/.kube/config that points to a sentinel cluster.
A fake admin user in your DB whose login attempt fires a Slack page.
A fake payments table in your Postgres DB. Any SELECT against it fires an alert.
A fake "internal-admin-tool" subdomain that's never linked anywhere — only attackers scanning your DNS find it.

Real attackers grep for credentials, scan DNS, and probe for "admin" routes. The decoys cost you nothing and cost the attacker their stealth.
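
To make the decoy-key idea concrete, here is a minimal sketch that filters CloudTrail-style records for any use of a canary access key. It assumes each record carries `userIdentity.accessKeyId`, `sourceIPAddress`, `eventName`, and `eventTime`, as CloudTrail records do; the key ID is the placeholder from AWS documentation, and the function name is hypothetical. A hosted service such as Canarytokens does this for you.

```python
from typing import Iterable, Iterator

# Decoy key IDs to watch for. This value is the AWS documentation
# placeholder, not a real credential.
CANARY_KEY_IDS = {"AKIAIOSFODNN7EXAMPLE"}


def canary_hits(events: Iterable[dict]) -> Iterator[dict]:
    """Yield a small alert dict for every event signed with a canary key."""
    for event in events:
        key_id = event.get("userIdentity", {}).get("accessKeyId")
        if key_id in CANARY_KEY_IDS:
            yield {
                "key": key_id,
                "source_ip": event.get("sourceIPAddress"),
                "action": event.get("eventName"),
                "time": event.get("eventTime"),
            }


# Hypothetical events, for illustration only.
SAMPLE_EVENTS = [
    {"eventName": "GetObject", "sourceIPAddress": "10.0.1.5",
     "eventTime": "2024-01-01T00:00:00Z",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLEAPPKEY000"}},
    {"eventName": "GetCallerIdentity", "sourceIPAddress": "203.0.113.7",
     "eventTime": "2024-01-01T00:05:00Z",
     "userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"}},
]

if __name__ == "__main__":
    for hit in canary_hits(SAMPLE_EVENTS):
        print("CANARY TRIPPED:", hit)
```

Because the decoy key has no legitimate user, every hit is actionable, which is why this kind of alert stays low-noise.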

Tools: Thinkst Canary (commercial), Canarytokens.org (free), or a do-it-yourself decoy IAM key with a CloudTrail-based alert on any use.

Failure modes

A defense layer is only useful if it fails closed:

WAF rule with action=monitor is not a defense. It's a dashboard. Ship it as block.
IAM policy that's "we'll review violations later" isn't a defense. It's an audit log of breaches.
Rate limit that returns 200 if Redis is down is a wide-open door. Fail-closed: refuse the request.

Every layer needs a "what does this do when it fails?" answer that's "deny." If you can't answer that, it's not a layer — it's a hope.
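
The rate-limit case is easy to get wrong in code, so here is a minimal fail-closed sketch. `LimiterUnavailable`, `allow_request`, and the two counters are hypothetical names; in a real service the counter would wrap a Redis call and translate connection errors into the exception.

```python
from typing import Callable


class LimiterUnavailable(Exception):
    """Raised when the counter store (for example Redis) cannot be reached."""


def allow_request(client_id: str,
                  count_hit: Callable[[str], int],
                  limit: int = 100) -> bool:
    """Return True only if the counter is reachable AND under the limit."""
    try:
        hits = count_hit(client_id)
    except LimiterUnavailable:
        # Fail closed: with no counter we cannot prove the client is
        # under the limit, so the request is refused.
        return False
    return hits <= limit


class InMemoryCounter:
    """Stand-in for a real store; counts hits per client."""

    def __init__(self) -> None:
        self._hits: dict[str, int] = {}

    def __call__(self, client_id: str) -> int:
        self._hits[client_id] = self._hits.get(client_id, 0) + 1
        return self._hits[client_id]


def broken_counter(client_id: str) -> int:
    """Simulates the store being down."""
    raise LimiterUnavailable("counter store unreachable")


if __name__ == "__main__":
    counter = InMemoryCounter()
    print([allow_request("client-a", counter, limit=2) for _ in range(3)])
    print(allow_request("client-a", broken_counter))
```

The trade-off is availability: an outage of the counter store now rejects traffic, so the store itself needs monitoring. That is the price of the limiter being a real control.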

Independence — the harder property

Two layers built on the same foundation are one layer. Examples:

"WAF + IDS, both pulling from the same threat-intel feed" — both fail when the feed lies.
"Network ACL + security group, both managed by the same Terraform module without testing the path" — both go missing in the same mis-merge.
"Two MFA factors that both depend on the carrier (SMS + voice call)" — one SIM-swap takes both.

When designing a layer, ask: "If layer N is breached, is layer N+1 affected?" Aim for "no."
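
A cheap way to ask that question systematically is to write down what each layer depends on and look for overlaps. The inventory below is invented for illustration, and the dependency names are placeholders; the point is only that any shared entry marks two layers that can fail together.

```python
from itertools import combinations

# Hypothetical inventory: layer name -> things it depends on.
LAYER_DEPENDENCIES = {
    "waf": {"threat-intel-feed-x", "cdn-vendor"},
    "ids": {"threat-intel-feed-x", "siem"},
    "sms-otp": {"mobile-carrier"},
    "voice-otp": {"mobile-carrier"},
    "hardware-key": {"fido2-token"},
}


def shared_foundations(deps: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of layers to the dependencies they have in common.

    Pairs with nothing in common are omitted.
    """
    overlaps = {}
    for a, b in combinations(sorted(deps), 2):
        common = deps[a] & deps[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


if __name__ == "__main__":
    for (a, b), common in shared_foundations(LAYER_DEPENDENCIES).items():
        print(f"{a} and {b} both depend on: {', '.join(sorted(common))}")
```

In this inventory the WAF/IDS pair and the two carrier-based MFA factors are flagged, while the hardware key shares nothing with any other layer.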

Concrete checklist

For a typical web application:

Edge:
  ☐ TLS everywhere; HSTS preload
  ☐ WAF with managed rule sets, in *block* mode
  ☐ CDN with bot management

DMZ:
  ☐ Public subnet for ALB/edge proxy only
  ☐ Security group limits ingress to 80/443
  ☐ IDS / VPC flow logs to SIEM

App tier:
  ☐ Private subnet, NAT egress only
  ☐ Per-service IAM role, scoped to its data
  ☐ Authentication required on every endpoint
  ☐ Authorization checks at every layer (edge, app, DB)
  ☐ Rate limits, audit logs

Data tier:
  ☐ Private subnet, NO internet route
  ☐ Encryption at rest (KMS) + in transit (TLS)
  ☐ Per-app DB user, minimal grants
  ☐ Backups under a separate IAM role, stored in a different region

Identity:
  ☐ MFA on all human accounts
  ☐ Short-lived credentials, auto-rotated
  ☐ Just-in-time elevation for prod access
  ☐ All access logged to a tamper-evident SIEM

Detection:
  ☐ Canary tokens in repos, DBs, file shares
  ☐ GuardDuty / equivalent on VPC flow logs
  ☐ Alerting on impossible-travel logins
  ☐ Alerting on first-time IAM permission usage
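
The last detection item can be sketched in a few lines. The function below compares CloudTrail-style records against a baseline of (principal, action) pairs already seen and reports anything new. It assumes records carry `userIdentity.arn`, `eventSource`, and `eventName`; the function name, ARNs, and events are hypothetical, and a production version would persist the baseline and age entries out.

```python
def first_time_usage(events: list[dict],
                     baseline: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (principal, action) pairs not present in `baseline`.

    The baseline is updated in place, so each new pair alerts only once.
    """
    alerts = []
    for event in events:
        principal = event.get("userIdentity", {}).get("arn", "unknown")
        action = f'{event.get("eventSource")}:{event.get("eventName")}'
        pair = (principal, action)
        if pair not in baseline:
            alerts.append(pair)
            baseline.add(pair)
    return alerts


# Hypothetical data, for illustration only.
APP_ROLE = "arn:aws:iam::111122223333:role/app-a"
BASELINE = {(APP_ROLE, "s3.amazonaws.com:GetObject")}
EVENTS = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": APP_ROLE}},
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateAccessKey",
     "userIdentity": {"arn": APP_ROLE}},
]

if __name__ == "__main__":
    for principal, action in first_time_usage(EVENTS, set(BASELINE)):
        print(f"first use of {action} by {principal}")
```

An app role that has only ever read from S3 and suddenly calls an IAM API is exactly the kind of change this surfaces.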

Most of this is one-time configuration. The hard part is keeping it green over years of feature work — when the engineer who set it up leaves, when "let's just open this up temporarily for a demo" becomes permanent, when the new microservice forgets to use the per-service IAM role.

The mindset

Single-control architectures are convenient until they aren't. The team that ships defense in depth doesn't have spectacular security stories — they have boring ones. "We saw the canary fire, rotated everything in 20 minutes, attacker had nothing." The team without it has spectacular stories — and they're all bad.

Build for the day after the worst day of your security career. That's what depth means.

Tools in the wild

AWS Security Hub (service): Centralized findings across IAM, GuardDuty, Inspector, and Config; measures defense-in-depth posture.
Thinkst Canary (service): Hosted honeypot appliances plus tokens; the gold standard for low-noise detection.
Canarytokens (service, free tier): Free canary tokens for DNS, HTTP, AWS keys, Word documents, and more.
OPA / Conftest (library, free tier): Policy-as-code for IAM, k8s, and Terraform; enforces least privilege at design time.
AWS GuardDuty (service): Anomaly detection on VPC flow logs, DNS, and CloudTrail; flags unusual lateral movement.