Types & Shapes

Records, tuples, sum types — and how to make illegal states unrepresentable.

Programming languages give you a small alphabet of ways to combine data into types. Knowing which one fits your data is one of the highest-leverage skills you'll develop: pick the right shape and whole classes of bugs become impossible to write; pick the wrong one and you'll be writing validation code forever.

Three shapes

Almost every type you'll ever build is one of these three:

Shape                                   What it represents                   Example
Record (struct, object)                 A fixed group of named fields        User { id, email, signupDate }
Tuple                                   A fixed group of positional values   Point = (x, y)
Sum type (tagged/discriminated union)   One of several alternatives          Result = Ok(value) | Err(error)

Plus collections (lists, maps, sets) for "many of the same thing".

Most type design boils down to picking among these.

Records — named fields

A record groups a fixed set of named fields. The name is part of the contract.

interface User {
  id: string;
  email: string;
  signupDate: Date;
}

When to reach for a record:

  • The fields have meaningfully different roles (name is not interchangeable with email).
  • Code accesses fields by name (user.email).
  • The set of fields is stable across instances of the type.

Records are the workhorse — most domain types are records.
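
A minimal sketch of a record being built and read by name. The sample values and the daysSinceSignup helper are made up for illustration; only the User shape comes from the text above.

```typescript
interface User {
  id: string;
  email: string;
  signupDate: Date;
}

// Hypothetical helper: it reads one field by name and ignores the rest.
function daysSinceSignup(user: User, now: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((now.getTime() - user.signupDate.getTime()) / msPerDay);
}

// Field order in the literal doesn't matter; the names carry the meaning.
const ada: User = {
  email: "ada@example.com",
  signupDate: new Date("2024-01-01T00:00:00Z"),
  id: "u_1",
};

const elapsed = daysSinceSignup(ada, new Date("2024-01-31T00:00:00Z"));
```

Because access is by name, adding a new field to User later would not disturb daysSinceSignup.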

Tuples — positional values

A tuple is a fixed-size collection where position matters more than name.

type Point = [number, number];
type RgbColor = [number, number, number];
type DbResult = [error: Error | null, rows: Row[]];

When to reach for a tuple:

  • The values have a natural order (x then y, RGB).
  • The size is fixed and small (2-4 elements).
  • Naming each one feels redundant.

The honest truth: in most languages, a record is a slightly better default than a tuple because field names document themselves. Reach for a tuple specifically when the positional meaning is universally understood (mathematical pairs, entries in a key-value map).
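
To make the trade-off concrete, here is a small sketch of the two places tuples tend to read well: coordinate pairs and key-value entries. The distance function and the stock data are illustrative assumptions, not part of the lesson's own examples.

```typescript
// Labeled tuple elements document the positions without changing the runtime shape.
type Point = [x: number, y: number];

// Destructuring in the parameter list gives each position a local name.
function distance([x1, y1]: Point, [x2, y2]: Point): number {
  const dx = x2 - x1;
  const dy = y2 - y1;
  return Math.sqrt(dx * dx + dy * dy);
}

const d = distance([0, 0], [3, 4]);

// Key-value entries: [key, value] is a positional meaning most readers already know.
const stockEntries: [string, number][] = [
  ["apple", 3],
  ["pear", 0],
];

const inStock: string[] = [];
for (const [name, count] of stockEntries) {
  if (count > 0) inStock.push(name);
}
```

If a third or fourth position starts needing a comment to explain it, that is usually the signal to switch to a record.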

Sum types — one of several

A sum type (also called a tagged union, discriminated union, variant, or enum-with-data; together with records and tuples it makes up what functional languages call algebraic data types) represents exactly one of several alternatives. Each variant has its own associated data.

type OrderStatus =
  | { kind: "pending" }
  | { kind: "shipped"; trackingNumber: string; shippedAt: Date }
  | { kind: "cancelled"; reason: string };

This is the shape most engineers underuse. The reason it matters: the compiler now knows that trackingNumber exists if and only if the order is shipped. Short of deliberately bypassing the checker (a type assertion or any), you cannot construct a "shipped order without a tracking number": the type doesn't admit it.

When to reach for a sum type:

  • A value is in exactly one of several mutually-exclusive states.
  • Different states carry different associated data.
  • You want the compiler to force exhaustive handling.
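
A short sketch of how the tag unlocks variant-specific data. It reuses the OrderStatus type from above; the trackingLabel function and its strings are hypothetical.

```typescript
type OrderStatus =
  | { kind: "pending" }
  | { kind: "shipped"; trackingNumber: string; shippedAt: Date }
  | { kind: "cancelled"; reason: string };

function trackingLabel(status: OrderStatus): string {
  if (status.kind === "shipped") {
    // Inside this branch the compiler has narrowed `status` to the shipped
    // variant, so trackingNumber is known to exist and to be a string.
    return `tracking ${status.trackingNumber}`;
  }
  // Out here, reading status.trackingNumber would be a compile-time error:
  // neither remaining variant has that field.
  return "no tracking number";
}

const shippedLabel = trackingLabel({
  kind: "shipped",
  trackingNumber: "ZX9",
  shippedAt: new Date(0),
});
const pendingLabel = trackingLabel({ kind: "pending" });
```

Checking the kind field is the only way to reach the data, which is what keeps state and data from drifting apart.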

Make illegal states unrepresentable

This slogan captures the highest-payoff use of sum types. It is widely attributed to Yaron Minsky of Jane Street; paraphrased:

If your types let you build a value that shouldn't exist, you'll spend forever writing validation. Design types that don't admit illegal values in the first place.

The classic anti-pattern:

interface Order {
  status: "pending" | "shipped" | "cancelled";
  trackingNumber?: string;       // present iff shipped
  cancellationReason?: string;   // present iff cancelled
  shippedAt?: Date;              // present iff shipped
}

This shape lets you construct nonsense:

const o: Order = {
  status: "shipped",
  trackingNumber: undefined,        // legal but wrong!
  cancellationReason: "user changed mind",  // legal but contradictory!
};

The fix is a sum type. Each variant carries exactly the data that's valid for that state:

type Order =
  | { status: "pending" }
  | { status: "shipped"; trackingNumber: string; shippedAt: Date }
  | { status: "cancelled"; reason: string };

// The compiler refuses both of these:
const missingFields: Order = { status: "shipped" };  // Error: missing trackingNumber, shippedAt
const mixedFields: Order = { status: "shipped", reason: "x", trackingNumber: "y", shippedAt: new Date() };  // Error: 'reason' not allowed on 'shipped' variant

The validation logic that used to be scattered through your code is now collapsed into the type definition. The compiler does the work.
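
Data arriving from outside (a database row, a JSON payload) may still have the loose, optional-field shape. One way to keep the benefit is to convert it once, at the boundary. The sketch below assumes the optional-field version is renamed LooseOrder and introduces a hypothetical toOrder function; both names are illustrative.

```typescript
// The optional-field shape from the anti-pattern, renamed for this sketch.
interface LooseOrder {
  status: "pending" | "shipped" | "cancelled";
  trackingNumber?: string;
  cancellationReason?: string;
  shippedAt?: Date;
}

type Order =
  | { status: "pending" }
  | { status: "shipped"; trackingNumber: string; shippedAt: Date }
  | { status: "cancelled"; reason: string };

// Hypothetical boundary function: returns null when the loose value
// describes a state the strict type cannot represent.
function toOrder(raw: LooseOrder): Order | null {
  switch (raw.status) {
    case "pending":
      return { status: "pending" };
    case "shipped":
      if (raw.trackingNumber === undefined || raw.shippedAt === undefined) {
        return null;
      }
      return {
        status: "shipped",
        trackingNumber: raw.trackingNumber,
        shippedAt: raw.shippedAt,
      };
    case "cancelled":
      if (raw.cancellationReason === undefined) {
        return null;
      }
      return { status: "cancelled", reason: raw.cancellationReason };
  }
}

const rejected = toOrder({ status: "shipped" });
const accepted = toOrder({ status: "cancelled", cancellationReason: "user changed mind" });
```

Everything downstream of toOrder works with Order and never has to re-check which optional fields happen to be present.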

Exhaustiveness checking

The other big payoff of sum types: the compiler can check you've handled every variant.

function describe(o: Order): string {
  switch (o.status) {
    case "pending":   return "awaiting shipment";
    case "shipped":   return `tracking ${o.trackingNumber}`;
    case "cancelled": return `cancelled: ${o.reason}`;
  }
}

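If a future engineer adds a "refunded" variant to Order and forgets to update describe, TypeScript in strict mode (or with noImplicitReturns) will refuse to compile: the switch no longer returns on every path, and the function is declared to return a string. The bug is impossible to introduce silently.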

This is the meta-superpower of sum types: changes to the type force changes to all the code that touches it.
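
A common way to make the exhaustiveness check explicit, independent of compiler flags, is a default branch that only type-checks when no variant is left. The sketch below mirrors the describe function under the name describeOrder; assertNever is a conventional helper name, not something built into TypeScript.

```typescript
type Order =
  | { status: "pending" }
  | { status: "shipped"; trackingNumber: string; shippedAt: Date }
  | { status: "cancelled"; reason: string };

// Accepts only values of type `never`. If a variant is unhandled, the
// argument is not `never` and the call fails to type-check.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describeOrder(o: Order): string {
  switch (o.status) {
    case "pending":
      return "awaiting shipment";
    case "shipped":
      return `tracking ${o.trackingNumber}`;
    case "cancelled":
      return `cancelled: ${o.reason}`;
    default:
      // Adding a new variant to Order turns this line into a compile error.
      return assertNever(o);
  }
}

const pendingText = describeOrder({ status: "pending" });
const cancelledText = describeOrder({ status: "cancelled", reason: "out of stock" });
```

The runtime throw is a backstop for values that slipped past the type checker, for example unvalidated JSON.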

When records still win

Records aren't always the wrong answer.

  • When all fields are always present. A User with id, email, and signupDate always populated — there's no "user without an id" state. Record is the right shape.
  • When the cardinality is huge. "A point in N-dimensional space" or "a JSON object with arbitrary keys" doesn't fit sum types — too many cases.
  • When the variants are open-ended. "Any HTTP error" might be 50+ status codes; sum-typing every one is overkill. Use a record with code + message.
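
As a sketch of the open-ended case, here is an HTTP error kept as a plain record. The HttpError name and the retry rule in isRetryable are assumptions chosen for illustration, not a recommended policy.

```typescript
// A record with code + message covers every status code without one variant each.
interface HttpError {
  code: number;
  message: string;
}

// Hypothetical policy: retry on 429 and on any 5xx response.
function isRetryable(err: HttpError): boolean {
  return err.code === 429 || (err.code >= 500 && err.code < 600);
}

const unavailable: HttpError = { code: 503, message: "Service Unavailable" };
const notFound: HttpError = { code: 404, message: "Not Found" };

const retryUnavailable = isRetryable(unavailable);
const retryNotFound = isRetryable(notFound);
```

Logic that works on ranges of codes like this would be awkward to express against dozens of individual variants.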

Collections

The fourth shape, less foundational but everywhere:

Collection          Use for
List / array        Ordered, possibly-many of the same type
Map / dict / hash   Key-keyed lookup
Set                 Membership check, deduplication
Tree / graph        Hierarchical or networked data

Pick the collection by access pattern. List for "iterate in order"; map for "look up by key"; set for "is X in here?".
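
The same data can sit in all three collections, each answering a different question. A small sketch with made-up visit data:

```typescript
// List: preserves order and duplicates.
const visits: string[] = ["ana", "bo", "ana", "cy", "bo"];

// Map: look up a value by key (here, a visit count per name).
const counts = new Map<string, number>();
for (const name of visits) {
  counts.set(name, (counts.get(name) ?? 0) + 1);
}

// Set: membership and deduplication.
const uniqueVisitors = new Set<string>(visits);

const firstVisitor = visits[0];            // "iterate in order"
const anaCount = counts.get("ana");        // "look up by key"
const sawCy = uniqueVisitors.has("cy");    // "is X in here?"
```

Note that counts.get returns number | undefined, so the type also records that a key may be missing.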

Composing types

Real types are usually compositions:

interface User {
  id: string;
  email: string;
  preferences: UserPreferences;          // record inside record
  recentOrders: Order[];                 // list of sum types
  twoFactor:
    | { kind: "off" }
    | { kind: "totp"; secret: string }
    | { kind: "u2f"; key: U2fKey };      // sum type as a field
}

Each leaf is a record, tuple, sum type, or collection. The composition is the type.

Reading types

Reading a complex type is a skill. The trick: read outside-in.

Map<UserId, Set<TaskId>>
  • Outside: Map<K, V> — a map.
  • Keys: UserId.
  • Values: Set<TaskId> — a set of task IDs.

So: "for each user, the set of task IDs they own."
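
A sketch of that type in use. UserId and TaskId are assumed to be plain string aliases here, and the assign and owns helpers are hypothetical.

```typescript
type UserId = string;
type TaskId = string;

const tasksByUser = new Map<UserId, Set<TaskId>>();

// Adds a task to a user's set, creating the set on first use.
function assign(userId: UserId, taskId: TaskId): void {
  let tasks = tasksByUser.get(userId);
  if (tasks === undefined) {
    tasks = new Set<TaskId>();
    tasksByUser.set(userId, tasks);
  }
  tasks.add(taskId);
}

// True only if the user exists and their set contains the task.
function owns(userId: UserId, taskId: TaskId): boolean {
  return tasksByUser.get(userId)?.has(taskId) ?? false;
}

assign("u1", "t1");
assign("u1", "t1"); // duplicate: the Set keeps a single copy
assign("u1", "t2");
```

The inner Set is what makes the duplicate assignment harmless; with an array as the value, the caller would have to check for duplicates by hand.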

Sum types compose similarly:

Result<User, ApiError>
// → either Ok(User) or Err(ApiError)
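
TypeScript has no built-in Result, so the sketch below defines one as a discriminated union. The kind tags, the trimmed-down User and ApiError shapes, and the greeting function are all assumptions for illustration.

```typescript
// One possible encoding of Result as a sum type with two variants.
type Result<T, E> =
  | { kind: "ok"; value: T }
  | { kind: "err"; error: E };

// Trimmed-down shapes, assumed for this example only.
interface User {
  id: string;
  email: string;
}
interface ApiError {
  code: number;
  message: string;
}

function greeting(result: Result<User, ApiError>): string {
  switch (result.kind) {
    case "ok":
      return `hello ${result.value.email}`;
    case "err":
      return `failed (${result.error.code}): ${result.error.message}`;
  }
}

const okText = greeting({ kind: "ok", value: { id: "u_1", email: "ada@example.com" } });
const errText = greeting({ kind: "err", error: { code: 404, message: "no such user" } });
```

Reading it outside-in still works: Result is the sum type, and its two type parameters say what each variant carries.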

Language differences

The same shapes have different names in different languages:

Concept    TypeScript            Python                                   Rust     Go
Record     interface, type       dataclass, TypedDict                     struct   struct
Tuple      [T, U]                tuple[T, U]                              (T, U)   (no first-class; use struct or [2]any)
Sum type   discriminated union   union of dataclasses + match (3.10+)     enum     (no first-class; use interface + tag field)

Go's lack of sum types is one of the language's most-felt limitations. The idiom is a struct with a kind field (or an interface with one implementing struct per variant) plus a switch: the same idea, but with no compiler exhaustiveness check.

What to internalise

  • Three shapes account for almost everything: record, tuple, sum type.
  • Records: when fields differ in role and are stable.
  • Tuples: when position is universal (math pairs).
  • Sum types: when the value is one of several alternatives — and you want the compiler to enforce exhaustive handling.
  • Make illegal states unrepresentable: design types so nonsense values can't be constructed. Most scattered validation code disappears; what remains lives at the boundary where untyped data enters.

Tools in the wild

  • TypeScript: first-class discriminated unions and exhaustiveness checking.
  • Pyright: type checker for Python with good union narrowing support.
  • Rust: enums + pattern matching are core; the compiler refuses unhandled cases.
  • Zod: runtime schema validation for TypeScript, covering record/tuple/union with parse-time guarantees.