Types & Shapes
Records, tuples, sum types — and how to make illegal states unrepresentable.
Programming languages give you a small alphabet of ways to combine data into types. Knowing which one fits your data is one of the highest-leverage skills you'll develop: pick the right shape and whole classes of bugs become impossible to write; pick the wrong one and you'll be writing validation code forever.
Three shapes
Almost every type you'll ever build is one of these three:
| Shape | What it represents | Example |
|---|---|---|
| Record (struct, object) | A fixed group of named fields | User { id, email, signupDate } |
| Tuple | A fixed group of positional values | Point = (x, y) |
| Sum type (tagged/discriminated union) | One of several alternatives | Result = Ok(value) \| Err(error) |
Plus collections (lists, maps, sets) for "many of the same thing".
Most type design boils down to picking among these.
Records — named fields
A record groups a fixed set of named fields. The name is part of the contract.
interface User {
  id: string;
  email: string;
  signupDate: Date;
}
When to reach for a record:
- The fields have meaningfully different roles (name is not interchangeable with email).
- Code accesses fields by name (user.email).
- The set of fields is stable across instances of the type.
Records are the workhorse — most domain types are records.
Tuples — positional values
A tuple is a fixed-size collection where position matters more than name.
type Point = [number, number];
type RgbColor = [number, number, number];
type DbResult = [error: Error | null, rows: Row[]];
When to reach for a tuple:
- The values have a natural order (x then y, RGB).
- The size is fixed and small (2-4 elements).
- Naming each one feels redundant.
The honest truth: in most languages, a record is a slightly better default than a tuple because field names document themselves. Reach for a tuple specifically when the positional meaning is universally understood (mathematical pairs, entries in a key-value map).
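To make the trade-off concrete, here is a minimal sketch of the same data as a tuple and as a record. The names scale, PointRecord, and scaleRecord are made up for illustration.

```typescript
// Tuple: position carries the meaning, so callers must remember the order.
type Point = [number, number];

function scale([x, y]: Point, factor: number): Point {
  return [x * factor, y * factor];
}

// Record: the same data with names, which is harder to mix up at a call site.
interface PointRecord {
  x: number;
  y: number;
}

function scaleRecord(p: PointRecord, factor: number): PointRecord {
  return { x: p.x * factor, y: p.y * factor };
}

// A key-value entry is a pair whose positional meaning is widely understood.
const entry: [string, number] = ["apples", 3];
const [fruit, count] = entry;
```

Swapping x and y in the tuple version type-checks silently; in the record version the field names make the same mistake visible in review.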
Sum types — one of several
A sum type (also called tagged union, discriminated union, algebraic data type, enum-with-data) represents exactly one of several alternatives. Each variant has its own associated data.
type OrderStatus =
  | { kind: "pending" }
  | { kind: "shipped"; trackingNumber: string; shippedAt: Date }
  | { kind: "cancelled"; reason: string };
This is the shape most engineers underuse. The reason it matters: the compiler now knows that trackingNumber exists IF AND ONLY IF the order is shipped. You literally cannot construct a "shipped order without a tracking number" — the type doesn't admit it.
When to reach for a sum type:
- A value is in exactly one of several mutually-exclusive states.
- Different states carry different associated data.
- You want the compiler to force exhaustive handling.
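A short sketch of how that plays out when reading a value: checking the kind tag narrows the type, and variant-specific fields become available only inside the matching branch. The helper name trackingNumberOf is hypothetical.

```typescript
type OrderStatus =
  | { kind: "pending" }
  | { kind: "shipped"; trackingNumber: string; shippedAt: Date }
  | { kind: "cancelled"; reason: string };

// Hypothetical helper: returns the tracking number only when one can exist.
function trackingNumberOf(status: OrderStatus): string | null {
  if (status.kind === "shipped") {
    // In this branch the compiler has narrowed `status` to the shipped
    // variant, so `trackingNumber` is known to be a string.
    return status.trackingNumber;
  }
  // Here `status` is pending or cancelled; reading status.trackingNumber
  // would be a compile-time error.
  return null;
}
```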
Make illegal states unrepresentable
This slogan captures the highest-payoff use of sum types. It is usually credited to Yaron Minsky of Jane Street; paraphrased:
If your types let you build a value that shouldn't exist, you'll spend forever writing validation. Design types that don't admit illegal values in the first place.
The classic anti-pattern:
interface Order {
  status: "pending" | "shipped" | "cancelled";
  trackingNumber?: string; // present iff shipped
  cancellationReason?: string; // present iff cancelled
  shippedAt?: Date; // present iff shipped
}
This shape lets you construct nonsense:
const o: Order = {
  status: "shipped",
  trackingNumber: undefined, // legal but wrong!
  cancellationReason: "user changed mind", // legal but contradictory!
};
The fix is a sum type. Each variant carries exactly the data that's valid for that state:
type Order =
  | { status: "pending" }
  | { status: "shipped"; trackingNumber: string; shippedAt: Date }
  | { status: "cancelled"; reason: string };
// The compiler refuses both of these object literals:
const missingFields: Order = { status: "shipped" }; // Error: missing trackingNumber, shippedAt
const extraField: Order = { status: "shipped", reason: "x", trackingNumber: "y", shippedAt: new Date() }; // Error: 'reason' not allowed on 'shipped' variant
The validation logic that used to be scattered through your code is now collapsed into the type definition. The compiler does the work.
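For contrast, here is a sketch of the kind of runtime check the optional-field shape forces on you. LooseOrder and isConsistent are hypothetical names for this illustration; the rules encoded are just the "present iff" comments from the anti-pattern above.

```typescript
// The anti-pattern shape, renamed LooseOrder to keep it distinct.
interface LooseOrder {
  status: "pending" | "shipped" | "cancelled";
  trackingNumber?: string;
  cancellationReason?: string;
  shippedAt?: Date;
}

// Every consumer has to run a check like this, or trust that someone else did.
function isConsistent(o: LooseOrder): boolean {
  const hasAllShipping = o.trackingNumber !== undefined && o.shippedAt !== undefined;
  const hasAnyShipping = o.trackingNumber !== undefined || o.shippedAt !== undefined;
  const hasReason = o.cancellationReason !== undefined;
  switch (o.status) {
    case "pending":
      return !hasAnyShipping && !hasReason;
    case "shipped":
      return hasAllShipping && !hasReason;
    case "cancelled":
      return hasReason && !hasAnyShipping;
  }
}
```

With the sum-type version of Order, none of this function needs to exist: each combination it rejects is one the type cannot express.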
Exhaustiveness checking
The other big payoff of sum types: the compiler can check you've handled every variant.
function describe(o: Order): string {
  switch (o.status) {
    case "pending": return "awaiting shipment";
    case "shipped": return `tracking ${o.trackingNumber}`;
    case "cancelled": return `cancelled: ${o.reason}`;
  }
}
If a future engineer adds a "refunded" variant to Order and forgets to update describe, the function can fall off the end of the switch without returning a string. TypeScript rejects that in strict mode (and under noImplicitReturns), so the bug cannot be introduced silently.
This is the meta-superpower of sum types: changes to the type force changes to all the code that touches it.
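A common way to make the exhaustiveness check explicit, rather than relying on the return type, is a helper that only accepts never. The sketch below assumes the Order sum type from earlier; assertNever is a conventional name rather than a built-in, and the function is called describeOrder here only to avoid clashing with test-framework globals.

```typescript
type Order =
  | { status: "pending" }
  | { status: "shipped"; trackingNumber: string; shippedAt: Date }
  | { status: "cancelled"; reason: string };

// Accepts only `never`, so passing an unhandled variant fails to type-check.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describeOrder(o: Order): string {
  switch (o.status) {
    case "pending":
      return "awaiting shipment";
    case "shipped":
      return `tracking ${o.trackingNumber}`;
    case "cancelled":
      return `cancelled: ${o.reason}`;
    default:
      // If Order gains a variant, `o` is no longer `never` here and
      // this line becomes a compile error.
      return assertNever(o);
  }
}
```

The default branch also gives you a runtime failure if a value that bypassed the type checker (for example, unvalidated JSON) reaches the switch.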
When records still win
Records aren't always the wrong answer.
- When all fields are always present. A User with id, email, and signupDate always populated: there's no "user without an id" state. Record is the right shape.
- When the cardinality is huge. "A point in N-dimensional space" or "a JSON object with arbitrary keys" doesn't fit sum types; there are too many cases.
- When the variants are open-ended. "Any HTTP error" might be 50+ status codes; sum-typing every one is overkill. Use a record with code + message.
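The open-ended case can be sketched as a single record whose behaviour is decided by ranges of the code. HttpError and isRetryable are hypothetical names, and the retry rule is only an illustrative assumption, not a recommendation.

```typescript
// One record shape covers every status code.
interface HttpError {
  code: number;
  message: string;
}

// Assumed policy for illustration: retry on 429 and on any 5xx response.
function isRetryable(err: HttpError): boolean {
  return err.code === 429 || (err.code >= 500 && err.code < 600);
}
```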
Collections
The fourth shape, less foundational but everywhere:
| Collection | Use for |
|---|---|
| List / array | Ordered, possibly-many of the same type |
| Map / dict / hash | Lookup by key |
| Set | Membership check, deduplication |
| Tree / graph | Hierarchical or networked data |
Pick the collection by access pattern. List for "iterate in order"; map for "look up by key"; set for "is X in here?".
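A quick sketch of the three access patterns side by side, using TypeScript's built-in array, Map, and Set. The variable names and sample data are invented for the example.

```typescript
// List: order matters and we iterate front to back.
const queue: string[] = ["build", "test", "deploy"];
const firstStep = queue[0];

// Map: look up a value by key.
const emailById = new Map<string, string>([
  ["u1", "ada@example.com"],
  ["u2", "alan@example.com"],
]);
const adaEmail = emailById.get("u1"); // undefined if the key is absent

// Set: membership checks and deduplication.
const seenTags = new Set<string>(["urgent", "bug", "urgent"]);
const tagCount = seenTags.size; // the duplicate "urgent" is stored once
const hasBug = seenTags.has("bug");
```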
Composing types
Real types are usually compositions:
interface User {
  id: string;
  email: string;
  preferences: UserPreferences; // record inside record
  recentOrders: Order[]; // list of sum types
  twoFactor:
    | { kind: "off" }
    | { kind: "totp"; secret: string }
    | { kind: "u2f"; key: U2fKey }; // sum type as a field
}
Each leaf is a record, tuple, sum type, or collection. The composition is the type.
Reading types
Reading a complex type is a skill. The trick: read outside-in.
Map<UserId, Set<TaskId>>
- Outside: Map<K, V>, a map.
- Keys: UserId.
- Values: Set<TaskId>, a set of task IDs.
So: "for each user, the set of task IDs they own."
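Writing to such a type follows the same outside-in order. A minimal sketch, assuming UserId and TaskId are plain string aliases and using a hypothetical assignTask helper:

```typescript
// Assumed for illustration: both IDs are plain strings.
type UserId = string;
type TaskId = string;

// Hypothetical helper: records that `user` owns `task`.
function assignTask(
  owned: Map<UserId, Set<TaskId>>,
  user: UserId,
  task: TaskId,
): void {
  // Outer layer: look up the user's entry in the map.
  let tasks = owned.get(user);
  if (tasks === undefined) {
    tasks = new Set<TaskId>();
    owned.set(user, tasks);
  }
  // Inner layer: add to that user's set (duplicates are ignored).
  tasks.add(task);
}
```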
Sum types compose similarly:
Result<User, ApiError>
// → either Ok(User) or Err(ApiError)
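TypeScript has no built-in Result, so the sketch below hand-rolls one as a discriminated union. The kind tags, the User and ApiError field sets, and the greeting function are all assumptions made for the example.

```typescript
// A hand-rolled Result: either a success carrying T or a failure carrying E.
type Result<T, E> =
  | { kind: "ok"; value: T }
  | { kind: "err"; error: E };

// Placeholder shapes, assumed for illustration.
interface User {
  id: string;
  email: string;
}
interface ApiError {
  code: number;
  message: string;
}

function greeting(r: Result<User, ApiError>): string {
  if (r.kind === "ok") {
    // Narrowed to the success variant: `value` is a User.
    return `hello ${r.value.email}`;
  }
  // Narrowed to the failure variant: `error` is an ApiError.
  return `failed (${r.error.code}): ${r.error.message}`;
}
```

Read outside-in, Result<User, ApiError> is "a Result" first, and the type parameters then say what each branch carries.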
Language differences
The same shapes have different names in different languages:
| Concept | TypeScript | Python | Rust | Go |
|---|---|---|---|---|
| Record | interface, type | dataclass, TypedDict | struct | struct |
| Tuple | [T, U] | tuple[T, U] | (T, U) | (no first-class; use struct or [2]any) |
| Sum type | discriminated union | union of dataclasses + match (3.10+) | enum | (no first-class; use interface + tag field) |
Go's lack of sum types is one of the language's most-felt limitations. The idiom is a record with a kind field plus a switch: the same idea, but without a compiler exhaustiveness check.
What to internalise
- Three shapes account for almost everything: record, tuple, sum type.
- Records: when fields differ in role and are stable.
- Tuples: when position is universal (math pairs).
- Sum types: when the value is one of several alternatives — and you want the compiler to enforce exhaustive handling.
- Make illegal states unrepresentable: design types so nonsense values can't be constructed, and much of the validation code disappears.
Tools in the wild
- TypeScript (library, free tier): First-class discriminated unions and exhaustiveness checking.
- Pyright (CLI, free tier): Type checker for Python with good union narrowing support.
- Rust (CLI, free tier): Enums + pattern matching are core; the compiler refuses unhandled cases.
- Zod (library, free tier): Runtime schema validation for TypeScript; record/tuple/union with parse-time guarantees.