Database Indexes
Prompt
A query that filters on a column is doing a full table scan and it's slow. Walk me through what adding an index actually changes — and then talk me through what it costs you everywhere else.
How this round runs
This is a conversation that escalates — I'll keep pushing into "and what happens when…" until I find the edge of what you know. I want the mechanism, not just "indexes make reads faster," and I'm watching whether you can reason about the column-order and write-cost trade-offs rather than reciting that more indexes are better.
Model answer
I'll go from the read path, to the structure that makes it fast, to what that structure
costs on writes. A full scan is O(n) — it touches every row. A B-tree index turns the
lookup into O(log n): it's a balanced, sorted, high-fanout tree, so for a million rows
you walk maybe three or four levels instead of scanning a million rows. The "balanced"
part is the load-bearing detail — it's why the lookup cost stays logarithmic as the table
grows instead of degrading.
The consequence people skip is the write cost. The index is a second sorted structure
that has to stay consistent with the table, so every INSERT, UPDATE of an indexed
column, and DELETE now also mutates the tree — extra page writes, occasional node splits,
and more pages to keep in cache. So an index is a read-speed-for-write-cost trade, and on a
write-heavy or append-only table the maintenance can cost more than the scan it saves.
Two more things govern whether it even helps. Selectivity: an index on a low-cardinality
column like a boolean flag barely narrows the search, so the planner may ignore it and scan
anyway. And ordering on composite indexes — an index on (a, b) is sorted by a first, so
it serves WHERE a = ? and WHERE a = ? AND b = ?, but not a query on b alone, because
b isn't contiguous in the tree. My honest boundary: whether the planner actually picks the
index is a cost-estimate decision, and stale statistics can make it choose wrong even when a
good index exists.
- Walked read path → structure → write cost in order, instead of just 'indexes speed up reads'
- Named the B-tree's balanced/logarithmic property as why lookup scales, and went to page splits on writes
- Framed it as an explicit read-vs-write trade and named when NOT to index (write-heavy, low selectivity)
- Got composite-index column order right: (a,b) serves a and a+b, not b alone
- Honest boundary: the planner's cost estimate, driven by statistics, decides whether the index is used
- You have a composite index on (a, b) — which queries does it accelerate and which does it not? → leftmost-prefix: a and a+b yes, b alone no
- And what happens when the query selects only columns that are all in the index? → covering index: the index alone answers it, the table heap is never touched
- When would you deliberately NOT add an index even though a column is queried? → low selectivity, or a write-heavy table where maintenance cost dominates