arcadegame.cloud

Prompt

A query that filters on a column is doing a full table scan and it's slow. Walk me through what adding an index actually changes — and then talk me through what it costs you everywhere else.

How this round runs

This is a conversation that escalates — I'll keep pushing into "and what happens when…" until I find the edge of what you know. I want the mechanism, not just "indexes make reads faster," and I'm watching whether you can reason about the column-order and write-cost trade-offs rather than reciting that more indexes are better.

Model answer

I'll go from the read path, to the structure that makes it fast, to what that structure costs on writes. A full scan is O(n) — it touches every row. A B-tree index turns the lookup into O(log n): it's a balanced, sorted, high-fanout tree, so for a million rows you walk maybe three or four levels instead of scanning a million rows. The "balanced" part is the load-bearing detail — it's why the lookup cost stays logarithmic as the table grows instead of degrading.

The consequence people skip is the write cost. The index is a second sorted structure that has to stay consistent with the table, so every INSERT, UPDATE of an indexed column, and DELETE now also mutates the tree — extra page writes, occasional node splits, and more pages to keep in cache. So an index is a read-speed-for-write-cost trade, and on a write-heavy or append-only table the maintenance can cost more than the scan it saves.

Two more things govern whether it even helps. Selectivity: an index on a low-cardinality column like a boolean flag barely narrows the search, so the planner may ignore it and scan anyway. And ordering on composite indexes — an index on (a, b) is sorted by a first, so it serves WHERE a = ? and WHERE a = ? AND b = ?, but not a query on b alone, because b isn't contiguous in the tree. My honest boundary: whether the planner actually picks the index is a cost-estimate decision, and stale statistics can make it choose wrong even when a good index exists.

Signals — what a strong answer shows

Walked read path → structure → write cost in order, instead of just 'indexes speed up reads'
Named the B-tree's balanced/logarithmic property as why lookup scales, and went to page splits on writes
Framed it as an explicit read-vs-write trade and named when NOT to index (write-heavy, low selectivity)
Got composite-index column order right: (a,b) serves a and a+b, not b alone
Honest boundary: the planner's cost estimate, driven by statistics, decides whether the index is used

Follow-ups — where it goes next

You have a composite index on (a, b) — which queries does it accelerate and which does it not? → leftmost-prefix: a and a+b yes, b alone no
And what happens when the query selects only columns that are all in the index? → covering index: the index alone answers it, the table heap is never touched
When would you deliberately NOT add an index even though a column is queried? → low selectivity, or a write-heavy table where maintenance cost dominates

Study this: Databases

Database Indexes

Prompt

How this round runs

Model answer