Slow SQL Query
Prompt
This query is in a PR for the dashboard's "recent active users and their posts" panel. It renders on every dashboard load — call it a few hundred requests a minute at peak — and the author says it "runs fine locally." Review it. Would you approve, and how much does the hot-path framing change your answer?
SELECT u.id, u.name, p.id, p.title, p.created_at
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01';
How this round runs
Rank what you find by severity, factor in that this runs hundreds of times a minute, and give an approve / approve-with-follow-up / block call. Say what you'd ask for before trusting it in production.
Model answer
Approve only with a required follow-up before it ships to prod — the query is correct, but its performance profile is unverified on a hot path.
1 (highest — full table scan on the filter). There is almost certainly no
index on users.created_at, so the planner scans the entire users table to
find rows after the cutoff. "Runs fine locally" is the tell: a dev DB has a few
thousand rows where a seq scan is instant; production has millions where the same
scan is O(n) per request. At a few hundred req/min that scan runs hundreds of
times a minute, so this is a real latency-and-load finding, not a micro-opt.
2 (join key). If posts.user_id is unindexed, the join degrades too — every
qualifying user triggers a scan or an expensive hash build on posts. Both
indexes matter; the filter one matters most.
Root cause is missing indexes on the columns the query filters and joins on — not
the query text itself, which is fine. The diagnosis is the part I'd insist on
before approving: don't add indexes on a hunch, run EXPLAIN ANALYZE on prod-like
data and confirm a Seq Scan on users. Then:
CREATE INDEX idx_users_created_at ON users (created_at);
CREATE INDEX idx_posts_user_id ON posts (user_id);
A covering index users (created_at, id) lets the filter and the join-key read
come from the index alone, avoiding heap lookups — worth it given the call
frequency. Two caveats I'd state, not hide: every index slows writes and costs
storage, and the indexes should be created CONCURRENTLY (Postgres) so building
them doesn't lock users on a live table.
So: approve conditional on an EXPLAIN ANALYZE showing the scan, the two indexes
added concurrently, and a before/after latency number. Without the hot-path
framing I'd be more relaxed — for a once-a-day report a brief scan is tolerable;
at hundreds of req/min it isn't, and that's what flips this from a nice-to-have
to a merge blocker on the follow-up.
- Identified the seq scan on the WHERE column as the top finding and ranked it above the join-key index
- Used the call frequency to set severity — same query, different verdict at hundreds/min vs once/day
- Insisted on EXPLAIN ANALYZE to confirm the plan before adding indexes, rather than indexing on a hunch
- Named the write/storage cost and CONCURRENTLY locking caveat — didn't present indexes as free
- Concrete verdict: approve-with-required-follow-up, with the follow-up spelled out
- Would you approve as-is? → Not on a hot path; conditional on EXPLAIN-confirmed indexes and a before/after number.
- Why a composite index over two single-column ones? → users(created_at, id) covers both the filter and the join read, avoiding heap lookups.
- What would change your mind on severity? → If this ran rarely, or on a small/bounded table, the scan is tolerable and I'd not block.