databases · level 6

Query Optimization

EXPLAIN, plan reading, and the rewrites that beat the planner.

200 XP

Query Optimization

Most slow queries are slow for boring reasons. The planner picked a sequential scan because there's no index, or it estimated 100 rows when there are 100 million. The skill is reading EXPLAIN, recognising the pattern, and knowing the rewrite.

Analogy

A query plan is a recipe. EXPLAIN shows you the recipe the chef plans to follow; EXPLAIN ANALYZE is the chef cooking it once and reporting back how long each step actually took. Most "slow recipe" complaints are someone trying to make a stew from scratch every dinner instead of pre-prepping the stock once and reusing it (an index). Or the chef thinks they're cooking for 4 when actually it's a banquet for 400 (stale statistics). Or the recipe says "blend with rice from the rice bag" but the rice bag is at the bottom of a 50-foot warehouse (Seq Scan when an index exists). Once you can read the recipe, the fix is usually obvious.

EXPLAIN — read the plan

EXPLAIN <query> shows you the planner's chosen execution tree. EXPLAIN ANALYZE <query> runs the query and adds actual measured timings.

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT u.name, COUNT(*)
FROM users u JOIN posts p ON p.user_id = u.id
WHERE u.country = 'GB'
GROUP BY u.name
ORDER BY COUNT(*) DESC
LIMIT 10;

Sample output (Postgres):

Limit  (cost=812..812 rows=10) (actual time=24.3..24.4 rows=10 loops=1)
  ->  Sort  (cost=812..820 rows=3210) (actual time=24.2..24.3 rows=3210)
        Sort Key: (count(*)) DESC
        ->  HashAggregate  (cost=...)
              ->  Hash Join  (cost=...)
                    Hash Cond: (p.user_id = u.id)
                    ->  Seq Scan on posts p  (cost=... rows=1000000)
                    ->  Hash  (cost=...)
                          ->  Index Scan using idx_users_country on users u
                                Index Cond: (country = 'GB'::text)
Planning Time: 0.3 ms
Execution Time: 24.6 ms

How to read it:

  1. Top-down is the order of nesting, not execution. The Limit node consumes from Sort, which consumes from HashAggregate, etc. Leaf nodes execute first.
  2. Each line shows (planner estimate) and (actual) counts and times. When they disagree wildly, the planner is wrong.
  3. Seq Scan = full table scan. Index Scan = used the index. Index Only Scan = the index alone has all needed columns. Bitmap Index Scan + Bitmap Heap Scan = build a bitmap of matching rows from one or more indexes, then fetch the rows.
  4. Joins: Hash Join (build hash of small side, probe by large), Merge Join (both sides sorted), Nested Loop (one row at a time — ouch on large data without an index).
  5. BUFFERS shows shared buffer hits / reads / dirties. High read = your data isn't cached.

The patterns to recognise

Seq Scan on a large table for a selective WHERE

->  Seq Scan on users u  (cost=0..150000 rows=100)
      Filter: (email = 'alice@example.com'::text)
      Rows Removed by Filter: 9999900

The planner read 10M rows to find 1. Fix: an index on email.

CREATE INDEX idx_users_email ON users (email);

After: Index Scan in single-digit ms.

Nested Loop with high loops=

->  Nested Loop  (cost=...)
    ->  Seq Scan on orders o
    ->  Index Scan on customers c (cost=... loops=1000000)

loops=1000000 means the inner side ran a million times. If the inner-side index isn't sized for it, the join is O(n × log m). For mid-sized joins this is fine; for big ones, push the planner toward Hash Join — usually by adding statistics or by ANALYZEing.

Estimate vs actual diverges

->  Index Scan  (cost=... rows=10) (actual rows=1000000)

The planner thought 10 rows; actually a million. It made a bad downstream choice (probably a Nested Loop) on this estimate. Fix: ANALYZE table_name; to refresh statistics. If statistics aren't enough, increase default_statistics_target for the column (Postgres) or create extended statistics.

Function in WHERE prevents index use

WHERE LOWER(email) = 'alice@example.com'    -- no index help

The planner can't use an index on email because the predicate is on LOWER(email). Two fixes:

-- Functional index
CREATE INDEX idx_users_lower_email ON users (LOWER(email));

-- Or, better, store data already-lowercased and skip LOWER() at query time.

Same trap with WHERE created_at::date = '2024-01-15' — the cast prevents index use. Rewrite to a range:

WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16'

The high-leverage rewrites

Correlated subquery → JOIN

-- BEFORE: inner subquery runs once per outer row
SELECT * FROM orders o
WHERE o.user_id IN (
  SELECT u.id FROM users u WHERE u.tier = 'premium'
);

-- AFTER: planner picks hash / merge / nested-loop
SELECT o.*
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.tier = 'premium';

Modern planners often handle this rewrite themselves, but not always. EXPLAIN both versions to confirm.

OR across columns → UNION of index scans

-- BEFORE: can't use a single composite index
SELECT * FROM tickets WHERE status = 'open' OR priority = 'high';

-- AFTER: each branch uses its own index, UNION dedups
SELECT * FROM tickets WHERE status   = 'open'
UNION
SELECT * FROM tickets WHERE priority = 'high';

Postgres can sometimes do this automatically via "Bitmap OR" — confirm in EXPLAIN.

SELECT * → only-needed-columns

-- BEFORE — pulls every column, including the giant `body` text
SELECT * FROM articles WHERE author_id = ?;

-- AFTER — covering index scan, no row visit
SELECT id, title, published_at FROM articles WHERE author_id = ?;

If the index covers all selected columns, Postgres can do an Index Only Scan — never touches the heap. 10–100× faster for narrow projections.

Pagination with OFFSET → keyset (cursor)

-- BEFORE — OFFSET 100000 reads + discards 100000 rows
SELECT * FROM events ORDER BY created_at DESC OFFSET 100000 LIMIT 50;

-- AFTER — keyset cursor, O(50) regardless of page
SELECT * FROM events
WHERE created_at < $cursor_created_at
ORDER BY created_at DESC LIMIT 50;

OFFSET is a foot-gun on large tables. Keyset cursors are the production-correct pattern.

LIKE 'foo%' works; LIKE '%foo%' doesn't

A leading-anchored LIKE can use a B-tree index. A floating one can't. Use a full-text index (GIN + tsvector) for substring search.

When the planner gets it wrong

The planner makes decisions based on statistics it gathers about each column: row count, distinct values, histogram of value distribution. Statistics get stale.

Symptoms:

  • EXPLAIN's rows= estimate is off by an order of magnitude from actual rows.
  • The plan has changed unexpectedly after a data load.
  • A query that was fast yesterday is slow today with no schema change.

Fixes:

  1. ANALYZE table_name; — recompute statistics. Cheap; no lock.
  2. VACUUM ANALYZE — same plus dead-tuple cleanup.
  3. Increase the histogram resolution: ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 1000;
  4. Extended statistics for correlated columns: CREATE STATISTICS ON (col_a, col_b) FROM tbl; — when two columns are highly correlated, the planner otherwise multiplies independent selectivities and gets it wrong.
  5. Configure auto-vacuum so this happens automatically (it does by default in Postgres, but tune the thresholds).

Per-database flavour notes

  • Postgres: EXPLAIN (ANALYZE, BUFFERS) is the gold standard. pg_stat_statements shows query history with average time, calls, total time.
  • MySQL/MariaDB: EXPLAIN ANALYZE (8.0+); EXPLAIN FORMAT=TREE for tree shape.
  • SQLite: EXPLAIN QUERY PLAN — much simpler output.
  • MS SQL Server: SET STATISTICS PROFILE ON or graphical "Estimated / Actual Execution Plan" in SSMS.
  • Oracle: EXPLAIN PLAN FOR ...; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

The discipline transfers; the syntax is per-vendor.

A diagnostic loop

When a query is slow:

  1. Capture the actual SQL (don't tune what your ORM logs — get the real text it sent).
  2. EXPLAIN ANALYZE with realistic parameters.
  3. Read the plan top-down, watching for: Seq Scans on big tables, estimate≠actual, high loops on Nested Loop, function-in-WHERE patterns.
  4. If estimates are wrongANALYZE first.
  5. If a Seq Scan is the bottleneck → consider an index; check pg_stat_user_indexes for unused ones first to avoid bloat.
  6. If a Nested Loop is slow → ensure the inner side has a usable index, or push the planner toward Hash Join.
  7. If joins look reasonable but rows are still huge → consider rewriting the query, materialising a view, or adding a covering index.
  8. Re-EXPLAIN to confirm the plan changed and the cost dropped.

Most slow-query fixes follow this loop in 15 minutes. The hard ones are the other 5%.

Tools in the wild

5 tools
  • The reference docs; pair with `(ANALYZE, BUFFERS)` for actual-vs-estimated timings.

    spec
  • Visualises Postgres EXPLAIN output; highlights the slow rows.

    service
  • AI-assisted query-plan review with concrete recommendations.

    service
  • Logs the plan of any query exceeding a threshold; great for capturing prod outliers.

    library
  • MySQL's equivalent — the columns differ but the discipline is the same.

    spec