Binary Search
The halving walk — find 1 in a million with 20 probes.
Binary Search
Sorted data hides a superpower. If your array is in order, you never have to scan it — you can teleport to the middle, check if your target is smaller or larger, and throw away half the array in one step. Keep halving and you've reduced a million-element search to twenty comparisons.
Analogy
Imagine flipping open a phonebook looking for "Parsons". You don't start at page 1 and read every name. You drop your thumb in the middle — say you land on "Mitchell". Parsons comes after Mitchell alphabetically, so the whole first half of the book is useless. Drop your thumb in the middle of what's left — "Roberts". Parsons comes before Roberts, so the back half is useless. Two probes in, you've already cut the book to a quarter. Three more and you're on the right page.
That's binary search. The only prerequisite is that the data is already sorted.
The halving argument
On each step, binary search cuts the candidate window in half. So after k steps, the window has size n / 2^k. The search terminates when the window shrinks to 1 (or to 0 for a miss). Solving n / 2^k = 1 for k gives k = log₂(n) — hence O(log n).
A log-factor doesn't sound dramatic until you plug in numbers:
n |
Linear scan (worst) | Binary search (worst) |
|---|---|---|
| 100 | 100 | 7 |
| 10,000 | 10,000 | 14 |
| 1,000,000 | 1,000,000 | 20 |
| 1,000,000,000 | one billion | 30 |
Every time you multiply n by 1000, binary search only needs about ten more probes. That's the entire value proposition of "sorted + binary-searchable" as a data-structure shape — it's why databases build B-tree indexes, why std::map is a tree, why sorted arrays outperform hash tables for range queries.
Tracing binary search by hand
Take the sorted array [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31] (n = 16) and search for 19:
| Step | lo |
hi |
mid |
arr[mid] |
Decision | Next window |
|---|---|---|---|---|---|---|
| 1 | 0 | 15 | 7 | 15 | target > 15 → go right | [8, 15] |
| 2 | 8 | 15 | 11 | 23 | target < 23 → go left | [8, 10] |
| 3 | 8 | 10 | 9 | 19 | match! | return 9 |
Three probes to find one element in 16. Log₂(16) = 4, so we had room for one more. The pattern holds: no matter where 19 lives in a size-16 array, at most 4 probes find it.
The canonical loop (and its traps)
let lo = 0;
let hi = arr.length - 1;
while (lo <= hi) {
const mid = lo + ((hi - lo) >> 1);
if (arr[mid] === target) return mid;
if (arr[mid] < target) lo = mid + 1;
else hi = mid - 1;
}
return -1; // not found
Three things in that snippet that look trivial but are actually the entire lesson:
lo <= hi vs lo < hi. The inclusive form is the one you want when hi = length - 1; the strict form is the one you want when hi = length (a half-open convention some languages prefer). Mixing the two is the classic off-by-one that produces either an infinite loop or a missed last element.
lo + ((hi - lo) >> 1) instead of (lo + hi) / 2. The naive form can overflow when lo and hi are both large 32-bit ints. In 2006, Joshua Bloch pointed out this bug lived for nine years in the official Java Arrays.binarySearch. Write the overflow-safe form by reflex; it's no harder to read once you know what it's for.
lo = mid + 1 / hi = mid - 1, not lo = mid / hi = mid. When you've already compared arr[mid] and it wasn't the target, mid can't be the answer — so exclude it from the next window. Forgetting the ± 1 is the other classic off-by-one: the window stops shrinking and the loop runs forever.
Variants you'll hit in practice
"Does the value exist" is the textbook version. The versions you actually use in production are these:
| Variant | Returns | Use case |
|---|---|---|
| Exact match | index of target, or −1 | Dictionary lookup, "is this value present" |
| Lower bound | first index where arr[i] ≥ target |
Insert point; "first score that qualifies" |
| Upper bound | first index where arr[i] > target |
Insert-after point; half-open range end |
| Rotated array | index after an unknown pivot rotation | Classic interview question, real in log-shifted sorted buffers |
| First-bad-version | first i where pred(i) === true |
Git bisect, version rollouts, release probes |
| Answer search | smallest parameter satisfying a predicate | "What's the smallest capacity that fits?" / "What's the minimum speed?" — the predicate is monotone |
The last two are the most general: binary-search-over-a-predicate. You're not searching an array at all; you're searching a function that returns true after some point and false before. git bisect is literally this — bisecting commit history over "does this commit have the bug".
What standard libraries ship
Most languages don't ship naive binary search either — they ship variants that return insertion points, because those are strictly more useful:
| Language / lib | API | What it returns |
|---|---|---|
Python bisect |
bisect_left, bisect_right |
Lower/upper bound insertion point |
C++ <algorithm> |
std::lower_bound, std::upper_bound, std::binary_search |
Insertion iterator; binary_search returns bool only |
| Java | Arrays.binarySearch(arr, key) |
Negative value encodes insertion point when miss (-(ip) - 1) |
| Go | sort.Search(n, pred) |
Smallest i where pred(i) is true (predicate form) |
| Rust | slice::binary_search |
Result<usize, usize> — index on hit, insertion point on miss |
| JavaScript | — (no stdlib) | Write your own; Lodash has sortedIndex |
JavaScript's omission is notable — the stdlib has sort but no binary search. That's why you'll see hand-rolled binary searches throughout production JS codebases.
When binary search breaks
Binary search has a single assumption: the array is sorted according to the comparison function you're using. When that breaks, everything breaks:
- Unsorted input. Binary search on unsorted data returns nonsense — no errors, no warnings, just a wrong answer. Always confirm your data is actually sorted, especially when it's almost sorted (a single out-of-place element is enough).
- Duplicates + "any match is fine" vs "leftmost match" vs "rightmost match". The simple loop above returns some index with the matching value, but not necessarily the first or last occurrence. If you need a specific duplicate, you need
lower_bound/upper_bound. - Approximate / "closest" matches. The basic loop only finds exact hits. To find "the closest value ≤ target" you need to track the best candidate as the window shrinks; it's still O(log n), but the code is no longer a one-liner.
- Non-comparable keys. Binary search needs a total order. Objects with multiple fields need an explicit comparator that imposes one; dicts/objects with unordered keys aren't searchable at all.
Common gotchas
- Integer overflow in
mid. If you're ever working in a language with fixed-width ints (Go, Rust, C, C++, Javaint), uselo + (hi - lo) / 2. JS numbers don't have this problem for arrays but you should write it the safe way out of habit. - Search on a wrong comparator. Sorting by
(a, b) => a - band then searching with===works. Sorting by any non-numeric comparator and searching with===may or may not work depending on whether===agrees with the total order. Use the same comparator for both phases. - Binary searching a
SetorMap. Hash-backed collections have no order. Binary search needs an array (or tree) view. Don't be the person who callsbinarySearch(new Set([...]))and wonders why it's broken. - The mid index in an empty window. When
hi = lo - 1(empty window), the loop conditionlo <= hiis false, so no probe fires. That's the correct behavior — miss returns-1— but it's worth checking once by hand when implementing a variant.
Playground
Pick a target with the slider or the buttons, press Play, and watch the window close in. The [lo, hi] range is shaded; the current mid is painted in var(--color-accent). Every probe cuts the shaded region in half. On a hit, the array flashes var(--color-success); on a miss, var(--color-warning). The step counter at the bottom is the one the challenge asks about.
Visualizer
The Halving Window panel plots "how many probes to find one element in n" for both linear scan (n) and binary search (log₂ n). Toggle the scale to crank n up to one million: the linear curve runs off the top of the chart; the binary curve is still a modest slope at 20. That picture — the exponential gap between linear and log — is the intuition to carry out of this level.