Group Anagrams
Prompt
We're deduping a big list of search terms, and we want terms that are rearrangements of the same letters bucketed together — "listen" and "silent" in one group. Given a list of strings, group the anagrams. Order of groups doesn't matter.
How this round runs
I haven't told you the alphabet or how long the strings get — clarify before coding, because that choice changes which key is cheaper, and I'll push on it once you have a working grouping.
Model answer
Clarify first: what's the character set — lowercase ASCII, full Unicode, case- sensitive? How long are the strings, and how many are there? Do we care about the group order or the order within a group?
The trick is a canonical key: two strings are anagrams exactly when some
normalized form matches. Easiest key is the sorted characters — "eat" and
"tea" both become "aet". Bucket into a hash map keyed by that. Sorting each
string is O(k log k), so total O(n * k log k) time, O(n * k) space.
function groupAnagrams(words) {
const groups = new Map();
for (const word of words) {
const key = [...word].sort().join("");
if (!groups.has(key)) groups.set(key, []);
groups.get(key).push(word);
}
return [...groups.values()];
}
If k is large, drop the sort: build a character-count key instead — a
26-slot tally for lowercase ASCII, serialized to a string. That's O(k) per word
and O(n * k) overall, beating the sort. The trade-off is the count key assumes a
small, known alphabet; the sort key works for any characters without that
assumption.
Dry-run on ["eat", "tea", "tan", "nat"]: keys aet, aet, ant, ant →
[["eat","tea"], ["tan","nat"]].
- Asked about the alphabet (ASCII vs Unicode) and string length before choosing a key
- Named the sorted-key approach and its O(n * k log k) cost
- Offered the count-key as an O(n * k) improvement when k is large
- Stated space as O(n * k) and dry-ran the bucketing
- The alphabet is huge (not 26 letters) → a fixed 26-slot array breaks; use a hashed count-map or fall back to the sort key
- Inputs are arbitrary Unicode → sort code points carefully (surrogate pairs / normalization), or normalize first then key