← Bank
DSA

Group Anagrams

DSAMid~12m
hash-mapsortingstrings

Prompt

We're deduping a big list of search terms, and we want terms that are rearrangements of the same letters bucketed together — "listen" and "silent" in one group. Given a list of strings, group the anagrams. Order of groups doesn't matter.

How this round runs

I haven't told you the alphabet or how long the strings get — clarify before coding, because that choice changes which key is cheaper, and I'll push on it once you have a working grouping.

Model answer

Clarify first: what's the character set — lowercase ASCII, full Unicode, case- sensitive? How long are the strings, and how many are there? Do we care about the group order or the order within a group?

The trick is a canonical key: two strings are anagrams exactly when some normalized form matches. Easiest key is the sorted characters"eat" and "tea" both become "aet". Bucket into a hash map keyed by that. Sorting each string is O(k log k), so total O(n * k log k) time, O(n * k) space.

function groupAnagrams(words) {
  const groups = new Map();
  for (const word of words) {
    const key = [...word].sort().join("");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(word);
  }
  return [...groups.values()];
}

If k is large, drop the sort: build a character-count key instead — a 26-slot tally for lowercase ASCII, serialized to a string. That's O(k) per word and O(n * k) overall, beating the sort. The trade-off is the count key assumes a small, known alphabet; the sort key works for any characters without that assumption.

Dry-run on ["eat", "tea", "tan", "nat"]: keys aet, aet, ant, ant[["eat","tea"], ["tan","nat"]].

Signals — what a strong answer shows
  • Asked about the alphabet (ASCII vs Unicode) and string length before choosing a key
  • Named the sorted-key approach and its O(n * k log k) cost
  • Offered the count-key as an O(n * k) improvement when k is large
  • Stated space as O(n * k) and dry-ran the bucketing
Follow-ups — where it goes next
  • The alphabet is huge (not 26 letters) → a fixed 26-slot array breaks; use a hashed count-map or fall back to the sort key
  • Inputs are arbitrary Unicode → sort code points carefully (surrogate pairs / normalization), or normalize first then key