← Bank
Project Deep-Dive

Leading a Major Refactor or Migration

Project Deep-DiveSenior~12m
refactoringrisk-managementleadership

Prompt

Tell me about a large refactor or migration you drove. Something with real blast radius — where getting it wrong would have hurt — and how you brought it in safely.

How this round runs

I'll follow your story and drill on risk: how you sequenced a big change so it stayed reversible, how you'd have known early if it was going wrong, and what you'd do differently now. Pick a migration you actually drove, not one you were assigned a slice of.

Model answer

Selection matters here especially: pick a migration you drove — you set the sequencing, owned the rollback story, made the call on when to advance — not one where you ported a few files of someone else's plan. Lead with the motivation as a risk, not a preference: "our custom token validation was hand-rolled crypto; one bad assumption and every session was forgeable", not "we wanted something cleaner". That framing tells the interviewer you migrate because the status quo is dangerous, which is when the work is justified.

The substance is risk management, and the spine is: make the big change a sequence of small reversible steps, each independently safe to ship and roll back. Run old and new in parallel before you cut over (dual-write, dual-read, or accept-either), so divergence is observable rather than catastrophic. Shift traffic behind a flag — a canary first, then a ramp — with the metric that tells you it's going wrong before customers do, and a kill switch that needs no deploy. The drill is "what would you do differently?" — answer with a specific de-risking lesson, not regret: maybe the dual-write window was too long and you'd have built the reconciliation check first, or you'd have picked a smaller blast-radius service as the canary. Close with the real outcome quantified — "5M users over six weeks, error rate flat, zero unplanned downtime" — and keep the decisions first-person.

Signals — what a strong answer shows
  • Chose a migration they drove — owned sequencing and the rollback story, not a slice of someone else's plan
  • Framed the motivation as a risk being retired, not a preference
  • Sequenced it into small reversible steps, each safe to ship and roll back alone
  • Ran old and new in parallel so divergence was observable, with a flag, canary, and kill switch
  • Named the early-warning metric and quantified the outcome
Follow-ups — where it goes next
  • 'How did you de-risk it?' → small reversible steps, parallel run, canary then ramp, a metric that warns before customers do, a no-deploy kill switch
  • 'How would you have known early if it was going wrong?' → the specific divergence or error-rate signal watched during the ramp
  • 'What would you do differently?' → a concrete de-risking change (build the reconciliation check first, smaller canary), not vague regret