Ticket Triage & Structured Troubleshooting
Priority vs severity, SLA clocks, and the triage questions that separate one-line tickets from hours of guesswork.
Tickets are how a company's pain reaches IT. On a given day you might see a prod outage, a phishing report, a password reset, a printer on fire, and the CEO saying "email is broken." Triage is the job of turning that unordered stream into a queue where the right thing gets worked first and nothing slips through the SLA.
Severity vs priority — two different questions
The single biggest trap in triage is treating severity and priority as the same number. They're different questions:
- Severity answers: how broken is the underlying system? S1 is prod down or data loss. S2 is a major degradation for a group. S3 is one user blocked with a workaround. S4 is cosmetic.
- Priority answers: how fast must we respond? P0 is "page someone now." P1 is "within the hour." P2 is "today." P3 is "this week." P4 is "when we get around to it."
The same failure can land in different priority columns. If the CEO's email breaks at quarter-close with the board waiting for numbers, that's still technically S3 (one user blocked), but it's P0, because the business impact of doing nothing for even 20 minutes is enormous. If the same auth path breaks for an intern on their second day, the severity is unchanged and the priority is P3.
| | S1 (prod down) | S2 (group degraded) | S3 (user blocked) | S4 (cosmetic) |
|---|---|---|---|---|
| P0 (now) | Pager event | Exec-impact outage | CEO locked out | Very unusual |
| P1 (<1h) | Sev1 runbook | Security incident | Phishing triage | — |
| P2 (<4h) | Rare | Printer floor | VPN for a user | — |
| P3 (<1d) | — | — | Password reset | Wrong font |
| P4 (<2d) | — | — | — | Slow laptop |
The diagonal is the "expected" region; the off-diagonal cells are the judgement calls.
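One way to keep the two questions apart in tooling is to store severity and priority as separate fields and flag any pairing that leaves the diagonal. The sketch below is illustrative only: the `EXPECTED` mapping is an assumption read off the diagonal of the table above, and `Triage` and `is_judgement_call` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    S1 = 1  # prod down or data loss
    S2 = 2  # major degradation for a group
    S3 = 3  # one user blocked, workaround exists
    S4 = 4  # cosmetic


class Priority(IntEnum):
    P0 = 0  # page someone now
    P1 = 1  # within the hour
    P2 = 2  # today
    P3 = 3  # this week
    P4 = 4  # when we get around to it


# Assumed "expected" pairing, taken from the diagonal of the matrix.
EXPECTED = {
    Severity.S1: Priority.P0,
    Severity.S2: Priority.P1,
    Severity.S3: Priority.P2,
    Severity.S4: Priority.P3,
}


@dataclass(frozen=True)
class Triage:
    severity: Severity
    priority: Priority

    def is_judgement_call(self) -> bool:
        """True when the priority differs from the expected diagonal cell."""
        return EXPECTED[self.severity] != self.priority


ceo_lockout = Triage(Severity.S3, Priority.P0)      # same severity...
intern_lockout = Triage(Severity.S3, Priority.P3)   # ...different priority
```

Both lockouts share a severity and both count as judgement calls, which is the point: the priority field carries the business context that severity alone cannot.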
SLA clocks — first-response vs resolution
Every priority comes with an SLA (Service Level Agreement) — a contract with the reporter about when they'll hear back. Two separate clocks:
- First-response SLA — how quickly someone acknowledges the ticket. This is about being a good citizen; a silent queue is worse than a slow queue.
- Resolution SLA — how quickly the issue is actually fixed. Much harder to meet because it depends on the nature of the problem.
SLAs pause when the ticket is blocked on the user (waiting for information, asking them to try something). They resume when the user replies. A ticket that sits in "waiting on customer" for three days isn't an SLA breach — it's a customer who took three days to reply.
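The pause/resume behaviour can be sketched as a clock that only accumulates time while the ticket is in IT's hands. This is a minimal illustration, not how any particular helpdesk product implements it; `SlaClock` and its methods are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional


class SlaClock:
    """Accumulates elapsed time only while the ticket is not waiting on the user."""

    def __init__(self, started_at: datetime) -> None:
        self._running_since: Optional[datetime] = started_at
        self._elapsed = timedelta(0)

    def pause(self, now: datetime) -> None:
        # Bank the time accrued so far and stop the clock.
        if self._running_since is not None:
            self._elapsed += now - self._running_since
            self._running_since = None

    def resume(self, now: datetime) -> None:
        # Restart the clock when the user replies.
        if self._running_since is None:
            self._running_since = now

    def elapsed(self, now: datetime) -> timedelta:
        if self._running_since is None:
            return self._elapsed
        return self._elapsed + (now - self._running_since)


opened = datetime(2024, 1, 1, 9, 0)
clock = SlaClock(opened)
clock.pause(opened + timedelta(minutes=30))   # asked the user for details
clock.resume(opened + timedelta(days=3))      # user replied three days later
used = clock.elapsed(opened + timedelta(days=3, minutes=15))  # 45 minutes
```

Three calendar days passed, but only 45 minutes count against the SLA.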
Common values in helpdesk systems:
- P0 → 15-minute first response
- P1 → 1-hour first response
- P2 → 4-hour first response, same-business-day resolution
- P3 → next-business-day first response
- P4 → informational, 48-hour acknowledgement
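As a rough sketch, the values above can be written as a lookup table and used to compute a first-response deadline. The names `FIRST_RESPONSE_SLA`, `first_response_due`, and `is_breached` are hypothetical, and the code deliberately ignores business hours, so "next business day" is simplified to 24 hours.

```python
from datetime import datetime, timedelta

# First-response targets from the list above.
FIRST_RESPONSE_SLA = {
    "P0": timedelta(minutes=15),
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=4),
    "P3": timedelta(days=1),    # simplification of "next business day"
    "P4": timedelta(hours=48),
}


def first_response_due(priority: str, opened_at: datetime) -> datetime:
    """Deadline for the first acknowledgement (wall-clock, no business calendar)."""
    return opened_at + FIRST_RESPONSE_SLA[priority]


def is_breached(priority: str, opened_at: datetime, now: datetime) -> bool:
    """True once the first-response deadline has passed without a reply."""
    return now > first_response_due(priority, opened_at)
```

A real system would also subtract paused time and apply a working-hours calendar before comparing against the deadline.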
Intake questions — the triage interview
A triage interview is short and mechanical. Three questions almost always move a ticket forward:
- What changed? — new laptop, new app version, password rotation, upstream maintenance. 80% of tickets are caused by something that changed recently.
- When did it start? — maps to a deploy, an incident, a scheduled job. Bounds the investigation.
- Who is affected? — just you, your team, your floor, the whole company. Determines severity and sometimes priority in one question.
Missing one of these is the single biggest reason for "could you send more detail?" back-and-forth. Ask all three in your first reply.
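The three questions lend themselves to a simple completeness check before a ticket leaves intake. The sketch below assumes free-text answers; `Intake`, `QUESTIONS`, and `missing_questions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intake:
    what_changed: Optional[str] = None
    when_started: Optional[str] = None
    who_affected: Optional[str] = None


# Field name -> the question to put in the first reply.
QUESTIONS = {
    "what_changed": "What changed?",
    "when_started": "When did it start?",
    "who_affected": "Who is affected?",
}


def missing_questions(intake: Intake) -> List[str]:
    """Return the triage questions that still have no answer."""
    return [q for name, q in QUESTIONS.items() if not getattr(intake, name)]
```

For example, `missing_questions(Intake(what_changed="new laptop"))` returns the two questions still to ask, which can go straight into the first reply.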
Escalation — don't heroically hold a ticket you can't solve
If you've been on a ticket for longer than its first-response SLA without meaningful progress, escalate. Handing off a P1 at 55 minutes because you're stuck is a win for the customer — it's not a confession of failure. The rule of thumb: if the next 30 minutes of your work is as likely to solve it as someone else's, you're the right owner. If not, get help.
Escalation paths are usually written down in the runbook:
- Tier 1 (generalist helpdesk) → Tier 2 (specialist IT)
- Tier 2 → Vendor support (with a support contract number ready)
- Tier 2 → Engineering on-call (for product-side bugs)
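The hard limit and the paths above can be captured in a few lines. This is a sketch under stated assumptions: the tier identifiers and the `should_escalate` helper are hypothetical, and the function encodes only the outer limit. Handing off earlier because you are stuck remains a judgement call.

```python
from datetime import timedelta

# Escalation targets per tier, mirroring the list above.
ESCALATION_PATHS = {
    "tier1": ["tier2"],
    "tier2": ["vendor_support", "engineering_on_call"],
}


def should_escalate(
    time_on_ticket: timedelta,
    first_response_sla: timedelta,
    making_progress: bool,
) -> bool:
    """Escalate once the SLA window has passed with no meaningful progress."""
    return time_on_ticket > first_response_sla and not making_progress
```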
Writing user-visible updates
Updates are not for you — they're for the reporter. Keep them short, factual, and cadenced:
"14:32 — We've reproduced the issue and are investigating the mail gateway. Next update at 14:50 regardless of progress."
Three things done right: concrete state, what we're doing, a commitment to the next update. No speculation. No "should be fixed soon." If the time you promised arrives without news, post another update saying "still investigating, next update at …". Silence is worse than an honest "we don't know yet."
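A small template helper makes it hard to forget the next-update commitment. This is a minimal sketch; `format_update` and its parameters are hypothetical, and the separator after the timestamp is a plain hyphen.

```python
from datetime import datetime, timedelta


def format_update(now: datetime, state: str, action: str, cadence: timedelta) -> str:
    """Build an update with a concrete state, the current action, and the next update time."""
    next_update = now + cadence
    return (
        f"{now:%H:%M} - {state} and {action}. "
        f"Next update at {next_update:%H:%M} regardless of progress."
    )


message = format_update(
    datetime(2024, 1, 1, 14, 32),
    "We've reproduced the issue",
    "are investigating the mail gateway",
    timedelta(minutes=18),
)
```

Because the next-update time is computed from the cadence rather than typed by hand, every update carries a commitment by construction.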
Real tools you'll see
- Jira Service Management (Atlassian) — queue + SLA automations.
- Zendesk — more customer-facing, strong macros and triggers.
- Freshservice — ITIL-shaped, competitive with Jira SM.
- ServiceNow — enterprise incumbent, deep integrations, heavy process.
They all map onto the same abstraction: a queue of tickets, a priority → SLA mapping, a severity field, a status/owner, a comment thread, and automation that pauses/resumes clocks.
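That shared abstraction can be sketched as a small data model. The field and status names below are assumptions for illustration and do not match any specific product's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in_progress"
    WAITING_ON_CUSTOMER = "waiting_on_customer"
    RESOLVED = "resolved"


# Statuses in which the SLA clock should not accrue time.
PAUSED_STATUSES = {Status.WAITING_ON_CUSTOMER, Status.RESOLVED}


@dataclass
class Ticket:
    title: str
    priority: str                  # e.g. "P2"; drives the SLA lookup
    severity: str                  # e.g. "S3"; recorded separately from priority
    status: Status = Status.OPEN
    owner: Optional[str] = None
    comments: List[str] = field(default_factory=list)

    def clock_running(self) -> bool:
        """The automation hook: the SLA clock runs only in active statuses."""
        return self.status not in PAUSED_STATUSES
```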
Gotchas
- Prioritising by squeakiness, not impact. The loudest reporter isn't automatically the highest priority. A quiet P0 (one engineer seeing a weird crash that'll be everywhere in an hour) still outranks a loud P3.
- Re-prioritising silently. If you move a ticket from P1 to P3, tell the reporter why. Otherwise they think you're ignoring them.
- Missing the SLA pause. A ticket waiting on the user should not be accruing SLA time. Make sure your system's "waiting on customer" status actually pauses the clock.
- Heroic debugging at the cost of the queue. One ticket is not worth breaching three.
Playground
Pick a ticket from the queue. Choose a priority and severity, name an owner, sketch the first action you'd take. Submit and read the score breakdown. The SLA clock is running — try to answer before it goes red.
Visualizer
The priority / severity matrix plots all ten seed tickets as faint ghost dots in a 4 × 5 grid. As you score them in the playground, your placements are overlaid in full colour so you can see how close you got — and where the expected answer disagrees with the obvious-looking cell.