
Job Control & Traps

& background, wait, jobs, fg/bg, trap on EXIT/ERR/INT — making your scripts behave like good UNIX citizens.


Bash inherits UNIX's process model: every command is a process, processes can be backgrounded, signals can interrupt them, and the parent shell can wait on them. Master this small set of mechanisms and you can write scripts that handle Ctrl-C gracefully, parallelise work, and never leak resources.

Foreground vs background

Append & to run a command in the background:

sleep 30 &                 # detaches; you get the prompt back immediately
echo "spawned: $!"         # $! is the PID of the most-recent background job

In an interactive shell the job runs in its own process group; in a script it shares the script's group. Either way, you're not waiting for it. $! keeps that PID only until the next background job starts, so capture it right away if you need it later.

wait — synchronise

sleep 1 &
sleep 2 &
sleep 3 &
wait                       # blocks until ALL background jobs finish
echo "all done"

wait (no args) waits for everything and always returns 0, so it cannot tell you whether a job failed. wait $pid waits for one specific PID and returns its exit code. wait -n (bash 4.3+) waits for the next job to finish (any of them) and returns that job's exit code.
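The three forms can be compared side by side. This is a minimal sketch in which subshells with fixed exit codes stand in for real work; the variable names are illustrative, and wait -n assumes bash 4.3 or newer.

```bash
# Subshells with fixed exit codes stand in for real work.
( sleep 0.2; exit 3 ) &
slow=$!
( exit 0 ) &
fast=$!

# wait PID returns that job's exit status, even if it already finished.
fast_status=0; wait "$fast" || fast_status=$?
slow_status=0; wait "$slow" || slow_status=$?
echo "fast=$fast_status slow=$slow_status"

# wait -n (bash 4.3+) returns as soon as any one job finishes.
( sleep 0.5; exit 7 ) &
( sleep 0.1; exit 5 ) &
first_status=0; wait -n || first_status=$?
echo "first to finish exited with $first_status"

wait    # reap the remaining job; bare wait always returns 0
```

The status=0; wait ... || status=$? idiom keeps the snippet safe to paste into a script that runs under set -e.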

The classic parallel-fan-out pattern:

pids=()
for url in "${urls[@]}"; do
  curl -fsS "$url" > "out-${url//[^a-z]/_}.txt" &
  pids+=($!)
done

for pid in "${pids[@]}"; do
  wait "$pid" || echo "PID $pid failed" >&2
done
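A network-free variant of the same pattern is handy for experimenting. In this sketch, fetch is a hypothetical stand-in for curl that fails for any input containing "bad", and the inputs are made-up labels rather than real URLs. Indexing the two arrays in parallel lets the failure report name the input instead of a bare PID.

```bash
# Hypothetical stand-in for curl: fails for any input containing "bad".
fetch() { sleep 0.1; [[ $1 != *bad* ]]; }

urls=(alpha bad-one beta bad-two)   # made-up inputs for the demo
pids=()
for url in "${urls[@]}"; do
  fetch "$url" &
  pids+=("$!")
done

# pids[i] belongs to urls[i], so a failed wait can be traced to its input.
failed=()
for i in "${!pids[@]}"; do
  wait "${pids[i]}" || failed+=("${urls[i]}")
done

echo "failed: ${failed[*]}"
```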

jobs, fg, bg — interactive controls

In an interactive shell:

sleep 100 &
sleep 200 &
jobs                        # [1]- Running   sleep 100 &
                            # [2]+ Running   sleep 200 &   (+ marks the current job)

fg %2                       # bring job 2 to foreground
# Ctrl-Z (suspend)          # → SIGTSTP, returns shell prompt
bg %2                       # resume in background
kill %1                     # signal job 1

In scripts you usually don't need this — you spawn with &, capture $!, and wait.

nohup and disown

To survive logout:

nohup long-job > out.log 2>&1 &
disown                      # remove from shell's job table

nohup makes SIGHUP (sent on terminal close) a no-op. disown removes the job from the shell's tracking, so even without nohup it won't be HUP'd on shell exit. Use both for "fire and forget".

A modern alternative for long-running interactive sessions: tmux or screen. Either lets you reattach a session from another terminal — far better than nohup for anything you'd want to come back to.

Signals — the OS interrupt mechanism

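Signals are POSIX's way for the kernel (or other processes) to interrupt a running process. The ones you care about are listed here with their numbers on Linux for x86 and ARM; several numbers differ on other platforms such as macOS, so prefer names over numbers in scripts: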

Signal             Number    Default action  Catchable  Sent by
SIGINT             2         terminate       yes        Ctrl-C
SIGTERM            15        terminate       yes        kill PID, polite shutdown
SIGKILL            9         terminate       NO         kill -9 PID (last resort)
SIGHUP             1         terminate       yes        terminal close, also "reload config" idiom
SIGUSR1 / SIGUSR2  10 / 12   terminate       yes        user-defined
SIGSTOP            19        suspend         NO         kill -STOP PID
SIGTSTP            20        suspend         yes        Ctrl-Z
SIGCONT            18        resume          yes        bg, fg
SIGCHLD            17        ignored         yes        when a child exits

Bash convention: a script killed by signal N exits with code 128+N. Ctrl-C → exit 130, SIGTERM → 143.
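The 128+N rule is easy to check from the parent's side. A small sketch, assuming bash's kill builtin (kill -l with a number prints the signal's name):

```bash
sleep 30 &
pid=$!

kill -TERM "$pid"            # SIGTERM, the default signal sent by kill

status=0
wait "$pid" || status=$?     # wait reports 128 + signal number

signal=$(kill -l "$((status - 128))")   # map the number back to a name
echo "status=$status signal=$signal"
```

On Linux this should print status=143 signal=TERM. Decoding the status this way is useful in logs, where "143" on its own tells the reader very little.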

trap — handle signals and exit

trap '<command>' SIGNAL [SIGNAL ...]

The most important pattern — cleanup on any exit:

WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

# ... work that might fail or be interrupted ...

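EXIT is a pseudo-signal. It fires on every exit path the shell can observe: normal completion, exit N, an error under set -e, or a fatal signal. The one exception is SIGKILL, which gives the shell no chance to run anything. The cleanup runs once.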

Add INT and TERM traps to also clean up if the process is interrupted by Ctrl-C or stopped with kill (the OOM killer sends SIGKILL, which no trap can catch):

trap 'rm -rf "$WORKDIR"; exit 130' INT
trap 'rm -rf "$WORKDIR"; exit 143' TERM
trap 'rm -rf "$WORKDIR"' EXIT

…but actually, with the EXIT trap registered, INT and TERM also fire EXIT on their way out, so the simpler form is:

trap 'rm -rf "$WORKDIR"' EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

The exit-code traps tell bash what code to return; the EXIT trap does the cleanup.
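To watch the two traps cooperate, a parent can signal a worker and then inspect what is left behind. This is a sketch under stated assumptions: worker, the ready marker file, and the polling loop are all illustrative scaffolding, not part of the pattern itself.

```bash
# Hypothetical worker: owns a scratch directory and cleans it up on any exit.
worker() {
  WORKDIR=$1
  trap 'rm -rf "$WORKDIR"' EXIT    # cleanup lives in exactly one place
  trap 'exit 143' TERM             # turn the signal into an ordinary exit
  : > "$WORKDIR/ready"             # tell the parent the traps are installed
  while :; do sleep 0.1; done      # short sleeps so the trap runs promptly
}

dir=$(mktemp -d)
worker "$dir" &
pid=$!

# Wait until the worker has installed its traps before signalling it.
until [[ -e "$dir/ready" ]]; do sleep 0.05; done
kill -TERM "$pid"

status=0
wait "$pid" || status=$?

if [[ -d "$dir" ]]; then leftover=yes; else leftover=no; fi
echo "status=$status leftover=$leftover"
```

The loop of short sleeps matters: bash runs a trap handler only after the current foreground command returns, so a single long sleep would delay both the handler and the cleanup.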

Single quotes vs double quotes in trap

trap "rm -rf $WORKDIR" EXIT       # ❌ $WORKDIR expands NOW (at trap-set time)
trap 'rm -rf "$WORKDIR"' EXIT     # ✅ $WORKDIR expands LATER (at trap-fire time)

Always use single quotes in trap commands unless you specifically want immediate expansion. The double-quote form bakes in whatever value $WORKDIR had at the time you called trap.
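The difference shows up as soon as the variable changes after the trap is set. A minimal sketch, using two throwaway subshell functions (early and late are illustrative names) whose EXIT traps print the variable:

```bash
# Double quotes: $msg is expanded while the trap is being defined.
early() ( msg=old; trap "echo $msg" EXIT; msg=new )

# Single quotes: $msg is expanded when the trap actually runs.
late() ( msg=old; trap 'echo "$msg"' EXIT; msg=new )

early_out=$(early)
late_out=$(late)
echo "double-quoted trap saw: $early_out"
echo "single-quoted trap saw: $late_out"
```

The double-quoted trap reports old and the single-quoted one reports new, which is exactly the behaviour you want for a cleanup path that is assigned after the trap is registered.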

Trap pseudo-signals

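Pseudo-signal  Fires when
EXIT           the shell exits, for any reason except SIGKILL
ERR            a command fails (same exemptions as set -e: if/while conditions, non-final parts of && and || lists, commands negated with !)
DEBUG          before every simple command
RETURN         a function or sourced script returns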

trap '<command>' ERR is useful for error logging (under set -E, functions and subshells inherit the trap too):

trap 'echo "error at line $LINENO" >&2' ERR
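The exemptions are easiest to see by recording which failures reach the trap. In this sketch the failures array is an illustrative name, and the script is assumed to run without set -e so that it continues past each failure:

```bash
failures=()
trap 'failures+=("$?"); echo "error at line $LINENO" >&2' ERR

false                  # fires: a plain failing command
false || true          # exempt: the failure is handled by ||
if false; then :; fi   # exempt: the failure is tested by if
( exit 4 )             # fires: the subshell as a whole returned 4

trap - ERR             # restore the default before carrying on
echo "trapped statuses: ${failures[*]}"
```

Only the first and last commands reach the trap, so the final line reports the statuses 1 and 4.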

Reaping zombies

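When a child process exits, it stays in the kernel's process table as a "zombie" until its parent collects its status with wait. Bash reaps the children it started, and when a process dies its remaining children are re-parented to PID 1, which is expected to reap them. On a normal system init does that job; in a container where your script is PID 1 and was not written for that role, zombies can accumulate for as long as the container runs.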

If your bash script is a container entrypoint that spawns long-lived children, either:

  • Call wait after each spawn (synchronous), or
  • Use tini or dumb-init as the actual PID 1 and have your bash run under it.

tini -- ./entrypoint.sh is the standard.

Backgrounded subshells and signal propagation

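Background jobs don't receive the terminal's SIGINT. In an interactive shell, Ctrl-C goes only to the foreground process group; in a script, bash starts background commands with SIGINT ignored.

sleep 100 &
# Ctrl-C reaches only the foreground process group (here the interactive shell, which ignores INT).
# The sleep keeps running until it finishes or receives SIGTERM/SIGKILL.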

In a script with set -e, a backgrounded job's failure does NOT abort the parent — you have to wait $pid and check the exit code. This is a common surprise.
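A short sketch of that surprise, run inside a subshell function (demo is an illustrative name) so that set -e stays contained:

```bash
demo() (
  set -e
  false &              # fails in the background; set -e does not notice
  pid=$!
  echo "still running after background failure"

  # The failure only becomes visible when the job is waited on.
  if wait "$pid"; then
    echo "job ok"
  else
    echo "job failed with status $?"
  fi
)

out=$(demo)
printf '%s\n' "$out"
```

Putting wait in an if condition keeps set -e from aborting the script before the failure can be reported.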

Locking and flock

For "only one instance at a time":

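exec 9> /var/lock/myscript.lock
flock -n 9 || { echo "already running" >&2; exit 1; }
# No cleanup trap needed: the lock is released when fd 9 closes at exit.
# Deleting the lock file can let two instances lock different inodes at once.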

# ... script body ...

flock is the right answer for inter-process locking. Don't roll your own with PID files — they get stale, race, and miss kills.
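The lock's behaviour can be tried in a single script by opening the same file a second time. This sketch assumes the flock utility from util-linux is installed; the temporary lock path and the descriptor numbers 8 and 9 are arbitrary choices for the demo.

```bash
lockfile=$(mktemp)              # throwaway lock path for the demo

exec 9>"$lockfile"
first=busy
flock -n 9 && first=acquired

# A separate open of the same file competes for the same lock.
second=acquired
( exec 8>"$lockfile"; flock -n 8 ) || second=busy
echo "first=$first second=$second"

exec 9>&-                       # closing the descriptor releases the lock

third=busy
( exec 8>"$lockfile"; flock -n 8 ) && third=acquired
echo "after release: third=$third"

rm -f "$lockfile"               # fine here: nothing else uses this demo file
```

The lock belongs to the open file description rather than to the path, which is why no explicit unlock step is needed when the script exits.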

Bash vs zsh

Job control is essentially identical: &, wait (with a PID or a job spec such as %1 in both shells), jobs, fg, bg, disown. Differences:

  • Both shells have coproc (coprocess), a bidirectional pipe to a backgrounded command, but the syntax differs. Bash gained it in version 4.
  • zsh's TRAPxxx functions: TRAPINT() { ... } is an alternative to trap '...' INT and feels nicer if you have a function.
  • Default behaviour on exit: zsh warns when you exit with running jobs ("you have running jobs"). Bash exits silently unless shopt -s checkjobs is set, and sends SIGHUP to its jobs on exit only when the huponexit shopt is on (it is off by default).

Common bugs

Forgot the trap. Script crashes, leaks /tmp/tmp.XXXX. Fix: trap-and-rm at top of script.

Double-quoted trap with $WORKDIR. The variable expanded at trap-set time, capturing the empty string before WORKDIR=$(mktemp -d) ran. Fix: use single quotes.

Backgrounded job's exit silently ignored. set -e doesn't abort on background failures. Fix: wait "$pid" || die "background job failed" (where die is your own error helper).

Trying to trap SIGKILL. It's uncatchable by design (so misbehaving processes can always be killed). Trap SIGTERM and design your handler to be quick.

Race between trap and concurrent jobs. EXIT runs once, after everything else. Background children that haven't been waited on may still be running when EXIT fires. wait first, then clean up.

trap '' SIGNAL — empty command IGNORES the signal entirely. Different from trap - SIGNAL which RESETS to default. The empty-string form is what nohup does internally for SIGHUP.
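Both forms can be exercised by having a subshell signal itself. A minimal sketch, using $BASHPID (the PID of the current subshell) as the target:

```bash
# Ignored: the TERM is discarded and the subshell carries on.
ignored=$( trap '' TERM; kill -TERM "$BASHPID"; echo "survived" )

# Reset to default: the same TERM now terminates the subshell.
status=0
( trap '' TERM; trap - TERM; kill -TERM "$BASHPID"; echo "unreachable" ) || status=$?

echo "ignored=$ignored status=$status"
```

The second subshell dies before its echo runs, and the parent sees the usual 128+15 status.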

Tools in the wild

  • tini: tiny init for containers. Reaps zombies and forwards signals; pairs with bash entrypoints.
  • supervisord: when your bash entry script becomes a process supervisor, it's time for supervisord (or systemd).
  • pstree / htop: visualise the job tree your script spawned. Indispensable for diagnosing zombies and orphan processes.