Percent-Encoding, In Depth
form vs path vs query · the double-encoding trap.
You already know %20 is space. The deeper question is which %20 — because percent-encoding is not one rule but a family of overlapping rules, and using the wrong one is how authentication bypasses, path-traversal vulnerabilities, and silently-corrupted form data are born.
Three escape sets you meet every day
When you write a URL like https://example.com/search?q=hello world, the question "is the space allowed here?" has three answers depending on where the space appears.
1. The path
Inside a path component (the /... part), the reserved set is /?#. Most other ASCII characters are legal literally. Spaces are not — they must be %20. The plus sign is just a literal +.
GET /docs/quick%20start HTTP/1.1
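As a minimal sketch of the path rule, the hypothetical helper below (buildPath is an illustrative name, not something from a library) encodes each segment on its own and joins the results with /. It assumes encodeURIComponent is an acceptable segment encoder. That function is stricter than the path rules require, so it would also turn a literal + into %2B, which a path decoder reads back as the same character.

```javascript
// Hypothetical helper: percent-encode each path segment separately,
// so a "/" inside a segment cannot be mistaken for a path delimiter.
function buildPath(segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

console.log(buildPath(["docs", "quick start"])); // "/docs/quick%20start"
console.log(buildPath(["files", "a/b"]));        // "/files/a%2Fb"
```

Encoding per segment, rather than encoding the joined path, is what keeps a data slash (%2F) distinct from a structural one.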
2. The query string
Inside a query string, the intent is key=value pairs separated by &. To deliver "hello world" as a value:
- Spec-strict (RFC 3986): use %20.
- Form-style (HTML form submissions): use +.
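The two work equally well only when the consumer decodes with the same convention the producer used: a strict RFC 3986 decoder leaves + as a literal plus sign, while a form-style decoder turns it into a space. This mismatch is the source of more bugs than I can count.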
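The mismatch is easy to reproduce with the two decoders built into JavaScript. This is a small sketch; the variable names are illustrative only.

```javascript
// decodeURIComponent follows the generic percent-decoding rule: "+" is just "+".
const strictDecoded = decodeURIComponent("hello+world");

// URLSearchParams follows the form-style rule: "+" means space.
const formDecoded = new URLSearchParams("q=hello+world").get("q");

// "%20" is understood as a space by both decoders.
const percentDecoded = new URLSearchParams("q=hello%20world").get("q");

console.log(strictDecoded);  // "hello+world"
console.log(formDecoded);    // "hello world"
console.log(percentDecoded); // "hello world"
```

If the producer and consumer might disagree, %20 is the safer way to send a space, since both conventions decode it the same way.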
3. application/x-www-form-urlencoded (the form body)
This is the encoding HTML forms produce when submitted with <form method=post enctype="application/x-www-form-urlencoded">. It's also what most APIs accept as request bodies for token endpoints (OAuth 2 demands it). The rules:
- Space → +.
- Reserved + non-ASCII → %XX.
- Plus sign → %2B (because plain + is space).
POST /login HTTP/1.1
Content-Type: application/x-www-form-urlencoded

email=alice%40example.com&password=p%2Bword
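A body like the one above can be produced with URLSearchParams rather than by hand. The sketch below reuses the example's field names and values and round-trips the password to show that the plus sign survives.

```javascript
// Serialise the fields using the form-urlencoded rules.
const formBody = new URLSearchParams({
  email: "alice@example.com",
  password: "p+word",
}).toString();

console.log(formBody); // "email=alice%40example.com&password=p%2Bword"

// Parsing the body again recovers the original plus sign.
const parsedBody = new URLSearchParams(formBody);
console.log(parsedBody.get("password")); // "p+word"
```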
Side-by-side
| Input | RFC 3986 path | RFC 3986 query | form-urlencoded |
|---|---|---|---|
| (space) | %20 | %20 | + |
| + | + | + | %2B |
| & | & (reserved sub-delim, but commonly encoded inside values) | %26 | %26 |
| = | = | %3D | %3D |
| / | / (delimiter) | / | %2F |
| é | %C3%A9 | %C3%A9 | %C3%A9 |
The "common" parts: percent-encode any non-ASCII byte using its UTF-8 representation. The differences come down to which delimiters need escaping in which context.
The "right" function for the job
JavaScript exposes three encoders, and picking the wrong one is a common cause of URL bugs.
// encodeURI — preserves URL delimiters. Use when wrapping a whole URL.
encodeURI("https://example.com/p?q=hello world&x=1");
// → "https://example.com/p?q=hello%20world&x=1"
// note: '?', '=', '&' are PRESERVED — they're URL syntax.
// encodeURIComponent — escapes everything that has special meaning in URLs.
// Use for query VALUES.
encodeURIComponent("hello world&x=1");
// → "hello%20world%26x%3D1"
// URLSearchParams — form-style: '+' for space.
new URLSearchParams({q: "hello world"}).toString();
// → "q=hello+world"
Rule of thumb:
- Building a query value? encodeURIComponent or URLSearchParams.
- Wrapping an already-formed URL (e.g., for a redirect)? encodeURI.
- Building a form body? URLSearchParams (or qs.stringify in Node).
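One way to apply the first rule without calling an encoder directly is to let the built-in URL class manage the query. This is a sketch with a made-up endpoint; searchParams is a URLSearchParams instance, so values are serialised form-style.

```javascript
// Start from a base URL and add the value through searchParams,
// which escapes "&" and "=" inside the value automatically.
const searchUrl = new URL("https://example.com/search");
searchUrl.searchParams.set("q", "hello world&x=1");

console.log(searchUrl.href);
// "https://example.com/search?q=hello+world%26x%3D1"

// Reading the value back returns the original string.
console.log(searchUrl.searchParams.get("q")); // "hello world&x=1"
```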
The double-encoding trap
A defender writes a filter:
if (req.url.includes("../")) return res.status(403).end();
An attacker sends:
GET /static/..%252Fadmin/secret.txt HTTP/1.1
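The filter inspects the raw URL, sees ..%252F, and finds no ../ to reject. The framework then decodes the URL once, so ..%2Fadmin arrives at the handler. Filesystem APIs do not decode percent escapes themselves, but if any later layer decodes again (a static-file helper, a proxy, or application code calling decodeURIComponent), %2F becomes / and the server opens ../admin/secret.txt. The first decode turned %252F into %2F; the second decode turned %2F into /.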
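The two decoding steps can be traced by hand. The sketch below uses decodeURIComponent to stand in for each decoding layer, which is an assumption about how a given stack behaves rather than a description of any particular framework.

```javascript
const rawRequestPath = "/static/..%252Fadmin/secret.txt";

// The naive filter looks at the raw string and finds nothing to block.
const filterTripped = rawRequestPath.includes("../");

// First decode: %25 becomes "%", leaving an encoded slash behind.
const decodedOnce = decodeURIComponent(rawRequestPath);

// Second decode: %2F becomes "/", and the traversal sequence appears.
const decodedTwice = decodeURIComponent(decodedOnce);

console.log(filterTripped); // false
console.log(decodedOnce);   // "/static/..%2Fadmin/secret.txt"
console.log(decodedTwice);  // "/static/../admin/secret.txt"
```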
This is double encoding. The fix is two-fold:
- Validate after decoding fully. Decode in a loop until the value is stable, then run path-traversal checks.
- Don't decode inputs more than the protocol mandates. A web framework should decode percent-escapes once. Application code should never decode again.
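A minimal sketch of the first fix follows. The helper names (decodeUntilStable, looksLikeTraversal) and the round limit are hypothetical choices, and the fully decoded value is used only for the check, not passed on to later code. Rejecting malformed escapes and inputs that never stabilise is an assumption about what a cautious filter would want.

```javascript
// Hypothetical helper: decode repeatedly until the value stops changing.
// Returns null for malformed escapes or for input that is still changing
// after maxRounds, so the caller can treat both as suspicious.
function decodeUntilStable(value, maxRounds = 5) {
  let current = value;
  for (let round = 0; round < maxRounds; round++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      return null; // malformed percent escape such as "%zz"
    }
    if (next === current) return current;
    current = next;
  }
  return null;
}

// Hypothetical check: flag any ".." segment in the fully decoded path.
function looksLikeTraversal(rawPath) {
  const decoded = decodeUntilStable(rawPath);
  if (decoded === null) return true;
  return decoded.split(/[\\\/]/).includes("..");
}

console.log(looksLikeTraversal("/static/..%252Fadmin/secret.txt")); // true
console.log(looksLikeTraversal("/docs/quick%20start"));             // false
```

Checking for a ".." segment after splitting on both slash styles is stricter than searching for the substring "../", which misses a trailing ".." and backslash-separated paths.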
Variants of this attack break login filters (adm%69n → admin), SSRF mitigations (http%253A//169.254.169.254), and path-traversal filters (%2E%2E%2F for ../).
Common pitfalls
- Encoding : and @. They have special meaning in userinfo@host:port parts, but in paths and queries they're legal literally. Over-escaping doesn't break anything but produces ugly URLs.
- Encoding / inside path segments. %2F is not equivalent to / for path-routing. Some servers treat /files%2Fadmin and /files/admin differently. Apache's AllowEncodedSlashes is off by default for exactly this reason.
- Mixing encodings. A common bug: build a query string with URLSearchParams, then run encodeURI over it. Now every existing escape is double-encoded: %26 becomes %2526. (Run encodeURIComponent over it instead and + becomes %2B as well.) Pick one and stop.
- Not normalising before comparing. Hello and Hell%6f decode to the same string. If you cache or compare URLs, decode first.
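Two of these pitfalls can be reproduced in a few lines. The sketch below uses illustrative variable names. It shows what encodeURI does to an already-encoded query string, and a decode-before-compare check.

```javascript
// Mixing encodings: encode once with URLSearchParams...
const encodedOnce = new URLSearchParams({ q: "a&b c" }).toString();
// ...then (wrongly) run encodeURI over the result. encodeURI leaves "+"
// alone but re-encodes every "%", so "%26" turns into "%2526".
const encodedTwice = encodeURI(encodedOnce);
// The consumer now receives a value that still contains an escape.
const received = new URLSearchParams(encodedTwice).get("q");

console.log(encodedOnce);  // "q=a%26b+c"
console.log(encodedTwice); // "q=a%2526b+c"
console.log(received);     // "a%26b c"

// Normalising before comparing: the raw strings differ, the decoded ones match.
const sameRaw = "Hello" === "Hell%6f";
const sameDecoded = decodeURIComponent("Hello") === decodeURIComponent("Hell%6f");

console.log(sameRaw);     // false
console.log(sameDecoded); // true
```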
Mental model
Percent-encoding is bytes-to-text-safely. It encodes one byte (0x00–0xFF) as three characters (%XX). Multi-byte characters (UTF-8) become multiple %XX escapes — é (UTF-8: C3 A9) becomes %C3%A9.
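The byte-level view can be made concrete with TextEncoder. The function below is a deliberately naive sketch (percentEncodeAllBytes is a hypothetical name) that escapes every byte, including ones a real encoder would leave alone, purely to show the mechanism.

```javascript
// Naive illustration: turn text into UTF-8 bytes, then write each byte as %XX.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(
    bytes,
    (byte) => "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllBytes("\u00e9")); // "%C3%A9"  (é is two UTF-8 bytes)
console.log(percentEncodeAllBytes("A"));      // "%41"     (one byte, one escape)

// The standard encoder applies the same byte-level rule to non-ASCII input.
console.log(encodeURIComponent("\u00e9"));    // "%C3%A9"
```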
Three contexts, three escape sets, but one underlying mechanism. Pick the right context, never decode twice, and prefer the standard library encoder over hand-rolled escaping.
Tools in the wild
- URLSearchParams (Web), library, free tier. Browser- and Node-built-in form-urlencoded encoder/decoder. Always prefer it over hand-rolled escaping.
- urllib.parse, library, free tier. Python stdlib: quote/quote_plus/unquote/unquote_plus plus urlencode. Three contexts, three functions.
- Service. Pen-tester's tool for nested encoding/decoding, useful for hunting double-encoding bypasses.
- ZAP, service, free tier. Open-source web app scanner that automatically tries double-encoded payloads to bypass filters.