URL Encoding Rules — encodeURI vs encodeURIComponent and When Each Matters
URL encoding is simple until you build a URL from untrusted input. The rules behind percent-encoding, the two JavaScript functions, and the edge cases to actually care about.
URL encoding is the kind of topic that seems trivial until production traffic proves otherwise. Every team I have worked with has had at least one outage from a missing or double encodeURIComponent. This guide is the full mental model, plus the JavaScript-specific gotchas.
The one-line rule
If you are inserting a dynamic value into a URL — query parameter, path segment, fragment — run it through encodeURIComponent. If you are handed a complete URL you need to make safe to transmit, run it through encodeURI. Most bugs come from using the wrong one.
What percent-encoding is
RFC 3986 defines two character categories: reserved and unreserved.
Unreserved characters never need encoding: A-Z a-z 0-9 - _ . ~. Reserved characters have structural meaning in URLs (: / ? # [ ] @ ! $ & ' ( ) * + , ; =) and must be encoded whenever they appear as data rather than as delimiters. Everything else always needs encoding: other ASCII characters such as space, ", <, > and a literal %, plus anything outside ASCII, because a URL on the wire is limited to a subset of ASCII.
Percent-encoding replaces a byte with %XX, where XX is the byte's value in hexadecimal. A space (0x20) becomes %20. é is two bytes in UTF-8, 0xC3 0xA9, and encodes to %C3%A9. Yes, URL encoding is UTF-8 by modern convention. Latin-1 URL encoding is a ghost story from the 1990s.
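To make the byte-level view concrete, here is a minimal sketch that percent-encodes every UTF-8 byte of a string. The helper name `percentEncodeBytes` is made up for illustration; real code should use encodeURIComponent, which leaves unreserved characters alone.

```javascript
// Hypothetical helper for illustration only: it percent-encodes EVERY byte,
// including unreserved characters that would normally be left as-is.
function percentEncodeBytes(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  return Array.from(bytes, (byte) =>
    "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeBytes(" ")); // "%20"
console.log(percentEncodeBytes("é")); // "%C3%A9"

// For a character outside the unreserved set, the built-in agrees byte for byte.
console.log(encodeURIComponent("é") === percentEncodeBytes("é")); // true
```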
| Character | Category | Encoded |
|---|---|---|
| a-z, A-Z, 0-9 | Unreserved | never |
| -, _, ., ~ | Unreserved | never |
| !, *, ', (, ) | Sub-delims (reserved) | sometimes |
| :, /, ?, #, [, ], @ | Gen-delims (reserved) | in query/path values |
| space | Other | %20 in paths, + in form data |
| é (U+00E9) | Non-ASCII | %C3%A9 in UTF-8 |
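The reserved rows are where encoding depends on context, and where the two JavaScript functions differ. Both leave the “sometimes” characters (!, *, ', (, )) untouched; they disagree about delimiters such as /, ?, #, & and =.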
encodeURI vs encodeURIComponent
JavaScript (and most languages) give you two functions. They differ in exactly one thing: which reserved characters get encoded.
- encodeURI leaves most structural characters alone: : / ? # @ ! $ & ' ( ) * + , ; =. It is designed to be called on a full URL string where those characters still carry their URL-structural meaning. It does percent-encode [ and ], though.
- encodeURIComponent encodes everything except the unreserved set and ! * ' ( ). It is designed to be called on a single component (a query value, a path segment) that should NOT have any structural meaning.
const value = "a/b?c=d&e f";
encodeURI(value);
// "a/b?c=d&e%20f" -- / ? = & survive, space encoded
encodeURIComponent(value);
// "a%2Fb%3Fc%3Dd%26e%20f" -- everything unsafe encoded
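If you would rather check the difference than memorize it, a short loop over the RFC 3986 reserved set shows which characters each function passes through unchanged. This is a quick sketch; the variable names are illustrative.

```javascript
// The RFC 3986 reserved set: gen-delims followed by sub-delims.
const reserved = ":/?#[]@!$&'()*+,;=";

// Keep only the characters that a function returns unchanged.
const keptBy = (encode) => [...reserved].filter((ch) => encode(ch) === ch).join("");

console.log(keptBy(encodeURI));          // ":/?#@!$&'()*+,;="  ([ and ] get encoded)
console.log(keptBy(encodeURIComponent)); // "!'()*"
```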
The correct call depends on whether the string is a full URL or a single component:
// Full URL (probably typed by a user, or a URL template already)
const url = "https://example.com/search?q=hello world";
fetch(encodeURI(url));
// → https://example.com/search?q=hello%20world
// Building a query string
const query = "cats & dogs";
const url2 = `https://example.com/search?q=${encodeURIComponent(query)}`;
// → https://example.com/search?q=cats%20%26%20dogs
The second case is the one teams mess up. Using encodeURI(query) would leave the & intact and the server would interpret your query as two parameters.
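You can watch the misparse happen by feeding both versions back through the URL parser. A small sketch, assuming a runtime with the standard URL class (current browsers and Node.js); `userQuery`, `wrong` and `right` are illustrative names.

```javascript
const userQuery = "cats & dogs";

// encodeURI leaves & alone, so the value is split at the ampersand.
const wrong = new URL(`https://example.com/search?q=${encodeURI(userQuery)}`);
// encodeURIComponent turns & into %26, so the value stays in one piece.
const right = new URL(`https://example.com/search?q=${encodeURIComponent(userQuery)}`);

console.log([...wrong.searchParams.keys()]); // ["q", " dogs"]
console.log(wrong.searchParams.get("q"));    // "cats "
console.log(right.searchParams.get("q"));    // "cats & dogs"
```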
Path segment vs query — subtle but real
Modern best practice is to use URL and URLSearchParams instead of manually concatenating:
const url = new URL("https://example.com/search");
url.searchParams.set("q", "cats & dogs");
url.searchParams.set("since", "2026-04-17");
url.toString();
// "https://example.com/search?q=cats+%26+dogs&since=2026-04-17"
Note that URLSearchParams encodes space as +, not %20. That is because it implements application/x-www-form-urlencoded (the HTML form encoding), not RFC 3986 percent-encoding. Most servers parse query strings as form data and accept either. Outside the query string a + is a literal plus sign, so if you are building a non-query portion of a URL, stick with %20.
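If a downstream system insists on %20 in the query string, one possible workaround is to rewrite the serialized output. The sketch below relies on the fact that URLSearchParams encodes a literal plus sign as %2B, so every + left in the output stands for a space. Variable names are illustrative.

```javascript
const params = new URLSearchParams({ q: "1 + 1" });

const formStyle = params.toString();
console.log(formStyle); // "q=1+%2B+1"

// Every remaining + is an encoded space, so this replacement is safe.
const percentStyle = formStyle.replace(/\+/g, "%20");
console.log(percentStyle); // "q=1%20%2B%201"

// Both spellings decode to the same value.
console.log(new URLSearchParams(formStyle).get("q"));    // "1 + 1"
console.log(new URLSearchParams(percentStyle).get("q")); // "1 + 1"
```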
For path segments, there is no URLPathParams equivalent. Build them by hand with encodeURIComponent:
const id = "café/au/lait";
const url = `/items/${encodeURIComponent(id)}`;
// → /items/caf%C3%A9%2Fau%2Flait
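For paths with several dynamic segments, a tiny helper keeps the encoding in one place. `buildPath` is a hypothetical name, not a built-in; this is a sketch that assumes every argument is a single segment whose slashes are data, not separators.

```javascript
// Hypothetical helper: encode each segment separately, then join with "/".
function buildPath(...segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

const path = buildPath("items", "café/au/lait", "v 2");
console.log(path); // "/items/caf%C3%A9%2Fau%2Flait/v%202"

// The receiving side splits on "/" first and decodes each segment afterwards.
const decoded = path.split("/").slice(1).map((segment) => decodeURIComponent(segment));
console.log(decoded); // ["items", "café/au/lait", "v 2"]
```

Splitting before decoding matters: decoding the whole path first would turn %2F back into a separator and produce five segments instead of three.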
If you want human-readable slugs rather than raw percent-encoded values, you want a different tool — the slug generator turns arbitrary text into URL-safe kebab-case.
The double-encoding trap
encodeURIComponent is not idempotent: running it twice changes the output.
encodeURIComponent("hello world");
// "hello%20world"
encodeURIComponent("hello%20world");
// "hello%2520world" -- % itself got encoded
This bites when a value passes through two layers that both think they own the encoding. Classic cases: a redirect URL embedded as a query parameter, or a form submission that already encoded its values and then the framework encodes them again.
Rule: encode once, at the boundary where you build the URL. Do not encode at the source of the data. Decode on receive. If your value is already encoded when it arrives in your code, decode it before building a new URL.
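Here is what the rule looks like when a value reaches your code already encoded. The sketch assumes `incoming` was percent-encoded exactly once by an upstream layer; the names are illustrative.

```javascript
// Assumption: an upstream layer already encoded this value once.
const incoming = "hello%20world";

// Encoding it again escapes the % and produces the telltale %25.
const doubled = `/search?q=${encodeURIComponent(incoming)}`;
console.log(doubled); // "/search?q=hello%2520world"

// Decode back to the raw value first, then encode once at the boundary.
const raw = decodeURIComponent(incoming);
const correct = `/search?q=${encodeURIComponent(raw)}`;
console.log(correct); // "/search?q=hello%20world"
```

decodeURIComponent throws a URIError on malformed input, so code that handles untrusted values may want to wrap the decode step in a try/catch.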
Form data — the + surprise
HTML forms encode bodies differently from URL paths. A form submission of hello world sends hello+world, not hello%20world. Both decode to the same bytes on a correct server, but if you are reading a raw form body with a string replace or regex, you have to handle + → space.
function decodeForm(str) {
  return decodeURIComponent(str.replace(/\+/g, "%20"));
}
The URLSearchParams constructor does this for you automatically when you hand it an application/x-www-form-urlencoded string, such as a raw form body or location.search.
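As a quick comparison, here is a made-up form body parsed with URLSearchParams, next to what plain decodeURIComponent does with a + on its own. The field names and values are invented for the example.

```javascript
// Invented example body, as a form POST might send it.
const rawBody = "greeting=hello+world&name=Jos%C3%A9";

const fields = new URLSearchParams(rawBody);
console.log(fields.get("greeting")); // "hello world"
console.log(fields.get("name"));     // "José"

// decodeURIComponent alone knows nothing about the form convention.
console.log(decodeURIComponent("hello+world")); // "hello+world"

// Object.fromEntries gives a plain object (the last value wins for repeated keys).
const asObject = Object.fromEntries(fields);
console.log(asObject.greeting); // "hello world"
```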
Unicode — trust UTF-8
Three facts that cover 95% of real bugs:
- Modern URLs are UTF-8. The browser encodes é in a query string as %C3%A9. Your server should decode %C3%A9 as UTF-8 bytes, not Latin-1.
- IRIs (Internationalized Resource Identifiers) let you type https://example.com/café in the address bar. The browser converts to https://example.com/caf%C3%A9 before sending. Your server receives the encoded form.
- Hostnames use a separate encoding (Punycode) for non-ASCII domain names. café.com becomes xn--caf-dma.com. Percent-encoding does not apply to hostnames, ever.
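The URL class applies all three conversions at once, which makes it a handy way to see them side by side. A sketch, assuming a runtime whose URL follows the WHATWG URL Standard (current browsers and Node.js):

```javascript
const parsed = new URL("https://café.com/café?q=é");

console.log(parsed.hostname); // "xn--caf-dma.com"  (Punycode, not percent-encoding)
console.log(parsed.pathname); // "/caf%C3%A9"       (UTF-8 bytes, percent-encoded)
console.log(parsed.search);   // "?q=%C3%A9"

// Decoding the path as UTF-8 recovers the original text.
console.log(decodeURIComponent(parsed.pathname)); // "/café"
```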
Regex and URL encoding together
If you are building regex patterns against URL-encoded data, remember that percent sequences are three characters (% + two hex digits). You can match encoded characters with %[0-9A-Fa-f]{2}. For everything else regex-related, the regex cheatsheet has the operators and flags.
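A short sketch of that pattern in use, plus a companion pattern for spotting a stray % that is not followed by two hex digits. The constant names are illustrative.

```javascript
// Matches one percent sequence: % followed by exactly two hex digits.
const PERCENT_SEQUENCE = /%[0-9A-Fa-f]{2}/g;
// Matches a % that does NOT start a valid percent sequence.
const STRAY_PERCENT = /%(?![0-9A-Fa-f]{2})/;

const sequences = "caf%C3%A9%20au%20lait".match(PERCENT_SEQUENCE);
console.log(sequences); // ["%C3", "%A9", "%20", "%20"]

console.log(STRAY_PERCENT.test("100%"));   // true  (unencoded percent sign)
console.log(STRAY_PERCENT.test("100%25")); // false (properly encoded)
```

Passing the stray-percent check does not guarantee the string decodes: %E9 on its own is well-formed as a sequence but is not valid UTF-8, and decodeURIComponent throws a URIError on it.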
For the adjacent Base64 encoding that comes up in similar conversations (JWTs, data URIs, embedded binary), see the Base64 in production guide.
Takeaways
encodeURIComponent for individual components (query values, path segments). encodeURI for whole URLs that still contain structural characters. Use URL and URLSearchParams when you can, since they handle the edge cases. Encode once, at the boundary. Trust UTF-8 for Unicode. When you see %2520 in a log, you have encoded twice.