URL Encoder

Overview

URL encoding, also known as percent-encoding, is the process of transforming arbitrary text into a representation that can be safely included in URLs and HTTP query strings. A URL may contain only a limited set of ASCII characters, and some of those (such as /, ?, &, and =) act as delimiters. Data placed in a URL therefore keeps only the “unreserved” characters (letters, digits, and a few punctuation marks) as they are, and everything else is converted. The encoder replaces each such byte with a percent sign (%) followed by two hexadecimal digits giving the byte’s value. When `spaces: true`, the encoder may insert spaces or line breaks between encoded blocks to aid readability in logs or documentation. This spaced output is meant for reading, and the separators must be removed before it is used in a URL.

What Is URL Encoding?

At its core, URL encoding maps bytes to ASCII text. Every byte outside the unreserved set is written as “%XY”, where “XY” is the byte’s value as two hexadecimal digits (00–FF). Unreserved characters (A–Z, a–z, 0–9, -, _, ., and ~) remain unchanged. This ensures that reserved characters (/, ?, &, +, =), spaces and other unsafe ASCII punctuation, and non-ASCII characters such as accented letters or emoji do not break the structure of a URL or cause servers and browsers to parse it ambiguously.
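The per-byte rule can be sketched in a few lines of TypeScript. This is a minimal illustration that assumes the input is already a single byte value (0–255); the names `isUnreserved`, `percentEncodeByte`, and `encodeByte` are hypothetical and not part of any library.

```typescript
// Hypothetical helpers illustrating the per-byte rule; not a library API.

// True for the RFC 3986 unreserved set: A-Z, a-z, 0-9, "-", "_", ".", "~".
function isUnreserved(byte: number): boolean {
  return (
    (byte >= 0x41 && byte <= 0x5a) || // A-Z
    (byte >= 0x61 && byte <= 0x7a) || // a-z
    (byte >= 0x30 && byte <= 0x39) || // 0-9
    byte === 0x2d || // "-"
    byte === 0x5f || // "_"
    byte === 0x2e || // "."
    byte === 0x7e // "~"
  );
}

// Formats one byte (0-255) as "%" plus two uppercase hexadecimal digits.
function percentEncodeByte(byte: number): string {
  if (byte < 0 || byte > 0xff || Math.floor(byte) !== byte) {
    throw new RangeError("not a byte: " + byte);
  }
  const hex = byte.toString(16).toUpperCase();
  return "%" + (hex.length === 1 ? "0" + hex : hex);
}

// Leaves unreserved bytes as their ASCII character; escapes everything else.
function encodeByte(byte: number): string {
  return isUnreserved(byte) ? String.fromCharCode(byte) : percentEncodeByte(byte);
}
```

For example, `encodeByte(0x41)` returns `"A"`, while `encodeByte(0x2C)` (a comma) returns `"%2C"`. The explicit zero padding matters for bytes below 0x10, such as a newline, which must become `"%0A"` rather than `"%A"`.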

URL Components & Encoding Context

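Different parts of a URL have different encoding rules. A path may contain / as the separator between segments, but a slash that belongs to the data inside one segment must be encoded as %2F, as must spaces and non-ASCII characters. In query parameters, & and = must be escaped inside keys and values because they separate pairs and keys from values, and + must be escaped as %2B because form-style decoding reads a literal + as a space. A fragment identifier may contain / and ? unescaped, but a literal # must be encoded. A robust encoder applies these context-aware rules, percent-encoding exactly what each component requires. Under-encoding leads to broken links or security issues, and over-encoding can change a URL’s meaning (an encoded %2F is not a path separator).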
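A minimal sketch of component-by-component encoding, assuming the URL is assembled from already separated pieces. It uses the built-in `encodeURIComponent` on each path segment, query key, and query value; `buildUrl` and its parameters are hypothetical. Because `encodeURIComponent` escapes `/`, `?`, `&`, `=`, and `+`, a slash inside a segment or an ampersand inside a value cannot be mistaken for a delimiter.

```typescript
// Hypothetical helper: assembles a URL from separated components, encoding
// each one on its own so its data cannot be read as a delimiter.
function buildUrl(
  origin: string,
  pathSegments: string[],
  query: Array<[string, string]>
): string {
  // Each segment is encoded individually; "/" is added only between segments.
  const path = pathSegments.map((s) => encodeURIComponent(s)).join("/");
  // Keys and values are encoded separately, then joined with "=" and "&".
  const qs = query
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  let url = origin + "/" + path;
  if (qs.length > 0) url += "?" + qs;
  return url;
}
```

For instance, `buildUrl("https://example.com", ["files", "a/b c"], [["q", "rock & roll"], ["tag", "c++"]])` yields `https://example.com/files/a%2Fb%20c?q=rock%20%26%20roll&tag=c%2B%2B`: the slash inside the second segment and the `&` and `+` inside the values are all escaped, while the real separators are not.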

Character Sets & Unicode

URL encoding always operates on bytes—not characters—so the input string is first converted to a UTF-8 byte sequence. Multi-byte characters (e.g., emoji or non-Latin scripts) become multiple percent-encoded bytes. For instance, the Euro sign (€) is U+20AC in Unicode, which in UTF-8 is the byte sequence E2 82 AC, and thus is encoded as “%E2%82%AC”. This preserves the full fidelity of international text in URL contexts.
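To make the UTF-8 step concrete, the conversion can be written out by hand. This is an illustrative sketch (real code would normally use `TextEncoder`); `toUtf8Bytes` and `percentEncodeAllBytes` are hypothetical names, and the second function deliberately escapes every byte, without the unreserved exemption, so the byte structure stays visible.

```typescript
// Hypothetical helper: converts a JavaScript (UTF-16) string to UTF-8 bytes by
// hand, to show how one character can become several bytes.
function toUtf8Bytes(input: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let cp = input.charCodeAt(i);
    // Characters above U+FFFF are stored as a surrogate pair; combine them.
    if (cp >= 0xd800 && cp <= 0xdfff) {
      const low = i + 1 < input.length ? input.charCodeAt(i + 1) : -1;
      if (cp > 0xdbff || low < 0xdc00 || low > 0xdfff) {
        // A lone surrogate has no UTF-8 form (encodeURIComponent also throws here).
        throw new URIError("lone surrogate at index " + i);
      }
      cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
      i++; // skip the low surrogate just consumed
    }
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte: 0xxxxxxx
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 4 bytes
    }
  }
  return bytes;
}

// Percent-encodes every byte (no unreserved exemption), e.g. "€" -> "%E2%82%AC".
function percentEncodeAllBytes(input: string): string {
  return toUtf8Bytes(input)
    .map((b) => "%" + (b < 0x10 ? "0" : "") + b.toString(16).toUpperCase())
    .join("");
}
```

Applied to the Euro sign, `toUtf8Bytes("\u20AC")` produces the three bytes 0xE2, 0x82, 0xAC, and `percentEncodeAllBytes` turns them into `%E2%82%AC`, matching the example above. An emoji outside the Basic Multilingual Plane, such as U+1F600, takes four bytes (`%F0%9F%98%80`).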

Reserved vs. Unreserved Characters

Per RFC 3986, unreserved characters (A–Z, a–z, 0–9, -, _, ., ~) never require encoding. Reserved characters, split into gen-delims (:, /, ?, #, [, ], @) and sub-delims (!, $, &, ', (, ), *, +, ,, ;, =), may carry special meaning as delimiters. When a reserved character appears as data inside a component, the encoder percent-encodes it so that servers and browsers do not interpret it as a delimiter. This maintains proper URL semantics and protects against injection and parsing errors.
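JavaScript’s `encodeURIComponent` comes close to this rule but leaves five sub-delims (`!`, `'`, `(`, `)`, `*`) unescaped. A common workaround, sketched here under the assumption that every reserved character should be escaped inside a component, post-processes those five. The constants and function names are hypothetical.

```typescript
// The RFC 3986 reserved character classes, written out for reference.
const GEN_DELIMS = ":/?#[]@";
const SUB_DELIMS = "!$&'()*+,;=";

function isReserved(ch: string): boolean {
  return GEN_DELIMS.indexOf(ch) !== -1 || SUB_DELIMS.indexOf(ch) !== -1;
}

// encodeURIComponent already escapes every other non-unreserved character,
// so only ! ' ( ) * need extra handling. All five are below 0x80, so their
// hex codes always have two digits.
function encodeStrictComponent(value: string): string {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}
```

With this variant, `encodeStrictComponent("Hello, world!")` returns `Hello%2C%20world%21`, whereas plain `encodeURIComponent` would leave the trailing `!` as-is.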

How It Works

1. **UTF-8 Conversion**: Transform the input string into bytes.
2. **Byte Iteration**: For each byte, check if it’s in the unreserved set.
3. **Percent-Encode**: If not unreserved, convert the byte to its two-digit uppercase hex representation prefixed with “%”.
4. **Formatting**: Insert spaces or line breaks between blocks if `spaces: true`.
5. **Concatenate**: Build and return the final encoded string.
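The five steps can be sketched end to end in TypeScript. This is a minimal illustration, not the tool’s actual implementation: `urlEncode` and `UrlEncodeOptions` are hypothetical, and it assumes a “block” is either one `%XY` escape or one run of unreserved characters, with blocks separated by a single space when `spaces` is set. It relies on the standard `TextEncoder` (available in browsers and Node.js), which substitutes U+FFFD for lone surrogates instead of throwing.

```typescript
// Hypothetical options; only `spaces` comes from the description above.
interface UrlEncodeOptions {
  spaces?: boolean;
}

// RFC 3986 unreserved set: A-Z, a-z, 0-9, "-", "_", ".", "~".
function isUnreservedByte(b: number): boolean {
  return (
    (b >= 0x41 && b <= 0x5a) || (b >= 0x61 && b <= 0x7a) ||
    (b >= 0x30 && b <= 0x39) || b === 0x2d || b === 0x5f || b === 0x2e || b === 0x7e
  );
}

function urlEncode(input: string, options: UrlEncodeOptions = {}): string {
  // 1. UTF-8 conversion.
  const bytes = new TextEncoder().encode(input);
  const blocks: string[] = [];
  let run = ""; // current run of unreserved characters
  // 2. Byte iteration.
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (isUnreservedByte(b)) {
      run += String.fromCharCode(b);
      continue;
    }
    if (run.length > 0) {
      blocks.push(run);
      run = "";
    }
    // 3. Percent-encode with two uppercase hex digits.
    const hex = b.toString(16).toUpperCase();
    blocks.push("%" + (hex.length === 1 ? "0" + hex : hex));
  }
  if (run.length > 0) blocks.push(run);
  // 4. Formatting, then 5. concatenation.
  return blocks.join(options.spaces ? " " : "");
}
```

Under these assumptions, `urlEncode("Hello, world!")` gives `Hello%2C%20world%21`, and `urlEncode("Hello, world!", { spaces: true })` gives `Hello %2C %20 world %21`. Grouping unreserved runs into one block keeps readable words intact in the spaced form.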

Configuration & Options

The encoder supports options such as: using “+” instead of “%20” for spaces in form-encoded (application/x-www-form-urlencoded) contexts, breaking long output into lines (76 characters, the MIME line limit, is a common choice), toggling URL-safe alphabets, and specifying additional characters to exempt from encoding. Advanced settings control the case of the hex digits (RFC 3986 recommends uppercase and treats %2f and %2F as equivalent) and normalize existing percent-encoded sequences for consistency.
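Two of these options are easy to get subtly wrong, so here is a sketch of each, assuming the hypothetical helper names `encodeFormValue` and `wrapEncoded`. The form variant is simplified: it splits on spaces, encodes each piece with `encodeURIComponent`, and joins with “+”, which guarantees a literal “+” in the data is still escaped as `%2B`. The line breaker assumes its input is already percent-encoded and refuses to cut inside a `%XY` triplet.

```typescript
// Simplified form-style encoding: spaces become "+", a literal "+" becomes "%2B".
function encodeFormValue(value: string): string {
  return value.split(" ").map((part) => encodeURIComponent(part)).join("+");
}

// Hypothetical helper: breaks an encoded string into lines of at most `width`
// characters without splitting a "%XY" escape across two lines.
function wrapEncoded(encoded: string, width: number = 76): string {
  if (width < 3) throw new RangeError("width must be at least 3");
  const lines: string[] = [];
  let start = 0;
  while (start < encoded.length) {
    let end = Math.min(start + width, encoded.length);
    if (end < encoded.length) {
      // In encoded text "%" only starts a triplet, so back off if the cut
      // would fall one or two characters after a "%".
      if (encoded[end - 1] === "%") end -= 1;
      else if (encoded[end - 2] === "%") end -= 2;
    }
    lines.push(encoded.slice(start, end));
    start = end;
  }
  return lines.join("\n");
}
```

For example, `encodeFormValue("a b&c")` returns `a+b%26c`, and wrapping `abc%2Cdef` at width 4 produces the lines `abc`, `%2Cd`, and `ef` rather than splitting `%2C`.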

Use Cases

Developers use URL encoding to construct safe query strings, embed user input in route parameters, and generate analytics tags without risking malformed URLs. Data URIs that embed small images or fonts in CSS usually carry Base64 payloads, but URL encoding is what keeps path and query segments intact. In serverless or microservice architectures, encoded URLs passed between functions and APIs arrive with their parameters unchanged, even when the services run on different platforms or languages.

Common Pitfalls

Double-encoding occurs when an already encoded string is encoded again: “%20” becomes “%2520”, and the link no longer decodes to the intended value. Forgetting to encode reserved characters may lead to injection attacks (e.g., an attacker-supplied & adding extra query parameters, or open redirects) or to truncated queries. JavaScript’s `encodeURIComponent` and `encodeURI` illustrate the distinction: `encodeURIComponent` escapes nearly all reserved characters (it leaves ! ' ( ) * unescaped), while `encodeURI` preserves characters that act as URI delimiters, such as / ? # & = and +.
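The following snippet uses only the built-in functions to show both pitfalls. The sample strings and URLs are illustrative. The last pair shows how `encodeURI` on a fully concatenated URL lets a user-supplied `&` start a new parameter, while encoding the value on its own with `encodeURIComponent` keeps it inside `q`.

```typescript
// Double-encoding: encoding an already encoded string escapes its "%" signs again.
const once = encodeURIComponent("100% sure"); // "100%25%20sure"
const twice = encodeURIComponent(once); // "100%2525%2520sure"

// encodeURI keeps delimiters such as : / ? & =; encodeURIComponent escapes them.
const fullUrl = "https://example.com/a b?x=1&y=2";
const viaEncodeURI = encodeURI(fullUrl); // "https://example.com/a%20b?x=1&y=2"
const viaComponent = encodeURIComponent(fullUrl); // "https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2"

// Injection: encodeURI leaves the user's "&" intact, so it starts a new parameter.
const userInput = "shoes&admin=true";
const unsafe = encodeURI("https://example.com/search?q=" + userInput);
// "https://example.com/search?q=shoes&admin=true"
const safe = "https://example.com/search?q=" + encodeURIComponent(userInput);
// "https://example.com/search?q=shoes%26admin%3Dtrue"
```

A practical rule that follows: encode each raw value exactly once, at the point where it is inserted into the URL, and never re-encode a string that has already passed through an encoder.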

Performance Considerations

Encoding is a linear-time operation with respect to input length. For very large payloads (tens of megabytes), performing encoding on the main thread can block the UI. Offloading the work to a Web Worker, or encoding in chunks and yielding between them, does not reduce the total work but keeps the user interface responsive in browser contexts.
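A sketch of chunked encoding, assuming the hypothetical helper `encodeInChunks`, which splits the input into slices of at most `chunkSize` UTF-16 code units. The one subtlety is that a slice must not end between the two halves of a surrogate pair, because `encodeURIComponent` throws a `URIError` on a lone surrogate. Since percent-encoding has no state that crosses character boundaries, concatenating the pieces gives the same result as encoding the whole string at once.

```typescript
// Hypothetical helper: encodes a long string in slices of at most `chunkSize`
// UTF-16 code units, so a caller can yield to the event loop between slices.
function encodeInChunks(input: string, chunkSize: number): string[] {
  if (chunkSize < 2) throw new RangeError("chunkSize must be at least 2");
  const pieces: string[] = [];
  let start = 0;
  while (start < input.length) {
    let end = Math.min(start + chunkSize, input.length);
    // Never cut right after a high surrogate: encoding either half of the
    // pair alone would throw a URIError.
    const last = input.charCodeAt(end - 1);
    if (end < input.length && last >= 0xd800 && last <= 0xdbff) {
      end -= 1;
    }
    pieces.push(encodeURIComponent(input.slice(start, end)));
    start = end;
  }
  return pieces;
}
```

In a browser, a caller might process one piece per task (for example, between `setTimeout` callbacks) or run the whole loop inside a Web Worker, then join the pieces.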

Security Considerations

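Proper encoding helps prevent URL injection and cross-site scripting (XSS) via malicious query parameters. Always percent-encode user input before concatenating it into a URL, and HTML-escape the finished URL when placing it in an HTML attribute. Percent-encoding alone does not stop a whole user-supplied link with a `javascript:` scheme, so validate the scheme as well. Beware of Unicode normalization attacks, where different code point sequences for the same visible text bypass simple filter rules: normalize inputs to a single form (NFC or NFD) before filtering and encoding. Normalization does not, however, merge look-alike characters from different scripts (homoglyphs).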
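The normalization point can be seen with two spellings of the same word. This sketch uses the standard `String.prototype.normalize`; the sample strings and the helper name `normalizeThenEncode` are illustrative. Without normalization, a filter that compares encoded values would treat the two spellings as different inputs.

```typescript
// Two spellings of "café": a precomposed "é" (U+00E9) versus "e" followed by
// a combining acute accent (U+0301). They render identically.
const precomposed = "caf\u00E9";
const decomposed = "cafe\u0301";

// Encoded directly, they differ: "caf%C3%A9" vs "cafe%CC%81".
const rawA = encodeURIComponent(precomposed);
const rawB = encodeURIComponent(decomposed);

// Normalizing to NFC first maps both to the same code points, hence the same bytes.
function normalizeThenEncode(value: string): string {
  return encodeURIComponent(value.normalize("NFC"));
}
```

After normalization both inputs encode to `caf%C3%A9`. NFC is used here because it is the more common choice for text interchange; NFD would also make the two agree, just in decomposed form.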

Best Practices

Encode each URL component separately rather than the entire URL string. Use well-tested libraries or the built-in JavaScript functions: `encodeURIComponent` for individual components and `encodeURI` for a complete URI whose delimiters must be kept. Validate inputs and enforce a maximum length to mitigate denial-of-service from overly long or malicious strings.
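A sketch of length validation combined with per-component encoding. The limit `MAX_ENCODED_VALUE_LENGTH` (512) and the functions `encodeQueryValue` and `buildSearchUrl` are hypothetical; a real limit should come from the server’s URL-length constraints. Because encoding never shortens a string, an over-long raw value can be rejected before any encoding work is done, but the encoded length still needs a check since non-ASCII text expands.

```typescript
// Hypothetical limit; choose a value that fits your server's URL-length limits.
const MAX_ENCODED_VALUE_LENGTH = 512;

// Validates and encodes a single query value on its own.
function encodeQueryValue(value: string, maxLength: number = MAX_ENCODED_VALUE_LENGTH): string {
  // Every UTF-16 code unit encodes to at least one character, so a raw value
  // longer than the limit can be rejected cheaply, before encoding.
  if (value.length > maxLength) {
    throw new RangeError("value too long: " + value.length + " characters");
  }
  const encoded = encodeURIComponent(value); // throws URIError on lone surrogates
  // A BMP character at or above U+0800 expands to 9 characters ("%XX%XX%XX").
  if (encoded.length > maxLength) {
    throw new RangeError("encoded value too long: " + encoded.length + " characters");
  }
  return encoded;
}

// Hypothetical builder: the base URL is trusted; only the search term is user input.
function buildSearchUrl(base: string, term: string): string {
  return base + "?q=" + encodeQueryValue(term);
}
```

For example, `buildSearchUrl("https://example.com/search", "a&b")` returns `https://example.com/search?q=a%26b`, while 100 Euro signs (900 characters once encoded) are rejected even though the raw string is only 100 characters long.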

Example

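`Hello, world!` → `Hello%2C%20world%21`

`/search?q=100%25+coverage` → `%2Fsearch%3Fq%3D100%2525%2Bcoverage` (the existing %25 is itself re-encoded as %2525, the double-encoding effect described under Common Pitfalls)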