URL Decoder
Overview
URL decoding reverses percent-encoding: it restores the original bytes from “%XY” sequences and then decodes the resulting byte array as UTF-8 text. In form-encoded contexts, the plus sign (+) is also converted back to a space (0x20). This recovers path segments, query parameter values, and form data for safe parsing and further processing. The decoder handles both standard percent-encoding and the form-encoded (`application/x-www-form-urlencoded`) variant, and either skips or reports invalid sequences depending on configuration.
What Is URL Decoding?
Each percent-encoded triplet (“%XY”) represents one byte. The decoder scans the input string, converts each pair of hex digits back into its numeric byte value, copies unencoded characters as their UTF-8 bytes, and then reconstructs the original text via UTF-8 decoding. In form-encoded query strings and bodies, plus signs translate back to spaces. Because decoding exactly reverses the encoding rules, Unicode text survives a round trip; arbitrary binary data survives only at the byte level, before the UTF-8 decoding step, since not every byte sequence is valid UTF-8.
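A minimal TypeScript sketch of the triplet-to-byte step, assuming an input made only of “%XY” triplets. `tripletsToBytes` is a hypothetical helper; `encodeURIComponent` and `TextDecoder` are standard APIs.

```typescript
// Hypothetical helper: turn a string made only of "%XY" triplets into raw bytes.
function tripletsToBytes(encoded: string): Uint8Array {
  const bytes: number[] = [];
  for (let i = 0; i < encoded.length; i += 3) {
    const hex = encoded.slice(i + 1, i + 3);
    // Each triplet must be "%" followed by exactly two hex digits.
    if (encoded[i] !== "%" || !/^[0-9A-Fa-f]{2}$/.test(hex)) {
      throw new Error(`malformed triplet at index ${i}`);
    }
    bytes.push(parseInt(hex, 16)); // one triplet -> one byte
  }
  return new Uint8Array(bytes);
}

const encoded = encodeURIComponent("\u00e9");        // "%C3%A9" (the letter é)
const bytes = tripletsToBytes(encoded);              // [0xC3, 0xA9]
const text = new TextDecoder("utf-8").decode(bytes); // "é"
```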
URL Components & Decoding Context
Different URL components (path, query, fragment) follow different decoding rules. For example, “+” means a space only in form-encoded query strings and bodies; in a path or fragment it is a literal plus. Likewise, a path should be split on “/” before its segments are decoded, because “%2F” decodes to a “/” that is data rather than a separator. A context-aware decoder applies the correct rules per component, avoiding misinterpretation or data loss.
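A sketch of per-component decoding with the WHATWG `URL` and `URLSearchParams` APIs (available in browsers and Node.js). `decodeComponents` and the example URL are hypothetical; the sketch splits the path before decoding, applies form rules to the query, and plain percent-decoding to the fragment.

```typescript
// Hypothetical helper: decode each URL component with the rules that apply to it.
function decodeComponents(href: string) {
  const url = new URL(href);

  // Split the still-encoded path on "/" first, so "%2F" stays inside its segment.
  const pathSegments = url.pathname
    .split("/")
    .slice(1)
    .map((segment) => decodeURIComponent(segment));

  // URLSearchParams applies form rules: "+" becomes a space.
  const query: Record<string, string> = {};
  url.searchParams.forEach((value, key) => {
    query[key] = value;
  });

  // The fragment gets plain percent-decoding; "+" stays literal.
  const fragment = decodeURIComponent(url.hash.slice(1));

  return { pathSegments, query, fragment };
}

const parts = decodeComponents("https://example.com/docs/a%2Fb?q=a+b%2Bc#x+y%20z");
// parts.pathSegments -> ["docs", "a/b"]
// parts.query        -> { q: "a b+c" }
// parts.fragment     -> "x+y z"
```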
Percent-Encoding Grammar
Valid percent-encodings follow the pattern “%” + two hexadecimal digits (0–9, A–F, case-insensitive). The decoder accepts uppercase and lowercase hex digits equally, ignores whitespace between sequences when configured, and rejects invalid patterns (e.g., “%G1” or a stray “%” at the end of the input). Error reports give the exact position of each malformed sequence to aid debugging.
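A sketch of position-reporting validation. `findMalformedEscape` is a hypothetical helper that returns the index of the first malformed escape, or -1 when every “%” is followed by two hex digits in either case.

```typescript
// Hypothetical validator: index of the first malformed "%" escape, or -1.
function findMalformedEscape(input: string): number {
  for (let i = 0; i < input.length; i++) {
    if (input[i] !== "%") continue;
    // The two characters after "%" must both be hex digits (either case).
    if (!/^[0-9A-Fa-f]{2}$/.test(input.slice(i + 1, i + 3))) return i;
    i += 2; // skip the two hex digits just validated
  }
  return -1;
}

findMalformedEscape("a%2fb%2F"); // -1 (lowercase and uppercase hex both accepted)
findMalformedEscape("a%G1");     // 1  ("G" is not a hex digit)
findMalformedEscape("100%");     // 3  (stray "%" at the end of the input)
```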
Handling Plus Signs (+)
In `application/x-www-form-urlencoded` data, spaces are commonly encoded as “+”. The decoder optionally treats plus signs as spaces, depending on the context flag. This ensures correct round-trip conversion for HTML forms and query strings, while preserving literal “+” characters in other URL components.
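A sketch of the context flag built on the standard `decodeURIComponent`; `decodeComponent`, `DecodeOptions`, and `plusAsSpace` are hypothetical names.

```typescript
// Hypothetical options type; "plusAsSpace" is an illustrative flag name.
interface DecodeOptions {
  plusAsSpace: boolean;
}

function decodeComponent(input: string, { plusAsSpace }: DecodeOptions): string {
  // Convert "+" before percent-decoding, so an encoded plus ("%2B") survives as "+".
  const prepared = plusAsSpace ? input.replace(/\+/g, " ") : input;
  return decodeURIComponent(prepared);
}

decodeComponent("a+b%2Bc", { plusAsSpace: true });  // "a b+c" (form field)
decodeComponent("a+b%2Bc", { plusAsSpace: false }); // "a+b+c" (path segment)
```

The plus conversion runs before percent-decoding on purpose: doing it afterwards would also turn a decoded “%2B” into a space.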
Character Sets & UTF-8
After reconstructing the raw byte array, the decoder decodes it as UTF-8, mapping multi-byte sequences back to Unicode code points. Invalid byte sequences trigger errors or replacement characters (�), depending on strictness settings. This preserves international text and complex scripts across URL encoding and decoding cycles.
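The two strictness settings map onto the standard `TextDecoder` API. This sketch assumes the bytes came from an input such as “hi%FF”; the built-in `decodeURIComponent`, by contrast, has no lenient mode.

```typescript
// Bytes from "hi%FF": 0xFF can never appear in well-formed UTF-8.
const raw = new Uint8Array([0x68, 0x69, 0xff]);

// Lenient: invalid sequences become U+FFFD REPLACEMENT CHARACTER.
const lenient = new TextDecoder("utf-8").decode(raw); // "hi\uFFFD"

// Strict: { fatal: true } makes decode() throw a TypeError instead.
let strictError: unknown = null;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(raw);
} catch (err) {
  strictError = err;
}

// decodeURIComponent is always strict: it throws a URIError on invalid UTF-8.
let uriError: unknown = null;
try {
  decodeURIComponent("hi%FF");
} catch (err) {
  uriError = err;
}
```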
How It Works
1. **Strip**: Remove whitespace if configured.
2. **Normalize**: Optionally convert “+” to a space for form-encoded input.
3. **Parse**: Convert each “%XY” sequence to its byte value and copy every other character as its UTF-8 bytes.
4. **Byte Buffer**: Assemble the resulting bytes.
5. **UTF-8 Decode**: Convert the buffer into a text string.
6. **Return**: Output the decoded text.
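The six steps can be sketched as a single TypeScript function. `urlDecode` and its option names are hypothetical; the sketch strips whitespace before converting “+”, so that spaces produced from “+” are not removed again.

```typescript
// Hypothetical option names for the configurable steps described above.
interface PipelineOptions {
  stripWhitespace?: boolean; // remove all whitespace first
  plusAsSpace?: boolean;     // form-encoded input: "+" means a space
  strictUtf8?: boolean;      // throw on invalid UTF-8 instead of inserting U+FFFD
}

function urlDecode(input: string, opts: PipelineOptions = {}): string {
  // Strip: runs first so spaces produced from "+" are not removed afterwards.
  let s = opts.stripWhitespace ? input.replace(/\s+/g, "") : input;
  // Normalize: form-style plus-to-space conversion.
  if (opts.plusAsSpace) s = s.replace(/\+/g, " ");

  // Parse + Byte Buffer: each "%XY" becomes one byte; runs of other
  // characters are appended as their UTF-8 bytes.
  const encoder = new TextEncoder();
  const bytes: number[] = [];
  const appendLiteral = (run: string): void => {
    const utf8 = encoder.encode(run);
    for (let k = 0; k < utf8.length; k++) bytes.push(utf8[k]);
  };
  let runStart = 0;
  for (let i = 0; i < s.length; i++) {
    if (s[i] !== "%") continue;
    const hex = s.slice(i + 1, i + 3);
    if (!/^[0-9A-Fa-f]{2}$/.test(hex)) {
      // The index refers to the stripped/normalized string, not the raw input.
      throw new Error(`malformed escape at index ${i}`);
    }
    appendLiteral(s.slice(runStart, i));
    bytes.push(parseInt(hex, 16));
    i += 2;           // skip the two hex digits
    runStart = i + 1; // the next literal run starts after the escape
  }
  appendLiteral(s.slice(runStart));

  // UTF-8 Decode + Return.
  const decoder = new TextDecoder("utf-8", { fatal: opts.strictUtf8 === true });
  return decoder.decode(new Uint8Array(bytes));
}

urlDecode("caf%C3%A9+au+lait", { plusAsSpace: true }); // "café au lait"
```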
Use Cases
Web frameworks decode incoming request URLs and form bodies to extract parameters. Analytics pipelines parse query strings to aggregate usage data. Command-line tools decode URL-encoded logs from proxies and CDNs. In security research, decoding percent-encoded payloads reveals concealed attack vectors in HTTP requests or malicious URLs.
Common Pitfalls
Double-decoding can lead to security flaws: an attacker can send “%252F”, which a filter sees as the harmless “%2F” after one decode, but which becomes “/” after a second decode. Misinterpreting a literal “+” as a space, or vice versa, corrupts data. Comparing still-encoded strings without normalizing hex case (for example “%2f” vs “%2F”) can break equality checks.
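Rather than decoding a second time, a decoder can flag input that still contains escapes after one pass. This is a heuristic sketch: `looksDoubleEncoded` and `normalizeHexCase` are hypothetical helpers, text that legitimately contains a “%XY”-like substring also triggers the check, and `decodeURIComponent` throws on malformed input.

```typescript
// Hypothetical check: after ONE decode, does the result still contain a valid
// "%XY" escape? If so, the input was probably encoded more than once.
function looksDoubleEncoded(input: string): boolean {
  const once = decodeURIComponent(input);
  return /%[0-9A-Fa-f]{2}/.test(once);
}

// Hypothetical normalizer: uppercase the hex digits of every escape so that
// "%2f" and "%2F" compare equal, without decoding anything.
function normalizeHexCase(input: string): string {
  return input.replace(/%[0-9A-Fa-f]{2}/g, (escape) => escape.toUpperCase());
}

looksDoubleEncoded("..%252F..%252Fetc"); // true:  one decode gives "..%2F..%2Fetc"
looksDoubleEncoded("..%2F..%2Fetc");     // false: one decode gives "../../etc"
normalizeHexCase("a%2fb") === normalizeHexCase("a%2Fb"); // true
```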
Performance Considerations
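Decoding is linear in input length. Very large inputs (for example, more than 10 MB) should be processed in a streaming fashion or offloaded to a Web Worker to avoid blocking the main thread. Memory copies can be reduced by decoding in place or by writing into a preallocated typed array: the decoded byte count never exceeds the input's UTF-8 byte length.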
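A streaming decoder has to carry two kinds of state across chunk boundaries: an escape split between chunks (“%2” then “F”) and a UTF-8 sequence split between escapes. This sketch assumes the encoded input is ASCII, as correctly percent-encoded text is; `StreamingPercentDecoder` is a hypothetical class, while `TextDecoder` with `{ stream: true }` is the standard way to buffer incomplete UTF-8.

```typescript
// Hypothetical streaming decoder for ASCII percent-encoded input.
class StreamingPercentDecoder {
  private pending = ""; // trailing "%" or "%X" held until the next chunk
  private readonly utf8 = new TextDecoder("utf-8", { fatal: true });

  push(chunk: string): string {
    return this.process(this.pending + chunk, false);
  }

  end(): string {
    return this.process(this.pending, true);
  }

  private process(input: string, final: boolean): string {
    this.pending = "";
    if (!final) {
      // An escape split across chunks: hold back a trailing "%" or "%X".
      const tail = input.lastIndexOf("%");
      if (tail !== -1 && tail >= input.length - 2) {
        this.pending = input.slice(tail);
        input = input.slice(0, tail);
      }
    }
    // Preallocated typed array: for ASCII input, bytes never exceed characters.
    const bytes = new Uint8Array(input.length);
    let n = 0;
    for (let i = 0; i < input.length; i++) {
      const code = input.charCodeAt(i);
      if (code === 0x25 /* "%" */) {
        const hex = input.slice(i + 1, i + 3);
        if (!/^[0-9A-Fa-f]{2}$/.test(hex)) throw new Error("malformed escape");
        bytes[n++] = parseInt(hex, 16);
        i += 2;
      } else if (code < 0x80) {
        bytes[n++] = code;
      } else {
        throw new Error("non-ASCII character in encoded input");
      }
    }
    // stream: true keeps an incomplete UTF-8 sequence buffered for the next call.
    return this.utf8.decode(bytes.subarray(0, n), { stream: !final });
  }
}

const decoder = new StreamingPercentDecoder();
let result = "";
for (const chunk of ["caf%C3", "%A9%2", "0ok"]) result += decoder.push(chunk);
result += decoder.end(); // "café ok"
```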
Security Considerations
Treat decoded output as untrusted: validate it, and use parameterized queries or context-appropriate escaping before it reaches SQL or HTML, to avoid injection attacks. Reject invalid percent-encodings early to prevent parsing ambiguities. Limit input length to mitigate denial-of-service from pathological inputs.
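A sketch of the input-side checks, assuming a single decode with `decodeURIComponent`. `safeDecodeComponent` and the 8192-character limit are hypothetical choices, not recommended values; the decoded result still needs output-side validation or escaping.

```typescript
// Hypothetical limit; pick one that fits the application's real inputs.
const MAX_ENCODED_LENGTH = 8192;

// Decode exactly once, rejecting oversized or malformed input up front.
function safeDecodeComponent(input: string): string {
  if (input.length > MAX_ENCODED_LENGTH) {
    throw new RangeError(`input longer than ${MAX_ENCODED_LENGTH} characters`);
  }
  try {
    return decodeURIComponent(input);
  } catch (err) {
    // decodeURIComponent throws URIError for bad escapes and invalid UTF-8.
    throw new Error(`rejected malformed percent-encoding: ${String(err)}`);
  }
}

safeDecodeComponent("a%20b"); // "a b"
```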
Best Practices
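Decode each URL component separately (path segment, query key, query value, fragment) rather than applying a blanket decode to the full URL. In JavaScript, use the built-in `decodeURIComponent` for individual components and `decodeURI` for full URIs; neither converts “+” to a space, so use `URLSearchParams` for form-encoded query strings. Run security checks on the fully decoded, normalized value rather than on an intermediate form, to prevent bypasses.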
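The three built-ins differ in which escapes they decode and how they treat “+”. A short comparison; the variable names and sample string are illustrative.

```typescript
const encoded = "a%2Fb%3Fc%20d+e";

// decodeURI leaves escapes for reserved characters (such as "/" and "?") encoded,
// because decoding them could change the structure of a full URI.
const viaDecodeURI = decodeURI(encoded); // "a%2Fb%3Fc d+e"

// decodeURIComponent decodes every escape; use it on one component at a time.
const viaDecodeURIComponent = decodeURIComponent(encoded); // "a/b?c d+e"

// Neither function turns "+" into a space; URLSearchParams applies form rules.
const viaSearchParams = new URLSearchParams("v=" + encoded).get("v"); // "a/b?c d e"
```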
Example
`Hello%2C%20world%21` → `Hello, world!`
`q=100%2525%2Bcoverage` (double-encoded) → after the first decode: `q=100%25+coverage` → after the second: `q=100%+coverage` → after converting the form-encoded `+` to a space: `q=100% coverage`
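Both examples can be reproduced with the built-in `decodeURIComponent`. In this sketch the last step applies the form-encoding plus-to-space rule by hand, because `decodeURIComponent` leaves “+” untouched; the variable names are illustrative.

```typescript
// First example: a single decode.
const single = decodeURIComponent("Hello%2C%20world%21"); // "Hello, world!"

// Second example: the double-encoded query pair, decoded step by step.
const first = decodeURIComponent("q=100%2525%2Bcoverage"); // "q=100%25+coverage"
const second = decodeURIComponent(first);                  // "q=100%+coverage"
const finalValue = second.replace(/\+/g, " ");             // "q=100% coverage"
```

Decoding a third time would throw a `URIError`, since “%+c” is not a valid escape, which is why the number of decoding passes must match the number of encoding passes.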