
Binary to Text Converter — Decode Binary to ASCII / UTF-8

Last verified May 2026 — runs in your browser


Binary to Text — Decode Binary Code to Readable Text

Paste a binary string — space-separated bytes (01001000 01100101...) or a continuous stream — and the page decodes it back to readable text in real time. The decoder splits input into 8-bit groups — one byte per character for ASCII, even though ASCII itself (ASA X3.4-1963, published 17 June 1963) defines only 7-bit code points; the 8th bit was claimed first for parity, then for extended encodings. For ASCII-only inputs (U+0000–U+007F), each 8-bit group maps to one character. For UTF-8 inputs (RFC 3629, Yergeau 2003), the decoder follows the multi-byte rules: code points U+0080–U+07FF use 2 bytes, U+0800–U+FFFF use 3 bytes (the BMP, including most non-Latin scripts), and supplementary-plane code points U+10000+ use 4 bytes. Errors surface inline rather than producing silent mojibake. Useful for reverse-engineering protocols, decoding embedded-device dumps, learning UTF-8 byte structure, or sanity-checking that a binary stream matches an expected ASCII pattern.
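The same decode flow can be sketched in a few lines of TypeScript. This is an illustration, not the tool's actual source: it strips whitespace, splits the bit stream into 8-bit groups, and hands the bytes to the browser's built-in TextDecoder, which applies the RFC 3629 UTF-8 rules. The fatal option makes invalid sequences throw instead of silently emitting U+FFFD replacement characters.

```typescript
// Sketch of the decode path described above (illustrative, not the tool's source).
function decodeBinary(input: string): string {
  const bits = input.replace(/\s+/g, "");          // join space-separated bytes or a continuous stream
  if (!/^[01]*$/.test(bits)) throw new Error("non-binary character in input");
  if (bits.length % 8 !== 0) throw new Error("input is not byte-aligned (length not a multiple of 8)");

  const bytes = new Uint8Array(bits.length / 8);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);   // each 8-bit group becomes one byte
  }
  // fatal: true -> invalid UTF-8 sequences raise an error instead of producing mojibake
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(decodeBinary("01001000 01101001")); // "Hi"
```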

About this tool

The decoder runs entirely in the browser. The input parser tolerates whitespace separators (spaces, tabs, newlines) and joins continuous streams into a single bit-string before splitting into 8-bit groups — non-binary characters trigger an inline error rather than corrupting downstream byte alignment. Each byte is interpreted as a UTF-8 code unit per RFC 3629 §3, which means a 1-byte input in U+0000–U+007F maps directly to ASCII, and multi-byte sequences (lead byte 110xxxxx for 2-byte, 1110xxxx for 3-byte, 11110xxx for 4-byte, with continuation bytes 10xxxxxx) are validated against the byte-pattern rules. Three encoding distinctions matter when reading binary data. ASCII (ASA X3.4-1963 → ANSI X3.4-1986) defines 128 code points in 7 bits. Extended ASCII (ISO 8859-1 / Latin-1 / Windows-1252) extends to 8 bits with regional characters. UTF-8 (RFC 3629) covers the full Unicode repertoire (Unicode 16.0 assigns roughly 155,000 characters) using variable-length encoding. A binary stream alone does not carry encoding information — that lives in HTTP Content-Type headers, BOM (Byte Order Mark) markers per Unicode Standard 16.0 §2.6, or out-of-band metadata. Misidentified encoding produces 'mojibake' — visible-but-wrong characters.
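As a rough illustration of those lead-byte patterns, here is a small TypeScript sketch (again not taken from the tool's source) of how the top bits of a sequence's first byte determine its length per RFC 3629 §3:

```typescript
// The lead byte announces how many continuation bytes (10xxxxxx) must follow.
function utf8SequenceLength(lead: number): number {
  if ((lead & 0b1000_0000) === 0b0000_0000) return 1; // 0xxxxxxx: ASCII, U+0000–U+007F
  if ((lead & 0b1110_0000) === 0b1100_0000) return 2; // 110xxxxx: U+0080–U+07FF
  if ((lead & 0b1111_0000) === 0b1110_0000) return 3; // 1110xxxx: U+0800–U+FFFF
  if ((lead & 0b1111_1000) === 0b1111_0000) return 4; // 11110xxx: U+10000–U+10FFFF
  throw new Error("invalid UTF-8 lead byte");          // e.g. a stray continuation byte
}

console.log(utf8SequenceLength(0x48)); // 1  ('H')
console.log(utf8SequenceLength(0xC3)); // 2  (lead byte of 'é' = C3 A9)
console.log(utf8SequenceLength(0xE2)); // 3  (lead byte of '€' = E2 82 AC)
```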

  • Decode binary to ASCII / UTF-8 text in real time
  • Handles space-separated bytes and continuous bit streams
  • 8-bit byte alignment, with ASCII character mapping per ASA X3.4-1963
  • UTF-8 multi-byte handling per RFC 3629 (1-, 2-, 3-, and 4-byte sequences)
  • Inline error for invalid characters or misaligned input
  • One-click copy of decoded text to clipboard
  • Pure client-side decoding — no API call, no upload
  • Reactive — re-decodes as you type or paste
  • Useful for protocol reverse-engineering and embedded-device debugging

Free. No signup. Your inputs stay in your browser. Ads via Google AdSense (consent required).

Frequently asked questions

Why do my decoder results look like garbage?

Three common causes: (a) bit-grouping mismatch — the input is a continuous stream of 1s and 0s but the decoder split it into 7-bit groups when the source was 8-bit (or vice versa); (b) encoding mismatch — the input is base64 but the decoder treats it as raw binary, or the input is binary representing UTF-8 bytes but the decoder assumes ASCII; (c) padding error — base64 inputs missing the trailing '=' padding characters confuse strict decoders. Verify the source format first (binary vs base64 vs hex), then confirm byte alignment (8 bits per character for byte-oriented ASCII, whose underlying 7-bit code is ASA X3.4-1963).
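Cause (a) is easy to reproduce. The TypeScript snippet below (illustrative only) decodes the same bit stream with both group sizes; the 7-bit split lands on entirely different byte boundaries and produces garbage:

```typescript
const stream = "0100100001101001"; // "Hi" as a continuous 8-bit stream

// Correct: 8-bit groups -> 0x48, 0x69
const as8bit = stream.match(/.{8}/g)!.map(b => parseInt(b, 2));
console.log(String.fromCharCode(...as8bit)); // "Hi"

// Wrong: 7-bit groups -> 0100100 | 0011010 -> 0x24, 0x1A
const as7bit = stream.slice(0, 14).match(/.{7}/g)!.map(b => parseInt(b, 2));
console.log(String.fromCharCode(...as7bit)); // "$" followed by a control character
```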

Why is base64 33% larger than the original binary?

Base64 (RFC 4648) encodes every 3 input bytes (24 bits) as 4 output characters (each carrying 6 bits = 24 bits total). The encoding alphabet is 64 printable ASCII characters (A–Z, a–z, 0–9, +, /), which fits in 6 bits per character. Result: 4 output chars per 3 input bytes = 33.3% expansion, plus '=' padding adds up to 2 chars at the end. Base32 expands +60%, base16 (hex) +100%. The trade-off is intentional: a smaller alphabet produces larger output but transports cleanly through case-folding, URL-encoding, and other channel constraints.
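A quick worked example of the 3-bytes-in / 4-chars-out ratio, using the browser's built-in btoa (which base64-encodes a binary string): 6 input bytes become 8 output characters (+33%), and 5 input bytes also become 8 characters because '=' padding rounds the final group up.

```typescript
const bytes6 = "\x48\x65\x6c\x6c\x6f\x21";   // 6 bytes: "Hello!"
console.log(btoa(bytes6));                    // "SGVsbG8h" — 8 chars, no padding

const bytes5 = "\x48\x65\x6c\x6c\x6f";        // 5 bytes: "Hello"
console.log(btoa(bytes5));                    // "SGVsbG8=" — 8 chars, 1 padding char
```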

What's the difference between binary, hex, and base64 encodings?

All three are binary-to-text encodings (lossless). Binary (base2) lists each byte as 8 ones and zeros — 8× size, fully readable bits. Hex (base16, RFC 4648) groups every 4 bits as 0–9/A–F — 2× size, readable for short hashes, case-insensitive by spec. Base64 (RFC 4648) packs 6 bits per character — 1.33× size, compact for longer payloads (PEM keys, email attachments via MIME RFC 2045, data URIs). Choose by transport constraint: binary for human-readable bit inspection, hex for short identifiers, base64 for compact transport over text channels.
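To make the size ratios concrete, here is a short TypeScript sketch rendering the same 3 bytes in all three encodings (btoa is the browser's built-in base64 encoder):

```typescript
const data = new Uint8Array([0x4f, 0x4b, 0x21]); // the bytes of "OK!"

const asBinary = [...data].map(b => b.toString(2).padStart(8, "0")).join(" ");
const asHex    = [...data].map(b => b.toString(16).padStart(2, "0")).join("");
const asBase64 = btoa(String.fromCharCode(...data));

console.log(asBinary); // "01001111 01001011 00100001" — 24 digits, 8x the data
console.log(asHex);    // "4f4b21"                      — 6 characters, 2x
console.log(asBase64); // "T0sh"                        — 4 characters, 1.33x
```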

How do I detect what encoding a binary stream uses?

Several heuristics: (a) BOM (Byte Order Mark) at the stream start — 'EF BB BF' is UTF-8, 'FF FE' is UTF-16 LE, 'FE FF' is UTF-16 BE per Unicode Standard 16.0 §2.6; (b) HTTP Content-Type header 'charset=' parameter; (c) statistical analysis — UTF-8 sequences have specific byte patterns (continuation bytes always start with bits '10') per RFC 3629 §3. None is foolproof. A UTF-8 stream missing both BOM and Content-Type can be mistaken for Latin-1 — the failure mode is 'mojibake', visible-but-wrong characters where the decoder picked the wrong codepage and silently corrupted the text.
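Heuristic (a) reduces to comparing the first bytes of the stream against known BOM values. A minimal sketch in TypeScript, checking only the three BOMs listed above (a hypothetical helper, not part of this tool):

```typescript
// Returns the encoding implied by a leading BOM, or null if none is present.
function sniffBom(bytes: Uint8Array): string | null {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  return null; // no BOM — fall back to Content-Type or statistical detection
}

console.log(sniffBom(new Uint8Array([0xef, 0xbb, 0xbf, 0x48, 0x69]))); // "UTF-8"
console.log(sniffBom(new Uint8Array([0x48, 0x69])));                   // null
```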

Why do older email systems use base64 even when the network supports binary?

SMTP (RFC 5321, Klensin 2008) historically accepted only 7-bit ASCII characters in message bodies — a constraint inherited from teleprinter-era 7-bit channels. RFC 2045 (MIME, Freed & Borenstein 1996) standardized the framework for transporting 8-bit binary content (images, executables, non-ASCII text) over 7-bit transports via Content-Transfer-Encoding values: 7bit, 8bit, binary, quoted-printable, base64. Modern SMTP servers support the 8BITMIME extension, but base64 remains the safe default — intermediate gateways may strip the 8th bit, and base64-encoded content survives any 7-bit channel without corruption.

Sources (6)
  • American Standards Association, X3.2 Subcommittee (1963). American Standard Code for Information Interchange (ASCII), ASA X3.4-1963. Published 17 June 1963 (7-bit, no lowercase); revised as USAS X3.4-1967 (added lowercase) and ANSI X3.4-1986 (final revision).
  • Yergeau, F. (2003). UTF-8, a transformation format of ISO 10646. RFC 3629, IETF (STD 63).
  • Freed, N., & Borenstein, N. (1996). Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. RFC 2045, IETF (November 1996).
  • Josefsson, S. (2006). The Base16, Base32, and Base64 Data Encodings. RFC 4648, IETF (October 2006; obsoletes RFC 3548).
  • Klensin, J. (2008). Simple Mail Transfer Protocol. RFC 5321, IETF (October 2008; defines the base protocol and its 7-bit message-body baseline).
  • The Unicode Consortium (2024). The Unicode Standard, Version 16.0 — Chapter 2 (General Structure) and §2.6 (Byte Order Mark). Unicode Consortium, Mountain View, CA.

These are the original standards and RFCs the decoding rules in this tool are based on. Locate them by title and year; the RFCs are published by the IETF.

Related guides