Question 1

Why does string.length return wrong numbers for emoji and CJK text?

Accepted Answer

JavaScript's string.length returns the number of UTF-16 code units, the storage unit the language uses internally, not the number of characters. For characters in the Basic Multilingual Plane (the first 65,536 code points, U+0000 through U+FFFF), one code point fits in one code unit, so most everyday CJK text counts as expected. But emoji like U+1F600 😀, less common CJK ideographs above U+FFFF, mathematical symbols, and ancient scripts require a surrogate pair: two UTF-16 code units encoding one code point. ECMA-262's String iterator (used by the [...str] spread syntax since ES2015) iterates over code points instead, so [...'😀'].length returns 1, the value a human expects. UAX #29 (Unicode 16.0, revision 45) defines a coarser unit, the grapheme cluster, under which a ZWJ-joined family emoji like 👨‍👩‍👧 counts as one user-perceived character even though it is built from five code points. Full grapheme-cluster segmentation requires Intl.Segmenter, which most counters don't use.
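The three counting levels described above can be compared side by side. This is a minimal sketch, not the counter's actual code: the function name countUnits is made up for illustration, and it assumes a runtime that provides Intl.Segmenter (recent browsers and Node.js).

```javascript
// Hypothetical helper: count one string at three levels of abstraction.
function countUnits(str) {
  // Grapheme segmentation per UAX #29, as exposed by Intl.Segmenter.
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    codeUnits: str.length,                          // UTF-16 code units
    codePoints: [...str].length,                    // String iterator walks code points
    graphemes: [...segmenter.segment(str)].length,  // user-perceived characters
  };
}

// Written with escapes so the code points are visible: man, ZWJ, woman, ZWJ, girl.
const FAMILY = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

console.log(countUnits('\u{1F600}')); // { codeUnits: 2, codePoints: 1, graphemes: 1 }
console.log(countUnits(FAMILY));      // { codeUnits: 8, codePoints: 5, graphemes: 1 }
```

Which of the three numbers to display depends on what the user is counting against; none of them is universally "the" length.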

Question 2

How does Twitter/X actually count characters under the 280 limit?

Accepted Answer

X Corp's developer documentation (docs.x.com/fundamentals/counting-characters) specifies weighted character counting, applied after the text is normalized to Unicode Normalization Form C (NFC). Most characters count as 1; Chinese, Japanese (Kanji, Hiragana, Katakana), Korean (Hangul), and fullwidth forms count as 2; all emoji count as 2 regardless of skin-tone or ZWJ complexity; URLs are wrapped in t.co links and count at a fixed weight of 23 regardless of original length. The 280 number became the headline figure when Twitter doubled the original 140-character ceiling in 2017, but for Japanese or Chinese content the practical limit is still about 140 characters, because each one weighs 2 against the 280 budget. The official open-source twitter-text library is the canonical reference implementation when integration precision matters.
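The weighting rules above can be approximated in a few lines. This is a rough sketch under stated assumptions, not the twitter-text algorithm: the URL pattern is deliberately naive, the "counts as 2" test uses Unicode script and emoji properties as a stand-in for the library's configured ranges, and the names weightedLength, URL_RE, and DOUBLE_RE are invented here.

```javascript
const URL_WEIGHT = 23; // fixed weight of a t.co-wrapped URL, per the text above

// Assumption: a naive URL matcher; twitter-text detects URLs far more carefully.
const URL_RE = /https?:\/\/\S+/g;

// Assumption: approximate the weight-2 set with CJK scripts, fullwidth forms,
// pictographic emoji, and regional indicators (flags).
const DOUBLE_RE =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}\uFF00-\uFFEF]|\p{Extended_Pictographic}|\p{Regional_Indicator}/u;

function weightedLength(text) {
  const normalized = text.normalize('NFC');
  // Every URL costs a flat 23, whatever its real length.
  const urls = normalized.match(URL_RE) ?? [];
  let total = urls.length * URL_WEIGHT;
  // Weigh the remaining text one grapheme cluster at a time, so a ZWJ
  // emoji sequence is charged once rather than once per code point.
  const rest = normalized.replace(URL_RE, '');
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  for (const { segment } of segmenter.segment(rest)) {
    total += DOUBLE_RE.test(segment) ? 2 : 1;
  }
  return total;
}

console.log(weightedLength('hello'));  // 5
console.log(weightedLength('日本語')); // 6
```

For anything where an off-by-one matters (for example, refusing to submit a post), the twitter-text library should be used instead of an approximation like this.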

Question 3

Where does the 160-character SMS limit come from?

Accepted Answer

3GPP TS 23.038 (originally GSM Recommendation 03.38, mandatory for GSM handsets) defines the GSM 7-bit default alphabet. An SMS message envelope carries up to 140 octets of payload; with 7 bits per character that yields ⌊140 × 8 / 7⌋ = 160 characters per single SMS. If a message contains any character outside the 7-bit table, the message falls back to UCS-2 encoding (16 bits per character) and the per-segment limit drops to ⌊140 / 2⌋ = 70. That covers all emoji, all CJK, and many accented Latin letters such as á, ç, and ã; a handful of accented letters, including é, ñ, and ü, are part of the default alphabet and do not trigger the fallback. Some markets ship national language shift tables (Portuguese, Turkish, several Brahmic scripts) that extend the 7-bit set. Multi-part SMS (per 3GPP TS 23.040) adds a User Data Header that further reduces per-segment payload to 153 (7-bit) or 67 (UCS-2).
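The segment arithmetic above can be sketched as follows. The function name smsSegments and its return shape are illustrative assumptions. The sketch covers only the default alphabet plus its basic extension table (whose characters, such as € and the square brackets, cost two septets because of the escape prefix), ignores national shift tables, and does not model splitters that avoid breaking an escape sequence or surrogate pair across segments.

```javascript
// GSM 7-bit default alphabet (3GPP TS 23.038), one septet each.
const GSM_BASIC = new Set(
  '@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&\'()*+,-./0123456789:;<=>?' +
  '¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà'
);
// Basic extension table: escape prefix plus character, so two septets each.
const GSM_EXTENDED = new Set('\f^{}\\[~]|€');

function smsSegments(text) {
  let septets = 0;
  let fitsGsm = true;
  for (const ch of text) {
    if (GSM_BASIC.has(ch)) septets += 1;
    else if (GSM_EXTENDED.has(ch)) septets += 2;
    else { fitsGsm = false; break; }
  }
  if (fitsGsm) {
    // 160 septets in a single message, 153 per part once a header is needed.
    const segments = septets <= 160 ? 1 : Math.ceil(septets / 153);
    return { encoding: 'GSM-7', units: septets, segments };
  }
  // UCS-2 fallback: count 16-bit code units, 70 single or 67 per part.
  const units = text.length;
  const segments = units <= 70 ? 1 : Math.ceil(units / 67);
  return { encoding: 'UCS-2', units, segments };
}

console.log(smsSegments('héllo')); // { encoding: 'GSM-7', units: 5, segments: 1 }
console.log(smsSegments('olá'));   // { encoding: 'UCS-2', units: 3, segments: 1 }
```

A single á is enough to flip the whole message to UCS-2, which is why a 100-character message can suddenly be billed as two segments.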

Question 4

Are emoji always 2 characters everywhere?

Accepted Answer

No; it depends on the system. ECMA-262 code-point counting treats a simple emoji like U+1F600 😀 as 1; a regional indicator pair like 🇺🇸 (two code points U+1F1FA + U+1F1F8) as 2; and a ZWJ family 👨‍👩‍👧 as 5. UAX #29 grapheme-cluster counting collapses all three to 1 user-perceived character. X Corp's weighted counter charges every emoji 2 characters regardless of underlying complexity. SMS using the GSM 7-bit alphabet doesn't carry emoji at all: the message gets re-encoded as UCS-2, and each code point in the emoji costs one or two UTF-16 code units depending on plane, so a simple 😀 consumes 2 of the 70 units in a segment and a ZWJ sequence consumes more. The 'right' count depends on which platform's billing or limit rule the user is trying to satisfy.

Question 5

How does this counter handle accessibility for screen readers?

Accepted Answer

The total and stripped counts and the Twitter/SMS progress bars sit inside a region marked aria-live="polite". A live region is one way to satisfy W3C WCAG Success Criterion 4.1.3 Status Messages (introduced in WCAG 2.1, W3C Recommendation 5 June 2018; carried unchanged into WCAG 2.2, Recommendation 5 October 2023), which requires that assistive technology be able to announce status updates without keyboard focus moving. The polite setting queues announcements behind any speech the user is already hearing, which suits non-urgent tally updates; assertive would interrupt mid-sentence on every keystroke. Screen readers (NVDA, JAWS, VoiceOver) consume the live region automatically; nothing else is required from the user.
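A minimal sketch of how such a live region might be updated. Everything here is an assumption made for illustration: the element id counter-status, the function name updateStatus, the wording of the message, and the reading of "stripped" as "whitespace removed".

```javascript
// Assumed markup (hypothetical id):
//   <div id="counter-status" aria-live="polite"></div>

// Writes the current tallies into the live region. Changing the text of a
// polite live region is what prompts a screen reader to announce it.
function updateStatus(region, text) {
  const total = text.length;
  const stripped = text.replace(/\s/g, '').length; // assumption: "stripped" = no whitespace
  region.textContent = `${total} characters, ${stripped} without whitespace`;
}

// Browser wiring, shown as comments because it needs a DOM:
//   const region = document.getElementById('counter-status');
//   textarea.addEventListener('input', () => updateStatus(region, textarea.value));
```

The function takes the region as a parameter, so it can be exercised with any object that has a textContent property.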

Character Counter — Count Letters, Words & Symbols Online

Frequently asked questions