
Remove Duplicate Lines — Deduplicate Text & Filter Unique Lines

Last verified May 2026 — runs in your browser


Paste a list, log, dataset, or anything line-separated and the page strips out duplicate lines in real time. The first occurrence of each line is kept (so the original order survives), and toggles for case sensitivity and whitespace trimming let you decide whether " Apple", "apple", and "Apple " all collapse to one entry or stay as three. A stats line shows how many lines you started with, how many unique lines survived, and how many duplicates were removed — useful when cleaning up an exported email list, deduping CSV rows pasted from a spreadsheet, or compressing a logfile to its unique error messages before grepping further.

About this tool

The deduplication is a single-pass O(n) walk over the input lines using a JavaScript `Set` keyed on the comparison form (lowercased and/or trimmed, depending on the toggles). The first occurrence wins, so the order of the remaining lines exactly matches their first appearance in the input — important when the order encodes meaning (timestamped logs, ranked lists, ordered CSVs). Whitespace trimming applies to both the comparison key AND the output line when enabled, so " apple" and "apple " don't just match each other but also export as a clean "apple" in the result. Case-insensitive mode lowercases only the comparison key, so the kept line preserves its original casing — "Apple" is kept and "apple" is dropped, not the other way round. The whole pass runs reactively as you type or paste, so a 10,000-line list dedupes in tens of milliseconds.

Use cases: cleaning up an email export before sending a newsletter, deduping a CSV column pasted from Excel, collapsing a noisy logfile to its unique error messages, building a unique-tags list from a folksonomy, or sanitizing a wordlist before feeding it to another tool.

  • Single-pass O(n) deduplication via JavaScript Set
  • First occurrence wins — original order of remaining lines preserved
  • Optional case-insensitive matching (Apple = apple = APPLE)
  • Optional whitespace trimming (applies to both comparison and output)
  • Live stats: original line count, unique count, removed count
  • Reactive — runs as you type or paste, no Run button needed
  • Handles 10,000+ lines in tens of milliseconds
  • Empty lines count as duplicates (first kept, rest removed)
  • One-click copy of the deduped result to clipboard
  • Useful for email exports, CSV columns, logfile compression, wordlist cleanup
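The pass described above can be sketched in a few lines of JavaScript. This is a minimal sketch, not the tool's actual source; the function and option names are illustrative:

```javascript
// Single-pass dedup: a Set of comparison keys, first occurrence wins.
function dedupeLines(text, { ignoreCase = false, trim = false } = {}) {
  const seen = new Set();
  const out = [];
  for (const rawLine of text.split("\n")) {
    const line = trim ? rawLine.trim() : rawLine; // trimming also cleans the output line
    const key = ignoreCase ? line.toLowerCase() : line; // lowercasing affects only the key
    if (!seen.has(key)) {
      seen.add(key);
      out.push(line); // kept line preserves its original casing
    }
  }
  return {
    result: out.join("\n"),
    original: text.split("\n").length,
    unique: out.length,
  };
}
```

For example, with both toggles on, " Apple", "apple", and "Apple " collapse to the single kept line "Apple" — first occurrence, trimmed, original casing.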

Free. No signup. Your inputs stay in your browser. Ads via Google AdSense (consent required).

Frequently asked questions

Why is the first occurrence kept rather than the last?

The dedup pass walks the input lines once and records each comparison key in a JavaScript Set; a line is emitted only if its key has not been seen before, so the first occurrence wins by construction. (ECMA-262 additionally specifies that Set iteration order equals insertion order, so rebuilding the output from the set itself would yield the same first-appearance order.) This preserves order-as-meaning patterns (timestamped logs, ranked lists, ordered CSVs) where the first row is canonical. If last-occurrence-wins is needed, the inverse is achievable by reversing the input, deduping, and reversing the output, but most use cases — email exports, CSV cleanup, log compression — want first-wins.
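The reverse-dedupe-reverse trick can be sketched as a small helper. This is hypothetical code, not part of the tool:

```javascript
// Last-occurrence-wins dedup: first-wins on the reversed list is
// last-wins on the original, so reverse, dedupe, reverse back.
function dedupeKeepLast(text) {
  const seen = new Set();
  const kept = [];
  for (const line of text.split("\n").reverse()) {
    if (!seen.has(line)) {
      seen.add(line);
      kept.push(line);
    }
  }
  return kept.reverse().join("\n");
}
```

For the input "a", "b", "a", "c" this keeps "b", "a", "c": each surviving line sits where its last occurrence was.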

How does case-insensitive comparison handle Unicode edge cases?

The comparison key is generated with String.prototype.toLowerCase, which applies Unicode default lowercase mapping rather than true case folding. This matches what most users expect ('Apple' = 'apple' = 'APPLE') but diverges from the full folding defined by UCD CaseFolding.txt in a few cases: German ß lowercases to itself, so 'straße' and 'STRASSE' do not match even though full folding maps ß to 'ss', and the Turkish dotted/dotless I pair is the classic locale-dependent case that a locale-unaware lowercase cannot get right. For everyday lists — emails, CSVs, log lines — the simple behavior is correct; for German legal text or Turkish, routing the comparison through Intl.Collator(locale, { sensitivity: 'accent' }) handles those cases instead.
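The Intl.Collator alternative can be sketched as follows; the locale and options here are illustrative, not the tool's actual settings:

```javascript
// Locale-aware, case-insensitive comparison via Intl.Collator.
// sensitivity: "accent" ignores case differences but keeps accents distinct.
const collator = new Intl.Collator("en", { sensitivity: "accent" });

function sameLine(a, b) {
  return collator.compare(a, b) === 0;
}
```

Note that a collator gives a pairwise comparison, not a hashable key, so a Set-based dedup cannot use it directly; a dedup built on it would have to scan the kept representatives linearly or derive a collation key some other way.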

What is the time complexity, and how does it scale?

The algorithm is a single-pass O(n) walk over the input lines. ECMA-262 specifies Set.prototype.has and Set.prototype.add as sublinear; mainstream engines (V8, SpiderMonkey, JavaScriptCore) implement Set on hash tables, where amortized O(1) follows from the standard hash-table analysis (Knuth, TAOCP Vol 3 §6.4 Hashing). The total work for n input lines is O(n) inserts and O(n) lookups. The pipeline scales linearly: a 10,000-line list dedupes in milliseconds, and a 100,000-line list completes in well under a second on typical hardware. Memory grows with the number of unique lines, not the input length — duplicate-heavy inputs (a noisy log with repeated errors) compress to a small unique set.
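The linear-scaling and memory claims can be spot-checked with a short script. This is an informal check on an assumed workload, not a rigorous benchmark:

```javascript
// Dedupe 100,000 lines drawn from 1,000 distinct values and time it.
function dedupe(lines) {
  const seen = new Set();
  const out = [];
  for (const line of lines) {
    if (!seen.has(line)) { seen.add(line); out.push(line); }
  }
  return out;
}

const lines = Array.from({ length: 100_000 }, (_, i) => `error-${i % 1_000}`);
const t0 = Date.now();
const unique = dedupe(lines);
const ms = Date.now() - t0;
// Memory tracks the unique set: 100,000 inputs collapse to 1,000 survivors.
console.log(`${lines.length} -> ${unique.length} lines in ${ms} ms`);
```

On typical hardware this finishes in a handful of milliseconds, and the Set holds only the 1,000 distinct values regardless of how duplicate-heavy the input is.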

Why does whitespace trimming apply to both the comparison key AND the output line?

This is a deliberate design choice: when 'trim whitespace' is enabled, ' apple', 'apple ', and 'apple' are not only treated as equal during comparison, they all export as the clean 'apple' in the result. Trimming for comparison only — keeping the original spacing in the output — is also defensible (some uses care about preserving exact bytes), but for typical use cases (email exports, CSV cleanup, list sanitization) the user wants both: collapse equivalents AND clean the survivors. If comparison-only trimming is required, a variant that trims only the comparison key while emitting the untouched line achieves it.
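A comparison-only-trim variant (not a mode this tool offers) could look like this sketch:

```javascript
// Match lines on their trimmed form, but keep the survivors' original
// leading/trailing whitespace in the output.
function dedupeCompareTrimmed(text) {
  const seen = new Set();
  const out = [];
  for (const line of text.split("\n")) {
    const key = line.trim();  // trimmed form used only for matching
    if (!seen.has(key)) {
      seen.add(key);
      out.push(line);         // original bytes survive
    }
  }
  return out.join("\n");
}
```

Here " apple" and "apple " collapse to one entry, but the kept entry is " apple" with its original leading space intact.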

How does this tool handle accessibility for screen readers?

The output region and the stats line (original count, unique count, removed count) sit inside an aria-live="polite" region, the pattern behind W3C WCAG Success Criterion 4.1.3 (Status Messages, introduced in WCAG 2.1, Recommendation 5 June 2018; carried unchanged into WCAG 2.2, Recommendation 5 October 2023). Polite live regions queue announcements until any speech in progress has finished, which is appropriate for incremental updates as the user types or pastes. Screen readers (NVDA, JAWS, VoiceOver) pick up the live region automatically; the user does not need to do anything else.

Sources (5)
  • The Unicode Consortium (2024). The Unicode Standard, Version 16.0 — UCD CaseFolding.txt (simple vs full case folding). Unicode Consortium, Mountain View, CA (released 10 September 2024).
  • ECMA International (2025). ECMAScript 2025 Language Specification — Set objects (insertion-order iteration) and String.prototype.toLowerCase. ECMA-262, 16th edition, June 2025.
  • ECMA International (2025). ECMAScript 2025 Internationalization API Specification — Intl.Collator (locale-aware case-insensitive option). ECMA-402, 12th edition, June 2025.
  • Knuth, D. E. (1998). The Art of Computer Programming, Vol. 3: Sorting and Searching — §6.4 Hashing. Addison-Wesley, 2nd edition (amortized O(1) hash-table analysis).
  • World Wide Web Consortium (W3C) (2018). Web Content Accessibility Guidelines (WCAG) 2.1 — Success Criterion 4.1.3 Status Messages. W3C Recommendation 5 June 2018; carried unchanged into WCAG 2.2 (Recommendation 5 October 2023).

These are the primary specifications and publications the behavior described above is based on. Locate them by title and year from the publishing body (Unicode Consortium, ECMA International, W3C) or on Google Scholar.