Question 1

What is LCS, and why does it produce better diffs than line-by-line comparison?

Accepted Answer

A Longest Common Subsequence (LCS) of two inputs is a longest sequence of lines that appears in both, in the same order but not necessarily contiguously. Hunt and Szymanski (1977, CACM 20(5):350–353) gave a fast LCS algorithm for inputs with few matching line pairs; Hirschberg (1975, CACM 18(6):341–343) showed that an LCS can be computed in linear space; Myers (1986, Algorithmica 1:251–266) gave the O(ND) algorithm that powers git diff (default) and GNU diffutils (default heuristic variant). The LCS framing asks 'what is the largest set of lines we can align between input A and input B?', and the diff follows directly: every line of A that is not in the LCS is removed, every line of B that is not in the LCS is added, and every line in the LCS is unchanged. Naive line-by-line diff (compare line 1 of A to line 1 of B, line 2 to line 2, and so on) handles insertions and deletions poorly: a single line added at the top of B shifts every later line by one position, so all N lines below it show up as remove/add pairs.
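
The framing above can be sketched in a few lines of Python. This is a minimal illustration of the standard quadratic dynamic program followed by a walk through the table, not the page's actual source; the function name `lcs_diff` and the tag strings are assumptions chosen for the example.

```python
def lcs_diff(a, b):
    """Return (tag, line) pairs; tag is "same", "removed" or "added".

    Hypothetical helper for illustration only.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk from the top-left corner, always stepping toward the longer LCS.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(("same", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            result.append(("removed", a[i]))
            i += 1
        else:
            result.append(("added", b[j]))
            j += 1
    # Whatever is left on either side has no partner in the other input.
    result.extend(("removed", line) for line in a[i:])
    result.extend(("added", line) for line in b[j:])
    return result


example = lcs_diff(["a", "b", "c"], ["a", "c", "d"])
```

Here `example` marks `a` and `c` as same, `b` as removed and `d` as added: the LCS is `[a, c]`, and everything outside it falls into one of the other two buckets.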

Question 2

Why does the tool fall back to naive comparison above 4M cells?

Accepted Answer

The dynamic-programming LCS table is an Int32Array of size (n+1)×(m+1), where n and m are the line counts of inputs A and B. Each cell takes 4 bytes, so at 4 million cells the array uses about 16 MB of memory; two inputs of roughly 2,000 lines each sit at that boundary. The table grows with the product of the line counts, so memory climbs quickly past the cap: two 20,000-line inputs would need about 400 million cells, or roughly 1.6 GB, which would crash the page or slow the browser to a halt. The fallback is a fast line-by-line comparison: walk both inputs in parallel and mark lines that differ at the same position. It is not as accurate as LCS, because a single inserted line propagates into N differences below it, but it keeps the page responsive on very large inputs instead of letting it die.
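
The arithmetic and the fallback can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the names `CELL_CAP`, `table_cells`, `use_lcs` and `positional_diff` are hypothetical, and whether the real check is inclusive (`<=`) or exclusive at exactly 4,000,000 cells is a guess.

```python
CELL_CAP = 4_000_000   # the cap quoted above
BYTES_PER_CELL = 4     # one 32-bit integer per table cell


def table_cells(n, m):
    """Cells in the (n+1) x (m+1) dynamic-programming table."""
    return (n + 1) * (m + 1)


def table_bytes(n, m):
    """Memory the table would need, in bytes."""
    return table_cells(n, m) * BYTES_PER_CELL


def use_lcs(n, m, cap=CELL_CAP):
    """Assumed rule: run the LCS diff only if the table fits under the cap."""
    return table_cells(n, m) <= cap


def positional_diff(a, b):
    """Fallback: compare line i of a with line i of b, nothing smarter."""
    result = []
    for i in range(max(len(a), len(b))):
        left = a[i] if i < len(a) else None
        right = b[i] if i < len(b) else None
        if left == right:
            result.append(("same", left))
        else:
            if left is not None:
                result.append(("removed", left))
            if right is not None:
                result.append(("added", right))
    return result
```

With these definitions, `table_bytes(1999, 1999)` is exactly 16,000,000 bytes, and `table_bytes(20000, 20000)` is a little over 1.6 billion bytes, which is why a cap of this kind is needed at all.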

Question 3

How does Myers' O(ND) algorithm differ from Hunt-Szymanski?

Accepted Answer

Hunt and Szymanski (1977) compute an LCS in O((r+n) log n) time, where r is the number of ordered pairs of matching positions between A and B; this is fast when matches are sparse and slow when they are dense. Myers (1986) reframed the problem as a shortest-path search in an edit graph and gave an O(ND) algorithm, where N is the combined length of the two inputs and D is the size of the shortest edit script (inserted plus deleted lines). For typical version-control diffs, where most lines are unchanged and D is small, Myers is drastically faster. Hirschberg (1975) is a different kind of optimization: applied to the standard O(nm) dynamic-programming recurrence, his divide-and-conquer construction reduces auxiliary space from O(nm) to O(n+m) at the cost of a roughly 2× time factor, so the same recurrence can run on much larger inputs without the memory blowup. This page's implementation uses the plain quadratic dynamic program, which already produces added/removed/same edit scripts in the style of git diff output; Myers and Hirschberg reduce time and memory respectively without changing what the output means.
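
To make the O(ND) idea concrete, here is a minimal sketch of the greedy forward search from Myers' paper that returns only D, the length of the shortest edit script. It is not what this page runs (the page uses the quadratic table), and the function name `myers_distance` is an assumption for the example.

```python
def myers_distance(a, b):
    """Length of the shortest insert/delete script turning a into b.

    Sketch of the greedy O(ND) search; hypothetical helper name.
    """
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Step down (an insertion) or right (a deletion), whichever
            # starts from the point that has progressed further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the diagonal for free while lines match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


distance = myers_distance(list("ABCABBA"), list("CBABAC"))
```

The outer loop runs D+1 times and the inner loop touches at most d+1 diagonals per round, which is where the dependence on the size of the diff, rather than on the product of the input lengths, comes from. For the `ABCABBA` / `CBABAC` pair used in Myers' paper, `distance` is 5.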

Question 4

Why does inserting a line at the top of input B not cascade as remove/add for the rest?

Accepted Answer

Because the LCS algorithm finds the longest common subsequence first, then derives the diff from it. If input A is `[a, b, c, d]` and input B is `[x, a, b, c, d]`, the LCS is `[a, b, c, d]`: every line of A appears in B in the same order. The diff result is 'B has one extra line (x) at the start', and the four trailing lines are marked 'same' on both sides. Naive line-by-line diff would compare A[0]=a to B[0]=x (different), A[1]=b to B[1]=a (different), and so on, so every line below the insertion turns into a fake change. This is the practical reason git diff and Unix diff(1) use LCS-based algorithms rather than naive comparison: the patches stay minimal and human-readable even when lines are inserted or deleted in the middle of a file.
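
The example above can be checked numerically. The sketch below counts changes both ways; it only needs the LCS length, so it keeps two rows of the table instead of the full grid. The helper names `lcs_length` and `positional_mismatches` are assumptions for illustration.

```python
def lcs_length(a, b):
    """LCS length using two rolling rows of the dynamic-programming table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def positional_mismatches(a, b):
    """Positions a naive line-by-line comparison would flag as changed."""
    shared = min(len(a), len(b))
    differing = sum(1 for i in range(shared) if a[i] != b[i])
    # Lines past the end of the shorter input have nothing to pair with.
    return differing + abs(len(a) - len(b))


A = ["a", "b", "c", "d"]
B = ["x", "a", "b", "c", "d"]

common = lcs_length(A, B)          # lines marked "same"
removed = len(A) - common          # lines only in A
added = len(B) - common            # lines only in B
naive = positional_mismatches(A, B)
```

For this pair the LCS-based counts are 4 same, 0 removed and 1 added, while the positional comparison flags all 5 positions as changed.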

Question 5

How does this tool handle accessibility for screen readers?

Accepted Answer

The diff result region is marked aria-live="polite", a pattern that addresses W3C WCAG Success Criterion 4.1.3 (Status Messages), which was introduced in WCAG 2.1 (Recommendation, 5 June 2018) and carried unchanged into WCAG 2.2 (Recommendation, 5 October 2023). Polite live regions queue their announcements behind any speech already in progress, so editing either input pane announces the new diff result without interrupting the user mid-sentence. Screen readers (NVDA, JAWS, VoiceOver) pick up changes to the live region automatically; the user does not need to do anything else.

Text Diff Online — Compare Text & Find Differences (LCS Algorithm)

Frequently asked questions