Regex Lookbehind Assertions Explained
Lookbehind lets you match something that is preceded by a pattern without consuming that pattern. Here is how it behaves in JavaScript, Python, Go, and PCRE, and when to actually use it.
Lookbehind is one of the least understood regex features. It is narrow, it is occasionally irreplaceable, and it breaks in engines you did not expect. This is a straight-to-the-point tour: the syntax, the traps, and the engine matrix.
What lookbehind does
A lookbehind is a zero-width assertion: it matches a position, not characters. (?<=foo)bar matches bar only when it is preceded by foo, but the foo is not part of the match. Negative lookbehind (?<!foo)bar matches bar only when it is NOT preceded by foo.
The point is that the preceding context does not get consumed. If you were to replace the match with something, you would only replace bar, not foobar.
// Match "100" only when preceded by "$"
const re = /(?<=\$)\d+/g;
"price is $100, quantity 200".match(re);
// => ["100"]
// Negative: match digits NOT preceded by "$"
const re2 = /(?<!\$)\b\d+/g;
"price is $100, quantity 200".match(re2);
// => ["200"]
Without lookbehind you would have to capture \$(\d+) and pick group 1 from the match object, which is fine for a single match but awkward for replace, split, or matchAll pipelines.
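The same contrast can be sketched in Python's re module (the examples above are JavaScript; the variable names here are illustrative only). With lookbehind the $ stays outside the match, so a replacement touches only the digits. With a capture group the $ is consumed and has to be written back.

```python
import re

text = "price is $100, quantity 200"

# Lookbehind: "$" is asserted but not consumed, so only the digits are replaced.
masked_lookbehind = re.sub(r"(?<=\$)\d+", "N", text)

# Capture-group version: "$" is part of the match, so the replacement
# has to put it back via the \1 backreference.
masked_group = re.sub(r"(\$)\d+", r"\1N", text)

print(masked_lookbehind)  # price is $N, quantity 200
print(masked_group)       # price is $N, quantity 200
```

Both calls produce the same string here; the difference is that the second one only works if you remember to restore the prefix.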
Engine support matrix
This is where people get burned. Lookbehind is not universal.
| Engine | Lookbehind | Variable-length | Notes |
|---|---|---|---|
| JavaScript (V8, SpiderMonkey) | Yes | Yes | ES2018; Chrome 62+, Node 10+, Safari 16.4+ |
| Python re | Yes | No | Fixed length only; regex module lifts this |
| Python regex (PyPI) | Yes | Yes | Drop-in replacement |
| PCRE / PCRE2 | Yes | No (PCRE), yes (PCRE2 partial) | Used by PHP, nginx, many C tools |
| .NET Regex | Yes | Yes | Always supported |
| Java java.util.regex | Yes | Bounded | Requires bounded quantifiers inside |
| Go regexp (RE2) | NO | n/a | No lookaround at all |
| Rust regex crate | NO | n/a | Same reason as Go: linear-time guarantee |
| ripgrep default | NO | n/a | Uses Rust regex; enable PCRE2 with -P |
Go and Rust drop lookbehind on purpose. Their regex engines guarantee linear time in input length; lookaround prevents that guarantee. If you grep with rg and need lookbehind, use rg -P to switch to PCRE2.
Variable-length lookbehind
The oldest complaint about lookbehind was the fixed-length restriction. (?<=abc) is fine. (?<=a{1,3}) used to fail in most engines. That changed:
- JavaScript allows variable length since day one of the ES2018 spec.
- Python’s stdlib re still requires fixed length. Switch to the regex PyPI module if you need alternation of different lengths.
- .NET has always allowed it.
- Java requires a bounded upper limit, e.g. (?<=a{1,100}).
# Python stdlib: this FAILS
import re
re.search(r"(?<=ab|abcd)X", "abcdX")
# re.error: look-behind requires fixed-width pattern
# Python regex module: works
import regex
regex.search(r"(?<=ab|abcd)X", "abcdX")
# <regex.Match ...>
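If adding the third-party regex module is not an option, a common workaround in the stdlib is to split the alternation into separate fixed-width lookbehinds, one per branch. This is a minimal sketch, assuming the branches are few enough to enumerate by hand; the sample strings are made up for illustration.

```python
import re

# Each lookbehind is fixed-width on its own; the alternation sits outside them.
pattern = re.compile(r"(?:(?<=ab)|(?<=abcd))X")

for sample in ("abcdX", "abX", "zzX"):
    m = pattern.search(sample)
    print(sample, "->", m.span() if m else None)

# The single-lookbehind form with mixed-length branches is rejected
# by the stdlib at compile time.
try:
    re.compile(r"(?<=ab|abcd)X")
    compiled = True
except re.error:
    compiled = False
print("single lookbehind compiles:", compiled)
```

This does not help with open-ended quantifiers inside the lookbehind, only with a finite set of alternatives.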
Common real-world uses
Three patterns cover 90% of legitimate lookbehind usage:
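- Extracting a token after a marker without keeping the marker. Currency symbols, prefixes like id-, log level tags.
- Splitting on a delimiter that you want to keep on the right side. str.split(/(?<=\n)(?=## )/) splits a Markdown doc just before each ## heading without eating it: the lookbehind anchors the split after the newline and the lookahead keeps the ## in the piece that follows. (A lookahead alone, /(?=\n## )/, also works but leaves the newline at the start of each piece.)
- Avoiding false positives around word boundaries that \b cannot express. Example: match log but not when it is the tail of syslog, i.e. (?<!sys)log.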
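The three uses above can be sketched in Python roughly as follows. The input strings are invented for illustration, and the split uses a lookbehind for the newline combined with a lookahead for the heading so that the newline stays with the preceding section.

```python
import re

# 1. Token after a marker, without the marker itself.
ids = re.findall(r"(?<=id-)\w+", "id-42 id-abc x-7")

# 2. Split just before each "## " heading, keeping the heading with what follows.
#    Splitting on an empty match needs Python 3.7 or later.
doc = "intro\n## A\ntext\n## B\nmore\n"
sections = re.split(r"(?<=\n)(?=## )", doc)

# 3. "log" unless it is directly preceded by "sys".
starts = [m.start() for m in re.finditer(r"(?<!sys)log", "syslog log catalog")]

print(ids)       # ['42', 'abc']
print(sections)  # ['intro\n', '## A\ntext\n', '## B\nmore\n']
print(starts)    # [7, 15]
```

Note that the third pattern still matches the log inside catalog; the lookbehind excludes only the sys prefix, which is exactly the kind of rule \b cannot state.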
For the daily reference of metacharacters, quantifiers, and anchors, the regex cheatsheet covers the rest.
When lookbehind is the wrong tool
Just because lookbehind exists does not mean you should reach for it.
- If the engine does not support it (Go, Rust regex, ripgrep default), rewrite with a capture group. \$(\d+) with group 1 extraction is equivalent to (?<=\$)\d+ for most purposes.
- If the preceding context is an entire word or line, anchors and word boundaries are cheaper: ^WARN: is better than (?<=^)WARN:.
- If you are scanning a massive log file, lookaround in a backtracking engine (PCRE, Python) can explode into pathological runtimes on adversarial input. RE2-style engines (Go, Rust) reject this by design.
- If you find yourself nesting three lookbehinds, you probably want a real parser.
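The capture-group rewrite from the first bullet can be checked side by side. The sketch below uses Python only so that both forms run in one engine; the group form is the one that ports to engines without lookbehind. The "aaa" case is a contrived example of where "most purposes" stops: the group form consumes its prefix, so overlapping contexts are lost.

```python
import re

text = "price is $100, quantity 200"

# Same tokens either way: findall returns group 1 when the pattern has one group.
via_lookbehind = re.findall(r"(?<=\$)\d+", text)
via_group = re.findall(r"\$(\d+)", text)

# Where the two differ: the capture-group form consumes the prefix, so a
# character cannot be both a match and the prefix of the next match.
overlap_lookbehind = re.findall(r"(?<=a)a", "aaa")
overlap_group = re.findall(r"a(a)", "aaa")

print(via_lookbehind, via_group)          # ['100'] ['100']
print(overlap_lookbehind, overlap_group)  # ['a', 'a'] ['a']
```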
Performance in practice
Lookbehind is not free. In a backtracking engine (PCRE, Python re, Java) a lookbehind of width N adds up to roughly N extra character comparisons at each candidate position, because the engine has to step back N characters and attempt the sub-pattern. Variable-length lookbehind can multiply this: an engine that checks each allowed length in turn repeats that work once per length.
For hot paths — log parsers, big CSV scans, code analyzers — this matters. Three rough guidelines:
- Fixed-length lookbehind of 1-3 characters: effectively free.
- Fixed-length 4-20: still cheap on modern CPUs.
- Variable-length with wide ranges: measure, especially on adversarial input.
If you are writing regex that runs once per HTTP request, ignore the performance concern and optimize for readability. If you are writing regex that runs over gigabytes, benchmark.
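A benchmark does not need to be elaborate. The following is a minimal sketch using the stdlib timeit module, with a synthetic input and an arbitrary repeat count chosen for illustration; real measurements should use your own data and your production engine.

```python
import re
import timeit

# Synthetic input (an assumption for this sketch): about 180 KB of repeated text.
text = "item $100 qty 200 " * 10_000

lookbehind = re.compile(r"(?<=\$)\d+")
group = re.compile(r"\$(\d+)")

def run_lookbehind():
    return lookbehind.findall(text)

def run_group():
    return group.findall(text)

# Time 20 full scans of each variant.
t_lookbehind = timeit.timeit(run_lookbehind, number=20)
t_group = timeit.timeit(run_group, number=20)

print(f"lookbehind: {t_lookbehind:.4f}s  capture group: {t_group:.4f}s")
```

The absolute numbers depend on the machine and Python version, so compare the two variants against each other rather than against a fixed threshold.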
Debugging lookbehind in practice
Two tips that save hours:
Use matchAll in JavaScript, not match, when you need offsets. match with the global flag drops capture groups.
const re = /(?<=\$)(\d+(?:\.\d+)?)/g;
for (const m of "$1.50 and $42".matchAll(re)) {
  console.log(m[1], m.index);
}
// 1.50 1
// 42 11
Test your pattern against at least three strings: the positive case, the negative case with the wrong prefix, and the edge case at start-of-string. A positive lookbehind at index 0 fails quietly because there is nothing before the position to match, while a negative lookbehind succeeds there for the same reason; that is usually what you want, but confirm it.
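The three-string check is cheap to automate. Here is a small sketch in Python, using the dollar-amount patterns from earlier in this article; the helper name first_match is made up for this example.

```python
import re

positive = re.compile(r"(?<=\$)\d+")     # digits preceded by "$"
negative = re.compile(r"(?<!\$)\b\d+")   # digits NOT preceded by "$"

def first_match(pattern, sample):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(sample)
    return m.group() if m else None

# Right prefix, wrong prefix, and nothing at all before the digits.
samples = ["$100", "#100", "100"]
results = {s: (first_match(positive, s), first_match(negative, s)) for s in samples}

for sample, (pos, neg) in results.items():
    print(f"{sample!r}: positive={pos!r} negative={neg!r}")
# '$100': positive='100' negative=None
# '#100': positive=None negative='100'
# '100': positive=None negative='100'
```

The last row is the start-of-string edge case: the positive form finds nothing, and the negative form matches because no character precedes index 0.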
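A surprisingly common bug: engineers hand-test a regex with lookbehind in the browser’s DevTools and it works, then paste it into server-side Go or Rust code, where the pattern is rejected at compile time. Go's regexp.Compile and Rust's Regex::new return an error for lookaround syntax, and Go's regexp.MustCompile panics; if that error is swallowed, the code simply never matches. Always run the pattern through the exact engine your code will use.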
Takeaways
Lookbehind is worth learning. Use (?<=...) when you need the prefix context without consuming it, (?<!...) when you need the absence of a prefix. Check your engine first: Go and Rust say no, Python stdlib says fixed-width only, Java wants a bounded length, and most other engines are fine. For long reference material bookmark the regex cheatsheet; for slug and URL patterns where encoding intersects with regex, see the URL encoding guide.