Regex Lookbehind Assertions Explained

Lookbehind lets you match something that is preceded by a pattern without consuming that pattern. Here is how it works across JavaScript, Python, Go, and PCRE, and when to actually use it.

Lookbehind is the least understood regex feature for most developers. It is narrow, it is occasionally irreplaceable, and it breaks in engines you did not expect. This is a straight-to-the-point tour: the syntax, the traps, and the engine matrix.

What lookbehind does

A lookbehind is a zero-width assertion: it matches a position, not characters. (?<=foo)bar matches bar only when it is preceded by foo, but the foo is not part of the match. Negative lookbehind (?<!foo)bar matches bar only when it is NOT preceded by foo.

The point is that the preceding context does not get consumed. If you were to replace the match with something, you would only replace bar, not foobar.

// Match "100" only when preceded by "$"
const re = /(?<=\$)\d+/g;
"price is $100, quantity 200".match(re);
// => ["100"]

// Negative: match digits NOT preceded by "$"
const re2 = /(?<!\$)\b\d+/g;
"price is $100, quantity 200".match(re2);
// => ["200"]

Without lookbehind you would have to capture \$(\d+) and pick group 1 from the match object, which is fine for a single match but awkward for replace, split, or matchAll pipelines.
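To make the replace case concrete, here is a minimal Python sketch of the same idea. The sample string matches the one above; the doubling of the price is an arbitrary transformation chosen for illustration, and the variable names are hypothetical.

```python
import re

text = "price is $100, quantity 200"

# Lookbehind: the "$" is asserted but not part of the match,
# so the callback only ever sees (and replaces) the digits.
with_lookbehind = re.sub(
    r"(?<=\$)\d+",
    lambda m: str(int(m.group()) * 2),
    text,
)

# Capture-group version: the "$" is consumed by the match,
# so the callback has to put it back by hand.
with_group = re.sub(
    r"\$(\d+)",
    lambda m: "$" + str(int(m.group(1)) * 2),
    text,
)

print(with_lookbehind)  # price is $200, quantity 200
print(with_group)       # price is $200, quantity 200
```

Both calls produce the same string; the difference is only in how much bookkeeping the replacement has to do.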

Engine support matrix

This is where people get burned. Lookbehind is not universal.

| Engine | Lookbehind | Variable-length | Notes |
| --- | --- | --- | --- |
| JavaScript (V8, SpiderMonkey) | Yes | Yes | ES2018; Chrome 62+, Node 10+, Safari 16.4+ |
| Python re | Yes | No | Fixed length only; regex module lifts this |
| Python regex (PyPI) | Yes | Yes | Drop-in replacement |
| PCRE / PCRE2 | Yes | No (PCRE), yes (PCRE2, partial) | Used by PHP, nginx, many C tools |
| .NET Regex | Yes | Yes | Always supported |
| Java java.util.regex | Yes | Bounded | Requires bounded quantifiers inside |
| Go regexp (RE2) | No | n/a | No lookaround at all |
| Rust regex crate | No | n/a | Same reason as Go: linear-time guarantee |
| ripgrep default | No | n/a | Uses Rust regex; enable PCRE2 with -P |

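Go and Rust omit lookbehind on purpose. Their regex engines guarantee matching time that is linear in the input length, and general lookaround does not fit the automaton-based approach that makes the guarantee possible. If you search with rg and need lookbehind, use rg -P to switch to PCRE2 (this requires a ripgrep build with PCRE2 support).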

Variable-length lookbehind

The oldest complaint about lookbehind was the fixed-length restriction. (?<=abc) is fine. (?<=a{1,3}) used to fail in most engines. That changed:

  • JavaScript has allowed variable-length lookbehind since the feature was introduced in ES2018.
  • Python’s stdlib re still requires fixed length. Switch to the regex PyPI module if you need alternation of different lengths.
  • .NET has always allowed it.
  • Java requires a bounded upper limit, e.g. (?<=a{1,100}).

# Python stdlib: this FAILS
import re
re.search(r"(?<=ab|abcd)X", "abcdX")
# re.error: look-behind requires fixed-width pattern

# Python regex module: works
import regex
regex.search(r"(?<=ab|abcd)X", "abcdX")
# <regex.Match ...>
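
If installing the regex module is not an option, one possible workaround in stdlib re is to split the alternation into separate lookbehinds, each of which is fixed width on its own. This is a sketch for the ab|abcd example above, not a general translation rule; the name pattern is illustrative.

```python
import re

# Each lookbehind is fixed width by itself (2 and 4 characters),
# so stdlib re accepts them when they are joined by alternation
# outside the lookbehind rather than inside it.
pattern = re.compile(r"(?:(?<=ab)|(?<=abcd))X")

long_prefix = pattern.search("abcdX")   # satisfied via (?<=abcd)
short_prefix = pattern.search("abX")    # satisfied via (?<=ab)
no_prefix = pattern.search("cdX")       # neither lookbehind holds

print(long_prefix.start(), short_prefix.start(), no_prefix)
# 4 2 None
```

For the negative form, the equivalent trick is to chain the assertions, e.g. (?<!ab)(?<!abcd)X, since all of them must hold at the same position.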

Common real-world uses

Three patterns cover most legitimate lookbehind usage:

  1. Extracting a token after a marker without keeping the marker. Currency symbols, prefixes like id-, log level tags.
  2. Splitting at a position defined by what precedes it, without eating any text. str.split(/(?<=\n)(?=## )/) splits a Markdown doc at the start of each ## heading line: the lookbehind requires a newline before the position, the lookahead requires the heading marker after it, and neither is consumed.
  3. Avoiding false positives that \b cannot express. Example: match log, but not when it is the tail of syslog, i.e. (?<!sys)log.
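
The three uses above can be sketched in Python as follows. The sample strings and variable names are made up for illustration, and the split example assumes Python 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re

# 1. Token after a marker, without keeping the marker.
ids = re.findall(r"(?<=id-)\d+", "id-42 id-7 ref-99")
print(ids)  # ['42', '7']

# 2. Split at the start of each "## " heading line.
#    The lookbehind wants a newline before the split point,
#    the lookahead wants the heading marker after it.
doc = "intro\n## One\ntext\n## Two\nmore"
sections = re.split(r"(?<=\n)(?=## )", doc)
print(sections)  # ['intro\n', '## One\ntext\n', '## Two\nmore']

# 3. "log" unless it is the tail of "syslog".
#    Note that "catalog" still matches: only the "sys" prefix is excluded.
hits = [m.start() for m in re.finditer(r"(?<!sys)log", "syslog log catalog")]
print(hits)  # [7, 15]
```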

For the daily reference of metacharacters, quantifiers, and anchors, the regex cheatsheet covers the rest.

When lookbehind is the wrong tool

Just because lookbehind exists does not mean you should reach for it.

  • If the engine does not support it (Go, Rust regex, ripgrep default), rewrite with a capture group. \$(\d+) with group 1 extraction is equivalent to (?<=\$)\d+ for most purposes.
  • If the preceding context is an entire word or line, anchors and word boundaries are cheaper: ^WARN: is better than (?<=^)WARN:.
  • If you are scanning a massive log file, a backtracking engine (PCRE, Python) can hit pathological runtimes on adversarial input, and lookaround adds work at every position where it is tried. RE2-style engines (Go, Rust) rule this out by design, which is also why they have no lookaround.
  • If you find yourself nesting three lookbehinds, you probably want a real parser.

Performance in practice

Lookbehind is not free. In a backtracking engine (PCRE, Python re, Java), a lookbehind of width N adds roughly N character comparisons at each candidate position, because the engine has to step back N characters and attempt the sub-pattern. Variable-length lookbehind can multiply this: engines that implement it by trying each allowed length (Java is one) repeat the attempt once per length in the range.

For hot paths — log parsers, big CSV scans, code analyzers — this matters. Three rough guidelines:

  • Fixed-length lookbehind of 1-3 characters: effectively free.
  • Fixed-length 4-20: still cheap on modern CPUs.
  • Variable-length with wide ranges: measure, especially on adversarial input.

If you are writing regex that runs once per HTTP request, ignore the performance concern and optimize for readability. If you are writing regex that runs over gigabytes, benchmark.
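
If you do need to benchmark, a rough starting point in Python might look like the sketch below. The synthetic input, its size, and the repeat count are arbitrary assumptions; real measurements should use your own data and the engine you deploy with. No timings are shown because they vary by machine.

```python
import re
import timeit

# Synthetic log data; the line shape and the 10,000 repeats are assumptions.
data = "user=alice cost=$100 qty=200\n" * 10_000

lookbehind = re.compile(r"(?<=\$)\d+")
capture = re.compile(r"\$(\d+)")


def run_lookbehind():
    # findall returns the whole match when the pattern has no groups.
    return lookbehind.findall(data)


def run_capture():
    # findall returns group 1 when the pattern has exactly one group.
    return capture.findall(data)


t_look = timeit.timeit(run_lookbehind, number=5)
t_cap = timeit.timeit(run_capture, number=5)
print(f"lookbehind: {t_look:.3f}s  capture group: {t_cap:.3f}s")
```

Both functions return the same list of digit strings, so the comparison is like for like.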

Debugging lookbehind in practice

Two tips that save hours:

Use matchAll in JavaScript, not match, when you need offsets or capture groups. With the global flag, match returns only the matched strings and drops both the capture groups and the index of each match.

const re = /(?<=\$)(\d+(?:\.\d+)?)/g;
for (const m of "$1.50 and $42".matchAll(re)) {
  console.log(m[1], m.index);
}
// 1.50 1
// 42 11
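
For comparison, a Python sketch of the same check: re.finditer yields full match objects, so groups and offsets are both available. The variable names are illustrative.

```python
import re

pattern = re.compile(r"(?<=\$)(\d+(?:\.\d+)?)")

# Collect (captured text, start offset) for every match.
found = [(m.group(1), m.start()) for m in pattern.finditer("$1.50 and $42")]
print(found)  # [('1.50', 1), ('42', 11)]
```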

Test your pattern against at least three strings: the positive case, the negative case with the wrong prefix, and the edge case at start-of-string. At index 0 there is nothing to look back at, so a positive lookbehind that needs at least one character fails to match there without raising any error, while a negative lookbehind succeeds. That is usually what you want, but confirm it.
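
A minimal sketch of that three-string check in Python, using the dollar-amount pattern from earlier. The test strings are invented for illustration.

```python
import re

pattern = re.compile(r"(?<=\$)\d+")

positive = pattern.search("cost $100")      # right prefix
wrong_prefix = pattern.search("cost #100")  # wrong prefix
at_start = pattern.search("100 dollars")    # nothing before index 0

print(positive.group(), wrong_prefix, at_start)
# 100 None None

# The negative form behaves the opposite way at index 0:
# with nothing before the digits, "not preceded by $" holds.
negative_at_start = re.search(r"(?<!\$)\d+", "100 dollars")
print(negative_at_start.start())  # 0
```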

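A surprisingly common bug: engineers hand-test a regex with lookbehind in the browser’s DevTools and it works, then paste it into server-side Go or Rust code, where it fails to compile. Go's regexp.Compile returns an error (regexp.MustCompile panics) and Rust's Regex::new returns an Err, which is easy to miss if the error is discarded or only surfaces at runtime. Always run the pattern through the exact engine your code will use.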
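
One way to catch this early is a compile check that runs in the target engine as part of the test suite. The sketch below does it for Python's stdlib re; compiles is a hypothetical helper name, and the same idea would need to be rewritten in Go or Rust to check those engines.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if stdlib re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Fixed-width lookbehind is accepted by stdlib re.
print(compiles(r"(?<=\$)\d+"))    # True

# Variable-width lookbehind is rejected at compile time.
print(compiles(r"(?<=a{1,3})b"))  # False
```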

Takeaways

Lookbehind is worth committing to memory. Use (?<=...) when you need the prefix context without consuming it, and (?<!...) when you need the absence of a prefix. Check your engine first: Go and Rust say no, Python stdlib says fixed-width only, Java wants a bounded length, PCRE depends on the version, and JavaScript and .NET accept any length. For long reference material, bookmark the regex cheatsheet; for slug and URL patterns where encoding intersects with regex, see the URL encoding guide.
