review: XPath 1.0 evaluator (mirror of upstream #2305) #2

Closed
navidemad wants to merge 5 commits into main from feat/xpath-1.0-evaluator

@navidemad (Owner)

Self-PR opened solely as the host for /ultrareview runs against the XPath 1.0 evaluator changes.

The real review and merge target is upstream lightpanda-io#2305 — please direct comments there.

This PR will be closed (not merged) once Ultrareview's run completes.

Ports the capybara-lightpanda XPath 1.0 polyfill into Lightpanda.
Exposes the WHATWG Document.evaluate / XPathResult / XPathEvaluator
/ XPathExpression surface and routes CDP DOM.performSearch XPath
queries through the new evaluator. The capybara-lightpanda gem can
drop its ~700-line JS polyfill in the next release.

New module src/browser/xpath/ (Tokenizer, Parser, Ast, Evaluator,
Functions, Result). New webapi types XPathResult,
XPathExpression, XPathEvaluator. Coverage and stubs match the
polyfill 1:1 — see capybara-lightpanda/XPATH_COMPLIANCE.md for
the full spec.
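
For orientation, the DOM Standard's XPathResult interface identifies result kinds by numeric constants, from ANY_TYPE (0) to FIRST_ORDERED_NODE_TYPE (9). Callers pass one as the type argument of Document.evaluate and read it back from resultType. The Rust sketch below mirrors those constants; the enum and its helper methods are hypothetical illustrations, not Lightpanda's Zig definitions, and how an engine rejects an unknown code is deliberately left out.

```rust
/// Result-type codes from the DOM Standard's XPathResult interface.
/// Hypothetical Rust mirror; Lightpanda's real definitions are in Zig.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum XPathResultType {
    Any = 0,                   // ANY_TYPE
    Number = 1,                // NUMBER_TYPE
    String = 2,                // STRING_TYPE
    Boolean = 3,               // BOOLEAN_TYPE
    UnorderedNodeIterator = 4, // UNORDERED_NODE_ITERATOR_TYPE
    OrderedNodeIterator = 5,   // ORDERED_NODE_ITERATOR_TYPE
    UnorderedNodeSnapshot = 6, // UNORDERED_NODE_SNAPSHOT_TYPE
    OrderedNodeSnapshot = 7,   // ORDERED_NODE_SNAPSHOT_TYPE
    AnyUnorderedNode = 8,      // ANY_UNORDERED_NODE_TYPE
    FirstOrderedNode = 9,      // FIRST_ORDERED_NODE_TYPE
}

impl XPathResultType {
    /// Map the numeric `type` argument of Document.evaluate to a variant.
    /// Returns None for codes outside 0..=9 (illustrative policy only).
    pub fn from_code(code: u16) -> Option<Self> {
        let kind = match code {
            0 => Self::Any,
            1 => Self::Number,
            2 => Self::String,
            3 => Self::Boolean,
            4 => Self::UnorderedNodeIterator,
            5 => Self::OrderedNodeIterator,
            6 => Self::UnorderedNodeSnapshot,
            7 => Self::OrderedNodeSnapshot,
            8 => Self::AnyUnorderedNode,
            9 => Self::FirstOrderedNode,
            _ => return None,
        };
        Some(kind)
    }

    /// True for the two kinds that support snapshotLength / snapshotItem().
    pub fn is_snapshot(self) -> bool {
        matches!(self, Self::UnorderedNodeSnapshot | Self::OrderedNodeSnapshot)
    }
}
```

Only the two snapshot kinds support snapshotLength and snapshotItem(), which is the distinction is_snapshot captures.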

Tests: 91-case conformance + result-API + evaluator-API + CDP
fixtures, plus the engine's Zig unit suite (601/601 pass).

The Parser borrows string slices from its input for AST literals,
names, and variable references. Without duping them, the AST holds
slices into the JS call_arena, which is reset when the top-level call
returns, so every subsequent evaluate() of a cached XPathExpression
would dereference freed memory.
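
The ownership rule behind that fix can be sketched in Rust, where the compiler enforces what the Zig code has to do by hand: the parsed literal copies its text out of the per-call buffer, so clearing the buffer afterwards is safe. CallArena, Literal, and parse_literal are hypothetical names. Had Literal held a &str borrowed from the buffer, the reset() call would be rejected at compile time.

```rust
/// Hypothetical stand-in for the JS call_arena: one buffer that is
/// cleared and reused after every top-level call.
pub struct CallArena {
    pub buf: String,
}

impl CallArena {
    /// Invalidate everything parsed from the buffer (the "arena reset").
    pub fn reset(&mut self) {
        self.buf.clear();
    }
}

/// AST string literal that owns its text, so it survives an arena reset.
#[derive(Debug, Clone, PartialEq)]
pub struct Literal(pub String);

/// Parse an XPath 1.0 string literal ('...' or "...") and copy its
/// contents out of `src` (the "dupe" step).
pub fn parse_literal(src: &str) -> Option<Literal> {
    let quote = src.chars().next()?;
    if quote != '\'' && quote != '"' {
        return None;
    }
    if src.len() < 2 || !src.ends_with(quote) {
        return None;
    }
    // Quotes are single-byte ASCII, so these byte offsets are char boundaries.
    let inner = &src[1..src.len() - 1];
    // XPath 1.0 literals cannot contain their own delimiter.
    if inner.contains(quote) {
        return None;
    }
    Some(Literal(inner.to_owned()))
}
```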

A bare indexOf("::") check matched CSS pseudo-elements (a::before) and
attribute values containing '::' ([data-x="x::y"]), misrouting them
to the XPath evaluator. Require an axis-name shape ([a-zA-Z-])
immediately before '::' so that only real axis specifiers such as
descendant::p are dispatched to XPath.
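
A minimal sketch of that dispatch test: scan for '::' outside quoted strings and take the run of [a-zA-Z-] characters that ends right before it. The function name and the quote skipping are assumptions for illustration. The sketch also checks the run against the thirteen XPath 1.0 axis names, a stricter rule than the shape test described above, because a letter-or-hyphen check on its own is also satisfied by the a in a::before.

```rust
/// The thirteen axis names defined by XPath 1.0 (section 2.2).
const AXES: [&str; 13] = [
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
];

/// Hypothetical dispatch check: true if `query` contains an XPath axis
/// specifier such as `descendant::p`. Text inside '...' or "..." is
/// skipped, so attribute values like [data-x="x::y"] never match.
pub fn has_xpath_axis(query: &str) -> bool {
    let bytes = query.as_bytes();
    let mut quote: Option<u8> = None; // active quote character, if inside a string
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        match quote {
            Some(q) => {
                if b == q {
                    quote = None; // closing quote
                }
            }
            None if b == b'\'' || b == b'"' => quote = Some(b),
            None if b == b':' && bytes.get(i + 1) == Some(&b':') => {
                // Walk back over the [a-zA-Z-] run that ends right before "::".
                let mut start = i;
                while start > 0
                    && (bytes[start - 1].is_ascii_alphabetic() || bytes[start - 1] == b'-')
                {
                    start -= 1;
                }
                if AXES.contains(&&query[start..i]) {
                    return true;
                }
                i += 1; // skip the second ':'
            }
            None => {}
        }
        i += 1;
    }
    false
}
```

With this rule, descendant::p and //ul/following-sibling::p are routed to XPath, while a::before, p::first-line, and [data-x="x::y"] stay on the CSS path.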

The attribute axis was calling Entry.toAttribute on every visit,
materializing fresh *Attribute structs (plus duped name/value strings)
in page-lifetime storage. Repeated XPath queries, the Capybara/Selenium
polling pattern this PR targets, accumulated an unbounded number of
copies for the same DOM entries. Route lookups through
frame._attribute_lookup so that each Entry resolves to a single cached
*Attribute, matching List.getAttribute and NamedNodeMap.getAtIndex.
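
The caching pattern can be sketched as a map from entry identity to one shared wrapper: the first visit materializes the attribute, and every later visit returns the same object, so a polling loop stops allocating. AttributeCache, AttrEntryId, and Attribute are hypothetical stand-ins for frame._attribute_lookup, Entry, and *Attribute; the key and storage types of the real Zig code are not given in this PR text.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical identity of an attribute entry (stands in for the
/// address of a Zig `Entry`).
pub type AttrEntryId = usize;

/// Materialized attribute wrapper (stands in for `*Attribute`).
#[derive(Debug)]
pub struct Attribute {
    pub name: String,
    pub value: String,
}

/// Page-lifetime lookup: each entry id maps to exactly one wrapper.
#[derive(Default)]
pub struct AttributeCache {
    by_entry: HashMap<AttrEntryId, Rc<Attribute>>,
    /// Number of wrappers ever created, to make reuse observable.
    pub allocations: usize,
}

impl AttributeCache {
    /// Return the cached wrapper for `id`, creating it on the first visit only.
    pub fn get_or_create(&mut self, id: AttrEntryId, name: &str, value: &str) -> Rc<Attribute> {
        if let Some(existing) = self.by_entry.get(&id) {
            return Rc::clone(existing);
        }
        self.allocations += 1;
        let attr = Rc::new(Attribute {
            name: name.to_owned(),
            value: value.to_owned(),
        });
        self.by_entry.insert(id, Rc::clone(&attr));
        attr
    }
}
```

However many times a query revisits the same entry, allocations grows only once per distinct entry, which is the property the commit restores.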

Per XPath 1.0 §5.7, the data model has no CDATASection node; CDATA
content is part of the text node's value. The text() node test was
matching only DOM nodeType 3 (Text), silently excluding CDATA sections
(nodeType 4) parsed via DOMParser/XMLDocument and inline foreign
content such as SVG with embedded scripts.
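
In code, the node test reduces to accepting both DOM node types. The nodeType constants below are the standard DOM values; matches_text_test, DomNode, and text_children are hypothetical helpers. XPath 1.0 also merges adjacent character data into a single text node, which this sketch does not attempt: a Text node followed by a CDATA section yields two items here.

```rust
/// Standard DOM nodeType values relevant to the text() node test.
pub const TEXT_NODE: u16 = 3;
pub const CDATA_SECTION_NODE: u16 = 4;
pub const COMMENT_NODE: u16 = 8;

/// XPath 1.0 has no CDATA node kind, so both DOM Text and CDATASection
/// nodes satisfy text(). (Hypothetical helper name.)
pub fn matches_text_test(node_type: u16) -> bool {
    node_type == TEXT_NODE || node_type == CDATA_SECTION_NODE
}

/// Minimal child node for the demonstration; not a real DOM type.
pub struct DomNode {
    pub node_type: u16,
    pub data: String,
}

/// Evaluate child::text() over a flat list of children, in document order.
pub fn text_children(children: &[DomNode]) -> Vec<&str> {
    children
        .iter()
        .filter(|n| matches_text_test(n.node_type))
        .map(|n| n.data.as_str())
        .collect()
}
```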

@navidemad closed this on Apr 28, 2026
@github-actions (bot) locked and limited conversation to collaborators on Apr 28, 2026