review: XPath 1.0 evaluator (mirror of upstream #2305) #2
Closed
navidemad wants to merge 5 commits into
Ports the capybara-lightpanda XPath 1.0 polyfill into Lightpanda. Exposes the WHATWG Document.evaluate / XPathResult / XPathEvaluator / XPathExpression surface and routes CDP DOM.performSearch XPath queries through the new evaluator. The capybara-lightpanda gem can drop its ~700-line JS polyfill in the next release.

- New module src/browser/xpath/ (Tokenizer, Parser, Ast, Evaluator, Functions, Result).
- New webapi types XPathResult, XPathExpression, XPathEvaluator.
- Coverage and stubs match the polyfill 1:1; see capybara-lightpanda/XPATH_COMPLIANCE.md for the full spec.
- Tests: 91-case conformance + result-API + evaluator-API + CDP fixtures, plus the engine's Zig unit suite (601/601 pass).
The Parser borrows string slices from its input for AST literals, names, and variable references. Without duping them, the AST holds slices into the JS call_arena, which is reset when the top-level call returns, so every subsequent evaluate() of a cached XPathExpression would dereference freed memory.
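The lifetime problem described above can be illustrated outside Zig. The TypeScript sketch below is only an analogy, not the PR's code: a `Uint8Array` stands in for the call arena, `subarray` stands in for a borrowed slice (it shares memory with the buffer), and `slice` stands in for duping (it copies). The names `arena`, `writeAscii`, and `readAscii` are hypothetical. In Zig the stale slice would be a use-after-free; here it merely reads overwritten bytes.

```typescript
// Analogy only: a Uint8Array plays the role of the per-call arena.
function writeAscii(target: Uint8Array, text: string): void {
  for (let i = 0; i < text.length; i++) {
    target[i] = text.charCodeAt(i);
  }
}

function readAscii(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

const arena = new Uint8Array(16);
writeAscii(arena, "//div");

// Borrowed: a view into the arena's memory (the behavior before the fix).
const borrowedName = arena.subarray(2, 5);
// Owned: an independent copy (the "dupe" the fix relies on).
const ownedName = arena.slice(2, 5);

const beforeReset = readAscii(borrowedName);

// Simulate the arena being reset when the top-level call returns.
arena.fill(0);

const borrowedAfterReset = readAscii(borrowedName);
const ownedAfterReset = readAscii(ownedName);
```

Before the reset both values read as the element name; afterwards only the owned copy still does, which is why a cached expression must own its strings.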
A bare indexOf("::") matched CSS pseudo-elements (a::before) and
attribute values containing '::' ([data-x="x::y"]), misrouting them
to the XPath evaluator. Require an axis-name shape ([a-zA-Z-])
immediately before '::' so only real axis specifiers like
descendant::p are dispatched to XPath.
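A minimal TypeScript sketch of this kind of routing check follows. It is not the PR's implementation: it goes one step further than a character-shape test and compares the whole identifier before `::` against the thirteen XPath 1.0 axis names, which is an assumption of this example. The names `XPATH_AXES` and `looksLikeXPath` are hypothetical.

```typescript
// The thirteen axes defined by XPath 1.0.
const XPATH_AXES: string[] = [
  "ancestor", "ancestor-or-self", "attribute", "child",
  "descendant", "descendant-or-self", "following", "following-sibling",
  "namespace", "parent", "preceding", "preceding-sibling", "self",
];

// Returns true only if some identifier directly before "::" is an XPath axis.
// Assumption: this sketch is not quote-aware, so an attribute value that
// itself contains e.g. "child::" would still be treated as XPath.
function looksLikeXPath(query: string): boolean {
  const axisToken = /([a-zA-Z-]+)::/g;
  let match: RegExpExecArray | null;
  while ((match = axisToken.exec(query)) !== null) {
    if (XPATH_AXES.indexOf(match[1]) !== -1) {
      return true;
    }
  }
  return false;
}

const routedToXPath = looksLikeXPath("descendant::p");
const pseudoElement = looksLikeXPath("a::before");
const attributeValue = looksLikeXPath('[data-x="x::y"]');
```

With this check, `descendant::p` is dispatched to XPath while `a::before` and `[data-x="x::y"]` stay on the CSS path.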
The attribute axis was calling Entry.toAttribute on every visit, materializing fresh *Attribute structs (plus duped name/value strings) into page-lifetime storage. Repeated XPath queries (the Capybara/Selenium polling pattern this PR targets) accumulated unbounded copies for the same DOM entries. Route through frame._attribute_lookup so each Entry resolves to a single cached *Attribute, matching List.getAttribute and NamedNodeMap.getAtIndex.
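The difference between materializing on every visit and resolving through a lookup table can be sketched in TypeScript as below. The types `AttrEntry` and `AttrWrapper`, the `attributeLookup` table, and the `materialized` counter are hypothetical stand-ins for illustration; they do not mirror Lightpanda's actual Zig structures.

```typescript
// Hypothetical stand-ins for the engine's entry and attribute types.
interface AttrEntry { id: number; name: string; value: string; }
interface AttrWrapper { name: string; value: string; }

let materialized = 0; // counts how many wrapper objects were created

// Uncached path: every visit allocates a fresh wrapper.
function toAttributeUncached(entry: AttrEntry): AttrWrapper {
  materialized++;
  return { name: entry.name, value: entry.value };
}

// Cached path: one wrapper per entry, reused on later visits.
const attributeLookup: { [entryId: string]: AttrWrapper } = {};

function toAttributeCached(entry: AttrEntry): AttrWrapper {
  const key = String(entry.id);
  let attr = attributeLookup[key];
  if (attr === undefined) {
    attr = toAttributeUncached(entry);
    attributeLookup[key] = attr;
  }
  return attr;
}

const hrefEntry: AttrEntry = { id: 1, name: "href", value: "/docs" };
const firstVisit = toAttributeCached(hrefEntry);
const secondVisit = toAttributeCached(hrefEntry);
const allocationsAfterTwoVisits = materialized;
```

Under polling, the cached path keeps the allocation count at one per entry regardless of how many times the query is repeated.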
Per XPath 1.0 §5.7, the data model has no CDATASection node — CDATA content is part of the text node value. The text() node test was only matching DOM nodeType 3 (Text), silently excluding CDATA sections (nodeType 4) parsed via DOMParser/XMLDocument and inline foreign content like SVG with embedded scripts.
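A corrected `text()` node test along these lines might look like the TypeScript sketch below. The node type values 3 and 4 are the standard DOM constants; `SketchNode`, `matchesTextNodeTest`, and `selectText` are hypothetical names for this example and are not taken from the PR.

```typescript
// Standard DOM nodeType values.
const TEXT_NODE_TYPE = 3;
const CDATA_SECTION_NODE_TYPE = 4;

// Hypothetical minimal node shape for the sketch.
interface SketchNode { nodeType: number; data: string; }

// text() accepts both Text and CDATASection nodes, because the XPath 1.0
// data model treats CDATA content as ordinary text.
function matchesTextNodeTest(nodeType: number): boolean {
  return nodeType === TEXT_NODE_TYPE || nodeType === CDATA_SECTION_NODE_TYPE;
}

// Collects the character data of every child that passes the text() test.
function selectText(children: SketchNode[]): string[] {
  const out: string[] = [];
  for (let i = 0; i < children.length; i++) {
    if (matchesTextNodeTest(children[i].nodeType)) {
      out.push(children[i].data);
    }
  }
  return out;
}

const sampleChildren: SketchNode[] = [
  { nodeType: 1, data: "" },            // element
  { nodeType: 3, data: "plain" },       // text
  { nodeType: 4, data: "from cdata" },  // CDATA section
  { nodeType: 8, data: "a comment" },   // comment
];
const selectedText = selectText(sampleChildren);
```

Matching only node type 3 would have dropped the CDATA entry from `selectedText`.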
Self-PR opened solely as the host for /ultrareview runs against the XPath 1.0 evaluator changes. The real review and merge target is upstream lightpanda-io#2305; please direct comments there.
This PR will be closed (not merged) once Ultrareview's run completes.