-
-
Notifications
You must be signed in to change notification settings - Fork 27
Description
TL;DR
I’m thinking out loud. We need namespace information. I can think of three solutions. Not sure which is best.
Introduction
HTML has the concept of elements: things like <strong></strong> are normal elements. There’s a subcategory of “foreign elements”: those from MathML (mi) or from SVG (rect).
A practical example of why this information is needed is because of tag-name normalisation: in HTML, tag-names are case-insensitive. In SVG or MathML, they are not. And, unfortunately tag-names themselves cannot be used to detect whether an element is foreign or not, because there are elements which exist in multiple spaces. For example: var in HTML and MathML, and a in HTML and SVG.
Take the following code:
<!doctype html>
<title>Foreign elements in HTML</title>
<h1>HTML</h1>
<a href="#">HTML link</a>
<var>htmlVar</var>
<svg>
<a href="#">SVG link</a>
<span>SVG</span>
<a href="#">SVG link</a>
</svg>
<math>
<mi>mathMLVar</mi>
<span>MathML</span>
<mi>mathMLVar</mi>
</math>When running the following script:
var length = document.all.length;
var index = -1;
var node;
while (++index < length) node = document.all[index], console.log([node.tagName, node.namespaceURI, node.textContent]);Yields:
[Log] ["HTML", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵HTML↵HTML link↵htmlVar↵↵ …G↵ SVG link↵↵↵ mathMLVar↵ MathML↵ mathMLVar↵↵"] (3)
[Log] ["HEAD", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵"] (3)
[Log] ["TITLE", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML"] (3)
[Log] ["BODY", "http://www.w3.org/1999/xhtml", "HTML↵HTML link↵htmlVar↵↵ SVG link↵ SVG↵ SVG link↵↵↵ mathMLVar↵ MathML↵ mathMLVar↵↵"] (3)
[Log] ["H1", "http://www.w3.org/1999/xhtml", "HTML"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "HTML link"] (3)
[Log] ["VAR", "http://www.w3.org/1999/xhtml", "htmlVar"] (3)
[Log] ["svg", "http://www.w3.org/2000/svg", "↵ SVG link↵ "] (3)
[Log] ["a", "http://www.w3.org/2000/svg", "SVG link"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "SVG"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "SVG link"] (3)
[Log] ["math", "http://www.w3.org/1998/Math/MathML", "↵ mathMLVar↵ "] (3)
[Log] ["mi", "http://www.w3.org/1998/Math/MathML", "mathMLVar"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "MathML"] (3)
[Log] ["MI", "http://www.w3.org/1999/xhtml", "mathMLVar"] (3)Note 1: Non-foreign elements break out of their foreign context.
Note 2: HTML is case-insensitive (normalised to upper-case), foreign elements are case-sensitive.
Proposal
I propose either of the following:
- Add
namespaceon some nodes (notably,root,<mathml>,<svg>). To determine the namespace of a node, check its closest ancestor with a namespace. - Add
namespaceonrootnodes (and wrap<svg>and<mathml>inroots). To determine the namespace of a node, check its closestrootfor a namespace. This changes the semantics ofroots somewhat. - Add
namespaceon any element.
The downsides of the first two as that it’s hard to determine the namespace from an element in a syntax tree without ancestral getters. However, both make moving nodes around quite easy.
The latter is verbose, but does allow for easy access. However, it makes it easy for things to go wrong when shuffling nodes around.
Note: detecting namespaces upon creation (in rehype-parse), is very do-able. I’d like to make the usage of hastscript and transformers very easy too, though!
Do let me know about your thoughts on this!