Handle @ in tag names from lxml >= 6.0 HTML parser#63

Draft
drew-myers wants to merge 1 commit into master from fix-keyerror-at-in-tag-name

@drew-myers drew-myers commented May 6, 2026

We updated closeio to lxml >= 6.0 because of a security vulnerability, and the upgrade caused this error:
https://closeio.sentry.io/issues/7461358574/?project=4506066590433280

Extend the existing :-in-tag-name HACK in get_html_tree to also rewrite tags containing @ to <span>, stashing the original name in __tag_name so render_html_tree can restore it.

Ships a test to verify the case linked above.

@drew-myers drew-myers force-pushed the fix-keyerror-at-in-tag-name branch from 801e8ce to 15905a3 Compare May 6, 2026 20:26
lxml 6.0's HTML parser keeps pseudo-tags like <shawn@iluminarlighting.com>
(common in quoted reply headers of the form Name <addr@domain>) as real
elements with @ in the tag name. slice_tree then round-trips them
through getelementpath + find, and the elementpath tokenizer raises
KeyError: '@' because @ is the attribute axis token and isn't valid
mid-NameTest.

Extend the existing :-in-tag-name HACK in get_html_tree to also
rewrite tags containing @ to <span>, stashing the original name in
__tag_name so render_html_tree can restore it.
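
The rewrite-and-restore idea described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: it uses the standard library's xml.etree.ElementTree instead of lxml, and the helper names stash_unsafe_tag_names and restore_tag_names are hypothetical stand-ins for the logic that lives in get_html_tree and render_html_tree. Only the __tag_name attribute and the <span> replacement come from the description.

```python
import xml.etree.ElementTree as ET

# Characters that are assumed to break path lookups when they appear in a tag name.
UNSAFE_CHARS = (":", "@")
# Attribute used to remember the original tag name (named in the PR description).
STASH_ATTR = "__tag_name"


def stash_unsafe_tag_names(root):
    """Hypothetical helper: rename unsafe tags to span, stashing the old name."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(el.tag, str) and any(c in el.tag for c in UNSAFE_CHARS):
            el.set(STASH_ATTR, el.tag)
            el.tag = "span"
    return root


def restore_tag_names(root):
    """Hypothetical helper: put stashed tag names back before rendering."""
    for el in root.iter():
        original = el.attrib.pop(STASH_ATTR, None)
        if original is not None:
            el.tag = original
    return root


# Build a tiny tree by hand that mimics the pseudo-tag from a quoted reply header.
body = ET.Element("body")
pseudo = ET.SubElement(body, "shawn@iluminarlighting.com")
pseudo.text = "wrote:"

stash_unsafe_tag_names(body)
# With the tag renamed, a plain path lookup no longer has to tokenize "@".
found = body.find("span")

restore_tag_names(body)
```

After stash_unsafe_tag_names runs, the element is addressable as span and carries its original name in __tag_name; restore_tag_names reverses that, so the rendered output keeps the original tag.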
@drew-myers drew-myers force-pushed the fix-keyerror-at-in-tag-name branch from 15905a3 to e6979eb Compare May 6, 2026 20:27