KSES: Preserve some additional invalid HTML comment syntaxes. #6395
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -963,6 +963,7 @@ function wp_kses_version() { | |
| * It also matches stray `>` characters. | ||
| * | ||
| * @since 1.0.0 | ||
| * @since 6.6.0 Recognize additional forms of invalid HTML which convert into comments. | ||
| * | ||
| * @global array[]|string $pass_allowed_html An array of allowed HTML elements and attributes, | ||
| * or a context name such as 'post'. | ||
@@ -981,7 +982,18 @@ function wp_kses_split( $content, $allowed_html, $allowed_protocols ) { | |
| $pass_allowed_html = $allowed_html; | ||
| $pass_allowed_protocols = $allowed_protocols; | ||
| return preg_replace_callback( '%(<!--.*?(-->|$))|(<[^>]*(>|$)|>)%', '_wp_kses_split_callback', $content ); | ||
| $token_pattern = <<<REGEX | ||
| ~ | ||
| ( # Detect comments of various flavors before attempting to find tags. | ||
| (<!--.*?(-->|$)) # - Normative HTML comments. | ||
| | | ||
| </[^a-zA-Z][^>]*> # - Closing tags with invalid tag names. | ||
| ) | ||
| | | ||
| (<[^>]*(>|$)|>) # Tag-like spans of text. | ||
| ~x | ||
| REGEX; | ||
| return preg_replace_callback( $token_pattern, '_wp_kses_split_callback', $content ); | ||
| } | ||
| /** | ||
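To see which alternation of the new `$token_pattern` claims each chunk, here is a minimal standalone sketch (PHP 7.3+, no WordPress loaded). The pattern is copied from the diff; `classify_tokens()` is a hypothetical helper for illustration only and is not part of KSES. It labels a match as a comment when the first capture group (the comment alternation) matched, and as tag-like otherwise.

```php
<?php
// Pattern copied from wp_kses_split() in the diff; the x flag lets it carry comments.
$token_pattern = <<<REGEX
~
(                            # Detect comments of various flavors before attempting to find tags.
    (<!--.*?(-->|$))         # - Normative HTML comments.
    |
    </[^a-zA-Z][^>]*>        # - Closing tags with invalid tag names.
)
|
(<[^>]*(>|$)|>)              # Tag-like spans of text.
~x
REGEX;

// Hypothetical helper: reports which alternation matched each token.
function classify_tokens( string $pattern, string $html ): array {
    preg_match_all( $pattern, $html, $matches, PREG_SET_ORDER );
    $kinds = array();
    foreach ( $matches as $m ) {
        // Group 1 is the comment-like alternation; group 4 is the tag-like one.
        // When group 4 matched, group 1 is reported as an empty string.
        $kinds[] = ( isset( $m[1] ) && '' !== $m[1] ) ? "comment: {$m[0]}" : "tag-like: {$m[0]}";
    }
    return $kinds;
}

foreach ( classify_tokens( $token_pattern, 'a</3 oops>b<p>c</p>' ) as $kind ) {
    echo $kind, PHP_EOL;
}
// comment: </3 oops>
// tag-like: <p>
// tag-like: </p>
```

With this pattern, `</3 oops>` is captured by the comment alternation instead of being treated as a tag, while `</p>` still falls through to the tag-like branch because its name starts with a letter.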
@@ -1069,23 +1081,61 @@ function _wp_kses_split_callback( $matches ) { | |
| * @access private | ||
| * @ignore | ||
| * @since 1.0.0 | ||
| * @since 6.6.0 Recognize additional forms of invalid HTML which convert into comments. | ||
| * | ||
| * @param string $content Content to filter. | ||
| * @param array[]|string $allowed_html An array of allowed HTML elements and attributes, | ||
| * or a context name such as 'post'. See wp_kses_allowed_html() | ||
| * for the list of accepted context names. | ||
| * @param string[] $allowed_protocols Array of allowed URL protocols. | ||
| * | ||
| * @return string Fixed HTML element | ||
| */ | ||
| function wp_kses_split2( $content, $allowed_html, $allowed_protocols ) { | ||
| $content = wp_kses_stripslashes( $content ); | ||
| // It matched a ">" character. | ||
| /* | ||
Member: Should
Member (author): Yes, great idea. Thanks for catching it!
| * The regex pattern used to split HTML into chunks attempts | ||
| * to split on HTML token boundaries. This function should | ||
| * thus receive chunks that _either_ start with meaningful | ||
| * syntax tokens, like a tag `<div>` or a comment `<!-- ... -->`, | ||
| * _or_ consist of a lone `>` character. | ||
| * | ||
| * If the first character of the `$content` chunk _isn't_ the | ||
| * `<` that begins all of these syntax elements, then the match | ||
| * had to be for the final alternation of `>`. In such a case, | ||
| * it's probably standing on its own and could be encoded | ||
| * with a character reference to remove ambiguity. | ||
| * | ||
| * In other words, if this chunk isn't from a match of a syntax | ||
| * token, it's just a plaintext greater-than (`>`) sign. | ||
| if ( ! str_starts_with( $content, '<' ) ) { | ||
| return '&gt;'; | ||
| } | ||
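The first branch of `wp_kses_split2()` can be exercised on its own. In this sketch, `encode_stray_gt()` is a hypothetical name, and it assumes PHP 8.0+ for `str_starts_with()`. Any chunk that does not begin with `<` can only be the pattern's final `>` alternation, so it is replaced with the `&gt;` character reference; everything else is left for later branches, signalled here by `null`.

```php
<?php
// Hypothetical helper (not part of WordPress) mirroring the first check in
// wp_kses_split2(). Requires PHP 8.0+ for str_starts_with().
function encode_stray_gt( string $chunk ): ?string {
    // Only the splitter's bare `>` alternation yields a chunk that does not
    // start with "<", so such a chunk is plain text and gets encoded.
    if ( ! str_starts_with( $chunk, '<' ) ) {
        return '&gt;';
    }

    // Tags and comments start with "<"; later branches would handle them.
    return null;
}

echo encode_stray_gt( '>' ), PHP_EOL; // &gt;
var_dump( encode_stray_gt( '<p>' ) ); // NULL
```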
| // Allow HTML comments. | ||
| /* | ||
| * When a closing tag appears with a name that isn't a valid tag name, | ||
| * it must be interpreted as an HTML comment. It extends until the | ||
| * first `>` character after the initial opening `</`. | ||
| * | ||
| * Preserve these comments and do not treat them like tags. | ||
| */ | ||
Contributor: Would the
Contributor: Aha, I know now. Then we couldn’t assume bkw that the content starts at index two and ends at index minus one.
| if ( 1 === preg_match( '~^</[^a-zA-Z][^>]*>$~', $content ) ) { | ||
Contributor: Is there any need to check for invalid comment closers? I’m on a bus and can't remember exactly, but there were cases like a triple dash or a CDATA-lookalike closer. Or does none of that matter with these tag-closer-like comments?
| $content = substr( $content, 2, -1 ); | ||
| $transformed = null; | ||
| while ( $transformed !== $content ) { | ||
| $transformed = wp_kses( $content, $allowed_html, $allowed_protocols ); | ||
| $content = $transformed; | ||
| } | ||
Member (author): This is copied from the existing comment behavior below.
Member: Could use an inline comment?
| return "</{$transformed}>"; | ||
| } | ||
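The invalid-closing-tag branch can also be sketched outside WordPress. `preserve_invalid_closer()` and `strip_js_once()` are hypothetical: the latter stands in for `wp_kses()` and removes `javascript:` in a single left-to-right pass, so a nested payload survives one pass. Read literally, the loop in the diff assigns the same value to `$transformed` and `$content` at the end of each iteration, so its `while` condition is false after the first pass and `wp_kses()` runs exactly once. The sketch instead uses a `do … while` that compares each pass's output with its input and repeats until the text stops changing (PHP 7.1+).

```php
<?php
// Hypothetical stand-in for wp_kses(): deletes "javascript:" in one pass.
// str_replace() does not rescan its own output, so nesting survives a pass.
function strip_js_once( string $text ): string {
    return str_replace( 'javascript:', '', $text );
}

// Sketch of the branch: "</" followed by a non-letter opens a bogus comment
// that ends at the first ">", so the chunk is kept rather than parsed as a tag.
function preserve_invalid_closer( string $chunk, callable $sanitize ): ?string {
    if ( 1 !== preg_match( '~^</[^a-zA-Z][^>]*>$~', $chunk ) ) {
        return null; // Not this kind of token.
    }

    // Drop the leading "</" and the trailing ">".
    $content = substr( $chunk, 2, -1 );

    // Re-run the sanitizer until its output equals its input (a fixed point).
    do {
        $previous = $content;
        $content  = $sanitize( $previous );
    } while ( $content !== $previous );

    return "</{$content}>";
}

echo preserve_invalid_closer( '</ javajavascript:script:alert(1)>', 'strip_js_once' ), PHP_EOL;
// </ alert(1)>
```

A single pass of `strip_js_once()` over the same content would still leave ` javascript:alert(1)`, which is why the loop condition has to compare against the previous pass.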
| /* | ||
| * Normative HTML comments should be handled separately as their | ||
| * parsing rules differ from those for tags and text nodes. | ||
| */ | ||
| if ( str_starts_with( $content, '<!--' ) ) { | ||
| $content = str_replace( array( '<!--', '-->' ), '', $content ); | ||
These two lines are essentially the only change to the regex pattern (an addition).
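For the normative-comment branch whose opening lines close out the diff above, the first step strips the `<!--` and `-->` markers before the remaining text is sanitized. Below is a minimal sketch of just that step, using a hypothetical `comment_body()` helper (PHP 8.0+ for `str_starts_with()`); the rest of the branch is not shown in this diff and is not reproduced here. Because the splitter's comment alternation ends at `-->` or at the end of input (`(-->|$)`), an unclosed comment also reaches this branch.

```php
<?php
// Hypothetical helper: the marker-stripping step of the "<!--" branch.
// Requires PHP 8.0+ for str_starts_with().
function comment_body( string $chunk ): ?string {
    if ( ! str_starts_with( $chunk, '<!--' ) ) {
        return null; // Not a normative comment.
    }

    // Removes every "<!--" and every "-->", not just the outer pair.
    return str_replace( array( '<!--', '-->' ), '', $chunk );
}

var_dump( comment_body( '<!-- hi -->' ) );   // string(4) " hi "
var_dump( comment_body( '<!-- unclosed' ) ); // string(9) " unclosed"
```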