Fix problem with encoding of css entities when post with existing block level custom css is edited by user without unfiltered_html#11104
Conversation
…custom css is edited by user without unfiltered_html
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
|
FYI - I will add tests for this once there is confirmation that this is the correct approach for fixing this bug. There may be a better solution. If there is feel free to close this PR and open an alternative. |
src/wp-includes/blocks.php
Outdated
|
|
||
| return str_replace( | ||
| array( '&', '>', '"', ''' ), | ||
| array( '&', '>', '"', "'" ), |
There was a problem hiding this comment.
We will not doubt need to account for other values here, but we can work out exactly what needs to be covered once there is some agreement on the best way to solve this problem - I imagine there will be a smarter solution - so didn't spend too much time finessing this one.
| @@ -2077,6 +2077,17 @@ function _filter_block_content_callback( $matches ) { | |||
| function filter_block_kses( $block, $allowed_html, $allowed_protocols = array() ) { | |||
| $block['attrs'] = filter_block_kses_value( $block['attrs'], $allowed_html, $allowed_protocols, $block ); | |||
There was a problem hiding this comment.
Thanks for getting up a fix!
I was wondering: what is the important sanitization step for block CSS attributes?
wp_kses() treats the CSS string as HTML. But should it?
I'm wondering if an alternative solution would be to target the css attribute and run it through wp_strip_all_tags rather than wp_kses.
Or running through something similar (or reuse this same validation in a helper) that @sirreal and @dmsnell worked on in WP_REST_Global_Styles_Controller::validate_custom_css() for https://core.trac.wordpress.org/ticket/64418
There, for users without unfiltered_html, & and > in block custom CSS were being double-encoded by KSES + JSON, so the CSS broke.
(Sorry Jon and Dennis - you've become my default go-to brains trust for this stuff 😄 )
|
Related: https://core.trac.wordpress.org/changeset/61486 / #10641 fixed the same class of KSES-mangling issue for Global Styles custom CSS by pre-escaping the JSON with JSON_HEX_TAG | JSON_HEX_AMP. This PR addresses the same problem for per-block custom CSS (attrs.style.css), which goes through the separate filter_block_kses() pipeline. I don't think the same pre-escaping approach can work in this case, as parse_blocks() calls json_decode() on the entire attributes object, which converts \u0026 back to & before KSES runs - but I do not know a lot about these flows, and don't have time to look closer as currenlty travelling - so could be completely wrong about this. |
|
I had some trouble reproducing the issue. In order to test this, I had to enable a recent version of the Gutenberg plugin. The individual block CSS feature is not yet available in Core yet, is it? When I used an author role, I don't see the additional CSS panel for blocks. When I used an editor role, I was unable to reproduce the issue because it seems to have Am I doing something wrong in the reproduction steps? Some thoughts based on the issue:
I was very happy with the solution in r61486. Post content exclusively contained JSON which has some flexibility in escaping. JSON can be made to be plain HTML text by escaping HTML syntax characters ( This PR tries to recover after KSES has mangled the data. That seems inherently more risky. It should be possible to decode HTML character references (like done here) but what if KSES starts to remove things that look like tags? What if I'd like to use Another option is to protect the data before KSES can mangle it by encoding it ourselves in an HTML-text safe way. A few quick options come to mind:
Of course, before using the value it will need to be decoded appropriately. Either of these seem likely to prevent the issue by ensuring HTML syntax |
@sirreal when logged in as the author user you do not need to see the custom CSS input box, you just need to edit the post content with the existing custom CSS in place that was added when you created the post as the admin user. This bug only occurs if a user without |
Thanks a lot @sirreal, this is great. Could "don’t run KSES on block attribute attrs.style.css" be another option? Above, I was thinking of an allowlist of “non-HTML” attribute paths that are not HTML, e.g. If there are hidden gotchas there...
Maybe @glendaviesnz can answer this: let's say we encode in |
Got it, that worked. I did have to change the post author so that the author role user could edit the post.
That's a way to prevent this issue. The problem is that exceptions like that often create vulnerabilities. If a bad actor knows |
this is going to be a dead-end, because it’s largely not possible to do that. we can build in signals into the storage to communicate it though. for instance, prefix a base64-encoded string. or in that same vein but better, store the attribute as a data URI which says explicitly what the content is. {
"style": {
"css": "data:text/css;base64,eyBjb2xvcjogcmVkOyB9"
}
}for the sake of transparency we can always escape the CSS from any characters that would be “dangerous,” but I think we’ve seen a number of cases where this has gone wrong because downstream code likes to unescape and re-escape, which ends up eliminating the escaping we intentionally applied.
the |
|
Why are these attributes allowed at all for folks without the appropriate capability?
That is incoherent. If they can use the feature, let's show the UI (and make sure it works correctly). Otherwise, they should not be able to add custom CSS at all. I've just confirmed that an <!-- wp:paragraph {"style":{"css":"color: blue"}} -->
<p class="has-custom-css">asdf</p>
<!-- /wp:paragraph -->How about a completely different approach:
|
Apologies, the wording of the ticket and PR was unclear, I have updated it to Before finalising a fix for this, we probably need a decision on whether users without I personally don't see any issue with allowing this user level to edit/add these attributes - seems very different to @ramonjd - any thoughts on the best way to get a decision on that? |
My bag of 2c coins: Here's the scenario I've been working with (I'm using authors as catch-all role for no unfiltered_html permissions):
So they can "edit" it technically, because they can edit the post content, but in the regular editor UI flow they cannot. I get @sirreal's point about incoherency. Following that I see the choice between: A) keep the status quo and deal with CSS integrity preservation/kses mangling when saving the post OR With A, we have the very helpful suggestions from In relation to B, I'm not sure we'd want to add new, potential security holes. Authors can't currently add CSS, so we probably shouldn't let them. Happy to be persuaded on this and all points. As for C, my instinct was that stripping wouldn't be appropriate, because authors by default can't see the custom css field (at least in my testing), so from their point of view they're not editing that attribute at all. And editors might wonder why the custom CSS they created is broken or stripped, BUT I was chatting with @tellthemachines, who made a good point to check the HTML block's behaviour in this regard. Authors can't add CSS/JSS, and
So stripping would be more consistent with that block, and also
Extending authors' permissions, I expect, would be something that needs to be run past the core team and security folks. Honestly, I think the quickest and safest approach right now is the HTML-block/Sirreal™️ approach because:
Optionally, there could be some help text underneath the UI control to tell editors that I say "for now" because maybe there's a better way down the road that preserves permissions and the intentions of admins when they are making changes, and, furthermore, promotes CSS content as generally safe under the right conditions. I threw up a rough PR to help my brain work through this:
|
|
Unfortunately, stripping it is not a good solution for us in our multi-site scenario where the site admins do not have |
I don't have a strong opinion. Maybe block css is the exception and it can be preserved? Also, sorry @sirreal I spelled your name wrong (Jon with an |
|
I want to clarify one thing. An author role user, right now, can access the code editor and write this: <!-- wp:paragraph {"style":{"css":"color: blue"}} -->
<p class="has-custom-css">asdf</p>
<!-- /wp:paragraph -->They have direct access to this feature right now. The can reach this point by editing another user's post, but that's just another example of this. Either the UI should provide access for them, or they should not be allowed to save that |
Thanks for confirming. That was my take away. If authors can already set classnames, and inline styles (via block supports) and save the CSS attributes like you say, then I think that dilutes the value of stripping as an interim patch. Glen's use case suggests it probably isn't. I don't know.
Yeah, so if I'm just trying to get my head around the scope/policy: would we need a real CSS sanitizer (nested selectors, at rules, URLs...) or is the main security concern here Pursuant to the latter, I messed around with skipping KSES and leaning on the Footnotes
|
Yes, that makes no sense. I think the approach @ramonjd suggests is a better solution than deleting existing CSS if edited by a user without |
|
@sirreal, @ramonjd I pushed a change that rather than letting KSES process the CSS and then trying to undo the damage (the previous approach in this PR), the CSS is extracted before KSES runs and sanitized using methods appropriate for CSS. How it worksIn
KSES never sees the CSS string, so it cannot mangle it. If we decide that this is appropriate then I suggest we also add a GB PR that shows the block level CSS input to users without I have not spent any time testing or tidying up the approach, I will worry about that if there is some agreement that this might be a valid approach. |
src/wp-includes/blocks.php
Outdated
| } | ||
|
|
||
| // Strip HTML tags — valid CSS never contains them. | ||
| $css = wp_strip_all_tags( $css ); |
There was a problem hiding this comment.
I might have missed this but I think there were doubts about using this for CSS
There was a problem hiding this comment.
Yeah, the changes are just to push the discussion forward at this stage - if someone has ideas for an alternative, they are welcome change this, but it seems like we should be taking steps to remove anything that is obviously not CSS from those strings - but I do not have an opinion on this.
This is a good point. safecss_filter_attr() checks blocks of property/value pairs, so in order to make it work here we'd probably have to run it on the top-level declarations and then separately on any declarations inside selectors/brackets. But it would be worth doing to ensure the actual CSS isn't dodgy. |
…nitize() and validate() Introduces `WP_CSS_Token_Processor`, a new class in `src/wp-includes/css-api/` modelled after `WP_HTML_Tag_Processor`. It tokenizes a CSS string into a typed token stream and exposes two high-level consumers: - `sanitize(): string` — strips unsafe tokens/rules (injection guard, CDO/CDC, bad tokens, disallowed URL schemes, non-allowlisted at-rules) and returns a safe CSS string. Idempotent: sanitize(sanitize($css)) === sanitize($css). - `validate(): true|WP_Error` — returns true if the CSS is safe, or a WP_Error with a specific error code (css_injection, css_html_comment, css_malformed_token, css_unsafe_url, css_disallowed_at_rule) on the first violation found. The primary motivation is fixing the compounding corruption bug (PR WordPress#11104) where wp_kses() — an HTML sanitizer — was applied to CSS, mangling & and > characters used in CSS nesting selectors on each save for users without unfiltered_html. Security policy: - </style anywhere → sanitize() returns ''; validate() returns css_injection error - url() with javascript:, data:, or non-wp_allowed_protocols() scheme → stripped - @import, @charset, @namespace, unknown at-rules → stripped (safety-first) - bad-url-token, bad-string-token → stripped - CDO/CDC (<!-- / -->) → stripped - Null bytes → stripped in constructor Allowed at-rules: @media, @supports, @Keyframes, @-webkit-keyframes, @layer, @container, @font-face. Also adds low-level navigation (next_token, get_token_type, get_token_value, get_block_depth) and non-destructive modification (remove_token, set_token_value, get_updated_css) APIs, plus get_removed_tokens() for sanitize() introspection. Integration with filter_block_kses_value() in blocks.php is a follow-on PR. Includes: - src/wp-includes/css-api/class-wp-css-token-processor.php (~1,250 lines) - src/wp-includes/css-api/README.md - tests/phpunit/tests/css-api/WpCssTokenProcessorTest.php (67 tests) - tests/phpunit/tests/css-api/WpCssTokenSanitizeTest.php (40 tests) - tests/phpunit/tests/css-api/WpCssTokenValidateTest.php (14 tests + data provider) - docs/plans/2026-03-06-wp-css-token-processor-design.md - docs/plans/2026-03-06-wp-css-token-processor.md Fixes #64771 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nitize() and validate() Introduces `WP_CSS_Token_Processor`, a new class in `src/wp-includes/css-api/` modelled after `WP_HTML_Tag_Processor`. It tokenizes a CSS string into a typed token stream and exposes two high-level consumers: - `sanitize(): string` — strips unsafe tokens/rules (injection guard, CDO/CDC, bad tokens, disallowed URL schemes, non-allowlisted at-rules) and returns a safe CSS string. Idempotent: sanitize(sanitize($css)) === sanitize($css). - `validate(): true|WP_Error` — returns true if the CSS is safe, or a WP_Error with a specific error code (css_injection, css_html_comment, css_malformed_token, css_unsafe_url, css_disallowed_at_rule) on the first violation found. The primary motivation is fixing the compounding corruption bug (PR WordPress#11104) where wp_kses() — an HTML sanitizer — was applied to CSS, mangling & and > characters used in CSS nesting selectors on each save for users without unfiltered_html. Security policy: - </style anywhere → sanitize() returns ''; validate() returns css_injection error - url() with javascript:, data:, or non-wp_allowed_protocols() scheme → stripped - @import, @charset, @namespace, unknown at-rules → stripped (safety-first) - bad-url-token, bad-string-token → stripped - CDO/CDC (<!-- / -->) → stripped - Null bytes → stripped in constructor Allowed at-rules: @media, @supports, @Keyframes, @-webkit-keyframes, @layer, @container, @font-face. Also adds low-level navigation (next_token, get_token_type, get_token_value, get_block_depth) and non-destructive modification (remove_token, set_token_value, get_updated_css) APIs, plus get_removed_tokens() for sanitize() introspection. Integration with filter_block_kses_value() in blocks.php is a follow-on PR. Includes: - src/wp-includes/css-api/class-wp-css-token-processor.php (~1,250 lines) - src/wp-includes/css-api/README.md - tests/phpunit/tests/css-api/WpCssTokenProcessorTest.php (67 tests) - tests/phpunit/tests/css-api/WpCssTokenSanitizeTest.php (40 tests) - tests/phpunit/tests/css-api/WpCssTokenValidateTest.php (14 tests + data provider) - docs/plans/2026-03-06-wp-css-token-processor-design.md - docs/plans/2026-03-06-wp-css-token-processor.md Fixes #64771 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
It doesn't seem like a good idea to me to hide specific attributes from KSES processing. Perhaps if there were a more general sanitization system, but 1-off exceptions to KSES content filtering isn't something I'm in favor of introducing here. I don't think we should introduce a new With the HTML API, anything that's newly developed does not need to worry about whether the contents of a wordpress-develop/src/wp-includes/class-wp-styles.php Lines 335 to 339 in 56a6768
This is not appropriate for sanitizing CSS. It function removes text that looks like tags, and valid CSS certainly may contain things that look like tags. See this ticket for example: @property --animate {
syntax: "<custom-ident>";
inherits: true;
initial-value: false;
}It's also worth considering any of the arbitrary strings that can be used in CSS, for example: div:has(input:first-child)::before {
content: "<text>";
}
ul {
list-style: "<li>";
} |
|
Using an HTML- and KSES-safe encoding for the serialized data seemed promising. Has that been explored?
|
This reverts commit 35eab15.
I looked into this briefly and found that it required some identifiable token in order to flag it as "to-be-decoded" on the way out. Similar to what Dennis spoke of above.
Yes I was overthinking it, sorry. Now that I've had some breakfast I see we could strip the token after kses, before saving. function filter_block_kses( $block, $allowed_html, $allowed_protocols = array() ) {
// check for closing style tag then encode $block['attrs']['style']['css'] => WP_BLOCK_CUSTOM_CSS_KSES_PREFIX . base64_encode( $css )
$block['attrs'] = filter_block_kses_value( $block['attrs'], $allowed_html, $allowed_protocols, $block );
// strip WP_BLOCK_CUSTOM_CSS_KSES_PREFIX and decode
// the rest...
} |
|
This commit uses JSON encode as @sirreal suggested. @ramonjd this seems a little less heavy-handed than base64 and matches existing approaches. So, in summary, as the PR stands now: Instead of hiding CSS from KSES or trying to undo KSES damage after the fact, we encode the CSS value using
|
|
I don't think transforming the data before and after KSES runs is the right approach. The system should be responsible for transforming its data without requiring exceptions built into general filters. KSES is designed to prevent problematic data from being stored. When KSES is circumvented, it often leads to things like stored XSS vulnerabilities. It's annoying, but that's kind of the point. The goal is to find ways to store the content to satisfy KSES, not circumvent it. The encoding and decoding of the data should happen in the system using the data. General filters should not have exceptional behavior to deal with specific attributes. The system that is storing the data should encode it on save and decode it for use. @dmsnell shared a good idea for how to indicate the encoding by using a data-uri.
I don't expect to have any CSS API for 7.0, which greatly limits the options. I still think the most reasonable thing at this point is to remove the custom block CSS from the post if a user without the capability tries to save it. |
|
@sirreal, @ramonjd serializeAttributes() already encodes the CSS in the serialised content, but parse_blocks() → json_decode() reverses that before it is passed into KSES, so it looks like base64 is the best option, but does that go against any Gutenberg principles that the block attribute content should be human-readable? I am not aware of any other attributes that you can't make sense of by reading the serialised block content, and I have a vague feeling that this is by design, but I could be wrong. If we are all happy with base64 I will close this PR and open an new Gutenberg PR to make that change. |
|
The essential idea is that the data stored in the attribute is something that KSES leaves alone because it's harmless. Either base64 (more opaque) or serializeAttributes (more readable, but perhaps more surprising) seem well suited to the task. Storing a string that is a base64- or json-encoded (via serializeAttributes) string of CSS text should serve. Note that either or these is an addition encoding of the stored data. Edit: The idea is that
Still contains safely encoded data |
This seems to work for me with base64_decodediff --git a/src/wp-includes/blocks.php b/src/wp-includes/blocks.php
index 89007d0d0d..7e577a4dd0 100644
--- a/src/wp-includes/blocks.php
+++ b/src/wp-includes/blocks.php
@@ -2075,8 +2075,38 @@ function _filter_block_content_callback( $matches ) {
* @return array The filtered and sanitized block object result.
*/
function filter_block_kses( $block, $allowed_html, $allowed_protocols = array() ) {
+ /*
+ * Per-block custom CSS is not HTML; encode it before KSES (so it is not mangled)
+ * and decode it back to plain CSS immediately after.
+ */
+ if ( isset( $block['attrs']['style']['css'] ) && is_string( $block['attrs']['style']['css'] ) ) {
+ $css = trim( $block['attrs']['style']['css'] );
+ if ( '' !== $css ) {
+ /**
+ * Prefix applied to block custom CSS when base64-encoded for the KSES pass.
+ * The prefix lets wp_decode_block_custom_css_after_kses() recognise values it encoded.
+ */
+ $block['attrs']['style']['css'] = 'data:text/css;base64,' . base64_encode( $css );
+ }
+ }
+
$block['attrs'] = filter_block_kses_value( $block['attrs'], $allowed_html, $allowed_protocols, $block );
+ if ( isset( $block['attrs']['style']['css'] ) && is_string( $block['attrs']['style']['css'] ) ) {
+ /**
+ * Decodes block custom CSS from base64 back to plain CSS after the KSES pass.
+ *
+ * Only decodes values that start with prefix so we never
+ * attempt to decode CSS that wasn't encoded above.
+ */
+ $css = $block['attrs']['style']['css'];
+ if ( str_starts_with( $css, 'data:text/css;base64,' ) ) {
+ $decoded = base64_decode( substr( $css, strlen( 'data:text/css;base64,' ) ), true );
+ // If the decoded string contains characters from outside the base64 alphabet or a null byte, set the CSS to an empty string.
+ $block['attrs']['style']['css'] = ( false === $decoded || str_contains( $decoded, "\0" ) ) ? '' : $decoded;
+ }
+ }
+
if ( is_array( $block['innerBlocks'] ) ) {
foreach ( $block['innerBlocks'] as $i => $inner_block ) {
$block['innerBlocks'][ $i ] = filter_block_kses( $inner_block, $allowed_html, $allowed_protocols );
Fair enough, I just added the diff above for the record. The motivation was to keep "encode and decode" in PHP and also keep plain CSS in the database (unless I'm reading the feedback incorrectly, the implication is to store the encoded string?). Also to cover all save paths (REST and direct wp_insert_post() calls). |
I don't think this is feasible.
Yes, that's my thinking. Here is my reasoning:
If those hold true, it seems to follow that the data stored in the DB must be encoded in some way that doesn't contain special HTML characters. I have my own lacunas! There may be a completely different way to achieve this, but encoding the data in the attribute seems like a straightforward and relatively simple way to resolve this issue. It's still possible to strip this data for users without the appropriate capability. That's probably the simplest thing to do for WordPress 7.0 which is right around the corner. That wouldn't involve any other changes and would just rely on KSES not operating. That may not be the best solution in the long term. I still think the data-URIs are very interesting: const originalCSS =
`@property --animate {
syntax: "<custom-ident>";
inherits: true;
initial-value: false;
}
div:has(input:first-child)::before {
content: "<text>";
}
ul {
list-style: "<li>";
}`;
const dataURI = `data:text/css,${ encodeURIComponent( originalCSS ) }`;
const parsedCSS = ( await import( dataURI, { with: { type: 'css' } } ) ).default;
const reconstructedCSS = Object.values( parsedCSS.cssRules ).map( rule => rule.cssText ).join( '\n' );The result in @property --animate { syntax: "<custom-ident>"; inherits: true; initial-value: false; }
div:has(input:first-child)::before { content: "<text>"; }
ul { list-style: "<li>"; }Of course, I think the style system will need to handle decoding the data-URI in PHP, but the native browser APIs in this space are exciting. |
Actually, PHP seems to handle base64 or %-encoded data URIs just fine in my testing: <?php
$c = file_get_contents( 'data:text/css;base64,QHByb3BlcnR5IC0tYW5pbWF0ZSB7CiAgc3ludGF4OiAiPGN1c3RvbS1pZGVudD4iOwogIGluaGVyaXRzOiB0cnVlOwogIGluaXRpYWwtdmFsdWU6IGZhbHNlOwp9CmRpdjpoYXMoaW5wdXQ6Zmlyc3QtY2hpbGQpOjpiZWZvcmUgewogIGNvbnRlbnQ6ICI8dGV4dD4iOwp9CnVsIHsKICBsaXN0LXN0eWxlOiAiPGxpPiI7Cn0=' );
$c2 = file_get_contents( 'data:text/css,%40property%20--animate%20%7B%0A%20%20syntax%3A%20%22%3Ccustom-ident%3E%22%3B%0A%20%20inherits%3A%20true%3B%0A%20%20initial-value%3A%20false%3B%0A%7D%0Adiv%3Ahas(input%3Afirst-child)%3A%3Abefore%20%7B%0A%20%20content%3A%20%22%3Ctext%3E%22%3B%0A%7D%0Aul%20%7B%0A%20%20list-style%3A%20%22%3Cli%3E%22%3B%0A%7D' );
echo "{$c}\n{$c2}"; |
I see, thanks for laying that out. 🍺 I was thinking about how folks reading content programmatically might need to handle this, e.g., Or we make it part of the contract that they have to decode or run the raw post content through parse blocks anyway. I don't really know either. What smells to me though is that we're dancing around KSES's limitations for a compromise. For WP 7.0, and assuming there is some time pressure, maybe storing as a data URI is pragmatic. Later, if there's some CSS API™️ that leaves no fingerprints on the stored format, we'd have to support both formats forever. |
|
I am going to close this PR as if taking the suggested approach, this is going to need a Gutenberg fix. |
It can be liberating and helpful to start with the primary thing: we’re attempting to make it possible for folks to add CSS to their blocks individually. To me, the most important “Gutenberg principle” is that the feature works for the editor and doesn’t lose their trust by randomly breaking without explanation. I don’t want people to spend hours working on a design, have someone else look at it who saves the post, maybe because they correct some unrelated typo, and now the page is broken on render and that person who invested their time has proverbial “egg on their face.” To that end I want to be open-minded about what technical challenges present themselves as obstacles to that goal. And granted, I think you both are raising awesome and exceptional points: having human-readable plaintext is the most open and transparent and interoperable thing we can reach for. but unfortunately I think the situation we find ourselves in is a huge quagmire with no easy way to achieve all those goals. we can also note that plenty of Core blocks already hide their content from
There might be misunderstanding on what @sirreal is saying. KSES has its issues, but in fairness, it’s designed to “sanitize” HTML, not CSS. We had a long call yesterday to discuss the entire concept of CSS sanitization and it seems even more nebulous to discuss than HTML sanitization anyway. There’s a reason we don’t run The database stores HTML and received kind of broad blanket HTML-based linting/sanitization. The CSS needs to run through policy-based transformations (sanitization), but the only pragmatic tool we may have is to do that at the point of demarcation between the raw CSS string and when it is rendered back into an HTML document into either a STYLE element or
This is a compelling reason to think about prefixing the content with some kind of indicator or sigil, such as the data URI prefix. At least with a data URI the content is somewhat self-descriptive and people can relatively easily recover the source content. Source code blocks and the HTML block have this same problem: WordPress transformations are complicated and make it hard to find any way to get it to save what you type: KSES in point. Just wanted to share some thought in summary on this. Everyone is doing a great job, this is a difficult and common challenge, so thank you for working on it. For the rushed timeline it may be worth compromises in scope rather than security. The best perk of limiting scope now is that we can announce enhancements in the future rather than apologize for shipping things that are broken. It’s generally easier to accept limitations if they are communicated and then liberated than if those boundaries aren’t knowable until you cross them. |
|
Thanks for helping me understand the issues better, Dennis!
Ultimately I'll defer to Jon's and your wisdom on this matter - I've been reflecting, and I believe the kink in my thinking stemmed from considering all I also get that the surface of CSS processing in WP is a heck of a lot larger than the problem we're facing here, so I appreciate you folks stepping back and reasoning about it from wider design perspective. |


Trac ticket: https://core.trac.wordpress.org/ticket/64771
Summary
WordPress/gutenberg#73959 introduced block-level custom CSS. Everything works as expected unless a user without unfiltered_html edits a page/post with existing block-level custom CSS that includes nested selectors, eg.
This PR fixes double-encoding of HTML entities in per-block custom CSS (
attrs.style.css) when a user without theunfiltered_htmlcapability saves a post that includes block-level custom CSS with nested selectors.The problem
When a user without
unfiltered_html(e.g. an Author) saves a block with custom CSS containing&(CSS nesting selector) or>(child combinator), thefilter_block_content()pipeline corrupts these characters through double-encoding:parse_blocks()—json_decode()decodes\u0026→&filter_block_kses_value()—wp_kses()treats the CSS string as HTML and encodes&→&,>→>serialize_block_attributes()—json_encode()encodes the&in&→\u0026amp;The result is
\u0026amp;in post_content instead of the original\u0026. On the next editor load,json_decode()produces the literal string&instead of&, so the CSS textarea displays corrupted values like&and>. Each subsequent save compounds the corruption further.The fix
After KSES has run on block attributes (and stripped any dangerous HTML tags), decode the specific named entities it introduced in the
style.cssattribute. HTML entities are invalid in CSS, so KSES should not have introduced them.This PR adds:
undo_block_custom_css_kses_entities()— a new function that reverses only the 4 specific named entities thatwp_kses()may introduce (&,>,",'). This is intentionally narrower thanwp_specialchars_decode()to avoid decoding numeric/hex references that KSES may have intentionally preserved.A call in
filter_block_kses()— afterfilter_block_kses_value()has processed all attributes, ifattrs.style.cssexists, it is passed through the decode function before the block is returned for serialization.Why this is safe
<) are affected<is intentionally excluded — KSES strips bare<entirely rather than encoding it, so<in the output would indicate it was already present in the inputattrs.style.cssonly — other block attributes remain entity-encoded as expectedTest steps
Setup
unfiltered_htmlcapability)Test 1: CSS nesting selector (
&)color: blue; & p { color: red; }unfiltered_html, eg.authorand edit the paragraph, eg. make part of string italic. Note you will not see the custom CSS input box when logged in as this user. Just edit the existing paragraph in the post content and save.color: blue; & .child { color: red; }(unchanged)color: blue; & .child { color: red; }Test 2: Child combinator (
>)& > p { margin: 0; }& > p { margin: 0; }(unchanged)& > p { margin: 0; }Test 3: Idempotent saves
Test 4: Frontend rendering
Test 5: Non-CSS attributes are unaffected
&(e.g.foo&bar)attrs.style.cssis decodedThis Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.