Fix problem with encoding of css entities when post with existing block level custom css is edited by user without unfiltered_html#11104

Closed
glendaviesnz wants to merge 5 commits into WordPress:trunk from glendaviesnz:fix/block-custom-css-bug

@glendaviesnz glendaviesnz commented Mar 1, 2026

Trac ticket: https://core.trac.wordpress.org/ticket/64771

Summary

WordPress/gutenberg#73959 introduced block-level custom CSS. Everything works as expected unless a user without unfiltered_html edits a page/post with existing block-level custom CSS that includes nested selectors, e.g.:

color: green;
& p {color: blue}

This PR fixes double-encoding of HTML entities in per-block custom CSS (attrs.style.css) when a user without the unfiltered_html capability saves a post that includes block-level custom CSS with nested selectors.

The problem

When a user without unfiltered_html (e.g. an Author) saves a block with custom CSS containing & (CSS nesting selector) or > (child combinator), the filter_block_content() pipeline corrupts these characters through double-encoding:

  1. parse_blocks() → json_decode() decodes \u0026 → &
  2. filter_block_kses_value() → wp_kses() treats the CSS string as HTML and encodes & → &amp;, > → &gt;
  3. serialize_block_attributes() → json_encode() encodes the & in &amp; → \u0026amp;

The result is \u0026amp; in post_content instead of the original \u0026. On the next editor load, json_decode() produces the literal string &amp; instead of &, so the CSS textarea displays corrupted values like &amp; and &gt;. Each subsequent save compounds the corruption further.
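
The three steps above can be reproduced outside WordPress with a minimal sketch. It is an approximation only: htmlspecialchars() is assumed as a rough stand-in for the entity encoding wp_kses() applies to bare & and >, the JSON_HEX_AMP | JSON_HEX_TAG flags stand in for the escaping done by serialize_block_attributes(), and the helper name simulate_filtered_save() is hypothetical.

```php
<?php
// Hypothetical helper: one save by a user without unfiltered_html.
function simulate_filtered_save( string $serialized_attrs ): string {
	// Step 1: parse_blocks() decodes the JSON, turning \u0026 back into &.
	$attrs = json_decode( $serialized_attrs, true );

	// Step 2: stand-in for wp_kses() (an assumption, not the real filter):
	// bare & and > in text become &amp; and &gt;.
	$attrs['style']['css'] = htmlspecialchars( $attrs['style']['css'], ENT_NOQUOTES );

	// Step 3: stand-in for serialize_block_attributes(): & is written as \u0026.
	return json_encode( $attrs, JSON_HEX_AMP | JSON_HEX_TAG );
}

$stored = '{"style":{"css":"\u0026 \u003e p { margin: 0; }"}}';
$saved  = simulate_filtered_save( $stored );

// What the editor textarea would show on the next load.
$reloaded_css = json_decode( $saved, true )['style']['css'];
echo $reloaded_css, "\n";
```

The stored JSON now contains \u0026amp; where it previously contained \u0026, which matches the corruption described in the summary.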

The fix

After KSES has run on block attributes (and stripped any dangerous HTML tags), decode the specific named entities it introduced in the style.css attribute. HTML entities are invalid in CSS, so KSES should not have introduced them.

This PR adds:

  1. undo_block_custom_css_kses_entities(): a new function that reverses only the 4 specific named entities that wp_kses() may introduce (&amp;, &gt;, &quot;, &#039;). This is intentionally narrower than wp_specialchars_decode() to avoid decoding numeric/hex references that KSES may have intentionally preserved.

  2. A call in filter_block_kses() — after filter_block_kses_value() has processed all attributes, if attrs.style.css exists, it is passed through the decode function before the block is returned for serialization.
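
As a rough illustration of the decode step, here is a minimal sketch. The entity list mirrors the one visible in this PR's diff, but the body is not the PR's code: the diff uses str_replace() with parallel arrays, while this sketch swaps in strtr(). The reason for the swap is that str_replace() applies its pairs in order, so an input like &amp;gt; would be decoded twice in one call (first to &gt;, then to >), whereas strtr() with an array never rescans text it has already replaced.

```php
<?php
// Sketch of the decode step; the entity list mirrors the one in this PR's diff.
function undo_block_custom_css_kses_entities( string $css ): string {
	// strtr() with an array does not rescan replaced text, so an input such
	// as "&amp;gt;" is decoded one level only (to "&gt;").
	return strtr(
		$css,
		array(
			'&amp;'  => '&',
			'&gt;'   => '>',
			'&quot;' => '"',
			'&#039;' => "'",
		)
	);
}

echo undo_block_custom_css_kses_entities( '&amp; &gt; p { margin: 0; }' ), "\n";
```

Note that &lt; is left alone, in line with the exclusion described under "Why this is safe".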

Why this is safe

  • KSES runs first — any actual HTML tags in the CSS value are already stripped before we decode entities
  • Only 4 specific named entities are decoded; no other numeric/hex character references (e.g. &#60;) are affected
  • &lt; is intentionally excluded — KSES strips bare < entirely rather than encoding it, so &lt; in the output would indicate it was already present in the input
  • Scoped to attrs.style.css only — other block attributes remain entity-encoded as expected

Test steps

Setup

  1. Create a test user with the Author role (no unfiltered_html capability)
  2. Log in as that Author

Test 1: CSS nesting selector (&)

  1. Create a new post as an admin user
  2. Add a Group block with a nested Paragraph block
  3. Open the block's Advanced panel → Additional CSS textarea
  4. Enter: color: blue; & p { color: red; }
  5. Save the post
  6. Log in as a user without unfiltered_html, e.g. an Author, and edit the paragraph, e.g. make part of the text italic. Note that you will not see the custom CSS input box when logged in as this user. Just edit the existing paragraph in the post content and save.
  7. Save again and then log back in as an admin user
  8. Open the Additional CSS textarea again
  9. Expected: CSS shows color: blue; & p { color: red; } (unchanged)
  10. Before fix: CSS shows color: blue; &amp; p { color: red; }

Test 2: Child combinator (>)

  1. Follow the same flow as above with admin and author users, but this time in the same or a new block, enter CSS: & > p { margin: 0; }
  2. Save and reload
  3. Expected: CSS shows & > p { margin: 0; } (unchanged)
  4. Before fix: CSS shows &amp; &gt; p { margin: 0; }

Test 3: Idempotent saves

  1. With the CSS from Test 1 or 2, save the post 3-4 times, with admin and author users reloading between each save
  2. Expected: The CSS remains identical after every save — no progressive corruption

Test 4: Frontend rendering

  1. View the post on the frontend
  2. Expected: The custom CSS is applied correctly (e.g. child elements styled as specified)

Test 5: Non-CSS attributes are unaffected

  1. As the Author, add a Paragraph block
  2. In the Advanced panel, set the Additional CSS class(es) to something containing & (e.g. foo&bar)
  3. Save and reload
  4. Expected: The className attribute is still processed by KSES as before (entity-encoded) — only attrs.style.css is decoded

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

…custom css is edited by user without unfiltered_html
@glendaviesnz glendaviesnz self-assigned this Mar 1, 2026
@github-actions

github-actions bot commented Mar 1, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props glendaviesnz, ramonopoly, jonsurrell, dmsnell, isabel_brison.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@glendaviesnz
Author

FYI - I will add tests for this once there is confirmation that this is the correct approach for fixing this bug. There may be a better solution. If there is, feel free to close this PR and open an alternative.


return str_replace(
array( '&amp;', '&gt;', '&quot;', '&#039;' ),
array( '&', '>', '"', "'" ),
Author

@glendaviesnz glendaviesnz Mar 1, 2026

We will no doubt need to account for other values here, but we can work out exactly what needs to be covered once there is some agreement on the best way to solve this problem. I imagine there will be a smarter solution, so I didn't spend too much time finessing this one.

@@ -2077,6 +2077,17 @@ function _filter_block_content_callback( $matches ) {
function filter_block_kses( $block, $allowed_html, $allowed_protocols = array() ) {
$block['attrs'] = filter_block_kses_value( $block['attrs'], $allowed_html, $allowed_protocols, $block );
Member

@ramonjd ramonjd Mar 2, 2026

Thanks for getting up a fix!

I was wondering: what is the important sanitization step for block CSS attributes?

wp_kses() treats the CSS string as HTML. But should it?

I'm wondering if an alternative solution would be to target the css attribute and run it through wp_strip_all_tags rather than wp_kses.

Or running through something similar (or reuse this same validation in a helper) that @sirreal and @dmsnell worked on in WP_REST_Global_Styles_Controller::validate_custom_css() for https://core.trac.wordpress.org/ticket/64418

There, for users without unfiltered_html, & and > in block custom CSS were being double-encoded by KSES + JSON, so the CSS broke.

(Sorry Jon and Dennis - you've become my default go-to brains trust for this stuff 😄 )

@glendaviesnz
Author

glendaviesnz commented Mar 2, 2026

Related: https://core.trac.wordpress.org/changeset/61486 / #10641 fixed the same class of KSES-mangling issue for Global Styles custom CSS by pre-escaping the JSON with JSON_HEX_TAG | JSON_HEX_AMP. This PR addresses the same problem for per-block custom CSS (attrs.style.css), which goes through the separate filter_block_kses() pipeline. I don't think the same pre-escaping approach can work in this case, as parse_blocks() calls json_decode() on the entire attributes object, which converts \u0026 back to & before KSES runs. However, I do not know a lot about these flows and don't have time to look closer as I am currently travelling, so I could be completely wrong about this.
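
To make the pre-escaping point concrete, here is a minimal sketch using only PHP's JSON functions. It shows what the JSON_HEX_TAG | JSON_HEX_AMP flags do to a CSS string and why the protection disappears once the JSON has been decoded; it does not reproduce the code from changeset 61486, and the sample CSS value is made up for illustration.

```php
<?php
$css = '& > p { content: "<b>"; }';

// Pre-escape: JSON_HEX_TAG and JSON_HEX_AMP write <, > and & as \u003C, \u003E and \u0026.
$json = json_encode( array( 'css' => $css ), JSON_HEX_TAG | JSON_HEX_AMP );

// No HTML syntax characters remain for an HTML filter to act on.
var_dump( strpbrk( $json, '<>&' ) ); // bool(false)

// Decoding restores the raw characters, which is why the escaping is lost
// once parse_blocks() has decoded the attributes.
var_dump( json_decode( $json, true )['css'] === $css ); // bool(true)
```

The escaping only protects the value while it is still serialized JSON text; any filter that runs on the decoded attribute sees the raw & and > again.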

@sirreal
Member

sirreal commented Mar 2, 2026

I had some trouble reproducing the issue.

In order to test this, I had to enable a recent version of the Gutenberg plugin. The individual block CSS feature is not available in Core yet, is it?

When I used an author role, I don't see the additional CSS panel for blocks. When I used an editor role, I was unable to reproduce the issue because it seems to have unfiltered_html capability already.

Am I doing something wrong in the reproduction steps?


Some thoughts based on the issue:

  • KSES is unsuitable for processing data that is not HTML. This type of issue appears again and again.
  • The KSES filters were added in r46896 / 7c38cf1. That seems to be a security fix with limited public information.
  • I wish we could avoid applying KSES (HTML) filtering everywhere, but that seems unlikely at this time.
  • I see "double-encoding" mentioned a few times. That doesn't seem accurate given the description. CSS text that will be used in rawtext STYLE (where HTML character references like &amp; are not used) has had HTML character reference escaping applied to it. &amp; appears in the CSS text because the character references will never be decoded in this context. It seems more accurate to talk about "mis-encoded" or even just "mangled." This is akin to applying any other unsuitable escaping mechanism.

I was very happy with the solution in r61486. Post content exclusively contained JSON which has some flexibility in escaping. JSON can be made to be plain HTML text by escaping HTML syntax characters (<>&). KSES ignores this. This approach escapes data before KSES can mangle it.

This PR tries to recover after KSES has mangled the data. That seems inherently more risky. It should be possible to decode HTML character references (like done here) but what if KSES starts to remove things that look like tags? What if I'd like to use content: '<data> here';? (KSES will likely strip <data> from this).

Another option is to protect the data before KSES can mangle it by encoding it ourselves in an HTML-text safe way. A few quick options come to mind:

Of course, before using the value it will need to be decoded appropriately. Either of these seem likely to prevent the issue by ensuring HTML syntax <>& are not present, so KSES should not take any action.

@glendaviesnz
Author

I had some trouble reproducing the issue.
In order to test this, I had to enable a recent version of the Gutenberg plugin. The individual block CSS feature is not yet available in Core yet, is it?
When I used an author role, I don't see the additional CSS panel for blocks. When I used an editor role, I was unable to reproduce the issue because it seems to have unfiltered_html capability already.
Am I doing something wrong in the reproduction steps?

@sirreal when logged in as the author user you do not need to see the custom CSS input box; you just need to edit the post content with the existing custom CSS in place that was added when you created the post as the admin user. This bug only occurs if a user without unfiltered_html edits a post that had block-level custom CSS added by a higher-level user.

@ramonjd
Member

ramonjd commented Mar 3, 2026

A few quick options come to mind:

Thanks a lot @sirreal, this is great.

Could "don’t run KSES on block attribute attrs.style.css" be another option?

Above, I was thinking of an allowlist of attribute paths that are not HTML, e.g. ['css']. In filter_block_kses_value(), when the current path is in that list, use some other sanitizer, e.g. wp_strip_all_tags or a variant of it.

If there are hidden gotchas there...

protect the data before KSES can mangle it by encoding it ourselves in an HTML-text safe way

Maybe @glendaviesnz can answer this: let's say we encode in filter_block_kses (and decode when output in custom-css.php), do we need to worry about backwards compat at all? For example, for folks that have already used this feature in the plugin or elsewhere, would we have to infer “is this encoded or plain?”

@sirreal
Member

sirreal commented Mar 3, 2026

when logged in as the author user you… need to edit the post content with the existing custom CSS in place

Got it, that worked. I did have to change the post author so that the author role user could edit the post.

Could "don’t run KSES on block attribute attrs.style.css" be another option?

That's a way to prevent this issue. The problem is that exceptions like that often create vulnerabilities. If a bad actor knows attrs.style.css will not be sanitized, they can often find a way to abuse it.

@dmsnell
Member

dmsnell commented Mar 3, 2026

would we have to infer “is this encoded or plain?”

this is going to be a dead-end, because it’s largely not possible to do that. we can build in signals into the storage to communicate it though. for instance, prefix a base64-encoded string.

or in that same vein but better, store the attribute as a data URI which says explicitly what the content is.

{
	"style": {
		"css": "data:text/css;base64,eyBjb2xvcjogcmVkOyB9"
	}	
}
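
A minimal sketch of how such a data URI could be written and read back. Both helper names and the constant are hypothetical (nothing like them is confirmed to exist in WordPress); only base64_encode() and base64_decode() are real PHP functions. The explicit prefix is what lets a reader tell encoded values from plain ones.

```php
<?php
// Hypothetical helpers; neither is confirmed to exist in WordPress.
const CSS_DATA_URI_PREFIX = 'data:text/css;base64,';

function css_to_data_uri( string $css ): string {
	return CSS_DATA_URI_PREFIX . base64_encode( $css );
}

// Returns the decoded CSS, or null when the value is not a base64 CSS data URI.
function css_from_data_uri( string $value ): ?string {
	if ( 0 !== strncmp( $value, CSS_DATA_URI_PREFIX, strlen( CSS_DATA_URI_PREFIX ) ) ) {
		return null;
	}
	$decoded = base64_decode( substr( $value, strlen( CSS_DATA_URI_PREFIX ) ), true );
	return false === $decoded ? null : $decoded;
}

echo css_to_data_uri( '{ color: red; }' ), "\n";
// data:text/css;base64,eyBjb2xvcjogcmVkOyB9
```

A plain value such as color: blue yields null from css_from_data_uri(), so a caller could fall back to treating it as unencoded CSS.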

for the sake of transparency we can always escape the CSS from any characters that would be “dangerous,” but I think we’ve seen a number of cases where this has gone wrong because downstream code likes to unescape and re-escape, which ends up eliminating the escaping we intentionally applied.

wp_strip_all_tags

wp_strip_all_tags() is never going to be appropriate for CSS, but CSS should still go through some process like KSES, which is what functions like safecss_filter_attr() are for. there are rules applied to things like URLs inside of CSS declarations which WordPress will want to apply.


the $context parameter of filter_block_kses_value() offers a potential place to raise the bar on CSS handling. if we had a sentinel value indicating that the attribute is supposed to be CSS we could apply more appropriate sanitization, but we would want to make sure we don’t make it easy for people to set that context from user-supplied inputs.

@sirreal
Member

sirreal commented Mar 4, 2026

Why are these attributes allowed at all for folks without the appropriate capability?

  • The panel is not displayed for those users, suggesting that the intention is to prevent them from adding the CSS.
  • This PR and ticket 64771 indicate that they do have access to author the CSS.

That is incoherent. If they can use the feature, let's show the UI (and make sure it works correctly). Otherwise, they should not be able to add custom CSS at all.

I've just confirmed that an author can add this and access the feature.

<!-- wp:paragraph {"style":{"css":"color: blue"}} -->
<p class="has-custom-css">asdf</p>
<!-- /wp:paragraph -->

How about a completely different approach:

  • Strip an individual block's custom CSS for users without the correct capability.
  • Display a warning in the editor on blocks that have custom CSS for users that will cause the custom CSS to be lost. "Warning: This block contains custom CSS and you do not have the appropriate capability. If you update this post, the custom CSS will be removed." (or something along those lines).

@glendaviesnz glendaviesnz changed the title Fix problem with encoding of css entities when post with block level custom css is edited by user without unfiltered_html Fix problem with encoding of css entities when post with existing block level custom css is edited by user without unfiltered_html Mar 4, 2026
@glendaviesnz
Author

glendaviesnz commented Mar 4, 2026

This PR and ticket 64771 indicate that they do have access to author the CSS.
That is incoherent.

Apologies, the wording of the ticket and PR was unclear, I have updated it to
Existing block level custom CSS in a post breaks when the post is edited by user without unfiltered_html to make it clearer.

Before finalising a fix for this, we probably need a decision on whether users without unfiltered_html should or should not be able to add/edit block-level custom CSS, eg. just show them the box if they can edit/add and fix the KSES issue, or just strip them and add the warning as @sirreal suggests.

I personally don't see any issue with allowing this user level to edit/add these attributes - seems very different to unfiltered_html to me - but this is not my area of expertise.

@ramonjd - any thoughts on the best way to get a decision on that?

@ramonjd
Member

ramonjd commented Mar 5, 2026

Before finalising a fix for this, we probably need a decision on whether users without unfiltered_html should or should not be able to add/edit block-level custom CSS

My bag of 2c coins:

Here's the scenario I've been working with (I'm using authors as catch-all role for no unfiltered_html permissions):

  1. An author creates a post called Y. Nice.
  2. An admin/editor (or anyone with unfiltered_html permissions) logs in and edits post Y, adding custom CSS to a block.
  3. Our author returns and edits anything in post Y (not custom CSS), then saves the post.
  4. Custom CSS is mangled silently.

So they can "edit" it technically, because they can edit the post content, but in the regular editor UI flow they cannot. I get @sirreal's point about incoherency.

Following that I see the choice between:

A) keep the status quo and deal with CSS integrity preservation/kses mangling when saving the post OR
B) opening up custom CSS to users without unfiltered_html permissions OR
C) stripping with a warning (a general rule for life!)

With A, we have the very helpful suggestions from Jon, Dennis and folks.

In relation to B, I'm not sure we'd want to add new, potential security holes. Authors can't currently add CSS, so we probably shouldn't let them. Happy to be persuaded on this and all points.

As for C, my instinct was that stripping wouldn't be appropriate, because authors by default can't see the custom css field (at least in my testing), so from their point of view they're not editing that attribute at all. And editors might wonder why the custom CSS they created is broken or stripped, BUT I was chatting with @tellthemachines, who made a good point to check the HTML block's behaviour in this regard.

Authors can't add CSS/JS, and <style> tags are stripped. Any subsequent changes made by editors to the same block will be flagged in the editor the next time an author attempts to save the post:

[Screenshot 2026-03-05 at 10 49 17 am]

So stripping would be more consistent with that block, and also Jon's "alternative" approach. The only difference is that it's not immediately visible to the author (without some sort of warning).

any thoughts on the best way to get a decision on that?

Extending authors' permissions, I expect, would be something that needs to be run past the core team and security folks.

Honestly, I think the quickest and safest approach right now is the HTML-block/Sirreal™️ approach because:

  • it's consistent with existing flows (HTML block)
  • it maintains current security/permission arrangements

Optionally, there could be some help text underneath the UI control to tell editors that

I say "for now" because maybe there's a better way down the road that preserves permissions and the intentions of admins when they are making changes, and, furthermore, promotes CSS content as generally safe under the right conditions.

I threw up a rough PR to help my brain work through this:

[Screenshot 2026-03-05 at 12 32 59 pm]

@glendaviesnz
Author

glendaviesnz commented Mar 5, 2026

Unfortunately, stripping it is not a good solution for us in our multi-site scenario where the site admins do not have unfiltered_html permissions, and the block custom CSS was going to be generated by AI agents. We might just have to abandon the plans we had for this if stripping it is the only option for now.

@ramonjd
Member

ramonjd commented Mar 5, 2026

We might just have to abandon the plans we had for this if stripping it is the only option for now.

I don't have a strong opinion. Maybe block css is the exception and it can be preserved?

Also, sorry @sirreal I spelled your name wrong (Jon with an h).

@sirreal
Member

sirreal commented Mar 5, 2026

I want to clarify one thing. An author role user, right now, can access the code editor and write this:

<!-- wp:paragraph {"style":{"css":"color: blue"}} -->
<p class="has-custom-css">asdf</p>
<!-- /wp:paragraph -->

They have direct access to this feature right now. They can reach this point by editing another user's post, but that's just another example of this.

Either the UI should provide access for them, or they should not be allowed to save that css attribute in the post content. That's what I see as incoherent.

@ramonjd
Member

ramonjd commented Mar 6, 2026

Either the UI should provide access for them, or they should not be allowed to save that css attribute in the post content. That's what I see as incoherent.

Thanks for confirming. That was my take away.

If authors can already set classnames, and inline styles (via block supports) and save the CSS attributes like you say, then I think that dilutes the value of stripping as an interim patch. Glen's use case suggests it probably isn't. I don't know.

CSS should still go through some process like KSES

Yeah, so if unfiltered_html is the wrong capability gate for CSS, and if KSES is the wrong sanitizer for CSS, then maybe a CSS-aware processor [1] or wp_kses_css_block() is required.

I'm just trying to get my head around the scope/policy: would we need a real CSS sanitizer (nested selectors, at rules, URLs...) or is the main security concern here </style> injection?

Pursuant to the latter, I messed around with skipping KSES and leaning on the validate_custom_css work done previously in https://core.trac.wordpress.org/changeset/61486. It's pretty bent, but I just wanted to try it.

Footnotes

  1. 😉😉😉😉😉

@glendaviesnz
Author

Either the UI should provide access for them, or they should not be allowed to save that css attribute in the post content. That's what I see as incoherent.

Yes, that makes no sense. I think the approach @ramonjd suggests is a better solution than deleting existing CSS if edited by a user without unfiltered_html

@glendaviesnz
Author

@sirreal, @ramonjd I pushed a change: rather than letting KSES process the CSS and then trying to undo the damage (the previous approach in this PR), the CSS is now extracted before KSES runs and sanitized using methods appropriate for CSS.

How it works

In filter_block_kses():

  1. Extract $block['attrs']['style']['css'] and temporarily unset it
  2. Run KSES on the remaining block attributes as normal
  3. Sanitize the extracted CSS using wp_sanitize_block_custom_css(), which applies:
    • wp_strip_all_tags() — valid CSS never contains HTML tags
    • wp_validate_css_for_style_element() — rejects CSS containing </style> (or partial prefixes), preventing breakout from the <style> element
  4. Reinsert the sanitized CSS into the block attributes
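
The four steps above can be sketched roughly as follows. This is not the code pushed to the PR. The wrapper name filter_block_attrs_with_css_bypass() is hypothetical, and the two callables are injected stand-ins so the example runs without WordPress: one approximates filter_block_kses_value() with htmlspecialchars(), the other approximates wp_sanitize_block_custom_css() with only a </style check.

```php
<?php
// Hypothetical wrapper: $kses and $sanitize_css are injected stand-ins for
// filter_block_kses_value() and wp_sanitize_block_custom_css().
function filter_block_attrs_with_css_bypass( array $block, callable $kses, callable $sanitize_css ): array {
	$css = null;

	// 1. Extract the custom CSS and temporarily unset it.
	if ( isset( $block['attrs']['style']['css'] ) && is_string( $block['attrs']['style']['css'] ) ) {
		$css = $block['attrs']['style']['css'];
		unset( $block['attrs']['style']['css'] );
	}

	// 2. Run the HTML filter on the remaining attributes.
	$block['attrs'] = $kses( $block['attrs'] );

	// 3 and 4. Sanitize the CSS separately, then reinsert it.
	if ( null !== $css ) {
		$block['attrs']['style']['css'] = $sanitize_css( $css );
	}

	return $block;
}

// Stand-ins for demonstration only (assumptions, not WordPress behaviour).
$kses = function ( array $attrs ): array {
	array_walk_recursive(
		$attrs,
		function ( &$value ) {
			if ( is_string( $value ) ) {
				$value = htmlspecialchars( $value, ENT_NOQUOTES );
			}
		}
	);
	return $attrs;
};
$sanitize_css = function ( string $css ): string {
	// Reject anything that could close a <style> element.
	return false === stripos( $css, '</style' ) ? $css : '';
};

$block = array(
	'attrs' => array(
		'className' => 'foo&bar',
		'style'     => array( 'css' => '& > p { margin: 0; }' ),
	),
);
$result = filter_block_attrs_with_css_bypass( $block, $kses, $sanitize_css );
```

In this sketch className is still entity-encoded by the stand-in filter, while the CSS string comes back unchanged.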

KSES never sees the CSS string, so it cannot mangle it. If we decide that this is appropriate then I suggest we also add a GB PR that shows the block level CSS input to users without unfiltered_html.

I have not spent any time testing or tidying up the approach, I will worry about that if there is some agreement that this might be a valid approach.

}

// Strip HTML tags — valid CSS never contains them.
$css = wp_strip_all_tags( $css );
Member

I might have missed this but I think there were doubts about using this for CSS

#11104 (comment)

Author

Yeah, the changes are just to push the discussion forward at this stage. If someone has ideas for an alternative, they are welcome to change this, but it seems like we should be taking steps to remove anything that is obviously not CSS from those strings. I do not have an opinion on this, though.

@tellthemachines
Contributor

CSS should still go through some process like KSES, which is what functions like safecss_filter_attr() are for. there are rules applied to things like URLs inside of CSS declarations which WordPress will want to apply.

This is a good point. safecss_filter_attr() checks blocks of property/value pairs, so in order to make it work here we'd probably have to run it on the top-level declarations and then separately on any declarations inside selectors/brackets. But it would be worth doing to ensure the actual CSS isn't dodgy.

ramonjd added a commit to ramonjd/wordpress-develop that referenced this pull request Mar 9, 2026
…nitize() and validate()

Introduces `WP_CSS_Token_Processor`, a new class in `src/wp-includes/css-api/`
modelled after `WP_HTML_Tag_Processor`. It tokenizes a CSS string into a typed
token stream and exposes two high-level consumers:

- `sanitize(): string` — strips unsafe tokens/rules (injection guard, CDO/CDC,
  bad tokens, disallowed URL schemes, non-allowlisted at-rules) and returns a
  safe CSS string. Idempotent: sanitize(sanitize($css)) === sanitize($css).

- `validate(): true|WP_Error` — returns true if the CSS is safe, or a WP_Error
  with a specific error code (css_injection, css_html_comment, css_malformed_token,
  css_unsafe_url, css_disallowed_at_rule) on the first violation found.

The primary motivation is fixing the compounding corruption bug (PR WordPress#11104) where
wp_kses() — an HTML sanitizer — was applied to CSS, mangling & and > characters
used in CSS nesting selectors on each save for users without unfiltered_html.

Security policy:
- </style anywhere → sanitize() returns ''; validate() returns css_injection error
- url() with javascript:, data:, or non-wp_allowed_protocols() scheme → stripped
- @import, @charset, @namespace, unknown at-rules → stripped (safety-first)
- bad-url-token, bad-string-token → stripped
- CDO/CDC (<!-- / -->) → stripped
- Null bytes → stripped in constructor

Allowed at-rules: @media, @supports, @keyframes, @-webkit-keyframes, @layer,
@container, @font-face.

Also adds low-level navigation (next_token, get_token_type, get_token_value,
get_block_depth) and non-destructive modification (remove_token, set_token_value,
get_updated_css) APIs, plus get_removed_tokens() for sanitize() introspection.

Integration with filter_block_kses_value() in blocks.php is a follow-on PR.

Includes:
- src/wp-includes/css-api/class-wp-css-token-processor.php (~1,250 lines)
- src/wp-includes/css-api/README.md
- tests/phpunit/tests/css-api/WpCssTokenProcessorTest.php (67 tests)
- tests/phpunit/tests/css-api/WpCssTokenSanitizeTest.php (40 tests)
- tests/phpunit/tests/css-api/WpCssTokenValidateTest.php (14 tests + data provider)
- docs/plans/2026-03-06-wp-css-token-processor-design.md
- docs/plans/2026-03-06-wp-css-token-processor.md

Fixes #64771

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sirreal
Member

sirreal commented Mar 9, 2026

It doesn't seem like a good idea to me to hide specific attributes from KSES processing. Perhaps if there were a more general sanitization system, but 1-off exceptions to KSES content filtering isn't something I'm in favor of introducing here.

I don't think we should introduce a new wp_validate_css_for_style_element() function. There's some duplication, but I don't think this function should exist or be used in new features. The only reason the logic exists was because it was considered risky to allow </style> in those pre-existing systems.

With the HTML API, anything that's newly developed does not need to worry about whether the contents of a <style> tag are safe. This will produce a safe style tag with arbitrary contents:

$processor = new WP_HTML_Tag_Processor( '<style></style>' );
$processor->next_tag();
$processor->set_attribute( 'id', "{$handle}-inline-css" );
$processor->set_modifiable_text( "\n{$output}\n" );
echo "{$processor->get_updated_html()}\n";


wp_strip_all_tags() — valid CSS never contains HTML tags

This is not appropriate for sanitizing CSS. This function removes text that looks like tags, and valid CSS certainly may contain things that look like tags. See this ticket for example:

@property --animate {
  syntax: "<custom-ident>";
  inherits: true;
  initial-value: false;
}

It's also worth considering any of the arbitrary strings that can be used in CSS, for example:

div:has(input:first-child)::before {
  content: "<text>";
}
ul {
  list-style: "<li>";
}
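
To make the failure mode concrete, here is a minimal JavaScript sketch. The `stripTagLikeText` helper is hypothetical: it uses a simple regular expression as a stand-in for tag stripping and is not how `wp_strip_all_tags()` is actually implemented, but it should damage the CSS above in the same way.

```javascript
// Hypothetical stand-in for a tag-stripping sanitizer. This is NOT the
// wp_strip_all_tags() implementation; it only approximates "remove anything
// that looks like an HTML tag".
function stripTagLikeText( text ) {
	return text.replace( /<[^>]*>/g, '' );
}

const validCss = [
	'@property --animate { syntax: "<custom-ident>"; inherits: true; initial-value: false; }',
	'div:has(input:first-child)::before { content: "<text>"; }',
	'ul { list-style: "<li>"; }',
].join( '\n' );

const strippedCss = stripTagLikeText( validCss );

// The quoted strings are emptied, so the CSS no longer means the same thing.
console.log( strippedCss );
```

Every tag-like string in the input is legal CSS, yet all three are removed, which is why tag stripping is a poor fit as a CSS sanitizer.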

@sirreal
Member

sirreal commented Mar 9, 2026

Using an HTML- and KSES-safe encoding for the serialized data seemed promising. Has that been explored?

Another option is to protect the data before KSES can mangle it by encoding it ourselves in an HTML-text safe way. A few quick options come to mind:

@ramonjd
Member

ramonjd commented Mar 9, 2026

base64 the CSS string

I looked into this briefly and found that it required some identifiable token in order to flag it as "to-be-decoded" on the way out. Similar to what Dennis spoke of above.

I baulked at this initially for the reason that, once it's there, we'll have to support it forever even if some CSS processor comes along down the track to replace it. Maybe I'm overthinking it.

Yes I was overthinking it, sorry. Now that I've had some breakfast I see we could strip the token after kses, before saving.

function filter_block_kses( $block, $allowed_html, $allowed_protocols = array() ) {
	// check for closing style tag then encode $block['attrs']['style']['css'] => WP_BLOCK_CUSTOM_CSS_KSES_PREFIX . base64_encode( $css )

	$block['attrs'] = filter_block_kses_value( $block['attrs'], $allowed_html, $allowed_protocols, $block );

	// strip WP_BLOCK_CUSTOM_CSS_KSES_PREFIX and decode

	// the rest...
}

@glendaviesnz
Author

glendaviesnz commented Mar 10, 2026

This commit uses JSON encode as @sirreal suggested. @ramonjd this seems a little less heavy-handed than base64 and matches existing approaches.

So, in summary, as the PR stands now:

Instead of hiding CSS from KSES or trying to undo KSES damage after the fact, we encode the CSS value using wp_json_encode() with JSON_HEX_TAG | JSON_HEX_AMP before KSES runs. This converts <, >, and & into JSON unicode escapes (\u003C, \u003E, \u0026) that KSES passes through untouched. After KSES, we json_decode() them back to the original characters.

src/wp-includes/blocks.php: filter_block_kses()

  • Before KSES: encode attrs.style.css via wp_json_encode($css, JSON_HEX_TAG | JSON_HEX_AMP)
  • After KSES: decode via json_decode()
  • Removed undo_block_custom_css_kses_entities() — no longer needed
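
The encode and decode round trip described above can be sketched in JavaScript. This is only an approximation of what `wp_json_encode( $css, JSON_HEX_TAG | JSON_HEX_AMP )` followed by `json_decode()` would do in PHP; the helper names `encodeCssForKses` and `decodeCssAfterKses` are hypothetical and do not exist in the PR.

```javascript
// Approximates PHP's wp_json_encode( $css, JSON_HEX_TAG | JSON_HEX_AMP ):
// serialize as a JSON string, then escape <, > and & as \uXXXX sequences.
// (Hypothetical helper name, for illustration only.)
function encodeCssForKses( css ) {
	return JSON.stringify( css ).replace( /[<>&]/g, ( char ) =>
		'\\u' + char.charCodeAt( 0 ).toString( 16 ).toUpperCase().padStart( 4, '0' )
	);
}

// Reverses the encoding; JSON.parse() understands \uXXXX escapes natively.
function decodeCssAfterKses( encoded ) {
	return JSON.parse( encoded );
}

const nestedCss = 'color: green;\n& > p { color: blue }';
const encodedCss = encodeCssForKses( nestedCss );
const roundTrippedCss = decodeCssAfterKses( encodedCss );

console.log( encodedCss );
console.log( roundTrippedCss === nestedCss );
```

The encoded form contains none of `<`, `>` or `&`, so an HTML-oriented filter has nothing to entity-encode, and decoding restores the original string.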

src/wp-includes/block-supports/custom-css.php: wp_render_custom_css_support_styles()

  • Removed preg_match( '#</?\w+#', $custom_css ) check — this was overly broad and rejected valid CSS (e.g. @property syntax with <custom-ident>, or content: "<text>"). The HTML API's set_modifiable_text() already handles safe <style> output, making this input-side check unnecessary for new features (per sirreal's feedback).

I haven't done any manual testing, or rewritten the tests to cover this yet. I will worry about that if there is agreement on this approach.

This, of course, only prevents the existing KSES from mangling the CSS attribute, it does not address effective sanitisation of the CSS string. What thoughts do people have about how to address that for 7.0?

@sirreal
Member

sirreal commented Mar 10, 2026

I don't think transforming the data before and after KSES runs is the right approach. The system should be responsible for transforming its data without requiring exceptions built into general filters.

KSES is designed to prevent problematic data from being stored. When KSES is circumvented, it often leads to things like stored XSS vulnerabilities. It's annoying, but that's kind of the point. The goal is to find ways to store the content to satisfy KSES, not circumvent it.

The encoding and decoding of the data should happen in the system using the data. General filters should not have exceptional behavior to deal with specific attributes. The system that is storing the data should encode it on save and decode it for use. @dmsnell shared a good idea for how to indicate the encoding by using a data-uri.


it does not address effective sanitisation of the CSS string. What thoughts do people have about how to address that for 7.0?

I don't expect to have any CSS API for 7.0, which greatly limits the options. I still think the most reasonable thing at this point is to remove the custom block CSS from the post if a user without the capability tries to save it.

@glendaviesnz
Author

@sirreal, @ramonjd serializeAttributes() already encodes the CSS in the serialised content, but parse_blocks() → json_decode() reverses that before it is passed into KSES, so it looks like base64 is the best option, but does that go against any Gutenberg principles that the block attribute content should be human-readable?

I am not aware of any other attributes that you can't make sense of by reading the serialised block content, and I have a vague feeling that this is by design, but I could be wrong.

If we are all happy with base64 I will close this PR and open a new Gutenberg PR to make that change.

@sirreal
Member

sirreal commented Mar 10, 2026

The essential idea is that the data stored in the attribute is something that KSES leaves alone because it's harmless.

Either base64 (more opaque) or serializeAttributes (more readable, but perhaps more surprising) seem well suited to the task. Storing a string that is a base64- or json-encoded (via serializeAttributes) string of CSS text should serve. Note that either of these is an additional encoding of the stored data.


Edit:

The idea is that

parse_blocks() → json_decode()

Still contains safely encoded data
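
A rough JavaScript illustration of why an encoded value is "left alone". The `encodeHtmlSpecials` function is a hypothetical, heavily simplified stand-in for the entity encoding KSES applies to text; real `wp_kses()` does considerably more. The sketch assumes Node.js for `Buffer`.

```javascript
// Hypothetical, heavily simplified stand-in for the entity encoding that
// KSES applies to text. Real wp_kses() does much more than this.
function encodeHtmlSpecials( text ) {
	return text
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' );
}

const plainCss = '& > p { color: blue }';

// Buffer is Node.js-specific; a browser would need btoa() plus a UTF-8 step.
const storedValue =
	'data:text/css;base64,' + Buffer.from( plainCss, 'utf8' ).toString( 'base64' );

// Plain CSS is rewritten by the filter.
console.log( encodeHtmlSpecials( plainCss ) );

// The encoded value has no HTML-special characters, so it passes unchanged.
console.log( encodeHtmlSpecials( storedValue ) === storedValue );
```

Because the base64 alphabet and the prefix contain no `<`, `>` or `&`, the stored value is the same before and after the filter, which is the property the thread is after.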

@ramonjd
Member

ramonjd commented Mar 10, 2026

This seems to work for me with base64_decode
diff --git a/src/wp-includes/blocks.php b/src/wp-includes/blocks.php
index 89007d0d0d..7e577a4dd0 100644
--- a/src/wp-includes/blocks.php
+++ b/src/wp-includes/blocks.php
@@ -2075,8 +2075,38 @@ function _filter_block_content_callback( $matches ) {
  * @return array The filtered and sanitized block object result.
  */
 function filter_block_kses( $block, $allowed_html, $allowed_protocols = array() ) {
+	/*
+	 * Per-block custom CSS is not HTML; encode it before KSES (so it is not mangled)
+	 * and decode it back to plain CSS immediately after.
+	 */
+	if ( isset( $block['attrs']['style']['css'] ) && is_string( $block['attrs']['style']['css'] ) ) {
+		$css = trim( $block['attrs']['style']['css'] );
+		if ( '' !== $css ) {
+			/**
+			 * Prefix applied to block custom CSS when base64-encoded for the KSES pass.
+			 * The prefix lets wp_decode_block_custom_css_after_kses() recognise values it encoded.
+			 */
+			$block['attrs']['style']['css'] = 'data:text/css;base64,' . base64_encode( $css );
+		}
+	}
+
 	$block['attrs'] = filter_block_kses_value( $block['attrs'], $allowed_html, $allowed_protocols, $block );
 
+	if ( isset( $block['attrs']['style']['css'] ) && is_string( $block['attrs']['style']['css'] ) ) {
+		/**
+		 * Decodes block custom CSS from base64 back to plain CSS after the KSES pass.
+		 *
+		 * Only decodes values that start with prefix so we never
+		 * attempt to decode CSS that wasn't encoded above.
+		 */
+		$css = $block['attrs']['style']['css'];
+		if ( str_starts_with( $css, 'data:text/css;base64,' ) ) {
+			$decoded  = base64_decode( substr( $css, strlen( 'data:text/css;base64,' ) ), true );
+			// If the decoded string contains characters from outside the base64 alphabet or a null byte, set the CSS to an empty string.
+			$block['attrs']['style']['css'] = ( false === $decoded || str_contains( $decoded, "\0" ) ) ? '' : $decoded;
+		}
+	}
+
 	if ( is_array( $block['innerBlocks'] ) ) {
 		foreach ( $block['innerBlocks'] as $i => $inner_block ) {
 			$block['innerBlocks'][ $i ] = filter_block_kses( $inner_block, $allowed_html, $allowed_protocols );
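
For readers following along outside PHP, the encode and decode steps in the diff can be mirrored in a few lines of JavaScript. This is a sketch, not a port: the function names are hypothetical, it assumes Node.js for `Buffer`, and the strict check is a regular expression (with padding required) rather than PHP's `base64_decode( ..., true )`.

```javascript
const CSS_DATA_URI_PREFIX = 'data:text/css;base64,';

// Strict base64 check (padding required). Node's Buffer.from() silently skips
// invalid characters, so validation has to happen before decoding.
const STRICT_BASE64 =
	/^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/;

// Hypothetical helper: wrap non-empty CSS in a base64 data URI.
function encodeBlockCss( css ) {
	const trimmed = css.trim();
	if ( '' === trimmed ) {
		return css; // Nothing to protect.
	}
	return CSS_DATA_URI_PREFIX + Buffer.from( trimmed, 'utf8' ).toString( 'base64' );
}

// Hypothetical helper: unwrap only values that carry the prefix.
function decodeBlockCss( value ) {
	if ( ! value.startsWith( CSS_DATA_URI_PREFIX ) ) {
		return value;
	}
	const payload = value.slice( CSS_DATA_URI_PREFIX.length );
	if ( ! STRICT_BASE64.test( payload ) ) {
		return '';
	}
	const decoded = Buffer.from( payload, 'base64' ).toString( 'utf8' );
	// Reject null bytes, as the diff does.
	return decoded.includes( '\0' ) ? '' : decoded;
}

console.log( decodeBlockCss( encodeBlockCss( '& > p { color: blue }' ) ) );
```

As in the diff, unprefixed values are returned untouched, and malformed payloads or payloads containing a null byte collapse to an empty string.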

I don't think transforming the data before and after KSES runs is the right approach. The system should be responsible for transforming its data without requiring exceptions built into general filters

Fair enough, I just added the diff above for the record.

The motivation was to keep "encode and decode" in PHP and also keep plain CSS in the database (unless I'm reading the feedback incorrectly, the implication is to store the encoded string?). Also to cover all save paths (REST and direct wp_insert_post() calls).

@sirreal
Member

sirreal commented Mar 11, 2026

The motivation was to keep "encode and decode" in PHP and also keep plain CSS in the database

I don't think this is feasible.

the implication is to store the encoded string?

Yes, that's my thinking.

Here is my reasoning:

  • The CSS has to be stored in block attributes somehow.
  • KSES should run on what's stored in the DB, transforms after KSES tend to be dangerous.
  • KSES will mangle some plain CSS text.

If those hold true, it seems to follow that the data stored in the DB must be encoded in some way that doesn't contain special HTML characters.

I have my own lacunas! There may be a completely different way to achieve this, but encoding the data in the attribute seems like a straightforward and relatively simple way to resolve this issue.


It's still possible to strip this data for users without the appropriate capability. That's probably the simplest thing to do for WordPress 7.0 which is right around the corner. That wouldn't involve any other changes and would just rely on KSES not operating. That may not be the best solution in the long term.


I still think the data-URIs are very interesting:

const originalCSS =
`@property --animate {
  syntax: "<custom-ident>";
  inherits: true;
  initial-value: false;
}
div:has(input:first-child)::before {
  content: "<text>";
}
ul {
  list-style: "<li>";
}`;
const dataURI = `data:text/css,${ encodeURIComponent( originalCSS ) }`;
const parsedCSS = ( await import( dataURI, { with: { type: 'css' } } ) ).default;
const reconstructedCSS = Object.values( parsedCSS.cssRules ).map( rule => rule.cssText ).join( '\n' );

The result in reconstructedCSS is:

@property --animate { syntax: "<custom-ident>"; inherits: true; initial-value: false; }
div:has(input:first-child)::before { content: "<text>"; }
ul { list-style: "<li>"; }

Of course, I think the style system will need to handle decoding the data-URI in PHP, but the native browser APIs in this space are exciting.
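
Outside the browser, the same percent-encoded data URI can be built and read back with plain string functions. The sketch below should run under Node.js; it does not parse the CSS the way the `import()` example does, and the helper names `cssToDataUri` and `dataUriToCss` are hypothetical.

```javascript
const PERCENT_PREFIX = 'data:text/css,';

// Hypothetical helper: percent-encode CSS into a text/css data URI.
function cssToDataUri( css ) {
	return PERCENT_PREFIX + encodeURIComponent( css );
}

// Hypothetical helper: recover the CSS text from such a data URI.
function dataUriToCss( uri ) {
	if ( ! uri.startsWith( PERCENT_PREFIX ) ) {
		throw new TypeError( 'Not a percent-encoded text/css data URI' );
	}
	return decodeURIComponent( uri.slice( PERCENT_PREFIX.length ) );
}

const sampleCss = 'ul {\n  list-style: "<li>";\n}\n& > p { color: blue }';
const sampleUri = cssToDataUri( sampleCss );

console.log( sampleUri );
console.log( dataUriToCss( sampleUri ) === sampleCss );
```

`encodeURIComponent()` escapes `<`, `>`, `&` and `"`, so the URI contains nothing an HTML filter would rewrite, while the source stays recoverable byte for byte.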

@sirreal
Member

sirreal commented Mar 11, 2026

Of course, I think the style system will need to handle decoding the data-URI in PHP, but the native browser APIs in this space are exciting.

Actually, PHP seems to handle base64 or %-encoded data URIs just fine in my testing:

<?php
$c = file_get_contents( 'data:text/css;base64,QHByb3BlcnR5IC0tYW5pbWF0ZSB7CiAgc3ludGF4OiAiPGN1c3RvbS1pZGVudD4iOwogIGluaGVyaXRzOiB0cnVlOwogIGluaXRpYWwtdmFsdWU6IGZhbHNlOwp9CmRpdjpoYXMoaW5wdXQ6Zmlyc3QtY2hpbGQpOjpiZWZvcmUgewogIGNvbnRlbnQ6ICI8dGV4dD4iOwp9CnVsIHsKICBsaXN0LXN0eWxlOiAiPGxpPiI7Cn0=' );
$c2 = file_get_contents( 'data:text/css,%40property%20--animate%20%7B%0A%20%20syntax%3A%20%22%3Ccustom-ident%3E%22%3B%0A%20%20inherits%3A%20true%3B%0A%20%20initial-value%3A%20false%3B%0A%7D%0Adiv%3Ahas(input%3Afirst-child)%3A%3Abefore%20%7B%0A%20%20content%3A%20%22%3Ctext%3E%22%3B%0A%7D%0Aul%20%7B%0A%20%20list-style%3A%20%22%3Cli%3E%22%3B%0A%7D' );
echo "{$c}\n{$c2}";

@ramonjd
Member

ramonjd commented Mar 12, 2026

The CSS has to be stored in block attributes somehow.
KSES should run on what's stored in the DB, transforms after KSES tend to be dangerous.
KSES will mangle some plain CSS text.

I see, thanks for laying that out. 🍺

I was thinking about how folks reading content programmatically might need to handle this, e.g.,
WP-CLI or get_post().. even SELECT * FROM wp_posts WHERE post_content LIKE '%color: red%'.

Or we make it part of the contract that they have to decode or run the raw post content through parse blocks anyway.

I don't really know either. What smells to me though is that we're dancing around KSES's limitations for a compromise.

For WP 7.0, and assuming there is some time pressure, maybe storing as a data URI is pragmatic.

Later, if there's some CSS API™️ that leaves no fingerprints on the stored format, we'd have to support both formats forever.

@glendaviesnz
Author

I am going to close this PR because the suggested approach will need a Gutenberg fix.

See WordPress/gutenberg#76472

@dmsnell
Member

dmsnell commented Mar 13, 2026

it looks like base64 is the best option, but does that go against any Gutenberg principles that the block attribute content should be human-readable?

how folks reading content programmatically might need to handle this

It can be liberating and helpful to start with the primary thing: we’re attempting to make it possible for folks to add CSS to their blocks individually. To me, the most important “Gutenberg principle” is that the feature works for the editor and doesn’t lose their trust by randomly breaking without explanation. I don’t want people to spend hours working on a design, have someone else look at it who saves the post, maybe because they correct some unrelated typo, and now the page is broken on render and that person who invested their time has proverbial “egg on their face.”

To that end I want to be open-minded about what technical challenges present themselves as obstacles to that goal. And granted, I think you both are raising awesome and exceptional points: having human-readable plaintext is the most open and transparent and interoperable thing we can reach for.

but unfortunately I think the situation we find ourselves in is a huge quagmire with no easy way to achieve all those goals. we can also note that plenty of Core blocks already hide their content from post_content: from the very beginning we had recent-posts and recent-comments blocks with zero content.

we're dancing around KSES's limitations for a compromise

There might be misunderstanding on what @sirreal is saying. KSES has its issues, but in fairness, it’s designed to “sanitize” HTML, not CSS. We had a long call yesterday to discuss the entire concept of CSS sanitization and it seems even more nebulous to discuss than HTML sanitization anyway.

There’s a reason we don’t run eslint against PHP code — they are different languages. The problem is that right now KSES is applying HTML logic against CSS. So it’s not that Jon is recommending a compromise, but separating out these two concerns.

The database stores HTML and receives a kind of broad, blanket HTML-based linting/sanitization. The CSS needs to run through policy-based transformations (sanitization), but the only pragmatic tool we may have is to do that at the point of demarcation between the raw CSS string and when it is rendered back into an HTML document into either a STYLE element or style attribute. First we process it as CSS, then that provides us a new raw CSS value, then we run it through the appropriate HTML processing (which, because it’s CSS and going to these two destinations, requires almost no further transformation).

we'd have to support both formats forever.

This is a compelling reason to think about prefixing the content with some kind of indicator or sigil, such as the data URI prefix. At least with a data URI the content is somewhat self-descriptive and people can relatively easily recover the source content.
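
As a sketch of how a sigil keeps both formats readable, a consumer could branch on the prefix and fall back to treating the value as plain CSS. The function name `readBlockCustomCss` and the exact prefixes are assumptions for illustration, not an agreed storage format; the sketch assumes Node.js for `Buffer` and skips the strict validation a real implementation would want.

```javascript
const BASE64_SIGIL = 'data:text/css;base64,';
const PERCENT_SIGIL = 'data:text/css,';

// Hypothetical reader that accepts either encoded form or legacy plain CSS.
function readBlockCustomCss( stored ) {
	if ( stored.startsWith( BASE64_SIGIL ) ) {
		return Buffer.from( stored.slice( BASE64_SIGIL.length ), 'base64' ).toString( 'utf8' );
	}
	if ( stored.startsWith( PERCENT_SIGIL ) ) {
		return decodeURIComponent( stored.slice( PERCENT_SIGIL.length ) );
	}
	// No sigil: treat the value as plain CSS saved before any encoding existed.
	return stored;
}

console.log( readBlockCustomCss( 'data:text/css,color%3A%20red%3B' ) );
console.log( readBlockCustomCss( 'color: red;' ) );
```

The prefix makes the stored value self-describing, so code reading post content programmatically can tell encoded CSS from plain CSS without guessing.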

Source code blocks and the HTML block have this same problem: WordPress transformations are complicated and make it hard to find any way to get it to save what you type: KSES in point.


Just wanted to share some thoughts in summary on this. Everyone is doing a great job, this is a difficult and common challenge, so thank you for working on it.

For the rushed timeline it may be worth compromises in scope rather than security. The best perk of limiting scope now is that we can announce enhancements in the future rather than apologize for shipping things that are broken. It’s generally easier to accept limitations if they are communicated and then liberated than if those boundaries aren’t knowable until you cross them.

@ramonjd
Member

ramonjd commented Mar 13, 2026

Thanks for helping me understand the issues better, Dennis!

KSES has its issues, but in fairness, it’s designed to “sanitize” HTML, not CSS. We had a long call yesterday to discuss the entire concept of CSS sanitization and it seems even more nebulous to discuss than HTML sanitization anyway.

Ultimately I'll defer to Jon's and your wisdom on this matter - I've been reflecting, and I believe the kink in my thinking stemmed from considering all post_content as valid input for wp_kses. Right or wrong, that's why I was fixated on adding some short circuit there.

I also get that the surface of CSS processing in WP is a heck of a lot larger than the problem we're facing here, so I appreciate you folks stepping back and reasoning about it from wider design perspective.


5 participants