html_escape: Avoid buffer allocation for strings with no escapable character #87

noteflakes · 2025-10-11T11:29:41Z

The existing html_escape implementation always allocates buffer space (6 times
the length of the input string), even when the input string does not contain any
character that needs to be escaped.

This PR modifies the implementation of optimized_escape_html so that it does not
pre-allocate an output buffer, but instead allocates it on the first occurrence
of a character that needs escaping. In addition, instead of copying non-escaped
characters one by one to the output buffer, contiguous segments of non-escaped
characters are copied using memcpy.
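
For readers who want to see the shape of the two changes, here is a minimal standalone sketch in plain C. It is not the code in ext/erb/escape/escape.c: the names `entity_for` and `escape_html_lazy` are hypothetical, it uses `malloc` rather than Ruby's allocation helpers, and the set of escaped characters (`&`, `<`, `>`, `"`, `'`) is an assumption based on the 6x worst-case buffer size mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the entity for c and stores its length in
 * *len, or returns NULL when c does not need escaping. */
static const char *entity_for(char c, size_t *len)
{
    switch (c) {
      case '&':  *len = 5; return "&amp;";
      case '<':  *len = 4; return "&lt;";
      case '>':  *len = 4; return "&gt;";
      case '"':  *len = 6; return "&quot;";
      case '\'': *len = 5; return "&#39;";
      default:   return NULL;
    }
}

/* Hypothetical sketch of lazy allocation plus segment copying.
 * Returns src itself when nothing needs escaping (no allocation),
 * a malloc'ed NUL-terminated string otherwise, or NULL on failure. */
static const char *escape_html_lazy(const char *src, size_t len)
{
    char *buf = NULL;      /* allocated on the first escapable character */
    char *dest = NULL;     /* write cursor into buf */
    size_t seg_start = 0;  /* start of the pending unescaped segment */

    for (size_t i = 0; i < len; i++) {
        size_t ent_len;
        const char *ent = entity_for(src[i], &ent_len);
        if (!ent) continue; /* keep extending the unescaped segment */

        if (!buf) {
            /* Worst case is 6 output bytes per input byte, plus a
             * terminator; guard the multiplication against overflow. */
            if (len > (SIZE_MAX - 1) / 6) return NULL;
            buf = malloc(len * 6 + 1);
            if (!buf) return NULL;
            dest = buf;
        }

        /* Copy the whole unescaped segment at once, then the entity. */
        memcpy(dest, src + seg_start, i - seg_start);
        dest += i - seg_start;
        memcpy(dest, ent, ent_len);
        dest += ent_len;
        seg_start = i + 1;
    }

    if (!buf) return src; /* fast path: nothing to escape */

    /* Copy the trailing unescaped segment and terminate. */
    memcpy(dest, src + seg_start, len - seg_start);
    dest += len - seg_start;
    *dest = '\0';
    return buf;
}
```

A caller compares the returned pointer with the input to decide whether there is anything to free. In this sketch, input without escapable characters takes the fast path and never touches the allocator.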

A synthetic benchmark employing the input strings used in the test_html_escape
method in test/test_erb.rb shows the modified implementation to be about 35%
faster than the original:

ruby 3.5.0preview1 (2025-04-18 master d06ec25be4) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
          escape old   273.773k i/100ms
          escape new   369.558k i/100ms
Calculating -------------------------------------
          escape old      2.766M (± 1.6%) i/s  (361.48 ns/i) -     13.962M in   5.048625s
          escape new      3.765M (± 2.0%) i/s  (265.58 ns/i) -     18.847M in   5.007869s

Comparison:
          escape old:  2766396.0 i/s
          escape new:  3765317.7 i/s - 1.36x  faster

escapable character (ruby/erb#87)

This change reduces allocations and makes `html_escape` ~35% faster in a benchmark with escaped strings taken from the `test_html_escape` test in `test/test_erb.rb`.

- Perform buffer allocation on first instance of escapable character.
- Instead of copying characters one at a time, copy unescaped segments using `memcpy`.

ruby/erb@aa482890fe

Fix ruby#87

Fix ruby/erb#87 ruby/erb@75764f022b

ext/erb/escape/escape.c

p8 · 2025-10-13T08:46:14Z

Thanks @noteflakes ! Is this optimization also possible in CGI#escape_html?
Also wondering if this should be duplicated in both gems...
https://github.com/ruby/cgi/blob/9d9a2eb868483fbf71d8e47d51d7b237d3867409/ext/cgi/escape/escape.c

noteflakes · 2025-10-13T09:11:50Z

> Thanks @noteflakes ! Is this optimization also possible in CGI#escape_html? Also wondering if this should be duplicated in both gems... https://github.com/ruby/cgi/blob/9d9a2eb868483fbf71d8e47d51d7b237d3867409/ext/cgi/escape/escape.c

Sure, I'll make a PR for CGI too.

k0kubun · 2025-10-13T12:39:22Z

> Also wondering if this should be duplicated in both gems...

They have slightly different behaviors. ERB's is faster than CGI's, and replacing CGI's with ERB's would be a breaking change, which is why they are deliberately separate.

k0kubun approved these changes Oct 11, 2025

k0kubun merged commit aa48289 into ruby:master Oct 11, 2025
8 checks passed

nobu added a commit to nobu/erb that referenced this pull request Oct 12, 2025

Fix integer overflow

75764f0

Fix ruby#87

matzbot pushed a commit to ruby/ruby that referenced this pull request Oct 12, 2025

[ruby/erb] Fix integer overflow

7cc3191

Fix ruby/erb#87 ruby/erb@75764f022b

byroot reviewed Oct 12, 2025

ext/erb/escape/escape.c

This was referenced Oct 13, 2025

Refactor html_escape #88

Merged

escape_html: Avoid buffer allocation for strings with no escapable character ruby/cgi#85

Open

noteflakes deleted the optimized_html_escape branch October 13, 2025 18:20

aidenfoxivey pushed a commit to aidenfoxivey/ruby that referenced this pull request Oct 17, 2025

[ruby/erb] Fix integer overflow

8c527d8

Fix ruby/erb#87 ruby/erb@75764f022b
