Skip to content

Perf: Skip string normalization when possible#10116

Merged
MichaReiser merged 1 commit into
mainfrom
perf-string-normalization
Feb 26, 2024
Merged

Perf: Skip string normalization when possible#10116
MichaReiser merged 1 commit into
mainfrom
perf-string-normalization

Conversation

@MichaReiser
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser commented Feb 25, 2024

Summary

Most strings don't contain any quotes or other characters that need to be normalized. This PR makes use of this fact and short circuits choose_quotes and normalize for strings that contain no such characters. This is done by having one simple loop that searches for the occurrences of any character that needs normalization. This is much faster than the complicated normalization loop.

This removes the calls to StringNormalizer almost entirely from the flamegraphs (except for users that use \r or \r\n newlines that always need normalisation ;( )

Test Plan

cargo test

@MichaReiser MichaReiser added performance Potential performance improvement formatter Related to the formatter labels Feb 25, 2024
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Feb 25, 2024

CodSpeed Performance Report

Merging #10116 will improve performances by 18.79%

Comparing perf-string-normalization (f7a3f92) with main (15b87ea)

Summary

⚡ 2 improvements
✅ 28 untouched benchmarks

Benchmarks breakdown

Benchmark main perf-string-normalization Change
formatter[numpy/ctypeslib.py] 10.2 ms 9.7 ms +4.62%
formatter[numpy/globals.py] 1,176.1 µs 990.1 µs +18.79%

@MichaReiser MichaReiser force-pushed the perf-string-normalization branch from 1c7e244 to b87fbfb Compare February 25, 2024 10:26
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 25, 2024

ruff-ecosystem results

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to read examples/How_to_handle_rate_limits.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: trailing comma at line 47 column 4

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read examples/How_to_handle_rate_limits.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: trailing comma at line 47 column 4

@MichaReiser MichaReiser marked this pull request as ready for review February 25, 2024 10:42
let raw_content = locator.slice(string.content_range());
let first_quote_or_normalized_char_offset = raw_content
.bytes()
.position(|b| matches!(b, b'\\' | b'"' | b'\'' | b'\r' | b'{'));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not an f-string, you can omit {, I think? Similarly, if it's a raw string, you can omit \\... (In that case, you could actually use memchr3, but perhaps not worth it.)

Copy link
Copy Markdown
Member Author

@MichaReiser MichaReiser Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered special casing but decided against it because I want to avoid the extra complexity and it doesn't have to be perfect, for as long as it avoids the "expensive" normalization for most strings.

Using memchr would be nice... but normalize is already now almost gone from the benchmark profiles. That's why I consider this as "good enough".

@MichaReiser MichaReiser force-pushed the perf-string-normalization branch from b87fbfb to f7a3f92 Compare February 26, 2024 17:28
@MichaReiser MichaReiser enabled auto-merge (squash) February 26, 2024 17:28
@MichaReiser MichaReiser merged commit 8dc22d5 into main Feb 26, 2024
@MichaReiser MichaReiser deleted the perf-string-normalization branch February 26, 2024 17:35
nkxxll pushed a commit to nkxxll/ruff that referenced this pull request Mar 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

formatter Related to the formatter performance Potential performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants