⚡️ Speed up function _version_split by 38% #19
Open
codeflash-ai[bot] wants to merge 1 commit into opt-attempt-2
The optimization adds a **pre-filter check** before invoking the comparatively expensive regex operation, avoiding regex overhead for components that cannot possibly match the pattern.

**What changed:**
The code now checks `if item and item[0].isdigit() and not item.isdigit()` before calling `_prefix_regex.fullmatch(item)`. This condition identifies items that:

1. Are non-empty
2. Start with a digit
3. Are NOT purely numeric (meaning they contain at least one non-digit character)

Only these candidates can match the pattern `([0-9]+)((?:a|b|c|rc)[0-9]+)` (e.g., "1a2", "3rc5"). The optimization skips regex matching for purely numeric items (like "1", "2", "3") and for items that do not start with a digit (like "foo", ""), which together make up the majority of version components in typical usage.

**Why it's faster:**
Regex calls in Python carry noticeable per-call overhead: even for a non-match against an already compiled pattern, the call has to enter the matching engine and evaluate the pattern against the input. The line profiler shows:

- **Original**: `_prefix_regex.fullmatch(item)` was called 10,411 times, consuming 5.2 million nanoseconds (25.8% of total time)
- **Optimized**: The regex is now called only 2,638 times (a 75% reduction), consuming just 1.2 million nanoseconds (8% of total time)

The fast-path checks (`item[0].isdigit()` and `not item.isdigit()`) are built-in string methods implemented in C that do far less work per call, which makes them considerably cheaper than a regex match.

**Impact on workloads:**
Based on `function_references`, this function is called from `_compare_compatible` and `_compare_equal` in the specifier matching logic. These are potentially hot paths during dependency resolution, where many version strings are compared.
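The pre-filter described under "What changed" can be sketched as follows. This is a simplified illustration, not the code from `src/packaging/specifiers.py`: the function names `split_baseline` and `split_prefiltered` are hypothetical, the loop over dot-separated components is an assumption about the surrounding function, and only the regex pattern and the `if` condition are taken from the description above.

```python
import re
from typing import List

# Pattern quoted in the PR description; the constant name mirrors `_prefix_regex`.
PREFIX_REGEX = re.compile(r"([0-9]+)((?:a|b|c|rc)[0-9]+)")


def split_baseline(version: str) -> List[str]:
    """Hypothetical baseline: run the regex on every dot-separated component."""
    result: List[str] = []
    for item in version.split("."):
        match = PREFIX_REGEX.fullmatch(item)
        if match:
            # "2rc4" becomes "2", "rc4"
            result.extend(match.groups())
        else:
            result.append(item)
    return result


def split_prefiltered(version: str) -> List[str]:
    """Hypothetical optimized variant: call the regex only on plausible candidates."""
    result: List[str] = []
    for item in version.split("."):
        # Candidate: non-empty, starts with a digit, and not purely numeric.
        if item and item[0].isdigit() and not item.isdigit():
            match = PREFIX_REGEX.fullmatch(item)
            if match:
                result.extend(match.groups())
                continue
        # Empty, purely numeric, non-digit-leading, or non-matching components
        # are kept unchanged.
        result.append(item)
    return result


if __name__ == "__main__":
    print(split_prefiltered("1.2.3"))   # ['1', '2', '3']
    print(split_prefiltered("1.2rc4"))  # ['1', '2', 'rc4']
```

Note that `str.isdigit()` also returns `True` for some non-ASCII digit characters, while `[0-9]` matches ASCII digits only. In this sketch that does not change the result, because the pre-filter only decides whether the regex is attempted; any component that the regex would match starts with an ASCII digit and contains a letter, so it always passes the filter.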
The test results show:

- **Pure numeric versions** (e.g., "1.2.3", "1.2.3.4"): 18-35% faster, since the regex is avoided entirely
- **Versions with many empty components** (e.g., "...", 999 dots): 64-277% faster, since empty strings bypass the regex
- **Long numeric components** (e.g., 999-digit strings): 187-587% faster, since the regex is never run on the large strings
- **Versions with pre-release suffixes** (e.g., "1.2a1", "1.2rc4"): mostly neutral or slightly slower (2-10%) due to the extra conditional checks, since the regex still runs when needed

The optimization accepts a small overhead on suffix-heavy versions (where most components match the pattern) in exchange for substantial gains on the more common numeric-only versions. Because typical version strings in package management rarely carry pre-release suffixes on every component, the overall impact is positive, as evidenced by the 37% average speedup.
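A rough way to check these categories locally is a small `timeit` harness such as the one below. It is a sketch under assumptions: the two split functions are simplified stand-ins for the original and optimized `_version_split` (hypothetical names, no claim to match the real function body), and the sample inputs are modelled on the examples listed above. Absolute timings depend on the machine and Python version, so the percentages reported in this PR will not necessarily be reproduced.

```python
import re
import timeit
from typing import Dict, List, Tuple

# Pattern quoted in the PR description.
PREFIX_REGEX = re.compile(r"([0-9]+)((?:a|b|c|rc)[0-9]+)")


def split_always_regex(version: str) -> List[str]:
    """Stand-in for the original behaviour: regex on every component."""
    result: List[str] = []
    for item in version.split("."):
        match = PREFIX_REGEX.fullmatch(item)
        if match:
            result.extend(match.groups())
        else:
            result.append(item)
    return result


def split_with_prefilter(version: str) -> List[str]:
    """Stand-in for the optimized behaviour: cheap string checks first."""
    result: List[str] = []
    for item in version.split("."):
        if item and item[0].isdigit() and not item.isdigit():
            match = PREFIX_REGEX.fullmatch(item)
            if match:
                result.extend(match.groups())
                continue
        result.append(item)
    return result


# Hypothetical inputs, one per category in the list above.
SAMPLES: Dict[str, str] = {
    "pure numeric": "1.2.3.4",
    "empty components": "." * 999,
    "long numeric": "9" * 999,
    "pre-release suffix": "1.2rc4",
}


def compare(number: int = 200) -> Dict[str, Tuple[float, float]]:
    """Return (always-regex seconds, pre-filter seconds) per sample."""
    timings: Dict[str, Tuple[float, float]] = {}
    for label, version in SAMPLES.items():
        # Both variants must agree before their timings are worth comparing.
        assert split_always_regex(version) == split_with_prefilter(version)
        base = timeit.timeit(lambda: split_always_regex(version), number=number)
        fast = timeit.timeit(lambda: split_with_prefilter(version), number=number)
        timings[label] = (base, fast)
    return timings


if __name__ == "__main__":
    for label, (base, fast) in compare().items():
        print(f"{label:>20}: always-regex {base:.4f}s, pre-filter {fast:.4f}s")
```

The equality assertion inside `compare` doubles as a quick behavioural check: a pre-filter that rejected a component the regex would have matched would fail there before any timing is reported.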
📄 38% (0.38x) speedup for `_version_split` in `src/packaging/specifiers.py`

⏱️ Runtime: 1.93 milliseconds → 1.40 milliseconds (best of 6 runs)
✅ Correctness verification report:

- 🌀 Generated Regression Tests
- ⏪ Replay Tests: `test_benchmark_py__replay_test_0.py::test_src_packaging_specifiers__version_split`
- 🔎 Concolic Coverage Tests: `codeflash_concolic_ui1l843q/tmp13d42exe/test_concolic_coverage.py::test__version_split`

To edit these changes, `git checkout codeflash/optimize-_version_split-mjjlqufy` and push.