
⚡️ Speed up function _cmpkey by 11% #16

Open
codeflash-ai[bot] wants to merge 1 commit into opt-attempt-2 from codeflash/optimize-_cmpkey-mjjkdxbo

Conversation

codeflash-ai[bot] commented on Dec 24, 2025

📄 11% (0.11x) speedup for _cmpkey in src/packaging/version.py

⏱️ Runtime: 734 microseconds → 660 microseconds (best of 6 runs)

📝 Explanation and details

The optimized code achieves an 11% speedup through three strategic improvements:

1. Fast-Path Optimization in _strip_trailing_zeros (Primary Speedup)

The original code always iterates backward through the tuple. The optimization adds:

  • Early return for empty tuples (avoids iteration setup)
  • Fast-path check: If the last element is non-zero (common case), return immediately without iteration
  • Cached length: Avoids repeated len() calls

Why this matters: Test results show significant gains when trailing zeros are absent:

  • test_large_scale_release_1000_elements (last element non-zero): 63.8% faster (1.50μs → 916ns)
  • test_basic_no_pre_post_dev_local: 57.2% faster (2.29μs → 1.46μs)
  • Versions with trailing zeros show minimal regression (1-2%), indicating the fast-path successfully handles the common case

2. Type Check Optimization: type(i) is int vs isinstance(i, int)

Replaced isinstance(i, int) with type(i) is int in the local segment parsing.

Why this is faster: type(i) is int is a single identity comparison on the object's class, while isinstance() must also consider subclasses (including ABC-registered virtual subclasses). Because the parser only ever stores exact int or str values here (the annotated type is Union[int, str]), the stricter check is safe. Tests with local segments show consistent improvements:

  • test_edge_local_all_strs: 30.3% faster (57.4μs → 44.0μs)
  • test_edge_local_mixed_types: 40.0% faster (2.33μs → 1.67μs)
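The caveat is worth spelling out: the two checks are not interchangeable in general, because `isinstance` accepts subclasses. A quick illustration in plain Python (not from the PR):

```python
# isinstance() accepts subclasses; a type() identity check does not.
# bool is the classic int subclass that exposes the difference:
assert isinstance(True, int)      # passes the subclass-aware check
assert type(True) is not int      # ...but fails the exact-type check

# For the values the version parser actually produces (exact int or str),
# both checks agree, which is what makes the swap safe:
for v in (99, "foo", 0, "bar"):
    assert (type(v) is int) == isinstance(v, int)
```

The optimization therefore relies on the invariant that a local segment never contains `bool` (or any other `int` subclass).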

3. Streamlined Conditional Logic

Restructured the pre/post/dev assignments:

  • Nested conditionals for _pre: the dev check is nested inside if pre is None, so post is None is no longer re-evaluated on every path
  • Inline ternary expressions for _post and _dev: more compact, with fewer bytecode jumps than the original if/else blocks

Impact: Basic tests without complex segments show strong improvements (30-62% faster), while tests with all segments present show modest gains or slight regressions (controlled by local segment processing).
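A sketch of the restructured assignments, using `float('inf')`/`float('-inf')` as stand-ins for packaging's `Infinity`/`NegativeInfinity` sentinels (a reconstruction based on the description above, not the actual diff; `normalize` is a hypothetical wrapper name):

```python
INF, NEG_INF = float("inf"), float("-inf")  # stand-ins for Infinity/NegativeInfinity

def normalize(pre, post, dev):
    if pre is None:
        # Nesting the dev check means `post is None` is only evaluated
        # when pre is absent, instead of in every call.
        _pre = NEG_INF if (post is None and dev is not None) else INF
    else:
        _pre = pre
    _post = NEG_INF if post is None else post  # inline ternaries replace
    _dev = INF if dev is None else dev         # two separate if/else blocks
    return _pre, _post, _dev
```

This preserves the original semantics: a dev-only release sorts its missing pre segment to negative infinity, any other missing pre sorts to positive infinity.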

Workload Impact Assessment

Based on function_references, _cmpkey is called from _key() with caching, meaning it's invoked once per Version object creation. The optimization benefits:

  • Version parsing in hot paths (e.g., dependency resolution, package comparisons)
  • Workloads processing many versions where the fast-path handles versions without trailing zeros (most semantic versions like "1.2.3")
  • Scenarios with local version identifiers (e.g., "1.2.3+local.build"), where the type check optimization provides 13-40% gains

The optimization is particularly effective for typical semantic versions (no trailing zeros, minimal local segments) while maintaining acceptable performance for edge cases with extensive trailing zeros or large local segments.
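To see why the ordering _cmpkey encodes matters in those hot paths, here is a hypothetical miniature key (release, post, and dev segments only) that reproduces the PEP 440 rule "dev release < final release < post release":

```python
INF, NEG_INF = float("inf"), float("-inf")

def mini_key(release, post=None, dev=None):
    # Strip trailing zeros so that releases 1.0 and 1 compare equal.
    r = list(release)
    while r and r[-1] == 0:
        r.pop()
    # A missing post segment sorts first; a missing dev segment sorts last.
    return (tuple(r),
            NEG_INF if post is None else post,
            INF if dev is None else dev)

keys = [
    mini_key((1, 0), dev=1),   # 1.0.dev1
    mini_key((1, 0)),          # 1.0
    mini_key((1, 0), post=1),  # 1.0.post1
]
assert keys == sorted(keys)            # dev < final < post
assert mini_key((1, 0)) == mini_key((1,))  # trailing zeros are insignificant
```

Sorting a list of such keys is exactly the workload (dependency resolution, version comparison) where a per-key constant-factor win like this PR's compounds.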

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 59 Passed |
| ⏪ Replay Tests | 255 Passed |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Click to see Generated Regression Tests
import pytest  # used for our unit tests
# function to test
from src.packaging._structures import Infinity, NegativeInfinity
from src.packaging.version import _cmpkey

# unit tests

# --- Basic Test Cases ---

def test_basic_no_pre_post_dev_local():
    # All segments except epoch/release are None
    codeflash_output = _cmpkey(0, (1, 2, 3), None, None, None, None); result = codeflash_output # 2.29μs -> 1.46μs (57.2% faster)

def test_basic_with_pre_post_dev_local():
    # All segments present
    codeflash_output = _cmpkey(1, (1, 2, 0), ("a", 1), ("post", 2), ("dev", 3), (42, "abc")); result = codeflash_output # 3.00μs -> 3.46μs (13.2% slower)

def test_basic_only_dev():
    # Only dev present, no pre/post
    codeflash_output = _cmpkey(0, (1, 0, 0), None, None, ("dev", 0), None); result = codeflash_output # 1.54μs -> 1.67μs (7.44% slower)

def test_basic_only_pre():
    # Only pre present
    codeflash_output = _cmpkey(0, (1, 0), ("b", 2), None, None, None); result = codeflash_output # 1.38μs -> 1.38μs (0.000% faster)

def test_basic_only_post():
    # Only post present
    codeflash_output = _cmpkey(0, (2, 0), None, ("post", 1), None, None); result = codeflash_output # 1.38μs -> 1.38μs (0.000% faster)

def test_basic_only_local():
    # Only local present
    codeflash_output = _cmpkey(0, (1, 0), None, None, None, (123, "xyz")); result = codeflash_output # 2.50μs -> 2.42μs (3.48% faster)

# --- Edge Test Cases ---

def test_edge_release_all_zeros():
    # Release tuple is all zeros
    codeflash_output = _cmpkey(0, (0, 0, 0), None, None, None, None); result = codeflash_output # 1.25μs -> 1.33μs (6.30% slower)

def test_edge_release_empty_tuple():
    # Release tuple is empty
    codeflash_output = _cmpkey(0, (), None, None, None, None); result = codeflash_output # 1.17μs -> 916ns (27.4% faster)

def test_edge_local_all_ints():
    # Local segment is all ints
    codeflash_output = _cmpkey(0, (1,), None, None, None, (1, 2, 3)); result = codeflash_output # 2.17μs -> 1.92μs (13.1% faster)

def test_edge_local_all_strs():
    # Local segment is all strings
    codeflash_output = _cmpkey(0, (1,), None, None, None, ("a", "b", "c")); result = codeflash_output # 2.00μs -> 1.62μs (23.1% faster)

def test_edge_local_mixed_types():
    # Local segment is mixed ints and strings
    codeflash_output = _cmpkey(0, (1,), None, None, None, (99, "foo", 0, "bar")); result = codeflash_output # 2.33μs -> 1.67μs (40.0% faster)

def test_edge_pre_post_dev_none_and_zero():
    # pre, post, dev are None and zero values
    codeflash_output = _cmpkey(0, (1, 0), None, None, None, None); result = codeflash_output # 1.12μs -> 1.29μs (12.9% slower)
    codeflash_output = _cmpkey(0, (1, 0), None, None, ("dev", 0), None); result2 = codeflash_output # 833ns -> 750ns (11.1% faster)

def test_edge_pre_post_dev_minimal():
    # pre, post, dev are minimal values
    codeflash_output = _cmpkey(0, (1,), ("a", 0), ("post", 0), ("dev", 0), None); result = codeflash_output # 1.08μs -> 833ns (30.1% faster)

def test_edge_local_empty_tuple():
    # Local segment is empty tuple
    codeflash_output = _cmpkey(0, (1,), None, None, None, ()); result = codeflash_output # 1.88μs -> 1.62μs (15.4% faster)

def test_edge_epoch_negative():
    # Negative epoch
    codeflash_output = _cmpkey(-1, (1,), None, None, None, None); result = codeflash_output # 1.04μs -> 750ns (38.8% faster)

def test_edge_large_release_tuple():
    # Large release tuple with trailing zeros
    release = tuple([1] + [0]*999)
    codeflash_output = _cmpkey(0, release, None, None, None, None); result = codeflash_output # 20.9μs -> 21.3μs (1.76% slower)

# --- Large Scale Test Cases ---

def test_large_scale_local_1000_elements():
    # Local segment with 1000 elements, alternating int and str
    local = tuple(i if i % 2 == 0 else f"s{i}" for i in range(1000))
    codeflash_output = _cmpkey(0, (1,), None, None, None, local); result = codeflash_output # 67.4μs -> 63.7μs (5.83% faster)

def test_large_scale_release_1000_elements():
    # Release tuple with 1000 elements, last one nonzero
    release = tuple([0]*999 + [5])
    codeflash_output = _cmpkey(0, release, None, None, None, None); result = codeflash_output # 1.50μs -> 916ns (63.8% faster)
    # Now with trailing zeros
    release2 = tuple([5] + [0]*999)
    codeflash_output = _cmpkey(0, release2, None, None, None, None); result2 = codeflash_output # 20.0μs -> 20.5μs (2.23% slower)

def test_large_scale_all_segments_present():
    # All segments present, large local
    local = tuple(i if i % 2 == 0 else f"l{i}" for i in range(1000))
    codeflash_output = _cmpkey(2, (9, 8, 7, 0, 0), ("rc", 99), ("post", 123), ("dev", 456), local); result = codeflash_output # 52.5μs -> 45.1μs (16.4% faster)

def test_large_scale_epoch_release_variants():
    # Multiple epochs and releases, check sorting keys
    for epoch in range(5):
        release = tuple([epoch] + [0]*999)
        codeflash_output = _cmpkey(epoch, release, None, None, None, None); result = codeflash_output # 98.0μs -> 98.6μs (0.590% slower)

def test_large_scale_pre_post_dev_variants():
    # Many combinations of pre/post/dev
    for i in range(10):
        pre = ("pre", i) if i % 3 == 0 else None
        post = ("post", i) if i % 3 == 1 else None
        dev = ("dev", i) if i % 3 == 2 else None
        codeflash_output = _cmpkey(0, (1,), pre, post, dev, None); result = codeflash_output # 4.84μs -> 3.79μs (27.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from typing import Tuple, Union

# imports
import pytest
# function to test
from src.packaging._structures import Infinity, NegativeInfinity
from src.packaging.version import _cmpkey

LocalType = Tuple[Union[int, str], ...]

# unit tests

# --- BASIC TEST CASES ---

def test_basic_no_pre_post_dev_local():
    # Test with all optional segments as None, simple release
    codeflash_output = _cmpkey(0, (1, 2, 3), None, None, None, None); result = codeflash_output # 1.08μs -> 833ns (30.1% faster)

def test_basic_with_pre():
    # Test with a pre-release segment
    codeflash_output = _cmpkey(0, (1, 2, 3), ("a", 1), None, None, None); result = codeflash_output # 1.08μs -> 667ns (62.4% faster)

def test_basic_with_post():
    # Test with a post-release segment
    codeflash_output = _cmpkey(0, (1, 2, 3), None, ("post", 1), None, None); result = codeflash_output # 1.08μs -> 791ns (37.0% faster)

def test_basic_with_dev():
    # Test with a dev-release segment
    codeflash_output = _cmpkey(0, (1, 2, 3), None, None, ("dev", 1), None); result = codeflash_output # 1.08μs -> 750ns (44.4% faster)

def test_basic_with_local_int():
    # Test with a local segment containing an int
    codeflash_output = _cmpkey(0, (1, 2, 3), None, None, None, (42,)); result = codeflash_output # 1.88μs -> 1.62μs (15.4% faster)

def test_basic_with_local_str():
    # Test with a local segment containing a str
    codeflash_output = _cmpkey(0, (1, 2, 3), None, None, None, ("abc",)); result = codeflash_output # 1.88μs -> 1.42μs (32.3% faster)

def test_basic_with_all_segments():
    # Test with all segments present
    codeflash_output = _cmpkey(1, (1, 0, 0), ("rc", 2), ("post", 3), ("dev", 4), ("xyz", 99)); result = codeflash_output # 2.21μs -> 2.25μs (1.87% slower)

# --- EDGE TEST CASES ---

def test_edge_release_trailing_zeros():
    # Release tuple with trailing zeros
    codeflash_output = _cmpkey(0, (1, 2, 0, 0, 0), None, None, None, None); result = codeflash_output # 1.21μs -> 1.42μs (14.7% slower)

def test_edge_release_all_zeros():
    # Release tuple with all zeros
    codeflash_output = _cmpkey(0, (0, 0, 0), None, None, None, None); result = codeflash_output # 1.08μs -> 1.21μs (10.4% slower)

def test_edge_local_multiple_segments():
    # Local segment with multiple mixed types
    codeflash_output = _cmpkey(0, (1,), None, None, None, ("abc", 42, "def", 7)); result = codeflash_output # 2.17μs -> 1.88μs (15.6% faster)

def test_edge_local_empty_tuple():
    # Local segment as empty tuple
    codeflash_output = _cmpkey(0, (1,), None, None, None, ()); result = codeflash_output # 1.79μs -> 1.50μs (19.5% faster)

def test_edge_pre_post_dev_none_and_values():
    # All combinations of None and values for pre, post, dev
    codeflash_output = _cmpkey(0, (1,), ("a", 1), ("post", 2), ("dev", 3), None); result = codeflash_output # 1.12μs -> 750ns (50.0% faster)

def test_edge_pre_none_post_value_dev_none():
    # Only post is set
    codeflash_output = _cmpkey(0, (1,), None, ("post", 2), None, None); result = codeflash_output # 1.08μs -> 750ns (44.4% faster)

def test_edge_pre_none_post_none_dev_value():
    # Only dev is set, triggers "trick" for pre
    codeflash_output = _cmpkey(0, (1,), None, None, ("dev", 3), None); result = codeflash_output # 1.12μs -> 750ns (50.0% faster)

def test_edge_local_str_and_int_order():
    # Local segment with str and int, order matters
    codeflash_output = _cmpkey(0, (1,), None, None, None, ("z", 2, "a", 1)); result = codeflash_output # 2.08μs -> 1.71μs (21.9% faster)

def test_edge_epoch_negative():
    # Negative epoch value
    codeflash_output = _cmpkey(-1, (1,), None, None, None, None); result = codeflash_output # 1.04μs -> 750ns (38.8% faster)

def test_edge_release_empty_tuple():
    # Release tuple is empty
    codeflash_output = _cmpkey(0, (), None, None, None, None); result = codeflash_output # 1.08μs -> 834ns (30.0% faster)

# --- LARGE SCALE TEST CASES ---

def test_large_release_tuple():
    # Large release tuple, trailing zeros
    release = tuple(list(range(1, 501)) + [0]*499)
    codeflash_output = _cmpkey(0, release, None, None, None, None); result = codeflash_output # 12.9μs -> 13.1μs (1.59% slower)

def test_large_local_segment():
    # Large local segment with alternating int/str
    local = tuple(i if i % 2 == 0 else f"str{i}" for i in range(1000))
    codeflash_output = _cmpkey(0, (1,), None, None, None, local); result = codeflash_output # 53.0μs -> 47.0μs (13.0% faster)

def test_large_all_segments():
    # Large release, local, and all segments set
    release = tuple(range(1, 1001))
    local = tuple(f"l{i}" for i in range(1000))
    pre = ("pre", 123)
    post = ("post", 456)
    dev = ("dev", 789)
    codeflash_output = _cmpkey(2, release, pre, post, dev, local); result = codeflash_output # 57.6μs -> 44.3μs (30.0% faster)

def test_large_release_all_zeros():
    # Large release tuple, all zeros
    release = tuple([0] * 1000)
    codeflash_output = _cmpkey(0, release, None, None, None, None); result = codeflash_output # 20.3μs -> 20.5μs (0.606% slower)

def test_large_local_all_ints():
    # Large local segment, all ints
    local = tuple(range(1000))
    codeflash_output = _cmpkey(0, (1,), None, None, None, local); result = codeflash_output # 44.7μs -> 43.5μs (2.88% faster)

def test_large_local_all_strs():
    # Large local segment, all strs
    local = tuple(f"s{i}" for i in range(1000))
    codeflash_output = _cmpkey(0, (1,), None, None, None, local); result = codeflash_output # 57.4μs -> 44.0μs (30.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from src.packaging.version import _cmpkey

def test__cmpkey():
    _cmpkey(0, (), None, None, None, (0,))  # (0,) is a one-element tuple; bare (0) is just the int 0

def test__cmpkey_2():
    _cmpkey(0, (), ('', 0), ('', 0), ('', 0), None)

def test__cmpkey_3():
    _cmpkey(0, (2,), None, None, ('', 0), ())
⏪ Click to see Replay Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| test_benchmark_py__replay_test_0.py::test_src_packaging_version__cmpkey | 170μs | 148μs | 14.7% ✅ |

🔎 Click to see Concolic Coverage Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_ui1l843q/tmp8h843e1l/test_concolic_coverage.py::test__cmpkey_2 | 1.33μs | 875ns | 52.5% ✅ |

To edit these changes, run `git checkout codeflash/optimize-_cmpkey-mjjkdxbo` and push.


@codeflash-ai codeflash-ai Bot requested a review from KRRT7 December 24, 2025 05:19
@codeflash-ai codeflash-ai Bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 24, 2025