⚡️ Speed up function _has_sorted_sa_indices by 8%#127

Open
codeflash-ai[bot] wants to merge 1 commit into main from
codeflash/optimize-_has_sorted_sa_indices-mkpgddfz
@codeflash-ai codeflash-ai Bot commented Jan 22, 2026

📄 8% (0.08x) speedup for _has_sorted_sa_indices in quantecon/markov/utilities.py

⏱️ Runtime: 146 microseconds → 135 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves an ~8% speedup through two key changes that reduce array access overhead in Numba's nopython mode:

1. Early Exit for Trivial Cases

if L <= 1:
    return True

This avoids unnecessary loop setup and iteration for empty or single-element arrays, which are trivially sorted. Test results show this helps edge cases like test_edge_empty_arrays (8.86% faster) and test_edge_single_element (5-12% faster).

2. Reduced Array Indexing Through Caching
The original code accesses s_indices[i], s_indices[i+1], a_indices[i], and a_indices[i+1] on each iteration—up to 4 array reads per comparison. The optimized version caches previous values:

prev_s = s_indices[0]
prev_a = a_indices[0]
for i in range(1, L):
    s = s_indices[i]
    # Compare prev_s with s (no array access)
    # Update prev_s = s, prev_a = a_indices[i]

This reduces array accesses from ~4 per iteration to ~2, which can matter in Numba's nopython mode, where each index operation still has a cost even though bounds checking is disabled by default. The test results are consistent with this, with the largest gains in tests that run more iterations:

  • Several large-scale tests show 13-21% speedups (test_large_scale_sorted_ascending: 13.7%, test_large_scale_many_same_states: 21.4%), while other large-scale tests gain 4-9%
  • Small arrays still benefit (roughly 3-13% gains) from reduced indexing
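The two changes above can be sketched side by side. This is a plain-Python reconstruction from the description, not the code in the PR: both function names are hypothetical, the baseline is inferred from the "four reads per comparison" wording, and the real function is compiled with Numba rather than run as interpreted Python.

```python
def has_sorted_sa_indices_baseline(s_indices, a_indices):
    """Adjacent-pair check: reads index i and i + 1 of both arrays each step."""
    L = len(s_indices)
    for i in range(L - 1):
        if s_indices[i] > s_indices[i + 1]:
            return False
        if s_indices[i] == s_indices[i + 1]:
            if a_indices[i] >= a_indices[i + 1]:
                return False
    return True


def has_sorted_sa_indices_cached(s_indices, a_indices):
    """Same check, carrying the previous pair in local variables."""
    L = len(s_indices)
    if L <= 1:
        # Empty and single-element inputs are trivially sorted.
        return True
    prev_s = s_indices[0]
    prev_a = a_indices[0]
    for i in range(1, L):
        s = s_indices[i]  # one read per array per iteration
        a = a_indices[i]
        if prev_s > s:
            return False
        if prev_s == s and prev_a >= a:
            return False
        prev_s = s
        prev_a = a
    return True
```

Both variants return the same result for any input; only the number of element reads per iteration differs. In interpreted Python the timing difference will not match the Numba figures quoted above.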

Impact on Production Usage
Looking at function_references, this function is called during DiscreteDP.__init__() to check whether the state-action indices are pre-sorted. If they are not, expensive sorting and data reorganization occur. The optimization:

  • Speeds up the sorted path (common case when data is already organized), making initialization faster
  • Speeds up early violation detection in the unsorted path, allowing the expensive sorting fallback to trigger sooner
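The check-then-sort pattern described here can be illustrated with a small sketch. It is an assumption-laden illustration only: `ensure_sorted_sa` is a hypothetical helper, it works on plain Python lists, and it does not reproduce what `DiscreteDP.__init__()` actually does when the check fails.

```python
def has_sorted_sa_indices(s_indices, a_indices):
    """Return True if consecutive (s, a) pairs are strictly increasing."""
    for i in range(1, len(s_indices)):
        prev = (s_indices[i - 1], a_indices[i - 1])
        cur = (s_indices[i], a_indices[i])
        if prev >= cur:  # tuple comparison is lexicographic
            return False
    return True


def ensure_sorted_sa(s_indices, a_indices):
    """Hypothetical helper: return sorted copies and the permutation used."""
    n = len(s_indices)
    if has_sorted_sa_indices(s_indices, a_indices):
        # Cheap path: one linear scan, no reordering needed.
        return list(s_indices), list(a_indices), list(range(n))
    # Expensive path: sort positions by (state, action) and reorder.
    order = sorted(range(n), key=lambda i: (s_indices[i], a_indices[i]))
    sorted_s = [s_indices[i] for i in order]
    sorted_a = [a_indices[i] for i in order]
    return sorted_s, sorted_a, order
```

The linear scan is the only cost on the already-sorted path, which is why its speed matters most there. Note that duplicate (state, action) pairs would still fail the strict check after sorting, so a caller would need to handle them separately.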

Since DiscreteDP is likely instantiated in performance-critical economic simulations (per the quantecon library context), even an 8% speedup in this validation check can compound when constructing multiple DDPs or in hot initialization loops.

The optimization is most effective for moderate-to-large arrays (100-1000+ elements) where iteration dominates, but provides consistent gains across all array sizes due to reduced indexing overhead.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 47 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import numba  # required by the original function decorator
import numpy as np  # used to create typed numeric arrays for JIT compatibility
# imports
import pytest  # used for our unit tests
from quantecon.markov.utilities import _has_sorted_sa_indices

def test_basic_sorted_sequence():
    # Basic: simple lexicographically sorted example.
    # s increases when appropriate and a strictly increases when s ties.
    s = np.array([0, 0, 1, 1, 2], dtype=np.int64)  # state indices
    a = np.array([0, 1, 0, 1, 0], dtype=np.int64)  # action indices
    # The sequence is lexicographically sorted, expect True.
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 2.96μs -> 2.78μs (6.73% faster)

def test_basic_unsorted_s_indices():
    # Basic: s_indices decreases at index 1 -> should be unsorted
    s = np.array([0, 2, 1], dtype=np.int64)  # 2 > 1 at consecutive positions
    a = np.array([0, 0, 0], dtype=np.int64)  # a irrelevant here
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 2.96μs -> 2.75μs (7.75% faster)

def test_basic_unsorted_a_with_equal_s():
    # Basic: s indices equal but a indices non-increasing (1 >= 0) -> unsorted
    s = np.array([5, 5], dtype=np.int64)
    a = np.array([1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 2.91μs -> 2.69μs (8.23% faster)

def test_edge_empty_arrays():
    # Edge: empty arrays should be considered sorted (no violations)
    s = np.array([], dtype=np.int64)
    a = np.array([], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 2.87μs -> 2.64μs (8.86% faster)

def test_edge_single_element():
    # Edge: single-element arrays are trivially sorted
    s = np.array([7], dtype=np.int64)
    a = np.array([3], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 2.94μs -> 2.79μs (5.30% faster)

def test_edge_equal_a_values_with_equal_s():
    # Edge: equal s with equal a values -> should be considered unsorted because
    # the implementation rejects a[i] >= a[i+1] when s[i] == s[i+1]
    s = np.array([1, 1, 1], dtype=np.int64)
    a = np.array([0, 0, 1], dtype=np.int64)  # first pair 0 >= 0 triggers violation
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 2.92μs -> 2.79μs (4.88% faster)

    # And when a strictly increases for equal s, result should be True
    a_ok = np.array([0, 1, 2], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s, a_ok); res_ok = codeflash_output # 633ns -> 627ns (0.957% faster)

def test_dtype_variations_int32():
    # Edge/compatibility: ensure different integer dtypes (int32) work correctly
    s = np.array([0, 0, 1], dtype=np.int32)
    a = np.array([0, 1, 0], dtype=np.int32)
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 3.06μs -> 2.75μs (11.3% faster)

def test_non_contiguous_views_sorted():
    # Edge: non-contiguous numpy views should still be handled
    # Build contiguous arrays then take every second element to create a view.
    full_s = np.repeat(np.arange(4, dtype=np.int64), 3)  # [0,0,0,1,1,1,2,2,2,3,3,3]
    full_a = np.tile(np.arange(3, dtype=np.int64), 4)    # [0,1,2,0,1,2,...]
    # Take every second element to create a non-contiguous view; lengths remain equal.
    s_view = full_s[::2]
    a_view = full_a[::2]
    # The tiling/repeating structure preserves lexicographic ordering on these slices.
    codeflash_output = _has_sorted_sa_indices(s_view, a_view); res = codeflash_output # 2.88μs -> 2.77μs (3.86% faster)

def test_large_scale_sorted():
    # Large scale: create a moderately large (but <= 1000 elements) test
    # Use blocks where s is constant and a strictly increases within each block.
    block_size = 4  # small block size
    n_blocks = 125   # total elements = 4 * 125 = 500 (well under 1000)
    s = np.repeat(np.arange(n_blocks, dtype=np.int64), block_size)
    # For each block, a should run 0..(block_size-1) so within-block a is strictly increasing.
    a = np.tile(np.arange(block_size, dtype=np.int64), n_blocks)
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 3.39μs -> 3.11μs (8.80% faster)

def test_large_scale_violation_inside_block():
    # Large scale negative: start with a valid large structure then introduce a single violation
    block_size = 5
    n_blocks = 100  # total 500 elements
    s = np.repeat(np.arange(n_blocks, dtype=np.int64), block_size)
    a = np.tile(np.arange(block_size, dtype=np.int64), n_blocks)
    # Introduce a violation: pick a position where two consecutive s are equal and make a non-increasing
    # Find index i that is not the last element of its block (so s[i] == s[i+1])
    # Example: choose block 10, position 2 within block gives global index = 10*block_size + 2
    idx = 10 * block_size + 2
    # Make a[idx] >= a[idx+1] by setting a[idx] to a value >= a[idx+1]
    a_mod = a.copy()
    a_mod[idx] = a_mod[idx + 1]  # equality triggers the >= check and should make function return False
    codeflash_output = _has_sorted_sa_indices(s, a_mod); res = codeflash_output # 2.89μs -> 2.73μs (5.67% faster)

def test_small_decrease_near_end():
    # Ensure detection of a late decrease in s_indices
    s = np.array([0, 0, 1, 2, 2, 1], dtype=np.int64)  # the last 2 -> 1 is a decrease
    a = np.array([0, 1, 0, 0, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s, a); res = codeflash_output # 3.01μs -> 2.83μs (6.32% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numba
import numpy as np
import pytest
from quantecon.markov.utilities import _has_sorted_sa_indices

def test_basic_sorted_ascending_state_indices():
    """
    Test that arrays with strictly ascending state indices return True.
    State indices: [0, 1, 2], Action indices: [0, 0, 0]
    """
    s_indices = np.array([0, 1, 2], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.22μs -> 3.02μs (6.59% faster)

def test_basic_sorted_with_action_indices():
    """
    Test that arrays sorted lexicographically (state first, then action) return True.
    State indices: [0, 0, 1], Action indices: [0, 1, 0]
    """
    s_indices = np.array([0, 0, 1], dtype=np.int64)
    a_indices = np.array([0, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.10μs -> 2.94μs (5.33% faster)

def test_basic_sorted_complex_sequence():
    """
    Test a more complex sorted sequence with repeated states.
    State indices: [0, 0, 0, 1, 1, 2], Action indices: [0, 1, 2, 0, 1, 0]
    """
    s_indices = np.array([0, 0, 0, 1, 1, 2], dtype=np.int64)
    a_indices = np.array([0, 1, 2, 0, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.15μs -> 2.81μs (12.2% faster)

def test_unsorted_descending_state_indices():
    """
    Test that arrays with descending state indices return False.
    State indices: [2, 1, 0], Action indices: [0, 0, 0]
    """
    s_indices = np.array([2, 1, 0], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.08μs -> 2.78μs (10.6% faster)

def test_unsorted_action_indices_same_state():
    """
    Test that same state but unsorted action indices return False.
    State indices: [0, 0, 0], Action indices: [2, 1, 0]
    """
    s_indices = np.array([0, 0, 0], dtype=np.int64)
    a_indices = np.array([2, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.04μs -> 2.89μs (5.40% faster)

def test_unsorted_duplicate_action_indices():
    """
    Test that duplicate action indices for same state return False.
    State indices: [0, 0], Action indices: [1, 1]
    """
    s_indices = np.array([0, 0], dtype=np.int64)
    a_indices = np.array([1, 1], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.06μs -> 2.85μs (7.27% faster)

def test_mixed_unsorted():
    """
    Test an unsorted sequence where the state index decreases (1 -> 0) while all actions are equal.
    State indices: [0, 1, 0], Action indices: [0, 0, 0]
    """
    s_indices = np.array([0, 1, 0], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.07μs -> 2.79μs (10.2% faster)

def test_edge_single_element():
    """
    Test with single element arrays (always sorted).
    State indices: [5], Action indices: [10]
    """
    s_indices = np.array([5], dtype=np.int64)
    a_indices = np.array([10], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.99μs -> 2.67μs (12.3% faster)

def test_edge_two_elements_sorted():
    """
    Test with two elements in sorted order.
    State indices: [0, 1], Action indices: [0, 0]
    """
    s_indices = np.array([0, 1], dtype=np.int64)
    a_indices = np.array([0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.05μs -> 2.71μs (12.5% faster)

def test_edge_two_elements_unsorted_state():
    """
    Test with two elements where states are not sorted.
    State indices: [1, 0], Action indices: [0, 0]
    """
    s_indices = np.array([1, 0], dtype=np.int64)
    a_indices = np.array([0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.96μs -> 2.81μs (5.37% faster)

def test_edge_two_elements_same_state_unsorted_action():
    """
    Test with two elements same state but unsorted actions.
    State indices: [0, 0], Action indices: [1, 0]
    """
    s_indices = np.array([0, 0], dtype=np.int64)
    a_indices = np.array([1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.95μs -> 2.75μs (7.42% faster)

def test_edge_two_elements_same_state_same_action():
    """
    Test with two elements with identical state and action (boundary condition).
    State indices: [0, 0], Action indices: [0, 0]
    """
    s_indices = np.array([0, 0], dtype=np.int64)
    a_indices = np.array([0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.01μs -> 2.82μs (6.74% faster)

def test_edge_large_state_indices():
    """
    Test with very large state index values.
    State indices: [1000000, 1000001], Action indices: [0, 0]
    """
    s_indices = np.array([1000000, 1000001], dtype=np.int64)
    a_indices = np.array([0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.07μs -> 2.83μs (8.78% faster)

def test_edge_large_action_indices():
    """
    Test with very large action index values.
    State indices: [0, 0], Action indices: [1000000, 1000001]
    """
    s_indices = np.array([0, 0], dtype=np.int64)
    a_indices = np.array([1000000, 1000001], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.02μs -> 2.78μs (8.53% faster)

def test_edge_zero_indices():
    """
    Test with all zero indices (boundary value).
    State indices: [0, 0, 0], Action indices: [0, 0, 0]
    """
    s_indices = np.array([0, 0, 0], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.92μs -> 2.80μs (4.47% faster)

def test_edge_alternating_states():
    """
    Test with alternating state values (pattern that might cause issues).
    State indices: [0, 1, 0, 1], Action indices: [0, 0, 0, 0]
    """
    s_indices = np.array([0, 1, 0, 1], dtype=np.int64)
    a_indices = np.array([0, 0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.02μs -> 2.79μs (8.29% faster)

def test_edge_state_increases_action_decreases():
    """
    Test where states strictly increase but actions decrease (still sorted: actions are only compared when consecutive states are equal).
    State indices: [0, 1, 2], Action indices: [2, 1, 0]
    """
    s_indices = np.array([0, 1, 2], dtype=np.int64)
    a_indices = np.array([2, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.90μs -> 2.81μs (3.24% faster)

def test_edge_negative_indices():
    """
    Test with negative index values (boundary value).
    State indices: [-2, -1, 0], Action indices: [0, 0, 0]
    """
    s_indices = np.array([-2, -1, 0], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.00μs -> 2.81μs (6.69% faster)

def test_edge_negative_unsorted():
    """
    Test with negative indices that are unsorted.
    State indices: [0, -1, -2], Action indices: [0, 0, 0]
    """
    s_indices = np.array([0, -1, -2], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.04μs -> 2.82μs (7.88% faster)

def test_edge_action_equal_at_boundary():
    """
    Test that equal action indices when state is same fails at comparison.
    This tests the >= operator in the condition.
    State indices: [0, 0, 1], Action indices: [1, 1, 0]
    """
    s_indices = np.array([0, 0, 1], dtype=np.int64)
    a_indices = np.array([1, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.98μs -> 2.85μs (4.38% faster)

def test_edge_violation_at_end():
    """
    Test where the violation occurs at the very end of the array.
    State indices: [0, 1, 2, 2], Action indices: [0, 0, 1, 0]
    """
    s_indices = np.array([0, 1, 2, 2], dtype=np.int64)
    a_indices = np.array([0, 0, 1, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.96μs -> 2.84μs (4.44% faster)

def test_edge_violation_at_start():
    """
    Test where violation occurs at the start.
    State indices: [1, 0, 2], Action indices: [0, 0, 0]
    """
    s_indices = np.array([1, 0, 2], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.94μs -> 2.76μs (6.48% faster)

def test_edge_all_same_state_ascending_action():
    """
    Test all same state with ascending action indices (should be True).
    State indices: [5, 5, 5, 5], Action indices: [0, 1, 2, 3]
    """
    s_indices = np.array([5, 5, 5, 5], dtype=np.int64)
    a_indices = np.array([0, 1, 2, 3], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.07μs -> 2.72μs (12.7% faster)

def test_large_scale_sorted_ascending():
    """
    Test with a large sorted array (1000 elements) with ascending state indices.
    All elements have strictly increasing states with action index 0.
    """
    s_indices = np.arange(1000, dtype=np.int64)
    a_indices = np.zeros(1000, dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.86μs -> 3.40μs (13.7% faster)

def test_large_scale_sorted_repeated_states():
    """
    Test large array with repeated states and ascending actions within each state.
    Pattern: 500 pairs of (state, state) with ascending action within each pair.
    """
    s_indices = np.repeat(np.arange(500, dtype=np.int64), 2)
    a_indices = np.tile(np.array([0, 1], dtype=np.int64), 500)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 4.00μs -> 3.84μs (4.30% faster)

def test_large_scale_unsorted_at_end():
    """
    Test large sorted array with unsorted violation at the very end.
    999 correct elements followed by one violation.
    """
    s_indices = np.arange(1000, dtype=np.int64)
    a_indices = np.zeros(1000, dtype=np.int64)
    # Create violation at the end
    s_indices[999] = 998  # Repeats state 998 with the same action, duplicating the pair (998, 0)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.85μs -> 3.33μs (15.5% faster)

def test_large_scale_unsorted_at_middle():
    """
    Test large sorted array with violation in the middle.
    """
    s_indices = np.arange(1000, dtype=np.int64)
    a_indices = np.zeros(1000, dtype=np.int64)
    # Create violation in the middle
    s_indices[500] = 499  # Repeats state 499 with the same action, duplicating the pair (499, 0)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.41μs -> 3.13μs (9.04% faster)

def test_large_scale_many_same_states():
    """
    Test large array where many elements share the same state.
    Tests lexicographic sorting heavily on action indices.
    """
    # Create 100 states, each repeated 10 times with ascending actions
    s_indices = np.repeat(np.arange(100, dtype=np.int64), 10)
    a_indices = np.tile(np.arange(10, dtype=np.int64), 100)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.98μs -> 3.28μs (21.4% faster)

def test_large_scale_many_same_states_unsorted_action():
    """
    Test large array with same states but with action violation.
    100 elements with state=0 and descending action indices.
    """
    s_indices = np.zeros(100, dtype=np.int64)
    a_indices = np.arange(99, -1, -1, dtype=np.int64)  # Descending: 99, 98, ..., 0
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.03μs -> 2.77μs (9.50% faster)

def test_large_scale_large_index_values():
    """
    Test with large index values (2**50 and above) to check for overflow issues.
    """
    max_val = 2**50  # Large, but far below the int64 maximum of 2**63 - 1
    s_indices = np.array([max_val, max_val + 1, max_val + 2], dtype=np.int64)
    a_indices = np.array([0, 0, 0], dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.98μs -> 2.64μs (12.8% faster)

def test_large_scale_complex_pattern():
    """
    Test with complex pattern: multiple state groups with ascending actions within each.
    10 state groups, each with 100 elements and ascending action indices.
    """
    s_indices = np.repeat(np.arange(10, dtype=np.int64), 100)
    a_indices = np.tile(np.arange(100, dtype=np.int64), 10)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 4.08μs -> 3.47μs (17.6% faster)

def test_large_scale_mostly_sorted_one_violation():
    """
    Test a mostly sorted array of 500 (state, state) pairs with one violation in the action index.
    The violation sits in the middle of the sequence, at the pair for state 250.
    """
    s_indices = np.repeat(np.arange(500, dtype=np.int64), 2)
    a_indices = np.tile(np.array([0, 1], dtype=np.int64), 500)
    # Introduce the violation at positions 500 and 501, which both belong to state 250
    a_indices[500] = 0  # Already 0 from the tiling; kept for clarity
    a_indices[501] = 0  # Was 1; the actions are now (0, 0), and equality triggers the >= check
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.44μs -> 3.25μs (5.87% faster)

def test_large_scale_ascending_both_indices():
    """
    Test with large array where both state and action indices are ascending.
    Creates a grid-like pattern.
    """
    # Create state pairs: [0,0,1,1,2,2,...,499,499]
    s_indices = np.repeat(np.arange(500, dtype=np.int64), 2)
    # Create action pairs: [0,1,0,1,0,1,...,0,1]
    a_indices = np.tile(np.array([0, 1], dtype=np.int64), 500)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 4.03μs -> 3.80μs (6.11% faster)

def test_large_scale_boundary_consecutive_states():
    """
    Test performance with many consecutive state transitions.
    Each state appears exactly once: [0, 1, 2, ..., 999]
    """
    s_indices = np.arange(1000, dtype=np.int64)
    a_indices = np.arange(1000, dtype=np.int64)
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 3.89μs -> 3.43μs (13.6% faster)

def test_large_scale_scattered_violations():
    """
    Test that first violation is detected immediately (early exit).
    Violation at position 10 out of 1000.
    """
    s_indices = np.arange(1000, dtype=np.int64)
    a_indices = np.zeros(1000, dtype=np.int64)
    # Create violation at position 10
    s_indices[10] = s_indices[9] - 1  # Break sorting early
    codeflash_output = _has_sorted_sa_indices(s_indices, a_indices); result = codeflash_output # 2.88μs -> 2.79μs (3.05% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
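The listings above record timings but no expected values. As a minimal sketch, assuming the contract is "consecutive (state, action) pairs are strictly increasing in lexicographic order", an independent oracle can supply the expected result for each input. The names `is_strictly_lex_sorted` and `CASES` are hypothetical, and the inputs are copied from the tests above.

```python
def is_strictly_lex_sorted(s_indices, a_indices):
    """Oracle: every (s, a) pair is strictly less than the next one."""
    pairs = list(zip(s_indices, a_indices))
    return all(p < q for p, q in zip(pairs, pairs[1:]))


# (state indices, action indices, expected result)
CASES = [
    ([0, 0, 1, 1, 2], [0, 1, 0, 1, 0], True),   # test_basic_sorted_sequence
    ([0, 2, 1], [0, 0, 0], False),              # test_basic_unsorted_s_indices
    ([5, 5], [1, 0], False),                    # test_basic_unsorted_a_with_equal_s
    ([], [], True),                             # test_edge_empty_arrays
    ([7], [3], True),                           # test_edge_single_element
    ([1, 1, 1], [0, 0, 1], False),              # equal actions within one state
    ([0, 1, 2], [2, 1, 0], True),               # actions ignored when states increase
    ([0, 0, 1], [1, 1, 0], False),              # test_edge_action_equal_at_boundary
    ([0, 1, 2, 2], [0, 0, 1, 0], False),        # test_edge_violation_at_end
]

for s, a, expected in CASES:
    assert is_strictly_lex_sorted(s, a) == expected
```

With quantecon installed, the same table could drive assertions against the real function by converting each list with `np.array(..., dtype=np.int64)` and comparing the return value of `_has_sorted_sa_indices` to `expected`.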

To edit these changes git checkout codeflash/optimize-_has_sorted_sa_indices-mkpgddfz and push.

@codeflash-ai codeflash-ai Bot requested a review from aseembits93 January 22, 2026 12:53
@codeflash-ai codeflash-ai Bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Jan 22, 2026