⚡️ Speed up function `_has_sorted_sa_indices` by 8% #127
Open
codeflash-ai[bot] wants to merge 1 commit into `main` from `codeflash/optimize-_has_sorted_sa_indices-mkpgddfz`
The optimized code achieves an ~8% speedup through two changes that reduce per-call and per-iteration overhead in Numba's nopython mode:
**1. Early Exit for Trivial Cases**
```python
L = len(s_indices)
if L <= 1:  # empty or single-element arrays are trivially sorted
    return True
```
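This skips loop setup and iteration for empty or single-element arrays, which are trivially sorted. It also guards the cached loop, which reads `s_indices[0]` and `a_indices[0]` before iterating and would otherwise index past the end of an empty array. Test results show this helps edge cases like `test_edge_empty_arrays` (8.86% faster) and `test_edge_single_element` (5-12% faster).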
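A small plain-Python illustration of why the guard is harmless for the original loop but needed for the cached one. It assumes the original iterates with `for i in range(L - 1)`; `pairwise_steps` and `cached_first_read` are hypothetical helpers written only for this demonstration. In plain Python the out-of-range read raises `IndexError`; under Numba's default settings bounds checking is disabled, so the same read would not be caught.

```python
def pairwise_steps(L):
    """Loop indices visited by an original-style `for i in range(L - 1)` loop."""
    return list(range(L - 1))


def cached_first_read(s_indices):
    """Hypothetical helper: the read the cached loop performs before iterating."""
    return s_indices[0]


# For L <= 1 the pairwise loop body never runs, so it already returns True.
print(pairwise_steps(0), pairwise_steps(1), pairwise_steps(3))  # [] [] [0, 1]

# The cached loop reads element 0 up front, which is out of range when L == 0.
try:
    cached_first_read([])
except IndexError:
    print("empty input: s_indices[0] is out of range")
```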
**2. Reduced Array Indexing Through Caching**
The original code reads `s_indices[i]`, `s_indices[i+1]`, `a_indices[i]`, and `a_indices[i+1]` on each iteration, i.e. up to 4 array reads per comparison. The optimized version caches the previous pair in local variables:
```python
prev_s = s_indices[0]
prev_a = a_indices[0]
for i in range(1, L):
    s = s_indices[i]
    a = a_indices[i]
    # Compare against the cached previous pair (no re-read of index i - 1)
    if s < prev_s or (s == prev_s and a <= prev_a):
        return False
    prev_s = s
    prev_a = a
return True
```
This reduces array reads from ~4 per iteration to ~2. In Numba's nopython mode each read still costs an address computation and a memory load (bounds checking itself is off by default unless `boundscheck=True` is requested), so fewer reads means less work per iteration. The test results show gains across the board, with larger gains in tests with more iterations:
- Large-scale tests show 13-21% speedups (`test_large_scale_sorted_ascending`: 13.7%, `test_large_scale_many_same_states`: 21.4%)
- Small arrays still benefit (4-12% gains) from reduced indexing
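To sanity-check that caching the previous pair preserves the result, and to see how the gap behaves as the loop gets longer, here is a self-contained sketch. Assumptions: `has_sorted_pairwise` is our transcription of the original loop (lexicographic order, with actions strictly increasing within a state), `has_sorted_cached` is our reconstruction of the cached variant, both names are hypothetical, and Numba is used only if it is installed. Timings from this script are illustrative and will not reproduce the PR's numbers.

```python
import time

import numpy as np

try:  # Use Numba if installed; otherwise run the same code as plain Python.
    from numba import njit
except ImportError:
    def njit(func):
        return func


@njit
def has_sorted_pairwise(s_indices, a_indices):
    # Assumed transcription of the original: compare element i with i + 1.
    L = len(s_indices)
    for i in range(L - 1):
        if s_indices[i] > s_indices[i + 1]:
            return False
        if s_indices[i] == s_indices[i + 1]:
            if a_indices[i] >= a_indices[i + 1]:
                return False
    return True


@njit
def has_sorted_cached(s_indices, a_indices):
    # Sketch of the cached variant: the previous pair lives in locals.
    L = len(s_indices)
    if L <= 1:
        return True
    prev_s = s_indices[0]
    prev_a = a_indices[0]
    for i in range(1, L):
        s = s_indices[i]
        a = a_indices[i]
        if s < prev_s or (s == prev_s and a <= prev_a):
            return False
        prev_s = s
        prev_a = a
    return True


def make_cases(rng):
    # Edge cases, sorted grids (expected True), and random inputs (mostly False).
    cases = [(np.empty(0, np.int64), np.empty(0, np.int64)),
             (np.array([3], np.int64), np.array([1], np.int64))]
    for n_states, n_actions in [(1, 5), (10, 3), (100, 10)]:
        s = np.repeat(np.arange(n_states, dtype=np.int64), n_actions)
        a = np.tile(np.arange(n_actions, dtype=np.int64), n_states)
        cases.append((s, a))
    for n in (2, 5, 50):
        cases.append((rng.integers(0, 4, n), rng.integers(0, 4, n)))
    return cases


rng = np.random.default_rng(0)
for s, a in make_cases(rng):
    assert has_sorted_pairwise(s, a) == has_sorted_cached(s, a)

# Rough timing on already-sorted input, where every iteration runs.
reps = 20
for n_states in (10, 100, 1000):
    s = np.repeat(np.arange(n_states, dtype=np.int64), 4)
    a = np.tile(np.arange(4, dtype=np.int64), n_states)
    for name, f in (("pairwise", has_sorted_pairwise), ("cached", has_sorted_cached)):
        f(s, a)  # warm-up call (triggers JIT compilation under Numba)
        t0 = time.perf_counter()
        for _ in range(reps):
            f(s, a)
        elapsed = time.perf_counter() - t0
        print(f"{name:9s} len={len(s):5d}: {elapsed / reps * 1e6:8.2f} us/call")
```

Running the comparison on sorted input is deliberate: it is the path where the loop runs to completion, so it is where per-iteration savings show up most clearly.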
**Impact on Production Usage**
Per `function_references`, this function is called in `DiscreteDP.__init__()` to check whether the state-action indices are already sorted. If they are not, the constructor falls back to expensive sorting and data reorganization. The optimization:
- **Speeds up the sorted path** (common case when data is already organized), making initialization faster
- **Speeds up early violation detection** in the unsorted path, allowing the expensive sorting fallback to trigger sooner
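The check-then-fallback pattern can be sketched with NumPy as below. This is a hypothetical helper, not quantecon's `DiscreteDP` code: `is_lex_sorted` swaps in a vectorized NumPy check in place of the Numba loop, and `ensure_sorted_sa` is an illustrative name. It assumes the expected order is by state, then strictly increasing action, and that any aligned data (for example rewards) is reordered along its first axis.

```python
import numpy as np


def is_lex_sorted(s_indices, a_indices):
    """Vectorized check that (s, a) pairs are strictly increasing lexicographically.

    Swapped-in technique for illustration; not the Numba loop from the PR.
    """
    ds = np.diff(s_indices)
    da = np.diff(a_indices)
    return bool(np.all((ds > 0) | ((ds == 0) & (da > 0))))


def ensure_sorted_sa(s_indices, a_indices, *values):
    """Hypothetical helper: reorder (s, a) pairs and aligned arrays if unsorted."""
    if is_lex_sorted(s_indices, a_indices):
        return (s_indices, a_indices) + values  # fast path: nothing to reorder
    # np.lexsort treats the LAST key as primary: sort by s, then by a.
    order = np.lexsort((a_indices, s_indices))
    return (s_indices[order], a_indices[order]) + tuple(v[order] for v in values)


s = np.array([1, 0, 1, 0])
a = np.array([0, 1, 1, 0])
R = np.array([10.0, 20.0, 30.0, 40.0])
s2, a2, R2 = ensure_sorted_sa(s, a, R)
print(s2, a2, R2)  # [0 0 1 1] [0 1 0 1] [40. 20. 10. 30.]
```

The fast path returns the inputs untouched, which is why a cheaper sortedness check directly shortens construction when the data is already organized.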
Since `DiscreteDP` is likely instantiated in performance-critical economic simulations (per the quantecon library context), even an 8% speedup in this validation check can add up when constructing many DDPs or inside hot initialization loops.
The optimization is most effective for moderate-to-large arrays (100-1000+ elements) where iteration dominates, but provides consistent gains across all array sizes due to reduced indexing overhead.
📄 8% (0.08x) speedup for `_has_sorted_sa_indices` in `quantecon/markov/utilities.py`

⏱️ Runtime: 146 microseconds → 135 microseconds (best of 250 runs)
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-_has_sorted_sa_indices-mkpgddfz` and push.