⚡️ Speed up function solve_discrete_riccati_system by 20%#124
Open
codeflash-ai[bot] wants to merge 1 commit intomainfrom
Open
⚡️ Speed up function solve_discrete_riccati_system by 20%#124codeflash-ai[bot] wants to merge 1 commit intomainfrom
solve_discrete_riccati_system by 20%#124codeflash-ai[bot] wants to merge 1 commit intomainfrom
Conversation
The optimized code achieves a **19% speedup** by reducing redundant matrix operations in the computationally intensive inner loop of the Riccati equation solver. ## Key Optimizations **1. Precomputed Transposes (Lines 70-73)** The original code computed `As[i].T`, `Bs[i].T`, and `Ns[i].T` repeatedly inside nested loops. The optimized version precomputes these once before the main iteration loop, eliminating ~13,000+ redundant transpose operations across all iterations. **2. Reuse of Matrix Products (Lines 103-104)** The critical optimization is computing `Ps[j] @ As[i]` and `Ps[j] @ Bs[i]` **once per j** and reusing them: - `PtA = Ps[j] @ As[i]` is used in both sum1 (line 107) and RHS (line 113) - `PtB = Ps[j] @ Bs[i]` is used in K (line 111) and left (line 118) This reduces the matrix multiplications from ~27,000 to ~13,000 operations, cutting the compute-heavy portion nearly in half. The line profiler shows the solve() call dropping from 52.9% to 47.7% of runtime as a percentage because surrounding operations became faster. **3. Loop-Invariant Hoisting (Lines 92-101)** By extracting `As[i]`, `Bs[i]`, `Qs[i]`, `Rs[i]`, `Ns[i]`, and their transposes into local variables before the inner j-loop, we avoid ~6,700 array indexing operations per outer loop iteration. **4. Buffer Swapping (Line 123)** Instead of copying the entire Ps array with `Ps[:, :, :] = Ps1[:, :, :]`, the optimized code swaps references (`Ps, Ps1 = Ps1, Ps`), eliminating ~3,200 full array copies. **5. Efficient Zero-Filling (Lines 89-90)** Using `sum1.fill(0.)` instead of `sum1[:, :] = 0.` is slightly more efficient as it's a direct C-level operation rather than a slice assignment. ## Test Case Performance The optimizations are particularly effective for: - **Multi-state systems** (20-23% faster): test_large_scale_many_states, test_basic_two_state_markov - **Larger matrices** (20-21% faster): test_large_scale_dimension_10, test_large_scale_five_states - **Many iterations** (14-18% faster): test_basic_single_state_identity_matrices, test_edge_very_tight_tolerance The gains are consistent across all test cases (10-23% improvement), with larger systems and more iterations showing greater benefits due to the compounding effect of reduced matrix operations. ## Impact on Workloads Based on `function_references`, this function is called from `stationary_values()` in the Markov Jump Linear Quadratic control solver. Since that method involves solving the Riccati system once per invocation and then performing additional computations with the result, the 19% improvement in `solve_discrete_riccati_system` translates to meaningful wall-clock savings in any application repeatedly solving these control problems (e.g., economic models, optimal control simulations).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📄 20% (0.20x) speedup for
solve_discrete_riccati_systeminquantecon/_matrix_eqn.py⏱️ Runtime :
195 milliseconds→163 milliseconds(best of13runs)📝 Explanation and details
The optimized code achieves a 19% speedup by reducing redundant matrix operations in the computationally intensive inner loop of the Riccati equation solver.
Key Optimizations
1. Precomputed Transposes (Lines 70-73)
The original code computed
As[i].T,Bs[i].T, andNs[i].Trepeatedly inside nested loops. The optimized version precomputes these once before the main iteration loop, eliminating ~13,000+ redundant transpose operations across all iterations.2. Reuse of Matrix Products (Lines 103-104)
The critical optimization is computing
Ps[j] @ As[i]andPs[j] @ Bs[i]once per j and reusing them:PtA = Ps[j] @ As[i]is used in both sum1 (line 107) and RHS (line 113)PtB = Ps[j] @ Bs[i]is used in K (line 111) and left (line 118)This reduces the matrix multiplications from ~27,000 to ~13,000 operations, cutting the compute-heavy portion nearly in half. The line profiler shows the solve() call dropping from 52.9% to 47.7% of runtime as a percentage because surrounding operations became faster.
3. Loop-Invariant Hoisting (Lines 92-101)
By extracting
As[i],Bs[i],Qs[i],Rs[i],Ns[i], and their transposes into local variables before the inner j-loop, we avoid ~6,700 array indexing operations per outer loop iteration.4. Buffer Swapping (Line 123)
Instead of copying the entire Ps array with
Ps[:, :, :] = Ps1[:, :, :], the optimized code swaps references (Ps, Ps1 = Ps1, Ps), eliminating ~3,200 full array copies.5. Efficient Zero-Filling (Lines 89-90)
Using
sum1.fill(0.)instead ofsum1[:, :] = 0.is slightly more efficient as it's a direct C-level operation rather than a slice assignment.Test Case Performance
The optimizations are particularly effective for:
The gains are consistent across all test cases (10-23% improvement), with larger systems and more iterations showing greater benefits due to the compounding effect of reduced matrix operations.
Impact on Workloads
Based on
function_references, this function is called fromstationary_values()in the Markov Jump Linear Quadratic control solver. Since that method involves solving the Riccati system once per invocation and then performing additional computations with the result, the 19% improvement insolve_discrete_riccati_systemtranslates to meaningful wall-clock savings in any application repeatedly solving these control problems (e.g., economic models, optimal control simulations).✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes
git checkout codeflash/optimize-solve_discrete_riccati_system-mkpb8ongand push.