⚡️ Speed up function `solve_discrete_riccati` by 7% #123
Open
codeflash-ai[bot] wants to merge 1 commit into main
The optimized code achieves a **6% speedup** by **batching multiple linear system solves into single operations**, reducing the overhead of repeated matrix factorizations.

## Key Optimization: Batched Solve Operations

**What changed:**
Instead of calling `solve()` three separate times with the same coefficient matrix `Z` (or `R_hat`, or `C2`), the code now:

1. Concatenates multiple right-hand sides using `np.concatenate()`
2. Solves once with the batched matrix
3. Extracts the individual solutions via slicing

**Example from the gamma selection loop:**

```python
# Original: 3 separate solves (~38ms total in profiler)
Q_tilde = -Q + (N.T @ solve(Z, N + gamma * BTA)) + gamma * I
G0 = B @ solve(Z, B.T)
A0 = (I - gamma * G0) @ A - (B @ solve(Z, N))

# Optimized: 1 batched solve (~14ms in profiler)
rhs_stacked = np.concatenate((N + gamma * BTA, B.T, N), axis=1)
sol = solve(Z, rhs_stacked)
sol1, sol2, sol3 = sol[:, :k], sol[:, k:2*k], sol[:, 2*k:]
Q_tilde = -Q + (N.T @ sol1) + gamma * I
G0 = B @ sol2
A0 = (I - gamma * G0) @ A - (B @ sol3)
```

**Why this is faster:**

- `solve()` performs expensive matrix factorization (LU decomposition) on the coefficient matrix
- With batched operations, the factorization happens **once** instead of multiple times
- Line profiler shows this reduces time from ~38ms to ~14ms in the gamma loop (63% faster for that section)

## Impact on Main Loop

The same optimization applies to the iterative doubling loop:

```python
# Original: 2 separate solves per iteration
G1 = G0 + ((A0 @ G0) @ solve(I + (H0 @ G0), A0.T))
H1 = H0 + (A0.T @ solve(I + (H0 @ G0), (H0 @ A0)))

# Optimized: 1 batched solve per iteration
rhs_c2 = np.concatenate((A0.T, H0 @ A0), axis=1)
sol_c2 = solve(C2, rhs_c2)
G1 = G0 + ((A0 @ G0) @ sol_c2[:, :k])
H1 = H0 + (A0.T @ sol_c2[:, k:])
```

This cuts the solve operations in half within the convergence loop, which runs ~177 times in typical cases.
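The concatenate, solve, and slice steps described above can be checked in isolation. The following is a minimal sketch, not code from the PR: `batched_solve` is a hypothetical helper, the matrices are random stand-ins for `Z` and its right-hand sides, and `solve` is assumed to be `numpy.linalg.solve`.

```python
import numpy as np
from numpy.linalg import solve


def batched_solve(coef, *rhs_blocks):
    """Solve coef @ X_i = rhs_i for every block with one call to solve().

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    # Column offsets at which the stacked solution is split back apart.
    widths = [block.shape[1] for block in rhs_blocks]
    split_points = np.cumsum(widths)[:-1]
    stacked = np.concatenate(rhs_blocks, axis=1)
    solution = solve(coef, stacked)  # a single solve covers all blocks
    return np.split(solution, split_points, axis=1)


rng = np.random.default_rng(0)
k = 4
# M @ M.T + k*I is symmetric positive definite, so the system is well posed.
M = rng.standard_normal((k, k))
Z = M @ M.T + k * np.eye(k)
rhs1, rhs2, rhs3 = (rng.standard_normal((k, k)) for _ in range(3))

# Reference: three independent solves, as in the original code path.
separate = [solve(Z, rhs) for rhs in (rhs1, rhs2, rhs3)]
# Batched: one solve, then slice the columns back out.
batched = batched_solve(Z, rhs1, rhs2, rhs3)

results_match = all(np.allclose(s, b) for s, b in zip(separate, batched))
print("batched == separate:", results_match)
```

Because each column of the right-hand side is solved independently, stacking columns changes how often the coefficient matrix is factorized but not the solutions, up to floating-point rounding.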

## Performance Characteristics

Based on the annotated tests:

- **Best speedup (10-14%)**: Small to medium systems (k=2-20) where the solve overhead is most significant relative to total runtime
- **Moderate speedup (6-10%)**: Larger systems where other operations (matrix multiplications, condition number calculations) dominate
- **Minimal impact**: Cases using `method='qz'` (bypasses the doubling algorithm entirely)

## Workload Impact

The `function_references` shows this is called from `Kalman.stationary_values()`, which computes steady-state Kalman gains. The optimization will benefit:

- Scenarios requiring repeated Riccati solutions (e.g., parameter sweeps, Monte Carlo simulations)
- Real-time applications where every millisecond counts
- Large-scale economic models where the Kalman filter is in a hot path

The speedup is most valuable when `solve_discrete_riccati` is called frequently, as the 6% improvement compounds across multiple invocations.
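To see how the benefit of batching varies with system size on a given machine, a rough timing harness along these lines may help. It is a sketch under stated assumptions: `time_solves` is a hypothetical helper, the matrices are random rather than taken from a Riccati problem, and it times only the solve calls, not the full `solve_discrete_riccati` routine, so the numbers will not match the percentages quoted above.

```python
import timeit

import numpy as np
from numpy.linalg import solve


def time_solves(k, repeats=200):
    """Return (separate, batched) mean seconds per call for a k x k system.

    Hypothetical benchmark helper; not part of the PR or of quantecon.
    """
    rng = np.random.default_rng(k)
    M = rng.standard_normal((k, k))
    Z = M @ M.T + k * np.eye(k)  # symmetric positive definite coefficient matrix
    rhs = [rng.standard_normal((k, k)) for _ in range(3)]

    # Three independent solves against the same coefficient matrix.
    separate = timeit.timeit(lambda: [solve(Z, r) for r in rhs], number=repeats)
    # One solve on the stacked right-hand sides (concatenation cost included).
    batched = timeit.timeit(
        lambda: solve(Z, np.concatenate(rhs, axis=1)), number=repeats
    )
    return separate / repeats, batched / repeats


timings = {k: time_solves(k) for k in (2, 20, 100)}
for k, (sep, bat) in timings.items():
    print(f"k={k:>3}: separate {sep * 1e6:8.1f} us, batched {bat * 1e6:8.1f} us")
```

Results depend on the BLAS/LAPACK build and the hardware, so treat the output as indicative only.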
📄 7% (0.07x) speedup for `solve_discrete_riccati` in `quantecon/_matrix_eqn.py`
⏱️ Runtime: 60.7 milliseconds → 57.0 milliseconds (best of 58 runs)
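As a quick check on the headline figure, the reported runtimes can be converted to a relative speedup. This sketch assumes speedup is defined as original runtime divided by optimized runtime, minus one; the PR does not state the formula, and the reported runtimes are rounded.

```python
original_ms = 60.7   # reported runtime before the change
optimized_ms = 57.0  # reported runtime after the change

# Relative speedup: how much more work per unit time the new code does.
speedup = original_ms / optimized_ms - 1.0
# Fraction of wall-clock time saved per call.
time_saved = 1.0 - optimized_ms / original_ms

print(f"speedup:    {speedup:.2%}")
print(f"time saved: {time_saved:.2%}")
```

Under that assumption the rounded runtimes give a speedup of roughly 6.5%, which is consistent with both the 6% and 7% figures quoted on this page.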
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes, `git checkout codeflash/optimize-solve_discrete_riccati-mkpb0d50` and push.