Conversation
…copying overhead Implements selective saving of ODE integration steps to reduce array copying overhead identified through profiling. Only saves every Nth step (default N=10) while always preserving steps near rational surfaces for physics accuracy. Changes: - Added save_interval parameter to DconControl (default: 10) - Modified integrator_callback! in Ode.jl to implement smart saving logic: * Always saves first 2 steps after rational surfaces * Always saves last steps before next rational surface * Saves every Nth step in between * Automatically saves final point - Updated Solovev example dcon.toml with save_interval parameter Performance results (DIIID-like example, n=3, numpert_total=60): - Baseline (save_interval=1): 49.25 ± 12.87 seconds - Optimized (save_interval=10): 27.60 ± 1.78 seconds - Speedup: 1.78x (44% faster) - Data reduction: 77% fewer steps saved (1286 → 288) Profiling showed 20% of runtime was array copying in savevalues!/recursivecopy. This optimization directly addresses that bottleneck while preserving all physics-critical data points near rational surface crossings. Note: Parameter named "save_interval" (not "save_stride") to avoid confusion with the STRIDE stability code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… in-place operations Changes: - DconStructs.jl: Replace @kwdef with explicit constructor for OdeState to avoid macro overhead and ensure predictable initialization - Fourfit.jl: Pre-allocate temp1/temp2 workspace matrices, use ldiv! for in-place linear solves instead of allocating with backslash - Ode.jl: Pre-allocate workspace matrices in transform_u! to avoid allocations inside loops Benchmark results (DIIID-like_ideal_example, 3 runs each): Baseline: 53.94 ± 4.16 seconds, 2,923,797 allocations Optimized: 48.88 ± 0.54 seconds, 2,909,699 allocations Speedup: ~9.4% faster, ~14k fewer allocations Variance: Much more consistent timing (±0.54 vs ±4.16) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@claude summarize this branch for the PR |
|
Claude finished @logan-nc's task in 2m 15s —— View job Summary of julia_speedups BranchThis PR introduces significant performance improvements to the DCON (stability analysis) module through targeted optimizations that reduce both runtime and memory allocations. The branch also adds infrastructure for systematic Julia performance optimization. Key Performance Improvements1. Save Interval Optimization (Commit aa8e502)Impact: 1.78x speedup (44% faster) Implemented selective saving of ODE integration steps to reduce array copying overhead:
Benchmark (DIIID-like example, n=3, numpert_total=60):
Technical Details:
2. Allocation Reduction via Explicit Constructor and In-Place Operations (Commit f225a4f)Impact: 9.4% speedup with 14k fewer allocations Three main optimizations targeting memory allocation overhead: a) Explicit Constructor for OdeState (src/DCON/DconStructs.jl:352-454)
b) Pre-allocated Workspace in Fourfit.jl (src/DCON/Fourfit.jl)
c) Pre-allocated Workspace in transform_u! (src/DCON/Ode.jl)
Benchmark (DIIID-like_ideal_example, 3 runs each):
Infrastructure Improvements3. Julia Performance Optimizer Agent (Commit 1fbd460)Added
Files Changed
Combined ImpactThe two main optimizations work together:
TestingAll optimizations maintain correctness:
--- | Branch: julia_speedups |
|
@jhalpern30 this 2x speedup seems promising. The ai also seems to have hinted that parallelizing chunked integration would be within its abilities (i.e. basically stride??) but I decided to leave that large of a change until after #121 is merged into develop. I personally think the change to |
…llocations Changes: - Ode.jl: Replace broadcast .= with copyto! for saving u_store/ud_store - Sing.jl: Use cholesky! with pre-allocated workspace instead of cholesky() - DconStructs.jl: Add chol_workspace field to OdeState Benchmark results (DIIID-like_ideal_example, 3 runs): Previous: 48.88 ± 0.54 seconds Current: 35.28 ± 3.00 seconds Speedup: ~28% faster Note: Eigenvalue workspace optimization was considered but deferred due to low impact (0.3-0.5%) vs high implementation complexity (requires LAPACK). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…zation Changes: - Free.jl: Replace broadcasting `A * coeffs` with mul!(tmp, A, coeffs) + copyto! to avoid allocating temporary matrices in the normalization loop This eliminates ~4 matrix allocations per integration step (229 steps), reducing memory pressure during free boundary computation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@jhalpern30 this is ready for review and I promise not to add anything else! I couldn't help reprofiling and adding the last two minor improvements. But let's get this merged and then your big refractor of the integration bounds in before we dig any deeper |
jhalpern30
left a comment
There was a problem hiding this comment.
Looks great! Just a few requested cleanups, mostly pertaining to optimizations that were added that I don't think add any actual optimization and just reduce clarity, and it should be ready to go
| psi_remaining = abs(integrator.sol.prob.tspan[2] - integrator.t) | ||
| near_end = psi_remaining < 0.05 * psi_range || psi_remaining < 1e-4 | ||
|
|
||
| # Check if we're at the beginning (first 2 steps capture the point right after rational) |
There was a problem hiding this comment.
Overall, I like this change, and the comments make this logic clear. Its not clear to me why we need the first 2 steps to capture the point right after the rational, can you add something extra here? Is it for a derivative?
There was a problem hiding this comment.
@logan-nc almost all of the prep work on this branch is done now, I just need your input on this last thing here. When you get a chance, give this comment a quick look, and also check out issue 1. from Claude below and just confirm that that's not an issue (I don't think it is, but is worth a look).
After this, its ready to get merged in
Performance Optimization ReviewThis PR implements several performance optimizations in the DCON module for Julia speedups. The changes focus on reducing allocations and improving computational efficiency. Here's my analysis: Code Quality & Best PracticesPositive aspects:
Issues identified:
Performance ConsiderationsGood optimizations:
Potential concerns:
Test CoverageAdequate:
Missing:
Recommendations
Minor Issues
SummaryThis PR makes reasonable performance improvements through reduced allocations and in-place operations. However, there are implementation issues (particularly the save_interval logic) that need correction, and some of the "optimizations" may not actually improve performance (Free.jl). The lack of benchmarks makes it hard to assess the actual speedup achieved. I recommend fixing the identified issues and adding performance tests before merging. |
…nto julia_speedups
… PR that don't contribute to speed and cleaning up some sections
|
Summary of my changes: |
Code Review: Julia speedupsThis PR implements several performance optimizations to the DCON module. The changes are focused on reducing allocations and improving memory efficiency in the ODE integration and post-processing steps. Performance ImprovementsOde.jl:323-365 - Selective step saving with save_interval
Ode.jl:563-628 - Reduced allocations in transform_u!
Free.jl:125-134 - Eigenvector normalization optimization
Fourfit.jl:202-203, 304-310 - Matrix factorization optimization
Issues FoundCritical: Bug in Ode.jl:351 The save logic has a flaw. Line 351 uses: This uses the global step counter odet.step, but when save_interval > 1, the step counter doesn't increment every integration step—only when should_save is true. This creates a circular dependency where steps are saved based on a counter that only increments when saving occurs. Fix: Use the integrator's step count or a separate counter that increments every callback. For example, track total callback invocations separately and use that for the modulo check. Minor: Typo in Ode.jl:569 Minor: Missing whitespace in Ode.jl:589 Questions
Minor Formatting
Overall AssessmentThe optimization approach is sound and follows Julia best practices for reducing allocations. However, the save_interval logic bug needs to be fixed before merging as it could cause incorrect results or test failures. Once fixed, this should provide useful performance improvements for large-scale runs. |
@claude summarize this branch for the PR