Add concurrent lp solve and crossover at root node #602
rapids-bot[bot] merged 66 commits into NVIDIA:release/25.12 from
Conversation
Fix sign bug when crushing dual
… concurrent-root-solve
```cpp
  check_constraint_bounds_sanity<i_t, f_t>(problem);
}

template <typename i_t, typename f_t>
```
Did I miss where these kernels are used in the PR?
They are used here https://github.com/hlinsen/cuopt/blob/concurrent-root-solve/cpp/src/mip/diversity/diversity_manager.cu#L342. It usually gave me a lower dual infeasibility norm for PDLP, and it is low overhead.
/ok to test 140a07f
Actionable comments posted: 2
🧹 Nitpick comments (1)
cpp/src/dual_simplex/branch_and_bound.cpp (1)
1279-1279: Consider validating the dual feasibility residual. The return value `dual_res_inf` from `crush_dual_solution` (line 1279) is computed but never used or validated. While the function likely has internal assertions for severe violations, explicitly checking this value could provide better diagnostics if the crushed dual solution has an unexpectedly high residual.

```diff
  f_t dual_res_inf = crush_dual_solution(original_problem_,
                                         original_lp_,
                                         new_slacks_,
                                         root_crossover_soln_.y,
                                         root_crossover_soln_.z,
                                         crushed_root_y,
                                         crushed_root_z);
+ if (dual_res_inf > 1e-4) {
+   settings_.log.printf("Warning: Crushed dual solution has high residual: %e\n", dual_res_inf);
+ }
```

As per coding guidelines: "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cpp/src/dual_simplex/branch_and_bound.cpp (4 hunks)
cpp/src/dual_simplex/branch_and_bound.hpp (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cpp/src/dual_simplex/branch_and_bound.hpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/src/dual_simplex/branch_and_bound.cpp
**/*.{cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Files:
cpp/src/dual_simplex/branch_and_bound.cpp
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
cpp/src/dual_simplex/branch_and_bound.cpp
🧠 Learnings (15)
📓 Common learnings
Learnt from: CR, Repo: NVIDIA/cuopt PR: 0, File: .github/.coderabbit_review_guide.md:0-0, Timestamp: 2025-11-25T10:20:49.822Z
- Applies to **/*test*.{cpp,cu,py}: Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
- Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

📚 Learnings applied to cpp/src/dual_simplex/branch_and_bound.cpp (same CR source and timestamp unless noted):
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
- Applies to **/*test*.{cpp,cu,py}: Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
- Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
- Applies to **/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
- Applies to **/*.{cpp,hpp,h}: Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
- Learnt from chris-maes (PR 602, cpp/src/linear_programming/solve.cu:732-742, 2025-12-04T20:09:09.264Z): In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
- Learnt from chris-maes (PR 500, cpp/src/dual_simplex/scaling.cpp:68-76, 2025-12-04T04:11:12.640Z): In the cuOPT dual simplex solver, CSR/CSC matrices (including the quadratic objective matrix Q) are required to have valid dimensions and indices by construction. Runtime bounds checking in performance-critical paths like matrix scaling is avoided to prevent slowdowns. Validation is performed via debug-only check_matrix() calls wrapped in #ifdef CHECK_MATRIX.
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
🧬 Code graph analysis (1)
cpp/src/dual_simplex/branch_and_bound.cpp (3)
cpp/src/dual_simplex/solve.hpp (1)
- solve_linear_program_advanced (43-48)
cpp/src/dual_simplex/presolve.hpp (2)
- crush_primal_solution (136-140)
- crush_dual_solution (151-157)
cpp/src/dual_simplex/crossover.cpp (3)
- crossover (1041-1395)
- crossover (1041-1046)
- crossover (1399-1405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: conda-cpp-build / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: conda-cpp-build / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: conda-cpp-build / 12.9.1, 3.10, arm64, rockylinux8
- GitHub Check: conda-cpp-build / 12.9.1, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-server / 12.9.1, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
🔇 Additional comments (3)
cpp/src/dual_simplex/branch_and_bound.cpp (3)
12-12: LGTM - includes and initialization for concurrent crossover. The addition of crossover.hpp and headers, along with the root_crossover_soln_ member initialization, properly support the new concurrent root solve feature.
Also applies to: 28-28, 222-222
1239-1250: Sequential root solve path looks correct. The non-concurrent path (lines 1244-1250) maintains the existing sequential solve behavior. The concurrent_halt pointer setup at line 1242 enables coordinated termination with crossover in the concurrent path.
Note: Based on past review comments, ensure that the concurrent_halt mechanism uses `std::atomic` with proper memory ordering, not `volatile`.
1271-1319: Concurrent crossover logic is well-structured. The conditional logic correctly handles the two scenarios:
- Lines 1271-1315: If crossover solution arrives first, crush it, run crossover, and use it if OPTIMAL; otherwise fall back to dual simplex
- Lines 1316-1318: If dual simplex finishes first (concurrent_halt becomes 1), just use its result
The synchronization is correct:
- Line 1308 waits for the async task to complete before modifying shared state (addresses past data race concern)
- Line 1307 only halts dual simplex when crossover succeeds with OPTIMAL (per chris-maes's guidance)
- Lines 1301-1303 avoid logging when crossover was halted by dual simplex (per chris-maes's guidance)
Based on learnings: "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."
```cpp
} else {
  // Root node path
  std::future<lp_status_t> root_status_future;
  root_status_future = std::async(std::launch::async,
                                  &solve_linear_program_advanced<i_t, f_t>,
                                  std::ref(original_lp_),
                                  exploration_stats_.start_time,
                                  std::ref(lp_settings),
                                  std::ref(root_relax_soln_),
                                  std::ref(root_vstatus_),
                                  std::ref(edge_norms_));
  // Wait for the root relaxation solution to be sent by the diversity manager or dual simplex
  // to finish
  while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
         *get_global_root_concurrent_halt() == 0) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    continue;
  }
```
Add timeout protection to prevent indefinite wait.
The wait loop (lines 1265-1269) could spin indefinitely if both the dual simplex hangs AND the crossover solution never arrives, with no mechanism to escape. While the 1ms sleep prevents a hot busy-spin, there's no timeout to break the wait.
Consider adding a timeout check inside the loop:
```diff
  // Wait for the root relaxation solution to be sent by the diversity manager or dual simplex
  // to finish
+ f_t wait_start = tic();
  while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
         *get_global_root_concurrent_halt() == 0) {
+   if (toc(wait_start) > settings_.time_limit) {
+     settings_.log.printf("Timeout waiting for root relaxation\n");
+     solver_status_ = mip_exploration_status_t::TIME_LIMIT;
+     set_global_root_concurrent_halt(1);      // Signal dual simplex to stop
+     root_status = root_status_future.get();  // Wait for cleanup
+     return set_final_solution(solution, -inf);
+   }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    continue;
  }
```

This ensures the solver cannot hang indefinitely and respects the time limit.
As per coding guidelines: "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.cpp around lines 1252-1269, the waiting
loop for root_crossover_solution_set_ can hang indefinitely; add a timeout guard
that measures elapsed time (e.g. steady_clock::now() -
exploration_stats_.start_time or another configured time limit) inside the while
loop and when the elapsed time exceeds the allowed limit break out and trigger a
safe abort path (set the global/manager halt flag or call the existing
concurrent halt setter and/or cancel/handle the root_status_future
appropriately), and log or return an error status so the solver stops rather
than spinning forever.
```cpp
// Check if crossover was stopped by dual simplex
if (crossover_status == crossover_status_t::OPTIMAL) {
  set_global_root_concurrent_halt(1);  // Stop dual simplex
  root_status = root_status_future.get();
```
Add timeout protection to future.get() calls to prevent indefinite blocking.
The root_status_future.get() calls at lines 1308, 1314, and 1317 will block indefinitely if the dual simplex solver hangs or encounters an infinite loop. This could cause the entire MIP solver to hang with no way to recover or respect the time limit.
Unfortunately, C++ standard futures support timed waits (`wait_for`) but not cancellation of tasks launched with `std::launch::async`. Consider one of these approaches:

Option 1: Use `future.wait_for()` with periodic checking:

```cpp
while (root_status_future.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
  if (toc(exploration_stats_.start_time) > settings_.time_limit) {
    settings_.log.printf("Timeout waiting for dual simplex to complete\n");
    // Note: Cannot safely cancel a std::async task, but can proceed with timeout status
    solver_status_ = mip_exploration_status_t::TIME_LIMIT;
    return set_final_solution(solution, -inf);
  }
}
root_status = root_status_future.get();
```

Option 2: Ensure dual simplex always checks concurrent_halt and time_limit internally so it cannot hang.
This protection should be added to all three .get() call sites.
As per coding guidelines: "Assess algorithmic complexity for large-scale problems; ensure O(n log n) or better complexity, not O(n²) or worse" and proper timeout/cancellation mechanisms.
Also applies to: 1314-1314, 1317-1317
🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.cpp around lines 1308, 1314 and 1317,
the calls to root_status_future.get() can block indefinitely; wrap each .get()
with a wait_for loop that periodically checks std::future_status::ready (e.g.,
every 50–200ms), and on each iteration verify exploration_stats_.start_time
against settings_.time_limit and concurrent halt flags; if the time limit or
halt is exceeded, log a timeout message, set solver_status_ =
mip_exploration_status_t::TIME_LIMIT (or appropriate halt status), and return
set_final_solution(solution, -inf) (noting you cannot cancel the async task),
otherwise break out and call .get() once ready; apply this pattern to all three
.get() call sites.
```cpp
                    crossover_vstatus_);

if (crossover_status != crossover_status_t::CONCURRENT_LIMIT) {
  settings_.log.printf("Crossover status: %d\n", crossover_status);
```
I'm not sure we want to print the Crossover status all the time. Imagine the case where dual simplex wins. I don't think we should bother the user with the fact that we halted crossover. Maybe we should only print this when Crossover is optimal?
```cpp
                    root_relax_soln_,
                    root_vstatus_,
                    edge_norms_);
simplex_solver_settings_t lp_settings = settings_;
```
Nit: You might consider pulling all of this out into a function. Perhaps something like concurrent_root_solve
chris-maes
left a comment
Can you silence the log for the concurrent run?
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/dual_simplex/presolve.cpp (1)
510-553: Avoid relying on the `equality_rows.size() − range_rows.size()` invariant when sizing artificial vars

`num_artificial_vars = equality_rows.size() - range_rows.size()` assumes every `range_rows[i]` is also present in `equality_rows`. If that invariant is ever broken (e.g., new range-row encodings or future refactors), the loop over `equality_rows` can add more artificial columns than preallocated, causing `j > num_cols` and making the `assert(j == num_cols)` / `assert(p == nnz)` fail or, in release builds, leading to memory corruption.

You already build `is_range_row`, so you can size `num_artificial_vars` robustly by actually counting non-range equality rows instead of subtracting cardinalities:

```diff
- const i_t n = problem.num_cols;
- const i_t m = problem.num_rows;
- const i_t num_artificial_vars = equality_rows.size() - range_rows.size();
+ const i_t n = problem.num_cols;
+ const i_t m = problem.num_rows;
+
+ std::vector<bool> is_range_row(problem.num_rows, false);
+ for (i_t i : range_rows) {
+   is_range_row[i] = true;
+ }
+
+ i_t num_artificial_vars = 0;
+ for (i_t i : equality_rows) {
+   if (!is_range_row[i]) { ++num_artificial_vars; }
+ }
```

(and then keep the existing loop that skips `is_range_row[i]`).

This keeps behavior identical under current assumptions but makes the function self-contained and safer against changes in how `equality_rows` / `range_rows` are built.
♻️ Duplicate comments (1)
cpp/src/linear_programming/solve.cu (1)
754-791: Status selection logic after concurrency is consistent with new inside-MIP behavior

The updated selection logic:
- Prefers dual simplex only when `!settings.inside_mip` and DS reports Optimal/PrimalInfeasible/DualInfeasible.
- Otherwise prefers barrier when barrier is Optimal.
- Otherwise prefers PDLP when PDLP is Optimal.
- Outside MIP, falls back to dual simplex when PDLP ended with `ConcurrentLimit`.
- In all other cases, returns PDLP's status/solution.

Given the learning that the barrier solver currently never reports Infeasible/Unbounded, this branch ordering is consistent and avoids accidentally using dual simplex results inside MIP while still exploiting dual simplex outside MIP.

The `copy_from(problem.handle_ptr, …)` calls and logging use the same handle and are consistent with the rest of the file.
🧹 Nitpick comments (2)
cpp/src/dual_simplex/presolve.hpp (1)
153-159: `crush_dual_solution` return type change is consistent; consider documenting return semantics

The new `f_t` return type aligns with the implementation in `presolve.cpp` (returning the dual residual infinity norm) and is safe for existing callers that ignore the value. It would be helpful to document that the function returns `||Aᵀy + z − c||_∞` so new call sites can decide whether to assert/log on it.

cpp/src/mip/problem/problem.cuh (1)

209-213: New root-relaxation callback member is fine; consider renaming for clarity

Adding a `std::function` callback for root-relaxation data here is reasonable and consistent with the existing `branch_and_bound_callback`. The only nit is naming: a data member called `set_root_relaxation_solution_callback` reads like a setter method; something like `root_relaxation_solution_callback` (with a separate setter API if desired) would be clearer.

Assuming the constructors default this to an empty `std::function` (or `nullptr`) and all call sites check it before invoking, this change looks good.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
benchmarks/linear_programming/run_mps_files.sh (1 hunks)
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1 hunks)
cpp/src/dual_simplex/presolve.cpp (8 hunks)
cpp/src/dual_simplex/presolve.hpp (1 hunks)
cpp/src/dual_simplex/simplex_solver_settings.hpp (1 hunks)
cpp/src/linear_programming/solve.cu (9 hunks)
cpp/src/mip/problem/problem.cu (2 hunks)
cpp/src/mip/problem/problem.cuh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/src/dual_simplex/simplex_solver_settings.hpp
- cpp/src/mip/problem/problem.cu
- benchmarks/linear_programming/run_mps_files.sh
🧰 Additional context used
📓 Path-based instructions (6)

**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md): same guidance as listed for this glob earlier in the review.
Files:
- cpp/src/dual_simplex/presolve.hpp
- cpp/src/dual_simplex/presolve.cpp
- cpp/src/mip/problem/problem.cuh
- cpp/src/linear_programming/solve.cu

**/*.{h,hpp,py}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings
Files:
- cpp/src/dual_simplex/presolve.hpp

**/*.{cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md): same guidance as listed for this glob earlier in the review.
Files:
- cpp/src/dual_simplex/presolve.hpp
- cpp/src/dual_simplex/presolve.cpp

**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
- cpp/src/dual_simplex/presolve.hpp
- cpp/src/dual_simplex/presolve.cpp
- cpp/src/linear_programming/solve.cu

**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
- cpp/src/mip/problem/problem.cuh
- cpp/src/linear_programming/solve.cu

**/*.cu
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Files:
- cpp/src/linear_programming/solve.cu
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
Applied to files:
cpp/src/dual_simplex/presolve.hppcpp/src/dual_simplex/presolve.cppcpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Applied to files:
cpp/src/dual_simplex/presolve.hppcpp/src/dual_simplex/presolve.cppcpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Applied to files:
cpp/src/dual_simplex/presolve.hppcpp/src/dual_simplex/presolve.cppcpp/src/mip/problem/problem.cuhcpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Applied to files:
cpp/src/dual_simplex/presolve.hppcpp/src/dual_simplex/presolve.cppcpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Applied to files:
cpp/src/dual_simplex/presolve.cpp
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/dual_simplex/presolve.cpp
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
Applied to files:
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Applied to files:
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Applied to files:
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Applied to files:
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/src/linear_programming/solve.cu
🔇 Additional comments (7)
cpp/src/dual_simplex/presolve.cpp (3)
810-813: Barrier‑presolve gating of artificial variables looks reasonable; ensure downstream uses of new_slacks don't rely on them in barrier mode

Skipping `add_artifical_variables` when `settings.barrier_presolve` is true is consistent with a barrier-first workflow, but it assumes:

- Barrier presolve/solve never relies on `new_slacks` being populated for equality rows, and
- All callers that later use `new_slacks` are on the non‑barrier‑presolve path.

Please double‑check those assumptions for the new concurrent/root‑solve flows.
1535-1541: Explicit instantiation updated correctly for new return type

The explicit instantiation of `crush_dual_solution<int,double>` now matches the new `f_t` return type and parameter list; no ABI or template mismatch issues here.
1240-1316: Range‑row‑aware dual crushing is consistent with primal transformation; verify all call sites honor row‑count invariant

The updated `crush_dual_solution`:

- Correctly sets `z[j] = y[i]` for range rows (whose slack columns have `A(i,j) = −1`) and `z[j] = −y[i]` for non‑range slack columns (`A(i,j) = +1`), so `Aᵀy + z = c` still holds after introducing slacks.
- Builds `is_range_row` from `user_problem.range_rows`, which matches how `convert_range_rows` creates the new slack columns.
- Returns `dual_res_inf = ||Aᵀy + z − c||_∞`, which is consistent with the new return type and the header declaration.

Two points to verify:

- The assertion `assert(user_problem.num_rows == problem.num_rows);` is relied upon by the sign logic and indexing; confirm every call site passes a `problem` with the same row count as `user_problem` (never called after presolve row aggregation/removal).
- If the returned `dual_res_inf` is intended for use in concurrent/root‑solve plumbing, consider either checking it against a tolerance at call sites instead of asserting, or annotating the function `[[nodiscard]]` once callers are ready to consume it.

cpp/src/linear_programming/solve.cu (4)
538-539: Inside‑MIP hint into PDLP solver is wired cleanly

Passing `settings.inside_mip` down via `solver.set_inside_mip(true)` is a straightforward way to specialize PDLP behavior for root LP solves; no issues here as long as `pdlp_solver_t` uses this only for behavior toggles and not structural assumptions about the problem.
806-807: solve_lp_with_method correctly forwards to updated run_concurrent signature

The call now passes only `problem`, `settings`, `timer`, and `is_batch_mode`, which matches the updated `run_concurrent` signature (no `op_problem` parameter). This keeps the public API of `solve_lp_with_method` aligned with the refactoring.
911-912: solve_lp now delegates through solve_lp_with_method(problem, settings, …) consistently

The shift to `auto solution = solve_lp_with_method(problem, settings, lp_timer, is_batch_mode);` is consistent with the new `solve_lp_with_method` signature and keeps the main driver agnostic of the particular LP method (PDLP, dual simplex, barrier, or concurrent). No issues here.
670-744: Concurrent halt flag must use atomic synchronization, not volatile

The code shares `global_concurrent_halt` across PDLP, barrier, and dual simplex threads. In C++, `volatile int` provides no inter-thread synchronization guarantees; this is a data race. Replace with `std::atomic<int>` and use `store()`/`load()` with `std::memory_order_relaxed`:

```cpp
std::atomic<int> global_concurrent_halt{0};
global_concurrent_halt.store(0, std::memory_order_relaxed);
settings_pdlp.concurrent_halt = reinterpret_cast<volatile int*>(&global_concurrent_halt);
```

If the downstream signatures can accept `std::atomic<int>*` directly, prefer that over the `reinterpret_cast`. Verify all uses of `concurrent_halt` in `run_pdlp`, `run_barrier_thread`, and related functions accept this change.
Actionable comments posted: 2
♻️ Duplicate comments (2)
cpp/src/dual_simplex/branch_and_bound.cpp (2)
1265-1269: Add timeout protection to the wait loop.

The wait loop spins indefinitely with no timeout check. If both the dual simplex task hangs AND the crossover solution never arrives, this loop will run forever. While the 1ms sleep prevents a hot busy-spin, there's no mechanism to escape if time limits are exceeded.

Add a timeout check inside the loop:

```diff
   // Wait for the root relaxation solution to be sent by the diversity manager or dual simplex
   // to finish
+  f_t wait_start = tic();
   while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
          *get_root_concurrent_halt() == 0) {
+    if (toc(wait_start) > settings_.time_limit) {
+      settings_.log.printf("Timeout waiting for root relaxation\n");
+      solver_status_ = mip_exploration_status_t::TIME_LIMIT;
+      set_root_concurrent_halt(1);
+      root_status_future.wait();  // Ensure task cleanup
+      return set_final_solution(solution, -inf);
+    }
     std::this_thread::sleep_for(std::chrono::milliseconds(1));
     continue;
   }
```
1306-1318: Protect future.get() calls with timeout checks.

The three `root_status_future.get()` calls (lines 1308, 1314, 1317) can block indefinitely if the dual simplex solver hangs, enters an infinite loop, or deadlocks. C++ standard futures don't support cancellation, so there's no way to forcibly terminate the async task.

Use `wait_for()` with periodic timeout checks:

```diff
 if (crossover_status == crossover_status_t::OPTIMAL) {
   set_root_concurrent_halt(1);  // Stop dual simplex
-  root_status = root_status_future.get();
+  // Wait with timeout protection
+  while (root_status_future.wait_for(std::chrono::milliseconds(100)) !=
+         std::future_status::ready) {
+    if (toc(exploration_stats_.start_time) > settings_.time_limit) {
+      settings_.log.printf("Timeout waiting for dual simplex to complete\n");
+      solver_status_ = mip_exploration_status_t::TIME_LIMIT;
+      return set_final_solution(solution, -inf);
+    }
+  }
+  root_status = root_status_future.get();
   // Override the root relaxation solution with the crossover solution
```

Apply similar protection to the `.get()` calls at lines 1314 and 1317.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cpp/src/dual_simplex/branch_and_bound.cpp (4 hunks)
- cpp/src/dual_simplex/branch_and_bound.hpp (4 hunks)
- cpp/src/dual_simplex/phase2.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cpp/src/dual_simplex/phase2.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{h,hpp,py}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings
Files:
cpp/src/dual_simplex/branch_and_bound.hpp
🧠 Learnings (15)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-12-04T04:11:12.640Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/src/dual_simplex/scaling.cpp:68-76
Timestamp: 2025-12-04T04:11:12.640Z
Learning: In the cuOPT dual simplex solver, CSR/CSC matrices (including the quadratic objective matrix Q) are required to have valid dimensions and indices by construction. Runtime bounds checking in performance-critical paths like matrix scaling is avoided to prevent slowdowns. Validation is performed via debug-only check_matrix() calls wrapped in #ifdef CHECK_MATRIX.
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/dual_simplex/branch_and_bound.cpp
cpp/src/dual_simplex/branch_and_bound.hpp
🔇 Additional comments (7)
cpp/src/dual_simplex/branch_and_bound.cpp (4)
12-12: Includes for concurrent crossover are appropriate.

The addition of `crossover.hpp` and `<future>` correctly supports the new concurrent root solve and crossover functionality.

Also applies to: 28-28
222-222: Constructor initialization is correct.

The `root_crossover_soln_(1, 1)` initialization follows the same pattern as `root_relax_soln_` and will be properly resized when populated.
1239-1262: Concurrent root solve setup is mostly correct, but verify volatile int synchronization.

The async launch correctly uses `std::ref` to pass references to the dual simplex solver. However, note that `get_root_concurrent_halt()` returns a `volatile int*` (line 1242), which provides insufficient synchronization guarantees for multi-threaded access. This should be addressed in the header file where the member is declared.
1271-1319: Verify concurrent access to shared state is properly synchronized.

The algorithm launches dual simplex asynchronously (writing to `root_relax_soln_`, `root_vstatus_`, `edge_norms_`) and then on the main thread performs crossover operations. When crossover succeeds (line 1306), it halts dual simplex and overrides the solution. When crossover doesn't succeed (lines 1313-1318), it waits for and uses the dual simplex result.

The synchronization appears correct: `future.get()` blocks until the async task completes, ensuring all writes by the dual simplex are visible. However, verify that:

- The dual simplex solver respects the `concurrent_halt` flag promptly to avoid long blocking on line 1308
- No other threads are reading these shared members during this phase
- The crushing operations (lines 1272-1289) correctly map the external PDLP solution space to the `original_lp_` problem space

cpp/src/dual_simplex/branch_and_bound.hpp (3)
18-18: Clarify the need for utilities/macros.cuh include.

This header includes a CUDA utility header (`<utilities/macros.cuh>`) in what appears to be a CPU-side branch-and-bound implementation. If this is only needed for certain macros, ensure it doesn't introduce unnecessary CUDA build dependencies or compilation requirements for CPU-only code paths.
82-98: Publish-subscribe pattern is correctly implemented.

The `set_root_relaxation_solution` method uses the standard publish-subscribe pattern: multiple non-atomic writes followed by an atomic store with `memory_order_release` (line 97). This ensures that all prior writes to `root_crossover_soln_` and `root_objective_` become visible to any thread that observes the flag via `memory_order_acquire` (as done in line 1271 of the .cpp file).

This is correct as long as:

- Only one thread ever calls this method (the PDLP worker)
- All readers check the atomic flag before accessing the data
164-164: New member variables for crossover handling are appropriate.

The additions of `crossover_vstatus_` (line 164), `root_crossover_soln_` (line 167), and the atomic flag `root_crossover_solution_set_` (line 169) correctly support the concurrent crossover functionality. The atomic flag properly guards access to the non-atomic members via the publish-subscribe pattern.

Also applies to: 167-167, 169-169
```cpp
// Set a solution based on the user problem during the course of the solve
void set_new_solution(const std::vector<f_t>& solution);

void set_concurrent_lp_root_solve(bool enable) { enable_concurrent_lp_root_solve_ = enable; }
```
Ensure thread-safe access to enable_concurrent_lp_root_solve_ flag.
The enable_concurrent_lp_root_solve_ member (line 170) is a plain bool accessed via setter (line 103) and getter (line 114) without synchronization. If set_concurrent_lp_root_solve() could be called concurrently with enable_concurrent_lp_root_solve() or solve(), this creates a data race (undefined behavior).
Option 1 (Preferred): Make it atomic:
```diff
- bool enable_concurrent_lp_root_solve_{false};
+ std::atomic<bool> enable_concurrent_lp_root_solve_{false};
```

Option 2: Document single-threaded initialization requirement:

```diff
+ // Must be called before solve() in single-threaded context
  void set_concurrent_lp_root_solve(bool enable) { enable_concurrent_lp_root_solve_ = enable; }
```

As per coding guidelines: "Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state."
Also applies to: 114-114, 170-170
```cpp
volatile int* get_root_concurrent_halt() { return &root_concurrent_halt_; }
void set_root_concurrent_halt(int value) { root_concurrent_halt_ = value; }
```
Replace volatile int with std::atomic for thread synchronization.
The root_concurrent_halt_ member (line 171) uses volatile int, which is incorrect for thread synchronization in C++. volatile only prevents compiler optimizations but provides no atomicity or memory ordering guarantees. This can lead to data races, lost writes, and non-deterministic behavior in concurrent code.
The pointer returned by get_root_concurrent_halt() (line 115) is passed to other components (e.g., line 1242 in the .cpp file) and dereferenced in multi-threaded contexts, making this a critical issue.
Fix: Replace with std::atomic<int> and update all access patterns:
```diff
- volatile int* get_root_concurrent_halt() { return &root_concurrent_halt_; }
- void set_root_concurrent_halt(int value) { root_concurrent_halt_ = value; }
+ std::atomic<int>* get_root_concurrent_halt() { return &root_concurrent_halt_; }
+ void set_root_concurrent_halt(int value) { root_concurrent_halt_.store(value, std::memory_order_release); }
```

```diff
- volatile int root_concurrent_halt_{0};
+ std::atomic<int> root_concurrent_halt_{0};
```
As per coding guidelines: "Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state" and "Ensure race conditions are absent in multi-threaded server implementations."
Also applies to: 171-171
🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.hpp around lines 115-116 (and member at
line 171), replace the volatile int root_concurrent_halt_ with std::atomic<int>
(include <atomic>), change the getter to expose a pointer or reference to
std::atomic<int> (e.g., std::atomic<int>* or std::atomic<int>&) instead of
volatile int*, and update the setter to use .store(value,
std::memory_order_release); then update all call sites that read/write this
variable to use .load(std::memory_order_acquire) for reads and .store(...,
std::memory_order_release) for writes to ensure proper atomicity and memory
ordering.
/ok to test cc0b3b8
chris-maes
left a comment
LGTM. Thanks Hugo! This is exciting.
/merge
This PR implements concurrent root solve for MIP
Summary by CodeRabbit
New Features
Performance Improvements
Refactor