
Add concurrent lp solve and crossover at root node #602

Merged: rapids-bot[bot] merged 66 commits into NVIDIA:release/25.12 from hlinsen:concurrent-root-solve on Dec 9, 2025

Add concurrent lp solve and crossover at root node#602
rapids-bot[bot] merged 66 commits intoNVIDIA:release/25.12from
hlinsen:concurrent-root-solve

Conversation

@hlinsen
Contributor

@hlinsen hlinsen commented Nov 18, 2025

This PR implements a concurrent root solve for MIP.

Summary by CodeRabbit

  • New Features

    • Concurrent root LP solving with optional crossover and callback delivery of root relaxation solutions to downstream components.
    • Exposed settings: enable concurrent root solves, concurrent halt signaling, and GPU count for solver configuration; benchmark default CPU threads adjusted.
  • Performance Improvements

    • Improved primal/dual recomputation, PDLP integration, and early-optimal signaling for faster root conclusions.
  • Refactor

    • Streamlined LP solve paths to respect MIP context and gate concurrency accordingly.


chris-maes and others added 30 commits June 11, 2025 16:57
Fix sign bug when crushing dual
Comment thread cpp/src/dual_simplex/branch_and_bound.cpp Outdated
check_constraint_bounds_sanity<i_t, f_t>(problem);
}

template <typename i_t, typename f_t>
Contributor


Did I miss where these kernels are used in the PR?

Contributor Author

@hlinsen hlinsen Dec 6, 2025


They are used here: https://github.com/hlinsen/cuopt/blob/concurrent-root-solve/cpp/src/mip/diversity/diversity_manager.cu#L342. It usually gave me a lower dual infeasibility norm for PDLP, and it has low overhead.

@hlinsen
Contributor Author

hlinsen commented Dec 6, 2025

/ok to test 140a07f


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
cpp/src/dual_simplex/branch_and_bound.cpp (1)

1279-1279: Consider validating the dual feasibility residual.

The return value dual_res_inf from crush_dual_solution (line 1279) is computed but never used or validated. While the function likely has internal assertions for severe violations, explicitly checking this value could provide better diagnostics if the crushed dual solution has unexpectedly high residual.

       f_t dual_res_inf = crush_dual_solution(original_problem_,
                                              original_lp_,
                                              new_slacks_,
                                              root_crossover_soln_.y,
                                              root_crossover_soln_.z,
                                              crushed_root_y,
                                              crushed_root_z);
+      if (dual_res_inf > 1e-4) {
+        settings_.log.printf("Warning: Crushed dual solution has high residual: %e\n", dual_res_inf);
+      }

As per coding guidelines: "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65e655b and 140a07f.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/branch_and_bound.cpp (4 hunks)
  • cpp/src/dual_simplex/branch_and_bound.hpp (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/dual_simplex/branch_and_bound.hpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
**/*.{cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
🧠 Learnings
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-12-04T04:11:12.640Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/src/dual_simplex/scaling.cpp:68-76
Timestamp: 2025-12-04T04:11:12.640Z
Learning: In the cuOPT dual simplex solver, CSR/CSC matrices (including the quadratic objective matrix Q) are required to have valid dimensions and indices by construction. Runtime bounds checking in performance-critical paths like matrix scaling is avoided to prevent slowdowns. Validation is performed via debug-only check_matrix() calls wrapped in #ifdef CHECK_MATRIX.

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
🧬 Code graph analysis (1)
cpp/src/dual_simplex/branch_and_bound.cpp (3)
cpp/src/dual_simplex/solve.hpp (1)
  • solve_linear_program_advanced (43-48)
cpp/src/dual_simplex/presolve.hpp (2)
  • crush_primal_solution (136-140)
  • crush_dual_solution (151-157)
cpp/src/dual_simplex/crossover.cpp (3)
  • crossover (1041-1395)
  • crossover (1041-1046)
  • crossover (1399-1405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: conda-cpp-build / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: conda-cpp-build / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: conda-cpp-build / 12.9.1, 3.10, arm64, rockylinux8
  • GitHub Check: conda-cpp-build / 12.9.1, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-server / 12.9.1, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
🔇 Additional comments (3)
cpp/src/dual_simplex/branch_and_bound.cpp (3)

12-12: LGTM - includes and initialization for concurrent crossover.

The addition of crossover.hpp and headers, along with the root_crossover_soln_ member initialization, properly support the new concurrent root solve feature.

Also applies to: 28-28, 222-222


1239-1250: Sequential root solve path looks correct.

The non-concurrent path (lines 1244-1250) maintains the existing sequential solve behavior. The concurrent_halt pointer setup at line 1242 enables coordinated termination with crossover in the concurrent path.

Note: Based on past review comments, ensure that the concurrent_halt mechanism uses std::atomic with proper memory ordering, not volatile.


1271-1319: Concurrent crossover logic is well-structured.

The conditional logic correctly handles the two scenarios:

  • Lines 1271-1315: If crossover solution arrives first, crush it, run crossover, and use it if OPTIMAL; otherwise fall back to dual simplex
  • Lines 1316-1318: If dual simplex finishes first (concurrent_halt becomes 1), just use its result

The synchronization is correct:

  • Line 1308 waits for the async task to complete before modifying shared state (addresses past data race concern)
  • Line 1307 only halts dual simplex when crossover succeeds with OPTIMAL (per chris-maes's guidance)
  • Lines 1301-1303 avoid logging when crossover was halted by dual simplex (per chris-maes's guidance)

Based on learnings: "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."

Comment on lines +1252 to +1269
  } else {
    // Root node path
    std::future<lp_status_t> root_status_future;
    root_status_future = std::async(std::launch::async,
                                    &solve_linear_program_advanced<i_t, f_t>,
                                    std::ref(original_lp_),
                                    exploration_stats_.start_time,
                                    std::ref(lp_settings),
                                    std::ref(root_relax_soln_),
                                    std::ref(root_vstatus_),
                                    std::ref(edge_norms_));
    // Wait for the root relaxation solution to be sent by the diversity manager
    // or for dual simplex to finish
    while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
           *get_global_root_concurrent_halt() == 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      continue;
    }

⚠️ Potential issue | 🟠 Major

Add timeout protection to prevent indefinite wait.

The wait loop (lines 1265-1269) could spin indefinitely if both the dual simplex hangs AND the crossover solution never arrives, with no mechanism to escape. While the 1ms sleep prevents a hot busy-spin, there's no timeout to break the wait.

Consider adding a timeout check inside the loop:

     // Wait for the root relaxation solution to be sent by the diversity manager or dual simplex
     // to finish
+    f_t wait_start = tic();
     while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
            *get_global_root_concurrent_halt() == 0) {
+      if (toc(wait_start) > settings_.time_limit) {
+        settings_.log.printf("Timeout waiting for root relaxation\n");
+        solver_status_ = mip_exploration_status_t::TIME_LIMIT;
+        set_global_root_concurrent_halt(1);  // Signal dual simplex to stop
+        root_status = root_status_future.get();  // Wait for cleanup
+        return set_final_solution(solution, -inf);
+      }
       std::this_thread::sleep_for(std::chrono::milliseconds(1));
       continue;
     }

This ensures the solver cannot hang indefinitely and respects the time limit.

As per coding guidelines: "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.cpp around lines 1252-1269, the waiting
loop for root_crossover_solution_set_ can hang indefinitely; add a timeout guard
that measures elapsed time (e.g. steady_clock::now() -
exploration_stats_.start_time or another configured time limit) inside the while
loop and when the elapsed time exceeds the allowed limit break out and trigger a
safe abort path (set the global/manager halt flag or call the existing
concurrent halt setter and/or cancel/handle the root_status_future
appropriately), and log or return an error status so the solver stops rather
than spinning forever.

      // Check if crossover was stopped by dual simplex
      if (crossover_status == crossover_status_t::OPTIMAL) {
        set_global_root_concurrent_halt(1);  // Stop dual simplex
        root_status = root_status_future.get();

⚠️ Potential issue | 🟠 Major

Add timeout protection to future.get() calls to prevent indefinite blocking.

The root_status_future.get() calls at lines 1308, 1314, and 1317 will block indefinitely if the dual simplex solver hangs or encounters an infinite loop. This could cause the entire MIP solver to hang with no way to recover or respect the time limit.

Unfortunately, C++ standard futures don't support timed waits with std::launch::async. Consider one of these approaches:

Option 1: Use future.wait_for() with periodic checking:

while (root_status_future.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
  if (toc(exploration_stats_.start_time) > settings_.time_limit) {
    settings_.log.printf("Timeout waiting for dual simplex to complete\n");
    // Note: Cannot safely cancel std::async task, but can proceed with timeout status
    solver_status_ = mip_exploration_status_t::TIME_LIMIT;
    return set_final_solution(solution, -inf);
  }
}
root_status = root_status_future.get();

Option 2: Ensure dual simplex always checks concurrent_halt and time_limit internally so it cannot hang.

This protection should be added to all three .get() call sites.

As per coding guidelines: "Assess algorithmic complexity for large-scale problems; ensure O(n log n) or better complexity, not O(n²) or worse" and proper timeout/cancellation mechanisms.

Also applies to: 1314-1314, 1317-1317

🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.cpp around lines 1308, 1314 and 1317,
the calls to root_status_future.get() can block indefinitely; wrap each .get()
with a wait_for loop that periodically checks std::future_status::ready (e.g.,
every 50–200ms), and on each iteration verify exploration_stats_.start_time
against settings_.time_limit and concurrent halt flags; if the time limit or
halt is exceeded, log a timeout message, set solver_status_ =
mip_exploration_status_t::TIME_LIMIT (or appropriate halt status), and return
set_final_solution(solution, -inf) (noting you cannot cancel the async task),
otherwise break out and call .get() once ready; apply this pattern to all three
.get() call sites.

                          crossover_vstatus_);

      if (crossover_status != crossover_status_t::CONCURRENT_LIMIT) {
        settings_.log.printf("Crossover status: %d\n", crossover_status);
Contributor


I'm not sure we want to print the Crossover status all the time. Imagine the case where dual simplex wins. I don't think we should bother the user with the fact that we halted crossover. Maybe we should only print this when Crossover is optimal?

                        root_relax_soln_,
                        root_vstatus_,
                        edge_norms_);
  simplex_solver_settings_t lp_settings = settings_;
Contributor


Nit: You might consider pulling all of this out into a function. Perhaps something like concurrent_root_solve

Comment thread cpp/src/dual_simplex/branch_and_bound.hpp Outdated
Comment thread cpp/src/dual_simplex/phase2.cpp Outdated
Comment thread cpp/src/linear_programming/pdlp.cu Outdated
Contributor

@chris-maes chris-maes left a comment


Can you silence the log for the concurrent run?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/dual_simplex/presolve.cpp (1)

510-553: Avoid relying on equality_rows.size() − range_rows.size() invariant when sizing artificial vars

num_artificial_vars = equality_rows.size() - range_rows.size() assumes every range_rows[i] is also present in equality_rows. If that invariant is ever broken (e.g., new range-row encodings or future refactors), the loop over equality_rows can add more artificial columns than preallocated, causing j > num_cols and making the assert(j == num_cols)/assert(p == nnz) fail or, in release builds, leading to memory corruption.

You already build is_range_row, so you can size num_artificial_vars robustly by actually counting non-range equality rows instead of subtracting cardinalities:

-  const i_t n                   = problem.num_cols;
-  const i_t m                   = problem.num_rows;
-  const i_t num_artificial_vars = equality_rows.size() - range_rows.size();
+  const i_t n = problem.num_cols;
+  const i_t m = problem.num_rows;
+
+  std::vector<bool> is_range_row(problem.num_rows, false);
+  for (i_t i : range_rows) {
+    is_range_row[i] = true;
+  }
+
+  i_t num_artificial_vars = 0;
+  for (i_t i : equality_rows) {
+    if (!is_range_row[i]) { ++num_artificial_vars; }
+  }

(and then keep the existing loop that skips is_range_row[i]).

This keeps behavior identical under current assumptions but makes the function self-contained and safer against changes in how equality_rows/range_rows are built.

♻️ Duplicate comments (1)
cpp/src/linear_programming/solve.cu (1)

754-791: Status selection logic after concurrency is consistent with new inside‑MIP behavior

The updated selection logic:

  • Prefers dual simplex only when !settings.inside_mip and DS reports Optimal/PrimalInfeasible/DualInfeasible.
  • Otherwise prefers barrier when barrier is Optimal.
  • Otherwise prefers PDLP when PDLP is Optimal.
  • Outside MIP, falls back to dual simplex when PDLP ended with ConcurrentLimit.
  • In all other cases, returns PDLP’s status/solution.

Given the learning that the barrier solver currently never reports Infeasible/Unbounded, this branch ordering is consistent and avoids accidentally using dual simplex results inside MIP while still exploiting dual simplex outside MIP.

The copy_from(problem.handle_ptr, …) calls and logging use the same handle and are consistent with the rest of the file.

🧹 Nitpick comments (2)
cpp/src/dual_simplex/presolve.hpp (1)

153-159: crush_dual_solution return type change is consistent; consider documenting return semantics

The new f_t return type aligns with the implementation in presolve.cpp (returning the dual residual infinity norm) and is safe for existing callers that ignore the value. It would be helpful to document that the function returns ||Aᵀy + z − c||_∞ so new call sites can decide whether to assert/log on it.

cpp/src/mip/problem/problem.cuh (1)

209-213: New root‑relaxation callback member is fine; consider renaming for clarity

Adding a std::function callback for root‑relaxation data here is reasonable and consistent with the existing branch_and_bound_callback. The only nit is naming: a data member called set_root_relaxation_solution_callback reads like a setter method; something like root_relaxation_solution_callback (with a separate setter API if desired) would be clearer.

Assuming the constructors default this to an empty std::function (or nullptr) and all call sites check it before invoking, this change looks good.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 140a07f and 29ba82a.

📒 Files selected for processing (8)
  • benchmarks/linear_programming/run_mps_files.sh (1 hunks)
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1 hunks)
  • cpp/src/dual_simplex/presolve.cpp (8 hunks)
  • cpp/src/dual_simplex/presolve.hpp (1 hunks)
  • cpp/src/dual_simplex/simplex_solver_settings.hpp (1 hunks)
  • cpp/src/linear_programming/solve.cu (9 hunks)
  • cpp/src/mip/problem/problem.cu (2 hunks)
  • cpp/src/mip/problem/problem.cuh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/mip/problem/problem.cu
  • benchmarks/linear_programming/run_mps_files.sh
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Applied to files:

  • cpp/src/dual_simplex/presolve.hpp
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/src/dual_simplex/presolve.hpp
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/dual_simplex/presolve.hpp
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/mip/problem/problem.cuh
  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/dual_simplex/presolve.hpp
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover

Applied to files:

  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/problem/problem.cuh
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)

Applied to files:

  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/linear_programming/solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/src/linear_programming/solve.cu
🔇 Additional comments (7)
cpp/src/dual_simplex/presolve.cpp (3)

810-813: Barrier‑presolve gating of artificial variables looks reasonable; ensure downstream uses of new_slacks don’t rely on them in barrier mode

Skipping add_artifical_variables when settings.barrier_presolve is true is consistent with a barrier-first workflow, but it assumes:

  • Barrier presolve/solve never relies on new_slacks being populated for equality rows, and
  • All callers that later use new_slacks are on the non‑barrier‑presolve path.

Please double‑check those assumptions for the new concurrent/root‑solve flows.


1535-1541: Explicit instantiation updated correctly for new return type

The explicit instantiation of crush_dual_solution<int,double> now matches the new f_t return type and parameter list; no ABI or template mismatch issues here.


1240-1316: Range‑row‑aware dual crushing is consistent with primal transformation; verify all call sites honor row‑count invariant

The updated crush_dual_solution:

  • Correctly sets z[j] = y[i] for range rows (whose slack columns have A(i,j) = −1) and z[j] = −y[i] for non‑range slack columns (A(i,j) = +1), so Aᵀy + z = c still holds after introducing slacks.
  • Builds is_range_row from user_problem.range_rows, which matches how convert_range_rows creates the new slack columns.
  • Returns dual_res_inf = ||Aᵀy + z − c||_∞, which is consistent with the new return type and the header declaration.

Two points to verify:

  1. The assertion assert(user_problem.num_rows == problem.num_rows); is relied upon by the sign logic and indexing; confirm every call site passes a problem with the same row count as user_problem (never called after presolve row aggregation/removal).
  2. If the returned dual_res_inf is intended for use in concurrent/root‑solve plumbing, consider either checking it against a tolerance at call sites instead of asserting, or annotating the function [[nodiscard]] once callers are ready to consume it.
cpp/src/linear_programming/solve.cu (4)

538-539: Inside‑MIP hint into PDLP solver is wired cleanly

Passing settings.inside_mip down via solver.set_inside_mip(true) is a straightforward way to specialize PDLP behavior for root LP solves; no issues here as long as pdlp_solver_t uses this only for behavior toggles and not structural assumptions about the problem.


806-807: solve_lp_with_method correctly forwards to updated run_concurrent signature

The call now passes only problem, settings, timer, and is_batch_mode, which matches the updated run_concurrent signature (no op_problem parameter). This keeps the public API of solve_lp_with_method aligned with the refactoring.


911-912: solve_lp now delegates through solve_lp_with_method(problem, settings, …) consistently

The shift to auto solution = solve_lp_with_method(problem, settings, lp_timer, is_batch_mode); is consistent with the new solve_lp_with_method signature and keeps the main driver agnostic of the particular LP method (PDLP, dual simplex, barrier, or concurrent). No issues here.


670-744: Concurrent halt flag must use atomic synchronization, not volatile

The code shares global_concurrent_halt across PDLP, barrier, and dual simplex threads. In C++, volatile int provides no inter-thread synchronization guarantees; this is a data race. Replace with std::atomic<int> and use store()/load() with std::memory_order_relaxed:

std::atomic<int> global_concurrent_halt{0};
global_concurrent_halt.store(0, std::memory_order_relaxed);
settings_pdlp.concurrent_halt = reinterpret_cast<volatile int*>(&global_concurrent_halt);

If the downstream signatures can accept std::atomic<int>* directly, prefer that over the reinterpret_cast. Verify all uses of concurrent_halt in run_pdlp, run_barrier_thread, and related functions accept this change.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
cpp/src/dual_simplex/branch_and_bound.cpp (2)

1265-1269: Add timeout protection to the wait loop.

The wait loop has no timeout check: if the dual simplex task hangs and the crossover solution never arrives, it spins forever. The 1ms sleep prevents a hot busy-spin, but there is no mechanism to escape once the time limit is exceeded.

Add a timeout check inside the loop:

     // Wait for the root relaxation solution to be sent by the diversity manager or dual simplex
     // to finish
+    f_t wait_start = tic();
     while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
            *get_root_concurrent_halt() == 0) {
+      if (toc(wait_start) > settings_.time_limit) {
+        settings_.log.printf("Timeout waiting for root relaxation\n");
+        solver_status_ = mip_exploration_status_t::TIME_LIMIT;
+        set_root_concurrent_halt(1);
+        root_status_future.wait();  // Ensure task cleanup
+        return set_final_solution(solution, -inf);
+      }
       std::this_thread::sleep_for(std::chrono::milliseconds(1));
       continue;
     }

1306-1318: Protect future.get() calls with timeout checks.

The three root_status_future.get() calls (lines 1308, 1314, 1317) can block indefinitely if the dual simplex solver hangs, enters an infinite loop, or deadlocks. C++ standard futures don't support cancellation, so there's no way to forcibly terminate the async task.

Use wait_for() with periodic timeout checks:

       if (crossover_status == crossover_status_t::OPTIMAL) {
         set_root_concurrent_halt(1);  // Stop dual simplex
-        root_status = root_status_future.get();
+        // Wait with timeout protection
+        while (root_status_future.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
+          if (toc(exploration_stats_.start_time) > settings_.time_limit) {
+            settings_.log.printf("Timeout waiting for dual simplex to complete\n");
+            solver_status_ = mip_exploration_status_t::TIME_LIMIT;
+            return set_final_solution(solution, -inf);
+          }
+        }
+        root_status = root_status_future.get();
         // Override the root relaxation solution with the crossover solution

Apply similar protection to the .get() calls at lines 1314 and 1317.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29ba82a and b960154.

📒 Files selected for processing (3)
  • cpp/src/dual_simplex/branch_and_bound.cpp (4 hunks)
  • cpp/src/dual_simplex/branch_and_bound.hpp (4 hunks)
  • cpp/src/dual_simplex/phase2.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/dual_simplex/phase2.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{h,hpp,py}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings

Files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
🧠 Learnings (15)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-12-04T04:11:12.640Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/src/dual_simplex/scaling.cpp:68-76
Timestamp: 2025-12-04T04:11:12.640Z
Learning: In the cuOPT dual simplex solver, CSR/CSC matrices (including the quadratic objective matrix Q) are required to have valid dimensions and indices by construction. Runtime bounds checking in performance-critical paths like matrix scaling is avoided to prevent slowdowns. Validation is performed via debug-only check_matrix() calls wrapped in #ifdef CHECK_MATRIX.

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
  • cpp/src/dual_simplex/branch_and_bound.hpp
🔇 Additional comments (7)
cpp/src/dual_simplex/branch_and_bound.cpp (4)

12-12: Includes for concurrent crossover are appropriate.

The addition of crossover.hpp and <future> correctly supports the new concurrent root solve and crossover functionality.

Also applies to: 28-28


222-222: Constructor initialization is correct.

The root_crossover_soln_(1, 1) initialization follows the same pattern as root_relax_soln_ and will be properly resized when populated.


1239-1262: Concurrent root solve setup is mostly correct, but verify volatile int synchronization.

The async launch correctly uses std::ref to pass references to the dual simplex solver. However, note that get_root_concurrent_halt() returns a volatile int* (line 1242), which provides insufficient synchronization guarantees for multi-threaded access. This should be addressed in the header file where the member is declared.


1271-1319: Verify concurrent access to shared state is properly synchronized.

The algorithm launches dual simplex asynchronously (writing to root_relax_soln_, root_vstatus_, edge_norms_) and then on the main thread performs crossover operations. When crossover succeeds (line 1306), it halts dual simplex and overrides the solution. When crossover doesn't succeed (lines 1313-1318), it waits for and uses the dual simplex result.

The synchronization appears correct: future.get() blocks until the async task completes, ensuring all writes by the dual simplex are visible. However, verify that:

  1. The dual simplex solver respects the concurrent_halt flag promptly to avoid long blocking on line 1308
  2. No other threads are reading these shared members during this phase
  3. The crushing operations (lines 1272-1289) correctly map the external PDLP solution space to the original_lp_ problem space
cpp/src/dual_simplex/branch_and_bound.hpp (3)

18-18: Clarify the need for utilities/macros.cuh include.

This header includes a CUDA utility header (<utilities/macros.cuh>) in what appears to be a CPU-side branch-and-bound implementation. If this is only needed for certain macros, ensure it doesn't introduce unnecessary CUDA build dependencies or compilation requirements for CPU-only code paths.


82-98: Publish-subscribe pattern is correctly implemented.

The set_root_relaxation_solution method uses the standard publish-subscribe pattern: multiple non-atomic writes followed by an atomic store with memory_order_release (line 97). This ensures that all prior writes to root_crossover_soln_ and root_objective_ become visible to any thread that observes the flag via memory_order_acquire (as done in line 1271 of the .cpp file).

This is correct as long as:

  1. Only one thread ever calls this method (the PDLP worker)
  2. All readers check the atomic flag before accessing the data

164-164: New member variables for crossover handling are appropriate.

The additions of crossover_vstatus_ (line 164), root_crossover_soln_ (line 167), and the atomic flag root_crossover_solution_set_ (line 169) correctly support the concurrent crossover functionality. The atomic flag properly guards access to the non-atomic members via the publish-subscribe pattern.

Also applies to: 167-167, 169-169

// Set a solution based on the user problem during the course of the solve
void set_new_solution(const std::vector<f_t>& solution);

void set_concurrent_lp_root_solve(bool enable) { enable_concurrent_lp_root_solve_ = enable; }


⚠️ Potential issue | 🟠 Major

Ensure thread-safe access to enable_concurrent_lp_root_solve_ flag.

The enable_concurrent_lp_root_solve_ member (line 170) is a plain bool accessed via setter (line 103) and getter (line 114) without synchronization. If set_concurrent_lp_root_solve() could be called concurrently with enable_concurrent_lp_root_solve() or solve(), this creates a data race (undefined behavior).

Option 1 (Preferred): Make it atomic:

-  bool enable_concurrent_lp_root_solve_{false};
+  std::atomic<bool> enable_concurrent_lp_root_solve_{false};

Option 2: Document single-threaded initialization requirement:

+  // Must be called before solve() in single-threaded context
   void set_concurrent_lp_root_solve(bool enable) { enable_concurrent_lp_root_solve_ = enable; }

As per coding guidelines: "Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state."

Also applies to: 114-114, 170-170

Comment on lines +115 to +116
volatile int* get_root_concurrent_halt() { return &root_concurrent_halt_; }
void set_root_concurrent_halt(int value) { root_concurrent_halt_ = value; }


⚠️ Potential issue | 🔴 Critical

Replace volatile int with std::atomic for thread synchronization.

The root_concurrent_halt_ member (line 171) uses volatile int, which is incorrect for thread synchronization in C++. volatile only prevents compiler optimizations but provides no atomicity or memory ordering guarantees. This can lead to data races, lost writes, and non-deterministic behavior in concurrent code.

The pointer returned by get_root_concurrent_halt() (line 115) is passed to other components (e.g., line 1242 in the .cpp file) and dereferenced in multi-threaded contexts, making this a critical issue.

Fix: Replace with std::atomic<int> and update all access patterns:

- volatile int* get_root_concurrent_halt() { return &root_concurrent_halt_; }
- void set_root_concurrent_halt(int value) { root_concurrent_halt_ = value; }
+ std::atomic<int>* get_root_concurrent_halt() { return &root_concurrent_halt_; }
+ void set_root_concurrent_halt(int value) { root_concurrent_halt_.store(value, std::memory_order_release); }
- volatile int root_concurrent_halt_{0};
+ std::atomic<int> root_concurrent_halt_{0};

Then update all usages to use .load(std::memory_order_acquire) for reads and .store(val, std::memory_order_release) for writes.

As per coding guidelines: "Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state" and "Ensure race conditions are absent in multi-threaded server implementations."

Also applies to: 171-171

🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.hpp around lines 115-116 (and member at
line 171), replace the volatile int root_concurrent_halt_ with std::atomic<int>
(include <atomic>), change the getter to expose a pointer or reference to
std::atomic<int> (e.g., std::atomic<int>* or std::atomic<int>&) instead of
volatile int*, and update the setter to use .store(value,
std::memory_order_release); then update all call sites that read/write this
variable to use .load(std::memory_order_acquire) for reads and .store(...,
std::memory_order_release) for writes to ensure proper atomicity and memory
ordering.

@hlinsen
Contributor Author

hlinsen commented Dec 9, 2025

/ok to test cc0b3b8

Contributor

@chris-maes chris-maes left a comment


LGTM. Thanks Hugo! This is exciting.

@hlinsen
Contributor Author

hlinsen commented Dec 9, 2025

/merge

@rapids-bot rapids-bot bot merged commit 465f89f into NVIDIA:release/25.12 Dec 9, 2025
181 of 183 checks passed

Labels

feature request New feature or request non-breaking Introduces a non-breaking change


7 participants