[WIP] Unified threading model in MIP solver#1099

Draft
nguidotti wants to merge 17 commits into NVIDIA:main from nguidotti:omp-migration

Conversation

@nguidotti
Contributor

In this PR, we migrate almost all parts of the MIP solver from std::thread to OpenMP (specifically, OpenMP's tasking model). The only exceptions are the Papilo presolver, which uses Intel TBB, and the LP solver.

More specifically, this PR

  • Solves the CPU oversubscription problem. The solver now respects the number of threads set by the user, with the exception of Papilo and the threads created by the CUDA runtime.
  • Removes the overhead of creating and destroying std::thread.
  • Migrates RINS from std::thread to an omp task. As before, only one instance of RINS can run at a time.
  • Migrates CPU FJ from std::thread to omp tasks. There are a few limitations:
    • scratch_cpu_fj_on_lp_opt and scratch_cpu_fj run for the entire program. This essentially dedicates two threads to these functions, while the other routines share the remaining CPU resources. This may hurt performance on low-core-count CPUs.
    • Since there is a small delay between a task's creation and its start (the threads may be busy), the GPU FJ may finish before the CPU FJ even starts when racing.
  • Migrates early FJ to an omp task.
  • Creates only a single parallel region at the beginning of the solver, so it can be shared across the MIP solver.
  • Eliminates cpu_worker_thread and other redundant code.
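The single-parallel-region pattern described above can be sketched as follows. This is an illustrative reduction, not code from this PR: the name run_solver_tasks is invented, and the squaring loop stands in for heuristic and B&B work.

```cpp
#include <cassert>
#include <vector>

// One team is created up front; the master thread emits tasks and the whole
// team executes them, so the process never exceeds the team size.
int run_solver_tasks(int n)
{
  std::vector<int> results(n, 0);
#pragma omp parallel default(none) shared(results, n)
  {
#pragma omp master
    {
      for (int i = 0; i < n; ++i) {
#pragma omp task default(none) firstprivate(i) shared(results)
        results[i] = i * i;  // stand-in for a heuristic / B&B task
      }
#pragma omp taskwait  // join all tasks before leaving the region
    }
  }
  int sum = 0;
  for (int v : results) sum += v;
  return sum;
}
```

Compiled without OpenMP, the pragmas are ignored and the same work runs serially, so the pattern degrades gracefully.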

Pending: full MIPLIB run and performance tuning.

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
… shares the same thread pool.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti nguidotti changed the title [WIP] Unified threading model in MIP solver#820 [WIP] Unified threading model in MIP solver Apr 14, 2026
@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

The PR refactors parallelism infrastructure from thread-based (std::thread) to OpenMP task-based concurrency across branch-and-bound, heuristics, presolve, and local search components. The cpu_worker_thread_base_t CRTP infrastructure is removed entirely, replaced by OpenMP task directives with explicit default(none) data-sharing clauses throughout.

Changes

Cohort / File(s) Summary
Branch-and-bound core
cpp/src/branch_and_bound/branch_and_bound.cpp, cpp/src/branch_and_bound/pseudo_costs.cpp
Refactored task affinity and data-sharing in branch-and-bound coordination; replaced std::future-based async execution with OpenMP tasks; tightened variable scoping with default(none) and explicit firstprivate/shared clauses; replaced future.get() synchronization with #pragma omp taskwait.
RINS thread replacement
cpp/src/mip_heuristics/diversity/lns/rins.cu, cpp/src/mip_heuristics/diversity/lns/rins.cuh
Removed rins_thread_t worker-thread infrastructure; replaced thread lifecycle with OpenMP task execution via #pragma omp task; converted mutex/atomic synchronization to omp_atomic_t and task dependencies; removed public stop_rins() method; updated PDLP and DS branching to use task-based execution with explicit data-sharing.
Feasibility jump CPU solver refactoring
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu, cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh, cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cu, cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cuh, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
Removed cpu_fj_thread_t wrapper and replaced thread-based CPU FJ solving with standalone cpufj_solve() function called via OpenMP tasks; changed solver loop signature from returning bool to void with pointer parameter; removed public cpu_solve() method; updated internal member types from cpu_fj_thread_t to fj_cpu_climber_t.
Local search CPU FJ integration
cpp/src/mip_heuristics/local_search/local_search.cu, cpp/src/mip_heuristics/local_search/local_search.cuh
Replaced cpu_fj_thread_t with direct fj_cpu_climber_t pointers in vectors; switched CPU solver orchestration from explicit thread start/stop to OpenMP task-based execution; added instance members omp_atomic_t<f_t> local_search_best_obj and population_t* pop_ptr for shared state; refactored improvement callback access patterns.
Presolve threading configuration
cpp/src/mip_heuristics/presolve/bounds_presolve.cuh, cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu, cpp/src/mip_heuristics/presolve/probing_cache.cu
Changed concurrency model from thread-based (num_threads) to task-based (num_tasks); converted #pragma omp parallel for to #pragma omp taskloop with explicit default(none) clauses; refactored per-worker pool sizing and indexing to use task indices instead of thread IDs; removed thread-count-based parallelism configuration.
Diversity and top-level solver
cpp/src/mip_heuristics/diversity/diversity_manager.cu, cpp/src/mip_heuristics/solve.cu, cpp/src/mip_heuristics/solver.cu
Removed early-exit calls to stop_rins(); refactored branch-and-bound launch from std::async/std::future to OpenMP taskgroup with task-based execution; added OpenMP master region setup with nested parallelism depth configuration; simplified thread derivation logic for B&B threading; replaced future-based synchronization with task dependencies.
Infrastructure removal
cpp/src/mip_heuristics/utilities/cpu_worker_thread.cuh
Removed entire cpu_worker_thread_base_t<Derived> CRTP base class including lifecycle management (start_cpu_solver(), wait_for_cpu_solver(), request_termination()), threading primitives (std::thread, std::mutex, std::condition_variable), and state atomics.
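The parallel-for to taskloop conversion described for the presolve cohort can be illustrated with a small self-contained sketch (all names here are invented, not cuOpt code; summing coefficients stands in for per-constraint strengthening):

```cpp
#include <cassert>
#include <vector>

// A taskloop with explicit data-sharing, as the PR does for presolve loops.
// n_tasks plays the role of the new task-based parallelism configuration.
int strengthen_bounds(const std::vector<int>& coeffs, int n_tasks)
{
  int total = 0;
  const int n = static_cast<int>(coeffs.size());
#pragma omp parallel default(none) shared(coeffs, total) firstprivate(n, n_tasks)
#pragma omp master
#pragma omp taskloop default(none) num_tasks(n_tasks) firstprivate(n) shared(coeffs, total)
  for (int i = 0; i < n; ++i) {
#pragma omp atomic
    total += coeffs[i];  // stand-in for per-constraint work
  }
  return total;
}
```

Unlike `#pragma omp parallel for`, the taskloop shares the enclosing team, which is what lets the single solver-wide parallel region serve every component.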

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title '[WIP] Unified threading model in MIP solver' accurately describes the main objective: migrating the MIP solver from std::thread to OpenMP's tasking model.
Description check ✅ Passed The PR description is directly related to the changeset, providing detailed context about the threading model migration from std::thread to OpenMP across multiple solver components.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/mip_heuristics/diversity/lns/rins.cu (1)

102-105: ⚠️ Potential issue | 🟠 Major

Reset launch_new_task with a scope guard instead of scattered manual resets before each return.

The early return at Line 104 leaves launch_new_task == false, so one transient !dm.population.is_feasible() result can permanently disable all future RINS launches.

💡 Suggested fix
 void rins_t<i_t, f_t>::run_rins()
 {
   raft::common::nvtx::range fun_scope("Running RINS");
+
+  struct launch_flag_guard_t {
+    rins_t* self;
+    ~launch_flag_guard_t() { self->launch_new_task = true; }
+  } launch_flag_guard{this};
 
   RAFT_CUDA_TRY(cudaSetDevice(context.handle_ptr->get_device()));
@@
   {
     std::lock_guard<std::recursive_mutex> lock(dm.population.write_mutex);
     if (!dm.population.is_feasible()) return;
@@
-  if (!best_sol.get_feasible()) {
-    launch_new_task = true;
-    return;
-  }
+  if (!best_sol.get_feasible()) { return; }
@@
   if (fractional_ratio < settings.min_fractional_ratio) {
     CUOPT_LOG_TRACE("RINS fractional ratio too low, aborting");
-    launch_new_task = true;
     return;
   }
@@
   if (n_to_fix == 0) {
     CUOPT_LOG_DEBUG("RINS no variables to fix");
-    launch_new_task = true;
     return;
   }
@@
-  launch_new_task = true;
 }

Also applies to: 118-121, 143-146, 168-171, 343-343

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/lns/rins.cu` around lines 102 - 105, The
code currently does an early return when dm.population.is_feasible() is false
which leaves the global/outer flag launch_new_task permanently false; wrap the
critical section with a RAII-style guard that saves the current launch_new_task
value and restores it on scope exit (or use std::scope_exit) so any early return
will reset launch_new_task back to its prior value; apply this pattern around
the blocks that check dm.population.is_feasible() (the locations using
std::lock_guard<std::recursive_mutex> lock(dm.population.write_mutex) and the
subsequent checks) so launch_new_task is restored even when returning early.
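For reference, a minimal runnable version of the suggested guard, with invented names standing in for the rins_t members:

```cpp
#include <cassert>

// RAII guard: the flag is restored on every exit path, including early returns.
struct launch_flag_guard_t {
  bool* flag;
  ~launch_flag_guard_t() { *flag = true; }
};

// Models one run_rins() invocation; `feasible` stands in for the
// dm.population.is_feasible() check that triggers the early return.
bool flag_after_early_return(bool feasible)
{
  bool launch_new_task = false;  // task is now running
  {
    launch_flag_guard_t guard{&launch_new_task};
    if (!feasible) { /* early return path: guard still fires on scope exit */ }
    // ... rest of the RINS body would go here ...
  }
  return launch_new_task;  // true again regardless of which path exited
}
```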
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1054-1139: The OpenMP task/taskloop regions in strong_branching()
(invoking batch_pdlp_strong_branching_task and strong_branch_helper) assume an
active parallel team; wrap the task-emitting block so it always executes inside
a parallel single region when not already in a parallel region: check
omp_in_parallel() and, if false, create a local #pragma omp parallel { #pragma
omp single nowait { ... } } around the task/taskloop code (preserving
firstprivate/shared clauses and concurrent_halt, pc, sb_view, etc.); otherwise
leave the existing task emission unchanged. Ensure the same guard or local
parallel/single scaffold is applied to reliable_variable_selection()'s analogous
task regions so tasks don’t silently run serially.
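A runnable sketch of the suggested omp_in_parallel() scaffold (names invented; summing integers stands in for the strong-branching work):

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#else
static int omp_in_parallel() { return 0; }  // serial fallback when OpenMP is off
#endif

// Emits one task per item; must run inside an active team to parallelize.
static int emit_sum_tasks(int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i) {
#pragma omp task default(none) firstprivate(i) shared(sum)
    {
#pragma omp atomic
      sum += i;
    }
  }
#pragma omp taskwait
  return sum;
}

int strong_branch_sum(int n)
{
  if (omp_in_parallel()) {
    return emit_sum_tasks(n);  // already inside a team: emit tasks directly
  }
  // Otherwise build a local parallel/single scaffold so tasks don't run serially.
  int result = 0;
#pragma omp parallel default(none) shared(result) firstprivate(n)
#pragma omp single nowait
  result = emit_sum_tasks(n);
  return result;
}
```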

In `@cpp/src/mip_heuristics/diversity/lns/rins.cu`:
- Around line 84-87: The cudaSetDevice call is incorrectly guarded by if
(total_calls == 0) so threads other than the first may not set the correct
device; remove that guard and call
RAFT_CUDA_TRY(cudaSetDevice(context.handle_ptr->get_device())) unconditionally
at the start of the RINS execution (near fun_scope) so every invocation/thread
sets the proper CUDA device (refer to total_calls and the
RAFT_CUDA_TRY(cudaSetDevice(...)) call).

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1423-1436: The code builds time_limit by casting in_time_limit to
int then to milliseconds before checking for infinity, which can overflow when
in_time_limit == infinity(); modify cpufj_solve() to test
std::isfinite(in_time_limit) (or in_time_limit < infinity()) first and only
construct time_limit when finite (e.g., assign
std::optional<std::chrono::milliseconds> time_limit or a boolean
has_time_limit), then in the loop replace the current comparison with a guarded
check using has_time_limit and time_limit against now - loop_time_start; update
references to time_limit, loop_time_start and the in_time_limit conversion
accordingly to avoid the int cast on infinity.

In `@cpp/src/mip_heuristics/local_search/local_search.cu`:
- Around line 247-252: The OpenMP taskloop uses num_tasks(ls_cpu_fj.size())
which is invalid when ls_cpu_fj.size()==0 (e.g., num_cpufj_threads == 0); guard
the pragma by checking that ls_cpu_fj is non-empty (or num_cpufj_threads > 0)
before emitting the #pragma omp taskgroup / #pragma omp taskloop block so the
num_tasks clause is only evaluated for a positive task count; locate the block
around ls_cpu_fj and cpufj_solve and wrap the pragmas and for-loop in an if
(ls_cpu_fj.empty() == false) (or equivalent) conditional to skip the pragmas
when there are zero CPUFJ threads.
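A runnable sketch of the suggested emptiness guard (invented names; incrementing a counter stands in for launching cpufj_solve on each climber):

```cpp
#include <cassert>
#include <vector>

// The taskloop (and its num_tasks clause) is only emitted when there is at
// least one climber, since num_tasks() requires a positive value.
int run_climbers(const std::vector<int>& climbers)
{
  int launched = 0;
  if (!climbers.empty()) {
    const int n = static_cast<int>(climbers.size());
#pragma omp parallel default(none) shared(launched) firstprivate(n)
#pragma omp master
#pragma omp taskloop default(none) num_tasks(n) firstprivate(n) shared(launched)
    for (int i = 0; i < n; ++i) {
#pragma omp atomic
      ++launched;  // stand-in for cpufj_solve(...) on climber i
    }
  }
  return launched;
}
```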

In `@cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu`:
- Around line 249-251: The OpenMP taskloop with default(none) uses
problem.n_constraints in the loop bound but never declares it in the
data-sharing clause; hoist the bound into a local int (e.g., int n_constraints =
problem.n_constraints;) and change the for-loop to use n_constraints, then add
that local to the pragma as firstprivate(n_constraints) so the loop bound is
explicitly available to each task (update the pragma on the taskloop that
currently names cnstr_pair and shared(offsets, variables, reverse_offsets,
reverse_constraints, constraint_pairs_h)).

In `@cpp/src/mip_heuristics/presolve/probing_cache.cu`:
- Around line 913-925: The loop computes chunk offsets with begin/end relative
to the window [step_start, step_end) but then indexes priority_indices using i
directly and logs using an undefined id; fix by adding the step_start offset
when computing the index (use priority_indices[step_start + i] or adjust
begin/end to be absolute) and replace the undefined log identifier in the
CUOPT_LOG_TRACE call with the correct thread/task identifier (use task_id or
another existing variable such as multi_probe_presolve_pool index) so the code
compiles and processes the correct slice without reprocessing the prefix.
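The indexing fix can be illustrated with a small self-contained model of the chunking (invented names; a serial loop over task_id stands in for the real task-parallel execution):

```cpp
#include <cassert>
#include <vector>

// Chunk offsets begin/end are relative to the window [step_start, step_end),
// so the absolute index into priority_indices must add step_start.
std::vector<int> process_window(const std::vector<int>& priority_indices,
                                int step_start, int step_end, int num_tasks)
{
  std::vector<int> processed;
  const int window = step_end - step_start;
  for (int task_id = 0; task_id < num_tasks; ++task_id) {
    const int begin = task_id * window / num_tasks;
    const int end   = (task_id + 1) * window / num_tasks;
    for (int i = begin; i < end; ++i) {
      processed.push_back(priority_indices[step_start + i]);  // absolute index
    }
  }
  return processed;
}
```

Indexing with `i` alone would reprocess the prefix [0, window) instead of the intended slice.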

In `@cpp/src/mip_heuristics/solve.cu`:
- Around line 577-586: The code currently forces any explicit
settings_const.num_cpu_threads value below 4 up to 4 and logs an error; instead
preserve the user's explicit cap: when settings_const.num_cpu_threads >= 0 set
num_threads = settings_const.num_cpu_threads (or at minimum 1 if you want to
guard against zero/negative inputs), but do not silently raise 1..3 to 4; if you
still want to inform users about potentially low thread counts emit a warning
via CUOPT_LOG_ERROR or CUOPT_LOG_WARN referencing settings_const.num_cpu_threads
and num_threads, and leave the omp_get_max_threads() branch unchanged so
omp_get_max_threads() continues to control threads when the setting is negative.
- Around line 588-590: The call to omp_set_max_active_levels(2) in solve_mip()
changes global OpenMP state and must be restored before returning; save the
previous value (e.g., int prev = omp_get_max_active_levels()) immediately before
calling omp_set_max_active_levels(2) and restore it after the nested parallel
region or at function exit (ensure restoration on all return paths and
exceptions), or encapsulate this logic in a small RAII-style helper so
solve_mip() leaves OpenMP nesting levels unchanged for callers.
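The num_cpu_threads suggestion above can be sketched with an invented helper; this is not the actual cuOpt logic, just the proposed decision rule:

```cpp
#include <cassert>

// Honor an explicit user cap (even 1-3 threads) instead of silently raising
// it; fall back to the runtime maximum when the setting is negative/unset.
int derive_num_threads(int user_setting, int hardware_max)
{
  if (user_setting >= 1) return user_setting;  // explicit cap is preserved
  return hardware_max;  // e.g., omp_get_max_threads() in the real code
}
```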

In `@cpp/src/mip_heuristics/solver.cu`:
- Around line 452-459: The omp task calling branch_and_bound->solve(...) can
throw and will cause std::terminate if the exception escapes the task; capture
exceptions inside the task into a std::exception_ptr (e.g., named bb_exception)
instead of letting them propagate, set branch_and_bound_status and
branch_and_bound_solution as before, then after the #pragma omp taskgroup
completes check if bb_exception is non-null and rethrow it
(std::rethrow_exception) so outer solve_mip() handlers can catch it; reference
the omp task that assigns branch_and_bound_status and uses
branch_and_bound_solution and branch_and_bound->solve to add this
capture-and-rethrow behavior.

---

Outside diff comments:
In `@cpp/src/mip_heuristics/diversity/lns/rins.cu`:
- Around line 102-105: The code currently does an early return when
dm.population.is_feasible() is false which leaves the global/outer flag
launch_new_task permanently false; wrap the critical section with a RAII-style
guard that saves the current launch_new_task value and restores it on scope exit
(or use std::scope_exit) so any early return will reset launch_new_task back to
its prior value; apply this pattern around the blocks that check
dm.population.is_feasible() (the locations using
std::lock_guard<std::recursive_mutex> lock(dm.population.write_mutex) and the
subsequent checks) so launch_new_task is restored even when returning early.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: fca14163-5249-40cb-87eb-077397db18d0

📥 Commits

Reviewing files that changed from the base of the PR and between b497a5b and 1bb5d10.

📒 Files selected for processing (18)
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/lns/rins.cu
  • cpp/src/mip_heuristics/diversity/lns/rins.cuh
  • cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cu
  • cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cuh
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
  • cpp/src/mip_heuristics/local_search/local_search.cu
  • cpp/src/mip_heuristics/local_search/local_search.cuh
  • cpp/src/mip_heuristics/presolve/bounds_presolve.cuh
  • cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu
  • cpp/src/mip_heuristics/presolve/probing_cache.cu
  • cpp/src/mip_heuristics/solve.cu
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/mip_heuristics/utilities/cpu_worker_thread.cuh
💤 Files with no reviewable changes (3)
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
  • cpp/src/mip_heuristics/utilities/cpu_worker_thread.cuh

Comment thread cpp/src/branch_and_bound/pseudo_costs.cpp
Comment thread cpp/src/mip_heuristics/diversity/lns/rins.cu Outdated
Comment on lines 1423 to 1436
  auto loop_start      = std::chrono::high_resolution_clock::now();
  auto time_limit      = std::chrono::milliseconds((int)(in_time_limit * 1000));
  auto loop_time_start = std::chrono::high_resolution_clock::now();

  // Initialize feature tracking
- fj_cpu.last_feature_log_time = loop_start;
- fj_cpu.prev_best_objective = fj_cpu.h_best_objective;
- fj_cpu.iterations_since_best = 0;
+ fj_cpu->last_feature_log_time = loop_start;
+ fj_cpu->prev_best_objective = fj_cpu->h_best_objective;
+ fj_cpu->iterations_since_best = 0;

- while (!fj_cpu.halted && !fj_cpu.preemption_flag.load()) {
+ while (!fj_cpu->halted && !fj_cpu->preemption_flag.load()) {
    // Check if 5 seconds have passed
    auto now = std::chrono::high_resolution_clock::now();
    if (in_time_limit < std::numeric_limits<f_t>::infinity() &&
        now - loop_time_start > time_limit) {


⚠️ Potential issue | 🔴 Critical

Guard the infinite time-limit case before building the duration.

cpufj_solve() defaults in_time_limit to infinity(), but Line 1424 converts it to int first. That can overflow before the later finite check and produce a bogus timeout.

💡 Suggested fix
   i_t local_mins       = 0;
   auto loop_start      = std::chrono::high_resolution_clock::now();
-  auto time_limit      = std::chrono::milliseconds((int)(in_time_limit * 1000));
+  const bool has_time_limit = std::isfinite(in_time_limit);
+  auto time_limit           = std::chrono::milliseconds::zero();
+  if (has_time_limit) {
+    time_limit = std::chrono::milliseconds(
+      static_cast<int64_t>(std::ceil(in_time_limit * 1000)));
+  }
   auto loop_time_start = std::chrono::high_resolution_clock::now();
@@
-    if (in_time_limit < std::numeric_limits<f_t>::infinity() &&
-        now - loop_time_start > time_limit) {
+    if (has_time_limit && now - loop_time_start > time_limit) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1423 - 1436,
The code builds time_limit by casting in_time_limit to int then to milliseconds
before checking for infinity, which can overflow when in_time_limit ==
infinity(); modify cpufj_solve() to test std::isfinite(in_time_limit) (or
in_time_limit < infinity()) first and only construct time_limit when finite
(e.g., assign std::optional<std::chrono::milliseconds> time_limit or a boolean
has_time_limit), then in the loop replace the current comparison with a guarded
check using has_time_limit and time_limit against now - loop_time_start; update
references to time_limit, loop_time_start and the in_time_limit conversion
accordingly to avoid the int cast on infinity.
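A runnable reduction of the suggested guard (invented helper name; f_t replaced by double): the duration is only built when the limit is finite, so an infinite default can never overflow the integer cast.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <limits>

// Returns true when a finite time limit has been exceeded.
bool hit_time_limit(double in_time_limit, std::chrono::milliseconds elapsed)
{
  const bool has_time_limit = std::isfinite(in_time_limit);
  if (!has_time_limit) return false;  // never build a duration from infinity
  const auto limit = std::chrono::milliseconds(
      static_cast<int64_t>(std::ceil(in_time_limit * 1000)));
  return elapsed > limit;
}
```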

Comment thread cpp/src/mip_heuristics/local_search/local_search.cu
Comment thread cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu Outdated
Comment thread cpp/src/mip_heuristics/presolve/probing_cache.cu Outdated
Comment thread cpp/src/mip_heuristics/solve.cu
Comment on lines +588 to +590
// TODO: Remove this after converting deterministic B&B to use tasks. This allows
// creating a nested parallel region.
omp_set_max_active_levels(2);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is omp_set_max_active_levels process-wide for the OpenMP runtime, and does the setting persist after the current parallel region exits?

💡 Result:

No, omp_set_max_active_levels is not process-wide for the OpenMP runtime. It sets the max-active-levels-var ICV, which has data environment scope (one copy per data environment) in OpenMP 5.1, or per device in OpenMP 5.0 (with one host device). The setting affects parallel regions generated by the binding task and is inherited by child tasks/data environments created from it. Yes, the setting persists after the current parallel region exits, as ICVs with data environment scope are inherited by enclosed tasks and parallel regions, and the routine only has defined effect from sequential parts (where it sets the initial task's data environment), remaining effective for subsequent regions.


Restore the caller's OpenMP nesting setting.

omp_set_max_active_levels(2) modifies OpenMP runtime state that persists after solve_mip() returns, affecting subsequent parallel regions in the same data environment. This can unintentionally constrain nested parallelism in unrelated host code calling this library function.

💡 Suggested fix
+  const int previous_max_active_levels = omp_get_max_active_levels();
   omp_set_max_active_levels(2);
 
   //
   mip_solution_t<i_t, f_t> sol(mip_termination_status_t::NoTermination,
                                solver_stats_t<i_t, f_t>{},
                                op_problem.get_handle_ptr()->get_stream());
@@
 #pragma omp parallel num_threads(num_threads) default(none) \
   shared(sol, op_problem, settings_const, exception)
   {
 #pragma omp master
     {
@@
     }
   }
 
+  omp_set_max_active_levels(previous_max_active_levels);
   if (exception) { std::rethrow_exception(exception); }
   return sol;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solve.cu` around lines 588 - 590, The call to
omp_set_max_active_levels(2) in solve_mip() changes global OpenMP state and must
be restored before returning; save the previous value (e.g., int prev =
omp_get_max_active_levels()) immediately before calling
omp_set_max_active_levels(2) and restore it after the nested parallel region or
at function exit (ensure restoration on all return paths and exceptions), or
encapsulate this logic in a small RAII-style helper so solve_mip() leaves OpenMP
nesting levels unchanged for callers.
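The save/restore suggestion lends itself to a small RAII helper; this is an illustrative sketch, not the PR's code, and it degrades to a no-op when OpenMP is disabled:

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Saves the caller's max-active-levels setting and restores it on scope exit,
// on all return paths and exceptions.
struct scoped_max_active_levels_t {
#ifdef _OPENMP
  int previous;
  explicit scoped_max_active_levels_t(int levels)
      : previous(omp_get_max_active_levels())
  {
    omp_set_max_active_levels(levels);
  }
  ~scoped_max_active_levels_t() { omp_set_max_active_levels(previous); }
#else
  explicit scoped_max_active_levels_t(int) {}  // no-op without OpenMP
#endif
};

int nesting_demo()
{
  scoped_max_active_levels_t guard(2);  // allow one nested level, as in the PR
  return 2;  // caller's setting is restored when guard goes out of scope
}
```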

Comment on lines +452 to +459
#pragma omp taskgroup
{
if (!context.settings.heuristics_only) {
#pragma omp task default(none) \
shared(branch_and_bound, branch_and_bound_solution, branch_and_bound_status)
{
branch_and_bound_status = branch_and_bound->solve(branch_and_bound_solution);
}

@coderabbitai coderabbitai bot Apr 14, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In OpenMP C++, what happens if a C++ exception escapes an omp task, and what is the recommended pattern for propagating it back to the parent task?

💡 Result:

If a C++ exception escapes an OpenMP omp task (task region), it violates the OpenMP specification. The specification requires that a throw executed inside a task region must cause execution to resume within the same task region, caught by the same thread that threw it. Allowing the exception to escape leads to undefined behavior; in practice, implementations like GCC insert implicit try-catch blocks that call std::terminate, crashing the program. The recommended pattern to propagate an exception from a child task back to the parent task is to catch it within the child task, store it in a shared std::exception_ptr (using std::current_exception), protected by synchronization like #pragma omp critical, and then use taskwait in the parent task to synchronize before checking and rethrowing the stored exception with std::rethrow_exception. Example pattern: std::exception_ptr task_exception = nullptr; auto task_work = [&] { try { /* code that might throw */ } catch (...) { #pragma omp critical { if (!task_exception) task_exception = std::current_exception(); } } }; #pragma omp task shared(task_exception) /* or firstprivate if needed */ { task_work(); } #pragma omp taskwait if (task_exception) std::rethrow_exception(task_exception); This ensures the exception is caught in the task, propagated safely to the parent, and rethrown after synchronization. For multiple tasks, the critical section ensures only one exception is stored (others are discarded), or use a container for all.


🏁 Script executed:

cd /tmp && find . -name "solver.cu" -path "*/mip_heuristics/*" 2>/dev/null | head -5

Repository: NVIDIA/cuopt



🏁 Script executed:

# First, let's find the file
fd -t f "solver.cu" | grep mip_heuristics



🏁 Script executed:

# Get more context around the specified lines
git ls-files | grep -E "mip_heuristics.*solver\.cu"



🏁 Script executed:

# Check if the file exists and read the relevant section
if [ -f "cpp/src/mip_heuristics/solver.cu" ]; then
  wc -l cpp/src/mip_heuristics/solver.cu
  echo "---"
  sed -n '440,475p' cpp/src/mip_heuristics/solver.cu
fi



🏁 Script executed:

# Also check for exception handling patterns in the file
rg "try|catch|\#pragma omp" cpp/src/mip_heuristics/solver.cu -B 2 -A 2



🏁 Script executed:

# Check the entire run_solver function for exception handling and context
rg "run_solver|try|catch" cpp/src/mip_heuristics/solver.cu -B 3 -A 3



🏁 Script executed:

# Look for solve_mip function and its exception handling
fd -t f solve.cu | grep -E "mip_heuristics|src" | head -3



🏁 Script executed:

# Check git history or comments about the change
git log --oneline -20 -- cpp/src/mip_heuristics/solver.cu 2>/dev/null | head -10



🏁 Script executed:

# Check if there's documented synchronization with the task
sed -n '440,490p' cpp/src/mip_heuristics/solver.cu



🏁 Script executed:

# Look at the full run_solver function to understand its structure
sed -n '/template.*run_solver/,/^}/p' cpp/src/mip_heuristics/solver.cu | head -100



🏁 Script executed:

# Check if branch_and_bound->solve() is defined and what it returns
rg "solve\(" cpp/src/mip_heuristics/ -A 2 | grep -E "solve|throw|exception" | head -20



🏁 Script executed:

# Check if there's exception handling at higher levels
rg "solve_mip|try|catch" cpp/src/mip_heuristics/ -B 2 -A 2 | head -40



🏁 Script executed:

# Look for the mip_status_t enum to understand error reporting
rg "mip_status_t" cpp/src/mip_heuristics/solver.cu -B 3 -A 3



🏁 Script executed:

# Check the branch_and_bound->solve() implementation to see if it can throw
rg "class.*branch_and_bound|solve\(\)" cpp/src/ --type cpp --type hpp -B 2 -A 5 | head -60



🏁 Script executed:

# Look for exception declarations in branch_and_bound or related headers
find cpp/src -name "*.hpp" -o -name "*.h" | xargs grep -l "branch_and_bound\|dual_simplex" | head -5



🏁 Script executed:

# Check solve.cu to understand the broader exception handling context
sed -n '1,80p' cpp/src/mip_heuristics/solve.cu



🏁 Script executed:

# Search for where mip_solver_t::run_solver is called and whether it's wrapped in try-catch
rg "run_solver\(\)" cpp/src/mip_heuristics/ -B 5 -A 5



🏁 Script executed:

# Check the context around where run_solver is called in solve.cu
sed -n '380,450p' cpp/src/mip_heuristics/solve.cu



🏁 Script executed:

# Look for try-catch blocks in solve.cu around the solver instantiation
rg "try|catch" cpp/src/mip_heuristics/solve.cu -B 5 -A 5



🏁 Script executed:

# Verify the entire solve function to understand exception handling scope
grep -n "^solve_mip\|^static\|^template" cpp/src/mip_heuristics/solve.cu | head -20



Catch and rethrow exceptions from the B&B task to prevent silent termination.

If branch_and_bound->solve() throws, the exception will not propagate through the OpenMP task boundary. Per the OpenMP specification, exceptions escaping an omp task region cause std::terminate to be called, resulting in process termination instead of reaching the outer solve_mip() exception handlers.

Use std::exception_ptr to capture and rethrow the exception after the taskgroup completes:

Suggested fix
+  std::exception_ptr bb_exception;
+
 #pragma omp taskgroup
   {
     if (!context.settings.heuristics_only) {
 #pragma omp task default(none) \
-  shared(branch_and_bound, branch_and_bound_solution, branch_and_bound_status)
+  shared(branch_and_bound, branch_and_bound_solution, branch_and_bound_status, bb_exception)
       {
-        branch_and_bound_status = branch_and_bound->solve(branch_and_bound_solution);
+        try {
+          branch_and_bound_status = branch_and_bound->solve(branch_and_bound_solution);
+        } catch (...) {
+          bb_exception = std::current_exception();
+        }
       }
     }
 
     // Start the primal heuristics
     context.diversity_manager_ptr = &dm;
     sol                           = dm.run_solver();
   }
+
+  if (bb_exception) { std::rethrow_exception(bb_exception); }

Contributor Author

As far as I am aware, B&B does not throw exceptions, so a try-catch block is not necessary here.


@nguidotti, understood — thanks for the clarification. Since branch_and_bound->solve() is guaranteed not to throw, the defensive try/catch wrapper is unnecessary here.


✏️ Learnings added
Learnt from: nguidotti
Repo: NVIDIA/cuopt PR: 1099
File: cpp/src/mip_heuristics/solver.cu:452-459
Timestamp: 2026-04-15T08:55:03.026Z
Learning: In NVIDIA/cuopt, `dual_simplex::branch_and_bound_t<i_t, f_t>::solve()` (called in `cpp/src/mip_heuristics/solver.cu` inside `#pragma omp task`) does not throw C++ exceptions. Therefore, no try-catch / std::exception_ptr wrapper is needed around the call when used inside an OpenMP task region.



🧠 Learnings used
Learnt from: nguidotti
Repo: NVIDIA/cuopt PR: 1099
File: cpp/src/branch_and_bound/pseudo_costs.cpp:1054-1139
Timestamp: 2026-04-15T08:42:21.283Z
Learning: In NVIDIA/cuopt, `strong_branching()` and `pseudo_costs_t::reliable_variable_selection()` in `cpp/src/branch_and_bound/pseudo_costs.cpp` are only ever called from within the Branch-and-Bound (B&B) execution path, which is always enclosed by the top-level `#pragma omp parallel` region created in `cpp/src/mip_heuristics/solve.cu` (`solve_mip_helper` wrapped in `#pragma omp parallel` + `#pragma omp master`). Therefore, any `#pragma omp task` or `#pragma omp taskloop` inside those functions is always guaranteed to have an active parallel team and will execute concurrently as intended — no local parallel/single scaffolding is needed.

Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 899
File: cpp/src/mip_heuristics/solve.cu:169-193
Timestamp: 2026-02-24T14:37:06.700Z
Learning: In cuOpt's MIP solver, the sign of objective_scaling_factor encodes the problem direction: >= 0 indicates minimization; < 0 indicates maximization (internally converted to minimization by negating the objective). Therefore, during code reviews, treat the sign of objective_scaling_factor as the source of truth for problem direction and use it to set appropriate 'no bound' values for callbacks accordingly.

Contributor

Don't standard library algorithms throw exceptions like bad_alloc?


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Contributor

@akifcorduk akifcorduk left a comment

I think we should run a baseline with the probing cache changes, as they fix performance bugs. That way we can measure the performance impact of the threading changes alone. Otherwise, we are testing multiple things together.

p.s. I didn't review cp_fj changes thoroughly.

Comment thread cpp/src/branch_and_bound/branch_and_bound.cpp Outdated
Comment thread cpp/src/branch_and_bound/pseudo_costs.cpp Outdated
Comment thread cpp/src/mip_heuristics/presolve/probing_cache.cu Outdated
Comment thread cpp/src/mip_heuristics/presolve/probing_cache.cu Outdated
Comment thread cpp/src/mip_heuristics/solve.cu Outdated
```cpp
  num_threads = omp_get_max_threads();
} else {
  if (settings_const.num_cpu_threads < 4) {
    CUOPT_LOG_ERROR(
```
Contributor

Instead of ERROR, this should be a WARNING.

```cpp
} catch (...) {
  // We cannot throw inside an OpenMP parallel region. So we need to catch and then
  // re-throw later.
  exception = std::current_exception();
```
Contributor

What if there are multiple exceptions in parallel?

Contributor Author

@nguidotti nguidotti Apr 15, 2026

Considering that exceptions cannot cross thread boundaries, this will only catch the exceptions thrown by the master thread.

Contributor Author

However, if we want to catch exceptions from all threads, then we need a mechanism for collecting them and re-throwing in the master thread.

```cpp
template <typename i_t, typename f_t>
solution_t<i_t, f_t> mip_solver_t<i_t, f_t>::run_solver()
{
  solution_t<i_t, f_t> sol(*context.problem_ptr);
```
Contributor

I think the problem might be copied or modified in the following calls. I would double check that the removed solution constructors below are indeed using the unmodified problem, the same as the original one.

…e for root relaxation.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test b4efd67

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test 80b14ab

@nguidotti nguidotti self-assigned this Apr 15, 2026
@nguidotti nguidotti added non-breaking Introduces a non-breaking change do not merge Do not merge if this flag is set improvement Improves an existing functionality mip labels Apr 15, 2026
@nguidotti nguidotti added this to the 26.06 milestone Apr 15, 2026
@nguidotti nguidotti linked an issue Apr 15, 2026 that may be closed by this pull request
Contributor

@aliceb-nv aliceb-nv left a comment

Thanks a lot for that work Nicolas :)
Let me know if you need help with porting the B&B determinism code.

Comment thread cpp/src/mip_heuristics/diversity/lns/rins.cu Outdated
Comment thread cpp/src/mip_heuristics/local_search/local_search.cu Outdated
```cpp
for (auto& cpu_fj_ptr : ls_cpu_fj) {
  cpu_fj_ptr->start_cpu_solver();
}
#pragma omp taskgroup
```
Contributor

Just to make sure I understand: there's an implicit taskwait at the end of this taskgroup, right? Is that how the CPU FJ tasks are joined after the halt preemption is requested?

Contributor Author

Exactly, at the end of the taskgroup there is an implicit barrier that waits for all tasks created inside the group to finish.

Contributor Author

Added comments to each implicit barrier

Comment on lines +584 to +588
```cpp
CUOPT_LOG_ERROR("The MIP solver requires at least 4 CPU threads!");
return mip_solution_t<i_t, f_t>{
  cuopt::logic_error("The number of CPU threads is below than expected.",
                     cuopt::error_type_t::RuntimeError),
  op_problem.get_handle_ptr()->get_stream()};
```
Contributor

Is that true? Don't we support running on a single thread?

Contributor Author

With a low thread count, some vital tasks may not execute. If you have just one thread, then how are we supposed to execute the heuristics and the B&B thread at the same time?

Contributor Author

We could support a low thread count if one or the other task could be interrupted or yielded. But I think this is too much effort for a rare case (it is pretty common to have CPUs with more than 8 threads nowadays).

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…nc in CPU FJ. use scope_guard in RINS.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test d7d3dad

…A 12.9

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test 6ac7882

@nguidotti
Contributor Author

/ok to test e152241


Labels

do not merge Do not merge if this flag is set improvement Improves an existing functionality mip non-breaking Introduces a non-breaking change


Development

Successfully merging this pull request may close these issues.

[FEA] Respect CPU thread limits

4 participants