Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
f9e8b5d
add exception handling for pdlp
Iroy30 Mar 17, 2026
1652dde
formatting
Iroy30 Mar 17, 2026
3fcf5fb
remove comments
Iroy30 Mar 17, 2026
d9224a6
Merge branch 'main' into add_exception_handling_pdlp
Iroy30 Mar 17, 2026
8777b9a
Merge branch 'main' into add_exception_handling_pdlp
Iroy30 Mar 17, 2026
9caf93c
rethrow and print exception
Iroy30 Mar 18, 2026
d9aef2b
Merge remote-tracking branch 'iroy30/add_exception_handling_pdlp'
Iroy30 Mar 18, 2026
3014bd3
Update solve.cu
Iroy30 Mar 18, 2026
439bf38
Merge branch 'main' into add_exception_handling_pdlp
Iroy30 Mar 18, 2026
5113f67
Disable testing conda temporarily
Iroy30 Mar 18, 2026
2efc6cd
Debugging prints
Iroy30 Mar 18, 2026
2903db8
Merge remote-tracking branch 'iroy30/add_exception_handling_pdlp'
Iroy30 Mar 18, 2026
5c8a52e
Update solve.cu
Iroy30 Mar 18, 2026
705c71f
Update solve.cu
Iroy30 Mar 19, 2026
03faee4
more debug statements
Iroy30 Mar 19, 2026
9ba4e7e
Merge remote-tracking branch 'iroy30/add_exception_handling_pdlp'
Iroy30 Mar 19, 2026
d71bce2
more debug statements
Iroy30 Mar 19, 2026
7017acd
revert yaml changes
rgsl888prabhu Mar 30, 2026
81d3d05
Merge branch 'release/26.04' into add_exception_handling_pdlp
rgsl888prabhu Mar 30, 2026
5e3f201
remove debug
rgsl888prabhu Mar 30, 2026
fd04d94
Merge branch 'add_exception_handling_pdlp' of github.com:Iroy30/Nvidi…
rgsl888prabhu Mar 30, 2026
b6ad396
Update solve.cu
Iroy30 Mar 31, 2026
0c32bb5
Update test_incumbent_callbacks.py
Iroy30 Mar 31, 2026
e89578a
Merge branch 'release/26.04' into add_exception_handling_pdlp
Iroy30 Mar 31, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 16 additions & 5 deletions cpp/src/pdlp/solve.cu
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,8 @@

#include <rmm/cuda_stream.hpp>

#include <thread> // For std::thread
#include <exception>
#include <thread>

#define CUOPT_LOG_CONDITIONAL_INFO(condition, ...) \
if ((condition)) { CUOPT_LOG_INFO(__VA_ARGS__); }
Expand Down Expand Up @@ -1149,13 +1150,11 @@ optimization_problem_solution_t<i_t, f_t> run_concurrent(
auto barrier_handle = raft::handle_t(barrier_stream);
auto barrier_problem = dual_simplex_problem;
barrier_problem.handle_ptr = &barrier_handle;

run_barrier_thread<i_t, f_t>(std::ref(barrier_problem),
std::ref(settings_pdlp),
std::ref(sol_barrier_ptr),
std::ref(timer));
};

if (settings.num_gpus > 1) {
problem.handle_ptr->sync_stream();
raft::device_setter device_setter(1); // Scoped variable
Expand All @@ -1169,8 +1168,20 @@ optimization_problem_solution_t<i_t, f_t> run_concurrent(
if (settings.num_gpus > 1) {
CUOPT_LOG_DEBUG("PDLP device: %d", raft::device_setter::get_current_device());
}
// Run pdlp in the main thread
auto sol_pdlp = run_pdlp(problem, settings_pdlp, timer, is_batch_mode);

// Run pdlp in the main thread.
// Must join all spawned threads before leaving this scope, even on exception,
// because destroying a joinable std::thread calls std::terminate().
std::exception_ptr pdlp_exception;
optimization_problem_solution_t<i_t, f_t> sol_pdlp{pdlp_termination_status_t::NumericalError,
problem.handle_ptr->get_stream()};
try {
sol_pdlp = run_pdlp(problem, settings_pdlp, timer, is_batch_mode);
} catch (...) {
pdlp_exception = std::current_exception();
*settings_pdlp.concurrent_halt = 1;
std::rethrow_exception(pdlp_exception);
}
Comment on lines +1175 to +1184
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

pdlp_exception is captured but never rethrown after thread cleanup.

The exception is captured in pdlp_exception and logged, but the variable is never used after line 1190. After threads join (lines 1193-1195), the function continues and returns sol_pdlp with NumericalError status without rethrowing.

If the intent (per PR description: "rethrowing exceptions") is to propagate the exception after ensuring threads are cleaned up, add a rethrow after the joins:

🛡️ Proposed fix to rethrow after thread cleanup
   barrier_thread.join();
+
+  // Rethrow captured exception after threads are safely joined
+  if (pdlp_exception) {
+    std::rethrow_exception(pdlp_exception);
+  }

   // copy the dual simplex solution to the device

If the current behavior (graceful degradation returning NumericalError status) is intentional, consider removing the unused pdlp_exception variable and directly logging within the catch block, or add a comment clarifying the exception is intentionally swallowed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 1175 - 1190, The code captures exceptions
into pdlp_exception in the run_pdlp try/catch but never rethrows it after thread
cleanup, causing swallowed errors and returning sol_pdlp with
pdlp_termination_status_t::NumericalError; fix by rethrowing pdlp_exception
after the concurrent threads are joined (i.e., after the thread-join/cleanup
section that follows this block) so the caller observes the original exception,
while keeping the existing settings_pdlp.concurrent_halt update and logging in
the catch; alternatively, if swallowing is intentional remove pdlp_exception and
add a clarifying comment—refer to pdlp_exception, run_pdlp,
settings_pdlp.concurrent_halt, and sol_pdlp when making the change.

Comment thread
coderabbitai[bot] marked this conversation as resolved.

// Wait for dual simplex thread to finish
if (!settings.inside_mip) { dual_simplex_thread.join(); }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ def set_solution(
@pytest.mark.parametrize(
"file_name",
[
("/mip/swath1.mps"),
# ("/mip/swath1.mps"), # Skipping due to PDLP crash
("/mip/neos5-free-bound.mps"),
],
)
Expand All @@ -115,7 +115,7 @@ def test_incumbent_get_callback(file_name):
@pytest.mark.parametrize(
"file_name",
[
("/mip/swath1.mps"),
# ("/mip/swath1.mps"), # Skipping due to PDLP crash
("/mip/neos5-free-bound.mps"),
],
)
Expand Down
Loading