Cleanup early heuristics logs + unify cutoff/upper_bound in B&B #1033
Cleanup early heuristics logs + unify cutoff/upper_bound in B&B #1033rapids-bot[bot] merged 29 commits intoNVIDIA:release/26.04from
Conversation
|
/ok to test cc7e99f |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces the pruning-only initial cutoff with a seeded global upper bound and explicit solver-space incumbent tracking; extends early-heuristic incumbent callbacks to include heuristic name and assignment; propagates changes through solver context, B&B cutoff usage, solution selection, and detailed logging. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 316-321: The debug message in
branch_and_bound_t::set_initial_upper_bound logs the same internal-space value
twice and incorrectly labels the second as "user-space"; update the
CUOPT_LOG_DEBUG call in set_initial_upper_bound to only report the
solver/internal bound (upper_bound_) or, if you intend to log a user-space
value, add a new parameter for user_bound and pass the correct value from the
caller. Specifically, modify the CUOPT_LOG_DEBUG format string in
set_initial_upper_bound (and its arguments) so it no longer claims a user-space
value when only the internal bound is available.
In `@cpp/src/branch_and_bound/branch_and_bound.hpp`:
- Around line 127-128: The accessor has_solver_space_incumbent() currently reads
incumbent_.has_incumbent without synchronization causing a data race; fix by
making the flag atomic: change the type of incumbent_.has_incumbent to
std::atomic<bool> and update all writes (e.g., in set_incumbent_solution() and
other places that set the flag) to assign to the atomic (you may keep the
existing mutex_upper_ locking for other state updates), and change
has_solver_space_incumbent() to return the atomic's value
(incumbent_.has_incumbent.load()). Ensure all reads/writes to has_incumbent use
atomic operations so no unsynchronized access remains.
In `@cpp/src/mip_heuristics/solve.cu`:
- Around line 552-555: The presolve_time is being added twice when the fallback
solution replaces a postsolved pipeline result: initialize a boolean (e.g.,
presolve_time_consumed = false) before the postsolve block, set
presolve_time_consumed = true in the branch that already adds presolve_time to
stats (the branch guarded by !status_to_skip where sol already carries
presolve_time), and then in the fallback path before doing stats.presolve_time
+= presolve_time only add the time if presolve_time_consumed is false; keep the
existing calls to fallback_sol.post_process_completed and
fallback_sol.get_solution(true, stats) but pass the guarded/updated stats so
presolve_time is not double-counted.
In `@cpp/tests/mip/incumbent_callback_test.cu`:
- Around line 163-184: The test currently overwrites
CUOPT_DISABLE_GPU_HEURISTICS with setenv and always calls unsetenv at the end,
which mutates global state if the variable was originally set; change the
pattern in the test (around the setenv/ unsetenv usage) to capture the original
value via getenv before calling setenv, then after solve_mip restore the prior
state: if the original value was non-null call setenv to restore it, otherwise
call unsetenv to remove the variable; reference the existing
setenv("CUOPT_DISABLE_GPU_HEURISTICS", "1", 1) call and the trailing
unsetenv("CUOPT_DISABLE_GPU_HEURISTICS") to locate where to add the getenv
capture and conditional restore.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c37bc69-158d-4ab4-b1fc-5fb83051cf23
📒 Files selected for processing (8)
cpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/mip_heuristics/diversity/population.cuhcpp/src/mip_heuristics/early_heuristic.cuhcpp/src/mip_heuristics/solve.cucpp/src/mip_heuristics/solver.cucpp/src/mip_heuristics/solver_context.cuhcpp/tests/mip/incumbent_callback_test.cu
|
/ok to test 62b1ba5 |
| (context.problem_ptr->maximize ? cpufj_user_obj > context.initial_cutoff | ||
| : cpufj_user_obj < context.initial_cutoff); | ||
| if (should_update) { context.initial_cutoff = cpufj_user_obj; } | ||
| CUOPT_LOG_INFO("Early CPUFJ found incumbent with user-space objective %g during presolve", |
There was a problem hiding this comment.
Could we also make this a CUOPT_LOG_DEBUG? Since I think you print out the objective of the new solution immediately when you find it
There was a problem hiding this comment.
Woops :) Will fix!
| ctx_ptr->initial_incumbent_assignment = user_assignment; | ||
| ctx_ptr->initial_upper_bound = user_obj; | ||
| CUOPT_LOG_INFO( | ||
| "New solution from early %s (presolved). Objective %g", heuristic_name, user_obj); |
There was a problem hiding this comment.
Could you make this line consistent with the line below:
New solution from early primal heuristics (%s). Objective %g. Time %.2f
There was a problem hiding this comment.
And in Branch and Bound we do
There was a problem hiding this comment.
New solution from primal heuristics. Objective %+.6e. Time %.2f
There was a problem hiding this comment.
Could we use %+.6e for printing the objective? So that we always print it in the same format? And always print the time to two decimal digits?
There was a problem hiding this comment.
That makes perfect sense. Will fix and push ASAP
|
Thanks for this PR Alice. It really cleans things up in branch and bound nicely. I appreciate it. I had a couple minor suggestions to make the log lines consistent. |
|
/ok to test 32cca62 |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/mip_heuristics/solve.cu`:
- Around line 471-482: The debug messages for early_gpufj and early_cpufj
currently print solver-space minimized values via get_best_objective(); update
those logs to clarify the space or print the user-facing objective: use
objective_scaling_factor to detect direction (>=0 => minimization, <0 =>
maximization) and either append " (solver-space objective)" to the log or
multiply get_best_objective() by -1 when objective_scaling_factor < 0 to emit
the user-space objective; update the messages surrounding
early_gpufj->get_best_objective() and early_cpufj->get_best_objective()
accordingly so the logs unambiguously reflect solver vs user objective.
- Around line 547-565: The fallback incumbent path can inherit a bogus proven
bound in stats and be mislabeled Optimal; before calling
fallback_sol.get_solution(true, stats, false) detect if
stats.get_solution_bound() is the default/no-bound sentinel (i.e. not a finite
dual bound) and normalize it to a direction-aware “no bound” sentinel based on
problem.presolve_data.objective_scaling_factor (use +infinity for minimization
(>=0) and -infinity for maximization (<0)), or alternatively bypass using stats
and force the returned status to FeasibleFound; update the code around
fallback_sol.get_solution(true, stats, false) to set the sanitized bound on
stats (or call get_solution with a flag to force FeasibleFound) so the fallback
incumbent cannot be misreported as Optimal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 553f9cec-de36-463a-ae54-8f673ffede42
📒 Files selected for processing (5)
cpp/include/cuopt/linear_programming/mip/solver_solution.hppcpp/src/mip_heuristics/solution/solution.cucpp/src/mip_heuristics/solve.cucpp/src/mip_heuristics/solver.cucpp/src/mip_heuristics/solver_solution.cu
|
/ok to test 35dc789 |
|
/ok to test 6472058 |
1 similar comment
|
/ok to test 6472058 |
|
/ok to test e039851 |
|
/ok to test 4109e00 |
nguidotti
left a comment
There was a problem hiding this comment.
Look good to me. Thanks for the hard work, Alice!
| template <typename i_t, typename f_t> | ||
| void mip_solution_t<i_t, f_t>::log_detailed_summary() const | ||
| { | ||
| CUOPT_LOG_INFO( |
There was a problem hiding this comment.
How about we change to something like this:
Solution status: <OPTIMAL | INFEASIBLE | TIMEOUT>
Solution objective: %f
Solution bound: %f
Relative MIP gap: %f
Presolve time: %f s
Total time: %f s
Max constraint violation: %f
Max integer violation: %f
Max variable bound violation: %f
Nodes explored: %d
Simplex iterations: %d
There was a problem hiding this comment.
In the current state, there is no consistence between the names and it quite difficult to parse everything in one line
There was a problem hiding this comment.
You could also use std::format to align the everything, if you prefer
There was a problem hiding this comment.
That'd mean changing a long-standing log line :) Might break a few of our own scripts. I think we should discuss large log changes with the team (although I agree this kind of summary looks good to me)
There was a problem hiding this comment.
Yeah, let's not break any scripts now. But let's have a larger discussion later. I think that we might want to consolidate this log line with other log lines eventually.
| early_best_user_assignment = assignment; | ||
| double elapsed = | ||
| std::chrono::duration<double>(std::chrono::steady_clock::now() - early_fj_start).count(); | ||
| CUOPT_LOG_INFO("New solution from early primal heuristics (%s). Objective %+.6e. Time %.2f", |
There was a problem hiding this comment.
Nitpick: Logging heuristic name may not be relevant to the end-user
There was a problem hiding this comment.
I tend to agree. But I don't think it is harmful. I defer to Alice.
| void set_initial_upper_bound(f_t bound); | ||
|
|
||
| f_t get_upper_bound() const { return upper_bound_.load(); } | ||
| bool has_solver_space_incumbent() const { return incumbent_.has_incumbent; } |
There was a problem hiding this comment.
Nitpick: correct me if I am wrong, but I think we only have the incumbent in the solver space, right? Maybe you could shorten to has_incumbent()
There was a problem hiding this comment.
That's for clarity - I wanted to make it explicit in the code that "has_incumbent" here means "has solver-space incumbent". I wanted to avoid potential confusion (as I've already made myself confused a few times hah!)
|
/ok to test 6357f8a |
|
/ok to test 14b86e4 |
|
/ok to test 0e5d755 |
|
/ok to test 0e5d755 |
|
/ok to test 3c9b1d1 |
|
/ok to test 19f54b9 |
|
/ok to test 5f12bf3 |
chris-maes
left a comment
There was a problem hiding this comment.
LGTM. Thanks as always for your amazing work Alice.
jameslamb
left a comment
There was a problem hiding this comment.
Approving to unblock you because I know your 26.04 release is very close but please do consider my comments about that libgrpc pin, I hope it can be relaxed.
| - openssl | ||
| - c-ares | ||
| - libgrpc | ||
| - libgrpc >=1.78.0,<1.80.0a0 |
There was a problem hiding this comment.
Pin libgrpc >=1.78.0,<1.80.0a0 to avoid version conflicts with RAPIDS packages (libarrow, libopentelemetry-cpp) that require libgrpc 1.78.x.
What specifically were the conflicts? Would it be sufficient to just make this a floor and not include a ceiling?
A tight pin like this is going to cause environment conflicts, especially in the all-of-RAPIDS conda environment in the devcontainers: https://github.com/rapidsai/devcontainers/blob/51ebbe3e6ae8120b9aba229729ef322bc18bbd4a/.github/workflows/test.yml#L29
I'd love the opportunity to help you find a less-restrictive fix.
There was a problem hiding this comment.
Issue was libcuopt built with latest libgrpc 1.8x, but pyarrow used libgrpc 1.7x, so while testing they diverged and started breaking due to dependency issue. https://github.com/NVIDIA/cuopt/actions/runs/24077889439/job/70243128275?pr=1033
There was a problem hiding this comment.
oy 😫
Repeating what I wrote privately so we have it here in GitHub, I guess it's this part?
│ │ │ │ └─ pyarrow [20.0.0|21.0.0|22.0.0|23.0.0|23.0.1] would require
│ │ │ │ ├─ libparquet [=20.0.0 *|=21.0.0 *|=22.0.0 *|=23.0.1 *] with the potential options
...
│ │ │ │ │ └─ libparquet [20.0.0|21.0.0|22.0.0|23.0.1] would require
│ │ │ │ │ └─ libarrow [==20.0.0 h54ffee7_42_cuda|==20.0.0 h6053680_44_cuda|...|==23.0.1 he2a4bb3_8_cuda], which requires
│ │ │ │ │ ├─ libgoogle-cloud >=3.3.0,<3.4.0a0 * with the potential options
│ │ │ │ │ │ ├─ libgoogle-cloud 3.3.0 would require
│ │ │ │ │ │ │ └─ libgrpc >=1.78.1,<1.79.0a0 *, which can be installed;
│ │ │ │ │ │ └─ libgoogle-cloud 3.3.0 would require
...
│ │ └─ libcuopt =26.4.0a196 *, which requires
│ │ ├─ libabseil >=20260107.1,<20260108.0a0 *, which conflicts with any installable versions previously reported;
│ │ └─ libgrpc >=1.80.0,<1.81.0a0 *, which conflicts with any installable versions previously reported;
cuopt picks up a very-new libgrpc (looks like 1.80.0 was released earlier today), and that's fine at build time but causes issues at runtime because cudf needs pyarrow and pyarrow's dependencies eventually need libgrpc and only support older versions.
blegh let me think a bit about how we could work around this. Please feel free to merge this PR for now if it's blocking other work, we can fix the pin in a follow-up.
|
/merge |
This PR cleans up the logs produced by early heuristics to improve the UX in info-mode.
The B&B code is cleaned up as well to keep a single upper_bound value, and fixes are applied to parts of the code that assumed "upper_bound != inf -> a solver-space incumbent is available".
User-space early heuristics solutions are weaved through the solver and returned at the end of the solve, if no better solution was found.
Also ci changes as follows,
Description
Issue
Checklist