Conversation
/ok to test e4985d0
📝 Walkthrough

Adds a runtime work-accounting parameter (

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/dual_simplex/basis_updates.hpp (1)
171-194: ⚠️ Potential issue | 🔴 Critical

`work_estimate_` is read before initialization — undefined behavior.

Neither constructor initializes `work_estimate_` in its member initializer list. Both constructors call `clear()`, which performs `work_estimate_ += 2 * refactor_frequency_` and `work_estimate_ += xi_workspace_.size() + x_workspace_.size()` (lines 408, 412), reading the uninitialized field via `+=`. The second constructor additionally calls `compute_transposes()` (line 375), which also uses `+=`.

Add `work_estimate_(0.0)` to both constructors' initializer lists.

Proposed fix

For the first constructor (line 171):

```diff
 basis_update_mpf_t(i_t n, const i_t refactor_frequency)
   : L0_(n, n, 1),
     ...
-    hypersparse_threshold_(0.05)
+    hypersparse_threshold_(0.05),
+    work_estimate_(0.0)
 {
```

For the second constructor (line 196):

```diff
 basis_update_mpf_t(const csc_matrix_t<i_t, f_t>& Linit, ...)
   : ...
-    hypersparse_threshold_(0.05)
+    hypersparse_threshold_(0.05),
+    work_estimate_(0.0)
 {
```

cpp/src/dual_simplex/right_looking_lu.cpp (1)
18-18: ⚠️ Potential issue | 🟡 Minor

Remove unused `using cuopt::ins_vector;` — leftover from the `ins_vector` → `std::vector` migration.

The `ins_vector` alias is imported on line 18 but never referenced anywhere in the file. The using declaration and the associated include on line 10 can both be removed.

Proposed fix

```diff
-#include <utilities/memory_instrumentation.hpp>
-
 #include <raft/common/nvtx.hpp>
 #include <cassert>
 #include <cmath>
 #include <cstdio>
-using cuopt::ins_vector;
```
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/basis_updates.cpp`:
- Around lines 1264-1268: The else branch calls WT.transpose(V), but V_nz is still zero when work_estimate_ is updated. Set V_nz to the actual non-zero count (e.g. V_nz = V.col_start[m], or otherwise compute V's nnz for column m) before doing work_estimate_ += 3 * V_nz, so the cost of WT.transpose(V) is correctly included.
- Around lines 2215-2219: When etilde.i.size() is zero, work_estimate_ += etilde.i.size() * std::log2(etilde.i.size()) produces NaN. Check the size first: call etilde.sort() as before, store auto n = etilde.i.size(), add work_estimate_ += n * std::log2(n) only when n > 0, then always call S_.append_column(etilde) and add work_estimate_ += 4 * n. This computes the log2 term only for n > 0 while preserving the rest of the logic.
In `@cpp/src/dual_simplex/bound_flipping_ratio_test.cpp`:
- Around lines 234-239: In the while loop that pops from bare_idx, guard the std::log2 call against a zero size so std::log2(0) cannot produce -inf. Either clamp the argument to a minimum of 1 (e.g. work_estimate_ += 2 * std::log2(std::max<size_t>(1, bare_idx.size()))) or perform the addition only when bare_idx.size() > 0, so work_estimate_ cannot become -infinity after pop_back().
In `@cpp/src/dual_simplex/bound_flipping_ratio_test.hpp`:
- Around lines 56-58: The member work_estimate_ is never initialized but can be read through the accessor work_estimate(). Initialize it to a sensible default, either at the member declaration (e.g. f_t work_estimate_ = 0;) or in the constructor initializer list, and apply the same fix to the other occurrences around lines 100-103 so all code paths see a defined value before any assignment.
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around lines 2508-2512: The phase2_work_estimate update uses a product instead of a sum, over-counting the allocated elements. Change the increment from n * 2 * m to n + 2 * m so it reflects the sizes of objective (n), c_basic (m), and xB_workspace (m).
- Around lines 3484-3485: The work estimation reads delta_z_indices.size() after phase2::clear_delta_z has already cleared that vector, so phase2_work_estimate += 3 * delta_z_indices.size() is a no-op. Capture the size before clearing (e.g. auto n = delta_z_indices.size(); phase2_work_estimate += 3 * n;) and then call phase2::clear_delta_z(entering_index, leaving_index, delta_z_mark, delta_z_indices, delta_z) so the work is accounted correctly.
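The "capture the size before clearing" fix above can be sketched as follows. The names (`phase2_work_estimate`, `clear_delta_z`, `record_and_clear`) are stand-ins modeled on the review comment, not the actual cuOpt code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins modeled on the names in the review comment.
static double phase2_work_estimate = 0.0;

static void clear_delta_z(std::vector<int>& delta_z_indices)
{
  delta_z_indices.clear();
}

// Account for the work *before* the vector is cleared; reading
// delta_z_indices.size() after clear_delta_z() would always add zero.
static void record_and_clear(std::vector<int>& delta_z_indices)
{
  const auto n = delta_z_indices.size();
  phase2_work_estimate += 3.0 * static_cast<double>(n);
  clear_delta_z(delta_z_indices);
}
```

The same ordering applies anywhere a size-dependent cost is charged next to a call that mutates the container.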
🧹 Nitpick comments (5)
cpp/src/dual_simplex/vector_math.cpp (1)
127-168: New permutation utilities are correct and well-documented.

The implementations for `permute_vector`, `inverse_permute_vector`, and `inverse_permutation` are standard and correct. The work-cost comments are a nice addition.

Minor nit: `auto n = p.size()` yields `size_t`, which may trigger signed/unsigned comparison warnings in the loop `for (i_t k = 0; k < n; ++k)`. Consider `const i_t n = static_cast<i_t>(p.size())` for consistency with the rest of the codebase.

cpp/src/dual_simplex/crossover.cpp (1)
500-536: Factorize → repair → refactorize pattern is duplicated 4 times.

This pattern (lines 500–536, 857–896, 1231–1257, 1403–1430) follows the same structure: `factorize_basis` → check rank → `basis_repair` → `factorize_basis` → check rank again. Extracting this into a shared helper (e.g., `factorize_with_repair`) would reduce ~120 lines of near-identical code and make the `work_estimate` plumbing less repetitive. Each call site differs only in return type mapping and a couple of post-repair steps.

cpp/src/dual_simplex/right_looking_lu.cpp (1)
36-43: Internal helpers still use `typename VectorI` template — could be simplified.

`initialize_degree_data` and `load_elements` (lines 36, 82) still take a generic `VectorI` for `column_list`, but all callers in this file pass `std::vector<i_t>`. Since the PR is removing `ins_vector` throughout, these could be simplified to match. Low priority since they're in an anonymous namespace and work correctly.

cpp/src/dual_simplex/singletons.cpp (1)
217-226: Redundant `static_cast` — `col_perm` and `row_perm` are already `std::vector<i_t>&`.

These casts were meaningful when the parameters were `ins_vector`, but after the migration to `std::vector` they are identity casts that add confusion. The same pattern repeats at lines 253–254.

Proposed fix (lines 217–226)

```diff
- auto& col_perm_vec = static_cast<std::vector<i_t>&>(col_perm);
- auto& row_perm_vec = static_cast<std::vector<i_t>&>(row_perm);
  row_col_graph_t<i_t> graph{Cdeg.begin(),
-                            col_perm_vec.begin(),
+                            col_perm.begin(),
                             A.col_start.cbegin(),
                             A.i.cbegin(),
                             Rdeg.begin(),
-                            row_perm_vec.begin(),
+                            row_perm.begin(),
                             Rp.cbegin(),
                             Rj.cbegin()};
```

Proposed fix (lines 253–262)

```diff
- auto& row_perm_vec2 = static_cast<std::vector<i_t>&>(row_perm);
- auto& col_perm_vec2 = static_cast<std::vector<i_t>&>(col_perm);
  row_col_graph_t<i_t> graph{Rdeg.begin(),
-                            row_perm_vec2.begin(),
+                            row_perm.begin(),
                             Rp.cbegin(),
                             Rj.cbegin(),
                             Cdeg.begin(),
-                            col_perm_vec2.begin(),
+                            col_perm.begin(),
                             A.col_start.cbegin(),
                             A.i.cbegin()};
```

cpp/src/dual_simplex/phase2.cpp (1)
3490-3498: Redundant null check on `work_unit_context`.

Line 3490 already guards the block with `&& work_unit_context`, so the inner check on line 3495 is redundant. Not harmful, but unnecessary.

Proposed simplification

```diff
   if ((iter % FEATURE_LOG_INTERVAL) == 0 && work_unit_context) {
     [[maybe_unused]] i_t iters_elapsed = iter - last_feature_log_iter;
     phase2_work_estimate += ft.work_estimate();
     ft.clear_work_estimate();
-    if (work_unit_context) {
-      work_unit_context->record_work_sync_on_horizon(phase2_work_estimate / 1e8);
-    }
+    work_unit_context->record_work_sync_on_horizon(phase2_work_estimate / 1e8);
     phase2_work_estimate = 0.0;
```
/ok to test 8766092
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/dual_simplex/right_looking_lu.cpp (1)
314-523: ⚠️ Potential issue | 🔴 Critical

Fix iterator invalidation after `pop_back()` in col_count/row_count removal loops.

Computing `it - begin()` after `pop_back()` when the iterator pointed to the erased element is undefined behavior. If the found element is the last in the vector, the iterator becomes invalid after removal. Capture the index before `pop_back()` to fix this pattern in four locations: `update_Cdegree_and_col_count` (line 337), `update_Rdegree_and_row_count` (line 373), and two removals inside `schur_complement` (lines 489 and 504).

Suggested fix

```diff
- std::swap(*it, col_count[cdeg].back());
- col_count[cdeg].pop_back();
- work_estimate += (it - col_count[cdeg].begin());
+ const auto idx = static_cast<i_t>(it - col_count[cdeg].begin());
+ std::swap(*it, col_count[cdeg].back());
+ col_count[cdeg].pop_back();
+ work_estimate += idx;
```

cpp/src/dual_simplex/phase2.cpp (1)
182-183: ⚠️ Potential issue | 🟡 Minor

Format string bug: `%d` used for `f_t` (double) argument.

`lp.upper[leaving_index]` is `f_t` (a double), but the format specifier is `%d` (int). This is undefined behavior and will print garbage.

Proposed fix

```diff
- printf(
-   "lower %e x %e upper %d\n", lp.lower[leaving_index], x[leaving_index], lp.upper[leaving_index]);
+ printf(
+   "lower %e x %e upper %e\n", lp.lower[leaving_index], x[leaving_index], lp.upper[leaving_index]);
```
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around lines 1408-1409: The work_estimate over-counts because after v_sparse.scatter(v) only v_sparse.i.size() entries are written, but the code multiplies by v.size() (the full dense length). Replace 2 * v.size() with 2 * v_sparse.i.size() so work_estimate reflects the actual number of scattered entries, using the same v_sparse symbol used in the scatter call.
- Around lines 2519-2525: The work estimate charges only m for two size-n copies. Where vstatus_old and z_old are copy-constructed, replace phase2_work_estimate += m with phase2_work_estimate += 2 * n (or 4 * n if allocation cost should be included) so the estimate correctly reflects the cost of copying those n-sized vectors.
🧹 Nitpick comments (1)
cpp/src/dual_simplex/phase2.cpp (1)
1185-1188: `atilde_start_size` should be an integer type, not `f_t`.

`atilde_index.size()` returns `size_t`, but it is stored in `f_t` (double). While practically safe for reasonable sizes, using `i_t` or `size_t` is more appropriate and avoids potential precision loss for very large values. Same issue on line 1198.

Proposed fix

```diff
- const f_t atilde_start_size = atilde_index.size();
+ const i_t atilde_start_size = atilde_index.size();
```
/ok to test 9effcff
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/dual_simplex/phase2.cpp (1)
182-183: ⚠️ Potential issue | 🟡 Minor

Format specifier bug: `%d` used for a `double` argument.

`lp.upper[leaving_index]` is of type `f_t` (double) but is printed with `%d`, causing undefined behavior and garbage output.

Proposed fix

```diff
- printf(
-   "lower %e x %e upper %d\n", lp.lower[leaving_index], x[leaving_index], lp.upper[leaving_index]);
+ printf(
+   "lower %e x %e upper %e\n", lp.lower[leaving_index], x[leaving_index], lp.upper[leaving_index]);
```
🧹 Nitpick comments (1)
cpp/src/dual_simplex/phase2.cpp (1)
3489-3495: Accumulated work is lost on non-periodic exit paths.

When the main loop terminates (OPTIMAL at line 2852, or early returns like NUMERICAL/TIME_LIMIT/DUAL_UNBOUNDED), the `phase2_work_estimate` accumulated since the last `FEATURE_LOG_INTERVAL` checkpoint is never recorded to `work_unit_context`. If precision of total work tracking matters, a final flush before each exit point would be needed.
aliceb-nv left a comment
Thanks a ton for this work Chris, that's a lot of changes :)
Just a few minor observations; we can merge as I observed no significant changes on MIPLIB2017 runs (same number of feasible, similar gap when running in determinism mode)
I wonder if we could unify some common operations through utility functions, say, a tracked_sort (that computes n log n) or tracked_copy when doing std::vector assignments. As it stands I'm afraid it will be all too easy to forget some work estimate updates :)
I'm also thinking that the work estimate updates tend to bloat the code a little, which isn't bad in itself (since we are after all performing more things), but it is interwoven with the rest of the business logic. Maybe there could be a way to make the work estimate logic clearer/more decoupled. But let's discuss that another time! I really appreciate all the work you've done on this.
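A `tracked_sort`/`tracked_copy` along the lines suggested here could look like the sketch below. The signatures and the accumulator parameter are illustrative, not the actual cuOpt API; guarding the log2 term also sidesteps the `std::log2(0)` issue flagged in the review comments:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative helper: sorts a vector and charges ~n*log2(n) work units
// to the supplied accumulator. The n > 1 guard avoids log2(0) = -inf
// (and the pointless log2(1) = 0 term).
template <typename T>
void tracked_sort(std::vector<T>& v, double& work_estimate)
{
  const double n = static_cast<double>(v.size());
  if (n > 1.0) { work_estimate += n * std::log2(n); }
  std::sort(v.begin(), v.end());
}

// Illustrative helper: copy-assign and charge one unit per element copied.
template <typename T>
void tracked_copy(std::vector<T>& dst, const std::vector<T>& src, double& work_estimate)
{
  work_estimate += static_cast<double>(src.size());
  dst = src;
}
```

Routing all sorts and copies through such helpers would make a forgotten work-estimate update show up as an unused accumulator rather than silently wrong accounting.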
Referenced code (the removed scope guard in phase2.cpp):

```cpp
cuopt::scope_guard work_unit_guard([&]() {
  i_t remaining_iters = iter - last_feature_log_iter;
  if (remaining_iters <= 0) return;

  auto [total_loads, total_stores] = aggregator.collect_and_flush();
  // features.byte_loads  = total_loads;
  // features.byte_stores = total_stores;

  // f_t now = toc(start_time);
  // features.interval_runtime = now - interval_start_time;
  // interval_start_time = now;

  // features.iteration = iter;
  // features.num_refactors = num_refactors;
  // features.num_basis_updates = ft.num_updates();
  // features.sparse_delta_z_count = sparse_delta_z;
  // features.dense_delta_z_count = dense_delta_z;
  // features.total_bound_flips = total_bound_flips;
  // features.num_infeasibilities = infeasibility_indices.size();
  // features.delta_y_nz_percentage = delta_y_nz_percentage;
  // features.log_features(settings);

  if (work_unit_context) {
    // TEMP;
    work_unit_context->record_work_sync_on_horizon((total_loads + total_stores) / 1e8);
  }
});
```
Was removing the scope guard intended? We lose the work estimate update if we do an early exit, for example on NUMERICAL
Ah, I didn't understand how this works. This is supposed to run the code when it goes out of scope?
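Yes: a scope guard is an RAII object that runs a stored callable in its destructor, so the lambda fires on every path out of the scope, including early returns. Here is a minimal generic sketch of the idea (not `cuopt::scope_guard` itself, whose exact interface isn't shown in this thread):

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard: runs the stored callable when this object is
// destroyed, i.e. when it goes out of scope by any path.
template <typename F>
class scope_guard {
 public:
  explicit scope_guard(F f) : f_(std::move(f)) {}
  ~scope_guard() { f_(); }
  scope_guard(const scope_guard&)            = delete;
  scope_guard& operator=(const scope_guard&) = delete;

 private:
  F f_;
};

// The guard fires even on an early return, which is why removing it can
// drop the final work-estimate flush on paths like NUMERICAL.
int run(bool early_exit, int& flush_count)
{
  scope_guard guard([&] { ++flush_count; });
  if (early_exit) { return -1; }  // guard's destructor still runs here
  return 0;
}
```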
/merge
This PR converts dual simplex to use manual work estimates to simplify the code and ease maintenance.
Work estimates are also calculated for more functions.