Fix out-of-bound access in compute_max_variable_violation.#146
When [there are free variables](https://github.com/NVIDIA/cuopt/blob/63fbb6b22c8949798fffe8cb34ace85ad203f2bb/cpp/src/mip/problem/problem.cu#L1234), the lower/upper bounds and assignment arrays are not necessarily the same size as the original problem's. In that case, when computing the max violation, we access out-of-bounds data. This is already covered by `bound_standardization_test`.
I'm seeing the same pattern of OOB accesses in other places, so I suspect there is a larger pattern of broken invariants in the model. I'm opening a bug to discuss that.
akifcorduk left a comment:
What's happening here is actually the opposite: `assignment` is resized to the original problem's size before we return to the user, but the internal `problem_ptr` still has the sizes of the modified problem.
```cpp
f_t solution_t<i_t, f_t>::compute_max_variable_violation()
{
  const auto num_variables = view().assignment.size();
  assert(problem_ptr->variable_lower_bounds.size() >= num_variables);
```
The standard is to use `cuopt_assert` to convey the error message; it also gives us easier control to enable and disable asserts.
I also think it is not possible to infer any assert here: the size might be greater or lower, since we might eliminate some variables in presolve, or add some variables because of free variables.
/ok to test 4cb0d22
/ok to test 11eadc5
Looks like this breaks on an assert; perhaps @akifcorduk's comment still holds: `after trivial presolve updated 233 constraints 2009 variables. Objective offset 0.000000`
@akifcorduk what's the next course of action here?
Moving to 25.10 milestone
I couldn't reproduce this issue. It might have been fixed by one of the PRs, or it is specific to the custom environment that the OP is using.
@akifcorduk Could this have already been fixed by one of the merged PRs?
I just checked after updating (at f298994), and the invalid memory accesses are still happening.
What's expected of me here?
Nah, I just assigned it to you since you are the owner, and added "awaiting response" for @akifcorduk to get back to your question.
Ah ok, thanks. |
@legrosbuffle could you give me the instructions to reproduce this issue? Data, machine, compiler, settings, etc.
I thought you had a repro here: https://github.com/NVIDIA/cuopt/actions/runs/16776602702/job/47507156031?pr=258#step:10:2580 ? For the data, this simply happens in several of the existing unit tests (see the bug for details: #150). Compiler & settings: unfortunately we're using a custom toolchain (based on clang + libc++), and the only machines with GPUs I have access to require that toolchain, so I can't give you a usable command line for a repro. But I can reproduce in two different ways: with asserts on (which triggers bound-checking errors in the
/ok to test 35450be
@akifcorduk @legrosbuffle I fixed a merge conflict in the last commit; please revert if there are any mistakes.
I can't reproduce the issue in 25.10. Instead the code is failing later on a different OOB. The bug in the latter is more obvious, and the fix is here: 346
I'm not quite sure whether `assert(problem_ptr->variable_lower_bounds.size() >= num_variables);` should actually be an equality (i.e., how/if the mapping from variables to bounds is handled). But at least this is strictly better than out-of-bound accesses.