Parallel Branch-and-Bound#412
Conversation
…and diving threads.
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 0e14fe2 |
Signed-off-by: nicolas <nguidotti@nvidia.com>
aliceb-nv
left a comment
There was a problem hiding this comment.
Thanks a lot for the great work and results Nicolas! Just a few minor nitpicks, otherwise looks good :) I'll let Chris review the algorithm side
| : std::thread::hardware_concurrency()), | ||
| num_threads(std::thread::hardware_concurrency() - 1), | ||
| num_bfs_threads(std::min(num_threads, 4)), | ||
| num_diving_threads(num_threads - num_bfs_threads), |
There was a problem hiding this comment.
Should we guarantee one diving thread at least maybe? The following code might make this assumption (and diving even with thread contention might be better than none at all)
There was a problem hiding this comment.
True. Also, if num_threads < 4, there will be have no diving threads. Instead, we can set something like this:
num_bfs_threads = std::min(1, num_threads / 4)
num_diving_threads = std::min(1, num_threads - num_bfs_threads)There was a problem hiding this comment.
I will experiment a bit with the different configurations
| branch_and_bound_settings.absolute_mip_gap_tol = context.settings.tolerances.absolute_mip_gap; | ||
| branch_and_bound_settings.relative_mip_gap_tol = context.settings.tolerances.relative_mip_gap; | ||
| branch_and_bound_settings.integer_tol = context.settings.tolerances.integrality_tolerance; | ||
| branch_and_bound_settings.num_threads = 1; |
There was a problem hiding this comment.
Have we experimented with this parameter? Submip tends to yield good solutions, it might perform better with a few more threads allocated
There was a problem hiding this comment.
Not yet. It was set to 1 to avoid oversubscription of the threads. I can allocate a few more threads to submip and see how it behaves.
There was a problem hiding this comment.
For now, I set to 1 best first thread and 1 diving thread. Later, we can experiment in allocating more threads to submip. @akifcorduk
| stack.push_front(second); | ||
| stack.push_front(first); | ||
|
|
||
| if (dive_queue_.size() < min_diving_queue_size_) { |
There was a problem hiding this comment.
It would be good to have a comment here about why you are adding to the dive_queue
There was a problem hiding this comment.
Is this the only place we add to the dive queue? I thought the BFS threads would also add to the dive queue
There was a problem hiding this comment.
The BFS also adds to the diving queue. It is around the line 856 in the explore_subtree routine.
| UNSET = 6, // The status is not set | ||
| }; | ||
|
|
||
| enum class mip_exploration_status_t { |
There was a problem hiding this comment.
Thanks for splitting this apart. I'm still a little confused why you need the status TIME_LIMIT, NODE_LIMIT, and NUMERICAL here. Why can't this status just be RUNNING or COMPLETED?
There was a problem hiding this comment.
The solver checks the (internal) status to determines when to stop (basically, when it is not RUNNING). The other status is to keep track of why we stopped the solver (it will be translated later to the mip_status_t).
There was a problem hiding this comment.
But if we hit a TIME_LIMIT or NODE_LIMIT shouldn't any part of the code be able to tell that just by checking the time and the number of nodes?
I think there is probably an opportunity to simplify the code a bit here and not mix the return status with whether branch and bound is running or not.
| FATHOMED = 3, // Node objective is greater than the upper bound | ||
| HAS_CHILDREN = 4, // Node has children to explore | ||
| NUMERICAL = 5, // Encountered numerical issue when solving the LP relaxation | ||
| TIME_LIMIT = 6 // Time out during the LP relaxation |
There was a problem hiding this comment.
Why is TIME_LIMIT needed here?
chris-maes
left a comment
There was a problem hiding this comment.
I'm removing the request for changes. But I'm still a bit fuzzy on how you handle nodes with numerical errors, in particular in the case where they are encountered by the diving thread.
I also a bit fuzzy on how the lower bound is handled in the ramp up phase versus normal BFS threads.
chris-maes
left a comment
There was a problem hiding this comment.
Trying to remove request changes
|
/merge |
This PR implement a parallel branch-and-bound procedure, which is split into two phases. In the first phase, the algorithm will greedily expand the search tree until a certain depth and then add the bottom nodes to a global heap. The parallel expansion is implemented using `omp task`. In the second phase, some threads will explore the tree using best first search with plunging, i.e., they take the first node from the global heap and then explore the entire branch that starts on this node. Any unexplored node are insert into the heap. The remaining threads will perform deep dives in order to find feasible solutions. The solver keep a small heap contains the most promising nodes to perform the dives, which is keep in sync with the global heap. This PR also - Replace the `std::thread`-based parallelization in the strong branching with OpenMP in order to use dynamic scheduling. This ensures that all threads have similar amount of work and improve parallel performance. - Fixed invalid memory access when trying to access the status of a fathomed node. - Replaced `std::mutex` with `omp atomic` whatever applicable. - Added dedicated classes `dive_queue_t` and `search_tree_t` to store the diving heap and the search tree, respectively. This is an extension of NVIDIA#305. Closes NVIDIA#320. Closes NVIDIA#417. ## Benchmark results (MIPLIB2017): master branch (53d6e74) ``` Average Gap: 0.2174861712 ``` This PR: ``` Average Gap: 0.1989485546 ``` i.e., a `1.8%` improvement. In terms of the geomean of the gap ratio, this is equal to `1.62x`. Authors: - Nicolas L. Guidotti (https://github.com/nguidotti) Approvers: - Rajesh Gandham (https://github.com/rg20) - Chris Maes (https://github.com/chris-maes) URL: NVIDIA#412
Description
This PR implement a parallel branch-and-bound procedure, which is split into two phases. In the first phase, the algorithm will greedily expand the search tree until a certain depth and then add the bottom nodes to a global heap. The parallel expansion is implemented using
omp task.In the second phase, some threads will explore the tree using best first search with plunging, i.e., they take the first node from the global heap and then explore the entire branch that starts on this node. Any unexplored node are insert into the heap. The remaining threads will perform deep dives in order to find feasible solutions. The solver keep a small heap contains the most promising nodes to perform the dives, which is keep in sync with the global heap.
This PR also
std::thread-based parallelization in the strong branching with OpenMP in order to use dynamic scheduling. This ensures that all threads have similar amount of work and improve parallel performance.std::mutexwithomp atomicwhatever applicable.dive_queue_tandsearch_tree_tto store the diving heap and the search tree, respectively.This is an extension of #305.
Closes #320.
Closes #417.
Benchmark results (MIPLIB2017):
master branch (53d6e74)
This PR:
i.e., a
1.8%improvement. In terms of the geomean of the gap ratio, this is equal to1.62x.Checklist