Skip to content

Fix race condition in add_external_solutions_to_population#909

Merged
rapids-bot[bot] merged 2 commits intoNVIDIA:mainfrom
akifcorduk:fix_race_cond
Feb 27, 2026
Merged

Fix race condition in add_external_solutions_to_population#909
rapids-bot[bot] merged 2 commits intoNVIDIA:mainfrom
akifcorduk:fix_race_cond

Conversation

@akifcorduk
Copy link
Copy Markdown
Contributor

@akifcorduk akifcorduk commented Feb 27, 2026

B&B preemption will be set right after the solution is set but if we don't acquire the mutex before the early return check in population, the writes to solutions_in_external_queue_ is not visible to heuristic thread. Early return then finishes the heuristics and returns no solution.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed a race condition so external solutions are always processed safely, improving thread safety and stability when handling concurrent incoming solutions.

@akifcorduk akifcorduk requested a review from a team as a code owner February 27, 2026 07:54
@akifcorduk akifcorduk requested a review from Kh4ster February 27, 2026 07:54
@akifcorduk akifcorduk added the bug Something isn't working label Feb 27, 2026
@akifcorduk akifcorduk added the non-breaking Introduces a non-breaking change label Feb 27, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8ad1a9 and df78b71.

📒 Files selected for processing (1)
  • cpp/src/mip_heuristics/diversity/population.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/mip_heuristics/diversity/population.cu

📝 Walkthrough

Walkthrough

Remove early-exit guard in external-solution addition; the function now always calls get_external_solutions() (which acquires solution_mutex) and proceeds to add solutions, with an added comment clarifying mutex acquisition to avoid race conditions.

Changes

Cohort / File(s) Summary
External solution queue handling
cpp/src/mip_heuristics/diversity/population.cu
Removed early return based on solutions_in_external_queue_; always invokes get_external_solutions() (which now performs mutex acquisition) and proceeds to add external solutions. Added comment about mutex necessity to prevent races.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Fix race condition in add_external_solutions_to_population' directly and accurately describes the main change in the changeset, which removes an early exit guard to fix a race condition.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@aliceb-nv aliceb-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super good catch Akif, thanks! Just one remark regarding a possible race condition

Comment on lines -201 to 210
std::lock_guard<std::mutex> lock(solution_mutex);
std::vector<solution_t<i_t, f_t>> return_vector;
i_t counter = 0;
f_t new_best_feasible_objective = best_feasible_objective;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think get_external_solutions() is called directly by diversity_manager, we may still need the lock in certain cases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good catch. then let's keep it as it is before but skip the early return check.

@anandhkb anandhkb added this to the 26.04 milestone Feb 27, 2026
@akifcorduk
Copy link
Copy Markdown
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 6d668c4 into NVIDIA:main Feb 27, 2026
168 of 171 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants