🧩 Constraint Solving POTD: Hybrid CP/ML — Learning to Search Smarter #25474
This discussion has been marked as outdated by Constraint Solving — Problem of the Day. A newer discussion is available at Discussion #25647.
Category: Emerging Topics · Date: 2026-04-09
Classical constraint solvers spend enormous effort on search: which variable to branch on next, which value to try first, when to restart. These decisions are typically guided by hand-crafted heuristics. Hybrid CP/ML is the emerging discipline of replacing or augmenting those heuristics with learned models — letting a solver get smarter with experience.
Today's problem is a perfect vehicle for exploring this idea.
Problem Statement
Combinatorial Optimisation via Learned Heuristics
Consider a family of related instances of the same combinatorial problem (e.g., hundreds of job-shop scheduling instances drawn from the same factory). A solver tackling instance k could, in principle, exploit patterns observed while solving instances 1 … k−1.

Concrete instance — a tiny job-shop family:
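As a hedged illustration (the post does not pin down the exact data, so every name and parameter below is an assumption), one tiny family — three jobs over two machines, sharing structure but varying in durations — could be generated like this:

```python
import random

def make_jobshop_instance(seed, n_jobs=3, n_machines=2):
    """Generate one instance of a tiny job-shop family (illustrative only).

    Each job is a sequence of (machine, duration) operations; instances in
    the family share the same shape but differ in machine orders/durations.
    """
    rng = random.Random(seed)
    jobs = []
    for _ in range(n_jobs):
        order = rng.sample(range(n_machines), n_machines)  # machine visit order
        jobs.append([(m, rng.randint(1, 5)) for m in order])
    return jobs

# A "family" of related instances: same structure, different durations.
family = [make_jobshop_instance(seed=k) for k in range(100)]
```

A learned heuristic would train on early members of `family` and be evaluated on later ones.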
Trade-offs:
Approach 2 — Reinforcement Learning over the Search Tree (RL + CP)
Model the solver's branching sequence as a Markov Decision Process:
- s_t: current partial assignment + propagation state.
- a_t: choose variable x_i and value v.
- r_t: −1 per step (encourage short proofs) + bonus when a solution is found.

Train a policy network (e.g., a Graph Neural Network over the constraint graph) using policy-gradient methods (PPO, A3C) or actor-critic.
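To make the MDP concrete, here is a minimal Gym-style environment sketch over a toy CSP (a chain of not-equal constraints — the problem, class name, and reward bonus are all illustrative assumptions, not a fixed specification):

```python
class BranchingEnv:
    """Minimal Gym-style MDP over a CSP search (illustrative sketch).

    State: current partial assignment plus remaining domains after a toy
    forward-checking propagation. Action: (variable, value). Reward: -1
    per step, plus a +10 bonus when a full consistent assignment is found.
    """

    def __init__(self, n_vars=4, n_vals=4, neq_pairs=((0, 1), (1, 2), (2, 3))):
        self.n_vars, self.n_vals = n_vars, n_vals
        self.neq_pairs = neq_pairs  # binary "not-equal" constraints

    def reset(self):
        self.assignment = {}
        self.domains = {v: set(range(self.n_vals)) for v in range(self.n_vars)}
        return self._obs()

    def _obs(self):
        return (dict(self.assignment),
                {v: frozenset(d) for v, d in self.domains.items()})

    def step(self, action):
        var, val = action
        self.assignment[var] = val
        self.domains[var] = {val}
        # Propagation: forward-check neighbours of var, pruning val.
        for a, b in self.neq_pairs:
            for x, y in ((a, b), (b, a)):
                if x == var and y not in self.assignment:
                    self.domains[y].discard(val)
        dead_end = any(not d for d in self.domains.values())
        solved = len(self.assignment) == self.n_vars and not dead_end
        reward = -1 + (10 if solved else 0)
        done = solved or dead_end
        return self._obs(), reward, done, {}
```

A policy network would map the observation to a distribution over (variable, value) actions; here the environment alone is shown.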
Trade-offs:
Example: Imitation-Learning Branching in OR-Tools (Python sketch)
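A dependency-free sketch of the imitation-learning idea (all function names and features below are illustrative assumptions): an expert heuristic labels which variable to branch on, a tiny linear scorer learns to mimic it, and the learned ranking could then be handed to OR-Tools CP-SAT via `model.AddDecisionStrategy(ordered_vars, cp_model.CHOOSE_FIRST, ...)`.

```python
# Imitation learning for branching: an "expert" (min-domain heuristic)
# labels which variable to branch on; a linear scorer learns to mimic it.
import random

def features(domains, var):
    # Per-variable features: current domain size and a constant bias term.
    return [len(domains[var]), 1.0]

def expert_choice(domains, unassigned):
    return min(unassigned, key=lambda v: len(domains[v]))  # min-domain

def collect_demonstrations(n_states=200, n_vars=5, seed=0):
    """Sample random search states and record the expert's choice in each."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_states):
        domains = {v: set(range(rng.randint(1, 6))) for v in range(n_vars)}
        unassigned = list(range(n_vars))
        best = expert_choice(domains, unassigned)
        for v in unassigned:
            data.append((features(domains, v), 1.0 if v == best else 0.0))
    return data

def train_scorer(data, lr=0.05, epochs=30):
    """Least-mean-squares regression: score ~ 1 for the expert's pick."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

def learned_choice(w, domains, unassigned):
    """Branch on the variable with the highest learned score."""
    return max(unassigned,
               key=lambda v: sum(wi * xi
                                 for wi, xi in zip(w, features(domains, v))))

w = train_scorer(collect_demonstrations())
```

After training, `w[0]` comes out negative — smaller domains score higher — so `learned_choice` reproduces the expert's min-domain behaviour; with richer features it can go beyond it.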
Key Techniques
1. Graph Neural Networks (GNNs) for State Representation
The bipartite variable–constraint graph is a natural representation of a CSP state. GNNs propagate information along edges (constraints), giving each variable node a rich embedding that captures neighbourhood structure — far more informative than scalar heuristic values.
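One round of such message passing can be sketched in plain Python (a hand-rolled mean-aggregation step under assumed toy features; a real system would use a GNN library with learned weights):

```python
# One round of message passing on the bipartite variable-constraint graph.
def message_pass(var_feats, con_feats, edges):
    """edges: list of (var, con) pairs; *_feats: dicts of feature vectors."""
    def mean(vectors):
        n = len(vectors)
        return [sum(col) / n for col in zip(*vectors)] if vectors else None

    # Constraint update: aggregate the features of incident variables.
    new_con = {}
    for c, feats in con_feats.items():
        incoming = [var_feats[v] for v, cc in edges if cc == c]
        agg = mean(incoming) or [0.0] * len(feats)
        new_con[c] = [f + a for f, a in zip(feats, agg)]
    # Variable update: aggregate the updated features of incident constraints.
    new_var = {}
    for v, feats in var_feats.items():
        incoming = [new_con[c] for vv, c in edges if vv == v]
        agg = mean(incoming) or [0.0] * len(feats)
        new_var[v] = [f + a for f, a in zip(feats, agg)]
    return new_var, new_con
```

After a few rounds, each variable's vector reflects its constraint neighbourhood, which is what the branching policy then scores.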
2. Curriculum Learning
Training directly on hard instances is slow. Curriculum learning starts with tiny, easy instances where the reward signal is dense, then gradually increases difficulty. This mirrors how human students learn and dramatically accelerates policy convergence.
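The schedule itself is simple to sketch (class name and thresholds are assumptions): advance to harder instances only once the recent success rate on the current level is high enough, feeding `report` from whatever training loop you use.

```python
class Curriculum:
    """Advance to harder instances once the recent success rate is high.

    Illustrative scheduler: call report(solved) after each training episode;
    current_level() tells the generator which difficulty to sample.
    """

    def __init__(self, levels, window=20, threshold=0.8):
        self.levels = levels            # e.g. instance sizes, easy -> hard
        self.idx = 0
        self.window, self.threshold = window, threshold
        self.recent = []

    def current_level(self):
        return self.levels[self.idx]

    def report(self, solved):
        self.recent.append(bool(solved))
        self.recent = self.recent[-self.window:]
        full = len(self.recent) == self.window
        if full and sum(self.recent) / self.window >= self.threshold:
            if self.idx < len(self.levels) - 1:
                self.idx += 1           # graduate to the next difficulty
                self.recent = []
```

The window/threshold pair keeps the reward signal dense early on and only exposes the policy to sparse-reward hard instances once it is competent.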
3. Portfolio & Algorithm Selection as Warm ML Baseline
Before training a deep policy, consider the simpler ML-over-features approach: train a classifier to pick which static heuristic works best for a given instance (SBS — Single Best Solver selection). Tools like ASlib and COSEAL provide benchmarks. This is often 80% of the gain at 5% of the engineering cost.
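At its simplest this is nearest-neighbour selection over instance features (the feature choices, history entries, and heuristic labels below are illustrative assumptions):

```python
# Algorithm selection as a warm baseline: for a new instance, reuse the
# heuristic that won on the most similar previously solved instance (1-NN).
def select_heuristic(instance_features, history):
    """history: list of (features, best_heuristic) from past instances."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best = min(history, key=lambda rec: dist(rec[0], instance_features))
    return best

# Hypothetical past results: (n_vars, constraint_density) -> winning heuristic.
history = [
    ([10, 0.2], "min-domain"),
    ([10, 0.9], "max-degree"),
    ([500, 0.1], "random-restart"),
]
```

For example, `select_heuristic([12, 0.85], history)` returns `"max-degree"`: the dense 10-variable instance is its nearest neighbour. Replacing 1-NN with any off-the-shelf classifier keeps the same interface.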
Challenge Corner
Bonus: How would you extend a GNN-based policy to handle dynamic problems, where new jobs arrive while the solver is already running?
References
Bengio, Y., Lodi, A., & Prouvost, A. (2021). Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon. European Journal of Operational Research, 290(2), 405–421. — The canonical survey.
Gasse, M., Chételat, D., Ferroni, N., Charlin, L., & Lodi, A. (2019). Exact Combinatorial Optimization with Graph Convolutional Neural Networks. NeurIPS 2019. — Introduced GNN-based branching for MIP; ideas transfer directly to CP.
Cappart, Q., Chételat, D., Khalil, E., Lodi, A., Morris, C., & Veličković, P. (2023). Combinatorial Optimization and Reasoning with Graph Neural Networks. JMLR, 24(130). — Comprehensive treatment of GNNs for CO.
Ecole library — https://github.com/ds4dm/ecole — Open-source framework for learning-augmented branch-and-bound; provides Gym-style environments wrapping SCIP.

Have you experimented with learned heuristics in a constraint solver? Share your experience or questions below! 🧩