🧩 Constraint Solving POTD: Hybrid CP/ML — Learning to Search Smarter #25474
This discussion has been marked as outdated by Constraint Solving — Problem of the Day. A newer discussion is available at Discussion #25647.
Category: Emerging Topics · Date: 2026-04-09
Classical constraint solvers spend enormous effort on search: which variable to branch on next, which value to try first, when to restart. These decisions are typically guided by hand-crafted heuristics. Hybrid CP/ML is the emerging discipline of replacing or augmenting those heuristics with learned models — letting a solver get smarter with experience.
Today's problem is a perfect vehicle for exploring this idea.
Problem Statement
Combinatorial Optimisation via Learned Heuristics
Consider a family of related instances of the same combinatorial problem (e.g., hundreds of job-shop scheduling instances drawn from the same factory). A solver tackling instance k could, in principle, exploit patterns observed while solving instances 1 … k−1.

Concrete instance — a tiny job-shop family:
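As a hedged illustration (the post does not pin down the exact data, so every name and parameter below is an assumption), one tiny family — three jobs over two machines, sharing structure but varying in durations — could be generated like this:

```python
import random

def make_jobshop_instance(seed, n_jobs=3, n_machines=2):
    """Generate one instance of a tiny job-shop family (illustrative only).

    Each job is a sequence of (machine, duration) operations; instances in
    the family share the same shape but differ in machine orders/durations.
    """
    rng = random.Random(seed)
    jobs = []
    for _ in range(n_jobs):
        order = rng.sample(range(n_machines), n_machines)  # machine visit order
        jobs.append([(m, rng.randint(1, 5)) for m in order])
    return jobs

# A "family" of related instances: same structure, different durations.
family = [make_jobshop_instance(seed=k) for k in range(100)]
```

A learned heuristic would train on early members of `family` and be evaluated on later ones.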
Trade-offs:
Approach 2 — Reinforcement Learning over the Search Tree (RL + CP)
Model the solver's branching sequence as a Markov Decision Process:
- s_t: current partial assignment + propagation state.
- a_t: choose variable x_i and value v.
- r_t: −1 per step (encourage short proofs) + bonus when a solution is found.

Train a policy network (e.g., a Graph Neural Network over the constraint graph) using policy-gradient methods (PPO, A3C) or actor-critic.
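To make the MDP concrete, here is a minimal Gym-style environment sketch over a toy CSP (a chain of not-equal constraints — the problem, class name, and reward bonus are all illustrative assumptions, not a fixed specification):

```python
class BranchingEnv:
    """Minimal Gym-style MDP over a CSP search (illustrative sketch).

    State: current partial assignment plus remaining domains after a toy
    forward-checking propagation. Action: (variable, value). Reward: -1
    per step, plus a +10 bonus when a full consistent assignment is found.
    """

    def __init__(self, n_vars=4, n_vals=4, neq_pairs=((0, 1), (1, 2), (2, 3))):
        self.n_vars, self.n_vals = n_vars, n_vals
        self.neq_pairs = neq_pairs  # binary "not-equal" constraints

    def reset(self):
        self.assignment = {}
        self.domains = {v: set(range(self.n_vals)) for v in range(self.n_vars)}
        return self._obs()

    def _obs(self):
        return (dict(self.assignment),
                {v: frozenset(d) for v, d in self.domains.items()})

    def step(self, action):
        var, val = action
        self.assignment[var] = val
        self.domains[var] = {val}
        # Propagation: forward-check neighbours of var, pruning val.
        for a, b in self.neq_pairs:
            for x, y in ((a, b), (b, a)):
                if x == var and y not in self.assignment:
                    self.domains[y].discard(val)
        dead_end = any(not d for d in self.domains.values())
        solved = len(self.assignment) == self.n_vars and not dead_end
        reward = -1 + (10 if solved else 0)
        done = solved or dead_end
        return self._obs(), reward, done, {}
```

A policy network would map the observation to a distribution over (variable, value) actions; here the environment alone is shown.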
Trade-offs:
Example: Imitation-Learning Branching in OR-Tools (Python sketch)
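A dependency-free sketch of the imitation-learning idea (all function names and features below are illustrative assumptions): an expert heuristic labels which variable to branch on, a tiny linear scorer learns to mimic it, and the learned ranking could then be handed to OR-Tools CP-SAT via `model.AddDecisionStrategy(ordered_vars, cp_model.CHOOSE_FIRST, ...)`.

```python
# Imitation learning for branching: an "expert" (min-domain heuristic)
# labels which variable to branch on; a linear scorer learns to mimic it.
import random

def features(domains, var):
    # Per-variable features: current domain size and a constant bias term.
    return [len(domains[var]), 1.0]

def expert_choice(domains, unassigned):
    return min(unassigned, key=lambda v: len(domains[v]))  # min-domain

def collect_demonstrations(n_states=200, n_vars=5, seed=0):
    """Sample random search states and record the expert's choice in each."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_states):
        domains = {v: set(range(rng.randint(1, 6))) for v in range(n_vars)}
        unassigned = list(range(n_vars))
        best = expert_choice(domains, unassigned)
        for v in unassigned:
            data.append((features(domains, v), 1.0 if v == best else 0.0))
    return data

def train_scorer(data, lr=0.05, epochs=30):
    """Least-mean-squares regression: score ~ 1 for the expert's pick."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

def learned_choice(w, domains, unassigned):
    """Branch on the variable with the highest learned score."""
    return max(unassigned,
               key=lambda v: sum(wi * xi
                                 for wi, xi in zip(w, features(domains, v))))

w = train_scorer(collect_demonstrations())
```

After training, `w[0]` comes out negative — smaller domains score higher — so `learned_choice` reproduces the expert's min-domain behaviour; with richer features it can go beyond it.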
Key Techniques
1. Graph Neural Networks (GNNs) for State Representation
The bipartite variable–constraint graph is a natural representation of a CSP state. GNNs propagate information along edges (constraints), giving each variable node a rich embedding that captures neighbourhood structure — far more informative than scalar heuristic values.
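One round of such message passing can be sketched in plain Python (a hand-rolled mean-aggregation step under assumed toy features; a real system would use a GNN library with learned weights):

```python
# One round of message passing on the bipartite variable-constraint graph.
def message_pass(var_feats, con_feats, edges):
    """edges: list of (var, con) pairs; *_feats: dicts of feature vectors."""
    def mean(vectors):
        n = len(vectors)
        return [sum(col) / n for col in zip(*vectors)] if vectors else None

    # Constraint update: aggregate the features of incident variables.
    new_con = {}
    for c, feats in con_feats.items():
        incoming = [var_feats[v] for v, cc in edges if cc == c]
        agg = mean(incoming) or [0.0] * len(feats)
        new_con[c] = [f + a for f, a in zip(feats, agg)]
    # Variable update: aggregate the updated features of incident constraints.
    new_var = {}
    for v, feats in var_feats.items():
        incoming = [new_con[c] for vv, c in edges if vv == v]
        agg = mean(incoming) or [0.0] * len(feats)
        new_var[v] = [f + a for f, a in zip(feats, agg)]
    return new_var, new_con
```

After a few rounds, each variable's vector reflects its constraint neighbourhood, which is what the branching policy then scores.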
2. Curriculum Learning
Training directly on hard instances is slow. Curriculum learning starts with tiny, easy instances where the reward signal is dense, then gradually increases difficulty. This mirrors how human students learn and dramatically accelerates policy convergence.
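The schedule itself is simple to sketch (class name and thresholds are assumptions): advance to harder instances only once the recent success rate on the current level is high enough, feeding `report` from whatever training loop you use.

```python
class Curriculum:
    """Advance to harder instances once the recent success rate is high.

    Illustrative scheduler: call report(solved) after each training episode;
    current_level() tells the generator which difficulty to sample.
    """

    def __init__(self, levels, window=20, threshold=0.8):
        self.levels = levels            # e.g. instance sizes, easy -> hard
        self.idx = 0
        self.window, self.threshold = window, threshold
        self.recent = []

    def current_level(self):
        return self.levels[self.idx]

    def report(self, solved):
        self.recent.append(bool(solved))
        self.recent = self.recent[-self.window:]
        full = len(self.recent) == self.window
        if full and sum(self.recent) / self.window >= self.threshold:
            if self.idx < len(self.levels) - 1:
                self.idx += 1           # graduate to the next difficulty
                self.recent = []
```

The window/threshold pair keeps the reward signal dense early on and only exposes the policy to sparse-reward hard instances once it is competent.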
3. Portfolio & Algorithm Selection as Warm ML Baseline
Before training a deep policy, consider the simpler ML-over-features approach: train a classifier to pick which static heuristic works best for a given instance (SBS — Single Best Solver selection). Tools like ASlib and COSEAL provide benchmarks. This is often 80% of the gain at 5% of the engineering cost.
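At its simplest this is nearest-neighbour selection over instance features (the feature choices, history entries, and heuristic labels below are illustrative assumptions):

```python
# Algorithm selection as a warm baseline: for a new instance, reuse the
# heuristic that won on the most similar previously solved instance (1-NN).
def select_heuristic(instance_features, history):
    """history: list of (features, best_heuristic) from past instances."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best = min(history, key=lambda rec: dist(rec[0], instance_features))
    return best

# Hypothetical past results: (n_vars, constraint_density) -> winning heuristic.
history = [
    ([10, 0.2], "min-domain"),
    ([10, 0.9], "max-degree"),
    ([500, 0.1], "random-restart"),
]
```

For example, `select_heuristic([12, 0.85], history)` returns `"max-degree"`: the dense 10-variable instance is its nearest neighbour. Replacing 1-NN with any off-the-shelf classifier keeps the same interface.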
Challenge Corner
Bonus: How would you extend a GNN-based policy to handle dynamic problems, where new jobs arrive while the solver is already running?
References
Bengio, Y., Lodi, A., & Prouvost, A. (2021). Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon. European Journal of Operational Research, 290(2), 405–421. — The canonical survey.
Gasse, M., Chételat, D., Ferroni, N., Charlin, L., & Lodi, A. (2019). Exact Combinatorial Optimization with Graph Convolutional Neural Networks. NeurIPS 2019. — Introduced GNN-based branching for MIP; ideas transfer directly to CP.
Cappart, Q., Chételat, D., Khalil, E., Lodi, A., Morris, C., & Veličković, P. (2023). Combinatorial Optimization and Reasoning with Graph Neural Networks. JMLR, 24(130). — Comprehensive treatment of GNNs for CO.
Ecole library — https://github.com/ds4dm/ecole — Open-source framework for learning-augmented branch-and-bound; provides Gym-style environments wrapping SCIP.

Have you experimented with learned heuristics in a constraint solver? Share your experience or questions below! 🧩