Skip to content

[Rule] LongestCommonSubsequence to ILP #110

@zazabap

Description

@zazabap

Source: LongestCommonSubsequence
Target: ILP
Motivation: Enables solving LCS instances via ILP solvers; the match-pair formulation naturally encodes subsequence ordering as linear constraints.
Reference: Blum et al., 2021 — discusses ILP formulation with match-pair binary variables and conflict/no-crossing constraints for LCS (Section 2, ILP model); the match-pair ILP formulation for sequence problems is also described in Althaus et al., 2006 in the context of sequence alignment.

Reduction Algorithm

Notation:

  • Source: $k = 2$ strings $s_1$ (length $n_1$) and $s_2$ (length $n_2$) over alphabet $\Sigma$
  • Target: ILP with binary variables, linear constraints, and a maximize objective

Scope: This reduction handles the 2-string LCS case only.

Variable mapping:

For each pair of positions $(j_1, j_2)$ where $s_1[j_1] = s_2[j_2]$, create a binary variable:

$$m_{j_1, j_2} \in {0, 1}$$

meaning "position $j_1$ of $s_1$ is matched to position $j_2$ of $s_2$ in the common subsequence."

Total variables: $|M|$ where $M = {(j_1, j_2) : s_1[j_1] = s_2[j_2]}$, bounded by $n_1 \cdot n_2$.

Constraints:

  1. Each position in $s_1$ matched at most once: for each $j_1 \in {0, \ldots, n_1-1}$,
    $$\sum_{j_2 : (j_1,j_2) \in M} m_{j_1, j_2} \leq 1$$

  2. Each position in $s_2$ matched at most once: for each $j_2 \in {0, \ldots, n_2-1}$,
    $$\sum_{j_1 : (j_1,j_2) \in M} m_{j_1, j_2} \leq 1$$

  3. Order preservation (no crossings): for all $(j_1, j_2), (j_1', j_2') \in M$ with $j_1 < j_1'$ and $j_2 > j_2'$,
    $$m_{j_1, j_2} + m_{j_1', j_2'} \leq 1$$

Objective: maximize $\sum_{(j_1,j_2) \in M} m_{j_1, j_2}$

Solution extraction: The matched positions with $m_{j_1,j_2} = 1$, sorted by $j_1$, give the characters of the LCS.

Size Overhead

Worst-case bounds (all characters match, i.e., $|M| = n_1 \cdot n_2$):

Target metric (code name) Worst-case bound
num_vars $n_1 \cdot n_2$
num_constraints $n_1 + n_2 + \binom{n_1 \cdot n_2}{2}$

In overhead expression syntax (using source problem getters):

  • num_vars = num_chars_first * num_chars_second
  • num_constraints = num_chars_first + num_chars_second + (num_chars_first * num_chars_second) ^ 2

Note: the actual number of variables and crossing constraints depends on the input content (character matches), so these are upper bounds. The crossing constraint count $\binom{|M|}{2}$ is $O(n_1^2 \cdot n_2^2)$ in the worst case.

Validation Method

Closed-loop testing: solve LCS by brute-force, solve the reduced ILP, and verify that both give the same optimal objective. Test with varying alphabet sizes and string lengths, including edge cases (identical strings, no common characters, single-character alphabet).

Example

Source instance: $s_1$ = ABAC, $s_2$ = BACA, alphabet $\Sigma = {A, B, C}$.

Variables (6 match pairs):

Variable $s_1$ pos $s_2$ pos Character
$m_{0,1}$ 0 1 A
$m_{0,3}$ 0 3 A
$m_{1,0}$ 1 0 B
$m_{2,1}$ 2 1 A
$m_{2,3}$ 2 3 A
$m_{3,2}$ 3 2 C

Constraints (13 total):

Assignment for $s_1$: $m_{0,1} + m_{0,3} \leq 1$, $m_{2,1} + m_{2,3} \leq 1$ (plus 2 trivial single-variable constraints).

Assignment for $s_2$: $m_{0,1} + m_{2,1} \leq 1$, $m_{0,3} + m_{2,3} \leq 1$ (plus 2 trivial).

No-crossing: $m_{0,1} + m_{1,0} \leq 1$, $m_{0,3} + m_{1,0} \leq 1$, $m_{0,3} + m_{2,1} \leq 1$, $m_{0,3} + m_{3,2} \leq 1$, $m_{2,3} + m_{3,2} \leq 1$.

Objective: maximize $m_{0,1} + m_{0,3} + m_{1,0} + m_{2,1} + m_{2,3} + m_{3,2}$

Optimal ILP solution: objective = 3, with $m_{1,0} = m_{2,1} = m_{3,2} = 1$ (all others 0).

Extracted LCS: $s_1[1] = s_2[0]$ = B, $s_1[2] = s_2[1]$ = A, $s_1[3] = s_2[2]$ = C → BAC (length 3), matching the brute-force LCS.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.ruleA new reduction rule to be added.

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions