Source: LongestCommonSubsequence
Target: MaximumIndependentSet
Motivation: Connects string problems to graph theory; enables solving LCS via MaxIS solvers and reveals structural equivalence between subsequence ordering and graph independence.
Reference: Apostolico & Guerra, 1987; Santini, Blum, Djukanovic et al., 2021
Reduction Algorithm
Notation:
- Source: $k$ strings $s_1, \ldots, s_k$ over alphabet $\Sigma$, with lengths $n_1, \ldots, n_k$
- Target: undirected graph $G = (V, E)$
Step 1 — Construct match nodes $V$:
A node $v = (p_1, p_2, \ldots, p_k)$ is created for each $k$-tuple of positions such that $s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]$ (all strings have the same character at their respective positions).
$$V = {(p_1, \ldots, p_k) : s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]}$$
Step 2 — Construct conflict edges $E$:
Two nodes $u = (a_1, \ldots, a_k)$ and $v = (b_1, \ldots, b_k)$ are connected by an edge if they conflict — i.e., they cannot both appear in a valid common subsequence. A conflict occurs when the position differences are not consistently ordered across all strings:
$${u, v} \in E \iff \text{NOT}\ (\forall i: a_i < b_i) \ \text{AND NOT}\ (\forall i: a_i > b_i)$$
In other words, there exist indices $i, j$ such that $a_i < b_i$ but $a_j > b_j$ (a "crossing"), or $a_i = b_i$ for some $i$ (same position used twice).
Variable mapping: Each LCS character corresponds to a selected node; each MaxIS vertex in the independent set corresponds to a matched position tuple.
Objective: $\text{MaxIS}(G) = \text{LCS}(s_1, \ldots, s_k)$. An independent set of size $L$ gives a common subsequence of length $L$, and vice versa.
Required Model Extension
The implementer must add the following getter to LongestCommonSubsequence in src/models/misc/longest_common_subsequence.rs in the same PR as the reduction rule:
/// Returns the cross-frequency product: the sum over each alphabet symbol
/// of the product of that symbol's frequency across all input strings.
///
/// Formally: Σ_{c ∈ 0..alphabet_size} ∏_{i=1..k} count(c, strings[i])
/// where count(c, s) is the number of occurrences of symbol c in string s.
///
/// This equals the exact number of match-node vertices in the LCS → MaxIS
/// reduction graph.
pub fn cross_frequency_product(&self) -> usize {
(0..self.alphabet_size)
.map(|c| {
self.strings
.iter()
.map(|s| s.iter().filter(|&&sym| sym == c).count())
.product::<usize>()
})
.sum()
}
This getter is referenced in the #[reduction(overhead)] attribute as cross_frequency_product.
Size Overhead
| Target metric (code name) |
Expression |
num_vertices |
cross_frequency_product |
num_edges |
cross_frequency_product^2 |
Where cross_frequency_product = $\sum_{c \in \Sigma} \prod_{i=1}^{k} \text{count}(c, s_i)$. This equals the exact number of match nodes $|V|$. The edge count is a worst-case bound (all node pairs conflict).
Validation Method
Closed-loop testing: solve LCS by brute-force (enumerate subsequences of shortest string), solve the reduced MaxIS instance, and verify that both give the same optimal value. Test on instances with varying alphabet sizes and string counts ($k = 2, 3, 4$).
Example
Source instance: $k = 2$, $s_1$ = ABAC, $s_2$ = BACA, alphabet $\Sigma = {A, B, C}$.
Step 1 — Match nodes:
| Node |
$s_1$ pos |
$s_2$ pos |
Character |
| $v_0$ |
0 |
1 |
A |
| $v_1$ |
0 |
3 |
A |
| $v_2$ |
1 |
0 |
B |
| $v_3$ |
2 |
1 |
A |
| $v_4$ |
2 |
3 |
A |
| $v_5$ |
3 |
2 |
C |
Step 2 — Conflict edges (9 edges):
| Edge |
Reason |
| $v_0 - v_1$ |
same $s_1$ position (0) |
| $v_0 - v_2$ |
crossing: $s_1$: $0<1$, $s_2$: $1>0$
|
| $v_0 - v_3$ |
same $s_2$ position (1) |
| $v_1 - v_2$ |
crossing: $s_1$: $0<1$, $s_2$: $3>0$
|
| $v_1 - v_3$ |
crossing: $s_1$: $0<2$, $s_2$: $3>1$
|
| $v_1 - v_4$ |
same $s_2$ position (3) |
| $v_1 - v_5$ |
crossing: $s_1$: $0<3$, $s_2$: $3>2$
|
| $v_3 - v_4$ |
same $s_1$ position (2) |
| $v_4 - v_5$ |
crossing: $s_1$: $2<3$, $s_2$: $3>2$
|
MaxIS solution: ${v_2, v_3, v_5}$ (size 3, unique).
| Node |
Position |
Char |
| $v_2$ |
$s_1[1], s_2[0]$ |
B |
| $v_3$ |
$s_1[2], s_2[1]$ |
A |
| $v_5$ |
$s_1[3], s_2[2]$ |
C |
Extracted LCS: BAC (length 3). Verified: B-A-C is a subsequence of both ABAC (positions 1,2,3) and BACA (positions 0,1,2).
Source: LongestCommonSubsequence
Target: MaximumIndependentSet
Motivation: Connects string problems to graph theory; enables solving LCS via MaxIS solvers and reveals structural equivalence between subsequence ordering and graph independence.
Reference: Apostolico & Guerra, 1987; Santini, Blum, Djukanovic et al., 2021
Reduction Algorithm
Notation:
Step 1 — Construct match nodes$V$ :
A node$v = (p_1, p_2, \ldots, p_k)$ is created for each $k$ -tuple of positions such that $s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]$ (all strings have the same character at their respective positions).
Step 2 — Construct conflict edges$E$ :
Two nodes$u = (a_1, \ldots, a_k)$ and $v = (b_1, \ldots, b_k)$ are connected by an edge if they conflict — i.e., they cannot both appear in a valid common subsequence. A conflict occurs when the position differences are not consistently ordered across all strings:
In other words, there exist indices$i, j$ such that $a_i < b_i$ but $a_j > b_j$ (a "crossing"), or $a_i = b_i$ for some $i$ (same position used twice).
Variable mapping: Each LCS character corresponds to a selected node; each MaxIS vertex in the independent set corresponds to a matched position tuple.
Objective:$\text{MaxIS}(G) = \text{LCS}(s_1, \ldots, s_k)$ . An independent set of size $L$ gives a common subsequence of length $L$ , and vice versa.
Required Model Extension
The implementer must add the following getter to
LongestCommonSubsequenceinsrc/models/misc/longest_common_subsequence.rsin the same PR as the reduction rule:This getter is referenced in the
#[reduction(overhead)]attribute ascross_frequency_product.Size Overhead
num_verticescross_frequency_productnum_edgescross_frequency_product^2Where$\sum_{c \in \Sigma} \prod_{i=1}^{k} \text{count}(c, s_i)$ . This equals the exact number of match nodes $|V|$ . The edge count is a worst-case bound (all node pairs conflict).
cross_frequency_product=Validation Method
Closed-loop testing: solve LCS by brute-force (enumerate subsequences of shortest string), solve the reduced MaxIS instance, and verify that both give the same optimal value. Test on instances with varying alphabet sizes and string counts ($k = 2, 3, 4$ ).
Example
Source instance:$k = 2$ , $s_1$ = $s_2$ = $\Sigma = {A, B, C}$ .
ABAC,BACA, alphabetStep 1 — Match nodes:
Step 2 — Conflict edges (9 edges):
MaxIS solution:${v_2, v_3, v_5}$ (size 3, unique).
Extracted LCS: BAC (length 3). Verified: B-A-C is a subsequence of both
ABAC(positions 1,2,3) andBACA(positions 0,1,2).