[Rule] LongestCommonSubsequence to MaximumIndependentSet

**Source:** LongestCommonSubsequence
**Target:** MaximumIndependentSet
**Motivation:** Connects string problems to graph theory; enables solving LCS via MaxIS solvers and reveals structural equivalence between subsequence ordering and graph independence.
**Reference:** [Apostolico & Guerra, 1987](https://doi.org/10.1007/BF01840365); [Santini, Blum, Djukanovic et al., 2021](https://doi.org/10.1016/j.cor.2020.105089)

## Reduction Algorithm

**Notation:**
- Source: $k$ strings $s_1, \ldots, s_k$ over alphabet $\Sigma$, with lengths $n_1, \ldots, n_k$
- Target: undirected graph $G = (V, E)$

**Step 1 — Construct match nodes $V$:**

A node $v = (p_1, p_2, \ldots, p_k)$ is created for each $k$-tuple of positions such that $s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]$ (all strings have the same character at their respective positions).

$$V = \{(p_1, \ldots, p_k) : s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]\}$$

**Step 2 — Construct conflict edges $E$:**

Two nodes $u = (a_1, \ldots, a_k)$ and $v = (b_1, \ldots, b_k)$ are connected by an edge if they **conflict** — i.e., they cannot both appear in a valid common subsequence. A conflict occurs when the position differences are not consistently ordered across all strings:

$$\{u, v\} \in E \iff \text{NOT}\ (\forall i: a_i < b_i) \ \text{AND NOT}\ (\forall i: a_i > b_i)$$

In other words, there exist indices $i, j$ such that $a_i < b_i$ but $a_j > b_j$ (a "crossing"), or $a_i = b_i$ for some $i$ (same position used twice).

**Variable mapping:** Each LCS character corresponds to a selected node; each MaxIS vertex in the independent set corresponds to a matched position tuple.

**Objective:** $\text{MaxIS}(G) = \text{LCS}(s_1, \ldots, s_k)$. An independent set of size $L$ gives a common subsequence of length $L$, and vice versa.

## Required Model Extension

The implementer **must** add the following getter to `LongestCommonSubsequence` in `src/models/misc/longest_common_subsequence.rs` in the same PR as the reduction rule:

```rust
/// Returns the cross-frequency product: the sum over each alphabet symbol
/// of the product of that symbol's frequency across all input strings.
///
/// Formally: Σ_{c ∈ 0..alphabet_size} ∏_{i=1..k} count(c, strings[i])
/// where count(c, s) is the number of occurrences of symbol c in string s.
///
/// This equals the exact number of match-node vertices in the LCS → MaxIS
/// reduction graph.
pub fn cross_frequency_product(&self) -> usize {
    (0..self.alphabet_size)
        .map(|c| {
            self.strings
                .iter()
                .map(|s| s.iter().filter(|&&sym| sym == c).count())
                .product::<usize>()
        })
        .sum()
}
```

This getter is referenced in the `#[reduction(overhead)]` attribute as `cross_frequency_product`.

## Size Overhead

| Target metric (code name) | Expression |
|---|---|
| `num_vertices` | `cross_frequency_product` |
| `num_edges` | `cross_frequency_product^2` |

Where `cross_frequency_product` = $\sum_{c \in \Sigma} \prod_{i=1}^{k} \text{count}(c, s_i)$. This equals the exact number of match nodes $|V|$. The edge count is a worst-case bound (all node pairs conflict).

## Validation Method

Closed-loop testing: solve LCS by brute-force (enumerate subsequences of shortest string), solve the reduced MaxIS instance, and verify that both give the same optimal value. Test on instances with varying alphabet sizes and string counts ($k = 2, 3, 4$).

## Example

**Source instance:** $k = 2$, $s_1$ = `ABAC`, $s_2$ = `BACA`, alphabet $\Sigma = \{A, B, C\}$.

**Step 1 — Match nodes:**

| Node | $s_1$ pos | $s_2$ pos | Character |
|:---:|:---:|:---:|:---:|
| $v_0$ | 0 | 1 | A |
| $v_1$ | 0 | 3 | A |
| $v_2$ | 1 | 0 | B |
| $v_3$ | 2 | 1 | A |
| $v_4$ | 2 | 3 | A |
| $v_5$ | 3 | 2 | C |

**Step 2 — Conflict edges (9 edges):**

| Edge | Reason |
|:---:|:---:|
| $v_0 - v_1$ | same $s_1$ position (0) |
| $v_0 - v_2$ | crossing: $s_1$: $0<1$, $s_2$: $1>0$ |
| $v_0 - v_3$ | same $s_2$ position (1) |
| $v_1 - v_2$ | crossing: $s_1$: $0<1$, $s_2$: $3>0$ |
| $v_1 - v_3$ | crossing: $s_1$: $0<2$, $s_2$: $3>1$ |
| $v_1 - v_4$ | same $s_2$ position (3) |
| $v_1 - v_5$ | crossing: $s_1$: $0<3$, $s_2$: $3>2$ |
| $v_3 - v_4$ | same $s_1$ position (2) |
| $v_4 - v_5$ | crossing: $s_1$: $2<3$, $s_2$: $3>2$ |

**MaxIS solution:** $\{v_2, v_3, v_5\}$ (size 3, unique).

| Node | Position | Char |
|:---:|:---:|:---:|
| $v_2$ | $s_1[1], s_2[0]$ | B |
| $v_3$ | $s_1[2], s_2[1]$ | A |
| $v_5$ | $s_1[3], s_2[2]$ | C |

Extracted LCS: **BAC** (length 3). Verified: B-A-C is a subsequence of both `ABAC` (positions 1,2,3) and `BACA` (positions 0,1,2).

Target metric (code name)	Expression
`num_vertices`	`cross_frequency_product`
`num_edges`	`cross_frequency_product^2`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Rule] LongestCommonSubsequence to MaximumIndependentSet #109

Reduction Algorithm

Required Model Extension

Size Overhead

Validation Method

Example

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Edge	Reason
$v_0 - v_1$	same $s_1$ position (0)
$v_0 - v_2$	crossing: $s_1$: $0<1$, $s_2$: $1>0$
$v_0 - v_3$	same $s_2$ position (1)
$v_1 - v_2$	crossing: $s_1$: $0<1$, $s_2$: $3>0$
$v_1 - v_3$	crossing: $s_1$: $0<2$, $s_2$: $3>1$
$v_1 - v_4$	same $s_2$ position (3)
$v_1 - v_5$	crossing: $s_1$: $0<3$, $s_2$: $3>2$
$v_3 - v_4$	same $s_1$ position (2)
$v_4 - v_5$	crossing: $s_1$: $2<3$, $s_2$: $3>2$

Node	Position	Char
$v_2$	$s_1[1], s_2[0]$	B
$v_3$	$s_1[2], s_2[1]$	A
$v_5$	$s_1[3], s_2[2]$	C

[Rule] LongestCommonSubsequence to MaximumIndependentSet #109

Description

Reduction Algorithm

Required Model Extension

Size Overhead

Validation Method

Example

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions