61 changes: 61 additions & 0 deletions docs/paper/reductions.typ
@@ -59,6 +59,7 @@
"SubsetSum": [Subset Sum],
"MinimumFeedbackArcSet": [Minimum Feedback Arc Set],
"MinimumFeedbackVertexSet": [Minimum Feedback Vertex Set],
"ShortestCommonSupersequence": [Shortest Common Supersequence],
"MinimumSumMulticenter": [Minimum Sum Multicenter],
"SubgraphIsomorphism": [Subgraph Isomorphism],
"SubsetSum": [Subset Sum],
@@ -1038,6 +1039,66 @@ Biclique Cover is equivalent to factoring the biadjacency matrix $M$ of the bipa
*Example.* Let $A = {3, 7, 1, 8, 2, 4}$ ($n = 6$) and target $B = 11$. Selecting $A' = {3, 8}$ gives sum $3 + 8 = 11 = B$. Another solution: $A' = {7, 4}$ with sum $7 + 4 = 11 = B$.
]

#problem-def("ShortestCommonSupersequence")[
Given a finite alphabet $Sigma$, a set $R = {r_1, dots, r_m}$ of strings over $Sigma$, and a positive integer $K$, determine whether there exists a string $w in Sigma^*$ with $|w| lt.eq K$ such that every string $r_i in R$ is a _subsequence_ of $w$, that is, there exist indices $1 lt.eq j_1 < j_2 < dots < j_(|r_i|) lt.eq |w|$ with $w[j_k] = r_i [k]$ for all $1 lt.eq k lt.eq |r_i|$.
][
A classic NP-complete string problem, listed as problem SR8 in Garey and Johnson @garey1979. #cite(<maier1978>, form: "prose") proved NP-completeness; #cite(<raiha1981>, form: "prose") showed the problem remains NP-complete even over a binary alphabet ($|Sigma| = 2$). Note that a _subsequence_ (characters may be non-contiguous) differs from a _substring_ (contiguous block): Shortest Common Supersequence requires only that each input string can be embedded into $w$ by selecting characters in order, not necessarily adjacently.

For two strings ($|R| = 2$), the problem is solvable in polynomial time via the duality with the Longest Common Subsequence (LCS): if $"LCS"(r_1, r_2)$ has length $ell$, then the shortest common supersequence has length $|r_1| + |r_2| - ell$, computable in $O(|r_1| dot |r_2|)$ time by dynamic programming. For general $|R| = m$, brute-force search enumerates the $O(|Sigma|^K)$ strings of length at most $K$ (for $|Sigma| gt.eq 2$) and checks each candidate against all $m$ input strings in $O(m K)$ time. Applications include bioinformatics (reconstructing ancestral sequences from fragments), data compression (representing multiple strings compactly), and scheduling (merging instruction sequences).

*Example.* Let $Sigma = {a, b, c}$ and $R = {"abc", "bac"}$. We seek the shortest string $w$ containing both $"abc"$ and $"bac"$ as subsequences.

#figure({
let w = ("b", "a", "b", "c")
let r1 = ("a", "b", "c") // "abc"
let r2 = ("b", "a", "c") // "bac"
let embed1 = (1, 2, 3) // positions of a, b, c in w (0-indexed)
let embed2 = (0, 1, 3) // positions of b, a, c in w (0-indexed)
let blue = graph-colors.at(0)
let teal = rgb("#76b7b2")
let red = graph-colors.at(1)
align(center, stack(dir: ttb, spacing: 0.6cm,
// Row 1: the supersequence w
stack(dir: ltr, spacing: 0pt,
box(width: 1.2cm, height: 0.5cm, align(center + horizon, text(8pt)[$w =$])),
..w.enumerate().map(((i, ch)) => {
let is1 = embed1.contains(i)
let is2 = embed2.contains(i)
let fill = if is1 and is2 { blue.transparentize(60%) } else if is1 { blue.transparentize(80%) } else if is2 { teal.transparentize(80%) } else { white }
box(width: 0.55cm, height: 0.55cm, fill: fill, stroke: 0.5pt + luma(120),
align(center + horizon, text(9pt, weight: "bold", ch)))
}),
),
// Row 2: embedding of r1
stack(dir: ltr, spacing: 0pt,
box(width: 1.2cm, height: 0.5cm, align(center + horizon, text(8pt, fill: blue)[$r_1 =$])),
..range(w.len()).map(i => {
let idx = embed1.position(j => j == i)
let ch = if idx != none { r1.at(idx) } else { sym.dot.c }
let col = if idx != none { blue } else { luma(200) }
box(width: 0.55cm, height: 0.55cm,
align(center + horizon, text(9pt, fill: col, weight: if idx != none { "bold" } else { "regular" }, ch)))
}),
),
// Row 3: embedding of r2
stack(dir: ltr, spacing: 0pt,
box(width: 1.2cm, height: 0.5cm, align(center + horizon, text(8pt, fill: teal)[$r_2 =$])),
..range(w.len()).map(i => {
let idx = embed2.position(j => j == i)
let ch = if idx != none { r2.at(idx) } else { sym.dot.c }
let col = if idx != none { teal } else { luma(200) }
box(width: 0.55cm, height: 0.55cm,
align(center + horizon, text(9pt, fill: col, weight: if idx != none { "bold" } else { "regular" }, ch)))
}),
),
))
},
caption: [Shortest Common Supersequence: $w = "babc"$ (length 4) contains $r_1 = "abc"$ (blue, 0-indexed positions 1, 2, 3) and $r_2 = "bac"$ (teal, 0-indexed positions 0, 1, 3) as subsequences. Dots mark unused positions in each embedding.],
) <fig:scs>

The supersequence $w = "babc"$ has length 4 and contains both input strings as subsequences. This is optimal because a longest common subsequence of $"abc"$ and $"bac"$ (for example $"ac"$) has length 2, so the shortest common supersequence has length $3 + 3 - 2 = 4$.
]

#problem-def("MinimumFeedbackArcSet")[
Given a directed graph $G = (V, A)$, find a minimum-size subset $A' subset.eq A$ such that $G - A'$ is a directed acyclic graph (DAG). Equivalently, $A'$ must contain at least one arc from every directed cycle in $G$.
][
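The two-string duality stated in the ShortestCommonSupersequence definition above can be checked with a short sketch. The function names below are hypothetical and are not part of the `problemreductions` crate; symbols are encoded as `usize` indices (a = 0, b = 1, c = 2), matching the JSON schema's encoding.

```rust
/// Returns true if `r` is a subsequence of `w` (symbols in order, gaps allowed).
fn is_subsequence(r: &[usize], w: &[usize]) -> bool {
    let mut it = w.iter();
    // For each symbol of `r`, advance through `w` until a match is found.
    r.iter().all(|x| it.any(|y| y == x))
}

/// Length of a longest common subsequence of `a` and `b`,
/// computed by the standard O(|a| * |b|) dynamic program.
fn lcs_len(a: &[usize], b: &[usize]) -> usize {
    let mut dp = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in 1..=a.len() {
        for j in 1..=b.len() {
            dp[i][j] = if a[i - 1] == b[j - 1] {
                dp[i - 1][j - 1] + 1
            } else {
                dp[i - 1][j].max(dp[i][j - 1])
            };
        }
    }
    dp[a.len()][b.len()]
}

/// Shortest common supersequence length for exactly two strings: |a| + |b| - LCS.
fn scs_len_two(a: &[usize], b: &[usize]) -> usize {
    a.len() + b.len() - lcs_len(a, b)
}
```

With "abc" = `[0, 1, 2]` and "bac" = `[1, 0, 2]`, `scs_len_two` gives 4, and `is_subsequence` accepts "babc" = `[1, 0, 1, 2]` for both inputs, in line with the worked example.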
11 changes: 11 additions & 0 deletions docs/paper/references.bib
@@ -489,6 +489,17 @@ @article{cygan2014
doi = {10.1137/140990255}
}

@article{raiha1981,
author = {Kari-Jouko R{\"a}ih{\"a} and Esko Ukkonen},
title = {The Shortest Common Supersequence Problem over Binary Alphabet is {NP}-Complete},
journal = {Theoretical Computer Science},
volume = {16},
number = {2},
pages = {187--198},
year = {1981},
doi = {10.1016/0304-3975(81)90075-X}
}

@article{bodlaender2012,
author = {Hans L. Bodlaender and Fedor V. Fomin and Arie M. C. A. Koster and Dieter Kratsch and Dimitrios M. Thilikos},
title = {A Note on Exact Algorithms for Vertex Ordering Problems on Graphs},
21 changes: 21 additions & 0 deletions docs/src/reductions/problem_schemas.json
@@ -488,6 +488,27 @@
}
]
},
{
"name": "ShortestCommonSupersequence",
"description": "Find a common supersequence of bounded length for a set of strings",
"fields": [
{
"name": "alphabet_size",
"type_name": "usize",
"description": "Size of the alphabet"
},
{
"name": "strings",
"type_name": "Vec<Vec<usize>>",
"description": "Input strings over the alphabet {0, ..., alphabet_size-1}"
},
{
"name": "bound",
"type_name": "usize",
"description": "Bound on supersequence length (configuration has exactly this many symbols)"
}
]
},
{
"name": "SpinGlass",
"description": "Minimize Ising Hamiltonian on a graph",
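The `bound` field's description says a configuration has exactly `bound` symbols. A minimal sketch of what that implies for feasibility checking and exhaustive search follows. `ScsInstance` is a hypothetical stand-in that only mirrors the three schema fields; it is not the crate's `ShortestCommonSupersequence` type, and the crate's actual evaluation logic is not shown in this diff.

```rust
/// Hypothetical stand-in mirroring the schema fields (not the crate's type).
struct ScsInstance {
    alphabet_size: usize,
    strings: Vec<Vec<usize>>,
    bound: usize,
}

/// Greedy subsequence test: symbols of `r` must appear in `w` in order.
fn is_subsequence(r: &[usize], w: &[usize]) -> bool {
    let mut it = w.iter();
    r.iter().all(|x| it.any(|y| y == x))
}

impl ScsInstance {
    /// Feasible means: exactly `bound` symbols, all within the alphabet,
    /// and every input string is a subsequence of the configuration.
    fn is_feasible(&self, config: &[usize]) -> bool {
        config.len() == self.bound
            && config.iter().all(|&s| s < self.alphabet_size)
            && self.strings.iter().all(|r| is_subsequence(r, config))
    }

    /// Exhaustive search over all alphabet_size^bound configurations.
    fn brute_force(&self) -> Option<Vec<usize>> {
        let mut config = vec![0usize; self.bound];
        loop {
            if self.is_feasible(&config) {
                return Some(config);
            }
            // Advance like an odometer in base `alphabet_size`.
            let mut i = 0;
            loop {
                if i == self.bound {
                    return None; // every configuration has been tried
                }
                config[i] += 1;
                if config[i] < self.alphabet_size {
                    break;
                }
                config[i] = 0;
                i += 1;
            }
        }
    }
}
```

Fixing the length at exactly `bound` loses nothing relative to "length at most K" when the alphabet is non-empty, because a shorter supersequence can be padded with arbitrary symbols and remains a supersequence.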
10 changes: 7 additions & 3 deletions problemreductions-cli/src/cli.rs
@@ -224,6 +224,7 @@ Flags by problem type:
LCS --strings
FAS --arcs [--weights] [--num-vertices]
FVS --arcs [--weights] [--num-vertices]
SCS --strings, --bound [--alphabet-size]
ILP, CircuitSAT (via reduction only)

Geometry graph variants (use slash notation, e.g., MIS/KingsSubgraph):
@@ -338,18 +339,21 @@ pub struct CreateArgs {
/// Required edge indices for RuralPostman (comma-separated, e.g., "0,2,4")
#[arg(long)]
pub required_edges: Option<String>,
- /// Upper bound B for RuralPostman
+ /// Upper bound (for RuralPostman or SCS)
#[arg(long)]
- pub bound: Option<i32>,
+ pub bound: Option<i64>,
/// Pattern graph edge list for SubgraphIsomorphism (e.g., 0-1,1-2,2-0)
#[arg(long)]
pub pattern: Option<String>,
- /// Input strings for LCS (semicolon-separated, e.g., "ABAC;BACA")
+ /// Input strings for LCS (e.g., "ABAC;BACA") or SCS (e.g., "0,1,2;1,2,0")
#[arg(long)]
pub strings: Option<String>,
/// Directed arcs for directed graph problems (e.g., 0>1,1>2,2>0)
#[arg(long)]
pub arcs: Option<String>,
/// Alphabet size for SCS (optional; inferred from max symbol + 1 if omitted)
#[arg(long)]
pub alphabet_size: Option<usize>,
}

#[derive(clap::Args)]
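Since `--bound` is now parsed as `i64` and shared by RuralPostman and SCS, each consumer has to narrow it to its own integer type. A bare `as` cast wraps silently (for example, `-1i64 as usize` equals `usize::MAX`), so a checked conversion is one option. The helper names below are hypothetical and the error wording is only illustrative.

```rust
use std::convert::TryFrom;

/// Narrow a CLI-level `i64` bound to `usize`, rejecting negative values.
fn bound_as_usize(bound: i64) -> Result<usize, String> {
    usize::try_from(bound).map_err(|_| format!("--bound must be non-negative, got {}", bound))
}

/// Narrow a CLI-level `i64` bound to `i32`, rejecting values outside the i32 range.
fn bound_as_i32(bound: i64) -> Result<i32, String> {
    i32::try_from(bound).map_err(|_| format!("--bound {} does not fit in i32", bound))
}
```

In the CLI the `String` error would presumably be replaced by an `anyhow` error carrying the usage text, as the surrounding code already does for missing flags.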
59 changes: 57 additions & 2 deletions problemreductions-cli/src/commands/create.rs
@@ -6,7 +6,9 @@ use crate::util;
use anyhow::{bail, Context, Result};
use problemreductions::models::algebraic::{ClosestVectorProblem, BMF};
use problemreductions::models::graph::{GraphPartitioning, HamiltonianPath};
- use problemreductions::models::misc::{BinPacking, LongestCommonSubsequence, PaintShop, SubsetSum};
+ use problemreductions::models::misc::{
+ BinPacking, LongestCommonSubsequence, PaintShop, ShortestCommonSupersequence, SubsetSum,
+ };
use problemreductions::prelude::*;
use problemreductions::registry::collect_schemas;
use problemreductions::topology::{
@@ -52,6 +54,7 @@ fn all_data_flags_empty(args: &CreateArgs) -> bool {
&& args.pattern.is_none()
&& args.strings.is_none()
&& args.arcs.is_none()
&& args.alphabet_size.is_none()
}

fn type_format_hint(type_name: &str, graph_type: Option<&str>) -> &'static str {
@@ -103,6 +106,7 @@ fn example_for(canonical: &str, graph_type: Option<&str>) -> &'static str {
}
"SubgraphIsomorphism" => "--graph 0-1,1-2,2-0 --pattern 0-1",
"SubsetSum" => "--sizes 3,7,1,8,2,4 --target 11",
"ShortestCommonSupersequence" => "--strings \"0,1,2;1,2,0\" --bound 4",
_ => "",
}
}
@@ -280,7 +284,7 @@ pub fn create(args: &CreateArgs, out: &OutputConfig) -> Result<()> {
"RuralPostman requires --bound\n\n\
Usage: pred create RuralPostman --graph 0-1,1-2,2-3 --edge-weights 1,1,1 --required-edges 0,2 --bound 6"
)
- })?;
+ })? as i32;
(
ser(RuralPostman::new(
graph,
@@ -667,6 +671,57 @@ pub fn create(args: &CreateArgs, out: &OutputConfig) -> Result<()> {
)
}

// ShortestCommonSupersequence
"ShortestCommonSupersequence" => {
let usage = "Usage: pred create SCS --strings \"0,1,2;1,2,0\" --bound 4";
let strings_str = args.strings.as_deref().ok_or_else(|| {
anyhow::anyhow!("ShortestCommonSupersequence requires --strings\n\n{usage}")
})?;
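let bound = args.bound.ok_or_else(|| {
anyhow::anyhow!("ShortestCommonSupersequence requires --bound\n\n{usage}")
})?;
// Reject negative values explicitly; a bare `as usize` cast would wrap them to a huge bound.
let bound = usize::try_from(bound).map_err(|_| {
anyhow::anyhow!("--bound must be non-negative for ShortestCommonSupersequence\n\n{usage}")
})?;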
let strings: Vec<Vec<usize>> = strings_str
.split(';')
.map(|s| {
let trimmed = s.trim();
if trimmed.is_empty() {
return Ok(Vec::new());
}
trimmed
.split(',')
.map(|v| {
v.trim()
.parse::<usize>()
.map_err(|e| anyhow::anyhow!("Invalid alphabet index: {}", e))
})
.collect::<Result<Vec<_>>>()
})
.collect::<Result<Vec<_>>>()?;
let inferred = strings
.iter()
.flat_map(|s| s.iter())
.copied()
.max()
.map(|m| m + 1)
.unwrap_or(0);
let alphabet_size = args.alphabet_size.unwrap_or(inferred);
if alphabet_size < inferred {
anyhow::bail!(
"--alphabet-size {} is smaller than the largest symbol + 1 ({}) in the strings",
alphabet_size,
inferred
);
}
(
ser(ShortestCommonSupersequence::new(
alphabet_size,
strings,
bound,
))?,
resolved_variant.clone(),
)
}

// MinimumFeedbackVertexSet
"MinimumFeedbackVertexSet" => {
let arcs_str = args.arcs.as_deref().ok_or_else(|| {
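The `--strings` handling in the ShortestCommonSupersequence arm above can be exercised in isolation. The sketch below restates the same parsing and alphabet-size inference with only the standard library; the function names are hypothetical and `String` stands in for the `anyhow` error type used by the CLI.

```rust
/// Parse "0,1,2;1,2,0" into one Vec<usize> per semicolon-separated string.
/// An empty segment yields an empty string, as in the CLI code.
fn parse_symbol_strings(input: &str) -> Result<Vec<Vec<usize>>, String> {
    input
        .split(';')
        .map(|s| {
            let trimmed = s.trim();
            if trimmed.is_empty() {
                return Ok(Vec::new());
            }
            trimmed
                .split(',')
                .map(|v| {
                    v.trim()
                        .parse::<usize>()
                        .map_err(|e| format!("Invalid alphabet index {:?}: {}", v, e))
                })
                .collect::<Result<Vec<usize>, String>>()
        })
        .collect::<Result<Vec<Vec<usize>>, String>>()
}

/// Smallest alphabet size covering every symbol: max symbol + 1, or 0 if there are no symbols.
fn infer_alphabet_size(strings: &[Vec<usize>]) -> usize {
    strings.iter().flatten().copied().max().map_or(0, |m| m + 1)
}
```

For the usage example `--strings "0,1,2;1,2,0"`, this yields two strings of length 3 and an inferred alphabet size of 3; an explicit `--alphabet-size` smaller than that is what the `bail!` in the arm rejects.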
6 changes: 5 additions & 1 deletion problemreductions-cli/src/dispatch.rs
@@ -1,6 +1,8 @@
use anyhow::{bail, Context, Result};
use problemreductions::models::algebraic::{ClosestVectorProblem, ILP};
- use problemreductions::models::misc::{BinPacking, Knapsack, LongestCommonSubsequence, SubsetSum};
+ use problemreductions::models::misc::{
+ BinPacking, Knapsack, LongestCommonSubsequence, ShortestCommonSupersequence, SubsetSum,
+ };
use problemreductions::prelude::*;
use problemreductions::rules::{MinimizeSteps, ReductionGraph};
use problemreductions::solvers::{BruteForce, ILPSolver, Solver};
@@ -254,6 +256,7 @@ pub fn load_problem(
"LongestCommonSubsequence" => deser_opt::<LongestCommonSubsequence>(data),
"MinimumFeedbackVertexSet" => deser_opt::<MinimumFeedbackVertexSet<i32>>(data),
"SubsetSum" => deser_sat::<SubsetSum>(data),
"ShortestCommonSupersequence" => deser_sat::<ShortestCommonSupersequence>(data),
"MinimumFeedbackArcSet" => deser_opt::<MinimumFeedbackArcSet<i32>>(data),
_ => bail!("{}", crate::problem_name::unknown_problem_error(&canonical)),
}
@@ -324,6 +327,7 @@ pub fn serialize_any_problem(
"LongestCommonSubsequence" => try_ser::<LongestCommonSubsequence>(any),
"MinimumFeedbackVertexSet" => try_ser::<MinimumFeedbackVertexSet<i32>>(any),
"SubsetSum" => try_ser::<SubsetSum>(any),
"ShortestCommonSupersequence" => try_ser::<ShortestCommonSupersequence>(any),
"MinimumFeedbackArcSet" => try_ser::<MinimumFeedbackArcSet<i32>>(any),
_ => bail!("{}", crate::problem_name::unknown_problem_error(&canonical)),
}
2 changes: 2 additions & 0 deletions problemreductions-cli/src/problem_name.rs
@@ -24,6 +24,7 @@ pub const ALIASES: &[(&str, &str)] = &[
("LCS", "LongestCommonSubsequence"),
("MaxMatching", "MaximumMatching"),
("FVS", "MinimumFeedbackVertexSet"),
("SCS", "ShortestCommonSupersequence"),
("FAS", "MinimumFeedbackArcSet"),
("pmedian", "MinimumSumMulticenter"),
];
@@ -66,6 +67,7 @@ pub fn resolve_alias(input: &str) -> String {
"fas" | "minimumfeedbackarcset" => "MinimumFeedbackArcSet".to_string(),
"minimumsummulticenter" | "pmedian" => "MinimumSumMulticenter".to_string(),
"subsetsum" => "SubsetSum".to_string(),
"scs" | "shortestcommonsupersequence" => "ShortestCommonSupersequence".to_string(),
"hamiltonianpath" => "HamiltonianPath".to_string(),
_ => input.to_string(), // pass-through for exact names
}
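The new `"scs"` arm can be illustrated with a reduced sketch of the lookup. This assumes the real `resolve_alias` lowercases its input before matching, which the lowercase arm patterns suggest but which this hunk does not show; the function name below is hypothetical and only three arms are reproduced.

```rust
/// Reduced sketch of case-insensitive alias resolution.
/// Assumption: the input is lowercased before matching.
fn resolve_alias_sketch(input: &str) -> String {
    match input.to_lowercase().as_str() {
        "scs" | "shortestcommonsupersequence" => "ShortestCommonSupersequence".to_string(),
        "fas" | "minimumfeedbackarcset" => "MinimumFeedbackArcSet".to_string(),
        "subsetsum" => "SubsetSum".to_string(),
        _ => input.to_string(), // pass-through for exact names
    }
}
```

Under that assumption, `SCS`, `scs`, and `ShortestCommonSupersequence` all resolve to the canonical name, while unrecognized input is returned unchanged.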
7 changes: 4 additions & 3 deletions src/lib.rs
@@ -46,11 +46,12 @@ pub mod prelude {
pub use crate::models::graph::{
KColoring, MaxCut, MaximalIS, MaximumClique, MaximumIndependentSet, MaximumMatching,
MinimumDominatingSet, MinimumFeedbackArcSet, MinimumFeedbackVertexSet,
- MinimumSumMulticenter, MinimumVertexCover,
- PartitionIntoTriangles, RuralPostman, TravelingSalesman,
+ MinimumSumMulticenter, MinimumVertexCover, PartitionIntoTriangles,
+ RuralPostman, TravelingSalesman,
};
pub use crate::models::misc::{
- BinPacking, Factoring, Knapsack, LongestCommonSubsequence, PaintShop, SubsetSum,
+ BinPacking, Factoring, Knapsack, LongestCommonSubsequence, PaintShop,
+ ShortestCommonSupersequence, SubsetSum,
};
pub use crate::models::set::{MaximumSetPacking, MinimumSetCovering};

3 changes: 3 additions & 0 deletions src/models/misc/mod.rs
@@ -6,18 +6,21 @@
//! - [`Knapsack`]: 0-1 Knapsack (maximize value subject to weight capacity)
//! - [`LongestCommonSubsequence`]: Longest Common Subsequence
//! - [`PaintShop`]: Minimize color switches in paint shop scheduling
//! - [`ShortestCommonSupersequence`]: Find a common supersequence of bounded length
//! - [`SubsetSum`]: Find a subset summing to exactly a target value

mod bin_packing;
pub(crate) mod factoring;
mod knapsack;
mod longest_common_subsequence;
pub(crate) mod paintshop;
pub(crate) mod shortest_common_supersequence;
mod subset_sum;

pub use bin_packing::BinPacking;
pub use factoring::Factoring;
pub use knapsack::Knapsack;
pub use longest_common_subsequence::LongestCommonSubsequence;
pub use paintshop::PaintShop;
pub use shortest_common_supersequence::ShortestCommonSupersequence;
pub use subset_sum::SubsetSum;