72 changes: 72 additions & 0 deletions docs/paper/reductions.typ
@@ -108,6 +108,7 @@
"BoundedComponentSpanningForest": [Bounded Component Spanning Forest],
"BinPacking": [Bin Packing],
"BoyceCoddNormalFormViolation": [Boyce-Codd Normal Form Violation],
"ConsistencyOfDatabaseFrequencyTables": [Consistency of Database Frequency Tables],
"ClosestVectorProblem": [Closest Vector Problem],
"ConsecutiveSets": [Consecutive Sets],
"MinimumMultiwayCut": [Minimum Multiway Cut],
@@ -3201,6 +3202,77 @@ A classical NP-complete problem from Garey and Johnson @garey1979[Ch.~3, p.~76],
A relation satisfies _Boyce-Codd Normal Form_ (BCNF) if every non-trivial functional dependency $X arrow.r Y$ has $X$ as a superkey --- that is, $X^+$ = $A'$. This classical NP-complete problem from database theory asks whether the given attribute subset $A'$ violates BCNF. The NP-completeness was established by Beeri and Bernstein (1979) via reduction from Hitting Set. It appears as problem SR29 in Garey and Johnson's compendium (category A4: Storage and Retrieval).
]

#{
let x = load-model-example("ConsistencyOfDatabaseFrequencyTables")
let num_objects = x.instance.num_objects
let num_attrs = x.instance.attribute_domains.len()
let domains = x.instance.attribute_domains
let table01 = x.instance.frequency_tables.at(0).counts
let table12 = x.instance.frequency_tables.at(1).counts
let config = x.optimal_config
let value = (object, attr) => config.at(object * num_attrs + attr)
[
#problem-def("ConsistencyOfDatabaseFrequencyTables")[
Given a finite set $V$ of objects, a finite set $A$ of attributes, a domain $D_a$ for each $a in A$, a collection of pairwise frequency tables $f_(a,b): D_a times D_b -> ZZ^(>=0)$ whose entries sum to $|V|$, and a set $K subset.eq V times A times union_(a in A) D_a$ of known triples $(v, a, x)$, determine whether there exist functions $g_a: V -> D_a$ such that $g_a(v) = x$ for every $(v, a, x) in K$ and, for every published table $f_(a,b)$, exactly $f_(a,b)(x, y)$ objects satisfy $(g_a(v), g_b(v)) = (x, y)$.
][
Consistency of Database Frequency Tables is Garey and Johnson's storage-and-retrieval problem SR35 @garey1979. It asks whether released pairwise marginals can come from some hidden microdata table while respecting already known individual attribute values, making it a natural decision problem in statistical disclosure control. The direct witness space implemented in this crate assigns one categorical variable to each object-attribute pair, so exhaustive search runs in $O^*((product_(a in A) |D_a|)^(|V|))$. #footnote[This is the exact search bound induced by the implementation's configuration space; no faster general exact worst-case algorithm is claimed here.]

*Example.* Let $|V| = #num_objects$ with attributes $a_0, a_1, a_2$ having domain sizes $#domains.at(0)$, $#domains.at(1)$, and $#domains.at(2)$ respectively. Publish the pairwise tables

#align(center, table(
columns: 4,
align: center,
table.header([$f_(a_0, a_1)$], [$0$], [$1$], [$2$]),
[$0$], [#table01.at(0).at(0)], [#table01.at(0).at(1)], [#table01.at(0).at(2)],
[$1$], [#table01.at(1).at(0)], [#table01.at(1).at(1)], [#table01.at(1).at(2)],
))

and

#align(center, table(
columns: 3,
align: center,
table.header([$f_(a_1, a_2)$], [$0$], [$1$]),
[$0$], [#table12.at(0).at(0)], [#table12.at(0).at(1)],
[$1$], [#table12.at(1).at(0)], [#table12.at(1).at(1)],
[$2$], [#table12.at(2).at(0)], [#table12.at(2).at(1)],
))

together with the known values $K = {(v_0, a_0, 0), (v_3, a_0, 1), (v_1, a_2, 1)}$. One consistent completion is:

#align(center, table(
columns: 4,
align: center,
table.header([object], [$a_0$], [$a_1$], [$a_2$]),
[$v_0$], [#value(0, 0)], [#value(0, 1)], [#value(0, 2)],
[$v_1$], [#value(1, 0)], [#value(1, 1)], [#value(1, 2)],
[$v_2$], [#value(2, 0)], [#value(2, 1)], [#value(2, 2)],
[$v_3$], [#value(3, 0)], [#value(3, 1)], [#value(3, 2)],
[$v_4$], [#value(4, 0)], [#value(4, 1)], [#value(4, 2)],
[$v_5$], [#value(5, 0)], [#value(5, 1)], [#value(5, 2)],
))

This witness satisfies every published count: in $f_(a_0, a_1)$ each of the six cells appears exactly once, while in $f_(a_1, a_2)$ the five occupied cells have multiplicities $1, 1, 2, 1, 1$ exactly as listed above. It also respects all three known triples, so the answer is YES.
]
]
}
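
The consistency check described in the example can be sketched in Rust. This is a minimal illustration under stated assumptions, not the crate's implementation: `FrequencyTable`, `is_consistent`, and `example_instance` are hypothetical names, the tables and known triples are copied from the `pred create` example in this PR, and the flat layout `object * num_attrs + attr` follows the `value` helper above.

```rust
/// One published pairwise table: attribute indices (a, b) and counts[x][y].
/// Hypothetical type for illustration; the crate's own types are not shown in this diff.
struct FrequencyTable {
    a: usize,
    b: usize,
    counts: Vec<Vec<usize>>,
}

/// Returns true when the flat assignment `config` (indexed as
/// `object * num_attrs + attr`) reproduces every table and every known triple.
fn is_consistent(
    num_objects: usize,
    domains: &[usize],
    tables: &[FrequencyTable],
    known: &[(usize, usize, usize)],
    config: &[usize],
) -> bool {
    let num_attrs = domains.len();
    if config.len() != num_objects * num_attrs {
        return false;
    }
    let value = |object: usize, attr: usize| config[object * num_attrs + attr];

    // Every assigned value must lie inside its attribute's domain.
    for v in 0..num_objects {
        for a in 0..num_attrs {
            if value(v, a) >= domains[a] {
                return false;
            }
        }
    }

    // Known triples (v, a, x) pin individual entries.
    if known
        .iter()
        .any(|&(v, a, x)| v >= num_objects || a >= num_attrs || value(v, a) != x)
    {
        return false;
    }

    // Recount each pairwise table from the assignment and compare cell by cell.
    for t in tables {
        let mut observed = vec![vec![0usize; domains[t.b]]; domains[t.a]];
        for v in 0..num_objects {
            observed[value(v, t.a)][value(v, t.b)] += 1;
        }
        if observed != t.counts {
            return false;
        }
    }
    true
}

/// Tables and known triples taken from the CLI example in this PR.
fn example_instance() -> (Vec<FrequencyTable>, Vec<(usize, usize, usize)>) {
    let tables = vec![
        FrequencyTable { a: 0, b: 1, counts: vec![vec![1, 1, 1], vec![1, 1, 1]] },
        FrequencyTable { a: 1, b: 2, counts: vec![vec![1, 1], vec![0, 2], vec![1, 1]] },
    ];
    let known = vec![(0, 0, 0), (3, 0, 1), (1, 2, 1)];
    (tables, known)
}
```

Calling `is_consistent(6, &[2, 3, 2], &tables, &known, &witness)` with the hand-derived witness `[0,0,0, 0,1,1, 0,2,1, 1,0,1, 1,1,1, 1,2,0]` accepts it. That witness is one valid completion worked out for this sketch and may differ from the `optimal_config` rendered in the table above.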

#reduction-rule("ConsistencyOfDatabaseFrequencyTables", "ILP")[
Each object-attribute pair is encoded by a one-hot binary vector over its domain, and each pairwise frequency count becomes a linear equality over McCormick auxiliary variables that linearize the product of two one-hot indicators. Known values are fixed by pinning the corresponding indicator to 1. The resulting ILP is a pure feasibility problem (trivial objective).
][
_Construction._ Let $V$ be the set of objects, $A$ the set of attributes with domains $D_a$, $cal(T)$ the set of published frequency tables, and $K$ the set of known triples $(v, a, x)$.

_Variables:_ (1) Binary one-hot indicators $y_(v,a,x) in {0, 1}$ for each object $v in V$, attribute $a in A$, and value $x in D_a$: $y_(v,a,x) = 1$ iff object $v$ takes value $x$ for attribute $a$. (2) Binary auxiliary variables $z_(t,v,x,x') in {0, 1}$ for each table $t in cal(T)$ (with attribute pair $(a, b)$), object $v in V$, and cell $(x, x') in D_a times D_b$: $z_(t,v,x,x') = 1$ iff object $v$ realizes cell $(x, x')$ in table $t$.

_Constraints:_ (1) One-hot: $sum_(x in D_a) y_(v,a,x) = 1$ for all $v in V$, $a in A$. (2) Known values: $y_(v,a,x) = 1$ for each $(v, a, x) in K$. (3) McCormick linearization for $z_(t,v,x,x') = y_(v,a,x) dot y_(v,b,x')$: $z_(t,v,x,x') lt.eq y_(v,a,x)$, $z_(t,v,x,x') lt.eq y_(v,b,x')$, $z_(t,v,x,x') gt.eq y_(v,a,x) + y_(v,b,x') - 1$. (4) Frequency counts: $sum_(v in V) z_(t,v,x,x') = f_t (x, x')$ for each table $t$ and cell $(x, x')$.

_Objective:_ Minimize $0$ (feasibility problem).

_Correctness._ ($arrow.r.double$) A consistent assignment defines the one-hot indicators, and setting each $z_(t,v,x,x')$ to the product of its two indicators satisfies the McCormick constraints; the frequency equalities then match the published counts. ($arrow.l.double$) Any feasible binary solution assigns exactly one value to each object-attribute pair (one-hot) and respects the known values. For binary variables the McCormick constraints force $z_(t,v,x,x') = y_(v,a,x) dot y_(v,b,x')$, so the frequency equalities certify consistency.

_Solution extraction._ For each object $v$ and attribute $a$, find $x$ with $y_(v,a,x) = 1$; assign value $x$ to $(v, a)$.
]
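
Two claims in the construction above can be checked mechanically: that the three McCormick inequalities pin the auxiliary variable to the product on binary inputs, and how large the resulting ILP is. The Rust sketch below does both. All names (`mccormick_feasible`, `mccormick_is_exact_on_binaries`, `IlpSize`, `ilp_size`) are hypothetical, and the counts follow the construction as written in the text, which may differ from what the crate's reduction actually emits (for example if it prunes or merges constraints).

```rust
/// The three McCormick inequalities for z = y1 * y2:
/// z <= y1, z <= y2, z >= y1 + y2 - 1 (rearranged to avoid unsigned underflow).
fn mccormick_feasible(y1: u8, y2: u8, z: u8) -> bool {
    z <= y1 && z <= y2 && z + 1 >= y1 + y2
}

/// Enumerates all eight binary triples and checks that the inequalities
/// hold exactly when z equals the product y1 * y2.
fn mccormick_is_exact_on_binaries() -> bool {
    for y1 in 0..=1u8 {
        for y2 in 0..=1u8 {
            for z in 0..=1u8 {
                if mccormick_feasible(y1, y2, z) != (z == y1 * y2) {
                    return false;
                }
            }
        }
    }
    true
}

/// Size of the ILP described in the construction (illustrative only).
#[derive(Debug, PartialEq)]
struct IlpSize {
    indicator_vars: usize,
    auxiliary_vars: usize,
    constraints: usize,
}

/// `table_pairs` lists the attribute pair (a, b) of each published table.
fn ilp_size(
    num_objects: usize,
    domains: &[usize],
    table_pairs: &[(usize, usize)],
    num_known: usize,
) -> IlpSize {
    // Total number of table cells over all published tables.
    let cells: usize = table_pairs
        .iter()
        .map(|&(a, b)| domains[a] * domains[b])
        .sum();
    let domain_total: usize = domains.iter().sum();

    let indicator_vars = num_objects * domain_total; // y variables
    let auxiliary_vars = num_objects * cells; // z variables
    let constraints = num_objects * domains.len() // one-hot rows
        + num_known // pinned known values
        + 3 * auxiliary_vars // McCormick inequalities
        + cells; // frequency equalities
    IlpSize { indicator_vars, auxiliary_vars, constraints }
}
```

For the six-object example with domains 2, 3, 2, tables on attribute pairs (0, 1) and (1, 2), and three known triples, this counting gives 42 indicator variables, 72 auxiliary variables, and 249 constraints.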

#problem-def("SumOfSquaresPartition")[
Given a finite set $A = {a_0, dots, a_(n-1)}$ with sizes $s(a_i) in ZZ^+$, a positive integer $K lt.eq |A|$ (number of groups), and a positive integer $J$ (bound), determine whether $A$ can be partitioned into $K$ disjoint sets $A_1, dots, A_K$ such that $sum_(i=1)^K (sum_(a in A_i) s(a))^2 lt.eq J$.
][
14 changes: 14 additions & 0 deletions problemreductions-cli/src/cli.rs
@@ -265,6 +265,7 @@ Flags by problem type:
RuralPostman (RPP) --graph, --edge-weights, --required-edges, --bound
MultipleChoiceBranching --arcs [--weights] --partition --bound [--num-vertices]
AdditionalKey --num-attributes, --dependencies, --relation-attrs [--known-keys]
ConsistencyOfDatabaseFrequencyTables --num-objects, --attribute-domains, --frequency-tables [--known-values]
SubgraphIsomorphism --graph (host), --pattern (pattern)
LCS --strings, --bound [--alphabet-size]
FAS --arcs [--weights] [--num-vertices]
@@ -312,6 +313,7 @@ Examples:
pred create MIS/UnitDiskGraph --positions \"0,0;1,0;0.5,0.8\" --radius 1.5
pred create MIS --random --num-vertices 10 --edge-prob 0.3
pred create MultiprocessorScheduling --lengths 4,5,3,2,6 --num-processors 2 --deadline 10
pred create ConsistencyOfDatabaseFrequencyTables --num-objects 6 --attribute-domains \"2,3,2\" --frequency-tables \"0,1:1,1,1|1,1,1;1,2:1,1|0,2|1,1\" --known-values \"0,0,0;3,0,1;1,2,1\"
pred create BiconnectivityAugmentation --graph 0-1,1-2,2-3 --potential-edges 0-2:3,0-3:4,1-3:2 --budget 5
pred create FVS --arcs \"0>1,1>2,2>0\" --weights 1,1,1
pred create UndirectedTwoCommodityIntegralFlow --graph 0-2,1-2,2-3 --capacities 1,1,2 --source-1 0 --sink-1 3 --source-2 1 --sink-2 3 --requirement-1 1 --requirement-2 1
@@ -608,6 +610,18 @@ pub struct CreateArgs {
/// Known candidate keys for AdditionalKey (e.g., "0,1;2,3")
#[arg(long)]
pub known_keys: Option<String>,
/// Number of objects for ConsistencyOfDatabaseFrequencyTables
#[arg(long)]
pub num_objects: Option<usize>,
/// Attribute-domain sizes for ConsistencyOfDatabaseFrequencyTables (comma-separated, e.g., "2,3,2")
#[arg(long)]
pub attribute_domains: Option<String>,
/// Pairwise frequency tables for ConsistencyOfDatabaseFrequencyTables (e.g., "0,1:1,1|0,1;1,2:1,0|0,1")
#[arg(long)]
pub frequency_tables: Option<String>,
/// Known value triples for ConsistencyOfDatabaseFrequencyTables (e.g., "0,0,0;3,1,2")
#[arg(long)]
pub known_values: Option<String>,
/// Domain size for ConjunctiveBooleanQuery
#[arg(long)]
pub domain_size: Option<usize>,
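
The string formats accepted by `--frequency-tables` and `--known-values` are only shown by example in this diff. The sketch below parses them under an inferred grammar: tables are separated by `;`, each table is `a,b:` followed by rows separated by `|`, and cells within a row are separated by `,`; known values are `;`-separated `object,attribute,value` triples. The function names and the `ParsedTable` alias are hypothetical, and the crate's actual parser is not part of the visible hunk, so its validation and error messages may differ.

```rust
/// (attribute a, attribute b, counts[x][y]); hypothetical alias for illustration.
type ParsedTable = (usize, usize, Vec<Vec<usize>>);

/// Parses a comma-separated list of non-negative integers such as "2,3,2".
fn parse_usize_list(text: &str) -> Result<Vec<usize>, String> {
    text.split(',')
        .map(|item| {
            item.trim()
                .parse::<usize>()
                .map_err(|e| format!("invalid number {:?}: {}", item, e))
        })
        .collect()
}

/// Parses e.g. "0,1:1,1,1|1,1,1;1,2:1,1|0,2|1,1" (format inferred from the example).
fn parse_frequency_tables(spec: &str) -> Result<Vec<ParsedTable>, String> {
    spec.split(';')
        .map(|table| {
            let (pair, rows) = table
                .split_once(':')
                .ok_or_else(|| format!("missing ':' in table {:?}", table))?;
            let attrs = parse_usize_list(pair)?;
            if attrs.len() != 2 {
                return Err(format!("expected two attribute indices in {:?}", pair));
            }
            // Each '|'-separated segment is one row of the table.
            let counts = rows
                .split('|')
                .map(parse_usize_list)
                .collect::<Result<Vec<_>, _>>()?;
            Ok((attrs[0], attrs[1], counts))
        })
        .collect()
}

/// Parses e.g. "0,0,0;3,0,1;1,2,1" into (object, attribute, value) triples.
fn parse_known_values(spec: &str) -> Result<Vec<(usize, usize, usize)>, String> {
    spec.split(';')
        .map(|triple| {
            let parts = parse_usize_list(triple)?;
            if parts.len() != 3 {
                return Err(format!("expected object,attribute,value in {:?}", triple));
            }
            Ok((parts[0], parts[1], parts[2]))
        })
        .collect()
}
```

Applied to the `pred create` example from this PR, `parse_frequency_tables` yields two tables, on attribute pairs (0, 1) and (1, 2), and `parse_known_values` yields the three triples listed in the paper's example. Checking that each table's shape matches the declared `--attribute-domains` is left out here and would be a natural next validation step.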