Skip to content

[Model] SetBasis #400

@isPANN

Description

@isPANN

Motivation

SET BASIS (P134) from Garey & Johnson, A3 SP7. An NP-complete problem in set/storage theory: given a collection C of subsets of a finite set S, find a minimum-size "basis" B of subsets such that every set in C can be expressed as a union of sets in B. This has applications in data compression, database schema design, and Boolean function minimization -- finding a compact representation of a family of sets. The problem is closely related to set cover but has a fundamentally different structure: instead of covering elements, we must reconstruct exact sets via unions.

Associated reduction rules:

  • As target: R74 (VERTEX COVER -> SET BASIS)

Definition

Name: SetBasis
Canonical name: Set Basis Problem; also: Minimum Test Set Basis, Minimum Set Basis
Reference: Garey & Johnson, Computers and Intractability, A3 SP7

Mathematical definition:

INSTANCE: Collection C of subsets of a finite set S, positive integer K ≤ |C|.
QUESTION: Is there a collection B of subsets of S with |B| = K such that, for each c ∈ C, there is a subcollection of B whose union is exactly c?

The optimization version asks: find the minimum K such that a basis B of size K exists.

Variables

  • Count: K × |S| binary variables. The K basis elements are each encoded as a binary vector of length |S|, indicating which elements of S are included in that basis set.
  • Per-variable domain: binary {0, 1} — whether element j ∈ S belongs to basis set i.
  • Meaning: The configuration is a K × |S| binary matrix where row i defines basis element b_i ⊆ S. The assignment is valid (returns true) if every set c ∈ C can be exactly reconstructed as a union of some subcollection of B = {b_1, ..., b_K}.
  • dims() spec: vec![2; k * universe_size] — K × |S| binary variables, where variables [i*|S| .. (i+1)*|S|) encode basis element b_i.

Schema (data type)

Type name: SetBasis
Variants: none (pure set-theoretic problem)

Field Type Description
universe_size usize Size of the ground set S (=
collection Vec<Vec<usize>> The collection C of target subsets of S (each represented as sorted element indices)
k usize Maximum allowed basis size K

Size fields (for overhead expressions): universe_size (= |S|), num_sets (= |C|), basis_size (= K)

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • Key getter methods: universe_size() (= |S|), num_sets() (= |C|), basis_size() (= K).
  • Basis elements are arbitrary subsets of S (not necessarily members of C).
  • Field naming follows MinimumSetCovering conventions (universe_size, not set_size).

Complexity

  • Decision complexity: NP-complete (Stockmeyer, 1975; transformation from VERTEX COVER). Remains NP-complete when all c ∈ C have |c| ≤ 3, but is trivial when all |c| ≤ 2.
  • Best known exact algorithm: No specialized exact exponential algorithm is known. The problem can be formulated as an ILP. For practical instances, constraint programming and SAT solvers are used.
  • Complexity string: "2^(basis_size * universe_size)" (brute force over K × |S| binary variables)

declare_variants! guidance:

crate::declare_variants! {
    SetBasis => "2^(basis_size * universe_size)",
}
  • References:
    • L. J. Stockmeyer (1975). "The Set Basis Problem is NP-Complete." IBM Research Report RC 5431, IBM Research Center, Yorktown Heights, NY.

Specialization

  • This is a special case of: General set representation / compression problems
  • Known special cases:
    • All sets in C have size ≤ 2: trivially solvable (each element is its own basis element)
    • All sets in C have size ≤ 3: still NP-complete
  • Related problems: SET COVER (basis elements must cover S, not reconstruct C), EXACT COVER (disjoint union), MINIMUM EQUIVALENT EXPRESSION

Extra Remark

Full book text:

INSTANCE: Collection C of subsets of a finite set S, positive integer K ≤ |C|.
QUESTION: Is there a collection B of subsets of S with |B| = K such that, for each c ∈ C, there is a subcollection of B whose union is exactly c?
Reference: [Stockmeyer, 1975]. Transformation from VERTEX COVER.
Comment: Remains NP-complete if all c ∈ C have |c| ≤ 3, but is trivial if all c ∈ C have |c| ≤ 2.

How to solve

  • It can be solved by (existing) bruteforce. Enumerate all possible collections of K subsets of S; for each collection B, check if every c ∈ C can be expressed as a union of a subcollection of B.
  • It can be solved by reducing to integer programming. Binary variable x_T for each candidate subset T ⊆ S; constrain sum(x_T) = K; for each c ∈ C, add constraints ensuring that c is expressible as the union of selected basis sets.
  • Other: SAT encoding; constraint programming.

Example Instance

Ground set S = {a, b, c, d} (|S| = 4), Collection C of 4 subsets, K = 3:

  • c_1 = {a, b}
  • c_2 = {b, c}
  • c_3 = {a, c}
  • c_4 = {a, b, c}

Question: Is there a basis B of 3 subsets of S such that each c_i is a union of sets in B?

Solution: B = { {a}, {b}, {c} } with |B| = 3.

  • c_1 = {a, b} = {a} ∪ {b} ✓
  • c_2 = {b, c} = {b} ∪ {c} ✓
  • c_3 = {a, c} = {a} ∪ {c} ✓
  • c_4 = {a, b, c} = {a} ∪ {b} ∪ {c} ✓

Answer for K = 3: YES.

Infeasibility for K = 2: Suppose B = {b_1, b_2}. Then c_1 = {a,b}, c_2 = {b,c}, and c_3 = {a,c} must each be either b_1, b_2, or b_1 ∪ b_2. Since b_1 ∪ b_2 can equal at most one of these three sets, the other two must each be b_1 or b_2 — but we only have two basis elements for three distinct sets. So K = 2 is infeasible.

Answer for K = 2: NO. Minimum basis size = 3.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions