Skip to content

[Model] ConjunctiveQueryFoldability #448

@isPANN

Description

@isPANN

Motivation

CONJUNCTIVE QUERY FOLDABILITY (P178) from Garey & Johnson, A4 SR30. A classical NP-complete problem in database theory. Conjunctive query foldability (also called conjunctive query containment) asks whether one conjunctive query can be "folded" into another by substituting undistinguished variables, which is equivalent to the existence of a homomorphism between the queries. This problem is fundamental to query optimization in relational databases.

Associated reduction rules:

  • R124: Graph 3-Colorability -> Conjunctive Query Foldability (this problem is the target)

Definition

Name: ConjunctiveQueryFoldability
Canonical name: Conjunctive Query Foldability (also: Conjunctive Query Containment, CQ Containment)
Reference: Garey & Johnson, Computers and Intractability, A4 SR30

Background: A conjunctive query is a pattern-matching query over relations — like a SQL SELECT ... FROM ... WHERE ... with only AND conditions. Each query uses three kinds of terms:

  • Domain constants D: fixed values (e.g., 0, 1, 2)
  • Distinguished variables X: the query's output columns — must stay unchanged
  • Undistinguished variables Y: internal/temporary variables (existentially quantified) — can be substituted

Foldability asks: can Q1 be transformed into Q2 by replacing each undistinguished variable with some other term? The substitution σ maps each undistinguished variable y ∈ Y to one of:

  • Another undistinguished variable (merge two temp vars into one)
  • A distinguished variable (fix a temp var to an output column)
  • A domain constant (fix a temp var to a concrete value)

If such σ exists, Q1 is "redundant" relative to Q2 and can be simplified.

Mathematical definition:

INSTANCE: A finite domain D (set of constants), relations R = {R_1, ..., R_m} (each R_i has a fixed arity d_i), distinguished variables X, undistinguished variables Y, and two queries Q_1 and Q_2. Each query Q is a conjunction of atoms A_1 ∧ A_2 ∧ ... ∧ A_r, where each atom has the form R_j(u_1, ..., u_{d_j}) with each argument u ∈ D ∪ X ∪ Y.

QUESTION: Is there a substitution σ: Y → X ∪ Y ∪ D such that applying σ to every undistinguished variable in Q_1 produces Q_2 (as a set of atoms)?

Variables

  • Count: num_undistinguished variables, one per undistinguished variable in Y.
  • Per-variable domain: Each can map to any of num_distinguished + num_undistinguished + domain_size targets (distinguished vars, undistinguished vars, or domain constants).
  • dims(): vec![num_distinguished + num_undistinguished + domain_size; num_undistinguished]
  • evaluate(): Decode σ from the configuration. Apply σ to every undistinguished variable occurrence in Q1's conjuncts. Return true iff the resulting set of atoms equals Q2's set of atoms.

Schema (data type)

Type name: ConjunctiveQueryFoldability
Variants: none (the query structure is stored directly)

Field Type Description
domain_size usize Size of the domain D (constants indexed 0..domain_size)
num_distinguished usize Number of distinguished variables in X
num_undistinguished usize Number of undistinguished variables in Y
relation_arities Vec<usize> Arity d_i of each relation R_i
query1_conjuncts Vec<(usize, Vec<Term>)> Atoms of Q1: each is (relation_index, arguments)
query2_conjuncts Vec<(usize, Vec<Term>)> Atoms of Q2: same format

where Term is an enum:

enum Term {
    Constant(usize),       // domain constant D[i]
    Distinguished(usize),  // distinguished variable X[i]
    Undistinguished(usize), // undistinguished variable Y[i]
}

Size fields (getter methods for overhead expressions and declare_variants!):

  • domain_size() — returns domain_size
  • num_distinguished() — returns num_distinguished
  • num_undistinguished() — returns num_undistinguished
  • num_conjuncts_q1() — returns query1_conjuncts.len()
  • num_conjuncts_q2() — returns query2_conjuncts.len()

Complexity

  • Decision complexity: NP-complete (Chandra and Merlin, 1977; transformation from GRAPH 3-COLORABILITY).
  • Best known exact algorithm: Brute-force: enumerate all (num_distinguished + num_undistinguished + domain_size)^num_undistinguished substitutions, check each in O(num_conjuncts_q1) time.
  • declare_variants! complexity string: "(num_distinguished + num_undistinguished + domain_size)^num_undistinguished * num_conjuncts_q1"
  • Parameterized: For acyclic conjunctive queries (those with hypertree-width 1), containment is in LOGCFL (and hence in polynomial time). The problem is also polynomial when the number of atoms in Q_2 is bounded.
  • References:
    • [Chandra and Merlin, 1977] A. K. Chandra and P. M. Merlin, "Optimal Implementation of Conjunctive Queries in Relational Data Bases", Proc. 9th STOC, pp. 77-90, 1977.
    • [Kolaitis, 2007] P. G. Kolaitis, "Complexity of Conjunctive Query Containment" (survey), various lecture notes.

Specialization

  • Related problems: Conjunctive Query Equivalence (bidirectional containment), Conjunctive Boolean Query (evaluation), Graph Homomorphism.
  • Special cases: When Q_2 is the query for K_3 (the complete graph on 3 vertices), foldability reduces to graph 3-colorability. When both queries have the same structure, it reduces to graph isomorphism.
  • Generalization: The problem generalizes to unions of conjunctive queries, where containment is Pi_2^P-complete.

Extra Remark

Full book text:

INSTANCE: Finite domain set D, a collection R = {R1,R2,...,Rm} of relations, where each Ri consists of a set of di-tuples with entries from D, a set X of distinguished variables, a set Y of undistinguished variables, and two "queries" Q1 and Q2 over X,Y,D, and R, where a query Q has the form
(x1,x2,...,xk)(exists y1,y2,...,yl)(A1 ∧ A2 ∧ ... ∧ Ar)
for some k,l, and r, with X' = {x1,x2,...,xk} ⊆ X, Y' = {y1,y2,...,yl} ⊆ Y, and each Ai of the form Rj(u1,u2,...,udj) with each u in D ∪ X' ∪ Y' (see reference for interpretation of such expressions in terms of data bases).
QUESTION: Is there a function sigma: Y -> X ∪ Y ∪ D such that, if for each y in Y the symbol sigma(y) is substituted for every occurrence of y in Q1, then the result is query Q2?
Reference: [Chandra and Merlin, 1977]. Transformation from GRAPH 3-COLORABILITY.
Comment: The isomorphism problem for conjunctive queries (with two queries being isomorphic if they are the same up to one-to-one renaming of the variables, reordering of conjuncts, and reordering within quantifications) is polynomially equivalent to graph isomorphism.

How to solve

  • It can be solved by (existing) bruteforce. Enumerate all functions sigma: Y -> X ∪ Y ∪ D; for each sigma, apply the substitution to Q_1 and check if the result equals Q_2 (up to reordering of conjuncts).
  • It can be solved by reducing to integer programming.
  • Other: (none identified)

Example Instance

Single binary relation R (index 0, arity 2). We use shorthand: D(i) = Constant(i), X(i) = Distinguished(i), U(i) = Undistinguished(i).

Instance 1 (YES — foldable):
domain_size = 0, num_distinguished = 1 (x = X(0)), num_undistinguished = 3 (u = U(0), v = U(1), a = U(2))

Q1(x): R(x, u) ∧ R(u, v) ∧ R(v, x) ∧ R(u, u)
— "triangle (x, u, v) plus self-loop on u"
— query1_conjuncts = [(0, [X(0), U(0)]), (0, [U(0), U(1)]), (0, [U(1), X(0)]), (0, [U(0), U(0)])]

Q2(x): R(x, a) ∧ R(a, a) ∧ R(a, x)
— "self-loop on a, with edges to/from x"
— query2_conjuncts = [(0, [X(0), U(2)]), (0, [U(2), U(2)]), (0, [U(2), X(0)])]

Substitution σ: U(0) → U(2), U(1) → U(2) (merge u and v into a):

  • R(X(0), U(0)) → R(X(0), U(2)) = R(x, a) ∈ Q2 ✓
  • R(U(0), U(1)) → R(U(2), U(2)) = R(a, a) ∈ Q2 ✓
  • R(U(1), X(0)) → R(U(2), X(0)) = R(a, x) ∈ Q2 ✓
  • R(U(0), U(0)) → R(U(2), U(2)) = R(a, a) ∈ Q2 ✓ (duplicate, absorbed)

σ(Q1) = {R(x,a), R(a,a), R(a,x)} = Q2. Answer: YES.

This is the unique valid σ — merging two variables of a triangle into one requires the self-loop atom R(u,u) to justify the resulting R(a,a).

Instance 2 (NO — not foldable):
domain_size = 0, num_distinguished = 1 (x = X(0)), num_undistinguished = 3 (u = U(0), v = U(1), a = U(2))

Q1(x): R(x, u) ∧ R(u, v) ∧ R(v, x)
— "triangle (x, u, v), no self-loop"
— query1_conjuncts = [(0, [X(0), U(0)]), (0, [U(0), U(1)]), (0, [U(1), X(0)])]

Q2(x): R(x, a) ∧ R(a, x)
— "2-cycle between x and a"
— query2_conjuncts = [(0, [X(0), U(2)]), (0, [U(2), X(0)])]

No valid σ exists. For any mapping of u, v to {x, a}:

  • σ(u)=a, σ(v)=a: σ(Q1) = {R(x,a), R(a,a), R(a,x)} — contains R(a,a) which is not in Q2.
  • σ(u)=a, σ(v)=x: σ(Q1) = {R(x,a), R(a,x), R(x,x)} — contains R(x,x) not in Q2.
  • σ(u)=x, σ(v)=a: σ(Q1) = {R(x,x), R(x,a), R(a,x)} — contains R(x,x) not in Q2.
  • σ(u)=x, σ(v)=x: σ(Q1) = {R(x,x)} — missing R(x,a) and R(a,x).

A triangle has 3 atoms but the 2-cycle has only 2, and any variable merging in the triangle produces a self-loop atom that the 2-cycle lacks. Answer: NO.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions