Motivation
CONJUNCTIVE QUERY FOLDABILITY (P178) from Garey & Johnson, A4 SR30. A classical NP-complete problem in database theory. Conjunctive query foldability (also called conjunctive query containment) asks whether one conjunctive query can be "folded" into another by substituting undistinguished variables, which is equivalent to the existence of a homomorphism between the queries. This problem is fundamental to query optimization in relational databases.
Associated reduction rules:
- R124: Graph 3-Colorability -> Conjunctive Query Foldability (this problem is the target)
Definition
Name: ConjunctiveQueryFoldability
Canonical name: Conjunctive Query Foldability (also: Conjunctive Query Containment, CQ Containment)
Reference: Garey & Johnson, Computers and Intractability, A4 SR30
Background: A conjunctive query is a pattern-matching query over relations — like a SQL SELECT ... FROM ... WHERE ... with only AND conditions. Each query uses three kinds of terms:
- Domain constants D: fixed values (e.g., 0, 1, 2)
- Distinguished variables X: the query's output columns — must stay unchanged
- Undistinguished variables Y: internal/temporary variables (existentially quantified) — can be substituted
Foldability asks: can Q1 be transformed into Q2 by replacing each undistinguished variable with some other term? The substitution σ maps each undistinguished variable y ∈ Y to one of:
- Another undistinguished variable (merge two temp vars into one)
- A distinguished variable (fix a temp var to an output column)
- A domain constant (fix a temp var to a concrete value)
If such σ exists, Q1 is "redundant" relative to Q2 and can be simplified.
Mathematical definition:
INSTANCE: A finite domain D (set of constants), relations R = {R_1, ..., R_m} (each R_i has a fixed arity d_i), distinguished variables X, undistinguished variables Y, and two queries Q_1 and Q_2. Each query Q is a conjunction of atoms A_1 ∧ A_2 ∧ ... ∧ A_r, where each atom has the form R_j(u_1, ..., u_{d_j}) with each argument u ∈ D ∪ X ∪ Y.
QUESTION: Is there a substitution σ: Y → X ∪ Y ∪ D such that applying σ to every undistinguished variable in Q_1 produces Q_2 (as a set of atoms)?
Variables
- Count:
num_undistinguished variables, one per undistinguished variable in Y.
- Per-variable domain: Each can map to any of
num_distinguished + num_undistinguished + domain_size targets (distinguished vars, undistinguished vars, or domain constants).
dims(): vec![num_distinguished + num_undistinguished + domain_size; num_undistinguished]
evaluate(): Decode σ from the configuration. Apply σ to every undistinguished variable occurrence in Q1's conjuncts. Return true iff the resulting set of atoms equals Q2's set of atoms.
Schema (data type)
Type name: ConjunctiveQueryFoldability
Variants: none (the query structure is stored directly)
| Field |
Type |
Description |
domain_size |
usize |
Size of the domain D (constants indexed 0..domain_size) |
num_distinguished |
usize |
Number of distinguished variables in X |
num_undistinguished |
usize |
Number of undistinguished variables in Y |
relation_arities |
Vec<usize> |
Arity d_i of each relation R_i |
query1_conjuncts |
Vec<(usize, Vec<Term>)> |
Atoms of Q1: each is (relation_index, arguments) |
query2_conjuncts |
Vec<(usize, Vec<Term>)> |
Atoms of Q2: same format |
where Term is an enum:
enum Term {
Constant(usize), // domain constant D[i]
Distinguished(usize), // distinguished variable X[i]
Undistinguished(usize), // undistinguished variable Y[i]
}
Size fields (getter methods for overhead expressions and declare_variants!):
domain_size() — returns domain_size
num_distinguished() — returns num_distinguished
num_undistinguished() — returns num_undistinguished
num_conjuncts_q1() — returns query1_conjuncts.len()
num_conjuncts_q2() — returns query2_conjuncts.len()
Complexity
- Decision complexity: NP-complete (Chandra and Merlin, 1977; transformation from GRAPH 3-COLORABILITY).
- Best known exact algorithm: Brute-force: enumerate all (num_distinguished + num_undistinguished + domain_size)^num_undistinguished substitutions, check each in O(num_conjuncts_q1) time.
declare_variants! complexity string: "(num_distinguished + num_undistinguished + domain_size)^num_undistinguished * num_conjuncts_q1"
- Parameterized: For acyclic conjunctive queries (those with hypertree-width 1), containment is in LOGCFL (and hence in polynomial time). The problem is also polynomial when the number of atoms in Q_2 is bounded.
- References:
- [Chandra and Merlin, 1977] A. K. Chandra and P. M. Merlin, "Optimal Implementation of Conjunctive Queries in Relational Data Bases", Proc. 9th STOC, pp. 77-90, 1977.
- [Kolaitis, 2007] P. G. Kolaitis, "Complexity of Conjunctive Query Containment" (survey), various lecture notes.
Specialization
- Related problems: Conjunctive Query Equivalence (bidirectional containment), Conjunctive Boolean Query (evaluation), Graph Homomorphism.
- Special cases: When Q_2 is the query for K_3 (the complete graph on 3 vertices), foldability reduces to graph 3-colorability. When both queries have the same structure, it reduces to graph isomorphism.
- Generalization: The problem generalizes to unions of conjunctive queries, where containment is Pi_2^P-complete.
Extra Remark
Full book text:
INSTANCE: Finite domain set D, a collection R = {R1,R2,...,Rm} of relations, where each Ri consists of a set of di-tuples with entries from D, a set X of distinguished variables, a set Y of undistinguished variables, and two "queries" Q1 and Q2 over X,Y,D, and R, where a query Q has the form
(x1,x2,...,xk)(exists y1,y2,...,yl)(A1 ∧ A2 ∧ ... ∧ Ar)
for some k,l, and r, with X' = {x1,x2,...,xk} ⊆ X, Y' = {y1,y2,...,yl} ⊆ Y, and each Ai of the form Rj(u1,u2,...,udj) with each u in D ∪ X' ∪ Y' (see reference for interpretation of such expressions in terms of data bases).
QUESTION: Is there a function sigma: Y -> X ∪ Y ∪ D such that, if for each y in Y the symbol sigma(y) is substituted for every occurrence of y in Q1, then the result is query Q2?
Reference: [Chandra and Merlin, 1977]. Transformation from GRAPH 3-COLORABILITY.
Comment: The isomorphism problem for conjunctive queries (with two queries being isomorphic if they are the same up to one-to-one renaming of the variables, reordering of conjuncts, and reordering within quantifications) is polynomially equivalent to graph isomorphism.
How to solve
Example Instance
Single binary relation R (index 0, arity 2). We use shorthand: D(i) = Constant(i), X(i) = Distinguished(i), U(i) = Undistinguished(i).
Instance 1 (YES — foldable):
domain_size = 0, num_distinguished = 1 (x = X(0)), num_undistinguished = 3 (u = U(0), v = U(1), a = U(2))
Q1(x): R(x, u) ∧ R(u, v) ∧ R(v, x) ∧ R(u, u)
— "triangle (x, u, v) plus self-loop on u"
— query1_conjuncts = [(0, [X(0), U(0)]), (0, [U(0), U(1)]), (0, [U(1), X(0)]), (0, [U(0), U(0)])]
Q2(x): R(x, a) ∧ R(a, a) ∧ R(a, x)
— "self-loop on a, with edges to/from x"
— query2_conjuncts = [(0, [X(0), U(2)]), (0, [U(2), U(2)]), (0, [U(2), X(0)])]
Substitution σ: U(0) → U(2), U(1) → U(2) (merge u and v into a):
- R(X(0), U(0)) → R(X(0), U(2)) = R(x, a) ∈ Q2 ✓
- R(U(0), U(1)) → R(U(2), U(2)) = R(a, a) ∈ Q2 ✓
- R(U(1), X(0)) → R(U(2), X(0)) = R(a, x) ∈ Q2 ✓
- R(U(0), U(0)) → R(U(2), U(2)) = R(a, a) ∈ Q2 ✓ (duplicate, absorbed)
σ(Q1) = {R(x,a), R(a,a), R(a,x)} = Q2. Answer: YES.
This is the unique valid σ — merging two variables of a triangle into one requires the self-loop atom R(u,u) to justify the resulting R(a,a).
Instance 2 (NO — not foldable):
domain_size = 0, num_distinguished = 1 (x = X(0)), num_undistinguished = 3 (u = U(0), v = U(1), a = U(2))
Q1(x): R(x, u) ∧ R(u, v) ∧ R(v, x)
— "triangle (x, u, v), no self-loop"
— query1_conjuncts = [(0, [X(0), U(0)]), (0, [U(0), U(1)]), (0, [U(1), X(0)])]
Q2(x): R(x, a) ∧ R(a, x)
— "2-cycle between x and a"
— query2_conjuncts = [(0, [X(0), U(2)]), (0, [U(2), X(0)])]
No valid σ exists. For any mapping of u, v to {x, a}:
- σ(u)=a, σ(v)=a: σ(Q1) = {R(x,a), R(a,a), R(a,x)} — contains R(a,a) which is not in Q2.
- σ(u)=a, σ(v)=x: σ(Q1) = {R(x,a), R(a,x), R(x,x)} — contains R(x,x) not in Q2.
- σ(u)=x, σ(v)=a: σ(Q1) = {R(x,x), R(x,a), R(a,x)} — contains R(x,x) not in Q2.
- σ(u)=x, σ(v)=x: σ(Q1) = {R(x,x)} — missing R(x,a) and R(a,x).
A triangle has 3 atoms but the 2-cycle has only 2, and any variable merging in the triangle produces a self-loop atom that the 2-cycle lacks. Answer: NO.
Motivation
CONJUNCTIVE QUERY FOLDABILITY (P178) from Garey & Johnson, A4 SR30. A classical NP-complete problem in database theory. Conjunctive query foldability (also called conjunctive query containment) asks whether one conjunctive query can be "folded" into another by substituting undistinguished variables, which is equivalent to the existence of a homomorphism between the queries. This problem is fundamental to query optimization in relational databases.
Associated reduction rules:
Definition
Name:
ConjunctiveQueryFoldabilityCanonical name: Conjunctive Query Foldability (also: Conjunctive Query Containment, CQ Containment)
Reference: Garey & Johnson, Computers and Intractability, A4 SR30
Background: A conjunctive query is a pattern-matching query over relations — like a SQL
SELECT ... FROM ... WHERE ...with onlyANDconditions. Each query uses three kinds of terms:Foldability asks: can Q1 be transformed into Q2 by replacing each undistinguished variable with some other term? The substitution σ maps each undistinguished variable y ∈ Y to one of:
If such σ exists, Q1 is "redundant" relative to Q2 and can be simplified.
Mathematical definition:
INSTANCE: A finite domain D (set of constants), relations R = {R_1, ..., R_m} (each R_i has a fixed arity d_i), distinguished variables X, undistinguished variables Y, and two queries Q_1 and Q_2. Each query Q is a conjunction of atoms A_1 ∧ A_2 ∧ ... ∧ A_r, where each atom has the form R_j(u_1, ..., u_{d_j}) with each argument u ∈ D ∪ X ∪ Y.
QUESTION: Is there a substitution σ: Y → X ∪ Y ∪ D such that applying σ to every undistinguished variable in Q_1 produces Q_2 (as a set of atoms)?
Variables
num_undistinguishedvariables, one per undistinguished variable in Y.num_distinguished + num_undistinguished + domain_sizetargets (distinguished vars, undistinguished vars, or domain constants).dims():vec![num_distinguished + num_undistinguished + domain_size; num_undistinguished]evaluate(): Decode σ from the configuration. Apply σ to every undistinguished variable occurrence in Q1's conjuncts. Returntrueiff the resulting set of atoms equals Q2's set of atoms.Schema (data type)
Type name:
ConjunctiveQueryFoldabilityVariants: none (the query structure is stored directly)
domain_sizeusizenum_distinguishedusizenum_undistinguishedusizerelation_aritiesVec<usize>query1_conjunctsVec<(usize, Vec<Term>)>query2_conjunctsVec<(usize, Vec<Term>)>where
Termis an enum:Size fields (getter methods for overhead expressions and
declare_variants!):domain_size()— returnsdomain_sizenum_distinguished()— returnsnum_distinguishednum_undistinguished()— returnsnum_undistinguishednum_conjuncts_q1()— returnsquery1_conjuncts.len()num_conjuncts_q2()— returnsquery2_conjuncts.len()Complexity
declare_variants!complexity string:"(num_distinguished + num_undistinguished + domain_size)^num_undistinguished * num_conjuncts_q1"Specialization
Extra Remark
Full book text:
INSTANCE: Finite domain set D, a collection R = {R1,R2,...,Rm} of relations, where each Ri consists of a set of di-tuples with entries from D, a set X of distinguished variables, a set Y of undistinguished variables, and two "queries" Q1 and Q2 over X,Y,D, and R, where a query Q has the form
(x1,x2,...,xk)(exists y1,y2,...,yl)(A1 ∧ A2 ∧ ... ∧ Ar)
for some k,l, and r, with X' = {x1,x2,...,xk} ⊆ X, Y' = {y1,y2,...,yl} ⊆ Y, and each Ai of the form Rj(u1,u2,...,udj) with each u in D ∪ X' ∪ Y' (see reference for interpretation of such expressions in terms of data bases).
QUESTION: Is there a function sigma: Y -> X ∪ Y ∪ D such that, if for each y in Y the symbol sigma(y) is substituted for every occurrence of y in Q1, then the result is query Q2?
Reference: [Chandra and Merlin, 1977]. Transformation from GRAPH 3-COLORABILITY.
Comment: The isomorphism problem for conjunctive queries (with two queries being isomorphic if they are the same up to one-to-one renaming of the variables, reordering of conjuncts, and reordering within quantifications) is polynomially equivalent to graph isomorphism.
How to solve
Example Instance
Single binary relation R (index 0, arity 2). We use shorthand:
D(i)= Constant(i),X(i)= Distinguished(i),U(i)= Undistinguished(i).Instance 1 (YES — foldable):
domain_size = 0, num_distinguished = 1 (x = X(0)), num_undistinguished = 3 (u = U(0), v = U(1), a = U(2))
Q1(x): R(x, u) ∧ R(u, v) ∧ R(v, x) ∧ R(u, u)
— "triangle (x, u, v) plus self-loop on u"
— query1_conjuncts = [(0, [X(0), U(0)]), (0, [U(0), U(1)]), (0, [U(1), X(0)]), (0, [U(0), U(0)])]
Q2(x): R(x, a) ∧ R(a, a) ∧ R(a, x)
— "self-loop on a, with edges to/from x"
— query2_conjuncts = [(0, [X(0), U(2)]), (0, [U(2), U(2)]), (0, [U(2), X(0)])]
Substitution σ: U(0) → U(2), U(1) → U(2) (merge u and v into a):
σ(Q1) = {R(x,a), R(a,a), R(a,x)} = Q2. Answer: YES.
This is the unique valid σ — merging two variables of a triangle into one requires the self-loop atom R(u,u) to justify the resulting R(a,a).
Instance 2 (NO — not foldable):
domain_size = 0, num_distinguished = 1 (x = X(0)), num_undistinguished = 3 (u = U(0), v = U(1), a = U(2))
Q1(x): R(x, u) ∧ R(u, v) ∧ R(v, x)
— "triangle (x, u, v), no self-loop"
— query1_conjuncts = [(0, [X(0), U(0)]), (0, [U(0), U(1)]), (0, [U(1), X(0)])]
Q2(x): R(x, a) ∧ R(a, x)
— "2-cycle between x and a"
— query2_conjuncts = [(0, [X(0), U(2)]), (0, [U(2), X(0)])]
No valid σ exists. For any mapping of u, v to {x, a}:
A triangle has 3 atoms but the 2-cycle has only 2, and any variable merging in the triangle produces a self-loop atom that the 2-cycle lacks. Answer: NO.