Skip to content

[Model] MinimumCodeGenerationUnlimitedRegisters #902

@isPANN

Description

@isPANN

Motivation

MINIMUM CODE GENERATION WITH UNLIMITED REGISTERS (P297) from Garey & Johnson, A11 PO5. Given an expression DAG where arcs are partitioned into left (L) and right (R) operand sets, find the minimum number of instructions needed to compute all roots using unlimited registers but only 2-address instructions (where one operand is destroyed). The unlimited register supply means spilling is never needed, but the 2-address instruction constraint means one operand register is overwritten by the result, forcing careful operand ordering.

Associated reduction rules:

  • As target: R320 (FEEDBACK VERTEX SET →)
  • Connections (beyond G&J):
    • Complements MinimumCodeGenerationOneRegister (P296): that problem has 1 register with 3-address instructions; this has unlimited registers with 2-address instructions
    • The L/R partition determines which operand is destroyed by each operation

Definition

Name: MinimumCodeGenerationUnlimitedRegisters
Reference: Garey & Johnson, Computers and Intractability, A11 PO5

Mathematical definition:

INSTANCE: Directed acyclic graph G = (V, A) with maximum out-degree 2, a partition of A into two sets L (left operand arcs) and R (right operand arcs).
OBJECTIVE: Find a program of minimum number of instructions for an unlimited-register machine (using 2-address instructions) that computes all root vertices (vertices with in-degree 0).

Instruction semantics:

  • For a vertex v with arcs (v, u) ∈ L and (v, w) ∈ R: the operation computes v = op(u, w) where the result overwrites the register holding the left operand u. After the instruction, the register that held u now holds v, and the register holding w is unchanged.
  • A LOAD instruction copies a value from one register to another (needed when a value must be preserved because it would otherwise be destroyed).
  • With unlimited registers, there is no spilling — every value can be kept in a register. The cost comes from extra LOAD (copy) instructions needed to preserve values before they are destroyed.

Variables

  • Count: n = |V| (one variable per vertex representing its position in the instruction schedule)
  • Per-variable domain: {0, 1, ..., 3n-1} (time step; upper bound allows LOAD + OP per vertex plus extra copies)
  • Meaning: A valid program is a sequence of OP and LOAD instructions respecting DAG dependencies. The objective minimizes total instruction count (OPs + LOADs).

Schema (data type)

Type name: MinimumCodeGenerationUnlimitedRegisters
Variants: (none)

Field Type Description
num_vertices usize Number of vertices
left_arcs Vec<(usize, usize)> Left operand arcs L ⊆ A: (parent, child) — child's register is destroyed
right_arcs Vec<(usize, usize)> Right operand arcs R ⊆ A: (parent, child) — child's register is preserved

Complexity

  • Decision complexity: NP-complete (Aho, Johnson, and Ullman, 1977; transformation from FEEDBACK VERTEX SET).
  • Best known exact algorithm: O*(2^num_vertices) brute-force.
  • Restrictions: NP-complete even if only leaf vertices have in-degree > 1. Polynomial for directed forests. Also polynomial if 3-address instructions are allowed.
  • References:
    • A.V. Aho, S.C. Johnson, J.D. Ullman (1977). "Code Generation for Machines with Multiregister Operations." Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pp. 21–28.

Extra Remark

Full book text:

INSTANCE: Directed acyclic graph G = (V, A) with maximum out-degree 2, partition of A into sets L and R, positive integer K.
QUESTION: Can all root (in-degree 0) vertices be computed by a program of K or fewer instructions for an unlimited-register machine?
Reference: [Aho, Johnson, and Ullman, 1977a]. Transformation from FEEDBACK VERTEX SET.
Comment: NP-complete even if the only vertices having in-degree greater than 1 are leaves [Aho, Johnson, and Ullman, 1977a]. Solvable in polynomial time for forests or if 3-address instructions are allowed.

Note: The original G&J formulation is a decision problem with threshold K. We use the equivalent optimization formulation: minimize the number of instructions.

How to solve

  • It can be solved by (existing) bruteforce. (Enumerate all evaluation orderings and copy-insertion choices; find the one with minimum total instructions.)
  • It can be solved by reducing to integer programming.
  • Other: Polynomial for forests and for 3-address architectures.

Example Instance

Expression DAG G with 5 vertices {v0, v1, v2, v3, v4}:

  • Leaves: {v3, v4} (input values, each in its own register)
  • Internal: {v1, v2} (binary operations)
  • Root: {v0} (final result)
  • L/R partition:
    • v1: (v1→v3) ∈ L, (v1→v4) ∈ R — v1 = op(v3, v4), destroys register holding v3
    • v2: (v2→v3) ∈ L, (v2→v4) ∈ R — v2 = op(v3, v4), destroys register holding v3
    • v0: (v0→v1) ∈ L, (v0→v2) ∈ R — v0 = op(v1, v2), destroys register holding v1

Note: v3 and v4 are shared leaves (in-degree 2 each). v3 is the left operand (destroyed) in both v1 and v2.

Optimal program (4 instructions):

  1. LOAD: copy register(v3) → new register (preserving original v3)
  2. OP v1: reg = op(v3_copy, v4) — destroys the copy, v4 preserved
  3. OP v2: reg = op(v3_original, v4) — destroys original v3, v4 preserved
  4. OP v0: reg = op(v1, v2) — destroys v1

Why 3 instructions is impossible: There are 3 internal nodes requiring 3 OPs. Since v3 is a left operand in both v1 and v2, computing v1 destroys v3's register. To then compute v2 (which also needs v3 as left operand), v3 must have been copied first. This forces at least 1 LOAD, giving 3 + 1 = 4 minimum.

Suboptimal program (5 instructions): If v4 is also copied unnecessarily, we get 3 OPs + 2 LOADs = 5.

Expected Outcome

Optimal value: Min(4) — the minimum number of instructions is 4 (3 OPs + 1 LOAD). The shared left-operand destruction of v3 forces exactly one copy instruction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions