Skip to content

spec: update constraint ID system#570

Open
erik-3milabs wants to merge 19 commits into
spec/mainfrom
spec/constraint_id
Open

spec: update constraint ID system#570
erik-3milabs wants to merge 19 commits into
spec/mainfrom
spec/constraint_id

Conversation

@erik-3milabs
Copy link
Copy Markdown
Collaborator

This PR introduces content-derived constraint IDs, i.e., the ID of a constraint is derived from its content, rather than its index in the constraint list. This

  1. permit addition, removing and reorganizing constraints without changing any IDs, and
  2. enforces ID updates upon constraint modification

i.e., two constraints with the same ID are 100-ε% certain to be identical.

Note: the ID is derived in a way that updating a variable name does not lead to a name update. This is at the expense of an ID update when variables are reordered.

Before:
image

After:
image

@erik-3milabs erik-3milabs self-assigned this Apr 28, 2026
@erik-3milabs erik-3milabs added the spec Updates and improvements to the spec document label Apr 28, 2026
Comment thread spec/src.typ Outdated
Comment thread spec/chip.typ Outdated
Comment thread spec/src.typ Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Review

This PR adds stable, content-derived constraint IDs to the spec rendering pipeline: an FNV-1a hash of each constraint's structure is encoded into a short 4-char human-readable tag (e.g. BRH-1J3K). It also restructures the constraint table (separate tag/counter columns, per-constraint dividers) and adds short code fields to all chip TOML files.

The approach is sound. Three bugs found:


Medium — Crash on empty variable groups (src.typ:191–197)

.sum() without a default: on an empty array returns none in Typst, not an empty dict. If any chip has a variable group with zero entries, the inner .sum() returns none, causing a crash on the outer .sum(). Fix: add default: (:) to both calls.

Low — z-fill panics for ≥100 constraints in one group (chip.typ:310)

"0" * (2 - s.len()) produces a negative repeat count when the counter hits 100+. Typst panics on negative string repeat. Fix: calc.max(0, 2 - s.len()).

Low — Hash input ambiguity from unescaped separator (src.typ:182)

The chip name, group, and type are joined with , without escaping. A comma in any field makes two distinct inputs hash identically, potentially causing ID collisions. Current names are safe, but it's a latent fragility. Fix: use a separator that cannot appear in those values (e.g. \x00).

@github-actions
Copy link
Copy Markdown

Codex Code Review

No actionable issues found in the PR diff.

The changes are limited to the Typst spec rendering/tagging logic and chip metadata. I did not find security-relevant runtime changes, VM behavior changes, or significant performance concerns.

I could not run a full Typst/shiroa render because neither typst nor shiroa is installed in this environment.

@erik-3milabs erik-3milabs marked this pull request as draft April 28, 2026 13:28
@erik-3milabs erik-3milabs marked this pull request as ready for review April 28, 2026 13:37
@github-actions
Copy link
Copy Markdown

Codex Code Review

Findings:

  • Low - spec/tooling/chip.py:990 ignores the new TOML code field. It sets self.code = data.get("name", ""), so the validator checks name twice and never validates code. Invalid code values would pass Python tooling, and any tooling using Chip.code will see the full chip name instead of the short code added in this PR. Use data.get("code", self.name) and report self.code in the identifier error.

No security issues found in the PR diff. I did not run a Typst render because typst is not installed in this environment.

Comment thread spec/tooling/chip.py Outdated
Comment thread spec/tooling/chip.py Outdated
Comment thread spec/src.typ
@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Review: spec/constraint_id

Overview

Replaces sequential constraint indices with content-derived IDs using FNV-1a hashing. Variables are identified by (chip, group, position, type) so renaming a variable doesn't change constraint IDs, but reordering does. The 4-character Base32 ID (20 bits of entropy) is stable across reorganisation and signals stale IDs when a constraint's semantics change. TOML chips all gain a code shorthand field.

Bugs

[High] chip.py reads wrong key for codeself.code = data.get("name", "") always resolves to the chip name; the new code field (e.g. "BRH") is never read or validated. See inline comment on line 990.

[Medium] tag excluded from constraint digestdigestable_constraint omits the tag field for interaction and template constraints. Two constraints calling different tables with identical inputs/outputs/iters receive the same ID, breaking the core invariant. See inline comment on lines 210-213.

[Low] Wrong variable in error messagereporter.asserts(self.code.isidentifier(), f"Invalid identifier: {self.name!r}") prints the name, not the code. See inline comment on line 994.

FNV-1a implementation

The two-limb 64-bit multiplication is mathematically correct: carry propagation from lo into hi is handled, and the hash[1] * prime[1] term (≥ 2⁶⁴) is correctly discarded. No issues here.

Minor observations

  • CONSTRAINT_ID_CHAR_COUNT = 4 gives 20 bits of ID space. With ~50 constraints per chip the birthday-collision probability is negligible (~0.1%), so this is fine for the current scale.
  • bytes-to-hex's z-fill silently produces wrong output for strings longer than 2 chars, but since all inputs are single bytes (0–255) this can't trigger in practice.

erik-3milabs and others added 3 commits April 28, 2026 15:43
Copy link
Copy Markdown
Collaborator

@RobinJadoul RobinJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very brief look only, so far; haven't properly looked at the canonicalization/serialization of the constraints yet

Comment thread spec/src.typ
///
/// This implementation operates on two 32-bit limbs, rather than a single
/// 64-bit limb, since Typst does not support u64s.
#let FNV-1a(bytes) = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reasoning for not using a typst package for hashing?
e.g. digestify should be running in wasm, so speed should not be a massive concern (probably)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • digestify is available under the MIT license. When I skimmed the license last week, I thought that using it might require our spec to also be released under the MIT (or compatible) license. Having a closer look now, this might not actually be the case. Still, I'm not a legal expert, so I decided to avoid it just in case.
  • The other hashing package (jumble) is slower than this implementation.
  • It was fun to learn a bit about non-cryptographic hashing, and implement this technique here.

Comment thread spec/chip.typ Outdated

let indices = (("",) + iters_of(constraint).map(it => it.at(0))).join(".")

let z-fill(s) = "0" * calc.max(2 - s.len(), 0) + s
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appears a few times, maybe lift to the top level.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used twice (in src.typ and chip.typ). Since those two files don't have any shared dependencies/don't depend on one another, I'd have to create a new file that both can depend on. I didn't feel like doing this just for this single one-line function.

Comment thread spec/chip.typ Outdated
let z-fill(s) = "0" * calc.max(2 - s.len(), 0) + s
let ref-tag(i) = raw(tag) + sub("/" + z-fill(str(i)))
return (
context super[#emph(z-fill(str(counter(figure.where(kind: counter-kind)).get().at(0) + 1)))],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counter(...).final() could give us the total count of constraints, to maybe be able to z-fill more than 2 digits if required

Comment thread spec/src.typ
}

/// Converts a byte array to a hexadecimal string
#let bytes-to-hex(bytes) = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there much need to carry hex around instead of going straight from bytes to base 32?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really. dropped it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spec Updates and improvements to the spec document

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants