feat: synix.ext configurable transform library by marklubin · Pull Request #72 · marklubin/synix

marklubin · 2026-02-17T09:32:47Z

Summary

Add synix.ext package with four configurable transforms: MapSynthesis (1:1), GroupSynthesis (N:M), ReduceSynthesis (N:1), FoldSynthesis (N:1 sequential)
Rewrite 03-team-report demo to use ext transforms — pipeline.py drops from ~140 lines to ~65
Simplify README to follow simple-to-complex trajectory, move reference material to docs/pipeline-api.md
Add full ext transforms documentation with parameter tables, examples, and "Choosing a Transform" decision guide
37 unit tests + 6 e2e tests for all four transforms
Cassettes regenerated from live API, goldens regenerated from replay

Test plan

uv run release passes (lint + 1051 tests + 5 demos)
uvx synix init scaffold still works with ext transforms
Demo 03-team-report runs correctly from cassettes
README links resolve correctly on GitHub
docs/pipeline-api.md anchor links work from README

Add four configurable transforms that eliminate boilerplate for common LLM pipeline patterns:

- MapSynthesis (1:1): apply a prompt to each input independently
- GroupSynthesis (N:M): group by metadata key, one output per group
- ReduceSynthesis (N:1): combine all inputs into a single output
- FoldSynthesis (N:1 sequential): accumulate through inputs one at a time

Rewrite the 03-team-report demo to use ext transforms instead of custom Transform subclasses, reducing pipeline.py from ~140 lines to ~65.

Simplify README.md to follow a simple-to-complex trajectory, moving detailed reference material to docs/pipeline-api.md. Add full ext transforms documentation with parameter tables and examples.
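The four cardinalities can be pictured with plain functions. This is only a sketch of the patterns, not the synix.ext API: `map_step`, `group_step`, `reduce_step` and `fold_step` are hypothetical names, and an ordinary callable stands in for the LLM call.

```python
from collections import defaultdict
from typing import Callable, Hashable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_step(inputs: list[T], fn: Callable[[T], R]) -> list[R]:
    """1:1, one output per input, each processed independently."""
    return [fn(item) for item in inputs]


def group_step(
    inputs: list[T],
    key_fn: Callable[[T], Hashable],
    fn: Callable[[list[T]], R],
) -> dict[Hashable, R]:
    """N:M, one output per distinct group key."""
    groups: dict[Hashable, list[T]] = defaultdict(list)
    for item in inputs:
        groups[key_fn(item)].append(item)
    return {key: fn(members) for key, members in groups.items()}


def reduce_step(inputs: list[T], fn: Callable[[list[T]], R]) -> R:
    """N:1, all inputs are combined in a single call."""
    return fn(inputs)


def fold_step(inputs: list[T], fn: Callable[[R, T], R], initial: R) -> R:
    """N:1 sequential, the accumulator is threaded through the inputs in order."""
    acc = initial
    for item in inputs:
        acc = fn(acc, item)
    return acc
```

The fold is the only one of the four whose steps depend on each other, which is consistent with the reviews noting that FoldSynthesis forces batch=False.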

github-actions · 2026-02-17T09:33:32Z

Note

Red Team Review — OpenAI GPT-5.2 | Adversarial review (docs + diff only)

Threat assessment — Medium risk: you’re adding a new public module (synix.ext) with behavioral/caching semantics and you rewrote the README to make it front-and-center, but the implementation has multiple correctness and scale traps.

One-way doors

Public API surface: synix.ext + class names (MapSynthesis, GroupSynthesis, ReduceSynthesis, FoldSynthesis)
Hard to reverse because users will import these directly and bake them into pipelines and templates.
Safe to merge if: you commit to compatibility guarantees (even pre-1.0), or gate behind “experimental” naming/docs, or keep it internal until you’ve validated caching + ordering + boundary behavior.
Prompt templating contract via naive string replacement + placeholder vocabulary
Users will write prompts with {artifact}, {artifacts}, etc. Changing placeholder names/escaping later will silently change outputs/caching.
Safe to merge if: you specify templating rules precisely (escaping, missing placeholders, literal braces) and add tests for edge cases.
Cache/fingerprint semantics tied to inspect.getsource() of callables
This is brittle across environments (zipimport, compiled, REPL, lambdas) and can cause surprise cache busting or non-busting. Once people rely on this behavior, changing it breaks incremental rebuild expectations.
Safe to merge if: you define a stable callable identity scheme (explicit version= override like DESIGN.md suggests) and document fallback behavior.
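One way to read the "stable callable identity" suggestion is sketched below. It is an assumption about what such a scheme could look like, not the code in this PR: `callable_fingerprint` and its `version` parameter are hypothetical, and the fallback uses the qualified name rather than `repr()` because the default `repr` of a function embeds a memory address.

```python
import hashlib
import inspect
from typing import Callable, Optional


def callable_fingerprint(fn: Callable, version: Optional[str] = None) -> str:
    """Return a short, stable identity string for a user-supplied callable."""
    if version is not None:
        # An explicit version wins: the author decides when the cache busts.
        identity = f"version:{version}"
    else:
        try:
            identity = "source:" + inspect.getsource(fn)
        except (OSError, TypeError):
            # Source unavailable (builtins, REPL, zipimport, compiled code):
            # fall back to the qualified name, which is stable across runs.
            module = getattr(fn, "__module__", "?")
            name = getattr(fn, "__qualname__", type(fn).__name__)
            identity = f"name:{module}.{name}"
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()[:16]
```

With an explicit version, editing whitespace or comments in the callable no longer triggers a rebuild, and environments without source access behave the same as those with it.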

Findings

src/synix/ext/fold_synthesis.py: get_cache_key() ignores sort_by
Failure mode: changing fold ordering won’t invalidate cache; you can get “cached” outputs built with old ordering (wrong result, silent).
Severity: [critical]
src/synix/ext/group_synthesis.py: get_cache_key() ignores group_by / on_missing / missing_key / label_prefix / artifact_type
Failure mode: you can change grouping logic or output labeling and still hit cache (or collide), producing stale/wrong artifacts and provenance.
Severity: [critical]
src/synix/ext/map_synthesis.py: get_cache_key() ignores artifact_type and label_fn
Failure mode: change output type/labeling logic, still cached. Also prompt_id changes but cache key might not.
Severity: [warning]
FoldSynthesis.execute(): prompt rendering via chained .replace()
Failure mode: no escaping; if user content includes {artifact} etc you get accidental substitutions. Also replacement order bugs (e.g., {artifact} inside accumulated).
Severity: [warning]
GroupSynthesis.execute(): slug = group_key.lower().replace(" ", "-")
Failure mode: non-string keys (callable returns int/None), unicode, slashes → crash or unsafe filenames/labels; collisions ("A B" vs "A-B").
Severity: [warning]
ReduceSynthesis / GroupSynthesis: unbounded prompt construction (artifacts_text concatenation)
Failure mode: O(total_input_size) memory + token blowups; will hard-fail on large corpora (10k artifacts) with no chunking/limits. This is exactly the “scale” cliff Synix should avoid.
Severity: [critical]
GroupSynthesis.estimate_output_count() returns max(input_count // 2, 1)
Failure mode: garbage planning/cost estimates; misleading UX. Output count depends on number of groups, not half the inputs.
Severity: [minor]
Docs vs design mismatch: DESIGN.md emphasizes prompt functions + versioning decorator; PR pushes prompt strings
Failure mode: you’re steering the mental model away from “Python-first prompts” toward mini-templating DSL, undermining stated principles.
Severity: [warning]
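The cache-key findings share one shape: a parameter that influences output is left out of the key. A minimal sketch of a key that covers every configuration value follows. `config_cache_key` is a hypothetical helper, not the PR's `get_cache_key()`, and it assumes callables have already been reduced to a string fingerprint.

```python
import hashlib
import json


def config_cache_key(prompt: str, params: dict[str, object]) -> str:
    """Hash the prompt together with every configuration parameter."""
    for name, value in params.items():
        if callable(value):
            # repr() of a function is not stable across runs, so refuse it.
            raise TypeError(f"{name!r} is callable; fingerprint it to a string first")
    # sort_keys makes the key independent of dict insertion order.
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Passing the whole parameter dict, rather than picking fields by hand, means a newly added option participates in invalidation by default.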

Missing

Tests for cache invalidation when changing group_by, sort_by, label_fn, artifact_type, on_missing, etc. Current tests only check prompt changes.
Any token/context budgeting or chunking strategy for Reduce/Group (or explicit guardrails + error message).
Clear documentation of templating/escaping rules and placeholder handling when missing/extra.
A statement on whether synix.ext is experimental (docs say experimental for validators/batch build, but not ext).
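For the missing budgeting item, one possible guardrail is to split inputs into chunks that each fit a size budget before building a prompt. The sketch below is illustrative only: `chunk_by_budget` is a hypothetical name, and character count is used as a crude stand-in for tokens.

```python
def chunk_by_budget(texts: list[str], max_chars: int) -> list[list[str]]:
    """Greedily pack texts into chunks whose total length stays within max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for text in texts:
        if len(text) > max_chars:
            # Fail loudly instead of silently truncating a single oversized input.
            raise ValueError(f"input of {len(text)} chars exceeds budget {max_chars}")
        if current and used + len(text) > max_chars:
            chunks.append(current)
            current, used = [], 0
        current.append(text)
        used += len(text)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be reduced on its own and the partial results reduced again; whether that changes output quality is a separate question this sketch does not address.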

Verdict

Block. The ext transforms are a reasonable direction, but cache keys are currently wrong for several parameters and Reduce/Group are unbounded and will fail badly at scale; both are core-build-system correctness issues, not polish.

Review parameters

Model: gpt-5.2
Context: README.md, DESIGN.md, synix.dev, PR diff
Diff size: 3,351 lines
Prompt: .github/prompts/openai_review.md
Timestamp: 2026-02-17T09:33:32Z

github-actions · 2026-02-17T09:33:41Z

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR introduces synix.ext, a module of four configurable transforms (MapSynthesis, GroupSynthesis, ReduceSynthesis, FoldSynthesis) that cover common pipeline patterns without requiring custom Transform subclasses. It refactors the team-report template to use them, slims the README into a gateway document pointing to detailed docs, and adds comprehensive unit and e2e tests.

Alignment

Strong fit. DESIGN.md's four build rule types (transform, aggregate, fold, merge) map cleanly to Map/Group/Fold/Reduce. The ext transforms maintain content-addressed artifacts, include prompt text in cache keys, fingerprint callables for cache invalidation, and sort inputs deterministically — all consistent with the materialization key and audit determinism principles. The Python-first bet (§4.1: "code > config") is advanced while lowering the barrier: users who don't need custom logic avoid subclassing entirely. The README refactor correctly pushes detail into dedicated docs without losing the "build system for memory" framing.

Observations

[positive] Cache invalidation is thorough — prompt text, initial value (FoldSynthesis), and callable source code all contribute to fingerprints. This directly implements the step_version_hash concept from DESIGN.md §3.3.
[positive] Test coverage is excellent: 482 lines of unit tests covering happy paths, edge cases (missing metadata with all three on_missing modes), placeholder substitution, sort ordering, cache key differentiation, and fingerprint inclusion. 232 lines of e2e tests exercise the full CLI pipeline path including rebuild caching and GroupSynthesis with a custom upstream transform.
[concern] MapSynthesis.execute assumes inputs[0] without a length check. If split is bypassed or misconfigured, this throws an unhelpful IndexError. A guard or assertion would be clearer. (map_synthesis.py:68)
[concern] GroupSynthesis.execute has a recursive self-call path when _group_key is missing from config — it calls self.split() then self.execute() recursively. If split somehow produces a unit without _group_key, this infinite-recurses. The config merge {**config, **config_extras} should always include it, but a depth guard or explicit check would be safer. (group_synthesis.py:120-125)
[question] GroupSynthesis.estimate_output_count returns max(input_count // 2, 1) — this is a rough heuristic. Is this used for cost estimation in plan? If so, it could be quite misleading for pipelines where group cardinality is much lower or higher than N/2.
[concern] The custom transform example in pipeline-api.md imports _get_llm_client and _logged_complete — private APIs (underscore-prefixed). These are now part of the documented extension surface. Either make them public or provide a public wrapper. Users building on these will break if internals change. (pipeline-api.md:222-224)
[nit] MapSynthesis.__init__ types label_fn as object | None rather than Callable[[Artifact], str] | None. Same for GroupSynthesis.group_by and FoldSynthesis.sort_by. The docs describe the correct types but the signatures don't enforce them.
[positive] FoldSynthesis correctly forces batch=False in __init__, matching the design constraint that sequential accumulation can't be parallelized. This is also tested explicitly.
[nit] Warning output in GroupSynthesis.split uses bare print(..., file=sys.stderr) rather than logging or a Synix-specific warning mechanism. This will be invisible if stderr is suppressed and inconsistent with any future --quiet mode (see #54, "Non-interactive automation mode for CI/scripted runs").
[positive] The README refactor removes ~200 lines of detail (entity model, batch build docs, known limitations table) without losing any information — it all moves to linked docs. This is the right editorial choice for a pre-1.0 project whose README was becoming a user manual.
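On the estimate_output_count question: if input metadata is available when the plan is computed (an assumption, the diff summary does not say), the estimate could count distinct group keys instead of halving the input count. `estimate_group_count` and the `missing_key` default below are hypothetical.

```python
from typing import Iterable, Mapping


def estimate_group_count(
    metadatas: Iterable[Mapping[str, object]],
    group_by: str,
    missing_key: str = "_ungrouped",
) -> int:
    """Estimate outputs as the number of distinct values of the grouping key."""
    keys = {str(meta.get(group_by, missing_key)) for meta in metadatas}
    # Never report zero outputs, matching the floor in the current heuristic.
    return max(len(keys), 1)
```

For 100 inputs that all share one key this returns 1, where `max(input_count // 2, 1)` would return 50.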

Verdict

This is a well-executed PR that lowers the barrier to pipeline authoring while preserving architectural integrity. It is a clear step forward for the project, with minor concerns around private API exposure in the extension surface and a couple of defensive coding gaps.

Review parameters

Model: claude-opus-4-6
Context: README.md, DESIGN.md, synix.dev, PR diff
Diff size: 3,351 lines
Prompt: .github/prompts/claude_review.md
Timestamp: 2026-02-17T09:33:41Z

- Cache keys now include all config parameters (artifact_type, group_by, on_missing, missing_key, sort_by): changing any parameter invalidates the cache even if the prompt stays the same
- Safe template rendering via _render.py: escapes placeholder tokens in user content so {artifact} in artifact text won't cause double-substitution
- Replace print(stderr) with proper logging via logger.warning() in GroupSynthesis
- Add 16 new tests: cache key invalidation for all parameters, template injection safety, render_template unit tests
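Double-substitution can also be avoided without escaping, by filling every placeholder in a single pass so that inserted text is never rescanned. This is an alternative technique shown for comparison, not the approach taken in _render.py. `render_once` is a hypothetical name, and the `{accumulated}` placeholder in the usage note is assumed for illustration.

```python
import re

# Matches {name} where name is a run of word characters.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_once(template: str, values: dict[str, str]) -> str:
    """Substitute placeholders in one left-to-right pass over the template."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched rather than raising.
        return values.get(name, match.group(0))

    # With a function replacement, re.sub inserts the return value literally
    # and continues scanning after the match, so user content that happens to
    # contain "{artifact}" is never treated as a placeholder.
    return _PLACEHOLDER.sub(substitute, template)
```

For example, rendering `"A: {accumulated} | B: {artifact}"` with an accumulated value of `"see {artifact}"` leaves that inner `{artifact}` as literal text, and no marker characters are added to the output.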

github-actions · 2026-02-17T09:48:05Z

Note

Red Team Review — OpenAI GPT-5.2 | Adversarial review (docs + diff only)

Threat assessment — Medium risk: you’re introducing a new public “configurable transforms” API (synix.ext) that will be hard to change later, and the implementation has a couple of correctness/scale footguns.

One-way doors

New public API surface: synix.ext (Map/Group/Reduce/FoldSynthesis)
Hard to reverse because users will write pipelines against these classes and their parameter names/placeholder conventions.
Safe to merge only if: you’re willing to support these signatures for a long time, or you explicitly label them experimental in docs and semver policy and keep them out of the “main” README Quick Start until stabilized.
Prompt placeholder vocabulary + custom renderer semantics ({artifact}, {artifacts}, etc.)
Hard to reverse because prompts become user data and are embedded in caches/fingerprints; changing placeholders breaks existing pipelines and cache behavior.
Safe only if: you document placeholders as a versioned contract and add compatibility shims or a version field if you ever extend/rename tokens.
Cache/fingerprint behavior for callables uses inspect.getsource fallback to repr
Hard to reverse because it directly affects rebuild triggers; users will come to rely on “changing X rebuilds / doesn’t rebuild.”
Safe only if: you define this as best-effort/unstable or implement a more robust callable identity scheme (explicit version= like in DESIGN.md).

Findings

src/synix/ext/_render.py: “escaping” via zero‑width char then unescape pass
Failure mode: _ESCAPE_MAP maps "{artifact}" -> "{\u200bartifact}", but your unescape loop tries to replace the full escaped string with the original token. That escaped string is never present (the string contains {<ZWSP>artifact}, not {\u200bartifact}). Result: the ZWSP remains in output, silently mutating user content/prompts. This is not “clean text” as claimed.
Severity: [critical]
GroupSynthesis.execute: label slugging is lossy/unsafe (group_key.lower().replace(" ", "-"))
Failure mode: collisions ("A B" vs "A-B"), illegal filesystem/label chars (/, :, unicode), and huge labels. Also group_key could be non-string if callable returns int.
Severity: [warning]
FoldSynthesis ordering: default sort by artifact_id
Failure mode: for sequential accumulation, sorting by content hash is arbitrary and unstable under content changes; this will reorder the fold when any input content changes, producing wildly different outputs and cache invalidation. “Deterministic” isn’t the same as “semantically correct.” Should default to a time/key when available or require explicit sort_by for fold.
Severity: [warning]
ReduceSynthesis/GroupSynthesis: unbounded prompt growth (artifacts_text concatenation)
Failure mode: for large corpora/groups this will explode context size and either hard-fail model limits or incur massive cost. No truncation, chunking, or guardrails.
Severity: [warning]
docs/pipeline-api.md custom transform example imports private functions (_get_llm_client, _logged_complete)
Failure mode: you’re documenting private APIs as the way to extend the system. That’s a maintenance trap; users will copy/paste it and you can’t change internals later.
Severity: [warning]
README/Docs drift vs DESIGN.md
DESIGN.md emphasizes prompt functions + explicit versioning decorator; ext transforms push prompt strings. That’s a conceptual divergence with caching/versioning implications. If ext is the recommended path, the design doc is now misleading.
Severity: [minor]
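The fold-ordering finding suggests requiring an explicit order. A minimal sketch of that idea, with a hypothetical `ordered_fold` whose `sort_key` is a mandatory keyword argument, so a fold cannot be constructed with an implicit ordering:

```python
from typing import Any, Callable, Iterable, TypeVar

T = TypeVar("T")
A = TypeVar("A")


def ordered_fold(
    items: Iterable[T],
    step: Callable[[A, T], A],
    initial: A,
    *,
    sort_key: Callable[[T], Any],
) -> A:
    """Fold over items in the order given by a caller-supplied key."""
    acc = initial
    # sorted() is stable, so items with equal keys keep their input order.
    for item in sorted(items, key=sort_key):
        acc = step(acc, item)
    return acc
```

Sorting by a semantic field such as a date keeps the sequence fixed when an input's content, and therefore its content hash, changes.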

Missing

Tests covering the _render.py ZWSP bug (you currently assert placeholders remain, but not that output has no ZWSP). Add a direct assertion that "\u200b" is not in rendered output.
Explicit “experimental” labeling for synix.ext in README (and maybe in module docstring) if you’re not ready to support it long-term.
Guardrails for N:1/N:M prompt size (max artifacts, max chars/tokens, or chunking strategy), at least warnings.
A stable slug/label normalization utility used consistently across the project (and tested).
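A slug utility along the lines requested might look like the sketch below. `stable_slug` is a hypothetical helper. It accepts non-string keys, strips characters that are unsafe in labels or filenames, bounds the length, and appends a short hash of the original value so that "A B" and "A-B" no longer collide.

```python
import hashlib
import re
import unicodedata


def stable_slug(value: object, max_len: int = 40) -> str:
    """Turn an arbitrary group key into a filesystem-safe, collision-resistant label."""
    text = str(value)
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters (spaces, slashes, colons) to "-".
    base = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    base = base[:max_len].strip("-") or "group"
    # The hash is taken over the original text, so distinct keys stay distinct.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
    return f"{base}-{digest}"
```

The trade-off is that labels become less readable; the hash suffix could be added only when two keys normalize to the same base, at the cost of labels that depend on the other keys present.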

Verdict

Block — _render.py is currently incorrect and will silently corrupt rendered prompts/content; fix that plus basic label/ordering/scale guardrails before merging a new public API.

Review parameters

Model: gpt-5.2
Context: README.md, DESIGN.md, synix.dev, PR diff
Diff size: 3,602 lines
Prompt: .github/prompts/openai_review.md
Timestamp: 2026-02-17T09:48:04Z

github-actions · 2026-02-17T09:48:11Z

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR introduces synix.ext, a library of four configurable transforms (MapSynthesis, GroupSynthesis, ReduceSynthesis, FoldSynthesis) that let pipeline authors define LLM steps via prompt strings instead of custom Transform subclasses. It refactors the team-report template to use these, slims the README by extracting detail into docs/pipeline-api.md, and includes ~870 lines of unit + e2e tests.

Alignment

Strong fit. DESIGN.md's Python-first decision (§4.1) explicitly argues for code over config to enable experimentation. The ext module sits at the right level: it's still Python objects with depends_on references forming a DAG, but removes boilerplate for the four build rule patterns (transform/aggregate/fold/merge) defined in §3.5. Cache keys incorporate prompt text, callables, and artifact types — consistent with the materialization key principle that cache keys must capture all inputs. FoldSynthesis correctly forces batch=False, respecting the sequential dependency constraint from §3.8. The _render.py escape mechanism preserves prompt integrity, supporting the audit determinism principle (§3.9) — the rendered prompt is what gets sent, no silent corruption.

Observations

[positive] _render.py: the zero-width space escape/unescape approach to prevent placeholder injection in user content is thoughtful and well-tested. This is a real risk with recursive template substitution.
[positive] Test coverage is excellent: 638 lines of unit tests covering happy paths, edge cases (missing metadata, placeholder injection, callable fingerprinting), and cache key differentiation. E2E tests exercise the full CLI path including build, cache, plan, search, and list.
[concern] MapSynthesis.execute assumes inputs[0] without a length check. If the build engine ever passes an empty list (e.g., a source layer with no files matching), this throws an unhelpful IndexError. A guard or clear error message would be safer.
[concern] GroupSynthesis.execute recurses into itself when _group_key is missing from config (lines calling self.split then self.execute). If split returns units that also lack _group_key for some reason, this is unbounded recursion. The base case depends on config state rather than a structural guarantee.
[question] The README pipeline example imports from synix.ext but the existing synix.transforms module (EpisodeSummary, MonthlyRollup, etc.) still exists. Are the built-in transforms being rebased on ext internally, or are these parallel hierarchies? The relationship isn't stated.
[nit] FoldSynthesis.get_cache_key uses repr(self.sort_by) for callable sort_by, but compute_fingerprint uses inspect.getsource. These can diverge — the cache key could match while the fingerprint doesn't, or vice versa. Consider using the same strategy in both.
[concern] GroupSynthesis.estimate_output_count returns max(input_count // 2, 1) — a rough heuristic. For plan cost estimation this could be significantly wrong (e.g., 100 inputs all in 1 group). Not blocking, but worth a comment explaining the approximation.
[positive] The README refactor moves reference material to dedicated docs without losing information. The README now reads as a quick-start guide that links out, which is better for the target audience.
[nit] label_fn is typed as object | None in MapSynthesis's __init__. Should be Callable[[Artifact], str] | None for clarity and IDE support.
[positive] Cassettes and golden files are updated consistently, showing the template was actually re-run against the new transforms.
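The inputs[0] concern could be handled by a small guard like the one below. `single_input` is a hypothetical helper, and it assumes a 1:1 transform should receive exactly one input per unit, which the review implies but the diff summary does not state.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def single_input(inputs: Sequence[T], transform_name: str) -> T:
    """Return the only input, or raise an error that names the transform."""
    if len(inputs) != 1:
        raise ValueError(
            f"{transform_name} expects exactly one input per unit, got {len(inputs)}"
        )
    return inputs[0]
```

An empty list then surfaces as a ValueError naming the transform instead of a bare IndexError.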

Verdict

This is a well-executed extension-model PR that makes the common case easy while keeping the custom-transform escape hatch — a good incremental step that directly serves the "iterate on architectures" thesis.

Review parameters

Model: claude-opus-4-6
Context: README.md, DESIGN.md, synix.dev, PR diff
Diff size: 3,602 lines
Prompt: .github/prompts/claude_review.md
Timestamp: 2026-02-17T09:48:11Z

Resolve conflicts from ext transforms merge (PR #72):

- Keep v0.12.1 version from main
- Keep HEAD's batch-build fixes (create_fresh, pipeline hash, N:1 gating, per-unit errors)
- Merge CLI table styles from main with full batch-build commands

marklubin merged commit 08c97df into main Feb 17, 2026
12 checks passed

marklubin merged 2 commits into main from feat/ext-transforms
