feat(wren): build MDL manifests from OSI semantic models#2322
feat(wren): build MDL manifests from OSI semantic models#2322goldmedal wants to merge 7 commits into
Conversation
Adds `wren context build|validate|show --from-osi <file>` so OSI users can
generate Wren MDL JSON directly from an Open Semantic Interchange YAML file
without forking it into a parallel wren project. The OSI file stays the
single source of truth.
Wren-specific build hints (column types, dialect preference, metrics
handling, composite-PK selection) live inside the OSI file via the spec's
sanctioned `custom_extensions: [{vendor_name: WREN, data: '<json>'}]`
mechanism — at the document root, on a semantic_model, on a dataset, or on
a field. SM-level config overrides root; CLI flags override both.
When a WREN block is needed but missing, validate emits copy-pasteable
YAML snippets (untyped fields, composite primary keys, cross-dataset
metrics, ambiguous semantic_model selection).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds OSI support: a new ChangesOSI Support for Wren Context CLI
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
core/wren/tests/unit/test_osi.py (1)
37-649: ⚡ Quick winAdd regression tests for malformed OSI input and invalid relationship column element types.
The suite is strong overall, but it currently misses the two highest-risk failure modes in this module: parse/read failure handling and non-string
from_columns/to_columnsentries. Locking these in would prevent CLI crash regressions and condition-construction regressions.✅ Suggested additions
+def test_build_manifest_malformed_yaml_returns_error(tmp_path: Path): + p = tmp_path / "bad.yaml" + p.write_text("semantic_model: [") + manifest, errors = build_manifest_from_osi(p, data_source="postgres") + assert manifest == {} + assert any(e.level == "error" for e in errors) + + +def test_relationship_non_string_join_columns_error(): + bad = { + "version": "0.2.0", + "semantic_model": [ + { + "name": "x", + "datasets": [ + { + "name": "a", + "source": "c.s.a", + "fields": [{"name": "id", "expression": "id"}], + }, + { + "name": "b", + "source": "c.s.b", + "fields": [{"name": "id", "expression": "id"}], + }, + ], + "relationships": [ + { + "name": "a_to_b", + "from": "a", + "to": "b", + "from_columns": ["id"], + "to_columns": [123], + } + ], + } + ], + } + p = Path(__file__).parent / "tmp_invalid_rel.yaml" + p.write_text(json.dumps(bad)) + try: + _, errors = build_manifest_from_osi(p, data_source="postgres") + assert any("entries must be non-empty strings" in e.message for e in errors) + finally: + p.unlink(missing_ok=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/unit/test_osi.py` around lines 37 - 649, Add two regression tests into this test module: one that feeds malformed OSI content (e.g., bad JSON/YAML) to parse_osi and to the CLI paths (lint_osi_file or build_manifest_from_osi via runner.invoke) and asserts a clean error is returned (no unhandled exception / exit_code != 0 and error message mentions parse/read failure); and one that constructs an OSI with a relationship whose from_columns/to_columns contain non-string elements (e.g., integers or lists), then calls build_manifest_from_osi and asserts it does not crash but returns an error entry (errors list contains an error with level "error" and a message referencing relationship/from_columns or the invalid element). Reference parse_osi and build_manifest_from_osi as the targets to exercise and ensure tests assert exit codes/messages rather than raising.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/osi.py`:
- Around line 674-675: Wrap calls to load_osi_file (e.g., where osi =
load_osi_file(osi_path)) in a try/except that catches file/read/parse exceptions
and converts them into a ValidationError instance appended to errors instead of
letting the exception propagate; on error set osi to None (or an empty safe
value) so downstream code can continue. Apply the same pattern to the other
occurrence around lines 799-804 so both read/parse failures produce structured
ValidationError diagnostics rather than uncaught exceptions.
- Around line 547-577: The code builds the SQL join condition from from_cols and
to_cols without validating element types, which lets non-string entries be
stringified into invalid SQL; update the validation in the block that currently
checks list-ness and lengths (the variables from_cols, to_cols and the
surrounding function that appends ValidationError) to iterate both lists and
verify each element is an instance of str (or unicode), and if any non-string
entries are found append a ValidationError referencing the relationship name and
which side/indices are invalid (or list the offending values), then return ({},
errors) instead of constructing parts/condition; only build parts and the final
condition after all elements pass the string-type check.
---
Nitpick comments:
In `@core/wren/tests/unit/test_osi.py`:
- Around line 37-649: Add two regression tests into this test module: one that
feeds malformed OSI content (e.g., bad JSON/YAML) to parse_osi and to the CLI
paths (lint_osi_file or build_manifest_from_osi via runner.invoke) and asserts a
clean error is returned (no unhandled exception / exit_code != 0 and error
message mentions parse/read failure); and one that constructs an OSI with a
relationship whose from_columns/to_columns contain non-string elements (e.g.,
integers or lists), then calls build_manifest_from_osi and asserts it does not
crash but returns an error entry (errors list contains an error with level
"error" and a message referencing relationship/from_columns or the invalid
element). Reference parse_osi and build_manifest_from_osi as the targets to
exercise and ensure tests assert exit codes/messages rather than raising.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6344dbdf-832f-4a25-905e-e853072dfe06
📒 Files selected for processing (7)
core/wren/src/wren/context_cli.pycore/wren/src/wren/osi.pycore/wren/tests/fixtures/osi/minimal.yamlcore/wren/tests/fixtures/osi/multi_semantic_model.yamlcore/wren/tests/fixtures/osi/ref_sql_source.yamlcore/wren/tests/fixtures/osi/tpcds_full.yamlcore/wren/tests/unit/test_osi.py
…Error Address coderabbit review on Canner#2322: - Wrap load_osi_file in try/except so YAML/JSON parse failures and missing files become structured errors instead of uncaught exceptions in build/validate flows. - Validate that relationship from_columns / to_columns entries are non-empty strings before constructing the join condition; non-string entries previously stringified into invalid SQL. - Regression tests for both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end testing against the wren engine surfaced that the snake→camel conversion mangled `_instructions` into `Instructions` (the leading underscore is stripped before capitalize-and-join), so downstream tooling that looks for the exact key (memory indexer, MDL importer) couldn't find the OSI-derived instructions. Pop the key before _convert_keys and re-attach after. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/osi.py`:
- Around line 792-799: The JSON "show" path is still converting "_instructions"
into "Instructions" because build_manifest_from_osi() (used by context_cli.py
for wren context show --from-osi --output json) calls _convert_keys(manifest)
directly; fix it by mirroring the build_json_from_osi approach: in
build_manifest_from_osi (or right inside context_cli.py where
build_manifest_from_osi() result is converted) pop manifest.pop("_instructions",
None) before calling _convert_keys, call manifest_json =
_convert_keys(manifest), then set manifest_json["_instructions"] = instructions
if instructions is not None so the output preserves the literal "_instructions"
key.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f88efa87-38d8-411d-818f-10cd87da9489
📒 Files selected for processing (2)
core/wren/src/wren/osi.pycore/wren/tests/unit/test_osi.py
…n project For users who want to escape OSI's surface-area limits (no cubes, views, RLAC/CLAC, calculated columns), `init --from-osi` runs the OSI converter once and lands a full native wren project layout that they own going forward. Mutually exclusive with `--from-mdl`. Surfaces conversion warnings during the migration so the user knows which generated YAML spots need a human review. Composition only — reuses `build_json_from_osi` (OSI→camelCase MDL) + `convert_mdl_to_project` (MDL→ProjectFile[]) + `write_project_files` that already power `--from-mdl`. Side-effect of also carrying OSI `semantic_model.name` through the manifest dict so the project gets a sensible default name. Docs updated with a side-by-side of the three flows (native, keep-OSI, migrate-once) and a "Migrate to a native wren project" section that calls out when *not* to migrate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/wren/src/wren/context_cli.py (1)
1057-1062:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
_instructionskey mangled toInstructionsin JSON output.
_show_from_osicalls_convert_keys(manifest)directly without the special handling thatbuild_json_from_osiuses (pop before conversion, re-add after). The_instructionskey will be transformed incorrectly, breaking downstream tooling that expects the exact_instructionskey.🐛 Proposed fix
if output == "json": from wren.context import _convert_keys # noqa: PLC0415 + instructions = manifest.pop("_instructions", None) manifest_json = _convert_keys(manifest) manifest_json["layoutVersion"] = 2 + if instructions: + manifest_json["_instructions"] = instructions typer.echo(json.dumps(manifest_json, indent=2, ensure_ascii=False)) return🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/src/wren/context_cli.py` around lines 1057 - 1062, _show_from_osi currently calls _convert_keys(manifest) directly which lets the private "_instructions" key be renamed (e.g., to "Instructions"); modify _show_from_osi to mirror build_json_from_osi: if manifest contains "_instructions", pop and store that value, call _convert_keys on the rest, then reinsert the stored value back under the exact "_instructions" key into the converted manifest before emitting JSON; reference the functions _show_from_osi, _convert_keys, and build_json_from_osi and ensure the key name remains exactly "_instructions".
🧹 Nitpick comments (1)
core/wren/tests/unit/test_osi.py (1)
718-735: ⚡ Quick winConsider adding assertion for
_instructionskey preservation.This test validates
layoutVersionandtableReferencebut doesn't verify that_instructionsis preserved correctly. Given thebuild_json_from_ositest at line 361-369 explicitly guards against_instructions→Instructionsmangling, the same check would be valuable here for theshow --output jsonpath.💡 Suggested addition
assert result.exit_code == 0, result.output data = json.loads(result.output) assert data["layoutVersion"] == 2 assert "tableReference" in data["models"][0] + assert "_instructions" in data + assert "Instructions" not in data🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/unit/test_osi.py` around lines 718 - 735, Add an assertion in test_cli_show_from_osi_json to verify the `_instructions` key is preserved in the JSON output: after loading `data` assert that `"_instructions"` exists on `data["models"][0]` and that its value matches the expected instructions (reuse the expected value produced by build_json_from_osi or the known value from the minimal fixture), mirroring the check already done in the build_json_from_osi test to guard against `_instructions` → `Instructions` mangling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@core/wren/src/wren/context_cli.py`:
- Around line 1057-1062: _show_from_osi currently calls _convert_keys(manifest)
directly which lets the private "_instructions" key be renamed (e.g., to
"Instructions"); modify _show_from_osi to mirror build_json_from_osi: if
manifest contains "_instructions", pop and store that value, call _convert_keys
on the rest, then reinsert the stored value back under the exact "_instructions"
key into the converted manifest before emitting JSON; reference the functions
_show_from_osi, _convert_keys, and build_json_from_osi and ensure the key name
remains exactly "_instructions".
---
Nitpick comments:
In `@core/wren/tests/unit/test_osi.py`:
- Around line 718-735: Add an assertion in test_cli_show_from_osi_json to verify
the `_instructions` key is preserved in the JSON output: after loading `data`
assert that `"_instructions"` exists on `data["models"][0]` and that its value
matches the expected instructions (reuse the expected value produced by
build_json_from_osi or the known value from the minimal fixture), mirroring the
check already done in the build_json_from_osi test to guard against
`_instructions` → `Instructions` mangling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 13d28502-f3d6-447e-a654-7547aac08ee5
📒 Files selected for processing (4)
core/wren/src/wren/context_cli.pycore/wren/src/wren/osi.pycore/wren/tests/unit/test_osi.pydocs/core/guides/osi.md
✅ Files skipped from review due to trivial changes (1)
- docs/core/guides/osi.md
Bring the manage_project "Migrate from an existing manifest" section in sync with the new flag — same external-manifest entry point, now with two supported source formats. Cross-link back to the OSI guide for the keep-OSI-as-source alternative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… json` Coderabbit caught that the same _instructions→Instructions mangling I fixed in build_json_from_osi (commit 98b064a) still happened in the show path, which builds its JSON output via _convert_keys(manifest) directly. Mirror the pop-before/restore-after pattern so the show preview stays consistent with the build output. Extend test_cli_show_from_osi_json to lock it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Lets users of the Open Semantic Interchange spec turn an OSI YAML into a Wren MDL without forking it into a parallel wren project. Two entry points:
Three ways to land
wren_project.yml+models/+views/+relationships.yml*.yamlOSI filewren context init→ edit →buildwren context build --from-osiwren context init --from-osi→ edit →buildHow wren-specific hints reach an OSI file
OSI has no file-system extension convention — its only sanctioned vendor escape hatch is the in-document
custom_extensions: [{vendor_name, data}]block. Wren config (column types, dialect preference, composite-PK picks, metrics handling) is read fromcustom_extensions[vendor_name=WREN]:Resolution order: CLI flags > semantic_model-level WREN > root-level WREN > defaults.
Actionable validate output
When a WREN block is needed but missing, the linter emits copy-pasteable YAML snippets for the user to paste into their OSI file:
column_typessnippet per datasetsemantic_model[]entries → snippet fordefault_semantic_modelat OSI rootvalidate --from-osi --strictexits 1 on any warning.Mapping summary
dataset.source: a.b.cmodel.table_reference: {catalog: a, schema: b, table: c}dataset.sourcewith SQL keywordsmodel.ref_sqldataset.fields[]model.columns[](dialect-picked expression, type from WREN/is_time/VARCHAR)dataset.primary_key: [x]model.primary_key: x(composite arrays warn + pick first)relationships[]relationshipswith synthesizedconditionand MANY_TO_ONEsemantic_model.namewren_project.yml.name(when migrating viainit --from-osi)semantic_model.ai_context.instructions_instructionsmetrics[](top-level OSI)Documentation
docs/core/guides/osi.md— full OSI guide covering the three flows, WREN extension block, mapping table, and migration sectiondocs/core/guides/manage_project.md—init --from-osilisted alongsideinit --from-mdlin the migration sectionTest plan
tests/unit/test_osi.pycovering parse, WREN block extraction, precedence merging, every conversion path, lint warnings (untyped fields / composite PK / cross-dataset metric / ambiguous semantic_model), malformed input handling, non-string join columns, CLI smoke tests forbuild/validate/show/init+ a round-trip check thatinit → buildproduces the same MDL as a directbuild --from-osidry_planconfirms the produced MDL plans queries correctly (join + aggregation + calculated column)test_validate_strict_warnsis unrelated to this PR)ruff format+ruff checkclean🤖 Generated with Claude Code