Skip to content

feat(wren): build MDL manifests from OSI semantic models#2322

Open
goldmedal wants to merge 7 commits into
Canner:mainfrom
goldmedal:worktree-plan-0525-104459
Open

feat(wren): build MDL manifests from OSI semantic models#2322
goldmedal wants to merge 7 commits into
Canner:mainfrom
goldmedal:worktree-plan-0525-104459

Conversation

@goldmedal
Copy link
Copy Markdown
Collaborator

@goldmedal goldmedal commented May 25, 2026

Summary

Lets users of the Open Semantic Interchange spec turn an OSI YAML into a Wren MDL without forking it into a parallel wren project. Two entry points:

# Keep OSI as the single source of truth — every build re-reads it
wren context build|validate|show --from-osi semantic_model.yaml --data-source postgres

# Or migrate once to a native wren project, then own the YAML directly
wren context init --from-osi semantic_model.yaml --data-source postgres --path my_project

Three ways to land

Native wren OSI kept as source OSI migrated once
Source files wren_project.yml + models/ + views/ + relationships.yml A single *.yaml OSI file wren project layout (after migration)
Command wren context init → edit → build wren context build --from-osi wren context init --from-osi → edit → build
Stays in sync with OSI? n/a Yes — re-read every build No — one-way snapshot
Cubes / views / RLAC? Yes No (OSI doesn't model them) Yes (you own the YAML)

How wren-specific hints reach an OSI file

OSI has no file-system extension convention — its only sanctioned vendor escape hatch is the in-document custom_extensions: [{vendor_name, data}] block. Wren config (column types, dialect preference, composite-PK picks, metrics handling) is read from custom_extensions[vendor_name=WREN]:

custom_extensions:
  - vendor_name: WREN
    data: '{"dialect": "SNOWFLAKE", "metrics": "note"}'

semantic_model:
  - name: shop
    datasets:
      - name: orders
        source: shop.public.orders
        custom_extensions:
          - vendor_name: WREN
            data: |
              {
                "column_types": {"order_id": "BIGINT", "amount": "DECIMAL(18,2)"}
              }

Resolution order: CLI flags > semantic_model-level WREN > root-level WREN > defaults.

Actionable validate output

When a WREN block is needed but missing, the linter emits copy-pasteable YAML snippets for the user to paste into their OSI file:

  • Untyped fields → column_types snippet per dataset
  • Composite primary key → which column wren picked + override snippet
  • Cross-dataset metric → flagged as note-only (Wren cubes bind to a single base_object)
  • Multiple semantic_model[] entries → snippet for default_semantic_model at OSI root

validate --from-osi --strict exits 1 on any warning.

Mapping summary

OSI Wren MDL
dataset.source: a.b.c model.table_reference: {catalog: a, schema: b, table: c}
dataset.source with SQL keywords model.ref_sql
dataset.fields[] model.columns[] (dialect-picked expression, type from WREN/is_time/VARCHAR)
dataset.primary_key: [x] model.primary_key: x (composite arrays warn + pick first)
relationships[] wren relationships with synthesized condition and MANY_TO_ONE
semantic_model.name wren_project.yml.name (when migrating via init --from-osi)
semantic_model.ai_context.instructions MDL _instructions
metrics[] (top-level OSI) rendered as markdown notes appended to instructions

Documentation

Test plan

  • 72 new tests in tests/unit/test_osi.py covering parse, WREN block extraction, precedence merging, every conversion path, lint warnings (untyped fields / composite PK / cross-dataset metric / ambiguous semantic_model), malformed input handling, non-string join columns, CLI smoke tests for build / validate / show / init + a round-trip check that init → build produces the same MDL as a direct build --from-osi
  • Real-world TPC-DS OSI example committed as a fixture, exercises every warning path
  • End-to-end smoke test against the wren engine via dry_plan confirms the produced MDL plans queries correctly (join + aggregation + calculated column)
  • Existing 157 context / convert-mdl tests still pass (one pre-existing flake on test_validate_strict_warns is unrelated to this PR)
  • ruff format + ruff check clean

🤖 Generated with Claude Code

Adds `wren context build|validate|show --from-osi <file>` so OSI users can
generate Wren MDL JSON directly from an Open Semantic Interchange YAML file
without forking it into a parallel wren project. The OSI file stays the
single source of truth.

Wren-specific build hints (column types, dialect preference, metrics
handling, composite-PK selection) live inside the OSI file via the spec's
sanctioned `custom_extensions: [{vendor_name: WREN, data: '<json>'}]`
mechanism — at the document root, on a semantic_model, on a dataset, or on
a field. SM-level config overrides root; CLI flags override both.

When a WREN block is needed but missing, validate emits copy-pasteable
YAML snippets (untyped fields, composite primary keys, cross-dataset
metrics, ambiguous semantic_model selection).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added python Pull requests that update Python code core labels May 25, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds OSI support: a new osi.py module converts OSI semantic models into Wren MDL manifests (manifest/JSON builders and linting). context_cli.py gains --from-osi, --data-source, and --semantic-model options for init, validate, build, and show, which delegate to OSI helpers. Fixtures, tests, and docs document and validate the workflow.

Changes

OSI Support for Wren Context CLI

Layer / File(s) Summary
OSI conversion module foundation and transformation logic
core/wren/src/wren/osi.py
Parses OSI as JSON/YAML, selects semantic models by name or default, extracts and shallow-merges WREN vendor configuration, converts datasets (source/ref_sql/table_reference, primary-key resolution, fields with expression/type/calculated detection), relationships (join validation/condition synthesis), and metrics (expression extraction, cross-dataset detection, instructions rendering) into a Wren manifest. Exposes build_manifest_from_osi, build_json_from_osi (camelCase + layoutVersion), and lint_osi_file.
CLI option types and command signatures
core/wren/src/wren/context_cli.py
Adds FromOsiOpt, DataSourceOpt, SemanticModelOpt and extends init, validate, build, and show signatures with --from-osi, --data-source, and --semantic-model. Enforces mutual exclusivity and adds help text.
CLI helpers and OSI routing
core/wren/src/wren/context_cli.py
Implements OSI helper functions and early-branch routing: _init_from_osi, _build_from_osi, _validate_from_osi, and _show_from_osi, enforcing --data-source, handling strict/validate-first semantics, writing/formatting outputs, and printing diagnostics or counts.
Test fixtures, comprehensive test suite, and docs
core/wren/tests/fixtures/osi/*, core/wren/tests/unit/test_osi.py, docs/core/guides/osi.md
Adds OSI fixtures (minimal, multi-model, ref-sql, TPC-DS), unit/integration/CLI tests covering parsing, selection, conversion, linting, and CLI behaviors, plus a user guide documenting WREN custom_extensions, precedence, mapping, warnings, and CLI examples.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • PaulChen79
  • douenergy

Poem

🐰 From OSI fields the tidy wren hops light,
Merges, joins, and metrics stitched just right,
CLI options hum — build, validate, show,
MDL blooms where the semantic models grow,
Warnings whisper hints beneath moonlight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.59% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(wren): build MDL manifests from OSI semantic models' accurately and specifically describes the main feature being added: generating Wren MDL manifests directly from OSI semantic models.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 25, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
core/wren/tests/unit/test_osi.py (1)

37-649: ⚡ Quick win

Add regression tests for malformed OSI input and invalid relationship column element types.

The suite is strong overall, but it currently misses the two highest-risk failure modes in this module: parse/read failure handling and non-string from_columns/to_columns entries. Locking these in would prevent CLI crash regressions and condition-construction regressions.

✅ Suggested additions
+def test_build_manifest_malformed_yaml_returns_error(tmp_path: Path):
+    p = tmp_path / "bad.yaml"
+    p.write_text("semantic_model: [")
+    manifest, errors = build_manifest_from_osi(p, data_source="postgres")
+    assert manifest == {}
+    assert any(e.level == "error" for e in errors)
+
+
+def test_relationship_non_string_join_columns_error():
+    bad = {
+        "version": "0.2.0",
+        "semantic_model": [
+            {
+                "name": "x",
+                "datasets": [
+                    {
+                        "name": "a",
+                        "source": "c.s.a",
+                        "fields": [{"name": "id", "expression": "id"}],
+                    },
+                    {
+                        "name": "b",
+                        "source": "c.s.b",
+                        "fields": [{"name": "id", "expression": "id"}],
+                    },
+                ],
+                "relationships": [
+                    {
+                        "name": "a_to_b",
+                        "from": "a",
+                        "to": "b",
+                        "from_columns": ["id"],
+                        "to_columns": [123],
+                    }
+                ],
+            }
+        ],
+    }
+    p = Path(__file__).parent / "tmp_invalid_rel.yaml"
+    p.write_text(json.dumps(bad))
+    try:
+        _, errors = build_manifest_from_osi(p, data_source="postgres")
+        assert any("entries must be non-empty strings" in e.message for e in errors)
+    finally:
+        p.unlink(missing_ok=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_osi.py` around lines 37 - 649, Add two regression
tests into this test module: one that feeds malformed OSI content (e.g., bad
JSON/YAML) to parse_osi and to the CLI paths (lint_osi_file or
build_manifest_from_osi via runner.invoke) and asserts a clean error is returned
(no unhandled exception / exit_code != 0 and error message mentions parse/read
failure); and one that constructs an OSI with a relationship whose
from_columns/to_columns contain non-string elements (e.g., integers or lists),
then calls build_manifest_from_osi and asserts it does not crash but returns an
error entry (errors list contains an error with level "error" and a message
referencing relationship/from_columns or the invalid element). Reference
parse_osi and build_manifest_from_osi as the targets to exercise and ensure
tests assert exit codes/messages rather than raising.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/osi.py`:
- Around line 674-675: Wrap calls to load_osi_file (e.g., where osi =
load_osi_file(osi_path)) in a try/except that catches file/read/parse exceptions
and converts them into a ValidationError instance appended to errors instead of
letting the exception propagate; on error set osi to None (or an empty safe
value) so downstream code can continue. Apply the same pattern to the other
occurrence around lines 799-804 so both read/parse failures produce structured
ValidationError diagnostics rather than uncaught exceptions.
- Around line 547-577: The code builds the SQL join condition from from_cols and
to_cols without validating element types, which lets non-string entries be
stringified into invalid SQL; update the validation in the block that currently
checks list-ness and lengths (the variables from_cols, to_cols and the
surrounding function that appends ValidationError) to iterate both lists and
verify each element is an instance of str (or unicode), and if any non-string
entries are found append a ValidationError referencing the relationship name and
which side/indices are invalid (or list the offending values), then return ({},
errors) instead of constructing parts/condition; only build parts and the final
condition after all elements pass the string-type check.

---

Nitpick comments:
In `@core/wren/tests/unit/test_osi.py`:
- Around line 37-649: Add two regression tests into this test module: one that
feeds malformed OSI content (e.g., bad JSON/YAML) to parse_osi and to the CLI
paths (lint_osi_file or build_manifest_from_osi via runner.invoke) and asserts a
clean error is returned (no unhandled exception / exit_code != 0 and error
message mentions parse/read failure); and one that constructs an OSI with a
relationship whose from_columns/to_columns contain non-string elements (e.g.,
integers or lists), then calls build_manifest_from_osi and asserts it does not
crash but returns an error entry (errors list contains an error with level
"error" and a message referencing relationship/from_columns or the invalid
element). Reference parse_osi and build_manifest_from_osi as the targets to
exercise and ensure tests assert exit codes/messages rather than raising.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6344dbdf-832f-4a25-905e-e853072dfe06

📥 Commits

Reviewing files that changed from the base of the PR and between 9c9ba89 and 7fc7675.

📒 Files selected for processing (7)
  • core/wren/src/wren/context_cli.py
  • core/wren/src/wren/osi.py
  • core/wren/tests/fixtures/osi/minimal.yaml
  • core/wren/tests/fixtures/osi/multi_semantic_model.yaml
  • core/wren/tests/fixtures/osi/ref_sql_source.yaml
  • core/wren/tests/fixtures/osi/tpcds_full.yaml
  • core/wren/tests/unit/test_osi.py

Comment thread core/wren/src/wren/osi.py
Comment thread core/wren/src/wren/osi.py Outdated
goldmedal and others added 2 commits May 25, 2026 13:28
…Error

Address coderabbit review on Canner#2322:

- Wrap load_osi_file in try/except so YAML/JSON parse failures and missing
  files become structured errors instead of uncaught exceptions in
  build/validate flows.
- Validate that relationship from_columns / to_columns entries are
  non-empty strings before constructing the join condition; non-string
  entries previously stringified into invalid SQL.
- Regression tests for both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end testing against the wren engine surfaced that the snake→camel
conversion mangled `_instructions` into `Instructions` (the leading
underscore is stripped before capitalize-and-join), so downstream tooling
that looks for the exact key (memory indexer, MDL importer) couldn't find
the OSI-derived instructions. Pop the key before _convert_keys and
re-attach after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/osi.py`:
- Around line 792-799: The JSON "show" path is still converting "_instructions"
into "Instructions" because build_manifest_from_osi() (used by context_cli.py
for wren context show --from-osi --output json) calls _convert_keys(manifest)
directly; fix it by mirroring the build_json_from_osi approach: in
build_manifest_from_osi (or right inside context_cli.py where
build_manifest_from_osi() result is converted) pop manifest.pop("_instructions",
None) before calling _convert_keys, call manifest_json =
_convert_keys(manifest), then set manifest_json["_instructions"] = instructions
if instructions is not None so the output preserves the literal "_instructions"
key.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f88efa87-38d8-411d-818f-10cd87da9489

📥 Commits

Reviewing files that changed from the base of the PR and between b0f86df and 98b064a.

📒 Files selected for processing (2)
  • core/wren/src/wren/osi.py
  • core/wren/tests/unit/test_osi.py

Comment thread core/wren/src/wren/osi.py
…n project

For users who want to escape OSI's surface-area limits (no cubes, views,
RLAC/CLAC, calculated columns), `init --from-osi` runs the OSI converter
once and lands a full native wren project layout that they own going
forward. Mutually exclusive with `--from-mdl`. Surfaces conversion
warnings during the migration so the user knows which generated YAML
spots need a human review.

Composition only — reuses `build_json_from_osi` (OSI→camelCase MDL)
+ `convert_mdl_to_project` (MDL→ProjectFile[]) + `write_project_files`
that already power `--from-mdl`. Side-effect of also carrying OSI
`semantic_model.name` through the manifest dict so the project gets a
sensible default name.

Docs updated with a side-by-side of the three flows (native, keep-OSI,
migrate-once) and a "Migrate to a native wren project" section that
calls out when *not* to migrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/wren/src/wren/context_cli.py (1)

1057-1062: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_instructions key mangled to Instructions in JSON output.

_show_from_osi calls _convert_keys(manifest) directly without the special handling that build_json_from_osi uses (pop before conversion, re-add after). The _instructions key will be transformed incorrectly, breaking downstream tooling that expects the exact _instructions key.

🐛 Proposed fix
     if output == "json":
         from wren.context import _convert_keys  # noqa: PLC0415

+        instructions = manifest.pop("_instructions", None)
         manifest_json = _convert_keys(manifest)
         manifest_json["layoutVersion"] = 2
+        if instructions:
+            manifest_json["_instructions"] = instructions
         typer.echo(json.dumps(manifest_json, indent=2, ensure_ascii=False))
         return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/context_cli.py` around lines 1057 - 1062, _show_from_osi
currently calls _convert_keys(manifest) directly which lets the private
"_instructions" key be renamed (e.g., to "Instructions"); modify _show_from_osi
to mirror build_json_from_osi: if manifest contains "_instructions", pop and
store that value, call _convert_keys on the rest, then reinsert the stored value
back under the exact "_instructions" key into the converted manifest before
emitting JSON; reference the functions _show_from_osi, _convert_keys, and
build_json_from_osi and ensure the key name remains exactly "_instructions".
🧹 Nitpick comments (1)
core/wren/tests/unit/test_osi.py (1)

718-735: ⚡ Quick win

Consider adding assertion for _instructions key preservation.

This test validates layoutVersion and tableReference but doesn't verify that _instructions is preserved correctly. Given the build_json_from_osi test at line 361-369 explicitly guards against _instructionsInstructions mangling, the same check would be valuable here for the show --output json path.

💡 Suggested addition
     assert result.exit_code == 0, result.output
     data = json.loads(result.output)
     assert data["layoutVersion"] == 2
     assert "tableReference" in data["models"][0]
+    assert "_instructions" in data
+    assert "Instructions" not in data
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_osi.py` around lines 718 - 735, Add an assertion in
test_cli_show_from_osi_json to verify the `_instructions` key is preserved in
the JSON output: after loading `data` assert that `"_instructions"` exists on
`data["models"][0]` and that its value matches the expected instructions (reuse
the expected value produced by build_json_from_osi or the known value from the
minimal fixture), mirroring the check already done in the build_json_from_osi
test to guard against `_instructions` → `Instructions` mangling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@core/wren/src/wren/context_cli.py`:
- Around line 1057-1062: _show_from_osi currently calls _convert_keys(manifest)
directly which lets the private "_instructions" key be renamed (e.g., to
"Instructions"); modify _show_from_osi to mirror build_json_from_osi: if
manifest contains "_instructions", pop and store that value, call _convert_keys
on the rest, then reinsert the stored value back under the exact "_instructions"
key into the converted manifest before emitting JSON; reference the functions
_show_from_osi, _convert_keys, and build_json_from_osi and ensure the key name
remains exactly "_instructions".

---

Nitpick comments:
In `@core/wren/tests/unit/test_osi.py`:
- Around line 718-735: Add an assertion in test_cli_show_from_osi_json to verify
the `_instructions` key is preserved in the JSON output: after loading `data`
assert that `"_instructions"` exists on `data["models"][0]` and that its value
matches the expected instructions (reuse the expected value produced by
build_json_from_osi or the known value from the minimal fixture), mirroring the
check already done in the build_json_from_osi test to guard against
`_instructions` → `Instructions` mangling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 13d28502-f3d6-447e-a654-7547aac08ee5

📥 Commits

Reviewing files that changed from the base of the PR and between 98b064a and deadd3e.

📒 Files selected for processing (4)
  • core/wren/src/wren/context_cli.py
  • core/wren/src/wren/osi.py
  • core/wren/tests/unit/test_osi.py
  • docs/core/guides/osi.md
✅ Files skipped from review due to trivial changes (1)
  • docs/core/guides/osi.md

goldmedal and others added 2 commits May 25, 2026 14:07
Bring the manage_project "Migrate from an existing manifest" section in
sync with the new flag — same external-manifest entry point, now with two
supported source formats. Cross-link back to the OSI guide for the
keep-OSI-as-source alternative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… json`

Coderabbit caught that the same _instructions→Instructions mangling I
fixed in build_json_from_osi (commit 98b064a) still happened in the
show path, which builds its JSON output via _convert_keys(manifest)
directly. Mirror the pop-before/restore-after pattern so the show
preview stays consistent with the build output. Extend
test_cli_show_from_osi_json to lock it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@goldmedal goldmedal requested a review from PaulChen79 May 25, 2026 06:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core documentation Improvements or additions to documentation python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant