Skip to content

feat: cube PyO3 binding + wren cube CLI + memory indexing#2277

Merged
goldmedal merged 5 commits into
feat/wasm-cubefrom
feat/cube-cli
May 14, 2026
Merged

feat: cube PyO3 binding + wren cube CLI + memory indexing#2277
goldmedal merged 5 commits into
feat/wasm-cubefrom
feat/cube-cli

Conversation

@goldmedal
Copy link
Copy Markdown
Collaborator

@goldmedal goldmedal commented May 14, 2026

Summary

Phase D + E of the cube rollout (impl-cube-cli.md): expose the Rust cube translator to Python, surface it through the wren CLI, and teach the memory indexer about cubes.

Phase D — wren-core-py binding (#0674865c)

  • New cube_query_to_sql(cube_query_json, manifest_json) -> str pyfunction in core/wren-core-py/src/cube.rs. Serde-driven: both inputs are JSON, output is the generated SQL. Bad JSON / unknown cube / cyclic measures all raise ValueError.
  • 8 pytest cases in tests/test_cube.py covering the happy path plus error modes.

Phase E1+E2 — YAML pipeline (#d05a3efe)

  • core/wren/src/wren/context.py gains load_cubes(), wires cubes into build_manifest() / build_json() (snake → camel), and extends validate_project() with structural cube checks (name uniqueness, baseObject exists, hierarchy levels reference declared dimensions). Deep validation (measure cycles) still happens Rust-side at analyze time.
  • wren context init now scaffolds an empty cubes/ directory.
  • 8 new unit tests.

Phase E3+E4+E6 — wren cube CLI (#25fcfa1c)

  • New core/wren/src/wren/cube_cli.py with three commands:
    • wren cube list
    • wren cube describe <name>
    • wren cube query — build a CubeQuery from CLI flags (--cube, --measures, --dimensions, --time-dimension name:granularity[:start,end], --filter dim:op[:value], --limit, --offset) or load it from JSON via --from <file|->. --sql-only prints the generated SQL without executing.
  • Registered via app.add_typer(cube_app) in cli.py.
  • 10 new unit tests.

Phase E5 — Memory indexing (#3b73847c)

  • extract_schema_items() now emits records for cubes, measures, cube dimensions, and time dimensions so semantic search surfaces them alongside models and views.
  • describe_schema() gains a cube section printing measures/dimensions/time dimensions/hierarchies.
  • 7 new unit tests covering the new record types and describe_schema output.

Test plan

  • cargo test --lib -p wren-core cube:: — 33 cube unit tests pass (no regressions)
  • cd core/wren-core-py && just test-py — 33 pass (8 new + 25 existing)
  • cd core/wren && uv run pytest tests/unit/ — 167 pass for cube-related modules (test_cube_cli, test_memory, test_context); a single unrelated test_validate_strict_warns continues to fail on this branch and on feat/wasm-cube HEAD, so the regression is pre-existing.
  • cd core/wren && just lint — clean
  • cd core/wren-core-py && cargo clippy -- -D warnings — clean
  • CI on the wren / wren-core-py path-filtered workflows passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added cube management commands: wren cube list, describe, and query for working with cubes in your project.
    • Cube definitions are now integrated into project structure with automatic validation.
    • Ability to convert cube queries to SQL and execute them directly.

Review Change Stack

goldmedal and others added 4 commits May 14, 2026 12:16
PyO3 binding for the Rust cube translator added in PR #2265. The
function takes CubeQuery JSON and manifest JSON, returns the generated
SQL, and raises ValueError on bad input or translation errors (unknown
cube/measure/dimension, cyclic derived measures, …).

Includes 8 pytest cases covering the basic happy path plus error modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
context build now picks up cube definitions from <project>/cubes/*.yml
and includes them in the generated manifest (snake_case for
build_manifest, camelCase via build_json for the engine). context init
scaffolds an empty cubes/ directory next to models/ and views/, and
context validate gains structural checks for cube name uniqueness,
baseObject reference, and hierarchy levels — deep semantic checks (cycle
detection, measure resolution) still happen Rust-side at analyze time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three commands under `wren cube`:

  * list           — show all cubes and their measures/dimensions
  * describe NAME  — pretty-print the full cube schema (JSON)
  * query          — build a CubeQuery from CLI flags (--cube,
                     --measures, --dimensions, --time-dimension,
                     --filter, --limit, --offset) or load it from JSON
                     via --from <file|->, translate it to SQL through
                     the new wren_core.cube_query_to_sql binding, then
                     run it through WrenEngine. --sql-only prints the
                     generated SQL without executing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
extract_schema_items now emits records for cubes, measures, cube
dimensions, and time dimensions so semantic search surfaces them
alongside models and views. describe_schema gains a cube section that
prints measures (with expressions), dimensions, time dimensions, and
hierarchies. Both walks tolerate manifests without a cubes key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bbf673af-ea75-495e-b015-7d433871d723

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/cube-cli

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added python Pull requests that update Python code core labels May 14, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
core/wren/tests/unit/test_context.py (1)

876-889: ⚡ Quick win

Add a regression test for non-string hierarchy levels.

Given the hierarchy validation path, a malformed level entry (e.g., object/list) should be reported as a validation error rather than crashing. A targeted test here would lock that behavior.

Proposed test addition
+def test_validate_cube_hierarchy_level_type(tmp_path):
+    _make_v3_cube_project(tmp_path)
+    cubes_dir = tmp_path / "cubes"
+    cubes_dir.mkdir()
+    (cubes_dir / "bad_hierarchy_type.yml").write_text(
+        "name: order_metrics\n"
+        "base_object: orders\n"
+        "measures: [{name: c, expression: 'COUNT(*)', type: BIGINT}]\n"
+        "dimensions: [{name: status, expression: o_orderstatus, type: VARCHAR}]\n"
+        "hierarchies:\n"
+        "  drill: [status, {bad: level}]\n"
+    )
+    errors = validate_project(tmp_path)
+    assert any("hierarchy levels must be strings" in e.message for e in errors)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_context.py` around lines 876 - 889, Add a
regression test that ensures malformed hierarchy levels (non-string types) are
reported as validation errors instead of causing a crash: extend or create a
test similar to test_validate_cube_bad_hierarchy that writes a cube YAML where a
hierarchy level is a non-string value (e.g., a list or dict) and then calls
validate_project (the same function used in the diff) and asserts that the
returned errors include a message referencing the bad level (e.g., check
any("nonexistent_dim" or an appropriate indicator in e.message for e in
errors)); ensure the test targets the hierarchy validation path so
validate_project handles the non-string level gracefully.
core/wren/tests/unit/test_memory.py (1)

205-259: ⚡ Quick win

Add malformed cube-shape coverage for extract/describe.

The current suite is strong on happy paths; adding one malformed-shape case would guard against runtime regressions in cube parsing.

Proposed test addition
+    def test_extract_tolerates_malformed_cube_entries(self):
+        malformed = {
+            **_CUBE_MANIFEST,
+            "cubes": [
+                {
+                    "name": "order_metrics",
+                    "baseObject": "orders",
+                    "measures": ["not-an-object"],
+                    "dimensions": [123],
+                    "timeDimensions": [{"name": "created_at", "expression": "o_orderdate"}],
+                    "hierarchies": {"time_drill": ["created_at", {"bad": "node"}]},
+                }
+            ],
+        }
+        items = extract_schema_items(malformed)
+        assert any(i["item_type"] == "cube" for i in items)
+
+    def test_describe_schema_tolerates_malformed_hierarchy_levels(self):
+        malformed = {
+            **_CUBE_MANIFEST,
+            "cubes": [{**_CUBE_MANIFEST["cubes"][0], "hierarchies": {"time_drill": ["created_at", 1]}}],
+        }
+        text = describe_schema(malformed)
+        assert "### Cube: order_metrics (base: orders)" in text
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_memory.py` around lines 205 - 259, Add a unit test
that supplies a deliberately malformed cube-shaped manifest (e.g., missing
"cubes" entry, a cube with missing "measures" or malformed measure entries, or a
cube with non-dict children) to extract_schema_items and describe_schema and
assert the functions do not raise and either skip the malformed cube or return a
safe/empty representation; specifically create a new test similar to
TestCubeSchemaItems that calls extract_schema_items(_MALFORMED_CUBE_MANIFEST)
and asserts no items with item_type in
{"cube","measure","cube_dimension","time_dimension"} are produced (or that
malformed entries are excluded), and call
describe_schema(_MALFORMED_CUBE_MANIFEST) to assert it returns a string without
throwing (optionally contains an explicit error or notice text if your
implementation emits one). Ensure the test references extract_schema_items and
describe_schema and uses a local _MALFORMED_CUBE_MANIFEST fixture constructed in
the test file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/context.py`:
- Around line 997-1005: The validation error message in context.py uses
camelCase "baseObject" which mismatches the YAML field name; update the
ValidationError invocation that builds the message f"{src_path} > {name}" ...
f"baseObject '{base}' is not a defined model or view" (and the earlier "cube
missing 'baseObject'") to use snake_case "base_object" instead so users see the
YAML field name they edit (refer to the ValidationError calls around the
variables base, name, src_path and all_entity_names).
- Around line 1022-1038: The loop over hierarchies may perform "level not in
known_dims" on non-string values causing TypeError; in the block that iterates
hierarchies (use symbols hierarchies, hname, levels, level, known_dims, errors),
first guard that each level is a string (e.g., isinstance(level, str)); if it is
not, append a validation error to errors describing the invalid hierarchy level
type (include hname and the offending level) and continue, otherwise perform the
membership check and append the existing "unknown dimension" error when
appropriate.

In `@core/wren/src/wren/cube_cli.py`:
- Around line 110-119: The helper _load_cube_query_from currently calls
json.loads on stdin or a file without handling malformed JSON; wrap the
json.loads calls in try/except catching json.JSONDecodeError (from json) and
when caught call typer.echo with a clear "Error: malformed JSON in CubeQuery:
<error msg>" to stderr and raise typer.Exit(1); apply the same pattern to the
other JSON-parsing sites referenced (the other functions that call json.loads at
the same module lines), so all malformed JSON returns a user-friendly error and
exit code instead of a traceback.
- Around line 59-63: The parser that handles filter values in cube_cli.py
currently sets f["value"] = [] when op is "in" or "not_in" (using variables op,
raw, f) which allows empty lists to proceed; change that block so after
computing vals = [v.strip() for v in raw.split(",") if v.strip()] you validate
that vals is non-empty and, if empty, raise a CLI validation error (e.g.,
argparse.ArgumentTypeError or the CLI framework's appropriate error) with a
clear message like "filter '<field>:in:' requires at least one value"; otherwise
set f["value"] = vals.

In `@core/wren/src/wren/memory/schema_indexer.py`:
- Around line 125-175: In _describe_cube, guard against malformed shapes by
checking types before accessing dict keys: confirm cube is a dict, and for
measures/dimensions/timeDimensions only iterate items that are dicts (skip
otherwise); when reading fields (name, expression, type) coerce to str or use
safe defaults instead of assuming .get exists; for hierarchies ensure it's a
dict and that each levels value is a list, and include only stringifiable level
entries when joining (skip non-strings); apply the same defensive checks and
safe string coercions to the other similar describer blocks referenced (lines
208-217, 328-429) so malformed YAML won't raise AttributeError/TypeError.

---

Nitpick comments:
In `@core/wren/tests/unit/test_context.py`:
- Around line 876-889: Add a regression test that ensures malformed hierarchy
levels (non-string types) are reported as validation errors instead of causing a
crash: extend or create a test similar to test_validate_cube_bad_hierarchy that
writes a cube YAML where a hierarchy level is a non-string value (e.g., a list
or dict) and then calls validate_project (the same function used in the diff)
and asserts that the returned errors include a message referencing the bad level
(e.g., check any("nonexistent_dim" or an appropriate indicator in e.message for
e in errors)); ensure the test targets the hierarchy validation path so
validate_project handles the non-string level gracefully.

In `@core/wren/tests/unit/test_memory.py`:
- Around line 205-259: Add a unit test that supplies a deliberately malformed
cube-shaped manifest (e.g., missing "cubes" entry, a cube with missing
"measures" or malformed measure entries, or a cube with non-dict children) to
extract_schema_items and describe_schema and assert the functions do not raise
and either skip the malformed cube or return a safe/empty representation;
specifically create a new test similar to TestCubeSchemaItems that calls
extract_schema_items(_MALFORMED_CUBE_MANIFEST) and asserts no items with
item_type in {"cube","measure","cube_dimension","time_dimension"} are produced
(or that malformed entries are excluded), and call
describe_schema(_MALFORMED_CUBE_MANIFEST) to assert it returns a string without
throwing (optionally contains an explicit error or notice text if your
implementation emits one). Ensure the test references extract_schema_items and
describe_schema and uses a local _MALFORMED_CUBE_MANIFEST fixture constructed in
the test file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: eb3df530-d0ad-47d5-a523-f8d6794f4ce5

📥 Commits

Reviewing files that changed from the base of the PR and between 2ace7cc and 3b73847.

📒 Files selected for processing (11)
  • core/wren-core-py/src/cube.rs
  • core/wren-core-py/src/lib.rs
  • core/wren-core-py/tests/test_cube.py
  • core/wren/src/wren/cli.py
  • core/wren/src/wren/context.py
  • core/wren/src/wren/context_cli.py
  • core/wren/src/wren/cube_cli.py
  • core/wren/src/wren/memory/schema_indexer.py
  • core/wren/tests/unit/test_context.py
  • core/wren/tests/unit/test_cube_cli.py
  • core/wren/tests/unit/test_memory.py

Comment thread core/wren/src/wren/context.py Outdated
Comment thread core/wren/src/wren/context.py
Comment thread core/wren/src/wren/cube_cli.py
Comment thread core/wren/src/wren/cube_cli.py
Comment thread core/wren/src/wren/memory/schema_indexer.py
* `context validate` now uses the YAML field name (`base_object`) in
  error messages so users edit the same identifier they see.
* Hierarchy-level type check rejects non-string entries up front, avoiding
  a TypeError when nested lists / null leak through YAML.
* `wren cube query --filter dim:in:` (empty value list) is now rejected
  with a CLI BadParameter instead of producing an empty `IN ()` clause.
* `--from <file|->` and MDL JSON loading wrap json.loads in try/except
  so malformed input surfaces as `Error: ...` + exit 1 rather than a
  raw traceback. The JSON must also be an object.
* Memory schema indexer (`extract_schema_items` / `describe_schema`)
  skips non-dict cube/measure/dimension entries and non-string hierarchy
  levels instead of crashing — keeps memory rebuilds resilient.

Adds regression tests for each of the five paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant