feat: cube PyO3 binding + wren cube CLI + memory indexing#2277
Conversation
PyO3 binding for the Rust cube translator added in PR #2265. The function takes CubeQuery JSON and manifest JSON, returns the generated SQL, and raises ValueError on bad input or translation errors (unknown cube/measure/dimension, cyclic derived measures, …). Includes 8 pytest cases covering the basic happy path plus error modes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
context build now picks up cube definitions from <project>/cubes/*.yml and includes them in the generated manifest (snake_case for build_manifest, camelCase via build_json for the engine). context init scaffolds an empty cubes/ directory next to models/ and views/, and context validate gains structural checks for cube name uniqueness, baseObject reference, and hierarchy levels — deep semantic checks (cycle detection, measure resolution) still happen Rust-side at analyze time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three commands under `wren cube`:
* list — show all cubes and their measures/dimensions
* describe NAME — pretty-print the full cube schema (JSON)
* query — build a CubeQuery from CLI flags (--cube,
--measures, --dimensions, --time-dimension,
--filter, --limit, --offset) or load it from JSON
via --from <file|->, translate it to SQL through
the new wren_core.cube_query_to_sql binding, then
run it through WrenEngine. --sql-only prints the
generated SQL without executing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
extract_schema_items now emits records for cubes, measures, cube dimensions, and time dimensions so semantic search surfaces them alongside models and views. describe_schema gains a cube section that prints measures (with expressions), dimensions, time dimensions, and hierarchies. Both walks tolerate manifests without a cubes key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
core/wren/tests/unit/test_context.py (1)
876-889: ⚡ Quick winAdd a regression test for non-string hierarchy levels.
Given the hierarchy validation path, a malformed level entry (e.g., object/list) should be reported as a validation error rather than crashing. A targeted test here would lock that behavior.
Proposed test addition
+def test_validate_cube_hierarchy_level_type(tmp_path): + _make_v3_cube_project(tmp_path) + cubes_dir = tmp_path / "cubes" + cubes_dir.mkdir() + (cubes_dir / "bad_hierarchy_type.yml").write_text( + "name: order_metrics\n" + "base_object: orders\n" + "measures: [{name: c, expression: 'COUNT(*)', type: BIGINT}]\n" + "dimensions: [{name: status, expression: o_orderstatus, type: VARCHAR}]\n" + "hierarchies:\n" + " drill: [status, {bad: level}]\n" + ) + errors = validate_project(tmp_path) + assert any("hierarchy levels must be strings" in e.message for e in errors)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/unit/test_context.py` around lines 876 - 889, Add a regression test that ensures malformed hierarchy levels (non-string types) are reported as validation errors instead of causing a crash: extend or create a test similar to test_validate_cube_bad_hierarchy that writes a cube YAML where a hierarchy level is a non-string value (e.g., a list or dict) and then calls validate_project (the same function used in the diff) and asserts that the returned errors include a message referencing the bad level (e.g., check any("nonexistent_dim" or an appropriate indicator in e.message for e in errors)); ensure the test targets the hierarchy validation path so validate_project handles the non-string level gracefully.core/wren/tests/unit/test_memory.py (1)
205-259: ⚡ Quick winAdd malformed cube-shape coverage for extract/describe.
The current suite is strong on happy paths; adding one malformed-shape case would guard against runtime regressions in cube parsing.
Proposed test addition
+ def test_extract_tolerates_malformed_cube_entries(self): + malformed = { + **_CUBE_MANIFEST, + "cubes": [ + { + "name": "order_metrics", + "baseObject": "orders", + "measures": ["not-an-object"], + "dimensions": [123], + "timeDimensions": [{"name": "created_at", "expression": "o_orderdate"}], + "hierarchies": {"time_drill": ["created_at", {"bad": "node"}]}, + } + ], + } + items = extract_schema_items(malformed) + assert any(i["item_type"] == "cube" for i in items) + + def test_describe_schema_tolerates_malformed_hierarchy_levels(self): + malformed = { + **_CUBE_MANIFEST, + "cubes": [{**_CUBE_MANIFEST["cubes"][0], "hierarchies": {"time_drill": ["created_at", 1]}}], + } + text = describe_schema(malformed) + assert "### Cube: order_metrics (base: orders)" in text🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/unit/test_memory.py` around lines 205 - 259, Add a unit test that supplies a deliberately malformed cube-shaped manifest (e.g., missing "cubes" entry, a cube with missing "measures" or malformed measure entries, or a cube with non-dict children) to extract_schema_items and describe_schema and assert the functions do not raise and either skip the malformed cube or return a safe/empty representation; specifically create a new test similar to TestCubeSchemaItems that calls extract_schema_items(_MALFORMED_CUBE_MANIFEST) and asserts no items with item_type in {"cube","measure","cube_dimension","time_dimension"} are produced (or that malformed entries are excluded), and call describe_schema(_MALFORMED_CUBE_MANIFEST) to assert it returns a string without throwing (optionally contains an explicit error or notice text if your implementation emits one). Ensure the test references extract_schema_items and describe_schema and uses a local _MALFORMED_CUBE_MANIFEST fixture constructed in the test file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/context.py`:
- Around line 997-1005: The validation error message in context.py uses
camelCase "baseObject" which mismatches the YAML field name; update the
ValidationError invocation that builds the message f"{src_path} > {name}" ...
f"baseObject '{base}' is not a defined model or view" (and the earlier "cube
missing 'baseObject'") to use snake_case "base_object" instead so users see the
YAML field name they edit (refer to the ValidationError calls around the
variables base, name, src_path and all_entity_names).
- Around line 1022-1038: The loop over hierarchies may perform "level not in
known_dims" on non-string values causing TypeError; in the block that iterates
hierarchies (use symbols hierarchies, hname, levels, level, known_dims, errors),
first guard that each level is a string (e.g., isinstance(level, str)); if it is
not, append a validation error to errors describing the invalid hierarchy level
type (include hname and the offending level) and continue, otherwise perform the
membership check and append the existing "unknown dimension" error when
appropriate.
In `@core/wren/src/wren/cube_cli.py`:
- Around line 110-119: The helper _load_cube_query_from currently calls
json.loads on stdin or a file without handling malformed JSON; wrap the
json.loads calls in try/except catching json.JSONDecodeError (from json) and
when caught call typer.echo with a clear "Error: malformed JSON in CubeQuery:
<error msg>" to stderr and raise typer.Exit(1); apply the same pattern to the
other JSON-parsing sites referenced (the other functions that call json.loads at
the same module lines), so all malformed JSON returns a user-friendly error and
exit code instead of a traceback.
- Around line 59-63: The parser that handles filter values in cube_cli.py
currently sets f["value"] = [] when op is "in" or "not_in" (using variables op,
raw, f) which allows empty lists to proceed; change that block so after
computing vals = [v.strip() for v in raw.split(",") if v.strip()] you validate
that vals is non-empty and, if empty, raise a CLI validation error (e.g.,
argparse.ArgumentTypeError or the CLI framework's appropriate error) with a
clear message like "filter '<field>:in:' requires at least one value"; otherwise
set f["value"] = vals.
In `@core/wren/src/wren/memory/schema_indexer.py`:
- Around line 125-175: In _describe_cube, guard against malformed shapes by
checking types before accessing dict keys: confirm cube is a dict, and for
measures/dimensions/timeDimensions only iterate items that are dicts (skip
otherwise); when reading fields (name, expression, type) coerce to str or use
safe defaults instead of assuming .get exists; for hierarchies ensure it's a
dict and that each levels value is a list, and include only stringifiable level
entries when joining (skip non-strings); apply the same defensive checks and
safe string coercions to the other similar describer blocks referenced (lines
208-217, 328-429) so malformed YAML won't raise AttributeError/TypeError.
---
Nitpick comments:
In `@core/wren/tests/unit/test_context.py`:
- Around line 876-889: Add a regression test that ensures malformed hierarchy
levels (non-string types) are reported as validation errors instead of causing a
crash: extend or create a test similar to test_validate_cube_bad_hierarchy that
writes a cube YAML where a hierarchy level is a non-string value (e.g., a list
or dict) and then calls validate_project (the same function used in the diff)
and asserts that the returned errors include a message referencing the bad level
(e.g., check any("nonexistent_dim" or an appropriate indicator in e.message for
e in errors)); ensure the test targets the hierarchy validation path so
validate_project handles the non-string level gracefully.
In `@core/wren/tests/unit/test_memory.py`:
- Around line 205-259: Add a unit test that supplies a deliberately malformed
cube-shaped manifest (e.g., missing "cubes" entry, a cube with missing
"measures" or malformed measure entries, or a cube with non-dict children) to
extract_schema_items and describe_schema and assert the functions do not raise
and either skip the malformed cube or return a safe/empty representation;
specifically create a new test similar to TestCubeSchemaItems that calls
extract_schema_items(_MALFORMED_CUBE_MANIFEST) and asserts no items with
item_type in {"cube","measure","cube_dimension","time_dimension"} are produced
(or that malformed entries are excluded), and call
describe_schema(_MALFORMED_CUBE_MANIFEST) to assert it returns a string without
throwing (optionally contains an explicit error or notice text if your
implementation emits one). Ensure the test references extract_schema_items and
describe_schema and uses a local _MALFORMED_CUBE_MANIFEST fixture constructed in
the test file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: eb3df530-d0ad-47d5-a523-f8d6794f4ce5
📒 Files selected for processing (11)
core/wren-core-py/src/cube.rscore/wren-core-py/src/lib.rscore/wren-core-py/tests/test_cube.pycore/wren/src/wren/cli.pycore/wren/src/wren/context.pycore/wren/src/wren/context_cli.pycore/wren/src/wren/cube_cli.pycore/wren/src/wren/memory/schema_indexer.pycore/wren/tests/unit/test_context.pycore/wren/tests/unit/test_cube_cli.pycore/wren/tests/unit/test_memory.py
* `context validate` now uses the YAML field name (`base_object`) in error messages so users edit the same identifier they see. * Hierarchy-level type check rejects non-string entries up front, avoiding a TypeError when nested lists / null leak through YAML. * `wren cube query --filter dim:in:` (empty value list) is now rejected with a CLI BadParameter instead of producing an empty `IN ()` clause. * `--from <file|->` and MDL JSON loading wrap json.loads in try/except so malformed input surfaces as `Error: ...` + exit 1 rather than a raw traceback. The JSON must also be an object. * Memory schema indexer (`extract_schema_items` / `describe_schema`) skips non-dict cube/measure/dimension entries and non-string hierarchy levels instead of crashing — keeps memory rebuilds resilient. Adds regression tests for each of the five paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase D + E of the cube rollout (impl-cube-cli.md): expose the Rust cube translator to Python, surface it through the
wrenCLI, and teach the memory indexer about cubes.Phase D —
wren-core-pybinding (#0674865c)cube_query_to_sql(cube_query_json, manifest_json) -> strpyfunction incore/wren-core-py/src/cube.rs. Serde-driven: both inputs are JSON, output is the generated SQL. Bad JSON / unknown cube / cyclic measures all raiseValueError.tests/test_cube.pycovering the happy path plus error modes.Phase E1+E2 — YAML pipeline (#d05a3efe)
core/wren/src/wren/context.pygainsload_cubes(), wires cubes intobuild_manifest()/build_json()(snake → camel), and extendsvalidate_project()with structural cube checks (name uniqueness,baseObjectexists, hierarchy levels reference declared dimensions). Deep validation (measure cycles) still happens Rust-side atanalyzetime.wren context initnow scaffolds an emptycubes/directory.Phase E3+E4+E6 —
wren cubeCLI (#25fcfa1c)core/wren/src/wren/cube_cli.pywith three commands:wren cube listwren cube describe <name>wren cube query— build a CubeQuery from CLI flags (--cube,--measures,--dimensions,--time-dimension name:granularity[:start,end],--filter dim:op[:value],--limit,--offset) or load it from JSON via--from <file|->.--sql-onlyprints the generated SQL without executing.app.add_typer(cube_app)incli.py.Phase E5 — Memory indexing (#3b73847c)
extract_schema_items()now emits records for cubes, measures, cube dimensions, and time dimensions so semantic search surfaces them alongside models and views.describe_schema()gains a cube section printing measures/dimensions/time dimensions/hierarchies.describe_schemaoutput.Test plan
cargo test --lib -p wren-core cube::— 33 cube unit tests pass (no regressions)cd core/wren-core-py && just test-py— 33 pass (8 new + 25 existing)cd core/wren && uv run pytest tests/unit/— 167 pass for cube-related modules (test_cube_cli, test_memory, test_context); a single unrelatedtest_validate_strict_warnscontinues to fail on this branch and onfeat/wasm-cubeHEAD, so the regression is pre-existing.cd core/wren && just lint— cleancd core/wren-core-py && cargo clippy -- -D warnings— clean🤖 Generated with Claude Code
Summary by CodeRabbit
wren cube list,describe, andqueryfor working with cubes in your project.