feat(wren-core): add cube_query_to_sql translator#2265
Conversation
Port the wren-engine-saas Python cube SQL translator (ibis-server app/mdl/cube.py) to Rust so the CLI (via PyO3) and the upcoming WASM build share one implementation. The new module accepts a CubeQuery (JSON/serde) and emits SELECT … GROUP BY SQL that references the cube's baseObject; wren-core MDL analysis then resolves the underlying model or view as usual. Includes 31 unit tests transcribed from saas tests/mdl/test_cube.py covering SELECT/FROM, time dimensions, derived-measure inlining, filters, validation errors, JSON deserialization, and quote_value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
core/wren-core/core/src/mdl/mod.rs (2)
15-15: 💤 Low valueConsider placing re-exports after module declarations.
The
pub use cube::{...}appears beforepub mod cube;(line 42). While Rust allows forward references, the conventional pattern is to declare modules before re-exporting from them for better readability.♻️ Suggested reordering
Move this line to after line 42 (the
pub mod cube;declaration) to follow the pattern used elsewhere in the file where re-exports follow their module declarations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren-core/core/src/mdl/mod.rs` at line 15, Move the re-export statement pub use cube::{cube_query_to_sql, CubeQuery}; so it appears after the module declaration pub mod cube; (i.e., place the pub use below the pub mod cube; line) to follow the file's convention of declaring modules before re-exporting their items.
42-42: ⚡ Quick winClarify intended API surface for the cube module.
The cube module exposes 8 public items (CubeQuery, TimeDimensionFilter, Granularity, CubeFilter, FilterOperator, FilterValue, cube_query_to_sql, quote_value), but only 2 are re-exported at the crate level (cube_query_to_sql and CubeQuery). This creates inconsistent API visibility.
Either:
- Make the module
pub(crate) mod cube;and rely solely on re-exports (aligning with the dataset pattern):-pub mod cube; +pub(crate) mod cube;
- Or accept that all 8 items are part of the public API and potentially add missing re-exports if needed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren-core/core/src/mdl/mod.rs` at line 42, The cube module currently exposes many public items but only re-exports CubeQuery and cube_query_to_sql at crate level; pick one approach: either change the module declaration to pub(crate) mod cube so only the intended crate-level API (CubeQuery and cube_query_to_sql) is visible, or keep pub mod cube and add explicit crate-level re-exports for the other public types/functions (pub use crate::mdl::cube::{TimeDimensionFilter, Granularity, CubeFilter, FilterOperator, FilterValue, quote_value, ...}) so the public API is consistent; update the mod declaration or add the missing pub use lines accordingly and ensure CubeQuery and cube_query_to_sql remain re-exported.core/wren-core/core/src/mdl/cube.rs (2)
207-254: ⚡ Quick winResolve only the measures the query needs.
resolve_measureswalks the entiremeasure_map(every measure defined on the cube). A single broken or cyclic derived measure that the current query does not reference will fail every query against the cube with a "possible cycle" error, even though the offending measure is unused. Restrict resolution to the transitive closure ofquery.measuresto make failures local to what the query actually requests.♻️ Sketch
fn resolve_measures( requested: &[String], measure_map: &HashMap<&str, &Measure>, ) -> Result<HashMap<String, String>> { // Seed `remaining` with the transitive closure of `requested` // by BFS over find_measure_refs, then run the existing fixpoint loop. }Then pass
&query.measuresfromcube_query_to_sql.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren-core/core/src/mdl/cube.rs` around lines 207 - 254, The current resolve_measures(measure_map: &HashMap<&str,&Measure>) traverses every measure and fails on unused cycles; change its signature to accept the requested measures (e.g. resolve_measures(requested: &[String], measure_map: &HashMap<&str,&Measure>) ) and compute the transitive closure of requested by BFS using find_measure_refs to collect only measures the query actually needs, seed remaining with that closure, then run the existing fixpoint substitution loop exactly as before on that restricted set; finally update the call site in cube_query_to_sql to pass &query.measures (or an appropriate slice of measure names) into resolve_measures.
256-272: 💤 Low valueRecompiling regex per measure on every call.
find_measure_refscompiles a freshRegexfor each candidate name on every invocation, and the function itself is called once per remaining measure in each iteration ofresolve_measures's fixpoint loop. For typical cube sizes this is negligible, but caching the compiled patterns (e.g. build aVec<(name, Regex)>once inresolve_measuresand reuse it) is a straightforward win if cubes grow.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren-core/core/src/mdl/cube.rs` around lines 256 - 272, find_measure_refs currently recompiles a Regex for each candidate name on every call; instead, precompile the patterns once in resolve_measures and reuse them: build a Vec<(name: &str, regex: Regex)> (or HashMap<&str, Regex>) in resolve_measures before the fixpoint loop, then change find_measure_refs (or add a helper) to accept the precompiled collection and match using the stored Regexes rather than calling Regex::new each time; update calls from resolve_measures to pass the precompiled patterns so regex compilation is done only once per measure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren-core/core/src/mdl/cube.rs`:
- Around line 222-236: The replacement string passed to Regex::replace_all in
the loop (using variables resolved_expr, sorted_deps, dep, replacement and re)
must be wrapped with regex::NoExpand to avoid `$`-style template expansion
corrupting SQL (e.g., `$1`, `$$tag$$`); change the call from
re.replace_all(&resolved_expr, replacement.as_str()) to using
re.replace_all(&resolved_expr, regex::NoExpand(replacement.as_str())) (adjust
imports if needed) so replacements are treated literally and then call
.into_owned() as before.
---
Nitpick comments:
In `@core/wren-core/core/src/mdl/cube.rs`:
- Around line 207-254: The current resolve_measures(measure_map:
&HashMap<&str,&Measure>) traverses every measure and fails on unused cycles;
change its signature to accept the requested measures (e.g.
resolve_measures(requested: &[String], measure_map: &HashMap<&str,&Measure>) )
and compute the transitive closure of requested by BFS using find_measure_refs
to collect only measures the query actually needs, seed remaining with that
closure, then run the existing fixpoint substitution loop exactly as before on
that restricted set; finally update the call site in cube_query_to_sql to pass
&query.measures (or an appropriate slice of measure names) into
resolve_measures.
- Around line 256-272: find_measure_refs currently recompiles a Regex for each
candidate name on every call; instead, precompile the patterns once in
resolve_measures and reuse them: build a Vec<(name: &str, regex: Regex)> (or
HashMap<&str, Regex>) in resolve_measures before the fixpoint loop, then change
find_measure_refs (or add a helper) to accept the precompiled collection and
match using the stored Regexes rather than calling Regex::new each time; update
calls from resolve_measures to pass the precompiled patterns so regex
compilation is done only once per measure.
In `@core/wren-core/core/src/mdl/mod.rs`:
- Line 15: Move the re-export statement pub use cube::{cube_query_to_sql,
CubeQuery}; so it appears after the module declaration pub mod cube; (i.e.,
place the pub use below the pub mod cube; line) to follow the file's convention
of declaring modules before re-exporting their items.
- Line 42: The cube module currently exposes many public items but only
re-exports CubeQuery and cube_query_to_sql at crate level; pick one approach:
either change the module declaration to pub(crate) mod cube so only the intended
crate-level API (CubeQuery and cube_query_to_sql) is visible, or keep pub mod
cube and add explicit crate-level re-exports for the other public
types/functions (pub use crate::mdl::cube::{TimeDimensionFilter, Granularity,
CubeFilter, FilterOperator, FilterValue, quote_value, ...}) so the public API is
consistent; update the mod declaration or add the missing pub use lines
accordingly and ensure CubeQuery and cube_query_to_sql remain re-exported.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 46d60ebd-85c7-4178-a332-b89c8b24e87c
📒 Files selected for processing (2)
core/wren-core/core/src/mdl/cube.rscore/wren-core/core/src/mdl/mod.rs
Wrap the Regex::replace_all replacement in regex::NoExpand so that strings like `$1` (Postgres parameter placeholders) and `$$tag$$` (dollar-quoted literals) survive measure inlining instead of being silently consumed as capture-group templates. Adds a regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
ibis-server/app/mdl/cube.py) to Rust so thewrenCLI (via PyO3) and the upcoming WASM build share one implementation.core/wren-core/core/src/mdl/cube.rsexposesCubeQuery,cube_query_to_sql(&CubeQuery, &Manifest) -> Result<String>, plus the supportingTimeDimensionFilter/Granularity/CubeFilter/FilterOperator/FilterValuetypes with camelCase / snake_case serde to match saas JSON.dateRangeJSON shape["start","end"]is accepted via a custom deserializer that rejects lists of length ≠ 2.Test plan
RUST_MIN_STACK=8388608 cargo test --lib --tests --bins— 115 tests pass (31 new + 84 prior)cargo clippy --all-targets --all-features -- -D warnings— cleancargo fmt --all -- --check— cleantests/mdl/test_cube.pycovering SELECT/FROM, time dimensions, derived-measure inlining, filters, validation errors, JSON deserialization, andquote_valueNotes
feat/wasm-cubeper the Phase C plan; Phase B (feat(wren-core): validate cubes in AnalyzedWrenMDL::analyze #2264) already lives on that branch.🤖 Generated with Claude Code
Summary by CodeRabbit