feat(wasm): full Cube support — validate, translate, PyO3, CLI, WASM, docs#2282
Conversation
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds Browser/WebAssembly SDK docs and examples: updates SDK overview, introduces a full wasm SDK guide (install, Quickstart modes, API reference for registerCsv/cubeQuery/listCubes), integration patterns, demos, troubleshooting, compatibility, and limitations. ChangesWASM SDK docs & examples
Estimated code review effort 🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
core/wren-mdl/mdl.schema.json (1)
224-265: 💤 Low valueVerbose but explicit enum covers both uppercase and lowercase data sources.
The enum explicitly lists both uppercase canonical form and lowercase aliases for all 24 data sources (48 total values). While verbose, this ensures exact validation and clear documentation of accepted values. The duplication is a tradeoff for explicitness.
If JSON Schema supported case-insensitive enums, a single list would suffice. Given that limitation, the current approach is acceptable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren-mdl/mdl.schema.json` around lines 224 - 265, The enum under the data source type currently duplicates each provider in both uppercase and lowercase; remove the lowercase entries and keep only the canonical uppercase names in the "enum" array, and add a single case-insensitive regex check to allow lowercase inputs by adding a "pattern" (e.g. "(?i)^(?:BIGQUERY|CLICKHOUSE|CANNER|TRINO|MSSQL|MYSQL|DORIS|POSTGRES|SNOWFLAKE|DATAFUSION|DUCKDB|LOCAL_FILE|S3_FILE|GCS_FILE|MINIO_FILE|ORACLE|ATHENA|REDSHIFT|DATABRICKS|SPARK)$") alongside the enum (using "anyOf" or "allOf") so validation accepts any case while keeping documentation and the canonical enum values intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren-core-wasm/examples/cube-explorer.html`:
- Around line 201-202: The code uses label.innerHTML to insert unescaped
cube/data-driven values (id, item.name, item.type) which risks XSS; replace the
innerHTML interpolation with safe DOM construction and textContent assignments:
create the checkbox input element with its id/value set via setAttribute or
element.id/value, append a text node for item.name and a span element for
item.type (setting span.textContent and style via style.*), then append those
nodes to container (keep using container.appendChild(label)). Apply the same
replacement pattern for the other occurrences that interpolate untrusted data
(the blocks at ~216-226, 295-297, 317, 386) so all dynamic values use
textContent/element properties instead of innerHTML.
In `@core/wren-core-wasm/examples/cube-quickstart.html`:
- Around line 50-55: The renderTable function and the nearby error/result
rendering inject unescaped HTML into the page; update renderTable to build DOM
elements or escape all cell values before concatenating (use
document.createElement for table/thead/tbody/tr/th/td and set textContent) and
change the error/result rendering (the code around the current error/result
innerHTML assignments) to assign user-provided strings via textContent or
escaped text rather than innerHTML so table cell values and error strings cannot
inject scripts.
In `@core/wren-core-wasm/README.md`:
- Line 106: Update the README table row for "CDN smoke test" to replace the term
"published wheel" with a more accurate npm/browser term such as "published
package" or "published bundle"; locate the table row containing "CDN smoke test
| http://localhost:8787/examples/test-cdn.html | Loading the published wheel
from unpkg" and change the trailing cell to read "Loading the published package
from unpkg" (or "published bundle") so the artifact terminology reflects the
JavaScript/npm distribution rather than a Python wheel.
In `@core/wren-core/core/src/mdl/cube.rs`:
- Around line 162-200: The validate_query function currently only checks that
referenced measures/dimensions exist but doesn't ensure the query projects
anything; if query.measures, query.dimensions, and query.time_dimensions are all
empty it will produce invalid SQL like "SELECT FROM ...". Update validate_query
(taking CubeQuery as input) to check that at least one of query.measures,
query.dimensions, or query.time_dimensions is non-empty and return a
plan_err!("Query must select at least one measure, dimension, or timeDimension
in cube '{}'", query.cube) (or similar) when all three are empty so invalid
SELECTs are rejected early.
- Around line 353-357: The date_range bounds (td_filter.date_range -> start/end)
are interpolated directly into SQL via col and where_parts, allowing
quote-breaking input; change this to avoid raw interpolation by either using
parameterized query placeholders or escaping the bounds before embedding:
replace the direct format!() uses that push "{col} >= '{start}'" / "{col} <
'{end}'" with code that binds start/end as SQL parameters or runs a safe escape
function on start/end (e.g., replace single quote with doubled single quote) so
the values cannot break the SQL string while still using the same
time_dim_map[col].expression and where_parts push locations.
- Around line 398-427: In filter_to_sql, add explicit validation for
operator/value shapes: for operators that require a scalar value
(FilterOperator::Eq, Neq, Gt, Gte, Lt, Lte) return plan_err! when f.value is
None instead of generating e.g. "col = NULL"; check the
Some(FilterValue::Scalar(_)) shape (or equivalent) before calling quote_value;
for FilterOperator::In and FilterOperator::NotIn ensure f.value is
Some(FilterValue::List(items)) and that items is non-empty, returning plan_err!
on missing or empty lists rather than emitting "IN ()"/"NOT IN ()"; keep using
quote_value when mapping item values and preserve existing match arms in
filter_to_sql to locate changes.
In `@docs/core/guides/modeling/cube.md`:
- Around line 117-121: The two statements conflict: the phrase "Only the
transitive closure of measures actually requested by the query is resolved"
should be clarified to distinguish runtime substitution behavior from schema
validation during MDL analysis; update the text to explicitly say that while
substitution and resolution at query time only walk the transitive closure of
requested measures, cycle detection for measures is performed earlier during MDL
analysis and will cause the cube to be rejected/validated as invalid even if the
cycle is not referenced by a particular query. Replace or augment the phrases
"transitive closure of measures actually requested by the query" and "cycle
validation happens during MDL analysis" with a single clear sentence that
describes both behaviors and their order (MDL validation first, query-time
resolution second).
---
Nitpick comments:
In `@core/wren-mdl/mdl.schema.json`:
- Around line 224-265: The enum under the data source type currently duplicates
each provider in both uppercase and lowercase; remove the lowercase entries and
keep only the canonical uppercase names in the "enum" array, and add a single
case-insensitive regex check to allow lowercase inputs by adding a "pattern"
(e.g.
"(?i)^(?:BIGQUERY|CLICKHOUSE|CANNER|TRINO|MSSQL|MYSQL|DORIS|POSTGRES|SNOWFLAKE|DATAFUSION|DUCKDB|LOCAL_FILE|S3_FILE|GCS_FILE|MINIO_FILE|ORACLE|ATHENA|REDSHIFT|DATABRICKS|SPARK)$")
alongside the enum (using "anyOf" or "allOf") so validation accepts any case
while keeping documentation and the canonical enum values intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 36bdc201-308a-49ea-ae3f-6aadfdf890ae
📒 Files selected for processing (36)
README.mdcore/wren-core-py/src/cube.rscore/wren-core-py/src/lib.rscore/wren-core-py/tests/test_cube.pycore/wren-core-wasm/AGENT_GUIDE.mdcore/wren-core-wasm/README.mdcore/wren-core-wasm/examples/cube-explorer.htmlcore/wren-core-wasm/examples/cube-quickstart.htmlcore/wren-core-wasm/examples/serve.mjscore/wren-core-wasm/sdk/src/index.tscore/wren-core-wasm/sdk/src/wren_core_wasm.d.tscore/wren-core-wasm/sdk/tests/index.test.mjscore/wren-core-wasm/src/lib.rscore/wren-core/core/src/mdl/cube.rscore/wren-core/core/src/mdl/lineage.rscore/wren-core/core/src/mdl/mod.rscore/wren-mdl/mdl.schema.jsoncore/wren/README.mdcore/wren/src/wren/cli.pycore/wren/src/wren/context.pycore/wren/src/wren/context_cli.pycore/wren/src/wren/cube_cli.pycore/wren/src/wren/memory/schema_indexer.pycore/wren/tests/unit/test_context.pycore/wren/tests/unit/test_cube_cli.pycore/wren/tests/unit/test_memory.pydocs/core/get_started/quickstart.mddocs/core/guides/modeling/cube.mddocs/core/guides/modeling/overview.mddocs/core/reference/cli.mdskills/index.jsonskills/versions.jsonskills/wren-generate-mdl/SKILL.mdskills/wren-usage/SKILL.mdskills/wren-usage/references/memory.mdskills/wren-usage/references/wren-sql.md
- Escape dateRange bounds via quote_string() to defeat SQL injection (wren-core/cube.rs). - Reject empty CubeQuery projections (no measures/dimensions/timeDimensions) that would emit invalid SELECT. - Reject invalid filter shapes per operator: scalar ops require a scalar value (not None, not list); in/not_in require a non-empty list. - Escape dynamic HTML in cube-quickstart.html and cube-explorer.html to prevent XSS in the WASM demo pages. - README: "published wheel" → "published package" (npm artifact, not Python). - docs/cube.md: clarify cycle detection runs at MDL analyze time even though runtime substitution walks only the requested measures' closure. Adds 6 new tests covering the validation paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren-core-wasm/src/lib.rs`:
- Around line 244-249: register_csv currently proceeds to create a MemTable from
the collected `batches` without validating they contain data; add the same
empty-batches guard used in `register_json` and `register_parquet` by checking
the collected `batches` (and/or that all batches have zero rows) after
`reader.collect::<Result<Vec<_>, _>>()` and return a `JsError::new("Empty CSV:
no batches found")` (or similar) before calling `MemTable::try_new` so empty
CSVs are rejected early; reference `register_csv`, the `batches` variable, and
`MemTable::try_new` when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7f2ec492-92d1-4d9c-bd51-60671d5b4aa4
⛔ Files ignored due to path filters (3)
core/wren-core-wasm/examples/data/orders.csvis excluded by!**/*.csvcore/wren-core-wasm/examples/data/products.tsvis excluded by!**/*.tsvcore/wren-core-wasm/examples/data/region_targets.csvis excluded by!**/*.csv
📒 Files selected for processing (9)
core/wren-core-wasm/AGENT_GUIDE.mdcore/wren-core-wasm/Cargo.tomlcore/wren-core-wasm/README.mdcore/wren-core-wasm/examples/csv-quickstart.htmlcore/wren-core-wasm/examples/serve.mjscore/wren-core-wasm/sdk/src/index.tscore/wren-core-wasm/sdk/src/wren_core_wasm.d.tscore/wren-core-wasm/sdk/tests/index.test.mjscore/wren-core-wasm/src/lib.rs
✅ Files skipped from review due to trivial changes (2)
- core/wren-core-wasm/Cargo.toml
- core/wren-core-wasm/AGENT_GUIDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
- core/wren-core-wasm/sdk/src/wren_core_wasm.d.ts
Mirror the empty-batches guard already in register_json (line 107) and register_parquet (line 145). A header-only CSV or a blank file would otherwise create a MemTable with zero batches, which leads to confusing downstream behaviour. Caught by CodeRabbit on PR #2282. Adds a Rust wasm-bindgen test plus a Node SDK test covering the header-only case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # docs/core/sdk/overview.md
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/core/sdk/wasm.md (1)
49-49: 💤 Low valueConsider using a version placeholder or latest.
The hardcoded version
@0.3.0will become outdated as new releases are published. Consider using@latestor a placeholder like@VERSION, and add a note directing users to check npm for the current version.📝 Suggested alternatives
Option 1 - Use latest:
- import { WrenEngine } from 'https://unpkg.com/@wrenai/wren-core-wasm@0.3.0/dist/index.js'; + import { WrenEngine } from 'https://unpkg.com/@wrenai/wren-core-wasm@latest/dist/index.js';Option 2 - Add a note:
import { WrenEngine } from 'https://unpkg.com/@wrenai/wren-core-wasm@0.3.0/dist/index.js'; + // Check https://www.npmjs.com/package/@wrenai/wren-core-wasm for the latest version </script>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/core/sdk/wasm.md` at line 49, The import line using a hardcoded version for WrenEngine (import { WrenEngine } from 'https://unpkg.com/@wrenai/wren-core-wasm@0.3.0/dist/index.js';) should be updated to use a version placeholder or `@latest` to avoid becoming outdated; change the module specifier to use '@latest' or a template like '@VERSION' and add a short note below explaining users should check npm (or the package changelog) for the current release and replace the placeholder accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/core/sdk/wasm.md`:
- Line 85: The MDL example includes an invalid field "metrics" which isn't
defined in mdl.schema.json (schema uses additionalProperties: false); remove the
"metrics: []" entry from the example so the object only contains allowed keys
(e.g., keep relationships and views but drop metrics) to prevent validation
errors.
---
Nitpick comments:
In `@docs/core/sdk/wasm.md`:
- Line 49: The import line using a hardcoded version for WrenEngine (import {
WrenEngine } from
'https://unpkg.com/@wrenai/wren-core-wasm@0.3.0/dist/index.js';) should be
updated to use a version placeholder or `@latest` to avoid becoming outdated;
change the module specifier to use '@latest' or a template like '@VERSION' and
add a short note below explaining users should check npm (or the package
changelog) for the current release and replace the placeholder accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 191b22a2-abaf-4cc0-8708-3f0364072250
📒 Files selected for processing (3)
docs/core/guides/modeling/cube.mddocs/core/sdk/overview.mddocs/core/sdk/wasm.md
✅ Files skipped from review due to trivial changes (1)
- docs/core/guides/modeling/cube.md
PR #2281 renamed the top-level `metrics` array to `cubes` and the JSON schema is `additionalProperties: false`, so a stray `metrics: []` fails schema validation (the Rust deserializer silently ignores unknown fields, which is why nothing broke at runtime). Sweep every MDL example in docs, examples/, README, AGENT_GUIDE, and test fixtures to match the current schema. CodeRabbit on PR #2282. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
This PR lands complete Cube support across the Wren Engine stack — from Rust core to Python CLI to WASM browser SDK — plus JSON Schema alignment and full user-facing documentation.
Phases included
validate_cubes()inAnalyzedWrenMDL::analyze— cycle detection for derived measures (self-ref + transitive), orphanedbaseObject, duplicate namescube_query_to_sql()inwren-core— resolves measures (transitive closure), time-dimension granularity (DATE_TRUNC/DATE_FORMAT), filters,ORDER BY,LIMITregex::NoExpandfor$in SQL, pre-compiled patterns map,pub(crate)module visibility#[pyfunction] cube_query_to_sqlbinding;wren cube list/describe/queryTyper CLI; LanceDB memory indexing for cubes/measures/dimensionsWrenEngine.cubeQuery()+WrenEngine.listCubes()+ TypeScript types; interactive HTML exampleswren-usageskill v2.3 (cube workflow + aggregation decision tree);docs/core/guides/modeling/cube.md; CLI reference section; README quick-startmdl.schema.jsonalignment:metrics→cubes,measure/cubeDimension/timeDimension$defs,layoutVersion, dual-casedataSourceenumKey capabilities shipped
wren cube list— table of cubes with measure/dimension countswren cube describe <name>— full measure/dimension/time-dimension detailswren cube query --cube … --measures … --dimensions … --time-dimension … --filter …— structured pre-aggregation query with granularity + filters, no manualGROUP BYcubeQuery()/listCubes()— WASM browser API for the same functionalityload_mdltimewren memory indexnow includes cubes for semantic schema retrievalBreaking changes
mdl.schema.json:metricsarray removed; replaced bycubeswithmeasures,dimensions,timeDimensions,hierarchies. Consumers validating against the schema must migrate.dataSourceenum now accepts both UPPERCASE and lowercase aliases (backward-compatible in Rust via serde aliases; schema now reflects both).Test plan
cargo test --lib --tests --binsincore/wren-core— 9 cube validation tests + 33 cube translator tests passjust test-pyincore/wren-core-py— PyO3 binding smoke tests passjust testincore/wren— unit tests including cube CLI + context validation passwren cube list / describe / queryagainst a real DuckDB projectcore/wren-core-wasm/examples/cube-quickstart.htmlandcube-explorer.htmlin browserwren context validateRelated PRs (merged into
feat/wasm-cube)#2264 · #2265 · #2276 · #2277 · #2278 · #2280 · #2281
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests
Chores