feat(wren-core-wasm): add registerCsv to load CSV data #2283
Mirror registerJson/registerParquet for CSV data. The Rust side accepts
bytes + JSON-encoded options; the TS wrapper accepts string | BufferSource
and serializes the options object for you.
Options (camelCase): header, delimiter, quote, escape, terminator,
batchSize, inferRows, schema. Schema is an optional `[{name, type, nullable?}]`
list that maps to Arrow types (case-insensitive); omit it to infer from the
first 1000 rows. Single-byte options reject non-ASCII inputs to avoid
silently slicing a multi-byte codepoint.
Adds 6 Rust wasm-bindgen tests, 1 native unit test, and 6 Node SDK tests
covering: inferred schema, string + Uint8Array inputs, custom
delimiter/quote, header=false + explicit schema, unknown type rejection,
non-ASCII delimiter rejection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
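
To make the single-byte rule concrete, here is a minimal TypeScript sketch of the option shape listed above and of the ASCII check. The names `CsvReadOptionsSketch`, `CsvColumnSketch`, and `asciiByte` are hypothetical and only illustrate the idea; the real `CsvReadOptions` type and the Rust-side validation in this PR may differ.

```typescript
// Hypothetical mirror of the options listed in the commit message.
// The SDK's actual `CsvReadOptions` type may be shaped differently.
interface CsvColumnSketch {
  name: string;
  type: string; // Arrow type name, matched case-insensitively
  nullable?: boolean;
}

interface CsvReadOptionsSketch {
  header?: boolean;
  delimiter?: string;
  quote?: string;
  escape?: string;
  terminator?: string;
  batchSize?: number;
  inferRows?: number;
  schema?: CsvColumnSketch[];
}

// Returns the byte value of a one-character ASCII option, or throws.
// A character such as "é" encodes to two UTF-8 bytes, so taking only its
// first byte would silently cut the codepoint in half.
function asciiByte(optionName: string, value: string): number {
  if (value.length !== 1 || value.charCodeAt(0) > 0x7f) {
    throw new Error(`${optionName} must be a single ASCII character`);
  }
  return value.charCodeAt(0);
}

// Example options for a tab-separated file with a header row.
const tsvOptions: CsvReadOptionsSketch = { header: true, delimiter: "\t" };
const delimiterByte = asciiByte("delimiter", tsvOptions.delimiter ?? ",");
```

Rejecting the input outright, rather than truncating it, keeps a mistyped delimiter from producing a table that parses but has the wrong columns.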
- examples/data/orders.csv: 15-row standard CSV with header
- examples/data/products.tsv: 8-row tab-separated file
- examples/data/region_targets.csv: 6-row headerless CSV
- examples/csv-quickstart.html: demonstrates the three patterns: inferred schema, custom delimiter (TSV), and explicit Arrow schema. Auto-runs an aggregation query against each table.
- serve.mjs: add MIME types for .csv / .tsv; advertise the new demo URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren-core-wasm/README.md`:
- Around line 195-198: The schema type list is missing the supported type
"date32"; update the documentation entry that enumerates Schema column types
(the line listing `date`/`date64`, `timestamp` etc.) to include `date32`
alongside the other date types so the README matches the SDK/Rust implementation
and avoids confusion for users defining CSV schemas.
📒 Files selected for processing (7)
- core/wren-core-wasm/AGENT_GUIDE.md
- core/wren-core-wasm/Cargo.toml
- core/wren-core-wasm/README.md
- core/wren-core-wasm/sdk/src/index.ts
- core/wren-core-wasm/sdk/src/wren_core_wasm.d.ts
- core/wren-core-wasm/sdk/tests/index.test.mjs
- core/wren-core-wasm/src/lib.rs
- csv-quickstart.html imports the raw `WrenEngine` from `pkg/`, like the other examples, so it must use the raw 3-arg `registerCsv(name, bytes, options_json)` signature. The previous version passed a string + object through the SDK-style overload, which crashed at the bindgen layer with 'undefined is not an object (evaluating arg.length)'. Switched to fetching bytes via fetch().arrayBuffer() and wrapping the raw call in a small registerCsv() helper.
- README: add `date32` to the supported CSV schema type list (CodeRabbit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
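
The helper described in this commit is not shown on the page. A rough TypeScript sketch of the same idea follows, assuming only the raw 3-arg signature quoted above. The `RawCsvEngine` interface and the `registerCsvFromBytes` name are hypothetical, and it is an assumption that the raw binding accepts `"{}"` when no options are given.

```typescript
// Hypothetical view of the raw bindgen export: name, bytes, JSON-encoded options.
// The return type is left as `unknown` because the page does not state it.
interface RawCsvEngine {
  registerCsv(tableName: string, data: Uint8Array, optionsJson: string): unknown;
}

// Wraps the raw call so callers can pass bytes plus a plain options object.
// Passing a string and an object straight to the raw binding is what caused
// the 'evaluating arg.length' crash described above.
async function registerCsvFromBytes(
  engine: RawCsvEngine,
  tableName: string,
  data: ArrayBuffer | Uint8Array,
  options: Record<string, unknown> = {},
): Promise<void> {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  // Assumption: the raw binding accepts "{}" when no options are set.
  await engine.registerCsv(tableName, bytes, JSON.stringify(options));
}

// Browser usage (not executed here; the URL is illustrative):
//   const buf = await (await fetch("data/orders.csv")).arrayBuffer();
//   await registerCsvFromBytes(engine, "orders", buf, { header: true });
```

Keeping the byte conversion and the `JSON.stringify` call in one place means each example page touches the raw signature only once.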
Summary

Mirror `registerJson`/`registerParquet` for CSV. WASM SDK callers can now load CSV tables in-browser without pre-converting to JSON or Parquet.

Implementation

- `core/wren-core-wasm/src/lib.rs`: `#[wasm_bindgen(js_name = registerCsv)] register_csv(&self, table_name, data: &[u8], options_json: &str)`. Options deserialize from JSON (`deny_unknown_fields`). Schema-from-columns mapper handles common Arrow types case-insensitively. Single-byte options reject non-ASCII to avoid slicing a UTF-8 codepoint.
- `sdk/src/index.ts`: `async registerCsv(name, data: string | BufferSource, options?: CsvReadOptions)`. JS-side serializes options + encodes strings → UTF-8 bytes.
- `csv` feature on the `arrow` crate.

Options (camelCase)

| Option | Type | Default |
| --- | --- | --- |
| `header` | boolean | `true` |
| `delimiter` | string (1 ASCII char) | `,` |
| `quote` | string (1 ASCII char) | `"` |
| `escape` | string (1 ASCII char) | |
| `terminator` | string (1 ASCII char) | `\n` / `\r\n` |
| `batchSize` | number | `8192` |
| `inferRows` | number | `1000` |
| `schema` | `[{name,type,nullable?}]` | |

Type names mapped to Arrow (case-insensitive): `int8`/`int16`/`int32`/`int64`, `uint*`, `float32`/`float64`, `boolean`, `string`/`utf8`/`varchar`/`text`, `date`/`date32`/`date64`, `timestamp`/`timestamp_{s,ms,us,ns}`.

Test plan

- `just build`: wasm-pack + dist builds clean (72.3 MB)
- `just test`: all 27 SDK tests pass (6 new for `registerCsv`)
- `just typecheck`: TypeScript types compile clean
- `cargo clippy --all-targets`: no warnings
- `cargo test --lib test_arrow_schema_from_columns_maps_aliases`: passes
- `wasm-pack test --node` (CI; 6 new `register_csv` wasm-bindgen tests)

PR target

Targets `feat/wasm-cube` so it ships alongside the rest of the cube work.

🤖 Generated with Claude Code
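
As an illustration of the case-insensitive type matching described in this PR, a caller could pre-check schema type names before handing them to `registerCsv`. The TypeScript sketch below is hypothetical (`SUPPORTED_CSV_TYPES` and `normalizeCsvType` are not part of the SDK). It expands `uint*` to the same widths as the signed integers, which is an assumption. It only checks membership, because the page does not say which Arrow type each alias resolves to.

```typescript
// Accepted names, taken from the list in the PR description.
// Assumption: `uint*` covers the same widths as the signed integers.
const SUPPORTED_CSV_TYPES: ReadonlySet<string> = new Set([
  "int8", "int16", "int32", "int64",
  "uint8", "uint16", "uint32", "uint64",
  "float32", "float64",
  "boolean",
  "string", "utf8", "varchar", "text",
  "date", "date32", "date64",
  "timestamp", "timestamp_s", "timestamp_ms", "timestamp_us", "timestamp_ns",
]);

// Lowercases the name and rejects anything outside the accepted list,
// similar in spirit to the "unknown type rejection" test case in this PR.
function normalizeCsvType(typeName: string): string {
  const key = typeName.toLowerCase();
  if (!SUPPORTED_CSV_TYPES.has(key)) {
    throw new Error(`unsupported CSV column type: ${typeName}`);
  }
  return key;
}

// Example: validate a schema for a headerless file before registering it.
const columns = [
  { name: "region", type: "VARCHAR" },
  { name: "target", type: "Float64", nullable: true },
];
const checkedColumns = columns.map((c) => ({ ...c, type: normalizeCsvType(c.type) }));
```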