feat(wren-core-wasm): add registerCsv to load CSV data#2283

Merged: goldmedal merged 3 commits into feat/wasm-cube from feat/wasm-register-csv on May 15, 2026

Conversation


@goldmedal goldmedal commented May 15, 2026

Summary

Mirror registerJson / registerParquet for CSV. WASM SDK callers can now load CSV tables in-browser without pre-converting to JSON or Parquet.

// Inferred schema, defaults (header=true, comma, double-quote)
await engine.registerCsv('orders', 'id,amount\n1,100\n2,200');

// Or with full control
await engine.registerCsv('metrics', csvBytes, {
  header: false,
  delimiter: ';',
  quote: "'",
  schema: [
    { name: 'id', type: 'int64' },
    { name: 'amount', type: 'float64' },
  ],
});

Implementation

  • Rust (core/wren-core-wasm/src/lib.rs): #[wasm_bindgen(js_name = registerCsv)] register_csv(&self, table_name, data: &[u8], options_json: &str). Options deserialize from JSON (deny_unknown_fields). Schema-from-columns mapper handles common Arrow types case-insensitively. Single-byte options reject non-ASCII to avoid slicing a UTF-8 codepoint.
  • TypeScript (sdk/src/index.ts): async registerCsv(name, data: string | BufferSource, options?: CsvReadOptions). JS-side serializes options + encodes strings → UTF-8 bytes.
  • Enabled csv feature on the arrow crate.
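
The TypeScript wrapper's job, as described above, is to turn a string or binary input into UTF-8 bytes and to serialize the options object to JSON before calling the Rust binding. A minimal sketch of that step follows. The helper names `toUtf8Bytes` and `prepareCsvArgs` are hypothetical and not taken from the SDK source, and the empty-object default for missing options is an assumption.

```typescript
// Hypothetical helpers: only the (name, bytes, options_json) shape of the
// Rust binding comes from the PR description; everything else is illustrative.
type CsvInput = string | ArrayBuffer | ArrayBufferView;

// Strings are encoded as UTF-8; binary inputs are viewed as bytes without copying.
function toUtf8Bytes(data: CsvInput): Uint8Array {
  if (typeof data === "string") {
    return new TextEncoder().encode(data);
  }
  if (ArrayBuffer.isView(data)) {
    // Respect the view's offset and length so subarrays are not widened.
    return new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  }
  return new Uint8Array(data);
}

// Assumption: an omitted options object is sent as "{}".
function prepareCsvArgs(
  data: CsvInput,
  options: Record<string, unknown> = {},
): { bytes: Uint8Array; optionsJson: string } {
  return { bytes: toUtf8Bytes(data), optionsJson: JSON.stringify(options) };
}
```

Keeping the encoding on the JS side means the Rust function only ever sees `&[u8]` and `&str`, which matches the signature listed above.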

Options (camelCase)

| Field | Type | Default |
|---|---|---|
| header | boolean | true |
| delimiter | string (1 ASCII char) | , |
| quote | string (1 ASCII char) | " |
| escape | string (1 ASCII char) | unset |
| terminator | string (1 ASCII char) | \n / \r\n |
| batchSize | number | 8192 |
| inferRows | number | 1000 |
| schema | [{name,type,nullable?}] | inferred |
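
For illustration, the options table can be read as a TypeScript shape plus a client-side pre-check for the four single-character fields. This is a sketch reconstructed from the table only: the real `CsvReadOptions` definition in the SDK may differ, and `checkSingleByteOptions` is a hypothetical helper. Per the description, the authoritative validation happens on the Rust side.

```typescript
// Reconstructed from the options table; the SDK's own definition may differ.
interface CsvColumn {
  name: string;
  type: string;
  nullable?: boolean;
}

interface CsvReadOptions {
  header?: boolean;
  delimiter?: string;
  quote?: string;
  escape?: string;
  terminator?: string;
  batchSize?: number;
  inferRows?: number;
  schema?: CsvColumn[];
}

const SINGLE_BYTE_FIELDS = ["delimiter", "quote", "escape", "terminator"] as const;

// Hypothetical pre-check: each single-character option must be exactly one
// ASCII character, so it fits in a single byte.
function checkSingleByteOptions(options: CsvReadOptions): void {
  for (const field of SINGLE_BYTE_FIELDS) {
    const value = options[field];
    if (value === undefined) continue;
    if (value.length !== 1 || value.charCodeAt(0) > 0x7f) {
      throw new Error(`${field} must be exactly one ASCII character`);
    }
  }
}
```

Checking early in JavaScript would only improve the error message; a non-ASCII delimiter is still rejected by the Rust side regardless.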

Type names mapped to Arrow (case-insensitive): int8/int16/int32/int64, uint*, float32/float64, boolean, string/utf8/varchar/text, date/date32/date64, timestamp/timestamp_{s,ms,us,ns}.
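
As a rough sketch, a caller could pre-check schema type names against the list above before calling `registerCsv`. The helper names are hypothetical, and expanding `uint*` to `uint8`/`uint16`/`uint32`/`uint64` is an assumption made by analogy with the signed integer names. The sketch only checks membership; it does not say which Arrow type each alias resolves to.

```typescript
// Names taken from the list in the PR description.
const SUPPORTED_CSV_TYPES: ReadonlySet<string> = new Set([
  "int8", "int16", "int32", "int64",
  "uint8", "uint16", "uint32", "uint64", // assumed expansion of `uint*`
  "float32", "float64",
  "boolean",
  "string", "utf8", "varchar", "text",
  "date", "date32", "date64",
  "timestamp", "timestamp_s", "timestamp_ms", "timestamp_us", "timestamp_ns",
]);

// Case-insensitive membership test, mirroring the documented matching rule.
function isSupportedCsvType(typeName: string): boolean {
  return SUPPORTED_CSV_TYPES.has(typeName.toLowerCase());
}

// Returns the names of columns whose declared type is not in the list.
function unsupportedColumns(schema: { name: string; type: string }[]): string[] {
  return schema.filter((col) => !isSupportedCsvType(col.type)).map((col) => col.name);
}
```

For example, `unsupportedColumns([{ name: 'id', type: 'INT64' }, { name: 'v', type: 'decimal' }])` would report only `v`, since `decimal` is not in the list.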

Test plan

  • just build — wasm-pack + dist builds clean (72.3 MB)
  • just test — all 27 SDK tests pass (6 new for registerCsv)
  • just typecheck — TypeScript types compile clean
  • cargo clippy --all-targets — no warnings
  • cargo test --lib test_arrow_schema_from_columns_maps_aliases — passes
  • wasm-pack test --node (CI; 6 new register_csv wasm-bindgen tests)

PR target

Targets feat/wasm-cube so it ships alongside the rest of the cube work.


🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • CSV data support: Register and query CSV data with customizable options including delimiters, header configuration, and explicit schema definition
    • New CSV quickstart example and updated documentation guide users through CSV registration and data querying

Mirror registerJson/registerParquet for CSV data. The Rust side accepts
bytes + JSON-encoded options; the TS wrapper accepts string | BufferSource
and serializes the options object for you.

Options (camelCase): header, delimiter, quote, escape, terminator,
batchSize, inferRows, schema. Schema is an optional `[{name, type, nullable?}]`
list that maps to Arrow types (case-insensitive); omit it to infer from the
first 1000 rows. Single-byte options reject non-ASCII inputs to avoid
silently slicing a multi-byte codepoint.

Adds 6 Rust wasm-bindgen tests, 1 native unit test, and 6 Node SDK tests
covering: inferred schema, string + Uint8Array inputs, custom
delimiter/quote, header=false + explicit schema, unknown type rejection,
non-ASCII delimiter rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 15, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@github-actions github-actions Bot added the documentation, dependencies, rust, core, and wasm labels May 15, 2026

- examples/data/orders.csv — 15-row standard CSV with header
- examples/data/products.tsv — 8-row tab-separated file
- examples/data/region_targets.csv — 6-row headerless CSV
- examples/csv-quickstart.html — demonstrates the three patterns:
  inferred schema, custom delimiter (TSV), and explicit Arrow schema.
  Auto-runs an aggregation query against each table.
- serve.mjs: add MIME types for .csv / .tsv; advertise the new demo URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren-core-wasm/README.md`:
- Around line 195-198: The schema type list is missing the supported type
"date32"; update the documentation entry that enumerates Schema column types
(the line listing `date`/`date64`, `timestamp` etc.) to include `date32`
alongside the other date types so the README matches the SDK/Rust implementation
and avoids confusion for users defining CSV schemas.

📥 Commits

Reviewing files that changed from the base of the PR and between 15a5950 and 56f725b.

📒 Files selected for processing (7)
  • core/wren-core-wasm/AGENT_GUIDE.md
  • core/wren-core-wasm/Cargo.toml
  • core/wren-core-wasm/README.md
  • core/wren-core-wasm/sdk/src/index.ts
  • core/wren-core-wasm/sdk/src/wren_core_wasm.d.ts
  • core/wren-core-wasm/sdk/tests/index.test.mjs
  • core/wren-core-wasm/src/lib.rs

Comment thread core/wren-core-wasm/README.md
- csv-quickstart.html imports the raw `WrenEngine` from `pkg/`, like the
  other examples, so it must use the raw 3-arg `registerCsv(name, bytes,
  options_json)` signature. The previous version passed a string + object
  through the SDK-style overload, which crashed at the bindgen layer with
  'undefined is not an object (evaluating arg.length)'. Switched to
  fetching bytes via fetch().arrayBuffer() and wrapping the raw call in a
  small registerCsv() helper.
- README: add `date32` to the supported CSV schema type list (CodeRabbit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
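
The fix described in this commit (fetch the file as bytes, then call the raw 3-argument binding with a JSON string) might look roughly like the sketch below. It is not the code from `csv-quickstart.html`: the `RawCsvEngine` interface, the `registerCsvFromUrl` name, the injected `fetchImpl` parameter, and the `"{}"` default for missing options are all assumptions made for illustration.

```typescript
// Assumed shape of the raw wasm-bindgen engine: (name, bytes, options_json).
interface RawCsvEngine {
  registerCsv(name: string, data: Uint8Array, optionsJson: string): unknown;
}

// Minimal subset of the Fetch API that the helper needs.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Hypothetical helper: downloads a CSV file and registers it via the raw binding.
async function registerCsvFromUrl(
  engine: RawCsvEngine,
  fetchImpl: FetchLike,
  name: string,
  url: string,
  options: Record<string, unknown> = {},
): Promise<void> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`failed to fetch ${url}: HTTP ${response.status}`);
  }
  // The raw binding takes bytes, not a string, and a JSON string, not an object.
  const bytes = new Uint8Array(await response.arrayBuffer());
  await engine.registerCsv(name, bytes, JSON.stringify(options));
}
```

In a browser page the global `fetch` would be passed as `fetchImpl`; taking it as a parameter just keeps the sketch testable without a network.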
@goldmedal goldmedal merged commit d189a79 into feat/wasm-cube May 15, 2026
3 of 4 checks passed
@goldmedal goldmedal deleted the feat/wasm-register-csv branch May 15, 2026 02:42