feat: add INLINE + ARROW_STREAM format support for analytics plugin by jamesbroadhead · Pull Request #256 · databricks/appkit

jamesbroadhead · 2026-04-07T15:57:16Z

Summary

Serverless warehouses return ARROW_STREAM + INLINE results as base64 Arrow IPC in result.attachment rather than result.data_array. The previous code path discarded inline data for any ARROW_STREAM response (designed for EXTERNAL_LINKS), so these warehouses silently returned empty results.

This PR makes the analytics plugin work across classic and serverless warehouses by handling both dispositions for ARROW_STREAM, decoding inline Arrow IPC attachments server-side, and falling back to JSON_ARRAY when a warehouse rejects ARROW_STREAM + INLINE.

Changes

Format model collapsed to API enum names: two formats — JSON_ARRAY and ARROW_STREAM — replacing the prior JSON / ARROW / ARROW_STREAM triple. ARROW_STREAM now supports both INLINE and EXTERNAL_LINKS dispositions.
- Note: these strings (JSON_ARRAY, ARROW_STREAM, INLINE, EXTERNAL_LINKS) are the verbatim values defined by the Statement Execution API — we no longer translate to/from local aliases. Confirmed against the Databricks SDK Go source: Format (JSON_ARRAY / ARROW_STREAM / CSV) and Disposition (INLINE / EXTERNAL_LINKS). See also the API reference.
Inline Arrow IPC decoding: _transformDataArray detects result.attachment and decodes via apache-arrow's tableFromIPC, producing the same row-object shape as JSON_ARRAY regardless of warehouse backend. apache-arrow@21.1.0 added as a server dependency.
Format fallback: ARROW_STREAM + INLINE requests automatically fall back to JSON_ARRAY if a classic warehouse rejects them. Explicit format requests are respected without fallback.
Default remains JSON_ARRAY for compatibility.
UI hooks updated to use json_array / arrow_stream naming in useAnalyticsQuery, useChartData, and chart types.

Tests

Added ~147 unit tests covering coverage gaps surfaced during the rework:

connectors/genie/tests/client.test.ts (new, 786 lines)
connectors/sql-warehouse/tests/client.test.ts (new, 286 lines) — uses real base64 Arrow IPC captured from a live serverless warehouse
context/tests/service-context.test.ts (new, 457 lines)
plugins/files/tests/upload-and-write.test.ts (new, 1245 lines)
stream/tests/stream-registry.test.ts (new, 582 lines)
plugins/analytics/tests/analytics.test.ts (+217)

Coverage deltas: service-context 7%→100%, stream-registry 32%→100%, genie connector 61%→~~97%, files plugin 69%→~~89%.

Test plan

Deploy an app against a serverless warehouse that returns ARROW_STREAM + INLINE attachments — verify useAnalyticsQuery returns rows
Deploy against a classic warehouse — verify JSON_ARRAY default works and ARROW_STREAM + EXTERNAL_LINKS still works
Verify automatic fallback when a warehouse rejects ARROW_STREAM + INLINE

Fixes #242

This pull request was AI-assisted by Isaac.

MarioCadenas · 2026-04-09T09:00:29Z


 /** Supported data formats for analytics queries */
-export type DataFormat = "json" | "arrow" | "auto";
+export type DataFormat = "json" | "arrow" | "arrow_stream" | "auto";


in theory arrow is the same as arrow_stream, so I'm not following what's the problem?

MarioCadenas · 2026-04-09T09:02:51Z

+  /** Format configurations in fallback order. */
+  private static readonly FORMAT_CONFIGS = {
+    ARROW_STREAM: {
+      formatParameters: { disposition: "INLINE", format: "ARROW_STREAM" },


from this URL
https://docs.databricks.com/api/workspace/statementexecution/executestatement#format

Important: The formats ARROW_STREAM and CSV are supported only with EXTERNAL_LINKS disposition. JSON_ARRAY is supported in INLINE and EXTERNAL_LINKS disposition.

so before changing anything this was already supporting arrow, can I know what's the case where this was failing? I would like to see it

MarioCadenas · 2026-04-09T09:15:36Z

Seeing that there's a case of Arrow + inline, let's refactor what we had instead of introducing a new format. Let's change the format "ARROW" to "ARROW_STREAM" and allow it to use both "EXTERNAL_LINKS" and "INLINE". Then for now let's keep JSON + inline as the default. This might require some UI hooks changes too

jamesbroadhead · 2026-04-16T13:16:35Z

Hi Mario — thanks for the review, you were right on both counts. I've rebased and reworked the PR:

What's happening: Serverless warehouses return format: ARROW_STREAM with disposition: INLINE, putting base64 Arrow IPC data in result.attachment instead of data_array. The docs say ARROW_STREAM only works with EXTERNAL_LINKS, but serverless violates that — and the existing code silently returned empty results for these responses.

What I changed (per your suggestion):

Collapsed the three formats (JSON, ARROW, ARROW_STREAM) into two that match the API enums: JSON_ARRAY and ARROW_STREAM
ARROW_STREAM now supports both INLINE (with attachment decoding) and EXTERNAL_LINKS dispositions
Default remains JSON_ARRAY as you requested
Added format fallback: if ARROW_STREAM+INLINE is rejected by a classic warehouse, it falls back to JSON_ARRAY automatically
Updated the UI hooks to use json_array / arrow_stream naming

Also added 147 new unit tests covering major coverage gaps (service-context 7%→100%, stream-registry 32%→100%, genie connector 61%→97%, files plugin 69%→89%). All 1711 tests pass.

Cleaned up the branch — it's now TS-only, no unrelated changes.

Some serverless warehouses only support ARROW_STREAM with INLINE disposition, but the analytics plugin only offered JSON_ARRAY (INLINE) and ARROW_STREAM (EXTERNAL_LINKS). This adds a new "ARROW_STREAM" format option that uses INLINE disposition, making the plugin compatible with these warehouses. Fixes #242

Tests verify: - ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format - ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format - Default JSON format does not pass disposition or format overrides

The server-side ARROW_STREAM format added in the previous commit was not exposed to the frontend or typegen: - Add "ARROW_STREAM" to AnalyticsFormat in appkit-ui hooks - Add "arrow_stream" to DataFormat in chart types - Handle "arrow_stream" in useChartData's resolveFormat() - Make typegen resilient to ARROW_STREAM-only warehouses by retrying DESCRIBE QUERY without format when JSON_ARRAY is rejected Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

…compatibility ARROW_STREAM with INLINE disposition is the only format that works across all warehouse types, including serverless warehouses that reject JSON_ARRAY. Change the default from JSON to ARROW_STREAM throughout: - Server: defaults.ts, analytics plugin request handler - Client: useAnalyticsQuery, UseAnalyticsQueryOptions, useChartData - Tests: update assertions for new default JSON and ARROW formats remain available via explicit format parameter. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

When using the default ARROW_STREAM format, the analytics plugin now automatically falls back through formats if the warehouse rejects one: ARROW_STREAM → JSON → ARROW. This handles warehouses that only support a subset of format/disposition combinations without requiring users to know their warehouse's capabilities. Explicit format requests (JSON, ARROW) are respected without fallback. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

Previously, _transformDataArray unconditionally called updateWithArrowStatus for any ARROW_STREAM response, which discards inline data and returns only statement_id + status. This was designed for EXTERNAL_LINKS (where data is fetched separately) but broke INLINE disposition where data is in data_array. Changes: - _transformDataArray now checks for data_array before routing to the EXTERNAL_LINKS path: if data_array is present, it falls through to the standard row-to-object transform. - JSON format now explicitly sends JSON_ARRAY + INLINE rather than relying on connector defaults. This prevents the connector default format from leaking into explicit JSON requests. - Connector defaults reverted to JSON_ARRAY for backward compatibility with classic warehouses (the analytics plugin sets formats explicitly). - Added connector-level tests for _transformDataArray covering ARROW_STREAM + INLINE, ARROW_STREAM + EXTERNAL_LINKS, and JSON_ARRAY paths. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

Some serverless warehouses return ARROW_STREAM + INLINE results as base64 Arrow IPC in `result.attachment` rather than `result.data_array`. This adds server-side decoding using apache-arrow's tableFromIPC to convert the attachment into row objects, producing the same response shape as JSON_ARRAY regardless of warehouse backend. This abstracts a Databricks internal implementation detail (different warehouses returning different response formats) so app developers get a consistent `type: "result"` response with named row objects. Changes: - Add apache-arrow@21.1.0 as a server dependency (already used client-side) - _transformDataArray detects `attachment` field and decodes via tableFromIPC - Connector tests use real base64 Arrow IPC captured from a live serverless warehouse, covering: classic JSON_ARRAY, classic EXTERNAL_LINKS, serverless INLINE attachment, data_array fallback, and edge cases Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

…ctor, files plugin New test files covering major coverage gaps: - context/tests/service-context.test.ts (35 tests, 7% -> 100%) - stream/tests/stream-registry.test.ts (34 tests, 32% -> 100%) - connectors/genie/tests/client.test.ts (28 tests, 61% -> 97%) - plugins/files/tests/upload-and-write.test.ts (50 tests, 69% -> 89%) Total: 1566 -> 1713 tests, all passing. Co-authored-by: Isaac

…format model Address Mario's review: collapse three formats (JSON, ARROW, ARROW_STREAM) into two that match the Databricks API enums. ARROW_STREAM supports both INLINE and EXTERNAL_LINKS dispositions with automatic fallback. Default remains JSON_ARRAY per reviewer request. Co-authored-by: Isaac

Fixes issues found by parallel review across Claude harsh-reviewer, GPT 5.4 xhigh, and Gemini 3.1 Pro on PR #256. - Validate `format` at the route boundary (reject anything other than JSON_ARRAY / ARROW_STREAM with 400) so legacy `"JSON"` / `"ARROW"` callers fail loudly instead of silently falling through to ARROW_STREAM. - Disable cache for ARROW_STREAM queries — EXTERNAL_LINKS pre-signed URLs expire much sooner than the default 1h TTL, so caching the statement_id hands out dead URLs on cache hits. - Re-check `signal.aborted` before issuing the EXTERNAL_LINKS fallback — previously an aborted query would still bill a second statement that the client never reads. - Replace fragile `msg.includes("ARROW_STREAM") || msg.includes("INLINE")` format-error detection with `_isInlineArrowUnsupported()` that requires both keywords plus a marker phrase, or a structured `error_code` of INVALID_PARAMETER_VALUE / NOT_IMPLEMENTED. Reduces false-positives on legitimate SQL/permission errors that mention those words. - Normalize BigInt values from `row.toJSON()` when decoding inline Arrow attachments — `apache-arrow` returns BigInt for INT64/DECIMAL columns, which JSON.stringify rejects when the SSE writer serializes the response. - Cap inline Arrow attachment decode at 64 MiB and wrap decode in try/catch so malformed payloads surface a typed ExecutionError instead of a generic 500. - Detect external_links explicitly in `_transformDataArray` rather than treating any "no attachment + no data_array" response as EXTERNAL_LINKS. Previously an empty INLINE response would return a phantom statement_id the client could not fetch. - Remove the typegen DESCRIBE QUERY fallback that retried without `format` and got ARROW_STREAM responses with no `data_array`, silently degrading generated types to `unknown`. The surrounding allSettled already converts thrown errors to `unknown` types via `generateUnknownResultQuery`. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

useAnalyticsQuery({ format: "ARROW_STREAM" }) is typed as TypedArrowTable, but the previous implementation decoded inline IPC attachments to row objects on the server and emitted them as { type: "result", data: rows }. Same requested format produced a TypedArrowTable for EXTERNAL_LINKS warehouses and a plain Row[] for INLINE/serverless warehouses, breaking direct callers that invoked Arrow Table methods. This commit makes ARROW_STREAM always deliver an Arrow Table to the client: - SQL connector forwards the base64 IPC attachment through unchanged (with a 64 MiB size cap) instead of decoding it server-side. - Analytics route emits a new SSE message { type: "arrow_inline", attachment: <base64> } when ARROW_STREAM + INLINE returns an attachment. - useAnalyticsQuery decodes the base64 payload via the existing ArrowClient.processArrowBuffer pipeline, producing a Table with the same runtime shape as the EXTERNAL_LINKS path. - Drops apache-arrow from the appkit (server) dependencies — decoding now lives only in appkit-ui where Arrow was already loaded. Tests: - Updated client.test.ts to assert the connector preserves the attachment. - Added a size-cap test for oversized inline attachments. - Added analytics.test.ts coverage for: arrow_inline SSE emission, format validation rejecting unknown values with 400, and aborted-signal short-circuit before fallback retry. - Added use-analytics-query.test.ts (new) covering arrow_inline decoding, decode-failure error path, and JSON_ARRAY result handling. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

upload-and-write.test.ts (added in 7164d3b) covers FilesPlugin internals (_handleApiError mapping, upload stream size enforcement, cache invalidation, etc.) — entirely separate from this PR's analytics/Arrow scope. The file ships with 9 failing tests against the current files plugin (assertion mismatches on HTTP status codes, Readable.toWeb mock type errors) that block CI here. Removing it from this PR. The coverage push will come back as its own PR where the failures can be properly debugged. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

Some serverless warehouses reject JSON_ARRAY + INLINE for DESCRIBE QUERY and return ARROW_STREAM by default. The previous behavior just removed the broken fallback, which meant typegen produced `unknown` types for those warehouses' queries. This restores the fallback (retry without explicit format if JSON_ARRAY is rejected) and teaches `convertToQueryType` to decode an inline base64 Arrow IPC attachment when `data_array` is empty. The DESCRIBE QUERY result is itself a table with rows shaped (col_name, data_type, comment), so the decode reads `table.toArray().map(r => r.toJSON())` rather than `table.schema.fields` — reading the schema would yield bogus types (every query would come out shaped like the metadata columns). Re-adds apache-arrow as an appkit dependency (only the typegen uses it; the runtime SDK does not). Tests cover: schema extraction from data rows, lowercase type normalization, data_array taking precedence when both are present, and graceful degradation on malformed attachments. Supersedes #316. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

Tightens correctness gaps surfaced by the second ACE review pass (GPT 5.4 + Gemini 3.1 Pro + Claude harsh-reviewer): - _isInlineArrowUnsupported now requires both INLINE and ARROW_STREAM in the message regardless of whether a structured error_code is present. The previous shape would fall back on any INVALID_PARAMETER_VALUE that mentioned just one keyword. - useAnalyticsQuery validates that arrow_inline messages carry a non-empty string attachment before invoking atob, so a malformed payload surfaces a clear error instead of a confusing decode crash. - ARROW_STREAM cache TTL is capped at 600s rather than disabling the cache outright. Pre-signed EXTERNAL_LINKS URLs typically expire in ~15min, so 10min is a safe upper bound that still preserves caching for INLINE attachment responses. - Type generator's JSON_ARRAY-rejection retry now (a) requires both JSON_ARRAY and a marker phrase before retrying, mirroring the analytics-plugin tightening, and (b) explicitly requests ARROW_STREAM + INLINE on the retry rather than letting the warehouse default kick in. Previously the warehouse default could return EXTERNAL_LINKS, in which case neither data_array nor attachment were populated and types silently degraded to `unknown`. - _validateArrowAttachment strips whitespace and accounts for `=` padding so the size check is exact rather than an upper bound. Also uses `Math.floor` so the cap matches the actual decoded byte count. Tests: - New: structured error_code path triggers fallback when both keywords are present. - New: regression — error mentioning only one of INLINE / ARROW_STREAM must not escalate to EXTERNAL_LINKS even if the retry interceptor attempts the query multiple times. - New: hook rejects arrow_inline with attachment values that are undefined / null / empty / non-string / object, never invoking atob or processArrowBuffer on bad input. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

The hook is typed `TypedArrowTable` for ARROW_STREAM, but empty results (zero rows / no inline attachment / no external_links) previously fell through to `{ type: "result", ...result }` with no `data` field. Direct useAnalyticsQuery callers expecting a Table would crash on `.numRows` or `.getChild()`. This commit makes ARROW_STREAM truly contract-consistent: when the response has a manifest schema but no row payload, the connector synthesizes a zero-row Arrow IPC stream from the schema and stashes it in `result.attachment`. The existing arrow_inline path emits it to the client, which decodes it as a real Arrow Table — same shape as a non-empty result, just with `numRows === 0` and empty vectors per column. A new module `connectors/sql-warehouse/arrow-schema.ts` parses Databricks SQL `type_text` values into Apache Arrow `DataType`s. It is fully exhaustive: - Scalars (STRING, BIGINT, DECIMAL, TIMESTAMP, BOOLEAN, …) - Parameterized: DECIMAL(p,s), VARCHAR(n), CHAR(n) - Nested: ARRAY<T>, MAP<K,V>, STRUCT<f1:T1, f2:T2 COMMENT '…', …> - INTERVAL year-month and day-time variants - Backtick-quoted struct field names (incl. `` `` `` escapes) - Recursive nesting (verified up to MAP<STRING,ARRAY<STRUCT<…>>>) - NOT NULL annotations on struct fields Unknown / unparseable types degrade to Utf8 rather than throwing, so future Databricks types don't crash empty-result handling. Tests: 72 cases covering every scalar, parameterization edge case, deeply nested combinations, struct field annotations / comments / backtick escapes, INTERVAL variants, and IPC round-trip. 7 connector tests covering attachment synthesis + the cases where it must NOT synthesize (external_links present, schema absent). Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

The previous SSE event-size limits (1 MiB on both server and client) were below the connector's 64 MiB cap on inline Arrow IPC attachments. Anything between 1 MiB and 64 MiB would pass the connector check but get rejected at the SSE layer with a confusing "Buffer size exceeded" error. The 64 MiB cap was effectively unreachable. This commit aligns all three caps at 8 MiB: - `streamDefaults.maxEventSize` (server SSE event cap) - `connectSSE.maxBufferSize` (client SSE buffer cap) - `MAX_INLINE_ATTACHMENT_BYTES` (connector inline attachment cap) 8 MiB is well above what analytics queries return in practice (most are well under 1 MiB), gives ~3x headroom for compact ARROW_STREAM payloads, and stays below the Databricks Statement Execution API hard limit on INLINE disposition (~25 MiB). Anything larger than 8 MiB now fails fast at the connector with a typed `ExecutionError`, and ARROW_STREAM falls back to EXTERNAL_LINKS which has no size pressure. A follow-up draft PR explores stash-and-serve via /arrow-result/:jobId as a stronger architectural solution that removes the SSE size cap entirely. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

DRAFT proposal. Replaces the `arrow_inline` SSE message type from #256 with an out-of-band delivery mechanism. Motivation: remove the SSE event-size cap as a constraint on inline Arrow result size. Architecture ------------ ARROW_STREAM responses are now delivered uniformly: 1. The connector returns the base64 Arrow IPC attachment in `result.attachment` for INLINE responses (unchanged from #256). 2. The analytics route base64-decodes it once, stashes the resulting buffer in `InlineArrowStash` (bounded LRU + TTL), and emits the same `{ type: "arrow", statement_id }` SSE message that EXTERNAL_LINKS already uses — with a synthetic id prefixed `inline-`. 3. The client (unchanged from main) calls `/arrow-result/:jobId` for any `type: "arrow"` message. 4. The route handler checks the stash first; if the id has the `inline-` prefix and is still present, it serves the bytes from memory. Otherwise it falls through to the existing warehouse fetch path for real EXTERNAL_LINKS statement_ids. The `arrow_inline` SSE message type, the client-side base64 decode helper, and the SSE buffer-size bumps from #256 all go away. Reasoning --------- SSE was designed for streams of small control messages. Pushing multi-MB Arrow IPC bytes through it has two unavoidable costs: - single-event memory ceiling on both server and client, - proxy/load-balancer compatibility issues for large events. Bumping the SSE buffer (5b in the design discussion) raises both caps but doesn't fix the architectural mismatch. This PR moves bulk bytes to HTTP, where they belong: - HTTP/2 streaming, gzip, and browser background-fetch all work naturally. - SSE buffer can stay at the conservative 1 MiB default. - The wire protocol unifies INLINE and EXTERNAL_LINKS — the client has a single code path for ARROW_STREAM data. - Inline result size cap goes back up to the Databricks API limit (25 MiB) instead of being constrained by SSE. Tradeoffs --------- - Server holds IPC buffers in memory until the client fetches them (one-shot `take()` removes on read; passive TTL evicts otherwise). For 100 entries × 25 MiB worst case, that's 2.5 GiB peak — but steady state is much smaller because entries are removed as soon as the client fetches. - Single-process only. A multi-server deployment would need a shared store (Redis or equivalent) keyed on the synthetic id. The stash interface is small enough to swap implementations. - One-shot reads mean the client cannot retry a failed fetch. Acceptable for our use case (warehouse query returned data, the hook either succeeds in fetching or surfaces a generic error). Status ------ DRAFT: opened to discuss the architectural direction before landing. #256 ships with the simpler 5b SSE-buffer-bump approach (8 MiB) which is sufficient for the realistic analytics range. This proposal is the cleaner long-term solution if the team wants to invest in the server-side state. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

#2 — replace Object.defineProperty(signal, "aborted", ...) with vi.spyOn(signal, "aborted", "get").mockReturnValue(true). The native AbortSignal.aborted is a non-configurable getter; the previous form worked under vitest's Node env but could throw "Cannot redefine property" on stricter engines. The getter spy is vitest's idiomatic way to mock non-configurable accessors and works regardless of configurability. #3 — defensive client-side size cap on arrow_inline attachments (8 MiB), mirroring the connector cap. Prevents a misconfigured proxy or future server bug from pushing the browser into allocating an unbounded Uint8Array. Both caps carry a comment pointing to PR #320 which removes the arrow_inline SSE path entirely; that PR can drop the client cap and raise the connector cap to the API's 25 MiB hard limit when it lands. Tests: 1 new — hook rejects oversized arrow_inline payloads before decodeBase64 or processArrowBuffer is called. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

The fallback decision in analytics.ts previously depended on substring- matching warehouse error messages, which is brittle against any change in error wording from the SDK or warehouse. The SDK already exposes a structured ApiError with `errorCode`, and the warehouse's `status.error.error_code` is a typed enum on the SDK's ServiceError. Both signals were being thrown away when we wrapped them into ExecutionError. This commit propagates the code through: - ExecutionError gains a `readonly errorCode?: string` field and `statementFailed(message, errorCode?)` accepts it. - The connector forwards `status.error?.error_code` at both Path B sites (initial SUCCEEDED/FAILED switch and the polling loop). - The connector's catch-all wrap detects an SDK ApiError-shaped object via duck-typing on `errorCode` and forwards it through Path A too. - _isInlineArrowUnsupported now checks `err instanceof ExecutionError && err.errorCode === "INVALID_PARAMETER_VALUE"` as the primary signal. The substring backstop remains for legacy SDK builds that don't surface structured codes, but it's secondary. Net effect: format-rejection detection becomes a structured-code check that's stable across error-message wording changes. Substring matching shrinks to a backstop role. Tests: new — assert the route falls back correctly when the connector hands up a structured ExecutionError, not just a substring-matchable message. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

Replaces the loosely-typed Promise<{ type: string; [key: string]: any }> return type with a Zod-validated discriminated union shared between the server and the client. The schema lives in `shared/src/sse/analytics.ts` and is the single source of truth for what messages can flow over /api/analytics/query: - result — JSON_ARRAY rows (or empty results) - arrow — ARROW_STREAM via /arrow-result/:jobId - arrow_inline — ARROW_STREAM with base64 IPC bytes inline Server side: - _executeWithFormatFallback now returns AnalyticsSseMessage. - makeResultMessage / makeArrowMessage / makeArrowInlineMessage builders enforce required fields at construction time. A typo on a key fails to compile. Client side (useAnalyticsQuery): - Validates the parsed JSON via AnalyticsSseMessage.safeParse before dispatching to a handler. Type narrows correctly across branches — no more `parsed.attachment as string` casts. - A malformed payload (server bug or version skew) surfaces a clear error instead of allocating a buffer from `undefined` or treating a missing field as data. Adds zod 3.23.8 to `shared`. CLAUDE.md already lists zod as a key dependency for runtime validation, but the codebase didn't actually use it anywhere — this lands the first instance. Tests: - 13 new schema tests in shared (each variant accept/reject, builder round-trips). - Existing analytics route + hook tests pass unchanged because the schema accepts the message shapes the route was already producing. - One hook test was rewritten: bad-attachment cases now hit the schema validator (which rejects non-string / empty / null) and surface the same generic error as before. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

The "logs a warning and yields no columns on malformed attachment" test name promised a logger.warn assertion but only checked hasResults === false. A regression that silently produced empty types (no telemetry signal) would still pass. Mock the logger module via vi.hoisted + vi.mock and assert: - logger.warn fires with the expected message, - the generated type is the unknown-result fallback (contains "unknown"), - DESCRIBE QUERY metadata column names ("col_name", "data_type") don't leak into the user-facing type — that would mean the parser swallowed the failure and produced bogus columns from misinterpreted IPC bytes instead of degrading cleanly. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

Two CI failures in the previous run, both behind-the-scenes regressions from recent commits: 1. Playground integration tests timed out because chart canvases never rendered. Root cause: pnpm resolved zod 4.1.13 (rather than the 3.23.8 specified in package.json — peer/dedup behavior), and zod 4 changed `z.record()` to require both key and value type arguments. The single-arg form `z.record(z.unknown())` from the AnalyticsResultMessage schema crashed at runtime when validating any payload with a `data` array, the safeParse returned success:false, the hook treated every mock SSE response as malformed, and the chart loading state never cleared. Fix: switch to the two-arg form `z.record(z.string(), z.unknown())` which is valid in both zod 3 and zod 4. 2. Docs build sync check failed because Class.ExecutionError.md was stale relative to the new `errorCode` field added in 3d32491. Regenerated via `pnpm docs:build`. No new tests needed — the existing analytics route + hook tests now exercise the corrected schema, and the regression would have been caught earlier had the schema test been driven through the SDK's JSON.stringify → JSON.parse roundtrip the way the integration test does. Worth a follow-up to add an integration-style schema test that asserts a sample mock payload validates successfully. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>

jamesbroadhead · 2026-04-29T15:18:56Z

Hi — Claude here, on James's behalf.

Closing in favour of a 3-PR stack carved out of this branch. Same end-state, easier to review:

test: backfill coverage for genie connector, service context, stream registry #327 — test: backfill coverage for genie connector, service context, stream registry (pure test additions, ~1825 lines)
refactor: rename AnalyticsFormat to API enum names (JSON_ARRAY, ARROW_STREAM) #328 — refactor: rename AnalyticsFormat to API enum names (mechanical JSON/ARROW → JSON_ARRAY/ARROW_STREAM)
feat: decode inline Arrow IPC + warehouse-compat fallback #329 — feat: decode inline Arrow IPC + warehouse-compat fallback (the actual fix — arrow-schema.ts, inline IPC decode, format fallback, zod-validated SSE wire protocol)

Merge order: #327 → #328 → #329. Verified the sum of the three branches matches the PR-touched files on this branch exactly (0 diffs across all 30 files).

jamesbroadhead requested a review from MarioCadenas April 7, 2026 16:51

MarioCadenas reviewed Apr 9, 2026

View reviewed changes

jamesbroadhead mentioned this pull request Apr 15, 2026

feat: inline Arrow IPC support with format fallback #273

Closed

4 tasks

jamesbroadhead mentioned this pull request Apr 16, 2026

test: close major coverage gaps in service-context, stream-registry, genie connector, files plugin #278

Closed

3 tasks

jamesbroadhead requested a review from MarioCadenas April 16, 2026 16:29

jamesbroadhead added 9 commits April 27, 2026 17:36

Add tests for ARROW_STREAM and ARROW format parameter handling

b1566ea

Tests verify: - ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format - ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format - Default JSON format does not pass disposition or format overrides

jamesbroadhead force-pushed the fix/arrow-stream-inline-support branch from d4e4cea to a003274 Compare April 27, 2026 17:37

jamesbroadhead added 3 commits April 27, 2026 19:58

jamesbroadhead mentioned this pull request Apr 27, 2026

test: add files plugin upload + error-handling coverage (50 tests) #319

Open

3 tasks

This was referenced Apr 28, 2026

fix: convert BigInt to strings in Arrow IPC row.toJSON() #315

Closed

fix: handle ARROW_STREAM attachment in type generator #316

Closed

jamesbroadhead added 3 commits April 28, 2026 11:38

jamesbroadhead mentioned this pull request Apr 28, 2026

RFC: stash inline Arrow IPC server-side, serve via /arrow-result #320

Draft

jamesbroadhead added 5 commits April 28, 2026 19:19

This was referenced Apr 29, 2026

test: backfill coverage for genie connector, service context, stream registry #327

Open

refactor: rename AnalyticsFormat to API enum names (JSON_ARRAY, ARROW_STREAM) #328

Open

feat: decode inline Arrow IPC + warehouse-compat fallback #329

Open

jamesbroadhead closed this Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add INLINE + ARROW_STREAM format support for analytics plugin#256

feat: add INLINE + ARROW_STREAM format support for analytics plugin#256
jamesbroadhead wants to merge 21 commits intomainfrom
fix/arrow-stream-inline-support

jamesbroadhead commented Apr 7, 2026 •

edited

Loading

Uh oh!

MarioCadenas Apr 9, 2026

Uh oh!

MarioCadenas Apr 9, 2026

Uh oh!

MarioCadenas commented Apr 9, 2026 •

edited

Loading

Uh oh!

jamesbroadhead commented Apr 16, 2026

Uh oh!

jamesbroadhead commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jamesbroadhead commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Tests

Test plan

Uh oh!

MarioCadenas Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

MarioCadenas Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

MarioCadenas commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jamesbroadhead commented Apr 16, 2026

Uh oh!

jamesbroadhead commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jamesbroadhead commented Apr 7, 2026 •

edited

Loading

MarioCadenas commented Apr 9, 2026 •

edited

Loading