fix: handle ARROW_STREAM attachment in type generator #316

Closed

jamesbroadhead wants to merge 3 commits into main from fix/type-generator-arrow-stream

Conversation

@jamesbroadhead
Contributor

Summary

  • When serverless warehouses return ARROW_STREAM format, the DESCRIBE QUERY result comes back as an inline base64 Arrow IPC attachment rather than in data_array. This caused convertToQueryType to generate empty types ({}).
  • Added a fallback in convertToQueryType that decodes the Arrow IPC attachment schema to extract column names and types when data_array is empty.
  • Extended DatabricksStatementExecutionResponse type with the optional attachment field.
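
The fallback described above can be sketched as a small dispatch. This is a minimal sketch, not the PR's actual code: only the `data_array` and `attachment` field names come from this PR, and the interface and function names here are hypothetical.

```typescript
// Illustrative slice of the response's result shape; only data_array and
// attachment are named in the PR, the rest is assumed for the sketch.
interface DescribeResult {
  data_array?: string[][]; // rows shaped (col_name, data_type, comment)
  attachment?: string;     // inline base64-encoded Arrow IPC stream
}

// Hypothetical helper: decide which path convertToQueryType should take
// for a DESCRIBE QUERY result.
function pickSchemaSource(result: DescribeResult): "data_array" | "attachment" | "none" {
  if (result.data_array && result.data_array.length > 0) {
    return "data_array"; // JSON_ARRAY path takes precedence when rows exist
  }
  if (result.attachment) {
    return "attachment"; // ARROW_STREAM fallback added by this PR
  }
  return "none"; // nothing to decode, so typegen would emit {}
}

console.log(pickSchemaSource({ data_array: [["id", "BIGINT", ""]] })); // prints "data_array"
console.log(pickSchemaSource({ attachment: "QVJST1cx" }));             // prints "attachment"
```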

Test plan

  • Verify type generation against a serverless warehouse that returns ARROW_STREAM format
  • Verify existing JSON_ARRAY warehouses still produce correct types (no regression)
  • Run pnpm typecheck and pnpm test to confirm no build or test regressions

Replaces #271 (fork-based PR where CI couldn't run).

This pull request was AI-assisted by Isaac.

When serverless warehouses return ARROW_STREAM format, the DESCRIBE QUERY
result comes as an inline base64 Arrow IPC attachment rather than data_array.
This caused convertToQueryType to generate empty types {}.

Add a fallback that decodes the Arrow IPC attachment schema to extract column
names and types when data_array is empty.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
The Arrow IPC schema-decoding code in this PR imports apache-arrow
from packages/appkit/src/type-generator/query-registry.ts. Until
now it resolved transitively via packages/appkit-ui, which knip
flags as an unlisted dependency. Declare it directly to satisfy
knip and make the dependency explicit.

Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
@jamesbroadhead force-pushed the fix/type-generator-arrow-stream branch from 6e34164 to 55a3a97 on April 27, 2026 at 17:30
The previous implementation read `table.schema.fields` from the
ARROW_STREAM attachment. A DESCRIBE QUERY response is a result *table*
whose own schema is `(col_name, data_type, comment)` — so this would
generate the same bogus type `{ col_name: string; data_type: string;
comment: string }` for every query routed through serverless.

- Replace columnsFromArrowAttachment with one that iterates table.toArray()
  and reads the col_name / data_type / comment values per row, matching
  the legacy data_array path.
- Drop arrowTypeToSqlName entirely. The numeric TypeId map was wrong
  (e.g. case 1 -> Bool but apache-arrow Type=Null, case 6 -> Binary but
  Type=Bool, case 14 -> Struct but Type=Union); since data_type already
  carries the SQL type name as a string, the helper is no longer needed.
- Add tests covering the attachment fallback, the data_array-prefers-attachment
  case, lowercase data_type normalization, and the malformed-attachment path.
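
The corrected row-based extraction can be sketched as follows. The `toArray()`/`toJSON()` calls mirror the slice of apache-arrow's Table API the commit describes, but the table type is stubbed here so the sketch stands alone; in the real code the table would come from decoding the base64 attachment with tableFromIPC. The col_name/data_type/comment field names are from the commit; the function name and the case-normalization direction are assumptions.

```typescript
// Minimal stand-in for the slice of apache-arrow's Table API the decode
// relies on; the real table would come from tableFromIPC over the
// base64-decoded attachment bytes.
interface ArrowTableLike {
  toArray(): Array<{ toJSON(): Record<string, unknown> }>;
}

interface ColumnInfo {
  name: string;
  sqlType: string;
}

// Read DESCRIBE QUERY rows shaped (col_name, data_type, comment), matching
// the legacy data_array path instead of reading table.schema.fields (which
// would yield the metadata columns themselves, the bug described above).
function columnsFromDescribeRows(table: ArrowTableLike): ColumnInfo[] {
  return table.toArray().map((row) => {
    const r = row.toJSON();
    return {
      name: String(r.col_name),
      // data_type already carries the SQL type name as a string; the case
      // normalization here (direction assumed) stands in for the lowercase
      // handling the tests cover.
      sqlType: String(r.data_type).toUpperCase(),
    };
  });
}

const fake: ArrowTableLike = {
  toArray: () => [
    { toJSON: () => ({ col_name: "id", data_type: "bigint", comment: null }) },
    { toJSON: () => ({ col_name: "name", data_type: "STRING", comment: null }) },
  ],
};
console.log(columnsFromDescribeRows(fake));
```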

Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
jamesbroadhead added a commit that referenced this pull request Apr 28, 2026
Some serverless warehouses reject JSON_ARRAY + INLINE for DESCRIBE QUERY
and return ARROW_STREAM by default. The previous behavior just removed
the broken fallback, which meant typegen produced `unknown` types for
those warehouses' queries.

This restores the fallback (retry without explicit format if JSON_ARRAY
is rejected) and teaches `convertToQueryType` to decode an inline base64
Arrow IPC attachment when `data_array` is empty. The DESCRIBE QUERY
result is itself a table with rows shaped (col_name, data_type, comment),
so the decode reads `table.toArray().map(r => r.toJSON())` rather than
`table.schema.fields` — reading the schema would yield bogus types
(every query would come out shaped like the metadata columns).

Re-adds apache-arrow as an appkit dependency (only the typegen uses it;
the runtime SDK does not).

Tests cover: schema extraction from data rows, lowercase type
normalization, data_array taking precedence when both are present, and
graceful degradation on malformed attachments.
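
The malformed-attachment path can be sketched as a try/catch wrapper. The Buffer-based base64 handling and the decodeIpc parameter (a stand-in for apache-arrow's tableFromIPC, injected here so the sketch is self-contained) are assumptions; the point being illustrated is that a bad attachment degrades to an empty column list so typegen emits {} rather than crashing.

```typescript
type RowTable = { toArray(): Array<{ toJSON(): Record<string, unknown> }> };

// Hypothetical wrapper: decode the inline base64 Arrow IPC attachment, but
// degrade to [] on any decode failure instead of throwing.
function safeColumnNames(
  attachment: string,
  decodeIpc: (bytes: Uint8Array) => RowTable, // stand-in for tableFromIPC
): string[] {
  try {
    const bytes = Buffer.from(attachment, "base64");
    return decodeIpc(bytes)
      .toArray()
      .map((row) => String(row.toJSON().col_name));
  } catch {
    return []; // malformed attachment: fall back gracefully
  }
}

// A decoder that always throws models a corrupt IPC stream.
const broken = (): RowTable => {
  throw new Error("not an Arrow IPC stream");
};
console.log(safeColumnNames("bm90LWFycm93", broken)); // prints []
```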

Supersedes #316.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
@jamesbroadhead
Contributor Author

Closing — superseded by #256.

The ARROW_STREAM attachment decode fallback in convertToQueryType (and the type extension on DatabricksStatementExecutionResponse.result.attachment) was ported into #256 in commit e1e9017, along with the four supporting tests. #256 also restores the JSON_ARRAY-rejection retry loop so the new decode path actually engages on warehouses that don't support JSON_ARRAY + INLINE.

Closing without merging.
