fix: handle ARROW_STREAM attachment in type generator #316
Closed
jamesbroadhead wants to merge 3 commits into main
When serverless warehouses return ARROW_STREAM format, the DESCRIBE QUERY
result comes as an inline base64 Arrow IPC attachment rather than data_array.
This caused convertToQueryType to generate empty types {}.
Add a fallback that decodes the Arrow IPC attachment schema to extract column
names and types when data_array is empty.
Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
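To illustrate why an empty column list shows up as `{}` in the generated output, here is a minimal sketch of rendering a row type from column metadata. It is not the actual `convertToQueryType` code: `Column`, `tsTypeFor`, `renderRowType`, and the SQL-to-TypeScript mapping are all hypothetical names and choices made for this example.

```typescript
// Hypothetical column descriptor: one entry per DESCRIBE QUERY row.
interface Column {
  name: string;
  sqlType: string;
}

// Hypothetical SQL-to-TypeScript mapping, for illustration only.
function tsTypeFor(sqlType: string): string {
  const t = sqlType.trim().toUpperCase();
  if (["TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE"].includes(t)) {
    return "number";
  }
  if (t === "BOOLEAN") return "boolean";
  if (t === "STRING") return "string";
  return "unknown";
}

// Renders an inline object type. With no columns there is nothing to emit,
// which is how an undecoded DESCRIBE QUERY result ends up as `{}`.
function renderRowType(columns: Column[]): string {
  if (columns.length === 0) return "{}";
  const fields = columns.map(
    (c) => `${JSON.stringify(c.name)}: ${tsTypeFor(c.sqlType)}`,
  );
  return `{ ${fields.join("; ")} }`;
}
```

Calling `renderRowType([])` yields `{}`, so any path that fails to find column rows silently produces an empty type rather than an error.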
The Arrow IPC schema-decoding code in this PR imports apache-arrow from packages/appkit/src/type-generator/query-registry.ts. Until now it resolved transitively via packages/appkit-ui, which knip flags as an unlisted dependency. Declare it directly to satisfy knip and make the dependency explicit. Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
6e34164 to 55a3a97
The previous implementation read `table.schema.fields` from the
ARROW_STREAM attachment. A DESCRIBE QUERY response is a result *table*
whose own schema is `(col_name, data_type, comment)` — so this would
generate the same bogus type `{ col_name: string; data_type: string;
comment: string }` for every query routed through serverless.
- Replace columnsFromArrowAttachment with one that iterates table.toArray()
and reads the col_name / data_type / comment values per row, matching
the legacy data_array path.
- Drop arrowTypeToSqlName entirely. The numeric TypeId map was wrong
(e.g. case 1 -> Bool but apache-arrow Type=Null, case 6 -> Binary but
Type=Bool, case 14 -> Struct but Type=Union); since data_type already
carries the SQL type name as a string, the helper is no longer needed.
- Add tests covering the attachment fallback, data_array taking precedence
over the attachment when both are present, lowercase data_type
normalization, and the malformed-attachment path.
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
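The row-based decode described in this commit could look roughly like the sketch below. It is a simplified illustration, not the code in the PR: `DescribeResponse`, `Column`, `AttachmentDecoder`, and `columnsFromDescribe` are hypothetical names, upper-casing the type name and returning an empty list on a malformed attachment are assumptions, and the Arrow decode is injected so the example stays dependency-free.

```typescript
// One DESCRIBE QUERY result row, shaped as described in the commit message.
interface DescribeRow {
  col_name: string;
  data_type: string;
  comment: string | null;
}

// Hypothetical, trimmed-down response shape (the real type has more fields).
interface DescribeResponse {
  result?: {
    data_array?: (string | null)[][];
    attachment?: string; // base64-encoded Arrow IPC stream
  };
}

interface Column {
  name: string;
  sqlType: string;
}

// Injected so the sketch has no dependencies. With apache-arrow this could
// base64-decode the attachment and return
// tableFromIPC(bytes).toArray().map((r) => r.toJSON()).
type AttachmentDecoder = (attachmentBase64: string) => DescribeRow[];

// Assumption: type names are normalized to upper case.
function toColumn(
  name: string | null | undefined,
  type: string | null | undefined,
): Column {
  return { name: name ?? "", sqlType: (type ?? "").trim().toUpperCase() };
}

function columnsFromDescribe(
  res: DescribeResponse,
  decode: AttachmentDecoder,
): Column[] {
  const dataArray = res.result?.data_array ?? [];
  if (dataArray.length > 0) {
    // Legacy path wins: each row is [col_name, data_type, comment].
    return dataArray.map((row) => toColumn(row[0], row[1]));
  }
  const attachment = res.result?.attachment;
  if (!attachment) return [];
  try {
    // Read the row values, not the table schema.
    return decode(attachment).map((r) => toColumn(r.col_name, r.data_type));
  } catch {
    // Assumption: a malformed attachment degrades to "no columns".
    return [];
  }
}
```

The key point is that the decoder output is treated as data rows. Reading the Arrow schema instead would always return the three metadata columns `col_name`, `data_type`, and `comment`.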
jamesbroadhead added a commit that referenced this pull request Apr 28, 2026
Some serverless warehouses reject JSON_ARRAY + INLINE for DESCRIBE QUERY and return ARROW_STREAM by default. The previous behavior just removed the broken fallback, which meant typegen produced `unknown` types for those warehouses' queries.

This restores the fallback (retry without explicit format if JSON_ARRAY is rejected) and teaches `convertToQueryType` to decode an inline base64 Arrow IPC attachment when `data_array` is empty.

The DESCRIBE QUERY result is itself a table with rows shaped (col_name, data_type, comment), so the decode reads `table.toArray().map(r => r.toJSON())` rather than `table.schema.fields`. Reading the schema would yield bogus types (every query would come out shaped like the metadata columns).

Re-adds apache-arrow as an appkit dependency (only the typegen uses it; the runtime SDK does not).

Tests cover: schema extraction from data rows, lowercase type normalization, data_array taking precedence when both are present, and graceful degradation on malformed attachments.

Supersedes #316.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
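The "retry without explicit format" fallback mentioned in this commit might be structured along these lines. This is a sketch under stated assumptions: `describeWithFallback`, `Execute`, and `isFormatRejected` are hypothetical names, the executor is injected rather than calling a real client, and how a warehouse signals the rejection (thrown error versus a failed status) is not stated in the commit, so the sketch assumes a thrown error.

```typescript
// Request fields follow the statement execution request shape; only the
// ones relevant to this sketch are listed.
interface StatementRequest {
  statement: string;
  warehouse_id: string;
  format?: "JSON_ARRAY" | "ARROW_STREAM";
  disposition?: "INLINE" | "EXTERNAL_LINKS";
}

// Hypothetical executor; stands in for whatever sends the request.
type Execute<T> = (req: StatementRequest) => Promise<T>;

async function describeWithFallback<T>(
  execute: Execute<T>,
  base: Pick<StatementRequest, "statement" | "warehouse_id">,
  // Hypothetical predicate: decides whether an error means
  // "this warehouse rejects JSON_ARRAY + INLINE".
  isFormatRejected: (err: unknown) => boolean,
): Promise<T> {
  try {
    // First attempt: ask explicitly for inline JSON rows.
    return await execute({ ...base, format: "JSON_ARRAY", disposition: "INLINE" });
  } catch (err) {
    // Unrelated failures are re-thrown untouched.
    if (!isFormatRejected(err)) throw err;
    // Retry with no explicit format so the warehouse picks its default,
    // which may be ARROW_STREAM with an inline attachment.
    return execute({ ...base });
  }
}
```

Keeping the rejection check as a predicate means only format-related failures trigger the second request, so genuine SQL or permission errors are not masked by a retry.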
Closing: superseded by #256. The ARROW_STREAM attachment decode fallback in Closing without merging.
Summary
- When serverless warehouses return ARROW_STREAM format, the DESCRIBE QUERY result comes as an inline base64 Arrow IPC `attachment` rather than `data_array`. This caused `convertToQueryType` to generate empty types `{}`.
- Add a fallback in `convertToQueryType` that decodes the Arrow IPC attachment schema to extract column names and types when `data_array` is empty.
- Extend the `DatabricksStatementExecutionResponse` type with the optional `attachment` field.

Test plan
- Run `pnpm typecheck` and `pnpm test` to confirm no build or test regressions

Replaces #271 (fork-based PR where CI couldn't run).
This pull request was AI-assisted by Isaac.