fix: handle ARROW_STREAM attachment in type generator #316

Closed

jamesbroadhead wants to merge 3 commits into main from fix/type-generator-arrow-stream

Conversation

@jamesbroadhead
Contributor

Summary

  • When serverless warehouses return ARROW_STREAM format, the DESCRIBE QUERY result comes back as an inline base64 Arrow IPC attachment rather than in data_array. This caused convertToQueryType to generate empty types ({}).
  • Added a fallback in convertToQueryType that decodes the Arrow IPC attachment schema to extract column names and types when data_array is empty.
  • Extended DatabricksStatementExecutionResponse type with the optional attachment field.
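
The fallback described above can be sketched as a small dispatch. This is a minimal sketch, not the PR's actual code: only the `data_array` and `attachment` field names come from this PR, and the interface and function names here are hypothetical.

```typescript
// Illustrative slice of the response's result shape; only data_array and
// attachment are named in the PR, the rest is assumed for the sketch.
interface DescribeResult {
  data_array?: string[][]; // rows shaped (col_name, data_type, comment)
  attachment?: string;     // inline base64-encoded Arrow IPC stream
}

// Hypothetical helper: decide which path convertToQueryType should take
// for a DESCRIBE QUERY result.
function pickSchemaSource(result: DescribeResult): "data_array" | "attachment" | "none" {
  if (result.data_array && result.data_array.length > 0) {
    return "data_array"; // JSON_ARRAY path takes precedence when rows exist
  }
  if (result.attachment) {
    return "attachment"; // ARROW_STREAM fallback added by this PR
  }
  return "none"; // nothing to decode, so typegen would emit {}
}

console.log(pickSchemaSource({ data_array: [["id", "BIGINT", ""]] })); // prints "data_array"
console.log(pickSchemaSource({ attachment: "QVJST1cx" }));             // prints "attachment"
```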

Test plan

  • Verify type generation against a serverless warehouse that returns ARROW_STREAM format
  • Verify existing JSON_ARRAY warehouses still produce correct types (no regression)
  • Run pnpm typecheck and pnpm test to confirm no build or test regressions

Replaces #271 (fork-based PR where CI couldn't run).

This pull request was AI-assisted by Isaac.

When serverless warehouses return ARROW_STREAM format, the DESCRIBE QUERY
result comes as an inline base64 Arrow IPC attachment rather than data_array.
This caused convertToQueryType to generate empty types {}.

Add a fallback that decodes the Arrow IPC attachment schema to extract column
names and types when data_array is empty.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
The Arrow IPC schema-decoding code in this PR imports apache-arrow
from packages/appkit/src/type-generator/query-registry.ts. Until
now it resolved transitively via packages/appkit-ui, which knip
flags as an unlisted dependency. Declare it directly to satisfy
knip and make the dependency explicit.

Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
@jamesbroadhead force-pushed the fix/type-generator-arrow-stream branch from 6e34164 to 55a3a97 on April 27, 2026 at 17:30
The previous implementation read `table.schema.fields` from the
ARROW_STREAM attachment. A DESCRIBE QUERY response is a result *table*
whose own schema is `(col_name, data_type, comment)` — so this would
generate the same bogus type `{ col_name: string; data_type: string;
comment: string }` for every query routed through serverless.

- Replace columnsFromArrowAttachment with one that iterates table.toArray()
  and reads the col_name / data_type / comment values per row, matching
  the legacy data_array path.
- Drop arrowTypeToSqlName entirely. The numeric TypeId map was wrong
  (e.g. case 1 -> Bool but apache-arrow Type=Null, case 6 -> Binary but
  Type=Bool, case 14 -> Struct but Type=Union); since data_type already
  carries the SQL type name as a string, the helper is no longer needed.
- Add tests covering the attachment fallback, the data_array-prefers-attachment
  case, lowercase data_type normalization, and the malformed-attachment path.
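
The corrected row-based extraction can be sketched as follows. The `toArray()`/`toJSON()` calls mirror the slice of apache-arrow's Table API the commit describes, but the table type is stubbed here so the sketch stands alone; in the real code the table would come from decoding the base64 attachment with tableFromIPC. The col_name/data_type/comment field names are from the commit; the function name and the case-normalization direction are assumptions.

```typescript
// Minimal stand-in for the slice of apache-arrow's Table API the decode
// relies on; the real table would come from tableFromIPC over the
// base64-decoded attachment bytes.
interface ArrowTableLike {
  toArray(): Array<{ toJSON(): Record<string, unknown> }>;
}

interface ColumnInfo {
  name: string;
  sqlType: string;
}

// Read DESCRIBE QUERY rows shaped (col_name, data_type, comment), matching
// the legacy data_array path instead of reading table.schema.fields (which
// would yield the metadata columns themselves, the bug described above).
function columnsFromDescribeRows(table: ArrowTableLike): ColumnInfo[] {
  return table.toArray().map((row) => {
    const r = row.toJSON();
    return {
      name: String(r.col_name),
      // data_type already carries the SQL type name as a string; the case
      // normalization here (direction assumed) stands in for the lowercase
      // handling the tests cover.
      sqlType: String(r.data_type).toUpperCase(),
    };
  });
}

const fake: ArrowTableLike = {
  toArray: () => [
    { toJSON: () => ({ col_name: "id", data_type: "bigint", comment: null }) },
    { toJSON: () => ({ col_name: "name", data_type: "STRING", comment: null }) },
  ],
};
console.log(columnsFromDescribeRows(fake));
```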

Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
jamesbroadhead added a commit that referenced this pull request Apr 28, 2026
Some serverless warehouses reject JSON_ARRAY + INLINE for DESCRIBE QUERY
and return ARROW_STREAM by default. The previous behavior just removed
the broken fallback, which meant typegen produced `unknown` types for
those warehouses' queries.

This restores the fallback (retry without explicit format if JSON_ARRAY
is rejected) and teaches `convertToQueryType` to decode an inline base64
Arrow IPC attachment when `data_array` is empty. The DESCRIBE QUERY
result is itself a table with rows shaped (col_name, data_type, comment),
so the decode reads `table.toArray().map(r => r.toJSON())` rather than
`table.schema.fields` — reading the schema would yield bogus types
(every query would come out shaped like the metadata columns).

Re-adds apache-arrow as an appkit dependency (only the typegen uses it;
the runtime SDK does not).

Tests cover: schema extraction from data rows, lowercase type
normalization, data_array taking precedence when both are present, and
graceful degradation on malformed attachments.
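
The malformed-attachment path can be sketched as a try/catch wrapper. The Buffer-based base64 handling and the decodeIpc parameter (a stand-in for apache-arrow's tableFromIPC, injected here so the sketch is self-contained) are assumptions; the point being illustrated is that a bad attachment degrades to an empty column list so typegen emits {} rather than crashing.

```typescript
type RowTable = { toArray(): Array<{ toJSON(): Record<string, unknown> }> };

// Hypothetical wrapper: decode the inline base64 Arrow IPC attachment, but
// degrade to [] on any decode failure instead of throwing.
function safeColumnNames(
  attachment: string,
  decodeIpc: (bytes: Uint8Array) => RowTable, // stand-in for tableFromIPC
): string[] {
  try {
    const bytes = Buffer.from(attachment, "base64");
    return decodeIpc(bytes)
      .toArray()
      .map((row) => String(row.toJSON().col_name));
  } catch {
    return []; // malformed attachment: fall back gracefully
  }
}

// A decoder that always throws models a corrupt IPC stream.
const broken = (): RowTable => {
  throw new Error("not an Arrow IPC stream");
};
console.log(safeColumnNames("bm90LWFycm93", broken)); // prints []
```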

Supersedes #316.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
@jamesbroadhead
Contributor Author

Closing — superseded by #256.

The ARROW_STREAM attachment decode fallback in convertToQueryType (and the type extension on DatabricksStatementExecutionResponse.result.attachment) was ported into #256 in commit e1e9017, along with the four supporting tests. #256 also restores the JSON_ARRAY-rejection retry loop so the new decode path actually engages on warehouses that don't support JSON_ARRAY + INLINE.

Closing without merging.
