fix(namespace): support nested types in convert_json_arrow_type #6300

Merged
BubbleCal merged 3 commits into lance-format:main from
jiaoew1991:fix/json-arrow-schema-deserialize-nested-types
Mar 27, 2026

Conversation

@jiaoew1991
Contributor

convert_json_arrow_type only handled scalar types (int/float/utf8/binary), causing deserialization failures for any schema containing list, struct, large_binary, large_utf8, fixed_size_list, map, decimal, or date/time types.

This made arrow_type_to_json and convert_json_arrow_type asymmetric: serialization worked for all types but deserialization rejected most of them with "Unsupported Arrow type".

In practice this broke the DuckDB lance extension's fast schema-from-REST path — tables with list/struct columns fell back to opening the S3 dataset for every DESCRIBE, making SHOW ALL TABLES ~20x slower than necessary.

Add support for: float16, large_utf8, large_binary, fixed_size_binary, decimal32/64/128/256, date32, date64, timestamp, duration, list, large_list, fixed_size_list, struct, and map.

Add a roundtrip test covering all supported types.
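
The core of the fix is making the deserializer recurse into child fields the same way the serializer does. A minimal sketch of that idea, using simplified stand-in types (`JsonArrowType`, `JsonArrowField`, and the `DataType` enum here are illustrative, not the actual lance-namespace or arrow-rs API):

```rust
// Simplified stand-ins for the JSON schema representation; the real types
// in lance-namespace carry more fields (length, unit, etc.).
#[derive(Clone)]
struct JsonArrowType {
    name: String,                // e.g. "int64", "list", "struct"
    fields: Vec<JsonArrowField>, // children for nested types
}

#[derive(Clone)]
struct JsonArrowField {
    name: String,
    r#type: JsonArrowType,
}

#[derive(Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<DataType>),
    Struct(Vec<(String, DataType)>),
}

fn convert_json_arrow_type(t: &JsonArrowType) -> Result<DataType, String> {
    match t.name.as_str() {
        "int64" => Ok(DataType::Int64),
        "utf8" => Ok(DataType::Utf8),
        // Nested types recurse into their child fields instead of erroring,
        // mirroring what arrow_type_to_json does on the serialization side.
        "list" => {
            let child = t.fields.first().ok_or("list needs a child field")?;
            Ok(DataType::List(Box::new(convert_json_arrow_type(
                &child.r#type,
            )?)))
        }
        "struct" => {
            let fields = t
                .fields
                .iter()
                .map(|f| Ok((f.name.clone(), convert_json_arrow_type(&f.r#type)?)))
                .collect::<Result<Vec<_>, String>>()?;
            Ok(DataType::Struct(fields))
        }
        other => Err(format!("Unsupported Arrow type: {other}")),
    }
}
```

Since every nested variant bottoms out in a scalar, the recursion terminates, and a single roundtrip test over a deeply nested schema exercises all branches at once.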

github-actions bot added the bug (Something isn't working) label Mar 26, 2026
jiaoew1991 added a commit to jiaoew1991/lance-duckdb that referenced this pull request Mar 26, 2026
…ABLES

When DuckDB runs SHOW ALL TABLES, the lance extension calls
GetDefaultEntries (list_tables) then CreateDefaultEntry (describe_table)
for each table sequentially. With ~1800 tables, this took ~600s.

Changes:
- list_tables_inner now concurrently describes all tables (10 at a time)
  to prefetch their JSON schemas alongside the table names
- GetDefaultEntries parses the prefetched schemas into a cache
- CreateDefaultEntry checks the cache before making HTTP calls
- Add tmp_schema module with convert_json_arrow_type that supports
  nested types (list, struct, large_binary, etc.) — the upstream
  lance-namespace only handles scalars (fix pending in lance-format/lance#6300)

Result: SHOW ALL TABLES drops from ~600s to ~21s (~28x speedup).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
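
The batching described above can be sketched with scoped threads. This is a simplified stand-in for the extension's actual REST client and async machinery; `describe_table`, `prefetch_schemas`, and the fixed batch size of 10 are illustrative:

```rust
use std::thread;

// Hypothetical stand-in for the per-table describe_table REST call.
fn describe_table(name: &str) -> String {
    format!("{{\"table\":\"{name}\"}}") // pretend JSON schema
}

// Describe tables in batches of 10, so at most 10 requests are in
// flight at once while SHOW ALL TABLES is being answered.
fn prefetch_schemas(tables: &[String]) -> Vec<(String, String)> {
    let mut out = Vec::with_capacity(tables.len());
    for chunk in tables.chunks(10) {
        // Scoped threads can borrow `chunk` without 'static bounds.
        let results = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|name| s.spawn(move || (name.clone(), describe_table(name))))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().unwrap())
                .collect::<Vec<_>>()
        });
        out.extend(results);
    }
    out
}
```

With ~1800 tables and network-bound calls, overlapping 10 requests per batch is what turns the sequential ~600s walk into the reported ~21s.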
jiaoew1991 force-pushed the fix/json-arrow-schema-deserialize-nested-types branch from 895ed07 to c6bfff3 on March 26, 2026 11:26
jiaoew1991 and others added 2 commits March 26, 2026 19:29
jiaoew1991 force-pushed the fix/json-arrow-schema-deserialize-nested-types branch from c6bfff3 to 31898f0 on March 26, 2026 11:31
@codecov

codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 94.61883% with 12 lines in your changes missing coverage. Please review.

Files with missing lines:
rust/lance-namespace/src/schema.rs — 94.61% patch coverage (1 missing, 11 partials) ⚠️


Comment thread rust/lance-namespace/src/schema.rs Outdated
let encoded = json_type.length.unwrap_or(0);
Ok(DataType::Decimal32(
(encoded / 1000) as u8,
(encoded % 1000) as i8,
Contributor

For decimals the scale can be negative, so precision and scale need to be restored as follows:

let precision = ((encoded + 128) / 1000) as u8;
let scale = (encoded - precision as i64 * 1000) as i8;

need tests for:

  • Decimal32(10, -2)
  • Decimal128(9, -2)
  • Decimal256(38, 10)
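
Assuming the JSON encoding packs both values into one integer as `precision * 1000 + scale` (implied by the snippet above), the suggested decode recovers both even for negative scales, because an `i8` scale means `scale + 128` always lies in `0..=255`, safely below 1000 (the naive `encoded / 1000` / `encoded % 1000` decode rounds toward zero and fails for negative scales). A small sketch:

```rust
// Decode a packed `precision * 1000 + scale` value. The +128 shift makes
// integer division land on the right precision even when scale < 0.
fn decode_precision_scale(encoded: i64) -> (u8, i8) {
    let precision = ((encoded + 128) / 1000) as u8;
    let scale = (encoded - precision as i64 * 1000) as i8;
    (precision, scale)
}
```

For example, Decimal32(10, -2) encodes as 9998; the naive decode yields (9, 998), while the shifted decode yields (10, -2).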

…improve test coverage

Fix decimal type deserialization to correctly handle negative scale values.
The previous decode logic (encoded / 1000, encoded % 1000) produced wrong
results for negative scales. Use ((encoded + 128) / 1000) to recover
precision, then derive scale from the remainder.

Add tests for: decimal types with negative scale, schema/field metadata
roundtrip, dictionary type unwrapping, map keys_sorted error, unsupported
type errors (RunEndEncoded, ListView, Utf8View, BinaryView), and nested
list with field metadata.

schema.rs line coverage: 83.87% -> 97.54%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@BubbleCal BubbleCal merged commit 8c99eef into lance-format:main Mar 27, 2026
29 checks passed