fix(namespace): support nested types in convert_json_arrow_type#6300
Merged
BubbleCal merged 3 commits into lance-format:main on Mar 27, 2026
Conversation
jiaoew1991 added a commit to jiaoew1991/lance-duckdb that referenced this pull request on Mar 26, 2026
…ABLES

When DuckDB runs SHOW ALL TABLES, the lance extension calls GetDefaultEntries (list_tables) and then CreateDefaultEntry (describe_table) for each table sequentially. With ~1800 tables, this took ~600s.

Changes:
- list_tables_inner now describes all tables concurrently (10 at a time) to prefetch their JSON schemas alongside the table names
- GetDefaultEntries parses the prefetched schemas into a cache
- CreateDefaultEntry checks the cache before making HTTP calls
- Add a tmp_schema module with convert_json_arrow_type that supports nested types (list, struct, large_binary, etc.); the upstream lance-namespace only handles scalars (fix pending in lance-format/lance#6300)

Result: SHOW ALL TABLES drops from ~600s to ~21s (~28x speedup).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
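The "10 at a time" prefetch described above can be sketched with a bounded-concurrency loop. This is a stdlib-only illustration, not the extension's code: the real implementation presumably uses async HTTP futures, and `describe_table` here is a hypothetical stand-in for the REST call.

```rust
use std::thread;

// Hypothetical stand-in for the describe_table REST call; the real
// extension fetches each table's JSON schema over HTTP.
fn describe_table(name: &str) -> String {
    format!("{{\"table\":\"{}\"}}", name)
}

// Describe tables with at most `limit` requests in flight by running
// them in chunks of `limit` threads, collecting (name, schema) pairs.
fn prefetch_schemas(names: Vec<String>, limit: usize) -> Vec<(String, String)> {
    let mut out = Vec::with_capacity(names.len());
    for chunk in names.chunks(limit) {
        let handles: Vec<_> = chunk
            .iter()
            .cloned()
            .map(|name| {
                thread::spawn(move || {
                    let schema = describe_table(&name);
                    (name, schema)
                })
            })
            .collect();
        for h in handles {
            out.push(h.join().unwrap());
        }
    }
    out
}

fn main() {
    let names: Vec<String> = (0..25).map(|i| format!("t{}", i)).collect();
    let schemas = prefetch_schemas(names, 10);
    assert_eq!(schemas.len(), 25);
    println!("prefetched {} schemas", schemas.len());
}
```

With sequential calls the wall-clock cost is roughly (per-call latency × table count); bounding concurrency at 10 divides that by up to 10 while capping pressure on the namespace server, which matches the reported ~600s to ~21s drop once the describe results are also cached.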
Force-pushed from 895ed07 to c6bfff3
`convert_json_arrow_type` only handled scalar types (int/float/utf8/binary), causing deserialization failures for any schema containing list, struct, large_binary, large_utf8, fixed_size_list, map, decimal, or date/time types. This made `arrow_type_to_json` and `convert_json_arrow_type` asymmetric: serialization worked for all types, but deserialization rejected most of them with "Unsupported Arrow type".

In practice this broke the DuckDB lance extension's fast schema-from-REST path: tables with list/struct columns fell back to opening the S3 dataset for every DESCRIBE, making SHOW ALL TABLES ~20x slower than necessary.

Add support for: float16, large_utf8, large_binary, fixed_size_binary, decimal32/64/128/256, date32, date64, timestamp, duration, list, large_list, fixed_size_list, struct, and map.

Add a roundtrip test covering all supported types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
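The essence of the fix is that nested types must recurse into their child types instead of being rejected outright. The sketch below illustrates that recursion on a hypothetical `JsonType` node; the actual lance-namespace wire format and the full set of supported types differ from this simplified version.

```rust
// Hypothetical JSON type node; the real lance-namespace schema format
// differs. This only illustrates the recursion the fix introduces.
#[derive(Debug)]
struct JsonType {
    name: String,
    children: Vec<JsonType>,
}

#[derive(Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<DataType>),
    Struct(Vec<DataType>),
}

fn convert_json_arrow_type(t: &JsonType) -> Result<DataType, String> {
    match t.name.as_str() {
        // Scalars map directly, as in the original scalar-only code.
        "int64" => Ok(DataType::Int64),
        "utf8" => Ok(DataType::Utf8),
        // Nested types recurse into their children; the scalar-only
        // version rejected these with "Unsupported Arrow type".
        "list" => {
            let child = t.children.first().ok_or("list needs a child type")?;
            Ok(DataType::List(Box::new(convert_json_arrow_type(child)?)))
        }
        "struct" => t
            .children
            .iter()
            .map(convert_json_arrow_type)
            .collect::<Result<Vec<_>, _>>()
            .map(DataType::Struct),
        other => Err(format!("Unsupported Arrow type: {}", other)),
    }
}

fn main() {
    // list<struct<utf8, int64>>
    let json = JsonType {
        name: "list".into(),
        children: vec![JsonType {
            name: "struct".into(),
            children: vec![
                JsonType { name: "utf8".into(), children: vec![] },
                JsonType { name: "int64".into(), children: vec![] },
            ],
        }],
    };
    let dt = convert_json_arrow_type(&json).unwrap();
    assert_eq!(
        dt,
        DataType::List(Box::new(DataType::Struct(vec![
            DataType::Utf8,
            DataType::Int64
        ])))
    );
    println!("converted: {:?}", dt);
}
```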
Force-pushed from c6bfff3 to 31898f0
BubbleCal reviewed on Mar 26, 2026
let encoded = json_type.length.unwrap_or(0);
Ok(DataType::Decimal32(
    (encoded / 1000) as u8,
    (encoded % 1000) as i8,
Contributor
For decimals the scale can be negative, so precision and scale need to be restored like this:
let precision = ((encoded + 128) / 1000) as u8;
let scale = (encoded - precision as i64 * 1000) as i8;
We need tests for:
- Decimal32(10, -2)
- Decimal128(9, -2)
- Decimal256(38, 10)
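The decode trick above can be checked in isolation. This sketch assumes the encoding is `precision * 1000 + scale` packed into the JSON `length` field, per the review comment; `encode_decimal`/`decode_decimal` are illustrative names, not the PR's functions.

```rust
// Assumed packing of (precision, scale) into the JSON `length` field.
fn encode_decimal(precision: u8, scale: i8) -> i64 {
    precision as i64 * 1000 + scale as i64
}

fn decode_decimal(encoded: i64) -> (u8, i8) {
    // Adding 128 shifts the remainder (scale + 128) into [0, 255],
    // so integer division recovers precision even for negative scale.
    let precision = ((encoded + 128) / 1000) as u8;
    let scale = (encoded - precision as i64 * 1000) as i8;
    (precision, scale)
}

fn main() {
    // Roundtrip the cases from the review, plus the i8 extremes.
    for &(p, s) in &[(10u8, -2i8), (9, -2), (38, 10), (76, -128), (1, 127)] {
        assert_eq!(decode_decimal(encode_decimal(p, s)), (p, s));
    }
    // The naive decode is wrong for negative scale:
    // encode(10, -2) = 9998, but 9998 / 1000 = 9 and 9998 % 1000 = 998.
    assert_eq!((9998i64 / 1000, 9998i64 % 1000), (9, 998));
    println!("all decimal roundtrips ok");
}
```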
…improve test coverage

Fix decimal type deserialization to correctly handle negative scale values. The previous decode logic (encoded / 1000, encoded % 1000) produced wrong results for negative scales. Use ((encoded + 128) / 1000) to recover precision, then derive scale from the remainder.

Add tests for: decimal types with negative scale, schema/field metadata roundtrip, dictionary type unwrapping, map keys_sorted error, unsupported type errors (RunEndEncoded, ListView, Utf8View, BinaryView), and nested list with field metadata.

schema.rs line coverage: 83.87% -> 97.54%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BubbleCal approved these changes on Mar 27, 2026