fix(namespace): support nested types in convert_json_arrow_type #6300

Merged
BubbleCal merged 3 commits into lance-format:main from
jiaoew1991:fix/json-arrow-schema-deserialize-nested-types
Mar 27, 2026

Conversation

@jiaoew1991
Contributor

convert_json_arrow_type only handled scalar types (int/float/utf8/binary), causing deserialization failures for any schema containing list, struct, large_binary, large_utf8, fixed_size_list, map, decimal, or date/time types.

This made arrow_type_to_json and convert_json_arrow_type asymmetric: serialization worked for all types but deserialization rejected most of them with "Unsupported Arrow type".

In practice this broke the DuckDB lance extension's fast schema-from-REST path — tables with list/struct columns fell back to opening the S3 dataset for every DESCRIBE, making SHOW ALL TABLES ~20x slower than necessary.

Add support for: float16, large_utf8, large_binary, fixed_size_binary, decimal32/64/128/256, date32, date64, timestamp, duration, list, large_list, fixed_size_list, struct, and map.

Add a roundtrip test covering all supported types.
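
The core of the fix is making the deserializer recurse into child fields the same way the serializer does. A minimal sketch of that idea, using simplified stand-in types (`JsonArrowType`, `JsonArrowField`, and the `DataType` enum here are illustrative, not the actual lance-namespace or arrow-rs API):

```rust
// Simplified stand-ins for the JSON schema representation; the real types
// in lance-namespace carry more fields (length, unit, etc.).
#[derive(Clone)]
struct JsonArrowType {
    name: String,                // e.g. "int64", "list", "struct"
    fields: Vec<JsonArrowField>, // children for nested types
}

#[derive(Clone)]
struct JsonArrowField {
    name: String,
    r#type: JsonArrowType,
}

#[derive(Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<DataType>),
    Struct(Vec<(String, DataType)>),
}

fn convert_json_arrow_type(t: &JsonArrowType) -> Result<DataType, String> {
    match t.name.as_str() {
        "int64" => Ok(DataType::Int64),
        "utf8" => Ok(DataType::Utf8),
        // Nested types recurse into their child fields instead of erroring,
        // mirroring what arrow_type_to_json does on the serialization side.
        "list" => {
            let child = t.fields.first().ok_or("list needs a child field")?;
            Ok(DataType::List(Box::new(convert_json_arrow_type(
                &child.r#type,
            )?)))
        }
        "struct" => {
            let fields = t
                .fields
                .iter()
                .map(|f| Ok((f.name.clone(), convert_json_arrow_type(&f.r#type)?)))
                .collect::<Result<Vec<_>, String>>()?;
            Ok(DataType::Struct(fields))
        }
        other => Err(format!("Unsupported Arrow type: {other}")),
    }
}
```

Since every nested variant bottoms out in a scalar, the recursion terminates, and a single roundtrip test over a deeply nested schema exercises all branches at once.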

github-actions bot added the bug (Something isn't working) label Mar 26, 2026
jiaoew1991 added a commit to jiaoew1991/lance-duckdb that referenced this pull request Mar 26, 2026
…ABLES

When DuckDB runs SHOW ALL TABLES, the lance extension calls
GetDefaultEntries (list_tables) then CreateDefaultEntry (describe_table)
for each table sequentially. With ~1800 tables, this took ~600s.

Changes:
- list_tables_inner now concurrently describes all tables (10 at a time)
  to prefetch their JSON schemas alongside the table names
- GetDefaultEntries parses the prefetched schemas into a cache
- CreateDefaultEntry checks the cache before making HTTP calls
- Add tmp_schema module with convert_json_arrow_type that supports
  nested types (list, struct, large_binary, etc.) — the upstream
  lance-namespace only handles scalars (fix pending in lance-format/lance#6300)

Result: SHOW ALL TABLES drops from ~600s to ~21s (~28x speedup).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
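
The batching described above can be sketched with scoped threads. This is a simplified stand-in for the extension's actual REST client and async machinery; `describe_table`, `prefetch_schemas`, and the fixed batch size of 10 are illustrative:

```rust
use std::thread;

// Hypothetical stand-in for the per-table describe_table REST call.
fn describe_table(name: &str) -> String {
    format!("{{\"table\":\"{name}\"}}") // pretend JSON schema
}

// Describe tables in batches of 10, so at most 10 requests are in
// flight at once while SHOW ALL TABLES is being answered.
fn prefetch_schemas(tables: &[String]) -> Vec<(String, String)> {
    let mut out = Vec::with_capacity(tables.len());
    for chunk in tables.chunks(10) {
        // Scoped threads can borrow `chunk` without 'static bounds.
        let results = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|name| s.spawn(move || (name.clone(), describe_table(name))))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().unwrap())
                .collect::<Vec<_>>()
        });
        out.extend(results);
    }
    out
}
```

With ~1800 tables and network-bound calls, overlapping 10 requests per batch is what turns the sequential ~600s walk into the reported ~21s.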
jiaoew1991 force-pushed the fix/json-arrow-schema-deserialize-nested-types branch from 895ed07 to c6bfff3 on March 26, 2026 11:26
jiaoew1991 and others added 2 commits March 26, 2026 19:29
jiaoew1991 force-pushed the fix/json-arrow-schema-deserialize-nested-types branch from c6bfff3 to 31898f0 on March 26, 2026 11:31
@codecov

codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 94.61883% with 12 lines in your changes missing coverage. Please review.

Files with missing lines:
rust/lance-namespace/src/schema.rs — 94.61% patch coverage (1 missing, 11 partials) ⚠️


Comment thread rust/lance-namespace/src/schema.rs Outdated
let encoded = json_type.length.unwrap_or(0);
Ok(DataType::Decimal32(
(encoded / 1000) as u8,
(encoded % 1000) as i8,
Contributor

For decimals the scale can be negative, so precision and scale need to be restored as follows:

let precision = ((encoded + 128) / 1000) as u8;
let scale = (encoded - precision as i64 * 1000) as i8;

need tests for:

  • Decimal32(10, -2)
  • Decimal128(9, -2)
  • Decimal256(38, 10)
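
Assuming the JSON encoding packs both values into one integer as `precision * 1000 + scale` (implied by the snippet above), the suggested decode recovers both even for negative scales, because an `i8` scale means `scale + 128` always lies in `0..=255`, safely below 1000 (the naive `encoded / 1000` / `encoded % 1000` decode rounds toward zero and fails for negative scales). A small sketch:

```rust
// Decode a packed `precision * 1000 + scale` value. The +128 shift makes
// integer division land on the right precision even when scale < 0.
fn decode_precision_scale(encoded: i64) -> (u8, i8) {
    let precision = ((encoded + 128) / 1000) as u8;
    let scale = (encoded - precision as i64 * 1000) as i8;
    (precision, scale)
}
```

For example, Decimal32(10, -2) encodes as 9998; the naive decode yields (9, 998), while the shifted decode yields (10, -2).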

…improve test coverage

Fix decimal type deserialization to correctly handle negative scale values.
The previous decode logic (encoded / 1000, encoded % 1000) produced wrong
results for negative scales. Use ((encoded + 128) / 1000) to recover
precision, then derive scale from the remainder.

Add tests for: decimal types with negative scale, schema/field metadata
roundtrip, dictionary type unwrapping, map keys_sorted error, unsupported
type errors (RunEndEncoded, ListView, Utf8View, BinaryView), and nested
list with field metadata.

schema.rs line coverage: 83.87% -> 97.54%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@BubbleCal BubbleCal merged commit 8c99eef into lance-format:main Mar 27, 2026
29 checks passed