docs: add schema data types and field IDs documentation #5925
Merged
wjones127 merged 5 commits into lance-format:main on Feb 17, 2026
ae6b557 to c7ca59d
wjones127 (Contributor) requested changes on Feb 10, 2026
Got a few minor suggestions, but otherwise this looks pretty good. Thank you for working on this!
> Field IDs can be used in several contexts:
>
> 1. **Data File References**: Specify which columns are present in each data file
> 2. **Deletion Tracking**: Reference specific columns when applying deletions
Could you explain more what this means? I'm not sure what it is talking about.
> - **Stable**: IDs are preserved across schema evolution operations
> - **Sparse**: Field IDs may not form a contiguous sequence after schema evolution
>
> ### Using Field IDs
Maybe we should just replace this section with the sentence "When referencing fields internally within the format, use the field IDs rather than field names or positions."
> - **Drop Column**: Remove field from schema; its ID may be reused in some systems
> - **Rename Column**: Change field name; ID remains the same
> - **Reorder Columns**: Change field order in schema; IDs remain the same
> - **Type Evolution**: Subject to compatibility rules defined by Apache Arrow
Suggested change to the **Type Evolution** line:

> - **Type Evolution**: Data type can be changed. This might require rewriting the column in the data, depending on how the type was changed.
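For readers following the thread, the evolution semantics under review can be illustrated with a small sketch. This is not Lance's API: the `Field` class and the `rename`, `reorder`, and `drop` helpers are hypothetical names made up for illustration, and the logical type strings are placeholders. It only shows the property being documented, namely that renaming and reordering leave field IDs untouched and that dropping a column can leave the ID sequence non-contiguous.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    # Hypothetical, simplified stand-in for a schema field (not the Lance protobuf message).
    id: int
    name: str
    logical_type: str


def rename(schema, field_id, new_name):
    # Renaming changes only the name; the field ID is preserved.
    return [replace(f, name=new_name) if f.id == field_id else f for f in schema]


def reorder(schema, id_order):
    # Reordering changes position only; fields are looked up by ID, not by index.
    by_id = {f.id: f for f in schema}
    return [by_id[i] for i in id_order]


def drop(schema, field_id):
    # Dropping removes the field; the remaining IDs are not renumbered.
    return [f for f in schema if f.id != field_id]


schema = [
    Field(0, "id", "int64"),
    Field(1, "name", "string"),
    Field(2, "score", "float"),
]

evolved = drop(reorder(rename(schema, 1, "full_name"), [2, 0, 1]), 0)
remaining_ids = [f.id for f in evolved]  # IDs 2 and 1 survive; 0 is gone, so the sequence is sparse
```

Because every helper addresses fields by ID, anything that stored ID 1 before the rename still resolves to the same column afterwards, which is the motivation for referencing fields by ID rather than by name or position.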
Add comprehensive documentation of Lance schema format including:
- Complete reference of all supported data types and their string representations
- Mapping between logical types and Apache Arrow types
- Field ID assignment and evolution semantics
- Field metadata configuration options
- Schema examples for common use cases

This resolves the gap identified in issue lance-format#5707 by providing detailed specification of what data types are supported and how they map to Arrow types.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Complete the schema.md documentation with:
- Primary Key Metadata section that links to index.md
- All Arrow types properly formatted with backticks for consistency
- Note section referencing discussions lance-format#5864 and lance-format#5817 on logical type simplification
- Comprehensive coverage of data types, field IDs, metadata, and examples

This resolves lance-format#5707 by providing a complete specification of the schema format including supported data types (with Arrow type mappings), field ID system, field metadata configuration, and practical examples.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…tation

Cross-check against actual code revealed discrepancies in protobuf Field definition:
1. Corrected Field message documentation:
- Separated `type` (enum: PARENT/REPEATED/LEAF) from `logical_type` (string)
- Added missing `parent_id` field for nested field relationships
- Added `unenforced_primary_key` and `unenforced_primary_key_position` fields
- Corrected metadata type from map<string, string> to map<string, bytes>
2. Enhanced nested field explanation:
- Clarified how parent_id links child fields to parent
- Updated field ID assignment example to show parent_id relationships
- Added note about parent_id=0 for top-level fields
3. Updated Primary Key Metadata section:
- Changed from metadata reference to direct protobuf field documentation
- Documented both unenforced_primary_key and position fields
4. Improved examples:
- Updated all example schemas to use logical_type
- Changed Primary Key example to use protobuf fields instead of metadata
- Updated nested structure example to show parent_id relationships
- Added clarifying note about simplified representation vs actual protobuf format

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Critical fix discovered during comprehensive code validation:
Corrected parent_id value for top-level fields from 0 to -1:
- Top-level fields (no parent) have parent_id: -1
- Nested fields have parent_id: <parent_field_id>
This matches the actual implementation in:
- lance-file/src/datatypes.rs: if f.parent_id == -1 { ... }
- Field deserialization logic uses -1 to detect top-level fields
Updated:
1. Field ID assignment examples (all top-level: parent_id: -1)
2. All example schemas (Simple Table, Nested Structure, Vector Embeddings)
3. Protobuf Field definition documentation
4. Field ID Assignment section explanation
5. Note about children vector in Rust in-memory representation
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
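
The `parent_id: -1` convention in this commit can be illustrated with a minimal sketch of rebuilding a nested schema from a flat field list. This is an assumption-laden illustration, not the code in lance-file/src/datatypes.rs: the `Field` class, the `TOP_LEVEL` constant, and `build_tree` are hypothetical names, and the example fields are invented.

```python
from dataclasses import dataclass, field

# Sentinel used for fields that have no parent, per the commit message above.
TOP_LEVEL = -1


@dataclass
class Field:
    # Hypothetical, simplified field record; the real protobuf message has more members.
    id: int
    name: str
    parent_id: int = TOP_LEVEL
    children: list = field(default_factory=list)


def build_tree(flat_fields):
    """Attach each nested field to its parent and return the top-level fields."""
    by_id = {f.id: f for f in flat_fields}
    roots = []
    for f in flat_fields:
        if f.parent_id == TOP_LEVEL:
            roots.append(f)
        else:
            # Assumes the parent appears somewhere in flat_fields.
            by_id[f.parent_id].children.append(f)
    return roots


flat = [
    Field(0, "id"),
    Field(1, "location"),
    Field(2, "lat", parent_id=1),
    Field(3, "lon", parent_id=1),
]
roots = build_tree(flat)
```

One consequence visible in the sketch: since 0 is a valid field ID in this example, using 0 as the "no parent" marker would make children of field 0 indistinguishable from top-level fields, whereas -1 can never collide with a real ID.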
c7ca59d to bc6dbd1
wjones127 (Contributor) approved these changes on Feb 17, 2026
This looks good now. Will merge once CI is finished. Thanks for working on this! 😄
This PR documents the supported data types in Lance schemas and field ID semantics.
Changes
How to use
Users can now refer to the schema documentation to understand data type representations and field ID behavior.