Skip to content

feat: create an arrow-scalar crate utilizing arrow-row and arrow-data#5955

Merged
westonpace merged 4 commits intolance-format:mainfrom
westonpace:feat/arrow-scalar-arr
Feb 18, 2026
Merged

feat: create an arrow-scalar crate utilizing arrow-row and arrow-data#5955
westonpace merged 4 commits intolance-format:mainfrom
westonpace:feat/arrow-scalar-arr

Conversation

@westonpace
Copy link
Copy Markdown
Member

This provides an ArrowScalar which supports:

  • Implementations of Ord, Hash, and Eq
  • Stable serialization to/from binary

The Ord/Eq implementations come from arrow_row which should provide the same comparison logic as arrow-rs compute functions (and thus Datafusion)

The binary serialization uses the C data interface for optional serialization of the data type and then dumps the array data as-is. This should be a stable format.

My primary goal with this PR was to minimize the amount of code required (compared to something like datafusion's ScalarValue to keep maintenance simple. There may be more performant options but since I don't expect scalars to be used in hot spots I think this will be ok. Also, the public API surface of ArrowScalar is pretty small so if we need to replace the internal implementation later for performance we can do so.

@github-actions github-actions Bot added the enhancement New feature or request label Feb 16, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR Review: arrow-scalar crate

This PR introduces a new ArrowScalar type with Ord, Hash, and Eq support via arrow_row. Overall, the design is clean and the code is well-tested.

P0/P1 Issues

1. (P1) Missing BinaryView/Utf8View serialization test coverage
serde.rs has format string mappings for BinaryView/Utf8View (lines 643-644, 698-699), but there's no roundtrip test for these types. Views have variadic buffers (inline vs. out-of-line data), and the current buffer serialization may not correctly handle this. Consider adding a test to verify these types serialize/deserialize correctly, or explicitly document that views are unsupported for encoding.

2. (P1) Potential panic on invalid buffer counts during decode
In decode_with_options, the code builds ArrayData with whatever buffers are in the encoded bytes (rust/arrow-scalar/src/serde.rs:895-900). If the buffer count doesn't match what the DataType expects, builder.build() will fail. This is handled as an error, but the error message from Arrow may be unclear. Consider validating expected buffer counts before building to provide a clearer error message.

3. (P0) Boolean arrays have no value buffer when using packed bits
For DataType::Boolean, the value data is stored in a bitmap buffer which may not be captured correctly by iterating data.buffers(). The current approach copies data.buffers() which for boolean arrays may be empty or not include the validity buffer correctly. This could cause boolean scalars to fail encode/decode roundtrip. Please verify with a test:

let b = ArrowScalar::from(true);
let encoded = b.encode().unwrap();
let decoded = ArrowScalar::decode(&encoded).unwrap();
assert_eq!(b, decoded);

Minor Notes (not blocking)

  • The crate version is hardcoded to 57.0.0 which matches current arrow-rs, but consider whether this should track lance's versioning scheme instead.
  • The README.md referenced in Cargo.toml doesn't exist yet.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 84.28373% with 113 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/arrow-scalar/src/serde.rs 67.36% 81 Missing and 28 partials ⚠️
rust/arrow-scalar/src/lib.rs 98.81% 0 Missing and 4 partials ⚠️

📢 Thoughts on this report? Let us know!

Comment on lines +14 to +15
[dependencies]
arrow-array = { workspace = true }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to be defensive, might want to leave a brief comment that you want to keep the dependency list here small.

Copy link
Copy Markdown
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable and small.

@westonpace westonpace merged commit 884d18e into lance-format:main Feb 18, 2026
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants