
feat: add an arrow-stats crate with the ability to calculate basic stats on arrays #5967

Merged
westonpace merged 7 commits into lance-format:main from westonpace:feat/arrow-stats on Apr 1, 2026


@westonpace
Member

We currently have stats calculation spread across a few places, and all of them require DataFusion, which is a fairly heavy dependency. With the addition of column statistics (#5639), we were going to get yet another implementation. Instead, I have made a simple standalone stats crate that does not require DataFusion.

github-actions bot added the enhancement label on Feb 18, 2026
@github-actions
Contributor

PR Review: arrow-stats crate

Good addition that consolidates statistics calculation with a lightweight dependency footprint (no DataFusion required).

P1: Missing README.md

The Cargo.toml references `readme = "README.md"`, but no README file is included in the PR. This is likely to cause errors when the crate is packaged or published.

Fix: either add a README.md or remove the `readme` line from Cargo.toml.

Observations (non-blocking)

  • Test coverage is excellent with both unit tests and property-based tests
  • Float16 support via the `half` crate is well handled
  • The `total_cmp` usage for floats correctly orders -0.0 before 0.0
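The `total_cmp` point can be illustrated with a small std-only sketch (the helper `float_min_max` is hypothetical, not part of the crate). `partial_cmp` returns `None` for NaN and reports `-0.0` and `0.0` as equal, whereas `f64::total_cmp` implements the IEEE 754 totalOrder predicate: `-0.0` sorts strictly below `0.0`, and NaNs sort at the extremes according to their sign bit:

```rust
/// Min and max of a float slice under IEEE 754 total ordering.
/// Returns None for an empty slice.
fn float_min_max(values: &[f64]) -> Option<(f64, f64)> {
    // total_cmp is a total order, so min_by/max_by never hit an "incomparable" case,
    // and -0.0 is distinguished from 0.0.
    let min = values.iter().copied().min_by(|a, b| a.total_cmp(b))?;
    let max = values.iter().copied().max_by(|a, b| a.total_cmp(b))?;
    Some((min, max))
}
```

Because NaN participates in the ordering instead of being skipped, a caller that wants NaN-free min/max would need to filter NaNs before (or count them separately), which is a design choice rather than something `total_cmp` does on its own.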

@codecov

codecov bot commented on Feb 18, 2026

Codecov Report

❌ Patch coverage is 91.48936% with 56 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/arrow-stats/src/lib.rs | 91.70% | 40 missing and 14 partials |
| rust/arrow-stats/src/nan.rs | 71.42% | 2 missing |


Comment thread rust/arrow-stats/Cargo.toml
```rust
//!
//! # List Types
//!
//! List types are supported. The `item_nulls` field will be set to the number of null items within list entries.
```
Member


I think nested lists are also not supported?

Comment thread rust/arrow-stats/src/lib.rs
Comment thread rust/arrow-stats/Cargo.toml
@HaochengLIU
Member

@westonpace I added some minor comments; overall the code LGTM.

@HaochengLIU
Member

@westonpace gentle ping

westonpace and others added 4 commits March 31, 2026 05:58
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Recurse through nested list types (List, LargeList, FixedSizeList) to
compute min/max over leaf values and track item_nulls at every nesting
level. Previously nested lists fell through to the unsupported-type
fallback, yielding None for min/max.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
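The recursion described in this commit can be sketched without Arrow by modeling a list column as rows of optional item vectors, where each item is either a scalar or another (possibly null) list. This is a hypothetical model (`Nested`, `ListStats`, and `list_stats` are illustrative names, not the crate's types), and it assumes null rows count toward `null_count` while null items at any depth count toward `item_nulls`; min/max are taken over leaf values only:

```rust
/// Hypothetical item inside a list row: a scalar or a nested list, either of which may be null.
#[derive(Debug)]
enum Nested {
    Leaf(Option<i64>),
    List(Option<Vec<Nested>>),
}

#[derive(Debug, Default, PartialEq)]
struct ListStats {
    null_count: usize, // null rows (top-level list entries)
    item_nulls: usize, // null items at any nesting level
    min: Option<i64>,  // over non-null leaf values
    max: Option<i64>,
}

/// Walk the items of one list, recursing into nested lists.
fn visit_items(items: &[Nested], stats: &mut ListStats) {
    for item in items {
        match item {
            // A null scalar or a null sub-list both count as a null item.
            Nested::Leaf(None) | Nested::List(None) => stats.item_nulls += 1,
            Nested::Leaf(Some(v)) => {
                stats.min = Some(stats.min.map_or(*v, |m| m.min(*v)));
                stats.max = Some(stats.max.map_or(*v, |m| m.max(*v)));
            }
            Nested::List(Some(children)) => visit_items(children, stats),
        }
    }
}

/// Stats for a list column; a `None` row is a null list entry.
fn list_stats(rows: &[Option<Vec<Nested>>]) -> ListStats {
    let mut stats = ListStats::default();
    for row in rows {
        match row {
            None => stats.null_count += 1,
            Some(items) => visit_items(items, &mut stats),
        }
    }
    stats
}
```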
Remove redundant `ref mut` and `ref` binding modifiers that are now
disallowed under the Rust 2024 default binding mode rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
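For context on this commit: when a pattern matches through a reference, Rust's default binding modes ("match ergonomics") already bind the fields by reference, so an explicit `ref` or `ref mut` is redundant, and the 2024 edition rejects such binding modifiers when the default binding mode is not `move`. A small illustrative example (the functions are hypothetical, not code from the PR):

```rust
/// Length of the contained string, or 0 for None.
fn name_len(name: &Option<String>) -> usize {
    // Matching on `&Option<String>` switches to by-reference binding,
    // so `s` is already `&String`; writing `Some(ref s)` here is redundant.
    match name {
        Some(s) => s.len(),
        None => 0,
    }
}

/// Appends a suffix in place; `s` binds as `&mut String` without `ref mut`.
fn add_suffix(name: &mut Option<String>, suffix: &str) {
    if let Some(s) = name {
        s.push_str(suffix);
    }
}
```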
Member

HaochengLIU left a comment


lgtm now

@HaochengLIU
Member

@westonpace good to go!

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
westonpace merged commit 21ae66b into lance-format:main on Apr 1, 2026
28 checks passed
@westonpace
Member Author

Thanks for the review @HaochengLIU !

Next step: update the column statistics PR to use `lance_arrow_scalar` and `lance_arrow_stats`.

@HaochengLIU
Member


Sure thing! I've semi-settled into my new environment; let me know if there is anything I can help with.


Labels

enhancement New feature or request
