feat: add an arrow-stats crate with the ability to calculate basic stats on arrays #5967
PR Review: arrow-stats crate

Good addition that consolidates statistics calculation with a lightweight dependency footprint (no DataFusion required).

P1: Missing README.md

The Fix: Either add a README.md or remove the

Observations (non-blocking)
78d69f1 to 232b612
//!
//! # List Types
//!
//! List types are supported. The `item_nulls` field will be set to the number of null items within list entries.
I think nested lists are also not supported?
@westonpace added some minor comments, overall code LGTM
@westonpace gentle ping
232b612 to 81f530c
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Recurse through nested list types (List, LargeList, FixedSizeList) to compute min/max over leaf values and track item_nulls at every nesting level. Previously nested lists fell through to the unsupported-type fallback, yielding None for min/max.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
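The recursion described in this commit can be sketched with a toy model. This is a minimal illustration, not the crate's implementation: `Value`, `Stats`, `visit`, and `nested_stats` are hypothetical names, plain `i64` leaves stand in for Arrow arrays, and it assumes that a null at the top level of the column is not counted in `item_nulls` (only nulls found inside a list entry are).

```rust
/// Toy stand-in for a nested list column (hypothetical; not the Arrow array types).
#[derive(Debug)]
enum Value {
    /// A leaf value; `None` models a null leaf.
    Leaf(Option<i64>),
    /// A list entry; `None` models a null list.
    List(Option<Vec<Value>>),
}

/// Hypothetical stats record; only `item_nulls` is a name taken from the PR.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    min: Option<i64>,
    max: Option<i64>,
    /// Nulls found inside list entries, at any nesting level.
    item_nulls: u64,
}

/// Walk one value, descending into lists until leaf values are reached.
fn visit(value: &Value, inside_list: bool, stats: &mut Stats) {
    match value {
        Value::Leaf(Some(v)) => {
            // Fold the leaf into the running min/max.
            stats.min = Some(stats.min.map_or(*v, |m| m.min(*v)));
            stats.max = Some(stats.max.map_or(*v, |m| m.max(*v)));
        }
        Value::Leaf(None) | Value::List(None) => {
            // Assumption: only nulls that sit inside a list count as item nulls.
            if inside_list {
                stats.item_nulls += 1;
            }
        }
        Value::List(Some(items)) => {
            for item in items {
                visit(item, true, stats);
            }
        }
    }
}

/// Compute min/max over all leaves and count nulls nested inside lists.
fn nested_stats(column: &[Value]) -> Stats {
    let mut stats = Stats::default();
    for value in column {
        visit(value, false, &mut stats);
    }
    stats
}

fn main() {
    let column = vec![
        // [[3, null], null]
        Value::List(Some(vec![
            Value::List(Some(vec![Value::Leaf(Some(3)), Value::Leaf(None)])),
            Value::List(None),
        ])),
        // A null top-level entry (not counted as an item null in this sketch).
        Value::List(None),
        // [[-7, 12]]
        Value::List(Some(vec![Value::List(Some(vec![
            Value::Leaf(Some(-7)),
            Value::Leaf(Some(12)),
        ]))])),
    ];
    println!("{:?}", nested_stats(&column));
}
```

In this model a column with no non-null leaves yields `None` for both `min` and `max`, which mirrors the fallback behaviour the commit message mentions for unsupported types.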
Remove redundant `ref mut` and `ref` binding modifiers that are now disallowed under the Rust 2024 default binding mode rules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
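For readers unfamiliar with the edition change, here is a small illustration of the pattern being cleaned up. The functions `bump` and `peek` are hypothetical and unrelated to the crate's code. When the matched value is already a reference, the default binding mode makes the bindings references, so an explicit `ref` or `ref mut` adds nothing, and edition 2024 rejects it.

```rust
/// Increment the value in place, if present.
fn bump(slot: &mut Option<i32>) {
    // `slot` is `&mut Option<i32>`, so `n` binds as `&mut i32` automatically.
    // Writing `Some(ref mut n)` here is accepted before edition 2024 but is an error in 2024.
    if let Some(n) = slot {
        *n += 1;
    }
}

/// Read the value through a shared reference, defaulting to 0.
fn peek(slot: &Option<i32>) -> i32 {
    match slot {
        // `n` binds as `&i32`; an explicit `ref n` would be redundant.
        Some(n) => *n,
        None => 0,
    }
}

fn main() {
    let mut slot = Some(1);
    bump(&mut slot);
    println!("{}", peek(&slot));
}
```

The unannotated form compiles on earlier editions as well, so dropping the modifiers is a backward-compatible cleanup.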
@westonpace good to go!
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the review @HaochengLIU! Next step: update the column statistics PR to use lance_arrow_scalar and lance_arrow_stats.
Sure thing! I've semi-settled into my new environment, so let me know if there is anything I can help with.
We currently have stats spread out in a few places, and they all require DataFusion, which is a fairly heavy dependency. With the addition of column statistics (#5639) we were going to get yet another implementation. Instead, I have made a simple standalone stats crate that does not require DataFusion.