feat: add an arrow-scalar crate with a scalar definition #5845
westonpace wants to merge 1 commit into lance-format:main
Here I've used an enum approach (similar to There is another approach which could result in even less code. Instead of an enum we could say that all scalars are simply |
Code ReviewThis PR introduces a new P0 Issues
P1 Issues
Minor Notes
Leaving in draft as I still want to do more review (this is mostly vibe coded at the moment) |
0f25c62 to 430ddfc
Superseded by #5955 |
There are places in lance-file and lance-index where we pull in datafusion just so that we can have access to `ScalarValue` (we may use `Expr` in `lance-index` too, so maybe not the best example). I'd like to not need to pull in Datafusion to get access to scalars.
In addition, two different definitions for binary serialization of arrow scalars have popped up recently: first in the constant encoding and second in the column statistics. In column statistics we might be able to avoid the need for scalar serialization, and in the constant encoding we shouldn't need the full set of arrow types (but it wouldn't hurt).
That's enough duplication that I toyed around with making a small, standalone, lightweight crate to provide scalars.
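To make the idea concrete, here is a minimal sketch of what an enum-based scalar with a single binary serialization could look like. It is not the code in this PR: the `Scalar` type, its variants, the one-byte tags, and the `to_bytes` / `from_bytes` names are all hypothetical, and only a handful of types are covered.

```rust
use std::convert::TryFrom;

/// Hypothetical scalar type. The variant set is illustrative only and
/// covers far fewer types than Arrow has.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Null,
    Boolean(bool),
    Int64(i64),
    Float64(f64),
    Utf8(String),
}

impl Scalar {
    /// Encode as a one-byte tag followed by a little-endian payload.
    /// The tag values are assumptions made for this sketch.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        match self {
            Scalar::Null => out.push(0),
            Scalar::Boolean(v) => {
                out.push(1);
                out.push(*v as u8);
            }
            Scalar::Int64(v) => {
                out.push(2);
                out.extend_from_slice(&v.to_le_bytes());
            }
            Scalar::Float64(v) => {
                out.push(3);
                out.extend_from_slice(&v.to_le_bytes());
            }
            Scalar::Utf8(s) => {
                out.push(4);
                // Length prefix (u32, little-endian) followed by the UTF-8 bytes.
                out.extend_from_slice(&(s.len() as u32).to_le_bytes());
                out.extend_from_slice(s.as_bytes());
            }
        }
        out
    }

    /// Decode a value produced by `to_bytes`. Returns `None` on an unknown
    /// tag or a payload of the wrong length.
    pub fn from_bytes(bytes: &[u8]) -> Option<Scalar> {
        let (&tag, rest) = bytes.split_first()?;
        match tag {
            0 if rest.is_empty() => Some(Scalar::Null),
            1 if rest.len() == 1 => Some(Scalar::Boolean(rest[0] != 0)),
            2 => Some(Scalar::Int64(i64::from_le_bytes(
                <[u8; 8]>::try_from(rest).ok()?,
            ))),
            3 => Some(Scalar::Float64(f64::from_le_bytes(
                <[u8; 8]>::try_from(rest).ok()?,
            ))),
            4 => {
                if rest.len() < 4 {
                    return None;
                }
                let (len_bytes, data) = rest.split_at(4);
                let len = u32::from_le_bytes(<[u8; 4]>::try_from(len_bytes).ok()?) as usize;
                if data.len() != len {
                    return None;
                }
                String::from_utf8(data.to_vec()).ok().map(Scalar::Utf8)
            }
            _ => None,
        }
    }
}
```

With one definition like this, both the constant encoding and the column statistics could share the same round trip (`Scalar::from_bytes(&value.to_bytes())`) instead of each carrying its own format.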
If we use this then I'd also like to make a small statistics crate with an accumulator that keeps track of min / max / NaN count / null count / etc. We could use this in some of the file encoding spots as well as in some of the column statistics work (such an accumulator needs a definition for scalar because that is what the min/max output would be).
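As a rough illustration of the accumulator idea, here is a sketch specialized to `f64` so that it stays self-contained. The `F64Stats` name, its fields, and the choice to exclude NaN from min/max are assumptions for this example; a real version would presumably be generic over data types and report min/max as scalars.

```rust
/// Hypothetical accumulator for a nullable f64 column.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct F64Stats {
    /// Smallest non-null, non-NaN value seen so far.
    pub min: Option<f64>,
    /// Largest non-null, non-NaN value seen so far.
    pub max: Option<f64>,
    pub nan_count: u64,
    pub null_count: u64,
}

impl F64Stats {
    /// Fold a single (possibly null) value into the running statistics.
    pub fn update(&mut self, value: Option<f64>) {
        match value {
            None => self.null_count += 1,
            // NaN is counted separately and does not affect min/max.
            Some(v) if v.is_nan() => self.nan_count += 1,
            Some(v) => {
                self.min = Some(self.min.map_or(v, |m| m.min(v)));
                self.max = Some(self.max.map_or(v, |m| m.max(v)));
            }
        }
    }

    /// Fold a batch of values, for example one page or one array.
    pub fn update_batch(&mut self, values: &[Option<f64>]) {
        for v in values {
            self.update(*v);
        }
    }
}
```

The `min` and `max` fields are where a shared scalar definition would come in: once the accumulator handles more than one type, its output needs a type-erased value to hold the result.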