ARROW-4193: [Rust] Add support for decimal data type #8640
|
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format? See also: |
|
@jorgecarleitao @nevi-me may I ask you for a review? From a functionality point of view I think this is more or less done, however I am very much at the beginning of writing stuff in Rust, so help is much appreciated! |
jorgecarleitao
left a comment
Just wow! Impressive work, @sweb. Virtually flawless in my opinion.
You really grasped all the ideas on the crate and implemented this really well. Congratulations! 💯
P.S. Thank you very much for your first contribution!
@nevi-me or @alamb , could you take a look, particularly on the bit ops, that I am less confident about?
|
@jorgecarleitao I will put it on my queue for tomorrow. Hopefully the morning |
|
Hi @sweb, I'm providing general comments, I'll look at this in detail over the next few days. I see that you're using … On the JIRA itself, the plan was to add the type, then also: …
I'm fine with us doing the above as follow-ups, as they'd be quite involved. |
Thank you very much for your first feedback! I can try to create a second pull request for adding the IPC reader / writer after this one. I didn't think of it since I mainly use arrow as a processing layer after reading parquet files and never use the IPC parts. I have some ideas concerning the conversion from f64 to decimal, but I will try to find a reference implementation in the Python sources - maybe we can borrow some ideas from there. |
|
@jorgecarleitao I didn't have a chance to get to this PR today in my allotted time budget, but it sounds like @nevi-me may be planning on doing so, so I am going to lower the priority. I'll try and read it more carefully shortly |
I am sorry, I did not mean to push either of you to review. It was just a comment to let @sweb know that there is a part of the code that I can't review well and thus I am relying on some help ❤️ |
|
@jorgecarleitao could you check the following strange thing that I cannot explain: I added two additional asserts in … From my current understanding, these slices should not be the same, since one is [-8887, None, None] and the other one is [15887, None, None]. However, the test states that they are equal. The same goes for the fixed_binary test. Do you have an idea what I am missing? It is very likely that I do not really understand how the offsets work and that these checks should in fact evaluate to … Thanks! |
|
@sweb , it is a bug in the equality itself. I have a PR for it, hang on a sec. |
|
PR for it: #8695 |
|
@jorgecarleitao Thanks for the quick feedback! I will remove the two asserts since this is already addressed in your PR. |
|
@jorgecarleitao may I ask why you closed this PR? :) |
|
I merged it. Wasn't it ready? |
|
Ah sorry, I missed the merge and thought it was just closed. Thank you for merging! |
|
@jorgecarleitao As a closing remark, I just wanted to say that this was the most pleasant contribution experience I have had so far on an open source project. Thank you very much for this. |
|
There's some follow-up to eventually support decimals in the integration testing (and Parquet reader and writer). Would you be interested in assisting us with that, @sweb? You can open PRs on the Integration testing umbrella JIRA, or as individual/standalone ones. I'm fine with either approach. Thanks |
|
I've created https://issues.apache.org/jira/browse/ARROW-10674 |
|
@nevi-me I will gladly continue to work on supporting decimals. I will start with the IPC reader / writer. |
…r/Writer for Decimal type to allow integration tests
This is a follow up to #8640.
Currently, there is a first working IPC reader/writer test using data from `testing/arrow-ipc-stream/integration/0.14.1/generated_decimal.arrow_file`.
However, this led me to discover that my first decimal type implementation is wrong, in that it uses BigEndian, whereas this is Parquet-specific and therefore should not be used in arrow/array and so on. I will try to address this in this PR as well.
Closes #8784 from sweb/rust-decimal-ipc
Authored-by: Florian Müller <florian@tomueller.de>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
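To illustrate the byte-order issue described in the commit message, here is a minimal sketch using only the Rust standard library. It assumes that the decimal's unscaled value is an `i128` and that the non-Parquet layout is little-endian; the helper names `to_big_endian` and `to_little_endian` are hypothetical and not part of the arrow crate.

```rust
/// Hypothetical helper: big-endian bytes, the layout the commit message
/// calls Parquet-specific.
fn to_big_endian(value: i128) -> [u8; 16] {
    value.to_be_bytes()
}

/// Hypothetical helper: little-endian bytes, assumed here to be the layout
/// wanted for in-memory arrays.
fn to_little_endian(value: i128) -> [u8; 16] {
    value.to_le_bytes()
}

fn main() {
    let value: i128 = -8887;
    let be = to_big_endian(value);
    let le = to_little_endian(value);

    // The two encodings hold the same bytes in opposite order.
    let mut reversed = be;
    reversed.reverse();
    assert_eq!(reversed, le);

    // Each encoding round-trips only with the matching decoder.
    assert_eq!(i128::from_be_bytes(be), value);
    assert_eq!(i128::from_le_bytes(le), value);

    // Decoding with the wrong byte order silently yields a different number.
    assert_ne!(i128::from_le_bytes(be), value);
}
```

Because both layouts are 16 bytes long, a reader that assumes the wrong order does not fail; it just produces wrong values, which is why the mismatch only surfaced once data written by another implementation was read.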
This PR adds support for the decimal data type by adding `DataType::Decimal`, `array::DecimalArray` and `builder::DecimalBuilder`.
The implementation is heavily based on `FixedSizeBinaryArray` in order to store values as fixed size binary arrays. However, values are returned as `i128` and the builder expects `i128` values for constructing a `DecimalArray`.
Some additional notes:
- … `precision` field. The necessary calculation is implemented as a static method in `array::DecimalArray`. This is probably the wrong place for it.
- … `i128`, even though we could check the precision and potentially return a smaller integer.
- … `DecimalArray` a bit more difficult but I did not want to introduce new dependencies in this change.
- … `"5000"`. This is due to the fact that I am not sure whether JSON runs into problems when representing integers beyond i64. As a start, I assumed that this way is safer.
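The storage scheme described above can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes the calculation tied to `precision` is the number of bytes needed per value, and it uses the big-endian layout mentioned in the follow-up commit. The names `min_bytes_for_precision`, `encode_fixed` and `decode_fixed` are hypothetical.

```rust
/// Hypothetical helper: smallest byte width whose signed two's-complement
/// range holds every integer with `precision` decimal digits.
/// Returns None when 10^precision does not fit in an i128.
fn min_bytes_for_precision(precision: u32) -> Option<usize> {
    // Largest unscaled value with `precision` digits is 10^precision - 1.
    let max = 10_i128.checked_pow(precision)? - 1;
    (1..=16).find(|&n| {
        let bits = 8 * n as u32 - 1;
        // Largest positive value representable in n bytes: 2^(8n-1) - 1.
        let limit = if bits == 127 { i128::MAX } else { (1_i128 << bits) - 1 };
        max <= limit
    })
}

/// Hypothetical helper: keep only the `width` low-order bytes of the
/// big-endian encoding. The caller must ensure the value fits.
fn encode_fixed(value: i128, width: usize) -> Vec<u8> {
    assert!((1..=16).contains(&width));
    value.to_be_bytes()[16 - width..].to_vec()
}

/// Hypothetical helper: sign-extend a big-endian fixed-size value to i128.
fn decode_fixed(bytes: &[u8]) -> i128 {
    assert!(!bytes.is_empty() && bytes.len() <= 16);
    // Fill the high bytes with 0xFF for negative values, 0x00 otherwise.
    let fill = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    i128::from_be_bytes(buf)
}

fn main() {
    let width = min_bytes_for_precision(5).expect("precision fits in i128");
    for value in [-8887_i128, 15887] {
        let bytes = encode_fixed(value, width);
        assert_eq!(bytes.len(), width);
        assert_eq!(decode_fixed(&bytes), value);
    }
}
```

Always returning `i128` keeps the reader simple: whatever the stored width, sign extension recovers the full value, at the cost of using a wider integer than a small precision strictly needs.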