-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-27494: [R] Implement RPrimitiveConverter for Decimal type #15211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
Awesomesauce! Thanks @paleolimbot! |
|
@paleolimbot I've pushed a test here; is this ready for a final review now? |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Just a few comments to test all the branches of the C++. I'm going to try to solicit a review of the C++ too since this is my first time using the Converter API in any meaningful way.
r/tests/testthat/test-Array.R
Outdated
| decimal_array <- Array$create(1, type = decimal128(10, 2)) | ||
| decimal_array2 <- Array$create(1, type = decimal256(10, 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the test Array include an NA and something whose precision might get truncated (i.e, maybe Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))? (The NA is to get full test coverage since there's a branch for nulls in the C++; the truncation is to check that type gets passed through properly).
The purpose of decimal_array2 isn't clear to me here...am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of decimal_array2 isn't clear to me here...am I missing something?
Didn't you write this? My interpretation is that decimal_array2 is meant to verify the decimal256 type works (in addition to the 128 bit variant).
Could the test Array include an NA and something whose precision might get truncated (i.e, maybe Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))?
I think truncation is less interesting to me than NA. Testing truncation would just be verifying that FromReal works correctly and we can assume it does (or, if we have concerns, should test it elsewhere). Testing NA will test that R's NA is getting properly recognized (e.g. adding coverage for append_null).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed the 128/256 difference!
| decimal_array <- Array$create(1, type = decimal128(10, 2)) | ||
| decimal_array2 <- Array$create(1, type = decimal256(10, 2)) | ||
|
|
||
| expect_equal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there's a branch for ALTREP, maybe decimal_array2 could be Array$create(1:10, type = decimal128(10, 2))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(In trying my own example here I'm realizing that I didn't consider integer arrays...hang tight and I'll add it to the C++!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting...I now get this result:
> Array$create(c(1:10, NA))$cast(decimal128(10, 2))
Error: Invalid: Precision is not great enough for the result. It should be at least 12
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:828 kernel_->exec(kernel_ctx_, input, out)
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:796 ExecuteSingleSpan(input, &output)
/home/nic2/arrow/cpp/src/arrow/compute/function.cc:276 executor->Execute(input, &listener)
Looks like a bug given 10 significant digits is more than enough to represent the integers 1 through 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update the tests to use 12 and open a ticket
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts
r/tests/testthat/test-Array.R
Outdated
| decimal_array <- Array$create(1, type = decimal128(10, 2)) | ||
| decimal_array2 <- Array$create(1, type = decimal256(10, 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of decimal_array2 isn't clear to me here...am I missing something?
Didn't you write this? My interpretation is that decimal_array2 is meant to verify the decimal256 type works (in addition to the 128 bit variant).
Could the test Array include an NA and something whose precision might get truncated (i.e, maybe Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))?
I think truncation is less interesting to me than NA. Testing truncation would just be verifying that FromReal works correctly and we can assume it does (or, if we have concerns, should test it elsewhere). Testing NA will test that R's NA is getting properly recognized (e.g. adding coverage for append_null).
r/tests/testthat/test-Array.R
Outdated
| delete_arrow_array(array_ptr) | ||
| }) | ||
|
|
||
| test_that("direct creation of Decimal Arrays (ARROW-11631)", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused. I expected to see an R array of doubles get converted to an Arrow array. I didn't expect to see decimals get created directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I know what the distinction is between those two things!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in python it would be something like:
# create an array of python decimals
x = [decimal.Decimal('12.34'), decimal.Decimal('23.45')]
# convert to an array of arrow decimals
arr = pa.array(x)
# arr
# <pyarrow.lib.Decimal128Array object at 0x7fd6a40dd120>
# [
# 12.34,
# 23.45
# ]
It looks like there are tests like this in test-Array.R:
test_that("Integer Array", {
ints <- c(1:10, 1:10, 1:5)
x <- expect_array_roundtrip(ints, int32())
})
Also intest-Array.R I see something like...
expect_error(
Array$create(as.double(1:10), type = decimal(4, 2)),
"You might want to try casting manually"
)
Shouldn't this work now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this work now?
It does...that test fails and should be updated!
x = [decimal.Decimal('12.34'), decimal.Decimal('23.45')]
Maybe the difference is that R doesn't have decimals (only reals)? Perhaps a less ambiguous title for the test would be "can convert R integer/double to decimal"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, you can't do proper round trip if there is no native decimals. I see now my confusion. I read Array$create(1, type = decimal128(10, 2)) too quickly and thought it was creating an Arrow scalar from a single R value. I didn't realize it was treating 1 as an array of size 1.
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
|
@thisisnic I think there are also some tests that were counting on decimal conversion to fail, which need updating: |
3b62f99 to
2e1368e
Compare
|
Looks like something is broken here: |
c1ccfed to
68aff97
Compare
|
@thisisnic I think you were seeing that as a result of the force-push that overwrote the C++ I added to handle integers. I know you're off today so I merged both branches and added some tests to get this in before feature freeze 🙂 |
|
Benchmark runs are scheduled for baseline = fbcaee1 and contender = 19e459f. 19e459f is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
Thanks for making that update, and sorry about the overwrite, brain fart! |
Implements conversion from R to Decimal128/256 in such a way that it works with
Array$create().Before this PR:
Created on 2023-01-05 with reprex v2.0.2
After this PR:
Created on 2023-01-05 with reprex v2.0.2
TODO: test!