GH-27494: [R] Implement RPrimitiveConverter for Decimal type #15211

paleolimbot · 2023-01-05T19:48:35Z

Implements conversion from R to Decimal128/256 in such a way that it works with Array$create().

Before this PR:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
Array$create(1)$cast(decimal(10, 2))
#> Array
#> <decimal128(10, 2)>
#> [
#>   1.00
#> ]

Array$create(1, type = decimal128(10, 2))
#> Error in `value[[3L]]()`:
#> ! NotImplemented: Extend
#> ℹ You might want to try casting manually with `Array$create(...)$cast(...)`.

#> Backtrace:
#>     ▆
#>  1. └─Array$create(1, type = decimal128(10, 2))
#>  2.   └─base::tryCatch(...) at r/R/array.R:198:2
#>  3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  5.         └─value[[3L]](cond)
#>  6.           └─rlang::abort(...) at r/R/array.R:202:6
Array$create(1, type = decimal256(10, 2))
#> Error in `value[[3L]]()`:
#> ! NotImplemented: Extend
#> ℹ You might want to try casting manually with `Array$create(...)$cast(...)`.

#> Backtrace:
#>     ▆
#>  1. └─Array$create(1, type = decimal256(10, 2))
#>  2.   └─base::tryCatch(...) at r/R/array.R:198:2
#>  3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  5.         └─value[[3L]](cond)
#>  6.           └─rlang::abort(...) at r/R/array.R:202:6

Created on 2023-01-05 with reprex v2.0.2

After this PR:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
Array$create(1)$cast(decimal(10, 2))
#> Array
#> <decimal128(10, 2)>
#> [
#>   1.00
#> ]

Array$create(1, type = decimal128(10, 2))
#> Array
#> <decimal128(10, 2)>
#> [
#>   1.00
#> ]
Array$create(1, type = decimal256(10, 2))
#> Array
#> <decimal256(10, 2)>
#> [
#>   1.00
#> ]

Created on 2023-01-05 with reprex v2.0.2

TODO: test!

Closes: [R] Implement RPrimitiveConverter for Decimal type #27494

github-actions · 2023-01-05T19:48:54Z

https://issues.apache.org/jira/browse/ARROW-11631

github-actions · 2023-01-05T19:48:55Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

ianmcook · 2023-01-05T22:02:24Z

Awesomesauce! Thanks @paleolimbot!

thisisnic · 2023-01-11T10:58:12Z

@paleolimbot I've pushed a test here; is this ready for a final review now?

paleolimbot

Awesome!

Just a few comments to test all the branches of the C++. I'm going to try to solicit a review of the C++ too since this is my first time using the Converter API in any meaningful way.

paleolimbot · 2023-01-11T13:46:46Z

r/tests/testthat/test-Array.R

+  decimal_array <- Array$create(1, type = decimal128(10, 2))
+  decimal_array2 <- Array$create(1, type = decimal256(10, 2))


Could the test Array include an NA and something whose precision might get truncated (e.g., Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))? (The NA is to get full test coverage since there's a branch for nulls in the C++; the truncation is to check that the type gets passed through properly.)

The purpose of decimal_array2 isn't clear to me here...am I missing something?

The purpose of decimal_array2 isn't clear to me here...am I missing something?

Didn't you write this? My interpretation is that decimal_array2 is meant to verify the decimal256 type works (in addition to the 128 bit variant).

Could the test Array include an NA and something whose precision might get truncated (e.g., Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))?

I think truncation is less interesting to me than NA. Testing truncation would just be verifying that FromReal works correctly and we can assume it does (or, if we have concerns, should test it elsewhere). Testing NA will test that R's NA is getting properly recognized (e.g. adding coverage for append_null).

I missed the 128/256 difference!
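To make the two proposed test inputs concrete, here is a minimal sketch in plain Python (standard library only, not Arrow code) of what a scale-2 conversion of `c(1, 1 / 3, NA)` would be expected to look like. The helper name `to_fixed_scale`, the use of `None` as a stand-in for R's `NA`, and the half-even rounding mode are all assumptions made for illustration; the thread does not say how `FromReal` rounds.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def to_fixed_scale(values, scale):
    """Convert floats to Decimals with `scale` fractional digits.

    Hypothetical helper: None stands in for R's NA and is passed
    through as a null. The rounding mode is an assumption.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    out = []
    for v in values:
        if v is None:
            out.append(None)  # the "null branch" the NA test would exercise
        else:
            out.append(Decimal(v).quantize(quantum, rounding=ROUND_HALF_EVEN))
    return out

converted = to_fixed_scale([1.0, 1 / 3, None], scale=2)
print(converted)
```

The middle value loses its trailing digits (the truncation case), while the last value exercises the separate null path, which is why the two cases cover different branches.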

paleolimbot · 2023-01-11T13:48:01Z

r/tests/testthat/test-Array.R

+  decimal_array <- Array$create(1, type = decimal128(10, 2))
+  decimal_array2 <- Array$create(1, type = decimal256(10, 2))
+
+  expect_equal(


Because there's a branch for ALTREP, maybe decimal_array2 could be Array$create(1:10, type = decimal128(10, 2))?

(In trying my own example here I'm realizing that I didn't consider integer arrays...hang tight and I'll add it to the C++!)

Hmm, interesting...I now get this result:

> Array$create(c(1:10, NA))$cast(decimal128(10, 2))
Error: Invalid: Precision is not great enough for the result. It should be at least 12
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:828  kernel_->exec(kernel_ctx_, input, out)
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:796  ExecuteSingleSpan(input, &output)
/home/nic2/arrow/cpp/src/arrow/compute/function.cc:276  executor->Execute(input, &listener)

Looks like a bug, given that 10 significant digits is more than enough to represent the integers 1 through 10.

Will update the tests to use 12 and open a ticket.
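One possible reading of where the number 12 comes from, offered as an assumption rather than a confirmed diagnosis: if the cast checks the precision against the widest value a 32-bit integer can hold (rather than the values actually present), then 10 digits for the integer part plus a scale of 2 gives 12. The sketch below only does that arithmetic; the function names are hypothetical and nothing here calls Arrow.

```python
def max_decimal_digits(bit_width: int) -> int:
    """Digits needed to write the largest signed integer of this width."""
    return len(str(2 ** (bit_width - 1) - 1))

def min_precision_for_cast(bit_width: int, scale: int) -> int:
    """Assumed rule: integer digits plus fractional digits (the scale)."""
    return max_decimal_digits(bit_width) + scale

# R integers are 32-bit, and the target type in the thread has scale 2.
needed = min_precision_for_cast(32, 2)
print(needed)
```

Under that assumption, `decimal128(12, 2)` is the smallest type that passes for a 32-bit integer input, which matches the value reported in the error message.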

westonpace

A few thoughts

r/src/r_to_arrow.cpp

westonpace · 2023-01-11T14:20:16Z

r/tests/testthat/test-Array.R

+  decimal_array <- Array$create(1, type = decimal128(10, 2))
+  decimal_array2 <- Array$create(1, type = decimal256(10, 2))


The purpose of decimal_array2 isn't clear to me here...am I missing something?

Didn't you write this? My interpretation is that decimal_array2 is meant to verify the decimal256 type works (in addition to the 128 bit variant).

Could the test Array include an NA and something whose precision might get truncated (e.g., Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))?

I think truncation is less interesting to me than NA. Testing truncation would just be verifying that FromReal works correctly and we can assume it does (or, if we have concerns, should test it elsewhere). Testing NA will test that R's NA is getting properly recognized (e.g. adding coverage for append_null).

westonpace · 2023-01-11T14:20:55Z

r/tests/testthat/test-Array.R

  delete_arrow_array(array_ptr)
 })
+
+test_that("direct creation of Decimal Arrays (ARROW-11631)", {


I'm a little confused. I expected to see an R array of doubles get converted to an Arrow array. I didn't expect to see decimals get created directly.

I'm not sure I know what the distinction is between those two things!

Well, in python it would be something like:

import decimal
import pyarrow as pa

# create an array of python decimals
x = [decimal.Decimal('12.34'), decimal.Decimal('23.45')]
# convert to an array of arrow decimals
arr = pa.array(x)
# arr
# <pyarrow.lib.Decimal128Array object at 0x7fd6a40dd120>
# [
#   12.34,
#   23.45
# ]

It looks like there are tests like this in test-Array.R:

test_that("Integer Array", {
  ints <- c(1:10, 1:10, 1:5)
  x <- expect_array_roundtrip(ints, int32())
})

Also in test-Array.R I see something like...

expect_error(
  Array$create(as.double(1:10), type = decimal(4, 2)),
  "You might want to try casting manually"
)

Shouldn't this work now?

Shouldn't this work now?

It does...that test fails and should be updated!

x = [decimal.Decimal('12.34'), decimal.Decimal('23.45')]

Maybe the difference is that R doesn't have decimals (only reals)? Perhaps a less ambiguous title for the test would be "can convert R integer/double to decimal"?

True, you can't do a proper round trip if there are no native decimals. I see now my confusion. I read Array$create(1, type = decimal128(10, 2)) too quickly and thought it was creating an Arrow scalar from a single R value. I didn't realize it was treating 1 as an array of size 1.
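The distinction between a native decimal and a real can be seen with Python's standard `decimal` module alone. This is a small illustration, not Arrow or R code: a value built from a decimal string is exact, while the same literal taken from a double carries binary floating-point error until it is rounded to the target scale.

```python
from decimal import Decimal

exact = Decimal("12.34")    # native decimal: exactly 12.34
from_double = Decimal(12.34)  # exact expansion of the nearest double

# The double is only an approximation of 12.34.
print(exact == from_double)

# Rounding to two fractional digits (scale 2) recovers the intended value.
rounded = from_double.quantize(Decimal("0.01"))
print(rounded == exact)
```

This is why a conversion from R doubles has to take the target scale from the requested type: the double by itself does not say how many fractional digits were intended.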

Co-authored-by: Weston Pace <weston.pace@gmail.com>

paleolimbot · 2023-01-11T16:00:36Z

@thisisnic I think there are also some tests that were counting on decimal conversion to fail, which need updating:

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-Array.R:873'): Array$create() should have helpful error ──────
`Array$create(as.double(1:10), type = decimal(4, 2))` did not throw the expected error.
── Failure ('test-Array.R:878'): Array$create() should have helpful error ──────
`Array$create(1:10, type = decimal(12, 2))` did not throw the expected error.

thisisnic · 2023-01-12T16:12:19Z

Looks like something is broken here:

> Array$create(c(1:10, NA), type = decimal128(12, 2))
Array
<decimal128(12, 2)>
[
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00
]

paleolimbot · 2023-01-13T15:21:02Z

@thisisnic I think you were seeing that as a result of the force-push that overwrote the C++ I added to handle integers. I know you're off today so I merged both branches and added some tests to get this in before feature freeze 🙂

github-actions · 2023-01-15T03:07:56Z

Closes: [R] Implement RPrimitiveConverter for Decimal type #27494

ursabot · 2023-01-15T15:50:30Z

Benchmark runs are scheduled for baseline = fbcaee1 and contender = 19e459f. 19e459f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.3% ⬆️0.03%] test-mac-arm
[Finished ⬇️1.53% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.44% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 19e459f2 ec2-t3-xlarge-us-east-2
[Finished] 19e459f2 test-mac-arm
[Finished] 19e459f2 ursa-i9-9960x
[Finished] 19e459f2 ursa-thinkcentre-m75q
[Finished] fbcaee1e ec2-t3-xlarge-us-east-2
[Finished] fbcaee1e test-mac-arm
[Finished] fbcaee1e ursa-i9-9960x
[Finished] fbcaee1e ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2023-01-15T15:57:10Z

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm
ursa-i9-9960x

thisisnic · 2023-01-16T10:39:03Z

@thisisnic I think you were seeing that as a result of the force-push that overwrote the C++ I added to handle integers. I know you're off today so I merged both branches and added some tests to get this in before feature freeze 🙂

Thanks for making that update, and sorry about the overwrite, brain fart!

add decimal extend method

b7c8f51

github-actions bot added the Component: R label Jan 5, 2023

asfimport mentioned this pull request Jan 5, 2023

[R] Implement RPrimitiveConverter for Decimal type #27494

Closed

Add test for decimal array creation

2e1368e

paleolimbot marked this pull request as ready for review January 11, 2023 13:43

paleolimbot commented Jan 11, 2023


westonpace reviewed Jan 11, 2023


paleolimbot and others added 3 commits January 11, 2023 11:08

Update r/src/r_to_arrow.cpp

3c1c07a

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Update r/src/r_to_arrow.cpp

5bf84e2

Co-authored-by: Weston Pace <weston.pace@gmail.com>

format + make sure integers work

3b62f99

thisisnic force-pushed the decimal-extend branch from 3b62f99 to 2e1368e Compare January 12, 2023 14:56

expand test coverage

68aff97

paleolimbot force-pushed the decimal-extend branch from c1ccfed to 68aff97 Compare January 13, 2023 15:14

fix equality

1b33cce

paleolimbot changed the title ~~ARROW-11631: [R] Implement RPrimitiveConverter for Decimal type~~ GH-27494: [R] Implement RPrimitiveConverter for Decimal type Jan 15, 2023

paleolimbot merged commit 19e459f into apache:master Jan 15, 2023

paleolimbot deleted the decimal-extend branch January 16, 2023 14:37
