paleolimbot (Member) commented Jan 5, 2023

Implements conversion from R to Decimal128/256 in such a way that it works with Array$create().

Before this PR:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
Array$create(1)$cast(decimal(10, 2))
#> Array
#> <decimal128(10, 2)>
#> [
#>   1.00
#> ]

Array$create(1, type = decimal128(10, 2))
#> Error in `value[[3L]]()`:
#> ! NotImplemented: Extend
#> ℹ You might want to try casting manually with `Array$create(...)$cast(...)`.

#> Backtrace:
#>     ▆
#>  1. └─Array$create(1, type = decimal128(10, 2))
#>  2.   └─base::tryCatch(...) at r/R/array.R:198:2
#>  3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  5.         └─value[[3L]](cond)
#>  6.           └─rlang::abort(...) at r/R/array.R:202:6
Array$create(1, type = decimal256(10, 2))
#> Error in `value[[3L]]()`:
#> ! NotImplemented: Extend
#> ℹ You might want to try casting manually with `Array$create(...)$cast(...)`.

#> Backtrace:
#>     ▆
#>  1. └─Array$create(1, type = decimal256(10, 2))
#>  2.   └─base::tryCatch(...) at r/R/array.R:198:2
#>  3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  5.         └─value[[3L]](cond)
#>  6.           └─rlang::abort(...) at r/R/array.R:202:6

Created on 2023-01-05 with reprex v2.0.2

After this PR:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
Array$create(1)$cast(decimal(10, 2))
#> Array
#> <decimal128(10, 2)>
#> [
#>   1.00
#> ]

Array$create(1, type = decimal128(10, 2))
#> Array
#> <decimal128(10, 2)>
#> [
#>   1.00
#> ]
Array$create(1, type = decimal256(10, 2))
#> Array
#> <decimal256(10, 2)>
#> [
#>   1.00
#> ]

Created on 2023-01-05 with reprex v2.0.2

TODO: test!

github-actions bot commented Jan 5, 2023

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

ianmcook (Member) commented Jan 5, 2023

Awesomesauce! Thanks @paleolimbot!

thisisnic (Member) commented Jan 11, 2023

@paleolimbot I've pushed a test here; is this ready for a final review now?

paleolimbot marked this pull request as ready for review January 11, 2023 13:43

paleolimbot (Member Author) left a comment

Awesome!

Just a few comments to test all the branches of the C++. I'm going to try to solicit a review of the C++ too since this is my first time using the Converter API in any meaningful way.

Comment on lines 1327 to 1328
decimal_array <- Array$create(1, type = decimal128(10, 2))
decimal_array2 <- Array$create(1, type = decimal256(10, 2))
Member Author

Could the test Array include an NA and something whose precision might get truncated (e.g., Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))? (The NA is to get full test coverage, since there's a branch for nulls in the C++; the truncation is to check that the type gets passed through properly.)

The purpose of decimal_array2 isn't clear to me here...am I missing something?

Member

> The purpose of decimal_array2 isn't clear to me here...am I missing something?

Didn't you write this? My interpretation is that decimal_array2 is meant to verify that the decimal256 type works (in addition to the 128-bit variant).

> Could the test Array include an NA and something whose precision might get truncated (e.g., Array$create(c(1, 1 / 3, NA), type = decimal128(10, 2)))?

Truncation is less interesting to me than NA. Testing truncation would just verify that FromReal works correctly, and we can assume it does (or, if we have concerns, we should test it elsewhere). Testing NA checks that R's NA is properly recognized (e.g. adding coverage for append_null).
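For reference, a minimal sketch of the NA-focused check described above. It assumes the arrow R package is installed and includes this PR's conversion (builds without it raise the NotImplemented error shown in the PR description, which the sketch treats as "nothing to check"). The helper name `check_decimal_na` is hypothetical and not part of the PR.

```r
# Hypothetical helper, not part of the PR: builds a decimal Array directly
# from an R double vector and reports how many Arrow nulls it contains.
check_decimal_na <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NULL) # arrow is not installed; nothing to check
  }
  # c(1, NA) is a double vector of length 2; the NA should become an Arrow null
  arr <- tryCatch(
    arrow::Array$create(c(1, NA), type = arrow::decimal128(10, 2)),
    error = function(e) NULL # builds without this PR cannot convert directly
  )
  if (is.null(arr)) {
    return(NULL)
  }
  list(type = arr$type$ToString(), null_count = arr$null_count)
}

res <- check_decimal_na()
if (!is.null(res)) {
  stopifnot(res$null_count == 1)
  print(res)
}
```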

Member Author

I missed the 128/256 difference!

decimal_array <- Array$create(1, type = decimal128(10, 2))
decimal_array2 <- Array$create(1, type = decimal256(10, 2))

expect_equal(
Member Author

Because there's a branch for ALTREP, maybe decimal_array2 could be Array$create(1:10, type = decimal128(10, 2))?

Member Author

(In trying my own example here I'm realizing that I didn't consider integer arrays...hang tight and I'll add it to the C++!)
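The integer case is easy to overlook because `1:10` and `as.double(1:10)` print the same values but are stored with different types, which is presumably why integers needed their own handling in the C++. A minimal base-R sketch of the distinction (no arrow required):

```r
ints <- 1:10
dbls <- as.double(1:10)

typeof(ints)  # "integer"
typeof(dbls)  # "double"

# Equal in value, but not identical objects because the storage types differ
all(ints == dbls)      # TRUE
identical(ints, dbls)  # FALSE

# Appending NA to an integer vector keeps it an integer vector
with_na <- c(1:10, NA)
typeof(with_na)        # "integer"
```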

Member

Hmm, interesting...I now get this result:

> Array$create(c(1:10, NA))$cast(decimal128(10, 2))
Error: Invalid: Precision is not great enough for the result. It should be at least 12
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:828  kernel_->exec(kernel_ctx_, input, out)
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:796  ExecuteSingleSpan(input, &output)
/home/nic2/arrow/cpp/src/arrow/compute/function.cc:276  executor->Execute(input, &listener)

Looks like a bug, given that 10 significant digits is more than enough to represent the integers 1 through 10.
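One possible reading of the "at least 12" in the error, offered as an assumption rather than a confirmed diagnosis: the cast may size the decimal for the widest value the source integer type can hold, not for the values actually present. R integers are 32-bit, so the arithmetic under that assumption can be sketched in base R:

```r
# Digits in the largest 32-bit integer R can store (2147483647)
int_digits <- nchar(format(.Machine$integer.max, scientific = FALSE))

# Digits reserved after the decimal point in decimal128(10, 2)
scale <- 2L

# Precision needed to hold any 32-bit integer at that scale
min_precision <- int_digits + scale
min_precision
#> [1] 12
```

Under this reading, `decimal128(12, 2)` is the smallest type that accepts an arbitrary integer vector at scale 2, which matches the number in the message.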

Member

Will update the tests to use 12 and open a ticket.

westonpace (Member) left a comment

A few thoughts

delete_arrow_array(array_ptr)
})

test_that("direct creation of Decimal Arrays (ARROW-11631)", {
Member

I'm a little confused. I expected to see an R array of doubles get converted to an Arrow array. I didn't expect to see decimals get created directly.

Member Author

I'm not sure I know what the distinction is between those two things!

Member

Well, in Python it would be something like:

import decimal
import pyarrow as pa

# create an array of Python decimals
x = [decimal.Decimal('12.34'), decimal.Decimal('23.45')]
# convert to an array of Arrow decimals
arr = pa.array(x)
# arr
# <pyarrow.lib.Decimal128Array object at 0x7fd6a40dd120>
# [
#   12.34,
#   23.45
# ]

It looks like there are tests like this in test-Array.R:

test_that("Integer Array", {
  ints <- c(1:10, 1:10, 1:5)
  x <- expect_array_roundtrip(ints, int32())
})

Also in test-Array.R I see something like:

  expect_error(
    Array$create(as.double(1:10), type = decimal(4, 2)),
    "You might want to try casting manually"
  )

Shouldn't this work now?

Member Author

> Shouldn't this work now?

It does...that test fails and should be updated!

> x = [decimal.Decimal('12.34'), decimal.Decimal('23.45')]

Maybe the difference is that R doesn't have decimals (only reals)? Perhaps a less ambiguous title for the test would be "can convert R integer/double to decimal"?
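To illustrate the point about R having only reals, here is a short base-R sketch. It shows that a literal such as `12.34` is stored as a binary double, so the value held in memory is only an approximation of the decimal written in the source:

```r
x <- 12.34
x_type <- typeof(x)
x_type                      # "double"; base R has no decimal type

# Printing extra digits exposes the binary approximation behind the literal
sprintf("%.20f", x)

# The usual consequence: decimal identities do not hold exactly
sum_matches <- isTRUE(0.1 + 0.2 == 0.3)
sum_matches                 # FALSE
```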

Member

True, you can't do a proper round trip if there are no native decimals. I see now my confusion. I read Array$create(1, type = decimal128(10, 2)) too quickly and thought it was creating an Arrow scalar from a single R value. I didn't realize it was treating 1 as an array of size 1.
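The "array of size 1" reading follows from how R itself represents values: there is no separate scalar type, so a bare `1` is already a vector. A minimal base-R sketch:

```r
x <- 1

is.vector(x)          # TRUE: a bare number is a vector
length(x)             # 1
typeof(x)             # "double"
identical(x, c(1))    # TRUE: c(1) builds the same length-one vector
```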

paleolimbot and others added 3 commits January 11, 2023 11:08
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
paleolimbot (Member Author) commented

@thisisnic I think there are also some tests that were counting on decimal conversion to fail, which need updating:

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-Array.R:873'): Array$create() should have helpful error ──────
`Array$create(as.double(1:10), type = decimal(4, 2))` did not throw the expected error.
── Failure ('test-Array.R:878'): Array$create() should have helpful error ──────
`Array$create(1:10, type = decimal(12, 2))` did not throw the expected error.

thisisnic (Member) commented

Looks like something is broken here:

> Array$create(c(1:10, NA), type = decimal128(12, 2))
Array
<decimal128(12, 2)>
[
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00,
  0.00
]

paleolimbot (Member Author) commented

@thisisnic I think you were seeing that as a result of the force-push that overwrote the C++ I added to handle integers. I know you're off today so I merged both branches and added some tests to get this in before feature freeze 🙂

paleolimbot changed the title from "ARROW-11631: [R] Implement RPrimitiveConverter for Decimal type" to "GH-27494: [R] Implement RPrimitiveConverter for Decimal type" on Jan 15, 2023

paleolimbot merged commit 19e459f into apache:master on Jan 15, 2023

ursabot commented Jan 15, 2023

Benchmark runs are scheduled for baseline = fbcaee1 and contender = 19e459f. 19e459f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.3% ⬆️0.03%] test-mac-arm
[Finished ⬇️1.53% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.44% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 19e459f2 ec2-t3-xlarge-us-east-2
[Finished] 19e459f2 test-mac-arm
[Finished] 19e459f2 ursa-i9-9960x
[Finished] 19e459f2 ursa-thinkcentre-m75q
[Finished] fbcaee1e ec2-t3-xlarge-us-east-2
[Finished] fbcaee1e test-mac-arm
[Finished] fbcaee1e ursa-i9-9960x
[Finished] fbcaee1e ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Jan 15, 2023

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm
ursa-i9-9960x

thisisnic (Member) commented

> @thisisnic I think you were seeing that as a result of the force-push that overwrote the C++ I added to handle integers. I know you're off today so I merged both branches and added some tests to get this in before feature freeze 🙂

Thanks for making that update, and sorry about the overwrite, brain fart!

paleolimbot deleted the decimal-extend branch January 16, 2023 14:37