ARROW-3591: [R] Support for collecting decimal types #2819
romainfrancois left a comment:
Wondering whether there is a way to keep the full decimal information rather than converting to a double vector, perhaps inspired by the rational example in vctrs: https://vctrs.r-lib.org/articles/s3-vector.html#rational
Codecov Report
@@              Coverage Diff               @@
##            master    #2819       +/-    ##
===========================================
+ Coverage    73.41%   88.61%    +15.19%
===========================================
  Files           62      345       +283
  Lines         4010    58886     +54876
===========================================
+ Hits          2944    52181     +49237
- Misses         992     6705      +5713
+ Partials        74        0        -74
🤔 This is somewhat confusing, judging from arrow/cpp/src/arrow/util/decimal.h (line 35 in 42cf69a), but the tests confirm that this is indeed a decimal; see arrow/cpp/src/arrow/util/decimal-test.cc (line 307 in e0f70bb):

```cpp
INSTANTIATE_TEST_CASE_P(Decimal128PrintingTest, Decimal128PrintingTest,
                        ::testing::Values(std::make_tuple(123, 1, "12.3"),
                                          std::make_tuple(123, 5, "0.00123"),
                                          std::make_tuple(123, 10, "1.23E-8"),
                                          std::make_tuple(123, -1, "1.23E+3"),
                                          std::make_tuple(-123, -1, "-1.23E+3"),
                                          std::make_tuple(123, -3, "1.23E+5"),
                                          std::make_tuple(-123, -3, "-1.23E+5"),
                                          std::make_tuple(12345, -3, "1.2345E+7")));
```

@hadley would this be a good example of a …? Can I assume that if we do the "right" thing and implement this as a record, then the complex vector would only be an implementation detail that would not be easy for the user to get to?
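The test cases quoted from decimal-test.cc pair an unscaled integer with a scale, and each expected string is the value unscaled × 10^(−scale); a negative scale shifts the decimal point to the right (123 at scale −1 is 1230, printed as 1.23E+3). A minimal sketch of that arithmetic follows. It assumes the unscaled value fits in an int64 (Arrow's Decimal128 holds 128 bits) and that the scale is small; `Pow10` and `DecimalToDouble` are hypothetical helpers, not part of Arrow's API.

```cpp
#include <cassert>
#include <cstdint>

// Power of ten by repeated multiplication; exact as a double for n <= 22.
// Hypothetical helper; assumes a small non-negative exponent.
double Pow10(int n) {
  assert(n >= 0);
  double result = 1.0;
  for (int i = 0; i < n; ++i) result *= 10.0;
  return result;
}

// Hypothetical helper (not Arrow's API): value = unscaled * 10^(-scale).
// Dividing for positive scales means a single rounding step when both the
// unscaled value and the power of ten are exact doubles.
double DecimalToDouble(int64_t unscaled, int32_t scale) {
  double u = static_cast<double>(unscaled);
  return scale >= 0 ? u / Pow10(scale) : u * Pow10(-scale);
}
```

For example, `DecimalToDouble(123, 1)` gives 12.3 and `DecimalToDouble(12345, -3)` gives 12345000, matching the "12.3" and "1.2345E+7" cases.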
Generally speaking, would things like this and int64 be safer if implemented as a …?

Yes, definitely.

Great. @javierluraschi I'll play around with a record-based implementation for decimal128.
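One possible reading of the complex-vector idea raised in this thread: an R complex element is two doubles (16 bytes), the same width as a 128-bit decimal, so the raw bytes could be stored losslessly and hidden behind a record class. The sketch below only illustrates that byte-level round trip. `RawDecimal128`, `PackIntoComplex`, `UnpackFromComplex`, and `CheckRoundTrip` are hypothetical names, and the two-word layout is an assumption for illustration, not Arrow's actual class.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit decimal storage: two 64-bit words (assumed layout).
struct RawDecimal128 {
  uint64_t low;
  int64_t high;
};

// A complex<double> (like an R complex element) must be exactly 16 bytes too.
static_assert(sizeof(std::complex<double>) == sizeof(RawDecimal128),
              "complex<double> and RawDecimal128 must both be 16 bytes");

// Copy the raw bytes into a complex slot; the doubles are never used as
// numbers, so any bit pattern (including NaNs) is just opaque storage.
void PackIntoComplex(const RawDecimal128& d, std::complex<double>* slot) {
  std::memcpy(reinterpret_cast<double*>(slot), &d, sizeof d);
}

// Recover the original words from the complex slot.
RawDecimal128 UnpackFromComplex(const std::complex<double>& slot) {
  RawDecimal128 d;
  std::memcpy(&d, reinterpret_cast<const double*>(&slot), sizeof d);
  return d;
}

// Sanity check: packing then unpacking returns exactly the original words.
void CheckRoundTrip(const RawDecimal128& d) {
  std::complex<double> slot;
  PackIntoComplex(d, &slot);
  RawDecimal128 back = UnpackFromComplex(slot);
  assert(back.low == d.low && back.high == d.high);
}
```

The byte copy is lossless by construction; the cost is that the stored "complex" values are meaningless to R arithmetic, which is why a record class would need to keep them out of the user's reach.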
@romainfrancois one thing to mention is that, while in some cases a true decimal type might be needed (similarly to int64s), the use cases I've seen work much better by casting to a native data type in R, say …. So I'm still proposing we merge this and then enhance it with support for …. For …. BTW, I'm thinking of something along the lines of: …

We could also do both:

@romainfrancois I'm hoping I don't need to cast each column with …. The reasoning behind configuring this in the … So, net-net, I would prefer to use …
For int64, we have to copy what the DBI package does in order to be consistent.

From …

My understanding is that by default we should return an …. @javierluraschi, that does not mean that the user has to explicitly use the …. We might have to consider changing the …

Fair enough, so for decimals, since …
I would rather have something closer to the Arrow layout initially (lossless), with the possibility of converting to doubles afterwards.
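Why a lossless layout matters here: a double has a 53-bit significand, so once an unscaled value exceeds 2^53 not every integer is representable, and converting to double can silently change the digits. A minimal sketch of that check, assuming the unscaled value fits in an int64 (a 128-bit decimal can hold far larger values); `SurvivesDoubleRoundTrip` and `DemonstratePrecisionLoss` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when int64 -> double -> int64 recovers the same unscaled value.
bool SurvivesDoubleRoundTrip(int64_t unscaled) {
  double d = static_cast<double>(unscaled);
  // Values near INT64_MAX round up to 2^63, which does not fit back in int64.
  if (d >= 9223372036854775808.0) return false;
  return static_cast<int64_t>(d) == unscaled;
}

// 2^53 + 1 rounds to 2^53, so the unscaled digits of a decimal holding that
// value cannot be recovered once it has been collected as a double.
void DemonstratePrecisionLoss() {
  const int64_t two_pow_53 = int64_t{1} << 53;
  assert(SurvivesDoubleRoundTrip(two_pow_53));
  assert(!SurvivesDoubleRoundTrip(two_pow_53 + 1));
}
```

Small unscaled values such as 12345 survive the round trip, which is consistent with the thread's observation that casting to double works well for many practical use cases.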
That's fine, but can we merge this one and then replace it with something better?

That’s fine by me, as long as you realise this will likely change.

Yes, that's totally fine with me.

@romainfrancois: The native DBI backends return an …

@wesm merge please? Looks like Romain and I are both 👍 on getting this merged.

I've merged!

Fix for https://issues.apache.org/jira/browse/ARROW-3591, implemented following the Python string-to-decimal conversion: https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L723.

Note: we could consider a more efficient conversion from Arrow's decimal to R's integer or numeric types, depending on the decimal's precision and scale. However, my guess is that this has been explored before for other languages and that there is no straightforward memcpy() version of cross-language decimal conversion, since the internal representation is language-specific.

Bonus: added num_fields() and field() accessors to the schema wrapper, which can help diagnose schema issues.
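A string-based route like the one described above can be sketched as: render the decimal in plain notation, then parse it as a double. This assumes a 64-bit unscaled value (Arrow's decimal is 128-bit) and uses hypothetical helpers, `UnscaledToString` and `DecimalToDoubleViaString`, that are not part of Arrow's or the R package's API. Unlike Arrow's own formatting, which the tests show can emit exponents such as "1.23E-8", this sketch always writes plain digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: render unscaled * 10^(-scale) in plain notation,
// e.g. (123, 1) -> "12.3", (123, 5) -> "0.00123", (123, -2) -> "12300".
std::string UnscaledToString(int64_t unscaled, int32_t scale) {
  bool negative = unscaled < 0;
  // Take the magnitude as unsigned so INT64_MIN does not overflow.
  uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  if (scale <= 0) {
    // Non-positive scale: append zeros (shift the decimal point right).
    digits.append(static_cast<std::size_t>(-static_cast<int64_t>(scale)), '0');
  } else {
    // Positive scale: left-pad so at least one digit precedes the point.
    std::size_t s = static_cast<std::size_t>(scale);
    if (digits.size() <= s) digits.insert(0, s - digits.size() + 1, '0');
    digits.insert(digits.size() - s, 1, '.');
  }
  return negative ? "-" + digits : digits;
}

// Convert by formatting first, then parsing with strtod.
double DecimalToDoubleViaString(int64_t unscaled, int32_t scale) {
  std::string text = UnscaledToString(unscaled, scale);
  char* end = nullptr;
  double value = std::strtod(text.c_str(), &end);
  assert(end == text.c_str() + text.size());  // the whole string was parsed
  return value;
}
```

Going through text costs an allocation and a parse per value, but it sidesteps any dependence on the in-memory decimal layout, which fits the note that there is no simple memcpy() path between languages.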