-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9405: [R] Switch to cpp11 #7819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to see this come together. The merge conflict with master is probably that I deleted a cpp function and you adapted it to cpp11.
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some style/lint comments
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more style/lint comments
|
@romainfrancois please rebase, it has some conflicts with the master. |
|
Since this has a bunch of merge commits it will have to be |
|
@romainfrancois I squashed, be sure to |
ba72863 to
d4117ae
Compare
r/R/schema.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If cpp11 is doing what it claims wrt unicode, you can remove this and just
names = function() Schema__field_names(self),There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test ?
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes. I think you can also grep for Rf_translateCharUTF8 in r/src and remove those coercions since cpp11 should handle that automatically
|
@github-actions autotune everything |
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a thorough review, just scanned for a few things to make sure to resolve before we merge
r/DESCRIPTION
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we're ready to merge this, we should vendor cpp11. I made https://issues.apache.org/jira/browse/ARROW-9786 to go back to the released cpp11 before we do the 2.0.0 release (October-ish).
|
I'm trying to debug what's wrong with the python tests that I previously silenced in this PR: That looks fine: library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
str(py)
#> [
#> 1,
#> 2,
#> 3
#> ]
class(py)
#> [1] "pyarrow.lib.DoubleArray" "pyarrow.lib.FloatingPointArray"
#> [3] "pyarrow.lib.NumericArray" "pyarrow.lib.Array"
#> [5] "pyarrow.lib._PandasConvertible" "python.builtin.object"But then if I also load library(reticulate)
library(arrow)
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3)) |
|
The error seems to happen on this call: x$`_export_to_c`(array_ptr, schema_ptr)although there seems to be something there: Browse[2]> x$`_export_to_c`
<built-in method _export_to_c of pyarrow.lib.DoubleArray> |
|
And I can confirm that this works on master locally: library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
py
#> Array
#> <double>
#> [
#> 1,
#> 2,
#> 3
#> ]Created on 2020-08-19 by the reprex package (v0.3.0.9001) More investigation needed. |
r/src/arrow_cpp11.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these utilities are necessary? translateCharUTF8 also checks IS_UTF8 and IS_ASCII and exits early: https://github.com/wch/r-source/blob/122dcf452ec5eacdd66e165457985bded1af4fee/src/main/sysutils.c#L1085
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They might not all be necessary, I'm trying to avoid the Rf_mkChar(Rf_translateCharUTF8()) combo, which would unnecessarily make a SEXP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is thin for now, but this might evolve into perhaps something like cpp11::utf8_strings class that would eagerly convert at construction time.
…pp11::warning() that fails locally for some reason
…tring to std::string)
- TraverseDotsNoName
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pulled this and ran locally, and compile time (single-threaded) for the r/src bindings is cut in half: master takes 93 seconds on my machine, and this takes 48 seconds.
Thanks for doing this!
No description provided.