Skip to content

Conversation

@romainfrancois
Copy link
Contributor

@romainfrancois romainfrancois commented Jul 22, 2020

No description provided.

@nealrichardson nealrichardson changed the title [R]: Switch from Rcpp to cpp11 ARROW-9405: [R] Switch to cpp11 Jul 22, 2020
@github-actions
Copy link

@bkietz bkietz requested review from bkietz and nealrichardson and removed request for nealrichardson July 23, 2020 13:07
Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see this come together. The merge conflict with master is probably that I deleted a cpp function and you adapted it to cpp11.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some style/lint comments

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more style/lint comments

@kszucs
Copy link
Member

kszucs commented Jul 27, 2020

@romainfrancois please rebase, it has some conflicts with the master.

@wesm
Copy link
Member

wesm commented Jul 28, 2020

Since this has a bunch of merge commits it will have to be git merge --squash'd to clean things up, let me know if I can help

@wesm
Copy link
Member

wesm commented Jul 28, 2020

@romainfrancois I squashed, be sure to git fetch origin && git reset --hard origin/cpp11

@romainfrancois romainfrancois force-pushed the cpp11 branch 2 times, most recently from ba72863 to d4117ae Compare August 4, 2020 13:53
r/R/schema.R Outdated
Comment on lines 86 to 88
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cpp11 is doing what it claims wrt unicode, you can remove this and just

names = function() Schema__field_names(self),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test ?

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes. I think you can also grep for Rf_translateCharUTF8 in r/src and remove those coercions since cpp11 should handle that automatically

@nealrichardson
Copy link
Member

@github-actions autotune everything

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a thorough review, just scanned for a few things to make sure to resolve before we merge

r/DESCRIPTION Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're ready to merge this, we should vendor cpp11. I made https://issues.apache.org/jira/browse/ARROW-9786 to go back to the released cpp11 before we do the 2.0.0 release (October-ish).

@romainfrancois
Copy link
Contributor Author

I'm trying to debug what's wrong with the python tests that I previously silenced in this PR:

That looks fine:

library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
str(py)
#> [
#>   1,
#>   2,
#>   3
#> ]
class(py)
#> [1] "pyarrow.lib.DoubleArray"        "pyarrow.lib.FloatingPointArray"
#> [3] "pyarrow.lib.NumericArray"       "pyarrow.lib.Array"             
#> [5] "pyarrow.lib._PandasConvertible" "python.builtin.object"

But then if I also load arrow I'm getting a segfault on pa$array() call:

library(reticulate)
library(arrow)
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))

@romainfrancois
Copy link
Contributor Author

romainfrancois commented Aug 19, 2020

The error seems to happen on this call:

x$`_export_to_c`(array_ptr, schema_ptr)

although there seems to be something there:

Browse[2]> x$`_export_to_c`
<built-in method _export_to_c of pyarrow.lib.DoubleArray>

@romainfrancois
Copy link
Contributor Author

And I can confirm that this works on master locally:

library(reticulate)
#> Warning: package 'reticulate' was built under R version 4.0.2
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
pa <- reticulate::import("pyarrow")
py <- pa$array(c(1, 2, 3))
py
#> Array
#> <double>
#> [
#>   1,
#>   2,
#>   3
#> ]

Created on 2020-08-19 by the reprex package (v0.3.0.9001)

More investigation needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these utilities are necessary? translateCharUTF8 also checks IS_UTF8 and IS_ASCII and exits early: https://github.com/wch/r-source/blob/122dcf452ec5eacdd66e165457985bded1af4fee/src/main/sysutils.c#L1085

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They might not all be necessary, I'm trying to avoid the Rf_mkChar(Rf_translateCharUTF8()) combo, which would unnecessarily make a SEXP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is thin for now, but this might evolve into perhaps something like cpp11::utf8_strings class that would eagerly convert at construction time.

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pulled this and ran locally, and compile time (single-threaded) for the r/src bindings is cut in half: master takes 93 seconds on my machine, and this takes 48 seconds.

Thanks for doing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants