ARROW-9001: [R] Box outputs as correct type in call_function #8256
romainfrancois wants to merge 43 commits into apache:master
Conversation
|
Turns out we just don't need to keep track of the result_type anymore, I think.
nealrichardson
left a comment
I love this and I want this everywhere :) (#8246 (comment))
Just one note, I think there's more code that can be deleted. You're right, with this change you don't need to track result_type, that was a total workaround.
r/R/compute.R
Outdated
This can be match_arrow.ChunkedArray <- match_arrow.Array now since they're the same. There are probably a few others like this (I see Ops also, for example).
We can go further when tackling this: https://issues.apache.org/jira/browse/ARROW-10089
I sort of assumed you would go there, as I was working on #8246 too. So in short, we would create the R6 object in C++ and not have to manipulate the external pointers in R? I can see the appeal, but I need to work out some logistics. Would we retain the same signatures? e.g.
// [[arrow::export]]
std::shared_ptr<arrow::DataType> Int8__initialize() { return arrow::int8(); }
Presently, before this goes to R, this is wrapped in the external pointer, so we have a |
Ideally, right?
Right, I think the most developer-friendly experience would be to pack the externalptr -> R6 logic into the as_sexp method. And yeah |
r/src/arrow_cpp11.h
Outdated
This seems like something we'd like to use cpp11::function for, but among other things that always calls the function in R_GlobalEnv
Perhaps this is more of a hypothetical cpp11::call thing, e.g.
cpp11::call call(cpp11::call(R_DollarSymbol, symbol, fun_symbol), xp);
cpp11::sexp result = call.eval(arrow::r::ns::arrow);
Then I guess cpp11::function could factor out some of its logic in cpp11::call.
r/src/compute.cpp
Outdated
Someday this should probably be
SEXP as_sexp(arrow::Datum datum)
so that we don't need to wrap datums in from_datum.
Would that still do the dispatch internally before reaching the R side, or would there be an R6 class for arrow::Datum? Probably the latter.
I don't think an R6 class for Datum has value, since SEXP already fills the role of a discriminated union of the members of Datum.
With the addition of R6, I amend my earlier comment: from_datum should probably be replaced by R6::R6(Datum).
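To make the "discriminated union" point concrete, here is a minimal sketch of dispatching on the active member of a Datum-like value. Everything in it is an assumption for illustration: `Datum` is a `std::variant` stand-in rather than the real `arrow::Datum`, the three structs are empty placeholders, and `datum_class_name` is a hypothetical helper that only returns the R6 class name instead of building the object.

```cpp
#include <memory>
#include <string>
#include <variant>

// Hypothetical stand-ins for the Arrow classes; not the real arrow:: types.
struct Scalar {};
struct Array {};
struct ChunkedArray {};

// Hypothetical stand-in for arrow::Datum: a tagged union over shared pointers.
using Datum = std::variant<std::shared_ptr<Scalar>, std::shared_ptr<Array>,
                           std::shared_ptr<ChunkedArray>>;

// One overload per member; a real version would construct the R6 object here.
inline std::string r6_class_name(const std::shared_ptr<Scalar>&) { return "Scalar"; }
inline std::string r6_class_name(const std::shared_ptr<Array>&) { return "Array"; }
inline std::string r6_class_name(const std::shared_ptr<ChunkedArray>&) {
  return "ChunkedArray";
}

// Dispatch on whichever member is active, so the R side never sees a Datum.
inline std::string datum_class_name(const Datum& datum) {
  return std::visit([](const auto& ptr) { return r6_class_name(ptr); }, datum);
}
```

With this shape, the caller hands over a Datum and gets back the concrete class, which is why a separate R6 class for Datum would add little.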
library(tidyverse)
brio::read_lines("~/git/apache/arrow/r/src/arrowExports.cpp") %>%
str_subset("^std::[a-z]+_ptr<") %>%
str_remove(" .*$") %>%
unique()
#> [1] "std::shared_ptr<arrow::Array>"
#> [2] "std::shared_ptr<arrow::DataType>"
#> [3] "std::shared_ptr<arrow::ArrayData>"
#> [4] "std::shared_ptr<arrow::ChunkedArray>"
#> [5] "std::shared_ptr<arrow::Buffer>"
#> [6] "std::shared_ptr<arrow::util::Codec>"
#> [7] "std::shared_ptr<arrow::io::CompressedOutputStream>"
#> [8] "std::shared_ptr<arrow::io::CompressedInputStream>"
#> [9] "std::shared_ptr<arrow::compute::CastOptions>"
#> [10] "std::shared_ptr<arrow::RecordBatch>"
#> [11] "std::shared_ptr<arrow::Table>"
#> [12] "std::shared_ptr<arrow::csv::ReadOptions>"
#> [13] "std::shared_ptr<arrow::csv::ParseOptions>"
#> [14] "std::shared_ptr<arrow::csv::ConvertOptions>"
#> [15] "std::shared_ptr<arrow::csv::TableReader>"
#> [16] "std::shared_ptr<ds::ScannerBuilder>"
#> [17] "std::shared_ptr<arrow::Schema>"
#> [18] "std::shared_ptr<ds::Dataset>"
#> [19] "std::shared_ptr<ds::UnionDataset>"
#> [20] "std::shared_ptr<ds::InMemoryDataset>"
#> [21] "std::shared_ptr<ds::FileFormat>"
#> [22] "std::shared_ptr<fs::FileSystem>"
#> [23] "std::shared_ptr<ds::DatasetFactory>"
#> [24] "std::shared_ptr<ds::ParquetFileFormat>"
#> [25] "std::shared_ptr<ds::IpcFileFormat>"
#> [26] "std::shared_ptr<ds::CsvFileFormat>"
#> [27] "std::shared_ptr<ds::Partitioning>"
#> [28] "std::shared_ptr<ds::PartitioningFactory>"
#> [29] "std::shared_ptr<ds::Scanner>"
#> [30] "std::shared_ptr<arrow::Field>"
#> [31] "std::shared_ptr<ds::Expression>"
#> [32] "std::shared_ptr<arrow::ipc::feather::Reader>"
#> [33] "std::shared_ptr<fs::FileSelector>"
#> [34] "std::shared_ptr<arrow::io::InputStream>"
#> [35] "std::shared_ptr<arrow::io::RandomAccessFile>"
#> [36] "std::shared_ptr<arrow::io::OutputStream>"
#> [37] "std::shared_ptr<fs::LocalFileSystem>"
#> [38] "std::shared_ptr<fs::SubTreeFileSystem>"
#> [39] "std::shared_ptr<fs::S3FileSystem>"
#> [40] "std::shared_ptr<arrow::io::MemoryMappedFile>"
#> [41] "std::shared_ptr<arrow::io::ReadableFile>"
#> [42] "std::shared_ptr<arrow::io::BufferReader>"
#> [43] "std::shared_ptr<arrow::io::FileOutputStream>"
#> [44] "std::shared_ptr<arrow::io::BufferOutputStream>"
#> [45] "std::shared_ptr<arrow::json::ReadOptions>"
#> [46] "std::shared_ptr<arrow::json::ParseOptions>"
#> [47] "std::shared_ptr<arrow::json::TableReader>"
#> [48] "std::shared_ptr<arrow::MemoryPool>"
#> [49] "std::shared_ptr<arrow::ipc::MessageReader>"
#> [50] "std::shared_ptr<arrow::ipc::Message>"
#> [51] "std::shared_ptr<parquet::ArrowReaderProperties>"
#> [52] "std::shared_ptr<parquet::arrow::FileReader>"
#> [53] "std::shared_ptr<parquet::ArrowWriterProperties>"
#> [54] "std::shared_ptr<parquet::WriterPropertiesBuilder>"
#> [55] "std::shared_ptr<parquet::WriterProperties>"
#> [56] "std::shared_ptr<parquet::arrow::FileWriter>"
#> [57] "std::shared_ptr<arrow::RecordBatchReader>"
#> [58] "std::shared_ptr<arrow::ipc::RecordBatchFileReader>"
#> [59] "std::shared_ptr<arrow::ipc::RecordBatchWriter>"
#> [60] "std::shared_ptr<arrow::Scalar>"
Created on 2020-09-25 by the reprex package (v0.3.0.9001)
c79c48f to cb60e7b
|
started to get away from the R function
template <typename T>
SEXP as_sexp(const std::shared_ptr<T>& ptr) {
  return cpp11::external_pointer<std::shared_ptr<T>>(new std::shared_ptr<T>(ptr));
}
will disappear and we'll have individual functions for each, as generated by some macro. I'm not clear yet about the classes that are currently handled with |
r/src/arrow_exports.h
Outdated
Why do you need this (big) header here now?
This was because of these classes that were not available in the fw header:
R6_HANDLE(arrow::dataset::DirectoryPartitioning, "DirectoryPartitioning")
R6_HANDLE(arrow::dataset::HivePartitioning, "HivePartitioning")
Forward declaring them seems to do the trick.
👍 We should probably add the forward declarations to arrow/dataset/type_fwd.h
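For illustration, a minimal sketch of why forward declarations are enough for a type-to-name mapping like this. The macro `R6_HANDLE_SKETCH`, the `arrow_sketch` namespace and the `r6_class_name` trait are all hypothetical; the real R6_HANDLE in this PR may expand to something quite different.

```cpp
#include <string>

// Forward declarations only: the mapping never needs the complete class.
namespace arrow_sketch {
namespace dataset {
class DirectoryPartitioning;
class HivePartitioning;
}  // namespace dataset
}  // namespace arrow_sketch

// Primary template is declared but left undefined, so an unmapped type
// is a compile-time error rather than a silent fallback.
template <typename T>
struct r6_class_name;

// Hypothetical macro: one explicit specialization per (C++ type, R6 name) pair.
#define R6_HANDLE_SKETCH(TYPE, NAME)            \
  template <>                                   \
  struct r6_class_name<TYPE> {                  \
    static std::string get() { return NAME; }   \
  };

R6_HANDLE_SKETCH(arrow_sketch::dataset::DirectoryPartitioning, "DirectoryPartitioning")
R6_HANDLE_SKETCH(arrow_sketch::dataset::HivePartitioning, "HivePartitioning")
```

Because an incomplete type is a valid template argument, the big header is only needed where the objects are actually used.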
I can think of a few possibilities (as I'm sure you can):
|
|
I'd like to make the Now, with |
c5b9e79 to 662871a
8a9f2a7 to 90f3e08
|
Finished rebasing this, now seeing, not sure what this is about: |
|
It's a test file that I added after you started this PR, and it's not run on every CI job, and you aren't running it locally unless you set up minio. See https://github.com/apache/arrow/blob/master/r/README.md#running-tests. On macOS you can |
|
If that one fix isn't enough, 883eb57 is probably where the relevant code came in, so see if you see anything there that needs updating. |
|
This last commit is related to https://issues.apache.org/jira/browse/ARROW-10080 and gives a way to release memory immediately for RecordBatch and Table. If we want more classes (maybe Array etc ... ?) we can add them to generated code in
invalidate = function() {
  cl <- class(self)[1L]
  # if there is a Reset function for that class, call it
  reset <- get(paste0("_arrow_", cl, "__Reset"), ns_arrow)
  if (!is.null(reset)) {
    get(".Call")(reset, self)
  }
  # but in any case, set the external pointer to NULL
  assign(".:xp:.", NULL, envir = self)
}
Otherwise it just sets the external pointer to NULL, which is less useful because the memory will be reclaimed only later, as part of a garbage collection.
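A minimal sketch of the difference between the two paths, on the C++ side only. It assumes the external pointer holds a heap-allocated `std::shared_ptr`, as in the `as_sexp` template quoted earlier in this thread. `Payload`, `make_holder`, `reset_holder` and `finalize_holder` are hypothetical names, not functions from the package.

```cpp
#include <memory>

// Hypothetical payload that counts live instances, to make the release visible.
struct Payload {
  static int live;
  Payload() { ++live; }
  ~Payload() { --live; }
};
int Payload::live = 0;

// Mimics what the external pointer owns: a heap-allocated shared_ptr.
using Holder = std::shared_ptr<Payload>;

inline Holder* make_holder() { return new Holder(std::make_shared<Payload>()); }

// Analogue of a "__Reset" routine: drop the reference right away.
inline void reset_holder(Holder* holder) { holder->reset(); }

// Analogue of the finalizer that only runs when R garbage-collects.
inline void finalize_holder(Holder* holder) { delete holder; }
```

If this holder has the last reference, calling `reset_holder` destroys the payload immediately, and the later `finalize_holder` only frees the now-empty holder. Without the reset, the payload lives until the finalizer runs.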
nealrichardson
left a comment
A few things I noticed, mostly rebase cleanup.
It kinda feels like we've just moved the shared_ptr() business down a level since we still have to explicitly wrap and declare types for the output, though that is an improvement. I had envisioned something, either as as_sexp methods or as some custom wrapping in codegen.R, that would handle the mapping from the return signature (e.g. std::shared_ptr<arrow::Table>) to R6 class name (Table). Maybe that was wishful thinking.
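As a rough idea of what that mapping could look like, here is a minimal C++ sketch that derives a class name from a return signature by stripping the `std::shared_ptr<...>` wrapper and any namespaces. `r6_class_from_signature` is a hypothetical helper, and the rule "last component of the qualified name" is an assumption: wherever the R6 name differs from the C++ class name, an explicit lookup table would still be needed.

```cpp
#include <string>

// Hypothetical helper: "std::shared_ptr<arrow::Table>" -> "Table".
// Returns an empty string when the signature is not a shared_ptr.
inline std::string r6_class_from_signature(const std::string& signature) {
  const std::string prefix = "std::shared_ptr<";
  if (signature.compare(0, prefix.size(), prefix) != 0 || signature.back() != '>') {
    return "";
  }
  // Text between "std::shared_ptr<" and the closing ">".
  const std::string inner =
      signature.substr(prefix.size(), signature.size() - prefix.size() - 1);
  // Keep only the last component of the qualified name.
  const std::string::size_type pos = inner.rfind("::");
  return pos == std::string::npos ? inner : inner.substr(pos + 2);
}
```

This only covers the naming step. Constructing the R6 object would still happen wherever the wrapped pointer is handed to R.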
ab8a5ce to bf77261
Let me try to push this further. Now all the functions return an This is indeed just moving the needle and pain down one level. What I can try to do is that
|
|
Last commit goes in that direction with:
namespace cpp11 {

template <typename T>
std::string r6_class_name(const std::shared_ptr<T>& x);

template <typename T>
SEXP to_r6(const std::shared_ptr<T>& x) {
  if (x == nullptr) return R_NilValue;

  auto r_class_name = cpp11::r6_class_name<T>(x);
  cpp11::external_pointer<std::shared_ptr<T>> xp(new std::shared_ptr<T>(x));
  SEXP r6_class = Rf_install(r_class_name.c_str());

  // make call: <symbol>$new(<x>)
  SEXP call = PROTECT(Rf_lang3(R_DollarSymbol, r6_class, arrow::r::symbols::new_));
  SEXP call2 = PROTECT(Rf_lang2(call, xp));

  // and then eval in arrow::
  SEXP r6 = PROTECT(Rf_eval(call2, arrow::r::ns::arrow));
  UNPROTECT(3);
  return r6;
}

}  // namespace cpp11

class R6 {
 public:
  template <typename T>
  R6(const std::shared_ptr<T>& x) : data_(cpp11::to_r6<T>(x)) {}

  template <typename T>
  R6(std::unique_ptr<T> x) : data_(cpp11::to_r6<T>(std::shared_ptr<T>(x.release()))) {}

  R6(SEXP data) : data_(data) {}

  operator SEXP() const { return data_; }

 private:
  SEXP data_;
};
so that the functions "just" return This however means defining many of these, most of them being trivial:
template <>
inline std::string r6_class_name<arrow::Field>(const std::shared_ptr<arrow::Field>& array_data) {
  return "Field";
}
but some of them with some notion of dispatch: |
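The following is not the code from this PR, only a self-contained sketch of how a trivial specialization and a dispatching one can sit side by side. `TypeId`, `DataType` and `Field` are hypothetical stand-ins for the Arrow classes, and the names returned for each type id are assumptions made for the example.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins, not the real arrow:: classes.
enum class TypeId { INT8, STRING, OTHER };
struct DataType {
  TypeId id;
};
struct Field {};

// Primary template: declared only, so a missing specialization fails to link.
template <typename T>
std::string r6_class_name(const std::shared_ptr<T>& x);

// Trivial case: one C++ class maps to one R6 class.
template <>
inline std::string r6_class_name<Field>(const std::shared_ptr<Field>&) {
  return "Field";
}

// Dispatching case: the R6 class depends on the runtime value.
template <>
inline std::string r6_class_name<DataType>(const std::shared_ptr<DataType>& type) {
  switch (type->id) {
    case TypeId::INT8:
      return "Int8";
    case TypeId::STRING:
      return "Utf8";
    default:
      return "DataType";
  }
}
```

The point of the split is that only the few polymorphic base classes need the switch. Everything else stays a one-liner that a macro could generate.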
d901fe0 to 2074be0
|
@nealrichardson @romainfrancois I've rebased and
|
|
@github-actions crossbow submit -g r |
|
Revision: 2c3dd42 Submitted crossbow builds: ursa-labs/crossbow @ actions-701 |
nealrichardson
left a comment
Thanks! Will merge when the crossbow jobs finish, then will rebase any outstanding R pull requests because this will probably conflict.
|
@github-actions crossbow submit test-r-linux-as-cran |
|
Revision: 2c3dd42 Submitted crossbow builds: ursa-labs/crossbow @ actions-702
|
|
@kszucs For some reason the FWIW it is still being triggered on the nightly builds: https://github.com/ursa-labs/crossbow/branches/all?query=test-r-linux-as-cran |
|
FYI: #8386 (comment) |
call_function() internally unboxes to the right R6 class.
This probably needs some more work, e.g. not sure how to deal with this function: