ARROW-10071: [R] segfault with ArrowObject from previous session, or saved#8246
ARROW-10071: [R] segfault with ArrowObject from previous session, or saved#8246romainfrancois wants to merge 5 commits intoapache:masterfrom
Conversation
| return reinterpret_cast<Pointer>( | ||
| R_ExternalPtrAddr(Rf_findVarInFrame(self, arrow::r::symbols::xp))); | ||
| void* p = R_ExternalPtrAddr(Rf_findVarInFrame(self, arrow::r::symbols::xp)); | ||
| if (p == nullptr) { |
There was a problem hiding this comment.
If we're checking for nullptr here, can we remove the shared_ptr_is_null check inside the shared_ptr() constructor? Then we can just $new() to create the R6 objects.
There was a problem hiding this comment.
We can't really do that because of the implicit else NULL case:
shared_ptr <- function(class, xp) {
if (!shared_ptr_is_null(xp)) class$new(xp)
}There are cases where we want an R NULL when the internal C++ shared pointer holds a C++ null pointer, e.g. when we call:
std::shared_ptr<Array> RecordBatch::GetColumnByName(const std::string& name) const {
auto i = schema_->GetFieldIndex(name);
return i == -1 ? NULLPTR : column(i);
}For example in this tests:
test_that("[[ and $ on RecordBatch", {
[...]
expect_null(batch$qwerty)
[...]
})having the extra layer with the R function shared_ptr(T, .) maybe calling T$new(.) gives the NULL.
There was a problem hiding this comment.
Another way would be to have the ability to directly create the R6 objects internally, either by having more code in
template <typename T>
SEXP as_sexp(const std::shared_ptr<T>& ptr) {
return cpp11::external_pointer<std::shared_ptr<T>>(new std::shared_ptr<T>(ptr));
}but we would need some sort of dispatch from T to the R6 object.
Or we would need to change the interface of many functions:
// [[arrow::export]]
std::shared_ptr<arrow::Array> StructArray__field(
const std::shared_ptr<arrow::StructArray>& array, int i) {
return array->field(i);
}perhaps to:
// [[arrow::export]]
R6 StructArray__field(
const std::shared_ptr<arrow::StructArray>& array, int i) {
return R6(array->field(i), "Array");
}but I don't think it's worth it
There was a problem hiding this comment.
Ah ok, I missed that that check translated some responses into NULL--though I don't think that's something we do widely, and maybe it would be better to make that more explicit in order to simplify the rest of the cases.
Regarding directly returning R6 objects from the cpp code, my thought had been more like the first idea of modifying as_sexp to map T to R6 classes. Our convention is that R6 class names are generally the bare C++ class name (assuming namespaces), i.e. arrow::dataset::Dataset is Dataset R6 class, with some exceptions. So we could record a map of those exceptions (e.g. "arrow::csv::TableReader" maps to "CsvTableReader") in cpp. If I were writing this in R, it could look something like this:
get_r6_constructor <- function(cpp_class) {
if (cpp_class %in% names(special_cases)) {
r6_name <- special_cases[[cpp_class]]
} else {
r6_name <- sub("^.*::(.*)$", "\\1", cpp_class)
}
get(r6_name, asNamespace("arrow"))$new
}cc @bkietz
There was a problem hiding this comment.
I'd prefer to specify a complete mapping T to R6 class names, rather than only listing special cases. This would be a pretty straightforward trait structure in c++
|
Is there something I can use to demangle the type name ? arrow:::Array__length(1)
#> Error in arrow:::Array__length(1): Invalid R object for NSt3__110shared_ptrIN5arrow5ArrayEEE, must be an ArrowObjectCreated on 2020-09-24 by the reprex package (v0.3.0.9001) Rcpp had @bkietz ? |
|
@romainfrancois yes, should I make a PR to your branch? |
|
@bkietz you can probably just push to it, or whatever you want. AFAICT that's the only outstanding issue and I'll merge when it is resolved (since it's earlier in the day here). |
ad2ce7d to
06e4ac6
Compare
Created on 2020-09-23 by the reprex package (v0.3.0.9001)