Skip to content

ARROW-10071: [R] segfault with ArrowObject from previous session, or saved#8246

Closed
romainfrancois wants to merge 5 commits intoapache:masterfrom
romainfrancois:ARROW-10071/segfault
Closed

ARROW-10071: [R] segfault with ArrowObject from previous session, or saved#8246
romainfrancois wants to merge 5 commits intoapache:masterfrom
romainfrancois:ARROW-10071/segfault

Conversation

@romainfrancois
Copy link
Contributor

library(arrow, warn.conflicts = FALSE)

a <- Array$create(1:10)
tf <- tempfile()
saveRDS(a, tf)

b <- readRDS(tf)
b$length()
#> Error in Array__length(self): Invalid <Array>, external pointer to null

Created on 2020-09-23 by the reprex package (v0.3.0.9001)

@github-actions
Copy link

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

return reinterpret_cast<Pointer>(
R_ExternalPtrAddr(Rf_findVarInFrame(self, arrow::r::symbols::xp)));
void* p = R_ExternalPtrAddr(Rf_findVarInFrame(self, arrow::r::symbols::xp));
if (p == nullptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're checking for nullptr here, can we remove the shared_ptr_is_null check inside the shared_ptr() constructor? Then we can just $new() to create the R6 objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really do that because of the implicit else NULL case:

shared_ptr <- function(class, xp) {
  if (!shared_ptr_is_null(xp)) class$new(xp)
}

There are cases where we want an R NULL when the internal C++ shared pointer holds a C++ null pointer, e.g. when we call:

std::shared_ptr<Array> RecordBatch::GetColumnByName(const std::string& name) const {
  auto i = schema_->GetFieldIndex(name);
  return i == -1 ? NULLPTR : column(i);
}

For example in this tests:

test_that("[[ and $ on RecordBatch", {
  [...]
  expect_null(batch$qwerty)
  [...]
})

having the extra layer with the R function shared_ptr(T, .) maybe calling T$new(.) gives the NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way would be to have the ability to directly create the R6 objects internally, either by having more code in

template <typename T>
SEXP as_sexp(const std::shared_ptr<T>& ptr) {
  return cpp11::external_pointer<std::shared_ptr<T>>(new std::shared_ptr<T>(ptr));
}

but we would need some sort of dispatch from T to the R6 object.

Or we would need to change the interface of many functions:

// [[arrow::export]]
std::shared_ptr<arrow::Array> StructArray__field(
    const std::shared_ptr<arrow::StructArray>& array, int i) {
  return array->field(i);
}

perhaps to:

// [[arrow::export]]
R6 StructArray__field(
    const std::shared_ptr<arrow::StructArray>& array, int i) {
  return R6(array->field(i), "Array");
}

but I don't think it's worth it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I missed that that check translated some responses into NULL--though I don't think that's something we do widely, and maybe it would be better to make that more explicit in order to simplify the rest of the cases.

Regarding directly returning R6 objects from the cpp code, my thought had been more like the first idea of modifying as_sexp to map T to R6 classes. Our convention is that R6 class names are generally the bare C++ class name (assuming namespaces), i.e. arrow::dataset::Dataset is Dataset R6 class, with some exceptions. So we could record a map of those exceptions (e.g. "arrow::csv::TableReader" maps to "CsvTableReader") in cpp. If I were writing this in R, it could look something like this:

get_r6_constructor <- function(cpp_class) {
  if (cpp_class %in% names(special_cases)) {
    r6_name <- special_cases[[cpp_class]]
  } else {
    r6_name <- sub("^.*::(.*)$", "\\1", cpp_class)
  }
  get(r6_name, asNamespace("arrow"))$new
}

cc @bkietz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to specify a complete mapping T to R6 class names, rather than only listing special cases. This would be a pretty straightforward trait structure in c++

@romainfrancois
Copy link
Contributor Author

Is there something I can use to demangle the type name ?

arrow:::Array__length(1)
#> Error in arrow:::Array__length(1): Invalid R object for NSt3__110shared_ptrIN5arrow5ArrayEEE, must be an ArrowObject

Created on 2020-09-24 by the reprex package (v0.3.0.9001)

Rcpp had DEMANGLE https://github.com/RcppCore/Rcpp/blob/85f6b275966cdfb1dfd32575e2da509ea2642084/src/api.cpp#L111

@bkietz ?

@bkietz
Copy link
Member

bkietz commented Sep 25, 2020

@romainfrancois yes, should I make a PR to your branch?

@nealrichardson
Copy link
Member

@bkietz you can probably just push to it, or whatever you want. AFAICT that's the only outstanding issue and I'll merge when it is resolved (since it's earlier in the day here).

@bkietz bkietz force-pushed the ARROW-10071/segfault branch from ad2ce7d to 06e4ac6 Compare September 26, 2020 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants