Skip to content

Conversation

@nealrichardson
Copy link
Member

Following a similar approach to ARROW-11766. This also pulls in withr as a test dependency and refactors some unrelated tests to use it.

@github-actions
Copy link

github-actions bot commented Apr 5, 2021

@nealrichardson
Copy link
Member Author

The error on centos7 is something that I see on about 1 out of every 10 runs locally on the as.data.frame(RecordBatch) test. Locally it looks like:

libc++abi.dylib: Pure virtual function called!
/bin/sh: line 1: 93200 Abort trap: 6

I haven't observed this on repeated runs of the array/chunked-array tests.

Any ideas @romainfrancois @bkietz? Even though my changes are purely in the R code, googling the error message suggests it's a C++ issue.

r/R/util.R Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the call info in the error message at all informative? If not, you might want stop(msg, call. = FALSE) inside the conditional (after appending to the message of course).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought (though I could be remembering wrong) that when you call stop() on a caught simpleError class object, you just re-raise it and that preserves however the original source of the stop() had said call. or whatever options. I'll review/confirm tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed:

> f <- function(...) stop(...)
> f("boo")
Error in f("boo") : boo
> f("boo", call. = FALSE)
Error: boo
> tryCatch(f("boo"), error = function(e) stop(e))
Error in f("boo") : boo
> tryCatch(f("boo", call. = FALSE), error = function(e) stop(e))
Error: boo

@ianmcook
Copy link
Member

ianmcook commented Apr 6, 2021

Looks good to me 👍

@westonpace
Copy link
Member

The error on centos7 is something that I see on about 1 out of every 10 runs locally on the as.data.frame(RecordBatch) test. Locally it looks like:

libc++abi.dylib: Pure virtual function called!
/bin/sh: line 1: 93200 Abort trap: 6

I haven't observed this on repeated runs of the array/chunked-array tests.

Any ideas @romainfrancois @bkietz? Even though my changes are purely in the R code, googling the error message suggests it's a C++ issue.

It's a bug in to_dataframe_parallel (or possibly somewhere deeper). Something is calling stop which is causing the code to bail before tg (the arrow::internal::TaskGroup) is destroyed. The converters are then destroyed first for whatever reason (I don't really know how R cleans this stuff up). When the TaskGroup is destroyed it waits until all running tasks cease. Since this hasn't happened the tasks are still running. The tasks reference this implicitly where this is the converter. Thus, the tasks have a pointer to an object that has been destroyed and they attempt to make a call on it. This yields the error you saw.

A quick patch (but I'm not sure if the right one as I don't understand R's stop well enough) could be...

diff --git a/r/src/array_to_vector.cpp b/r/src/array_to_vector.cpp
index ddcb74946..ef57fa660 100644
--- a/r/src/array_to_vector.cpp
+++ b/r/src/array_to_vector.cpp
@@ -88,11 +88,12 @@ class Converter {
   // for each array, add a task to the task group
   //
   // The task group is Finish() in the caller
-  void IngestParallel(SEXP data, const std::shared_ptr<arrow::internal::TaskGroup>& tg) {
+  void IngestParallel(SEXP data, const std::shared_ptr<arrow::internal::TaskGroup>& tg,
+                      std::shared_ptr<Converter> self) {
     R_xlen_t k = 0, i = 0;
     for (const auto& array : arrays_) {
       auto n_chunk = array->length();
-      tg->Append([=] { return IngestOne(data, array, k, n_chunk, i); });
+      tg->Append([=] { return self->IngestOne(data, array, k, n_chunk, i); });
       k += n_chunk;
       i++;
     }
@@ -1242,7 +1243,7 @@ cpp11::writable::list to_dataframe_parallel(
 
     // add a task to ingest data of that column if that can be done in parallel
     if (converters[i]->Parallel()) {
-      converters[i]->IngestParallel(column, tg);
+      converters[i]->IngestParallel(column, tg, converters[i]);
     }
   }

This copies the shared_ptr into the task and uses that (instead of an implicit this capture).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants