
Conversation

@romainfrancois
Contributor

No description provided.


@romainfrancois
Contributor Author

bool CanExtendParallel(SEXP x, const std::shared_ptr<arrow::DataType>& type) {
  // TODO: identify when it's ok to do things in parallel
  return false;
}

Turning this to return true unconditionally makes everything fail dramatically, so we'll need to pay careful attention to what can and cannot be done concurrently. We might have to dial down the use of the cpp11 package, because I believe that when we create a cpp11 vector to wrap a SEXP, this uses a central resource for protection.

It would probably be better to move the "can this be done in parallel" decision to a virtual method of RConverter, but we would need to capture the converter in the task lambda, or at least figure out some way for the converter to still be alive when the task is run.
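A rough sketch of the lifetime part, assuming CanExtendParallel() becomes a virtual on RConverter and Extend(values, size) exists as used elsewhere in this PR (the helper itself is hypothetical): capturing a shared_ptr copy in the lambda keeps the converter alive until the task has run.

arrow::Status AppendExtendTask(const std::shared_ptr<RConverter>& converter,
                               SEXP values, int64_t size,
                               arrow::internal::TaskGroup& tasks) {
  if (converter->CanExtendParallel()) {
    // the shared_ptr is copied into the closure, so the converter stays
    // alive until the task has actually run on a worker thread
    tasks.Append([converter, values, size] {
      return converter->Extend(values, size);
    });
    return arrow::Status::OK();
  }
  // otherwise, do the work on the calling (main) thread right away
  return converter->Extend(values, size);
}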

@romainfrancois
Contributor Author

The issue with doing R things in parallel is that you really can't. Maybe we can have an R-specific mutex:

std::mutex& get_r_mutex() {
  static std::mutex m;
  return m;
}

that we can lock when we do need to call something in the R api, including making a cpp11::doubles for example. Then use this in a wrapper class like:

template <class vector>
class synchronized {
public:
  // hold the R mutex while the underlying cpp11 vector is created,
  // since constructing it touches R's protection machinery
  synchronized(SEXP x) {
    std::lock_guard<std::mutex> lock(get_r_mutex());
    data_ = new vector(x);
  }

  vector& data() {
    return *data_;
  }

  // destroying the cpp11 vector also touches R, so lock again
  ~synchronized() {
    std::lock_guard<std::mutex> lock(get_r_mutex());
    delete data_;
  }

private:
  vector* data_;
};

so that we can have something like this:

// [[arrow::export]]
int parallel_test(int n) {
  auto tasks = arrow::internal::TaskGroup::MakeThreaded(arrow::internal::GetCpuThreadPool());
  SEXP x = PROTECT(Rf_allocVector(REALSXP, 100));

  std::atomic<int> count(0);
  for (int i = 0; i < n; i++) {
    tasks->Append([x, &count] {
      synchronized<cpp11::doubles> dx(x);

      int nx = dx.data().size();
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      count += nx;

      return arrow::Status::OK();
    });
  }

  auto status = tasks->Finish();
  UNPROTECT(1);
  return count;
}

Of course this only makes sure that the synchronized<cpp11::doubles> is safe on construction and destruction; access to other methods would also need to lock/unlock.
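One way (a sketch, not something implemented here) to also guard member calls: have synchronized<> hand out a proxy that holds the R mutex for the duration of each call. locked_ptr is a hypothetical addition to the wrapper above.

template <class vector>
class locked_ptr {
public:
  explicit locked_ptr(vector* p) : lock_(get_r_mutex()), p_(p) {}
  vector* operator->() { return p_; }

private:
  std::unique_lock<std::mutex> lock_;  // released when the proxy goes out of scope
  vector* p_;
};

// synchronized<vector> could then expose:
//   locked_ptr<vector> operator->() { return locked_ptr<vector>(data_); }
// so that dx->size() takes the lock for exactly the duration of the call.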

@romainfrancois romainfrancois marked this pull request as ready for review April 28, 2021 13:38
@romainfrancois
Contributor Author

Marking this as ready to review. I've changed the approach this week so that it does not need to resort to locking.

This introduces the RTasks class, which factors out handling of tasks that can be run in parallel and tasks that cannot (because they might touch central R resources, e.g. protect an R object). It has void Append(bool parallel, Task&& task) to add a task. Based on parallel, the task is either added to the parallel task group (and potentially started immediately) or delayed until all the tasks have been added.

Then it has Finish(), which first runs the tasks that have been delayed, and then waits for the parallel tasks to finish.
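For reference, a minimal sketch of that shape, assuming a std::function-based Task and the Arrow TaskGroup used elsewhere in this PR (member names beyond Append() and Finish() are illustrative, not necessarily the actual implementation):

class RTasks {
 public:
  using Task = std::function<arrow::Status()>;

  explicit RTasks(bool use_threads)
      : parallel_tasks_(use_threads
                            ? arrow::internal::TaskGroup::MakeThreaded(
                                  arrow::internal::GetCpuThreadPool())
                            : arrow::internal::TaskGroup::MakeSerial()) {}

  void Append(bool parallel, Task&& task) {
    if (parallel) {
      // handed to the task group, so it may start running right away
      parallel_tasks_->Append(std::move(task));
    } else {
      // touches central R resources: keep it for later, on the main thread
      delayed_serial_tasks_.push_back(std::move(task));
    }
  }

  arrow::Status Finish() {
    arrow::Status status = arrow::Status::OK();
    // 1) run the delayed serial tasks
    for (auto& task : delayed_serial_tasks_) {
      status &= std::move(task)();
    }
    // 2) then wait for the parallel tasks to finish
    status &= parallel_tasks_->Finish();
    return status;
  }

 private:
  std::shared_ptr<arrow::internal::TaskGroup> parallel_tasks_;
  std::vector<Task> delayed_serial_tasks_;
};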

With this, the RConverter class gained virtual void DelayedExtend(SEXP values, int64_t size, RTasks& tasks). The idea is that an implementation might first do some setup work that has to happen on the main thread because it uses central R resources, but then the bulk of the work is either run in parallel if possible or delayed.

The RStructConverter implementation is a good example that has to do some work upfront but then can still benefit from parallel ingestion of its columns.

@jonkeane
Member

@github-actions crossbow submit -g r

@github-actions

Revision: 299c34f94c61c7017f4a9e32437ddd0d9bbd50ee

Submitted crossbow builds: ursacomputing/crossbow @ actions-362

Task Status
conda-linux-gcc-py36-cpu-r36 Azure
conda-linux-gcc-py37-cpu-r40 Azure
conda-osx-clang-py36-r36 Azure
conda-osx-clang-py37-r40 Azure
conda-win-vs2017-py36-r36 Azure
conda-win-vs2017-py37-r40 Azure
homebrew-r-autobrew Github Actions
test-r-devdocs Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-minimal-build Azure
test-r-rhub-ubuntu-gcc-release Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-version-compatibility Github Actions
test-r-versions Github Actions
test-r-without-arrow Azure
test-ubuntu-18.04-r-sanitizer Azure

@westonpace westonpace left a comment
Member

Neal asked me to take a look at some of the parallel stuff since I've been working on some parallel code in the C++ code base as well. I think this is a very clever approach. You basically take a quick scheduling pass through all of the data to spawn as much parallel work as you can and then tackle the rest serially.

One thing I would watch out for with DelayedExtend is iterating through the data itself both before and after the delay, since you will most likely lose your CPU cache between the iterations and be forced to load the data out of RAM twice. I'm pretty sure you are not doing that here, so I don't think it is a problem. Future DelayedExtend implementations will need to keep an eye out, though.

I'm going to try and take a bit more of a look tomorrow but here are some initial comments.

}

// then wait for the parallel tasks to finish
status &= parallel_tasks->Finish();
Member

It would be nice if there were a good way to trigger the parallel_tasks to fail early if status was not ok here (and we broke out of the serial loop above).

Contributor Author

I'm not sure how to do this

Member

Scanning through quickly I just thought "it would be nice" but now thinking about implementing it I think "but it sure would be tricky" 😆

I think you would need to use a stop token of some kind. Either arrow::StopToken from arrow/util/cancel.h or an atomic bool that is set by the serial tasks.

TaskGroup::MakeThreaded can take in a stop token as well. That might be easiest:

  • Create a StopSource
  • Get a StopToken from the source
  • Pass the StopToken into TaskGroup::MakeThreaded
  • After doing the serial work, if there is an error, call RequestStop on the source.

That would cancel any conversion tasks that are scheduled but not yet executing. Any tasks currently executing would simply have to run until completion. If you really wanted to make it responsive then you could pass the StopToken into your conversion functions and check it periodically to see if a stop was requested and, if so, bail early.
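In code, those steps would look roughly like this (ConvertWithEarlyStop() and RunSerialConversion() are just placeholders, and the MakeThreaded overload taking a StopToken is the one described above):

arrow::Status RunSerialConversion();  // placeholder for the serial, R-touching work

arrow::Status ConvertWithEarlyStop() {
  arrow::StopSource stop_source;                      // 1) create a StopSource
  arrow::StopToken stop_token = stop_source.token();  // 2) get its StopToken

  // 3) pass the StopToken to the threaded task group
  auto parallel_tasks = arrow::internal::TaskGroup::MakeThreaded(
      arrow::internal::GetCpuThreadPool(), stop_token);

  // ... Append() the parallel conversion tasks here ...

  // 4) do the serial work; on error, cancel tasks that have not started yet
  arrow::Status status = RunSerialConversion();
  if (!status.ok()) {
    stop_source.RequestStop();
  }

  status &= parallel_tasks->Finish();
  return status;
}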

template <typename Iterator, typename AppendNull, typename AppendValue>
Status VisitVector(Iterator it, int64_t n, AppendNull&& append_null,
                   AppendValue&& append_value) {
  for (R_xlen_t i = 0; i < n; i++, ++it) {
Member

Minor: You're pretty close to being able to use a range-based for loop here. I'm not sure how difficult it would be to create an end() pointer and an iterator equality function.

Contributor Author

We might revisit this when we tackle chunking; iterator equality and end() should be straightforward, as the iterator classes we use are just wrappers around either pointers or cpp11 iterators.

The only thing is that once we do chunking, we might not iterate from start to end.
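A rough sketch of what that could look like, with a hypothetical IteratorRange wrapper (assuming the iterator classes gain operator== and can be advanced with std::next):

template <typename Iterator>
struct IteratorRange {
  Iterator begin_;
  Iterator end_;
  Iterator begin() const { return begin_; }
  Iterator end() const { return end_; }
};

// VisitVector() could then use a range-based loop, e.g.
//   for (auto&& value : IteratorRange<Iterator>{it, std::next(it, n)}) { ... }
// which, as noted, only helps as long as we iterate from start to end.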

@jonkeane
Member

jonkeane commented May 6, 2021

Ok, I finally got these benchmarks re-run and this report put together.

TL;DR:

For multi-core operation:

  • Dict types are massively faster
  • Smaller improvements are seen on most other types for which we have all-one-type benchmark fixtures: integers, floats
  • Strings are either the same as or slightly slower
  • The naturalistic datasets we have are a mixture:
    • nyctaxi is faster (especially on the first iteration)
    • fannie + chicago traffic are slightly slower (possibly because of more strings?)

For single-core operation:
Most datasets/types have very similar performance across the branches (dicts are the only ones that stand out as seeing a decent speed up, but nowhere near what we see on the 8-core test)

Here's a zip* of the report
parallel-data-conversion.html.zip

* – to get around GH file-extension restrictions

@jonkeane
Member

jonkeane commented May 6, 2021

Ok, I've added in a run from the commit that this branch is based off of (16a0739) to be a closer comparison, and things are murkier:

  • most of the naturalistic datasets are worse
  • floats, integers, and strings also seem worse to varying degrees
  • the dict type is still very improved

parallel-data-conversion.html.zip

@nealrichardson
Member

With the exception of the big change on the data.frames of factor columns, I'm a little skeptical that this is anything more than noise. I don't think there have been any other changes in the data.frame-to-Arrow code between latest master and where this branch is based.

For the sake of argument, let's assume that "a little better or a little worse" is really just no change. I'm more surprised that there seems only to be that one improvement. The fannie mae dataset has 31 columns: with 8 cores, why is the performance essentially the same as before/with 1 core?

@jonkeane
Member

jonkeane commented May 6, 2021

Absolutely, aside from factors, all of these differences are compatible with being pure noise / no real change.

If we don't see any speed-up with any types other than factors, I'm not totally surprised that the naturalistic datasets aren't seeing an improvement, since fannie + nyctaxi, when read in as data.frames, don't result in any factors. And the chi traffic dataset, which starts as a parquet file, only has two columns that are factors.

@jonkeane
Member

jonkeane commented May 6, 2021

Also, I should have been more careful with my words and that "worse" should have really been "not-convincingly-better"

@romainfrancois romainfrancois force-pushed the RConverter_Parallel branch from 299c34f to feaf577 Compare May 7, 2021 09:21
@romainfrancois
Contributor Author

What is the schema of the fannie mae data set? Does it have some missing values? Maybe the code goes through this branch:

      if (arrow::r::can_reuse_memory(x, options.type)) {
        columns[j] = std::make_shared<arrow::ChunkedArray>(
            arrow::r::vec_to_arrow__reuse_memory(x));
      }

which for now does not benefit from parallelization, and perhaps should, at least when there are some NAs to deal with:

// this is only used on some special cases when the arrow Array can just use the memory of
// the R object, via an RBuffer, hence be zero copy
template <int RTYPE, typename RVector, typename Type>
std::shared_ptr<Array> MakeSimpleArray(SEXP x) {
  using value_type = typename arrow::TypeTraits<Type>::ArrayType::value_type;
  RVector vec(x);
  auto n = vec.size();
  auto p_vec_start = reinterpret_cast<const value_type*>(DATAPTR_RO(vec));
  auto p_vec_end = p_vec_start + n;
  std::vector<std::shared_ptr<Buffer>> buffers{nullptr,
                                               std::make_shared<RBuffer<RVector>>(vec)};

  int null_count = 0;

  auto first_na = std::find_if(p_vec_start, p_vec_end, is_NA<value_type>);
  if (first_na < p_vec_end) {
    auto null_bitmap =
        ValueOrStop(AllocateBuffer(BitUtil::BytesForBits(n), gc_memory_pool()));
    internal::FirstTimeBitmapWriter bitmap_writer(null_bitmap->mutable_data(), 0, n);

    // first loop to clear all the bits before the first NA
    auto j = std::distance(p_vec_start, first_na);
    int i = 0;
    for (; i < j; i++, bitmap_writer.Next()) {
      bitmap_writer.Set();
    }

    auto p_vec = first_na;
    // then finish
    for (; i < n; i++, bitmap_writer.Next(), ++p_vec) {
      if (is_NA<value_type>(*p_vec)) {
        bitmap_writer.Clear();
        null_count++;
      } else {
        bitmap_writer.Set();
      }
    }

    bitmap_writer.Finish();
    buffers[0] = std::move(null_bitmap);
  }

  auto data = ArrayData::Make(std::make_shared<Type>(), LENGTH(x), std::move(buffers),
                              null_count, 0 /*offset*/);

  // return the right Array class
  return std::make_shared<typename TypeTraits<Type>::ArrayType>(data);
}

The find_if() and the body of the if (first_na < p_vec_end) branch are where this does some work, but all the pieces are in place so that we could benefit from parallelization.

Looking at this in the next few days.

@jonkeane
Member

jonkeane commented May 7, 2021

Oh, knowing about missing values is helpful, lemme dig more into that and see if I can replicate performance differences on those.

Here's summary() of the fanniemae dataset; there are a decent chunk of NAs in it (and it has a mix of types too):

> summary(df_fannie)
       f0                    f1                 f2                  f3       
 Min.   :100001420754   Length:22180168    Length:22180168    Min.   :1.750  
 1st Qu.:326084086722   Class :character   Class :character   1st Qu.:3.375  
 Median :550659611473   Mode  :character   Mode  :character   Median :3.500  
 Mean   :550440259075                                         Mean   :3.561  
 3rd Qu.:775451076920                                         3rd Qu.:3.750  
 Max.   :999999800242                                         Max.   :5.900  
                                                                             
       f4                f5              f6              f7       
 Min.   :      0   Min.   :-1.00   Min.   : 27.0   Min.   :  0.0  
 1st Qu.: 136660   1st Qu.: 8.00   1st Qu.:220.0   1st Qu.:214.0  
 Median : 206558   Median :16.00   Median :336.0   Median :333.0  
 Mean   : 226014   Mean   :16.65   Mean   :292.9   Mean   :287.5  
 3rd Qu.: 299858   3rd Qu.:25.00   3rd Qu.:348.0   3rd Qu.:347.0  
 Max.   :1203000   Max.   :58.00   Max.   :482.0   Max.   :480.0  
 NA's   :4058158                                   NA's   :28840  
      f8                  f9            f10                f11           
 Length:22180168    Min.   :    0   Length:22180168    Length:22180168   
 Class :character   1st Qu.:17460   Class :character   Class :character  
 Mode  :character   Median :31080   Mode  :character   Mode  :character  
                    Mean   :28225                                        
                    3rd Qu.:39580                                        
                    Max.   :49740                                        
                                                                         
      f12               f13                f14                f15           
 Min.   : 1         Length:22180168    Length:22180168    Length:22180168   
 1st Qu.: 1         Class :character   Class :character   Class :character  
 Median : 1         Mode  :character   Mode  :character   Mode  :character  
 Mean   : 1                                                                 
 3rd Qu.: 1                                                                 
 Max.   :16                                                                 
 NA's   :22061889                                                           
     f16                 f17                f18                f19          
 Length:22180168    Min.   :    3      Min.   :  187      Min.   :  65      
 Class :character   1st Qu.: 2945      1st Qu.:  730      1st Qu.:1319      
 Mode  :character   Median : 4658      Median : 3679      Median :2500      
                    Mean   : 5143      Mean   : 6808      Mean   :2561      
                    3rd Qu.: 7026      3rd Qu.: 7542      3rd Qu.:2605      
                    Max.   :23055      Max.   :55625      Max.   :9900      
                    NA's   :22180014   NA's   :22180087   NA's   :22180124  
      f20                f21                f22                f23          
 Min.   :-3561      Min.   :   87      Min.   :  4911     Min.   :   284    
 1st Qu.:  -34      1st Qu.: 1089      1st Qu.: 73622     1st Qu.: 12374    
 Median :  869      Median : 2214      Median :127763     Median : 20665    
 Mean   : 1345      Mean   : 3399      Mean   :147056     Mean   : 36262    
 3rd Qu.: 1877      3rd Qu.: 3980      3rd Qu.:198586     3rd Qu.: 40433    
 Max.   :36497      Max.   :24840      Max.   :465825     Max.   :539401    
 NA's   :22180041   NA's   :22180053   NA's   :22180023   NA's   :22180081  
      f24                f25                f26             f27          
 Min.   :126773     Min.   :     0     Min.   :     0     Mode:logical   
 1st Qu.:126773     1st Qu.:   110     1st Qu.:     0     NA's:22180168  
 Median :126773     Median :   500     Median :     0                    
 Mean   :126773     Mean   : 14636     Mean   :  2807                    
 3rd Qu.:126773     3rd Qu.:  2846     3rd Qu.:     0                    
 Max.   :126773     Max.   :328871     Max.   :129946                    
 NA's   :22180167   NA's   :22180095   NA's   :22151328                  
     f28              f29               f30           
 Length:22180168    Mode:logical    Length:22180168   
 Class :character   NA's:22180168   Class :character  
 Mode  :character                   Mode  :character  

I also have been digging into differences across types. Factors seem to parallelize really well, so I tried to convert the chitraffic data frame, which is a mix of strings + numerics + 2 factor columns. When I do that (with 12 CPU cores available), the most I'm seeing the CPU get to is ~140%, and even that only briefly; most of the time the process is at 100%.

> system.time(tab_chi_traffic <- arrow::Table$create(df_chi_traffic))
   user  system elapsed 
 29.093   0.797  28.002 

I then created a silly version of this dataset where I converted each of the columns into a factor (totally naively, with as.factor()). Converting that takes about half the time, and the CPU usage peaks at ~300%, though it drops down to 100% and then bumps back up a few times:

> system.time(tab_chi_traffic <- arrow::Table$create(df_chi_traffic_factors))
   user  system elapsed 
 31.073   1.194  15.857 

@romainfrancois
Contributor Author

Thanks. The special case for arrow::r::can_reuse_memory(x, options.type) predates our doing anything in parallel. I guess I'll fold that into one of the existing RConverter implementations, or create a new special one. One way or another, this should definitely leverage parallelism. Making this my Monday task :-)


@jonkeane
Member

jonkeane commented May 7, 2021

Here's another example of trying a data.frame of strings and not seeing parallelization, but converting those strings to factors and boom we get parallelization:

> library(arrow)

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

> 
> # this sample is located at https://ursa-qa.s3.amazonaws.com/single_types/type_strings.parquet
> # it is 1M rows, 5 columns. The first column has no missing, the second has 10% missing, 
> # the third 25% missing, the fourth 50% missing, and the 5th 90% missing.
> strings_df <- read_parquet("~/repos/ab_store/data/type_strings.parquet")
> 
> # embiggen so that the transform differences are easier to see (and so we have more columns than cores)
> strings_df <- dplyr::bind_cols(strings_df, strings_df, strings_df)
New names:
* jane -> jane...1
* austen -> austen...2
* sense -> sense...3
* and -> and...4
* sensibility -> sensibility...5
* ...
> strings_df <- dplyr::bind_rows(strings_df, strings_df, strings_df, strings_df, strings_df)
> 
> summary(strings_df)
   jane...1          austen...2         sense...3           and...4         
 Length:5000000     Length:5000000     Length:5000000     Length:5000000    
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 sensibility...5      jane...6          austen...7         sense...8        
 Length:5000000     Length:5000000     Length:5000000     Length:5000000    
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   and...9          sensibility...10    jane...11         austen...12       
 Length:5000000     Length:5000000     Length:5000000     Length:5000000    
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  sense...13          and...14         sensibility...15  
 Length:5000000     Length:5000000     Length:5000000    
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
> 
> # when this runs, my cpu usage is always at 100% or slightly below (and note 
> # that user <= elapsed below)
> system.time(tab <- Table$create(strings_df))
   user  system elapsed 
 31.855   0.842  32.806 
> 
> 
> # naively turn the strings into factors:
> strings_as_factors_df <- dplyr::mutate(strings_df, dplyr::across(.fns = as.factor))
> 
> summary(strings_as_factors_df)
    jane...1         austen...2        sense...3          and...4       
 Elino  :   1210   Elinor :   1140          :    915   Elino  :    625  
 Elinor :   1120   Elino  :   1125   Elinor :    895   Elinor :    625  
        :   1115          :    895   Elino  :    890          :    545  
 Mariann:    890   Maria  :    745   Marian :    705   Marian :    480  
 Elinor :    880   Elinor :    725   Elinor :    670   Elinor :    455  
 Marian :    865   (Other):4491725   (Other):3746365   (Other):2497925  
 (Other):4993920   NA's   : 503645   NA's   :1249560   NA's   :2499345  
 sensibility...5      jane...6         austen...7        sense...8      
 Elino  :     65   Elino  :   1210   Elinor :   1140          :    915  
 Marian :     65   Elinor :   1120   Elino  :   1125   Elinor :    895  
 Maria  :     60          :   1115          :    895   Elino  :    890  
 Elinor :     55   Mariann:    890   Maria  :    745   Marian :    705  
 Elinor :     55   Elinor :    880   Elinor :    725   Elinor :    670  
 (Other): 249925   Marian :    865   (Other):4491725   (Other):3746365  
 NA's   :4749775   (Other):4993920   NA's   : 503645   NA's   :1249560  
    and...9        sensibility...10    jane...11        austen...12     
 Elino  :    625   Elino  :     65   Elino  :   1210   Elinor :   1140  
 Elinor :    625   Marian :     65   Elinor :   1120   Elino  :   1125  
        :    545   Maria  :     60          :   1115          :    895  
 Marian :    480   Elinor :     55   Mariann:    890   Maria  :    745  
 Elinor :    455   Elinor :     55   Elinor :    880   Elinor :    725  
 (Other):2497925   (Other): 249925   Marian :    865   (Other):4491725  
 NA's   :2499345   NA's   :4749775   (Other):4993920   NA's   : 503645  
   sense...13         and...14       sensibility...15 
        :    915   Elino  :    625   Elino  :     65  
 Elinor :    895   Elinor :    625   Marian :     65  
 Elino  :    890          :    545   Maria  :     60  
 Marian :    705   Marian :    480   Elinor :     55  
 Elinor :    670   Elinor :    455   Elinor :     55  
 (Other):3746365   (Other):2497925   (Other): 249925  
 NA's   :1249560   NA's   :2499345   NA's   :4749775  
> 
> 
> # when this runs, my cpu usage goes up to 400% (and user >> elapsed below)
> system.time(tab <- Table$create(strings_as_factors_df))
   user  system elapsed 
 31.166   0.794   6.184 

@romainfrancois
Contributor Author

@jonkeane I believe the last commit will improve things. The zero copy cases are now handled in parallel, as it appears these cases might actually represent some work when dealing with missing values.

@jonkeane
Member

Yes! I reran the benchmarks again (comparing the last commit here with the base commit and apache/arrow@HEAD for today) and I see some drastic improvements for floats and ints (and the improvements we saw before with dict are still there, of course).

The naturalistic datasets aren't seeing much (if any) speed up — they are all within the noise range for variability that we see here. I'm going to dig into those separately and see if I see any funny patterns there that might explain it.

parallel-data-conversion.html.zip

@nealrichardson
Member

I wonder if the logic for "parallelize what you can, then do the rest in serial" isn't working right. Maybe the naturalistic datasets all have at least one column (string, most likely) that can't be parallelized, and instead of parallelizing the integer/double/factor columns and then handling the strings, it just keeps them all serial.

@jonkeane
Member

Yeah, I'm going to try testing that exactly and see if I can duplicate this behavior (probably tomorrow)

@romainfrancois
Contributor Author

Some drastic improvements for floats and ints are what I was after with the last few commits. That's a win.

Looking into strings now, hoping to be able to leverage parallelism there too; it's currently not the case:

  void DelayedExtend(SEXP values, int64_t size, RTasks& tasks) override {
    auto task = [this, values, size]() { return this->Extend(values, size); };
    // TODO: refine this, e.g. extract setup from Extend()
    tasks.Append(false, std::move(task));
  }
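One possible shape for that refinement, sketched with hypothetical ExtractStrings()/ExtendFromStrings() helpers: keep the R-touching setup serial, and only push the R-free bulk of the work to a parallel task.

  void DelayedExtend(SEXP values, int64_t size, RTasks& tasks) override {
    // touches the R API, so it runs now, on the main thread
    auto setup = this->ExtractStrings(values, size);
    // the remaining work does not need the R API, so it can run in parallel
    tasks.Append(true, [this, setup, size]() {
      return this->ExtendFromStrings(setup, size);
    });
  }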

@romainfrancois
Contributor Author

After all the DelayedExtend() calls are finished, i.e. after the status &= tasks.Finish(); step, there is still work to do to actually create the arrays:

for (int j = 0; j < num_fields; j++) {
  auto& converter = converters[j];
  if (converter != nullptr) {
    auto maybe_array = converter->ToArray();
    StopIfNotOk(maybe_array.status());
    columns[j] = std::make_shared<arrow::ChunkedArray>(maybe_array.ValueUnsafe());
  }
}

I don't think there's any R involved there, so I suppose this could be done in parallel, with some care about the StopIfNotOk() ...
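One hedged way that could look, assuming a task group (or RTasks) that is still accepting parallel work at this point, and keeping StopIfNotOk() (which raises an R error) on the main thread. Each task writes a distinct columns[j], so the writes don't race:

for (int j = 0; j < num_fields; j++) {
  if (converters[j] != nullptr) {
    tasks.Append(true, [&columns, &converters, j]() {
      ARROW_ASSIGN_OR_RAISE(auto array, converters[j]->ToArray());
      columns[j] = std::make_shared<arrow::ChunkedArray>(std::move(array));
      return arrow::Status::OK();
    });
  }
}
// only back on the main thread, once everything has run, raise an R error if needed
StopIfNotOk(tasks.Finish());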

@romainfrancois
Contributor Author

romainfrancois commented May 21, 2021

Done. This probably won't have much impact, because I guess by the time the converter does ->ToArray() there isn't much more work left, since this is just a call to the builder's Finish() method:

virtual Result<std::shared_ptr<Array>> ToArray() { return builder_->Finish(); }

Table__from_dots() will need to be re-adapted again when we do chunking. I think that each chunk will need its own Converter.

@romainfrancois
Contributor Author

@westonpace can you have a look at the updated RTasks? I have tried to allow tasks (whether run in parallel or delayed) to request an early stop.

@nealrichardson
Member

@romainfrancois @westonpace @jonkeane Is this ready to merge? (The rtools35 error is spurious)

@westonpace westonpace left a comment
Member

Really sorry for the delay, totally my mistake (I saw the ping, made a mental note, and then let the mental note get pushed out of my brain).

What you have should work fine. I think you could simplify it, but if you want to do that in a follow-up, that should be fine.


// run the delayed tasks now
for (auto& task : delayed_serial_tasks_) {
  status &= std::move(task)();
Member

Rather than wrapping all of your tasks in StoppingTask, you really only need the StopSource as a way to send a signal to parallel_tasks_. Everywhere else you could handle the stopping logic on your own. So I think you could change this loop to...

for (auto& task : delayed_serial_tasks_) {
  status &= std::move(task)();
  if (!status.ok()) {
    stop_source_.RequestStop();
    break;
  }
}

...then you can get rid of StoppingTask. If an error happens in a parallel task the ThreadedTaskGroup will already take care of stopping everything.

@nealrichardson
Member

nealrichardson commented Jun 2, 2021

Thanks @westonpace, I'll merge now and make a follow-up (edit: I made ARROW-12939)

