Skip to content

Conversation

@rtpsw
Copy link
Contributor

@rtpsw rtpsw commented Sep 22, 2022

@rtpsw
Copy link
Contributor Author

rtpsw commented Sep 22, 2022

cc @westonpace, @pitrou

@github-actions
Copy link

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think there is value in the user being able to specify the I/O executor. I just think we should default it to the default I/O executor.

@rtpsw
Copy link
Contributor Author

rtpsw commented Sep 25, 2022

@westonpace, any idea how to fix the Windows linker error here? In the past I was able to fix this by adding inline but this time it didn't work.

@rtpsw
Copy link
Contributor Author

rtpsw commented Sep 25, 2022

@westonpace, any idea how to fix the Windows linker error here? In the past I was able to fix this by adding inline but this time it didn't work.

I suspect this is due to adding Executor, which is not a public symbol, in a constructor parameter as you requested. If so and we insist on exposing the IO executor, I see a couple of options to consider:

  1. Make Executor a public symbol. I'm not sure what issues that might lead to.
  2. Make Executor derive from a new type which would be a public symbol, and use this type in the constructor. This would lead to a not-so-pretty dynamic cast internally.
  3. Expose one or more different (public-symbol) parameters that are used to internally setup the IO executor. This restricts the kind of IO executors that the caller may set up.

@westonpace
Copy link
Member

Executor is exported and it is included in other places that are part of the public API (e.g. SourceNodeOptions, CSV reader).

I think the problem is that SchemaSourceNodeOptions is templated. Can you try adding something like (note, the first two lines already exist in your PR)...

using ArrayVectorIteratorMaker = std::function<Iterator<std::shared_ptr<ArrayVector>>()>;
using ArrayVectorSourceNodeOptions = SchemaSourceNodeOptions<ArrayVectorIteratorMaker>;
template class ARROW_EXPORT ArrayVectorSourceNodeOptions; // Explicitly export the specialization

@rtpsw
Copy link
Contributor Author

rtpsw commented Sep 26, 2022

It turns out template instantiations and attributes (in this case, visibility) do not work together nicely. When I try adding code like template<> class ARROW_EXPORT ArrayVectorSourceNodeOptions, I get compiler errors like:

/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/options.h:113:31: error: using typedef-name ‘using ArrayVectorSourceNodeOptions = class arrow::compute::SchemaSourceNodeOptions<std::function<arrow::Iterator<std::shared_ptr<std::vector<std::shared_ptr<arrow::Array> > > >()> >’ after ‘class’
  113 | template<> class ARROW_EXPORT ArrayVectorSourceNodeOptions;
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

and when I try adding code like template<> class ARROW_EXPORT SchemaSourceNodeOptions<ArrayVectorIteratorMaker>, I get compiler warnings like:

/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/options.h:113:31: warning: attributes ignored on template instantiation [-Wattributes]
  113 | template<> class ARROW_EXPORT SchemaSourceNodeOptions<ArrayVectorIteratorMaker>;
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, both tries do not work, at least with my compiler. There seems to be a proposal documenting this limitation, but I'm not sure what its status is.

In the commit I just made, I opted instead to explicitly define and export non-template symbols.

@rtpsw
Copy link
Contributor Author

rtpsw commented Sep 27, 2022

The CI failures look unrelated to the PR.

@rtpsw rtpsw requested a review from westonpace September 27, 2022 06:36
@rtpsw
Copy link
Contributor Author

rtpsw commented Oct 13, 2022

@westonpace, to make sure we're on the same page, what remains for this PR to go through?

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now and should be very useful for helping people get data into Acero, thanks! I'll wait until after we make the RC branch (I think this happens on Monday) before I merge.

@rtpsw
Copy link
Contributor Author

rtpsw commented Nov 6, 2022

@westonpace, is it a good time to get back to this PR?

@westonpace
Copy link
Member

Yes, apologies. Since it's been a month lets give it one last CI pass. I've rebased this (all tests pass locally) and will merge later today.

@westonpace westonpace merged commit c929303 into apache:master Nov 21, 2022
@ursabot
Copy link

ursabot commented Nov 23, 2022

Benchmark runs are scheduled for baseline = 7198676 and contender = c929303. c929303 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.57% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.38% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] c9293039 ec2-t3-xlarge-us-east-2
[Finished] c9293039 test-mac-arm
[Finished] c9293039 ursa-i9-9960x
[Finished] c9293039 ursa-thinkcentre-m75q
[Finished] 7198676a ec2-t3-xlarge-us-east-2
[Finished] 7198676a test-mac-arm
[Finished] 7198676a ursa-i9-9960x
[Finished] 7198676a ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants