Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Feb 11, 2021

The io::IOContext class allows passing various settings such as the MemoryPool used for allocation and the Executor for async methods.

@github-actions
Copy link

@pitrou
Copy link
Member Author

pitrou commented Feb 11, 2021

A couple things to discuss:

  • API choices: should we take a const IOContext&, a const IOContext*, a IOContext*?
  • I've added an owned IOContext to the base FileSystem class. Should we do that on RandomAccessFile or InputStream as well?

@westonpace @bkietz input welcome.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: explicitly closing output files is preferrable, especially with remote filesystems where this might plausibly fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and MockFileSystem will deliberately not write anything out if you don't close it explicitly)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about explicitly closing files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this was scheduling file copies on the CPU thread pool.

@pitrou pitrou force-pushed the ARROW-10420-io-context branch 2 times, most recently from 443569d to bb09c58 Compare February 11, 2021 19:11
@westonpace
Copy link
Member

I've added an owned IOContext to the base FileSystem class. Should we do that on RandomAccessFile or InputStream as well?

My gut says no but I could be convinced otherwise. For example, the S3 filesystem (if I understand correctly) would plug the executor into the AWS client configuration and the random access files / etc. would rely on that and not the Executor directly. At best it would just be for convenience right? A way to provide implementors easy access to the context instead of having to take care of passing it around?

@emkornfield
Copy link
Contributor

emkornfield commented Feb 12, 2021

naming nit: IoContext?

@pitrou
Copy link
Member Author

pitrou commented Feb 16, 2021

@westonpace I think I misphrased my question. This PR adds an owned IOContext to filesystem instances. My question was whether stream objects should also have an owned IOContext; currently they only keep a const-ref, meaning it has to be kept alive elsewhere.

@emkornfield Hmm, I'm not sure what the convention should be. We currently have Status::IOError, though.

@emkornfield
Copy link
Contributor

@emkornfield Hmm, I'm not sure what the convention should be. We currently have Status::IOError, though.

According to the style guide IoContext:

For the purposes of the naming rules below, a "word" is anything that you would write in English without internal spaces. This includes abbreviations, such as acronyms and initialisms. For names written in mixed case (also sometimes referred to as "camel case" or "Pascal case"), in which the first letter of each word is capitalized, prefer to capitalize abbreviations as single words, e.g., StartRpc() rather than StartRPC().

However, if we want to keep consistency I'm OK with IO.

@bkietz
Copy link
Member

bkietz commented Feb 18, 2021

Since IOContext only wraps pointers and an id integer, semantically it represents a reference. Therefore I'd recommend never producing references to them; it's redundant and the structure is tiny and trivially copyable anyway.

@pitrou pitrou force-pushed the ARROW-10420-io-context branch from bb09c58 to 37373b4 Compare February 25, 2021 13:59
@pitrou pitrou marked this pull request as ready for review February 25, 2021 13:59
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the CPU and IO executors were passed in the wrong order here.

@pitrou pitrou requested a review from bkietz February 25, 2021 14:12
@pitrou
Copy link
Member Author

pitrou commented Feb 25, 2021

@ursabot please benchmark

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor comments

The `io::IOContext` class allows passing various settings such as the MemoryPool
used for allocation and the Executor for async methods.
@pitrou pitrou force-pushed the ARROW-10420-io-context branch from f16a80c to a463936 Compare February 25, 2021 19:15
@pitrou
Copy link
Member Author

pitrou commented Feb 25, 2021

@bkietz Do you want to give this another look?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge when CI completes

@pitrou pitrou force-pushed the ARROW-10420-io-context branch from a463936 to da3ece9 Compare February 25, 2021 19:36
return ValueOrStop(
arrow::csv::TableReader::Make(gc_memory_pool(), arrow::io::AsyncContext(), input,
*read_options, *parse_options, *convert_options));
return ValueOrStop(arrow::csv::TableReader::Make(arrow::io::IOContext(gc_memory_pool()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch

@ursabot
Copy link

ursabot commented Feb 25, 2021

Benchmark runs are scheduled for baseline = 9a9baf6 and contender = da3ece9. Results will be available as each benchmark for each run completes:
[Finished] ursa-dgx1: https://conbench.ursa.dev/compare/runs/d21bb0de-dc67-4bd3-8e0e-00cee85a296d...60e9bcc3-0bf6-4386-8b6c-983b398ae676/
[Finished] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/9dda2d10-03ed-4b58-8b57-6cac360134cc...885e0359-8e9d-4302-b464-0ba20e50e8f7/
If you have Ursa Computing Inc's Buildkite access, you can also view benchmark runs logs using these links:
[Finished] https://buildkite.com/ursa-computing/arrow-run-benchmarks-dgx/builds/401
[Finished] https://buildkite.com/ursa-computing/arrow-run-benchmarks/builds/904
[Finished] https://buildkite.com/ursa-computing/arrow-run-benchmarks/builds/903
[Finished] https://buildkite.com/ursa-computing/arrow-run-benchmarks-dgx/builds/400

@bkietz
Copy link
Member

bkietz commented Feb 26, 2021

CI failure is ARROW-11717. Merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants