ARROW-10420: [C++] Refactor io and filesystem APIs to take an IOContext #9474
A couple things to discuss:
@westonpace @bkietz input welcome.
cpp/src/arrow/dataset/file_base.cc
Outdated
Note: explicitly closing output files is preferable, especially with remote filesystems, where closing might plausibly fail.
(and MockFileSystem will deliberately not write anything out if you don't close it explicitly)
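To illustrate the point about explicit closing, here is a minimal self-contained sketch. `FakeStore`, `FakeOutputStream` and `WriteFile` are hypothetical names made up for this example and are not Arrow APIs; the stream only mimics the behavior described above (nothing is written out unless `Close()` is called), and a null store stands in for a remote failure.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory "filesystem": path -> contents.
using FakeStore = std::map<std::string, std::string>;

// Hypothetical output stream that publishes its data only on an explicit Close().
class FakeOutputStream {
 public:
  FakeOutputStream(FakeStore* store, std::string path)
      : store_(store), path_(std::move(path)) {}

  void Write(const std::string& data) { buffer_ += data; }

  // Returns false on failure so the caller can react.
  // A destructor has no way to report this to the caller.
  bool Close() {
    if (closed_) return true;
    closed_ = true;
    if (store_ == nullptr) return false;  // stands in for a remote failure
    (*store_)[path_] = buffer_;
    return true;
  }

  // Deliberately does not flush: data from an unclosed stream is dropped.
  ~FakeOutputStream() = default;

 private:
  FakeStore* store_;
  std::string path_;
  std::string buffer_;
  bool closed_ = false;
};

// Writes `data` to `path`. Only calls Close() when `close_explicitly` is true;
// otherwise the stream is simply destroyed and "success" is reported blindly.
bool WriteFile(FakeStore* store, const std::string& path,
               const std::string& data, bool close_explicitly) {
  FakeOutputStream out(store, path);
  out.Write(data);
  return close_explicitly ? out.Close() : true;
}
```

With `close_explicitly` set to false the function reports success even though nothing reached the store, which is the silent failure mode an explicit close is meant to surface.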
Same comment here about explicitly closing files.
Note that this was scheduling file copies on the CPU thread pool.
443569d to bb09c58
My gut says no, but I could be convinced otherwise. For example, the S3 filesystem (if I understand correctly) would plug the executor into the AWS client configuration, and the random access files etc. would rely on that rather than on the Executor directly. At best it would just be for convenience, right? A way to give implementors easy access to the context instead of having to pass it around themselves?
naming nit: IoContext?
d4608a9 to 356c300
@westonpace I think I misphrased my question. This PR adds an owned
@emkornfield Hmm, I'm not sure what the convention should be. We currently have
According to the style guide, IoContext:
However, if we want to stay consistent, I'm OK with IO.
Since IOContext only wraps pointers and an id integer, it semantically represents a reference. I'd therefore recommend never producing references to it; that would be redundant, and the structure is tiny and trivially copyable anyway.
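A rough sketch of what that recommendation looks like in practice. `MiniIOContext`, `FakePool`, `FakeExecutor` and `RecordRead` are hypothetical stand-ins written for this example and are not the actual Arrow classes; the only assumption taken from the comment above is that the context holds pointers plus an integer id.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for a memory pool and an executor.
struct FakePool {
  std::int64_t bytes_allocated = 0;
};
struct FakeExecutor {
  int tasks_submitted = 0;
};

// Hypothetical context: two non-owning pointers and an id, nothing else.
struct MiniIOContext {
  FakePool* pool;
  FakeExecutor* executor;
  std::int64_t external_id;
};

// Copying the context is as cheap as copying three machine words.
static_assert(std::is_trivially_copyable<MiniIOContext>::value,
              "MiniIOContext should stay trivially copyable");

// Takes the context by value. The copy still refers to the same pool and
// executor, so a `const MiniIOContext&` parameter would add nothing.
std::int64_t RecordRead(MiniIOContext ctx, std::int64_t nbytes) {
  ctx.pool->bytes_allocated += nbytes;
  ctx.executor->tasks_submitted += 1;
  return ctx.external_id;
}
```

Because the struct already behaves like a reference to the pool and executor, passing it by value keeps call sites simple and avoids any lifetime questions about the context object itself.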
bb09c58 to 37373b4
cpp/src/arrow/csv/reader.cc
Outdated
Note that the CPU and IO executors were passed in the wrong order here.
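As an aside on how this kind of swap can go unnoticed: when both arguments have the same pointer type, the compiler accepts either order. One possible guard, sketched below, is to wrap each role in its own type. Everything here (`Executor`, `CpuExecutor`, `IoExecutor`, `ReaderConfig`) is hypothetical and invented for the illustration; it is not what this PR does and not an Arrow API.

```cpp
#include <string>
#include <type_traits>

// Hypothetical executor; the name only serves to tell instances apart.
struct Executor {
  std::string name;
};

// Hypothetical wrappers that give each role a distinct type.
struct CpuExecutor {
  Executor* ptr;
};
struct IoExecutor {
  Executor* ptr;
};

// Hypothetical reader configuration that needs both executors.
struct ReaderConfig {
  ReaderConfig(IoExecutor io_exec, CpuExecutor cpu_exec)
      : io(io_exec.ptr), cpu(cpu_exec.ptr) {}
  Executor* io;
  Executor* cpu;
};

// The intended order compiles; the swapped order is rejected at compile time.
static_assert(std::is_constructible<ReaderConfig, IoExecutor, CpuExecutor>::value,
              "IO executor first, CPU executor second");
static_assert(!std::is_constructible<ReaderConfig, CpuExecutor, IoExecutor>::value,
              "swapped arguments must not compile");
```

Bundling the IO executor into a single context object, as this PR does with IOContext, reduces the number of loose executor arguments in a similar spirit.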
@ursabot please benchmark
bkietz left a comment
a few minor comments
The `io::IOContext` class allows passing various settings such as the MemoryPool used for allocation and the Executor for async methods.
f16a80c to a463936
@bkietz Do you want to give this another look?
bkietz left a comment
LGTM, will merge when CI completes
a463936 to da3ece9
-  return ValueOrStop(
-      arrow::csv::TableReader::Make(gc_memory_pool(), arrow::io::AsyncContext(), input,
-                                    *read_options, *parse_options, *convert_options));
+  return ValueOrStop(arrow::csv::TableReader::Make(arrow::io::IOContext(gc_memory_pool()),
Oh, good catch
Benchmark runs are scheduled for baseline = 9a9baf6 and contender = da3ece9. Results will be available as each benchmark for each run completes:
CI failure is ARROW-11717. Merging.