ARROW-6964: [C++][Dataset] Add multithread support to Scanner::ToTable #5721
cpp/src/arrow/dataset/scanner.cc
Outdated
I prefer an explicit loop to a nested Visitor.
cpp/src/arrow/dataset/scanner.h
Outdated
Maybe it's time to expose a common ResourceContext class that has a MemoryPool and a ThreadPool?
pitrou left a comment
Just some comments.
cpp/src/arrow/dataset/scanner.cc
Outdated
I don't think it's useful to make this inline.
bkietz left a comment
A few small comments
cpp/src/arrow/dataset/scanner.h
Outdated
Maybe this should be a task group instead of a thread pool. Then users can pass a serial task group to signal single-threaded operation.
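To make the task-group suggestion concrete, here is a minimal sketch of the idea. The `TaskGroup`, `SerialTaskGroup`, and `ThreadedTaskGroup` classes and the `SquareAll` helper are hypothetical stand-ins written for this illustration, not Arrow's actual API, and `std::async` stands in for a real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical interface for illustration; not Arrow's actual TaskGroup.
class TaskGroup {
 public:
  virtual ~TaskGroup() = default;
  virtual void Append(std::function<void()> task) = 0;
  virtual void Finish() = 0;
};

// Runs each task immediately on the calling thread.
class SerialTaskGroup : public TaskGroup {
 public:
  void Append(std::function<void()> task) override { task(); }
  void Finish() override {}
};

// Runs each task asynchronously; std::async stands in for a thread pool.
class ThreadedTaskGroup : public TaskGroup {
 public:
  void Append(std::function<void()> task) override {
    futures_.push_back(std::async(std::launch::async, std::move(task)));
  }
  void Finish() override {
    for (auto& f : futures_) f.get();  // wait for every task to complete
    futures_.clear();
  }

 private:
  std::vector<std::future<void>> futures_;
};

// The work is written once against the interface; the caller chooses
// serial or parallel execution by choosing which group to pass in.
std::vector<int> SquareAll(const std::vector<int>& inputs, TaskGroup* group) {
  assert(group != nullptr);
  std::vector<int> out(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    // Each task writes a distinct element, so no locking is needed.
    group->Append([&out, &inputs, i] { out[i] = inputs[i] * inputs[i]; });
  }
  group->Finish();
  return out;
}
```

With this shape, passing a `SerialTaskGroup` is the signal for single-threaded operation, so no separate boolean flag is required.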
cpp/src/arrow/dataset/scanner.cc
Outdated
Suggested change:
- batches.emplace_back(batch);
+ batches.emplace_back(std::move(batch));
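For context on why the move matters, here is a small sketch assuming `batch` is a `std::shared_ptr` that is not used again after the append. `Batch` is a hypothetical stand-in for the real record batch type, and the helper names are invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::RecordBatch, for illustration only.
struct Batch {
  int num_rows;
};

using BatchVector = std::vector<std::shared_ptr<Batch>>;

// Copying the shared_ptr into the vector bumps the atomic reference count,
// and the local `batch` drops it again when it goes out of scope.
long AppendByCopy(BatchVector& batches, std::shared_ptr<Batch> batch) {
  assert(batch != nullptr);
  batches.emplace_back(batch);
  return batches.back().use_count();  // counts `batch` and the new element
}

// Moving transfers ownership to the vector element without touching
// the reference count.
long AppendByMove(BatchVector& batches, std::shared_ptr<Batch> batch) {
  assert(batch != nullptr);
  batches.emplace_back(std::move(batch));
  return batches.back().use_count();  // only the new element owns the batch
}
```

Both versions leave the vector in the same state; the moved version just skips one atomic increment and decrement per batch.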
8d6e72e to 10ed013
The caller may request a parallel construction of the table. Scanner was refactored to own the ScanOptions and ScanContext members. The `use_threads` option was added to ScanOptions so the caller can indicate whether Scanner is allowed to use parallelism.
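A rough sketch of the shape this commit message describes, under heavy assumptions: `ScanOptions`, `ScanContext`, and `Scanner` below are simplified stand-ins rather than the real Arrow classes, a "fragment" is just a vector of ints, `ScanFragment` is invented per-fragment work, and `std::async` replaces whatever thread pool the actual patch uses.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real Arrow classes carry more state.
struct ScanOptions {
  bool use_threads = false;
};
struct ScanContext {};  // would hold shared resources such as a memory pool

using Fragment = std::vector<int>;

// Invented per-fragment work: doubles every value.
Fragment ScanFragment(const Fragment& fragment) {
  Fragment out;
  out.reserve(fragment.size());
  for (int v : fragment) out.push_back(2 * v);
  return out;
}

class Scanner {
 public:
  // The scanner owns its options and context instead of taking them per call.
  Scanner(std::vector<Fragment> fragments, std::shared_ptr<ScanOptions> options,
          std::shared_ptr<ScanContext> context)
      : fragments_(std::move(fragments)),
        options_(std::move(options)),
        context_(std::move(context)) {
    assert(options_ != nullptr && context_ != nullptr);
  }

  const std::shared_ptr<ScanOptions>& options() const { return options_; }
  const std::shared_ptr<ScanContext>& context() const { return context_; }

  // Scans every fragment and concatenates the results in fragment order.
  std::vector<int> ToTable() const {
    std::vector<Fragment> scanned;
    if (options_->use_threads) {
      std::vector<std::future<Fragment>> futures;
      for (const Fragment& f : fragments_) {
        futures.push_back(
            std::async(std::launch::async, [&f] { return ScanFragment(f); }));
      }
      // Collecting in submission order keeps the output deterministic.
      for (auto& fut : futures) scanned.push_back(fut.get());
    } else {
      for (const Fragment& f : fragments_) scanned.push_back(ScanFragment(f));
    }
    std::vector<int> table;
    for (const Fragment& s : scanned) {
      table.insert(table.end(), s.begin(), s.end());
    }
    return table;
  }

 private:
  std::vector<Fragment> fragments_;
  std::shared_ptr<ScanOptions> options_;
  std::shared_ptr<ScanContext> context_;
};
```

The point of the sketch is that the flag only changes how the work is scheduled: both paths produce the same table in the same order.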
10ed013 to 210f190
The caller may request a parallel construction of the table. Scanner was refactored to own the ScanOptions and ScanContext members.