
[C++] Investigate utilizing aggressive thread task creation when adding callback to finished future #28320

@asfimport

Description


Suppose there is a slow map function (one that could run in parallel) and a vector generator backed by a long vector of tasks. If we apply the map to the generator and then add readahead, we won't actually get any parallelism: the vector generator returns everything synchronously, so every future is already finished when the map callback is added, the callback runs inline on the calling thread, and no thread task is ever submitted.
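To make the problem concrete, here is a minimal sketch of it. `ToyFuture` and `RunMapOverFinishedFutures` are hypothetical names invented for this illustration and are not Arrow's `Future` or generator API; the only behavior assumed is the one described above, namely that a callback added to an already-finished future runs immediately on the calling thread.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <thread>

// Hypothetical stand-in for a future. This is NOT arrow::Future; it only
// models the "already finished" case discussed in this issue.
class ToyFuture {
 public:
  // Creates a future that is finished from the start, as a generator backed
  // by an in-memory vector would.
  static ToyFuture MakeFinished(int value) {
    ToyFuture fut;
    fut.finished_ = true;
    fut.value_ = value;
    return fut;
  }

  // Because the future is already finished, the callback runs right here,
  // synchronously, on whichever thread called AddCallback.
  void AddCallback(const std::function<void(int)>& callback) {
    assert(finished_ && "this sketch only models finished futures");
    callback(value_);
  }

 private:
  bool finished_ = false;
  int value_ = 0;
};

// Applies a "map" callback to num_items finished futures and returns the set
// of distinct threads on which the callback ran.
std::set<std::thread::id> RunMapOverFinishedFutures(int num_items) {
  std::set<std::thread::id> threads_used;
  for (int i = 0; i < num_items; ++i) {
    ToyFuture fut = ToyFuture::MakeFinished(i);
    fut.AddCallback([&threads_used](int /*item*/) {
      // A real map function would do slow work here.
      threads_used.insert(std::this_thread::get_id());
    });
  }
  return threads_used;
}
```

Under these assumptions, `RunMapOverFinishedFutures(8)` reports a single thread, the caller's own, no matter how many items are mapped. That is the lack of parallelism this issue is about.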

This is not just hypothetical; it actually happens in the scanner. For example, when scanning CSV files, if the CPU threads fall behind the I/O threads then all callbacks will run synchronously (since the futures will already have been completed by the I/O threads).

In such a situation we might benefit from creating a new thread task even though we wouldn't normally create one, for example when there is an idle core. You can think of this as an analogue of work stealing.

On the other hand, creating a new thread task at every callback is unlikely to be efficient. We could mitigate this by letting the caller mark a callback as "potentially long" when adding it, as a hint that the callback is eligible for eager thread task creation.
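One possible shape for such a hint is sketched below. Everything in it is an assumption made for illustration: `CallbackHint`, `ToyExecutor`, `HasIdleCore` and `AddCallbackToFinished` are hypothetical names, not existing Arrow APIs, and the "idle core" check is reduced to a simple counter against a fixed capacity.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hint supplied by the caller when adding a callback.
enum class CallbackHint { kNone, kPotentiallyLong };

// Hypothetical executor, not Arrow's thread pool. It starts one std::thread
// per spawned task and treats `capacity` as the number of available cores.
// Spawn and WaitAll are meant to be called from a single thread.
class ToyExecutor {
 public:
  explicit ToyExecutor(int capacity) : capacity_(capacity) {
    assert(capacity >= 0);
  }
  ~ToyExecutor() { WaitAll(); }

  // True while fewer tasks are running than there are cores.
  bool HasIdleCore() const { return active_.load() < capacity_; }

  void Spawn(std::function<void()> task) {
    active_.fetch_add(1);
    threads_.emplace_back([this, task = std::move(task)] {
      task();
      active_.fetch_sub(1);
    });
  }

  // Blocks until every spawned task has finished.
  void WaitAll() {
    for (auto& thread : threads_) {
      if (thread.joinable()) thread.join();
    }
    threads_.clear();
  }

 private:
  const int capacity_;
  std::atomic<int> active_{0};
  std::vector<std::thread> threads_;
};

// Models adding a callback to an already-finished future. Returns true if the
// callback was submitted as a new thread task, false if it ran inline.
bool AddCallbackToFinished(ToyExecutor& executor, CallbackHint hint,
                           std::function<void()> callback) {
  if (hint == CallbackHint::kPotentiallyLong && executor.HasIdleCore()) {
    executor.Spawn(std::move(callback));
    return true;
  }
  callback();  // Default behavior: run synchronously on the calling thread.
  return false;
}
```

In this sketch, unhinted callbacks keep today's inline behavior, and a hinted callback also falls back to running inline when no core is idle, so the hint never adds a thread task that would only sit in a queue. Whether a counter like this is a good enough proxy for "idle core" is exactly the kind of question the investigation would need to answer.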

Reporter: Weston Pace / @westonpace
Assignee: Weston Pace / @westonpace

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-12560. Please see the migration documentation for further details.
