@westonpace westonpace commented Apr 1, 2021

Calling the async streaming CSV reader from the synchronous Scanner::Scan resulted in a form of nested parallelism that led to deadlocks. This commit brings over some of the work from ARROW-7001 and allows the CSV scan task to be called asynchronously. In addition, an async path is added to the scanner and the dataset write so that all internal uses of ScanTask()->Execute happen in an async-friendly way. External uses of ScanTask()->Execute should already run outside the CPU thread pool and should not cause deadlocks.

Some of this PR will be obsoleted by ARROW-7001, but the work in file_csv and the test cases should remain fairly intact.
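
For readers unfamiliar with this failure mode, here is a minimal, generic sketch of how blocking on nested work inside a thread pool can stall, and how a continuation-based path avoids it. It is not Arrow code: `TinyPool`, `NestedBlockingCompletes`, and `NestedAsyncCompletes` are hypothetical names invented for illustration, the pool has a single worker to make the effect deterministic, and a timeout stands in for what would otherwise be an indefinite wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical one-worker pool standing in for a CPU thread pool (not Arrow's API).
class TinyPool {
 public:
  TinyPool() : worker_([this] { Run(); }) {}
  ~TinyPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // Remaining queued tasks are drained before the worker exits.
  }
  void Submit(std::function<void()> task) {
    assert(task != nullptr);
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // Shutting down and nothing left to run.
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;  // Declared last so the other members exist before it starts.
};

// Nested, blocking style: the outer task holds the only worker while it waits
// for an inner task queued on the same pool. Returns true if the inner result
// arrived within the timeout.
bool NestedBlockingCompletes(std::chrono::milliseconds timeout) {
  std::promise<bool> outer_result;
  std::future<bool> outer = outer_result.get_future();
  TinyPool pool;  // Destroyed first, so the worker is joined before the promise dies.
  pool.Submit([&] {
    auto inner_done = std::make_shared<std::promise<int>>();
    std::future<int> inner = inner_done->get_future();
    pool.Submit([inner_done] { inner_done->set_value(42); });
    // An untimed inner.get() here would wait forever; the timeout keeps the demo finite.
    outer_result.set_value(inner.wait_for(timeout) == std::future_status::ready);
  });
  return outer.get();
}

// Async-friendly style: the outer task queues the inner task together with a
// continuation and returns immediately, which frees the worker to run it.
bool NestedAsyncCompletes(std::chrono::milliseconds timeout) {
  std::promise<int> final_result;
  std::future<int> result = final_result.get_future();
  TinyPool pool;
  pool.Submit([&] {
    auto on_done = [&final_result](int value) { final_result.set_value(value); };
    pool.Submit([on_done] { on_done(42); });
  });
  return result.wait_for(timeout) == std::future_status::ready && result.get() == 42;
}
```

With one worker, `NestedBlockingCompletes` reports `false` because the inner task cannot start until the outer task gives up the worker, while `NestedAsyncCompletes` reports `true`. A pool with more workers only raises the number of concurrent blocking callers needed before the same stall appears.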

@westonpace (Member, Author)

The JNI failure looks to be a timeout. The integration failure is unrelated (Rust error). The R Windows 3.5 error seems legitimate. I am investigating.

@lidavidm lidavidm closed this in 3d87a0e Apr 2, 2021
pachadotdev pushed a commit to pachadotdev/arrow that referenced this pull request Apr 5, 2021
…run synchronously from datasets

Closes apache#9868 from westonpace/bugfix/arrow-12161

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: David Li <li.davidm96@gmail.com>
@westonpace westonpace deleted the bugfix/arrow-12161 branch April 14, 2021 20:18