@westonpace
Member

While scanning we do our best to read ahead across multiple files, so we will read files 1, 2, 3, and 4 all at the same time. This helps to maintain bandwidth when some files hit a snag (which sometimes happens on AWS). However, when doing an ordered scan, this can defeat backpressure and let queued batches pile up when there is a slow consumer.

The sequencer (placed at the end of the pipeline) can get into a situation where it pulls aggressively from files 2, 3, and 4 while waiting for the next chunk from file 1. Since the sequencer is consuming the batches, the backpressure mechanism thinks they are being consumed. However, the actual consumer has not taken them, so the batches pile up at the sequencer.
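
To make the failure mode concrete, here is a toy simulation (not Arrow code). The function name, the tick-based model, and all parameters are hypothetical and exist only for illustration. It assumes the first file stalls for a while, the other files keep delivering, and the sequencer pulls eagerly, so nothing is ever visible to backpressure.

```python
def simulate_late_sequencing(num_files=4, batches_per_file=5, stall_ticks=10):
    """Toy model of a sequencer placed at the END of the pipeline.

    File 0 delivers nothing for `stall_ticks` ticks; every other file
    delivers one batch per tick. The sequencer pulls every batch as soon
    as it is available, so an upstream backpressure counter would always
    read zero even though batches are piling up inside the sequencer.
    """
    produced = [0] * num_files   # batches each file has delivered so far
    held = set()                 # (file, batch) pairs parked in the sequencer
    emitted = []                 # order in which batches leave the sequencer
    expected = (0, 0)            # next (file, batch) that may be emitted
    peak_held = 0
    tick = 0
    total = num_files * batches_per_file

    while len(emitted) < total:
        # Readers: every non-stalled file hands over its next batch.
        for f in range(num_files):
            stalled = f == 0 and tick < stall_ticks
            if not stalled and produced[f] < batches_per_file:
                held.add((f, produced[f]))
                produced[f] += 1
        peak_held = max(peak_held, len(held))

        # Sequencer: emit only what is next in (file, batch) order.
        while expected in held:
            held.remove(expected)
            emitted.append(expected)
            f, b = expected
            expected = (f, b + 1) if b + 1 < batches_per_file else (f + 1, 0)
        tick += 1

    return emitted, peak_held


emitted, peak_held = simulate_late_sequencing()
print("in order:", emitted == sorted(emitted), "peak held:", peak_held)
```

In this model the output order is correct, but while file 0 is stalled the sequencer ends up holding every batch from the other files, and the amount held grows with the number and size of the files rather than with any configured limit.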

This PR introduces one possible solution (and it may be the only possible solution), which is to sequence the batches at merge time (early in the pipeline). The sequencer won't need to pull aggressively and backpressure will be maintained. This pretty significantly reduces (but does not eliminate) the amount of file readahead we do in ordered scans. We can worry about that if it ends up being a bottleneck at some point, but for now I think it is better that we do not explode RAM.
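
As a rough illustration of why ordering early helps, the sketch below (again a toy model, not the implementation in this PR) assumes batches are already in order when they leave the merge step. The only buffer is then an ordinary bounded queue in front of a slow consumer. The names `simulate_merge_time_sequencing`, `queue_limit`, and `consumer_period` are hypothetical.

```python
from collections import deque


def simulate_merge_time_sequencing(num_files=4, batches_per_file=5,
                                   queue_limit=3, consumer_period=4):
    """Toy model where batches are put in order at the merge step.

    Because the merge only ever releases the next in-order batch, the
    single buffer is the downstream queue, which backpressure can see
    and cap at `queue_limit`.
    """
    order = [(f, b) for f in range(num_files) for b in range(batches_per_file)]
    pending = deque(order)   # batches not yet released by the merge
    queue = deque()          # batches waiting for the consumer
    consumed = []
    peak_queued = 0
    tick = 0

    while len(consumed) < len(order):
        # Merge: release the next in-order batch unless backpressure applies.
        if pending and len(queue) < queue_limit:
            queue.append(pending.popleft())
        peak_queued = max(peak_queued, len(queue))

        # Slow consumer: takes one batch every `consumer_period` ticks.
        if queue and tick % consumer_period == 0:
            consumed.append(queue.popleft())
        tick += 1

    return consumed, peak_queued


consumed, peak_queued = simulate_merge_time_sequencing()
print("in order:", consumed == sorted(consumed), "peak queued:", peak_queued)
```

The trade-off matches the one described above: the queue never exceeds its limit no matter how slow the consumer is, but this model reads strictly in order, so it gives up the cross-file readahead that helps when one file is slow.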

This builds on ARROW-13611 and will remain in draft until that PR has merged.

@github-actions
github-actions bot commented Oct 2, 2021

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@westonpace westonpace force-pushed the feature/ARROW-14192--backpressure-ordered-scan branch 3 times, most recently from 3cf0f03 to daa1840 Compare October 8, 2021 01:05
@westonpace westonpace force-pushed the feature/ARROW-14192--backpressure-ordered-scan branch from daa1840 to 9b63bd5 Compare October 12, 2021 22:53
@westonpace westonpace marked this pull request as ready for review October 12, 2021 23:35
@westonpace westonpace requested a review from bkietz October 12, 2021 23:42
@westonpace westonpace force-pushed the feature/ARROW-14192--backpressure-ordered-scan branch from c9373f4 to 68b997d Compare October 15, 2021 20:32
Member

@lidavidm lidavidm left a comment

LGTM overall, but left some questions (more for my own understanding).

Member

I suppose we have a good number of these now, but timing-based tests always make me feel somewhat icky. Is there value to testing this in Python as well as C++?

Member Author

If we do something like https://issues.apache.org/jira/browse/ARROW-14367 we can remove the timing dependency. That being said, I understand the concern, and there was similar concern in the dataset write backpressure PR: #11286 (comment)

I will remove this test from this PR and rely on the existing Python test. When ARROW-14367 is implemented, the timing dependency can be removed from the other test (and we can maybe reintroduce this test at that point).

Member Author

The test has been removed.

westonpace and others added 6 commits October 18, 2021 11:42
…readahead but won't allow backpressure to explode if scanning in a sequenced fashion.
… some grammar in async_generator.h comments. Removed timing-dependent backpressure test to avoid having too many timing dependent tests.
Co-authored-by: David Li <li.davidm96@gmail.com>
@westonpace westonpace force-pushed the feature/ARROW-14192--backpressure-ordered-scan branch from e6ec85b to 189516f Compare October 18, 2021 21:45
@westonpace
Member Author

Rebased and addressed PR comments.

Member

@lidavidm lidavidm left a comment

Thanks, LGTM.

@westonpace
Member Author

CI failures are unrelated (could not find LLVM) and these tests passed on earlier versions. I will proceed with merging.

@ursabot
ursabot commented Oct 19, 2021

Benchmark runs are scheduled for baseline = f2f663b and contender = 9abd2b1. 9abd2b1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.4% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@kou
Member

kou commented Oct 19, 2021

@westonpace It seems that "C++ / AMD64 Ubuntu 20.04 C++ ASAN UBSAN" is failing because of this.

@westonpace westonpace deleted the feature/ARROW-14192--backpressure-ordered-scan branch January 6, 2022 08:15