-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-15968: [C++] Update AsyncGenerator semantics to emit a terminal item only after all outstanding futures have completed #12662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…asks complete before the first terminal item is emitted.
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format? or See also: |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to need to go back through the merged generator a few times…
| return IsComplete(); | ||
| } | ||
|
|
||
| AsyncGenerator<AsyncGenerator<T>> source; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the sake of future readers we should give each of these fields a description and invariant (and probably each of the structs too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also added some extra comments and a (somewhat lengthy) general description of the algorithm. Let me know if it was over the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps a lot - thank you!
| bool first; | ||
| bool broken; | ||
| bool source_exhausted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there's a state machine we're moving through but the number of possible states also seems quite large…so I'm not sure if that'd actually make things clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But with multiple boolean flags it's also already hard to reason about behavior in different situations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are probably a few coarse grained semantic states:
Unstarted -> Priming -> Running -> Winding Down -> Completed. And any of those middle three can branch into Broken (which then eventually goes to Completed)
But I'm not sure how to use this information to make anything cleaner.
…ge generator was given an empty outer subscription
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This is still fairly complicated, but comprehensible
| return IsComplete(); | ||
| } | ||
|
|
||
| AsyncGenerator<AsyncGenerator<T>> source; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps a lot - thank you!
… blocking to wait on the fast-sync path of a generator. Also fixing a bug in the arrow::GatingTask impl
|
Benchmark runs are scheduled for baseline = d6a89e5 and contender = acc6c2e. acc6c2e is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |

Unfortunately, this seems to have made the merge generator, which was already quite complicated, even more complicated. I'd welcome any suggestions for simplification. In the meantime, even though this one generator is more complicated, I think this allows us to simplify code using async generators considerably.
This is a prerequisite for #12468 because there is no way to keep the serial generator alive after the async generator has been destroyed (we can't use shared_ptr in this case)