
@westonpace (Member)

Unfortunately, this seems to have made the merged generator, which was already quite complicated, even more complicated. I'd welcome any suggestions for simplification. In the meantime, even though this one generator is more complicated, I think this change allows us to simplify code that uses async generators considerably.

This is a prerequisite for #12468 because there is no way to keep the serial generator alive after the async generator has been destroyed (we can't use shared_ptr in this case).

…asks complete before the first terminal item is emitted.
@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}


@westonpace (Member, Author)

westonpace commented Mar 18, 2022

Not sure if this helps, but here is a rough diagram of the change:
[Image: rough diagram of the change]

@westonpace westonpace changed the title [C++] Update AsyncGenerator semantics to emit a terminal item only after all outstanding futures have completed ARROW-15968: [C++] Update AsyncGenerator semantics to emit a terminal item only after all outstanding futures have completed Mar 18, 2022
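The new contract in the retitled PR (emit a terminal item only after all outstanding futures have completed) can be illustrated with a small single-threaded sketch. It is not Arrow's implementation: `ToySource`, `TerminalGate`, `Item`, and `RunDemo` are hypothetical names, plain callbacks stand in for futures, and `std::nullopt` stands in for the end-of-stream marker.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy types, not Arrow's API. std::nullopt plays the role of
// the terminal (end-of-stream) item; a callback plays the role of a future.
using Item = std::optional<int>;
using Callback = std::function<void(Item)>;

// Toy async source: each Request() registers a callback that the caller
// completes later, in any order, via Complete(index, item).
class ToySource {
 public:
  void Request(Callback cb) { pending_.push_back(std::move(cb)); }
  void Complete(std::size_t index, Item item) {
    Callback cb = pending_.at(index);  // Copy so pending_ may grow during the call.
    cb(item);
  }

 private:
  std::vector<Callback> pending_;
};

// Wrapper that holds back terminal items until no request to the source is
// still outstanding, so consumers never see "end" while work is in flight.
class TerminalGate {
 public:
  explicit TerminalGate(ToySource* source) : source_(source) {}

  void Request(Callback cb) {
    ++outstanding_;
    source_->Request([this, cb](Item item) {
      --outstanding_;
      if (item.has_value()) {
        cb(item);                     // Ordinary items pass straight through.
      } else {
        deferred_end_.push_back(cb);  // Terminal items wait.
      }
      MaybeReleaseEnd();
    });
  }

 private:
  void MaybeReleaseEnd() {
    if (outstanding_ != 0) return;
    // Move into a local so deferred_end_ is not mutated while we iterate.
    std::vector<Callback> ready;
    ready.swap(deferred_end_);
    for (auto& cb : ready) cb(std::nullopt);
  }

  ToySource* source_;
  std::size_t outstanding_ = 0;
  std::vector<Callback> deferred_end_;
};

// Two requests; the source answers "end" for the first one before the
// second (slower) request delivers a value.
std::vector<Item> RunDemo() {
  ToySource source;
  TerminalGate gate(&source);
  std::vector<Item> seen;
  auto record = [&seen](Item item) { seen.push_back(item); };
  gate.Request(record);
  gate.Request(record);
  source.Complete(0, std::nullopt);
  assert(seen.empty());  // The early terminal item is held back.
  source.Complete(1, 42);
  return seen;
}
```

Here `RunDemo()` yields the value 42 followed by the terminal item, even though the source reported the end first. The sketch runs on one thread and ignores the locking a real generator would need.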
@lidavidm (Member) left a comment

I'm going to need to go back through the merged generator a few times…

return IsComplete();
}

AsyncGenerator<AsyncGenerator<T>> source;
Member

I think, for the sake of future readers, we should give each of these fields a description and an invariant (and probably each of the structs too).

Member Author

Done. I also added some extra comments and a (somewhat lengthy) general description of the algorithm. Let me know if it was over the top.

Member

This helps a lot - thank you!

Comment on lines 1173 to 1175
bool first;
bool broken;
bool source_exhausted;
Member

It feels like there's a state machine we're moving through, but the number of possible states also seems quite large, so I'm not sure an explicit state machine would actually make things clearer.

Member

But with multiple boolean flags, it's also already hard to reason about behavior in different situations.

Member Author

@westonpace, Mar 23, 2022

There are probably a few coarse-grained semantic states:
Unstarted -> Priming -> Running -> Winding Down -> Completed. Any of the middle three (Priming, Running, Winding Down) can branch into Broken, which then eventually goes to Completed.

But I'm not sure how to use this information to make anything cleaner.
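For comparison with the boolean flags, the coarse-grained states listed above could be encoded as a single enum with an explicit transition function. This is only an illustration of that idea: the state names come from the comment, while `MergeState`, `MergeEvent`, the event names, and `Next` are hypothetical and are not taken from Arrow's merged generator.

```cpp
#include <cassert>

// States named in the review comment.
enum class MergeState { kUnstarted, kPriming, kRunning, kWindingDown, kBroken, kCompleted };

// Hypothetical events that drive transitions (assumed, not from Arrow).
enum class MergeEvent { kFirstRequest, kPrimed, kSourceExhausted, kAllDone, kError };

// Returns the next state; events that do not apply leave the state unchanged.
MergeState Next(MergeState s, MergeEvent e) {
  if (e == MergeEvent::kError) {
    // Only the three middle states may branch into Broken.
    if (s == MergeState::kPriming || s == MergeState::kRunning ||
        s == MergeState::kWindingDown) {
      return MergeState::kBroken;
    }
    return s;
  }
  switch (s) {
    case MergeState::kUnstarted:
      return e == MergeEvent::kFirstRequest ? MergeState::kPriming : s;
    case MergeState::kPriming:
      return e == MergeEvent::kPrimed ? MergeState::kRunning : s;
    case MergeState::kRunning:
      return e == MergeEvent::kSourceExhausted ? MergeState::kWindingDown : s;
    case MergeState::kWindingDown:
    case MergeState::kBroken:
      // Both finish once all outstanding work has drained.
      return e == MergeEvent::kAllDone ? MergeState::kCompleted : s;
    case MergeState::kCompleted:
      return s;
  }
  return s;
}
```

With a single enum value, impossible flag combinations cannot be represented; whether that actually makes the merged generator simpler is the open question raised in this thread.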

@lidavidm (Member) left a comment

LGTM. This is still fairly complicated, but comprehensible.


… blocking to wait on the fast-sync path of a generator. Also fixing a bug in the arrow::GatingTask impl
@lidavidm closed this in acc6c2e Mar 25, 2022
@ursabot commented Mar 25, 2022

Benchmark runs are scheduled for baseline = d6a89e5 and contender = acc6c2e. acc6c2e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.59% ⬆️0.42%] test-mac-arm
[Finished ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.13% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
