Add JavaDoc to BundleManager #26287
The current implementation tries to start the bundle, then adds an element to the new bundle. `countNewElement` is a better name, because:
1. A new element is always added, but a bundle is not always started.
2. The only usage of this method is in `DoFnOp.processElement`, where it reads better in context.

We can also add a `startBundle` interface in the future, but there are currently no use cases.

The new name makes adding an element the main subject and starting the bundle a lazy side effect. It aligns more closely with what the method actually does.

One symmetry we are breaking is in `DoFnOp.processElement`: we had a `tryStartBundle` paired with a `tryFinishBundle`. Now we have `countNewElement` paired with `tryFinishBundle`. I think the old reading favors 1 element per bundle, while the new reading favors multiple elements in a bundle, where we check whether the bundle is full and should be finished after each add.

Other changes:
1. Renamed some tests.
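To make the proposed reading concrete, here is a minimal sketch of the pairing described above. It is not the Samza runner's `BundleManager`: the class name `SketchBundleManager`, the size-only closing policy, and the `getFinishedBundles` accessor are assumptions made for illustration, and the sketch ignores time-based closing, watermarks, and pending futures.

```java
/** Hypothetical sketch only; not the actual Samza runner BundleManager. */
final class SketchBundleManager {
  private final long maxBundleSize; // assumption: bundles close on size alone
  private boolean bundleStarted = false;
  private long elementCount = 0;
  private long finishedBundles = 0;

  SketchBundleManager(long maxBundleSize) {
    if (maxBundleSize < 1) {
      throw new IllegalArgumentException("maxBundleSize must be >= 1");
    }
    this.maxBundleSize = maxBundleSize;
  }

  /** Always counts one element; starts a bundle first only if none is open. */
  void countNewElement() {
    if (!bundleStarted) {
      bundleStarted = true; // lazy side effect
      elementCount = 0;
    }
    elementCount++;
  }

  /** Closes the open bundle if it is full; returns true if it was closed. */
  boolean tryFinishBundle() {
    if (bundleStarted && elementCount >= maxBundleSize) {
      bundleStarted = false;
      elementCount = 0;
      finishedBundles++;
      return true;
    }
    return false;
  }

  boolean isBundleStarted() {
    return bundleStarted;
  }

  long getFinishedBundles() {
    return finishedBundles;
  }

  /** Mirrors the pairing described for DoFnOp.processElement. */
  void processElement(Runnable work) {
    countNewElement();
    work.run();
    tryFinishBundle();
  }
}
```

In this sketch every call to `processElement` counts an element, while a bundle is started only when none is open, which is the asymmetry the rename is meant to express.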
assertEquals(
    "Expected pending bundle count to be 0", 0L, bundleManager.getPendingBundleCount());
assertFalse("Error didn't reset the bundle as expected.", bundleManager.isBundleStarted());
bundleManager.countNewElement();
This removed block is already tested in testTryStartBundleThrowsExceptionFromTheListener. The only difference is the cause of tryStartBundle failure. I'm isolating this part to a new and smaller test.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Overall, I feel the PR doesn't improve much and instead introduces more confusion. Here are the reasons:
The name Additionally, the symmetry of
Not sure I agree with this as well. So it very much reads with the context of All said, I think there is scope for refactor and improvements as I see. e.g., The existing bundle manager does two things
It is the way it is because we don't wait until we have a complete bundle; instead, we delegate elements to the underlying runner as and when we receive them. Due to this behavior, 1 & 2 exist together. It is possible to separate them out and have the All our use cases (classic & portability) use the early delegation strategy, and hence the above-mentioned refactor is not critical. There might be benefits in doing it if we plan to have multiple bundles, in which case the bundle management might become a bit heavy, and so does 2).
+1 stick with the naming
[1] https://beam.apache.org/releases/javadoc/2.2.0/org/apache/beam/sdk/transforms/ParDo.html
I don't see how I would read Also having a bundle start with 1 element only makes half-sense to me. I like starting with 0, because when I count, I start counting from 0. It gives me security.
imo two things is too many things. "Process messages, handle watermark and process timers." also makes it 4 things instead of 2 things. My ideal BundleManager should only count the elements and look at the time window to decide whether the bundle is closed. The code that handles the watermark can ask BundleManager for the state of the bundle, but it should not be part of BundleManager. Not sure how much we agree on this, but I'm not changing this part at the moment.
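As a rough illustration of that narrower responsibility, a tracker that only counts elements and checks elapsed time could look like the sketch below. Everything here is hypothetical: the name `SketchBundleTracker`, the two limits, and passing the current time in as a parameter are assumptions for the example, not part of the Samza runner.

```java
/** Hypothetical sketch: counts elements and checks a time limit, nothing else. */
final class SketchBundleTracker {
  private static final long NO_BUNDLE = -1L;

  private final long maxBundleSize;   // assumed size limit
  private final long maxBundleMillis; // assumed time limit
  private long elementCount = 0;      // counting starts from 0
  private long bundleStartMillis = NO_BUNDLE;

  SketchBundleTracker(long maxBundleSize, long maxBundleMillis) {
    this.maxBundleSize = maxBundleSize;
    this.maxBundleMillis = maxBundleMillis;
  }

  /** Counts one element; records the start time if this opens a bundle. */
  void countNewElement(long nowMillis) {
    if (bundleStartMillis == NO_BUNDLE) {
      bundleStartMillis = nowMillis;
    }
    elementCount++;
  }

  /** Answers whether the open bundle should be closed; does not close it. */
  boolean shouldClose(long nowMillis) {
    if (bundleStartMillis == NO_BUNDLE) {
      return false; // nothing to close
    }
    return elementCount >= maxBundleSize
        || nowMillis - bundleStartMillis >= maxBundleMillis;
  }

  /** Called by the owner once it has finished the bundle. */
  void reset() {
    elementCount = 0;
    bundleStartMillis = NO_BUNDLE;
  }

  long getElementCount() {
    return elementCount;
  }
}
```

Code that handles watermarks or timers would query `shouldClose` and call `reset` itself, so the tracker stays free of that logic. Taking the time as an argument is only a choice to keep the sketch deterministic.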
xinyuiscool left a comment
Thanks for the enhancements!
runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/BundleManager.java
xinyuiscool left a comment
LGTM!
The test failures are unrelated (downloading dependencies in mvn repo). Merging the PR since Java precommit and other checks passed.
Extract BundleManager to an Interface in SamzaRunner (apache#26268)
Refactor DoFnOp.FutureCollectorImpl to a top level class in SamzaRunner (apache#26274)
Add JavaDoc to BundleManager in Samza Runner (apache#26287)
Add JavaDoc to BundleManager.
This was a repurposed PR.
Original description: name tryStartBundle to countNewElement.