Fix grpc data read thread block with finished instruction_id in _GrpcDataChannel #15293
|
@robertwb I created a PR for easy review and added a comment. Please let me know if a single commit is preferred; if so, I'll squash the commits into one.
Codecov Report
@@ Coverage Diff @@
## master #15293 +/- ##
==========================================
+ Coverage 83.78% 84.06% +0.28%
==========================================
Files 439 441 +2
Lines 59237 61147 +1910
==========================================
+ Hits 49632 51406 +1774
- Misses 9605 9741 +136
Continue to review full report at Codecov.
|
robertwb
left a comment
Thanks. This looks good.
|
For the lint/formatter, you can run |
|
I accidentally closed the PR and have reopened it.
d3deb0a to b49b7d9
|
I fixed the Python lint test failure, but the precheck still fails at Python PreCommit. I believe this failure is due to BEAM-12699, not my change. @robertwb Could you let me know what I should do to get this change merged? Thanks!
|
Run Python PreCommit |
|
All looks good now. Thanks for your contribution! |
Input data for completed instructions (especially failed ones) can cause the thread running `_GrpcDataChannel._read_inputs()` and another thread in `_GrpcDataChannel.input_elements()` to get stuck.

The current implementation hits this issue in the following scenario:

1. Thread A, the gRPC data channel thread, runs `_GrpcDataChannel._read_inputs()`.
2. A work thread for "process_bundle-10" calls `_GrpcDataChannel.input_elements()` from `BundleProcessor.process_bundle()`.
3. Thread A receives data elements for "process_bundle-10", accesses `_received` in `_receiving_queue()` (called from `_read_inputs()`), and puts the elements on the queue.
4. The work thread gets a data element from the queue in `input_elements()` and throws an exception while processing it. This removes the queue via `_clean_receiving_queue` in the `finally` clause of `input_elements()`.
5. Thread A receives more data elements for "process_bundle-10" and restores the queue in `_received` again! It then blocks indefinitely because the queue is full (maxsize = 5) and no work thread is pulling data elements for "process_bundle-10".
6. A work thread for "process_bundle-12" is stuck in `input_elements()` because Thread A makes no progress on "process_bundle-12" data.

As a solution, I suggest tracking completed instructions in `_GrpcDataChannel` so that the data element queue is not restored for them, and neither the gRPC data channel thread nor the work threads block.
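The idea in the PR description (remember which instructions have finished, and drop late data for them instead of recreating their queue) can be sketched roughly as below. This is a toy illustration, not Beam's actual `_GrpcDataChannel` code: the class `DataChannelSketch`, its methods `receive` and `close_instruction`, and the `max_closed` bound are hypothetical names chosen for the example. Only the bounded queue size of 5 comes from the description.

```python
import collections
import queue
import threading


class DataChannelSketch:
    """Toy stand-in for a data channel (hypothetical, not Beam's class)."""

    def __init__(self, maxsize=5, max_closed=1000):
        self._lock = threading.Lock()
        self._received = {}  # instruction_id -> bounded queue of elements
        # Ordered record of finished instruction ids, bounded so that it
        # cannot grow forever (the bound is an assumption of this sketch).
        self._closed = collections.OrderedDict()
        self._maxsize = maxsize
        self._max_closed = max_closed

    def receive(self, instruction_id, element):
        """Called by the reader thread. Returns False if the element is dropped."""
        while True:
            with self._lock:
                if instruction_id in self._closed:
                    # The instruction already finished: discard the element
                    # rather than recreating a queue that nobody will drain.
                    return False
                q = self._received.get(instruction_id)
                if q is None:
                    q = self._received[instruction_id] = queue.Queue(self._maxsize)
            try:
                # Use a timeout so a full queue never blocks the reader forever.
                q.put(element, timeout=0.1)
                return True
            except queue.Full:
                # Loop and re-check whether the instruction finished meanwhile.
                continue

    def close_instruction(self, instruction_id):
        """Called by a work thread when it is done (successfully or not)."""
        with self._lock:
            self._received.pop(instruction_id, None)
            self._closed[instruction_id] = True
            while len(self._closed) > self._max_closed:
                self._closed.popitem(last=False)  # forget the oldest id


if __name__ == "__main__":
    channel = DataChannelSketch()
    channel.receive("process_bundle-10", b"first")
    channel.close_instruction("process_bundle-10")  # e.g. the bundle failed
    # Late data for the finished instruction is dropped, so the reader
    # stays free to deliver data for other instructions.
    for i in range(10):
        channel.receive("process_bundle-10", i)
    channel.receive("process_bundle-12", b"other")
```

The timeout on `put` is a design choice of this sketch: even if a work thread finishes while the reader is waiting on a full queue, the reader re-checks the finished set on the next iteration and drops the element instead of waiting indefinitely.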