[BEAM-9399] Change DataflowWorkerLoggingHandler to report errors to t… #12825
|
@lukecwik Luke, can you review, since you have context from last time? Synchronizing on the buffer instead of the PrintStream changes the precondition so that it only enforces the invariant on when publish is called within our custom PrintStream. That means the original deadlock can still occur if this happens: T1: synchronizes on System.err (Throwable.printStackTrace, for example), publishes to handler. I removed that case by using a custom ErrorManager that prints to the original stderr stream. An alternative would be to have the ErrorManager still use our custom PrintStream but stop using Throwable.printStackTrace, so as not to synchronize on the PrintStream. |
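For readers following along, here is a minimal sketch of the first approach described above: an ErrorManager that writes to a stream captured before System.err is redirected. The class names `OriginalStderrErrorManager`, `FailingHandler`, and `ErrorManagerDemo` are hypothetical and are not taken from this PR, and the demo uses an in-memory stream as a stand-in for the real original stderr so the result can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.ErrorManager;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Hypothetical name. Reports handler failures to a stream captured before
// System.err was replaced, so the report never re-enters the failing handler.
class OriginalStderrErrorManager extends ErrorManager {
  private final PrintStream original;

  OriginalStderrErrorManager(PrintStream original) {
    this.original = original;
  }

  @Override
  public void error(String msg, Exception ex, int code) {
    original.println("Logging handler error (code " + code + "): " + msg);
    if (ex != null) {
      // This synchronizes on the captured stream only, not on the
      // redirected System.err.
      ex.printStackTrace(original);
    }
  }
}

// Hypothetical handler that always fails, to exercise the error path.
class FailingHandler extends Handler {
  @Override
  public void publish(LogRecord record) {
    reportError("publish failed", new IllegalStateException("boom"), ErrorManager.WRITE_FAILURE);
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}
}

class ErrorManagerDemo {
  // Assumption: in a real worker the stream would be captured with
  // `PrintStream original = System.err;` before calling System.setErr(...).
  static String run() {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    PrintStream original = new PrintStream(sink, true);
    Handler handler = new FailingHandler();
    handler.setErrorManager(new OriginalStderrErrorManager(original));
    handler.publish(new LogRecord(Level.INFO, "hello"));
    return sink.toString();
  }

  public static void main(String[] args) {
    System.out.print(run());
  }
}
```

The point of the design is that the error path depends only on the captured stream, so a failure inside the handler cannot loop back into the handler through the redirected System.err.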
|
I believe the trouble here is that it mixes all log levels on stderr, while the log reporting in the UI reports all stderr lines at the same severity. I cannot recall if they are all INFO or all WARN in the UI, but either way it is incorrect and confusing. So we need the corresponding change to make sure that doesn't happen. Just a drive-by comment because I've encountered this. |
|
I am only changing how errors from within the DataflowWorkerLoggingHandler itself are reported, for example an error publishing to Stackdriver. I agree that logging to stderr is difficult to view in the UI, but that seems separate from the deadlock we originally fixed (the Jira has more details) and the erroneous precondition introduced when that was added. Kenn, does the limited scope of this change address your concerns? I am not changing it to use the original System.err generally. |
|
@scwhittle Also, wouldn't we solve the locking problem if we always held the same "flush" lock when interacting with the error manager? |
|
@kennknowles |
|
@lukecwik that is how I feel about System.err as well. Java disagrees and writes all logs to stderr unless you disconnect that. IIRC I discovered through some other work that it is not disconnected here. |
|
Had a chat and we confirmed that the worker logging setup already handles my concern appropriately. Please disregard all comments. |
|
@lukecwik I think that to have a single lock, we would need to use the PrintStream lock itself, since that is what Throwable.printStackTrace synchronizes on. If we chose another lock to guard the buffer, ErrorManager, and handler, the PrintStream lock could still be acquired both outside and inside that lock's synchronized blocks. As for why to synchronize on the buffer: that allows us to keep the precondition check as a sanity check of our implementation. But that seems like overkill, so I will just remove it. |
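The external synchronization discussed here can be observed with a small probe. This is only a sketch: `LockProbeStream` and `LockProbeDemo` are hypothetical names, and the behavior shown relies on Throwable.printStackTrace(PrintStream) holding the stream's monitor while it writes, as described in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream that records whether the calling thread already held
// the stream's monitor when println(Object) was entered.
class LockProbeStream extends PrintStream {
  final List<Boolean> heldOnEntry = new ArrayList<>();

  LockProbeStream() {
    super(new ByteArrayOutputStream(), true);
  }

  @Override
  public void println(Object x) {
    // Checked before delegating, so any lock seen here was taken by the caller.
    heldOnEntry.add(Thread.holdsLock(this));
    super.println(x);
  }
}

class LockProbeDemo {
  // A plain call from user code does not hold the monitor on entry.
  static boolean heldForDirectCall() {
    LockProbeStream s = new LockProbeStream();
    s.println((Object) "direct");
    return s.heldOnEntry.get(0);
  }

  // printStackTrace synchronizes on the stream before printing each line.
  static boolean heldForStackTrace() {
    LockProbeStream s = new LockProbeStream();
    new RuntimeException("probe").printStackTrace(s);
    return !s.heldOnEntry.isEmpty() && s.heldOnEntry.get(0);
  }

  public static void main(String[] args) {
    System.out.println("direct call holds lock: " + heldForDirectCall());
    System.out.println("printStackTrace holds lock: " + heldForStackTrace());
  }
}
```

A precondition of the form "the PrintStream monitor is not held when publishing" would therefore fail for any stack trace printed to the custom stream, no matter which lock guards the buffer internally.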
|
What is the next step for this PR? |
|
I have been busy with other higher-priority work but am planning on finishing it up next week.
…On Fri, Sep 25, 2020 at 1:55 AM Ahmet Altay ***@***.***> wrote:
What is the next step for this PR?
|
…he original System.err
Currently such errors are logged to System.err, which is a PrintStream that publishes to the handler. That is unlikely to work if earlier publishing failed. Reporting to the original System.err instead also removes a potential deadlock between synchronization on the PrintStream object and synchronization on the Handler object. An earlier fix attempted this by disallowing the PrintStream object from being synchronized when calling into the handler. However, that condition can still be triggered by external synchronization on the PrintStream, such as that performed by Throwable.printStackTrace. Changing the PrintStream to use separate synchronization for buffering works in most cases, but not when the stream is externally synchronized.
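The lock-order inversion behind that deadlock can be modeled deterministically. The sketch below is an illustration only: it uses two ReentrantLock objects as stand-ins for the PrintStream and Handler monitors (the real code uses `synchronized`), and the class name `LockInversionModel` is hypothetical. Each thread takes one lock, waits until the other thread holds its own, and then tries the second lock without blocking.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

class LockInversionModel {
  // Takes `first`, waits until the other thread holds its own first lock,
  // then reports whether `second` could be taken without blocking.
  static boolean acquireBoth(ReentrantLock first, ReentrantLock second, CyclicBarrier barrier)
      throws Exception {
    first.lock();
    try {
      barrier.await(); // both threads now hold their first lock
      boolean got = second.tryLock();
      if (got) {
        second.unlock();
      }
      barrier.await(); // keep holding `first` until the other thread has tried
      return got;
    } finally {
      first.unlock();
    }
  }

  // Returns {T1 got handler lock, T2 got PrintStream lock}.
  static boolean[] run() {
    ReentrantLock printStreamLock = new ReentrantLock(); // stand-in for the PrintStream monitor
    ReentrantLock handlerLock = new ReentrantLock(); // stand-in for the Handler monitor
    CyclicBarrier barrier = new CyclicBarrier(2);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // T1: like printStackTrace on the stream, then publishing to the handler.
      Future<Boolean> t1 = pool.submit(() -> acquireBoth(printStreamLock, handlerLock, barrier));
      // T2: like the handler reporting an error through the stream.
      Future<Boolean> t2 = pool.submit(() -> acquireBoth(handlerLock, printStreamLock, barrier));
      return new boolean[] {t1.get(), t2.get()};
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println("T1 acquired handler lock: " + r[0]);
    System.out.println("T2 acquired PrintStream lock: " + r[1]);
  }
}
```

Neither thread can take its second lock in this model. With blocking `synchronized` acquisition in place of `tryLock`, the same interleaving would leave both threads waiting forever.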
|
@lukecwik PTAL, see above for why I think having a single lock would be hard/possibly error-prone. Let me know if you think it's worth pursuing that. |
|
Run Java PreCommit |
|
We can leave this as is, since the migration to the Java SDK harness is upcoming, and the Java SDK harness doesn't override System.out/System.err because of the long list of issues we have hit when overriding them to capture logs in Dataflow. |
lukecwik
left a comment
Please work with @robinyqiu to get this cherry picked into the 2.25 release branch.
Should https://issues.apache.org/jira/browse/BEAM-9399 be marked as a release blocker? |