-
Notifications
You must be signed in to change notification settings - Fork 7.3k
ZOOKEEPER-3598. Fix trunk build #1143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ntire test runner to shutdown
|
If we call System.exit Maven forked process will exit and maybe we are running fewer tests than expected. |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix I have on other projects for System.exit is to add a SecurityManager that prevents calls like System.exit, this way System.exit ends in a SecurityException.
But the best way is to not call System.exit in tests
| LOG.warn("CommitProcessor does not shutdown gracefully after " | ||
| + "waiting for {} ms, exit to avoid potential " | ||
| + "inconsistency issue", workerShutdownTimeoutMS); | ||
| System.exit(ExitCode.SHUTDOWN_UNGRACEFULLY.getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catch (InterruptedException) won't help for System.exit.
Maybe the best idea is to have a system property to disable the real 'System.exit' and add such property in tests (Maven surefire plugin config)
|
@eolivelli Adding some more context, because I think you misunderstood my fix. The problem with the original code is that while the thread is waiting for CommitProcessor thread to finish in My fix changes the original behavior to avoid further processing by quiting the entire shutdown method if calling thread has been interrupted. This significantly changes the original implementation, hence we need to look very carefully. |
9f126c7 to
12638ec
Compare
|
@anmolnar now I understand better your fix. To me it is not very clear if we can skip calling the next processor. This is a behavior change. We can simply also revert the original fix and re-open the JIRA, this way the master branch will be clear and we will have time to fix the original patch. |
|
@eolivelli Sounds good to me. It's not just Ant. I cannot run tests with Maven on my local machine either. That's why I don't understand how could be the Maven build green: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/zookeeper-master-maven/521/org.apache.zookeeper$zookeeper/testReport/ Btw I cannot see |
|
Look at this. The test didn't succeed since build no. 514 and the job has not reported it as an error: The error is the same: |
|
I see the same error message in successful Maven precommit jobs like this: |
|
yep, it is a known and annoying problem in Surefire, if the forked VM crashes there is no report of the problem, only in the logs. btw it is better not to call System.exit in tests. I see the same problem using Kafka Server in "unit" tests of downstream applications, embedded Kafka Server breaks the JVM and this is really annoying |
|
@eolivelli Makes sense. If it's a limitation of Surefure plugin, I think the best would be to revert the patch and avoid using System.exit anywhere in our codebase. The workaround that you suggested could also be acceptable for me. I'm closing this one and pushing the revert. @lvfangmin fyi. |
|
If the CommitProcessor thread.join is interrupted, and the thread is still alive, we cannot call nextProcessor.shutdown, otherwise it will affect the correctness. System.exit is the safe way to avoid the potential correctness issue here. @anmolnar can you help me understand why the tearDown interrupted the CommitProcessor shutdown()? I checked the unit test, and didn't see we'll interrupted the code, is it from junit? Maybe we should change the unit test if it's possible instead of reverting the code which will actually affect the prod. |
|
@lvfangmin Good question, I have no idea. |
@lvfangmin @hanm @eolivelli
You were working on #1130 . Please check this fix, I'm not sure if it's appropriate.
When running the unit test tearDown() thread gets an interrupt which will cause the entire test runner to shut down. This affects both Maven and Ant builds. I have no idea why we can't see it in Maven jobs.