KAFKA-14244: Add guard against accidental calls to halt JVM during testing#12666
KAFKA-14244: Add guard against accidental calls to halt JVM during testing#12666C0urante wants to merge 5 commits intoapache:trunkfrom
Conversation
| includeTags "integration" | ||
| includeTags "org.apache.kafka.test.IntegrationTest" | ||
| // Both engines are needed to run JUnit 4 tests alongside JUnit 5 tests. | ||
| // Both engines are needed to run JUnit 4 tests alongside JUnit 5 tests. |
There was a problem hiding this comment.
Unrelated whitespace fix; this line was not aligned with others in my IDE.
| excludeTags "integration" | ||
| excludeTags "org.apache.kafka.test.IntegrationTest" | ||
| // Both engines are needed to run JUnit 4 tests alongside JUnit 5 tests. | ||
| // Both engines are needed to run JUnit 4 tests alongside JUnit 5 tests. |
There was a problem hiding this comment.
Unrelated whitespace fix; this line was not aligned with others in my IDE.
|
@C0urante , interesting solution. But I'm wondering how you test it and prove this can really catch |
|
This is a good idea and definitely an improvement over what already exists. But I think that the downside of taking a dependency on something that is deprecated as soon as JDK 18 is quite significant. Perhaps, we can solve the underlying problem in a different way! Let me explain my point of view on this. Some of our tests spawn new threads. The expectation for the happy case path is that these threads are terminated by the test gracefully at the end of test execution. In case when these threads are not terminated, they may cause erroneous/undeterministic behaviour such as calling System.Exit(). Instead of focusing on Exit() (which is not the underlying cause, it's a side effect), could we try to formulate a solution which ensures that the test would fail if threads created by that test are not terminated at the end of the execution? I don't have a proposal towards a solution but there are ideas that we could explore such as ensuring all threads created by a test have a property in their thread local (this could be set in @before and removed in @after). Thoughts? |
|
@showuon Thanks for taking a look. I did some local testing by injecting calls to |
|
@divijvaidya Thank you for your thoughts. I don't quite agree that attempting to terminate the JVM is a symptom rather than the root cause, since there are cases where these attempts occur without threads being leaked (one such case was touched on in #9698). In addition, the main issue here is that these attempts tend to crop up only during CI builds and not locally, in which case the cost of allowing the JVM to be terminated is higher as debugging is harder and the build attempt is rendered useless. I'm also not very worried about the deprecation of the That said, if you have the time to develop and implement an alternative proposal, I'd be happy to review a POC. I'm not certain we really can track thread creation and termination during tests without a serious refactoring of the code base, but if there's a lightweight way to do this, it would probably be worth considering either instead of or in addition to hardening our testing logic against JVM termination. |
|
@C0urante Thank you for explaining. I stand corrected that thread leaks are the only cause of |
|
@showuon I've added some tests for the custom extension that should demonstrate its effectiveness. I should note that I also experimented with an approach that leveraged JUnit 5's test class ordering API to create a test that runs after all others have completed and checks for leaked threads that have tried to terminate the JVM, instead of performing that check after every test and failing potentially-unrelated tests in order to surface the problem. Unfortunately, I couldn't find a way to make this work with parallel testing, since the test would only be run on a single executor, and would often finish before other tests on other executors. |
|
@C0urante You mentioned in your PR that we could potentially adjust the |
|
@ijuma It's partially that we sometimes have direct calls to I think if we could find a way to address the latter point, it'd be worth it to just modify the existing |
|
I think there's no way around the junit extension, but I was wondering if we could use that with the |
|
@ijuma Sorry for the delay. The tradeoff involved is whether we want to account for direct calls to |
|
I've updated the PR to not use a custom @divijvaidya @ijuma If you have time, let me know what you think; hoping this is readable to fresh eyes, but want to make sure. |
|
Closing due to lack of interest; we can revisit this if necessary. |
Jira
Uses an automatically-registered JUnit 5 extension that installs a SecurityManager which intercepts attempts to terminate the JVM that appear to have come from test cases, and then reports those attempts by causing tests to fail. If the test that attempted to terminate the JVM is running at the time of the attempt, that test will fail. If not, a random test will fail at a later point (in order to fail the CI build and surface the issue), with a message reporting that a prior test appears to have leaked a thread which has attempted to terminate the JVM after that test completed.The InheritableThreadLocal class is used to track which tests attempt to terminate the JVM, including cases where the attempt occurs in a separate thread, and even when the attempt has occurred after the test has already completed.Although theSecurityManagerAPI was deprecated and slated for removal in JEP-411, there is no existing alternative for intercepting attempts to terminate the JVM (see https://bugs.openjdk.org/browse/JDK-8199704). Once an alternative becomes available, we can use it instead of theSecurityManagerAPI.This is heavily inspired by the system-rules library, with some modifications to support parallel testing.It's worth noting that there is a potentially much-simpler fix where we modify theExitwrapper classes to useInheritableThreadLocalinstances to track custom procedures. However, this does not automatically prevent the JVM from being accidentally terminated during testing (tests still have to remember to use the wrapper class), and it does not provide a way to fail the build when unexpected attempts to terminate the JVM take place, potentially swallowing dangerous failures during testing and CI.Uses an automatically-registered JUnit 5 extension that installs a custom default exit procedure which intercepts attempts to terminate the JVM that appear to have come from test cases, and then reports those attempts by causing tests to fail. If the test that attempted to terminate the JVM is running at the time of the attempt, that test will fail. If not, a random test will fail at a later point (in order to fail the CI build and surface the issue), with a message reporting that a prior test appears to have leaked a thread which has attempted to terminate the JVM after that test completed.
Committer Checklist (excluded from commit message)