[improve][tests] improved flaky test runs #16011
- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()
- improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS)
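The shutdown-ordering change can be illustrated with a minimal sketch. The PR page does not show the code, so this assumes the issue is a periodic cache-eviction task that keeps running while the state it touches is being released. `EvictingCacheOwner` and its methods are hypothetical names and are not Pulsar's `ManagedLedgerFactoryImpl`. Only standard `java.util.concurrent` APIs are used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical component that owns a cache and a background eviction thread.
// Illustration only; this is not Pulsar's ManagedLedgerFactoryImpl.
class EvictingCacheOwner {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cacheEvictionExecutor =
            Executors.newSingleThreadScheduledExecutor();

    EvictingCacheOwner(long periodMillis) {
        // Periodic eviction task that mutates the shared cache.
        cacheEvictionExecutor.scheduleAtFixedRate(
                cache::clear, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void put(String key, byte[] value) {
        cache.put(key, value);
    }

    int cacheSize() {
        return cache.size();
    }

    boolean evictionStopped() {
        return cacheEvictionExecutor.isTerminated();
    }

    void shutdown() {
        // Step 1: stop the eviction thread first, so no eviction run can
        // overlap with the teardown of the state it touches.
        cacheEvictionExecutor.shutdownNow();
        try {
            if (!cacheEvictionExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("cache eviction thread did not stop");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping eviction", e);
        }
        // Step 2: only now release the remaining resources.
        cache.clear();
    }
}
```

Waiting on `awaitTermination()` before clearing state closes the window in which a still-running eviction task could race with shutdown; doing the steps in the opposite order leaves that window open.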
Jason918 left a comment
LGTM
     for (int i = 0; i < numMessages; i++) {
    -    Message<byte[]> msg = consumer.receive(30, TimeUnit.SECONDS);
    +    Message<byte[]> msg = consumer.receive();
Is this change necessary?
Because the test environment is busy with many concurrent jobs, I think we could see timeouts from time to time. If the synchronous receive is delayed past the 30-second timeout, the test fails. So, yes, I think this change will improve test stability.
The test env has reasonable computing resources in CI. Each build job has 2 CPUs and 7GB RAM. I think that it's a bug if receiving a message takes more than 30 seconds. Why would message receiving take more than that?
I don't have a clear answer here.
To me, 2 vCPUs might not be enough for the e2e tests, especially when running all pulsar components with dockers. I could be wrong here.
If this timeout issue started happening only recently, then I agree that we have a bug here. Please let me know if we do not want this change. The intention is to make the test more stable for other PRs.
> To me, 2 vCPUs might not be enough for the e2e tests, especially when running all pulsar components with dockers. I could be wrong here.

2 vCPUs and 7GB RAM is plenty of computation power & RAM for the tests that we have.

> If this timeout issue started happening only recently, then I agree that we have a bug here. Please let me know if we do not want this change. The intention is to make the test more stable for other PRs.

I am not convinced that we should remove the timeout. The problem must be investigated, and the root cause should be fixed. There must be a bug in production code if 30 seconds isn't sufficient.
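The trade-off debated in this thread can be sketched without a broker. In the Pulsar Java client, `Consumer.receive(int, TimeUnit)` returns `null` if no message arrives before the timeout, while `receive()` blocks until one does. The sketch below uses a `java.util.concurrent.BlockingQueue` as a hypothetical stand-in for the consumer, with `poll(timeout, unit)` playing the timed receive; `TimedReceive` and `receiveAll` are illustrative names, not code from `PulsarFunctionsTest`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a BlockingQueue stands in for a Pulsar consumer.
// queue.poll(timeout, unit) behaves like a timed receive (null on timeout);
// queue.take() would behave like an untimed receive() and may block forever.
class TimedReceive {
    // Receives exactly n messages and fails fast, with a descriptive error,
    // if any single message does not arrive within the timeout.
    static List<String> receiveAll(BlockingQueue<String> queue, int n,
                                   long timeout, TimeUnit unit) {
        List<String> received = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String msg;
            try {
                msg = queue.poll(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while receiving", e);
            }
            if (msg == null) {
                // A stalled receive surfaces as a bounded, diagnosable failure
                // instead of hanging until the CI job itself times out.
                throw new AssertionError("message " + i + " not received within "
                        + timeout + " " + unit);
            }
            received.add(msg);
        }
        return received;
    }
}
```

For example, `TimedReceive.receiveAll(new LinkedBlockingQueue<>(List.of("a", "b")), 2, 100, TimeUnit.MILLISECONDS)` returns both messages, while an empty queue fails after the timeout. Keeping the bounded wait matches the argument above: a slow receive shows up as a clear failure that can be investigated rather than as an indefinite block.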
lhotari left a comment
Remove the timeout change from this PR.
* [improve][tests] improved flaky test runs - improved PulsarFunctionTlsTests by reordering tearDown() logic - improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early - improved TestPulsarConnector memory consumption by removing unnecessary spy() - improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS); * Reverted PulsarFunctionsTest consumer.receive() change (cherry picked from commit b1b25ef)
* [improve][tests] improved flaky test runs - improved PulsarFunctionTlsTests by reordering tearDown() logic - improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early - improved TestPulsarConnector memory consumption by removing unnecessary spy() - improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS); * Reverted PulsarFunctionsTest consumer.receive() change (cherry picked from commit b1b25ef) (cherry picked from commit 5f7a6af)
Motivation
Fixing flaky tests.
Modifications
- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()

Verifying this change
This change is already covered by existing tests.
Documentation
Check the box below or label this PR directly.
Need to update docs?
- doc-required (Your PR needs to update docs and you will update later)
- doc-not-needed (Please explain why)
- doc (Your PR contains doc changes)
- doc-complete (Docs have been already added)