Skip to content

Conversation

@heesung-sohn
Copy link
Contributor

@heesung-sohn heesung-sohn commented Jun 10, 2022

(If this PR fixes a github issue, please add Fixes #<xyz>.)

Fixes #

(or if this PR is one task of a github issue, please add Master Issue: #<xyz> to link to the master issue.)

Master Issue: #

Motivation

Fixing flaky tests.

Modifications

  • improved PulsarFunctionTlsTests by reordering tearDown() logic
  • improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
  • improved TestPulsarConnector memory consumption by removing unnecessary spy()

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()
- improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS);
Copy link
Contributor

@Jason918 Jason918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


for (int i = 0; i < numMessages; i++) {
Message<byte[]> msg = consumer.receive(30, TimeUnit.SECONDS);
Message<byte[]> msg = consumer.receive();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Contributor Author

@heesung-sohn heesung-sohn Jun 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the test env is hectic with many concurrent jobs, I think we could see timeouts from time to time. If this sync receive response is really delayed, the test will eventually fail. So, yes, I think this will improve the test stability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test env has reasonable computing resources in CI. Each build job has 2 CPUs and 7GB RAM. I think that it's a bug if receiving a message takes more than 30 seconds. Why would message receiving take more than that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the clear answer here.

To me, 2 vCPUs might not be enough for the e2e tests, especially when running all pulsar components with dockers. I could be wrong here.

If this timeout issue started happening only recently, then I agree that we have a bug here. Please let me know if we do not want this change. The intention is to make the test more stable for other PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, 2 vCPUs might not be enough for the e2e tests, especially when running all pulsar components with dockers. I could be wrong here.

2 vCPUs and 7GB RAM is plenty of computation power & RAM for the tests that we have.

If this timeout issue started happening only recently, then I agree that we have a bug here. Please let me know if we do not want this change. The intention is to make the test more stable for other PRs.

I am not convinced that we should remove the timeout. The problem must be investigated, and the root cause should be fixed. There must be a bug in production code if 30 seconds isn't sufficient.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the timeout change from this PR

@heesung-sohn heesung-sohn requested a review from lhotari June 14, 2022 19:25
@heesung-sohn
Copy link
Contributor Author

remove the timeout change from this PR
@lhotari
I removed it. Please check this again by any chance.

@lhotari lhotari merged commit b1b25ef into apache:master Jun 14, 2022
codelipenghui pushed a commit that referenced this pull request Jun 28, 2022
* [improve][tests] improved flaky test runs
- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()
- improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS);

* Reverted PulsarFunctionsTest consumer.receive() change

(cherry picked from commit b1b25ef)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jul 4, 2022
* [improve][tests] improved flaky test runs
- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()
- improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS);

* Reverted PulsarFunctionsTest consumer.receive() change

(cherry picked from commit b1b25ef)
(cherry picked from commit 5f7a6af)
congbobo184 pushed a commit that referenced this pull request Nov 7, 2022
* [improve][tests] improved flaky test runs
- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()
- improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS);

* Reverted PulsarFunctionsTest consumer.receive() change

(cherry picked from commit b1b25ef)
@congbobo184 congbobo184 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Nov 7, 2022
congbobo184 pushed a commit that referenced this pull request Nov 30, 2022
* [improve][tests] improved flaky test runs
- improved PulsarFunctionTlsTests by reordering tearDown() logic
- improved ManagedLedgerFactoryImpl.shutdown() by closing cacheEviction threads early
- improved TestPulsarConnector memory consumption by removing unnecessary spy()
- improved PulsarFunctionsTest run by using receive() instead of receive(30, TimeUnit.SECONDS);

* Reverted PulsarFunctionsTest consumer.receive() change

(cherry picked from commit b1b25ef)
@heesung-sohn heesung-sohn deleted the fix-ci-tests branch April 2, 2024 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants