Skip to content

Conversation

@lhotari
Copy link
Member

@lhotari lhotari commented May 15, 2021

Motivation

GracefulExecutorServicesShutdownTest's shouldTerminateWhenFutureIsCancelled remains flaky after #10592 .

Example failure: https://github.com/apache/pulsar/pull/10598/checks?check_run_id=2590214027#step:10:1341

Modifications

  • fix race condition in test by adding a CountDownLatch to verify that execution has entered the awaitTermination method before the future is cancelled.
  • use CompletableFuture for awaitTerminationInterrupted check.

- fix race condition in test by adding a CountDownLatch
  to verify that execution has entered the awaitTermination
  method before the future is cancelled
@lhotari lhotari added this to the 2.8.0 milestone May 15, 2021
@lhotari lhotari requested a review from eolivelli May 15, 2021 13:04
@lhotari lhotari self-assigned this May 15, 2021
@lhotari lhotari requested review from codelipenghui and merlimat May 15, 2021 16:37
@linlinnn
Copy link
Contributor

What caused the flakiness was that this condition caused the execution to never get to the Thread.sleep when the thread was already interrupted:

Yes, I agree that.
But after call countdown, main thread can cancel the future, that means the Interruption exception can probably be caught by outer catch.
You can test it locally by add some time-consuming action after countdown.

@eolivelli eolivelli merged commit 102409d into apache:master May 16, 2021
@eolivelli
Copy link
Contributor

@linlinnn I have merged this patch in order to unblock CI.
We can continue the discussion and find a better fix if think that this is not the best way

@linlinnn
Copy link
Contributor

ok

@lhotari
Copy link
Member Author

lhotari commented May 16, 2021

But after call countdown, main thread can cancel the future, that means the Interruption exception can probably be caught by outer catch.
You can test it locally by add some time-consuming action after countdown.

I don't see how this could happen. Interrupting a thread doesn't stop execution. It sets the interrupted status and interrupts any blocking methods such as sleep or blocking IO with an InterruptedException. Calling countDown will never be interrupted.
Makes sense?

@linlinnn
Copy link
Contributor

@lhotari
Oh, I make a wrong understand with the outer catch InterruptedException in GracefulExecutorServicesTerminationHandler#awaitTermination since executor.awaitTermination has been mocked.
Sorry for my mistake.

@lhotari
Copy link
Member Author

lhotari commented May 16, 2021

Oh, I make a wrong understand with the outer catch InterruptedException in GracefulExecutorServicesTerminationHandler#awaitTermination since executor.awaitTermination has been mocked.
Sorry for my mistake.

No worries, the mocking was quite unexpected for executor.awaitTermination.

btw. It seems that this test became very flaky after the switch to JDK11. To get the fix to other PRs, it is either necessary to rebase or close and re-open a PR so that the changes get picked up. When re-running a failed build it won't pick up changes made into master branch. New PR builds or reopened PR builds will pick up changes.

yangl pushed a commit to yangl/pulsar that referenced this pull request Jun 23, 2021
…0599)

- fix race condition in test by adding a CountDownLatch
  to verify that execution has entered the awaitTermination
  method before the future is cancelled
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants