Contributor

@zentol zentol commented Jul 3, 2019

Based on #8922. Ignore everything but my commit.

Extends the added AdaptedRestartPipelinedRegionStrategyNG to respect restart constraints imposed by the configured RestartStrategy.

Currently, the failover strategy queries RestartStrategy#canRestart before executing a failover, but the RestartStrategy is never informed that the failover actually happens. As a result, the failover is not counted towards the maximum number of failure attempts (or the maximum failure rate), and the restart delay is not honored.

This PR modifies the AdaptedRestartPipelinedRegionStrategyNG to call RestartStrategy#restart before executing any failover, with a RestartCallback that will execute the failover logic.
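
A rough sketch of the idea, not the actual Flink code: the interfaces below are simplified stand-ins that only borrow the names RestartStrategy and RestartCallback, and FixedAttemptsRestartStrategy, RegionFailoverSketch and the plain task list (used in place of a scheduled executor) are hypothetical. It only illustrates that the failover logic is handed to the restart strategy as a callback, so the strategy both counts the attempt and decides when the failover runs.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Flink's real interfaces.
interface RestartCallback {
    void triggerFullRecovery();
}

interface RestartStrategy {
    boolean canRestart();

    // The strategy decides when (and whether) the callback is invoked.
    void restart(RestartCallback restarter, List<Runnable> scheduledTasks);
}

// Hypothetical strategy that permits a fixed number of restarts.
class FixedAttemptsRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private int attempts;

    FixedAttemptsRestartStrategy(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public void restart(RestartCallback restarter, List<Runnable> scheduledTasks) {
        attempts++; // the strategy now learns about every failover
        // Stands in for scheduling the callback after the restart delay.
        scheduledTasks.add(restarter::triggerFullRecovery);
    }

    int attempts() {
        return attempts;
    }
}

// Hypothetical failover strategy that routes its failover logic through the restart strategy.
class RegionFailoverSketch {
    private final RestartStrategy restartStrategy;
    private final List<Runnable> scheduledTasks;
    int completedFailovers;

    RegionFailoverSketch(RestartStrategy restartStrategy, List<Runnable> scheduledTasks) {
        this.restartStrategy = restartStrategy;
        this.scheduledTasks = scheduledTasks;
    }

    // Returns false if the restart strategy does not permit another restart.
    boolean onTaskFailure() {
        if (!restartStrategy.canRestart()) {
            return false;
        }
        // The failover logic runs only if and when the strategy invokes the callback.
        restartStrategy.restart(() -> completedFailovers++, scheduledTasks);
        return true;
    }
}
```

In this sketch nothing is recovered until a queued task is run, which mirrors why a strategy that never invokes the callback would stall the failover.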

This change has a few repercussions on existing tests. Since the execution of the failover logic now depends on the restart strategy actually invoking the callback, we can no longer rely on the InfiniteDelayRestartStrategy, as it never does so. Additionally, since restart strategies use the schedule method, we now also have to call ManuallyTriggeredScheduledExecutor#triggerScheduledTasks, as #triggerAll (for some reason) ignores scheduled tasks.
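
To illustrate the second point, here is a minimal sketch of an executor that keeps directly submitted and scheduled tasks in separate queues. ManualExecutorSketch is a hypothetical stand-in and not the real ManuallyTriggeredScheduledExecutor; only the method names triggerAll and triggerScheduledTasks are taken from the description above, and the rest is an assumption about how such a test executor could behave.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a manually triggered test executor; not Flink's class.
class ManualExecutorSketch {
    private final Queue<Runnable> queued = new ArrayDeque<>();
    private final Queue<Runnable> scheduled = new ArrayDeque<>();

    // Tasks submitted for immediate execution.
    void execute(Runnable task) {
        queued.add(task);
    }

    // Tasks submitted with a delay (the delay itself is ignored in this sketch).
    void schedule(Runnable task) {
        scheduled.add(task);
    }

    // Runs only the directly submitted tasks; scheduled tasks stay pending.
    void triggerAll() {
        drain(queued);
    }

    // Runs only the scheduled tasks.
    void triggerScheduledTasks() {
        drain(scheduled);
    }

    private static void drain(Queue<Runnable> tasks) {
        Runnable next;
        while ((next = tasks.poll()) != null) {
            next.run();
        }
    }
}
```

With an executor shaped like this, a test that only calls triggerAll would never run a restart callback that was submitted via schedule, so it has to call triggerScheduledTasks as well.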

AdaptedRestartPipelinedRegionStrategyNGFailoverTest#testFailoverExecutionDependentOnRestartStrategyRecoveryTrigger tests that the failover behavior is dependent on the restart strategy. I'm not really happy with the test, as it effectively pauses the failover progress (after canceling the tasks, but before resetting them) and I suppose the exact state we should be in at this point is not well-defined.

Collaborator

flinkbot commented Jul 3, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Member

@GJL GJL left a comment

Good work. I left some suggestions – we should at least fix the checkstyle violation.

Member

@GJL GJL left a comment

+1 for merging

@zentol zentol merged commit fd85207 into apache:master Jul 5, 2019
@zentol zentol deleted the 13060_b branch July 24, 2019 14:53
4 participants