-
Notifications
You must be signed in to change notification settings - Fork 13.9k
[FLINK-13060][coordination] Respect restart constraints in new RegionFailover strategy #8972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
GJL
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work. I left some suggestions – we should at least fix the checkstyle violation.
...apache/flink/runtime/executiongraph/AdaptedRestartPipelinedRegionStrategyNGFailoverTest.java
Outdated
Show resolved
Hide resolved
...apache/flink/runtime/executiongraph/AdaptedRestartPipelinedRegionStrategyNGFailoverTest.java
Outdated
Show resolved
Hide resolved
...apache/flink/runtime/executiongraph/AdaptedRestartPipelinedRegionStrategyNGFailoverTest.java
Outdated
Show resolved
Hide resolved
...rg/apache/flink/runtime/executiongraph/failover/AdaptedRestartPipelinedRegionStrategyNG.java
Outdated
Show resolved
Hide resolved
…Failover strategy
GJL
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for merging
Based on #8922. Ignore everything but my commit.
Extends the added
AdaptedRestartPipelinedRegionStrategyNGto respect restart constraints imposed by the configuredRestartStrategy.Currently, the failover strategy does query
RestartStrategy#canRestartbefore executing a failover, but the RS is never informed about this failover happening. As a result it is not taken into account for the maximum number of failure attempts (or max rate), nor is the restart delay honored.This PR modifies the
AdaptedRestartPipelinedRegionStrategyNGto callRestartStrategy#restartbefore executing any failover, with aRestartCallbackthat will execute the failover logic.This change has a few repercussion on existing tests. Since the execution of the failover logic is now dependent on the restart strategy actually using the callback, we may no longer rely on the
InfiniteDelayRestartStrategyas this one does not do so. Additionally, since restart strategies use theschedulemethod, we now have to additionally callManuallyTriggeredScheduledExecutor#triggerScheduledTasks, as#triggerAll(for some reason) ignores scheduled tasks.AdaptedRestartPipelinedRegionStrategyNGFailoverTest#testFailoverExecutionDependentOnRestartStrategyRecoveryTriggertests that the failover behavior is dependent on the restart strategy. I'm not really happy with the test, as it effectively pauses the failover progress (after canceling the tasks, but before resetting them) and I suppose the exact state we should be in at this point is not well-defined.