Conversation
Fix bin/init to source from proper directory.
…loss of ZK connectivity
…to zookeeper_loss
# Conflicts:
#	extensions-core/parquet-extensions/src/main/resources/META-INF/services/org.apache.druid.initialization.DruidModule
Hi Sascha,
Thank you for your interest in my PR.
I would like to make a couple of comments (some are non sequiturs) about why I did what I did:
1. Fixing this in Curator (or Zookeeper) would affect a whole lot of systems (and systems of systems). I believe I have tested this well, but I would prefer not to interfere with the stability of every other system on the planet that uses Zookeeper. If Curator is interested in my workaround, they may have it. As it is now (pending PR approval), it would be available only by explicitly enabling a config file option, which sandboxes the change to Druid. I feel that this is the best way to roll out something this monumental.
2. After researching the problem (and also simulating why the RetryPolicy implementations didn’t catch this problem completely), I determined that Zookeeper itself is never really sure when a connection to a server is down or the service is down. In my opinion, Curator had the best foundation for handling this scenario.
3. Our reason for this fix had a very specific cause: our DNS response for our Zookeeper servers changed IP addresses, and Java seemed to cache them forever. Even with the BoundedExponentialBackoff retry policy timing out/giving up, Zookeeper would loop forever and keep a thread alive, which would prevent a clean JVM exit. We would prefer that the service die (and allow a daemon to restart it) rather than produce bad data. This may not fit everyone’s use case.
4. I am excited for my first meaty Druid PR, and I hope that the community accepts this change.
Thank you.
Best,
Mike
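The points above describe a retry policy that, once its bounded retries are exhausted, runs a caller-supplied exit action instead of looping forever. A minimal, self-contained sketch of that idea (names are illustrative; the actual PR subclasses Curator's BoundedExponentialBackoffRetry, which is not reproduced here):

```java
// Hypothetical sketch, not the actual Druid/Curator code: a bounded retry
// policy that runs an exit action once retries are exhausted.
import java.util.concurrent.atomic.AtomicBoolean;

public class ExitingRetryPolicySketch {
    static final class QuitOnExhaustionRetry {
        private final int maxRetries;
        private final Runnable exitAction;

        QuitOnExhaustionRetry(int maxRetries, Runnable exitAction) {
            this.maxRetries = maxRetries;
            this.exitAction = exitAction;
        }

        // Mirrors the shape of Curator's RetryPolicy#allowRetry: return true
        // to retry; run the exit action and return false once exhausted.
        boolean allowRetry(int retryCount) {
            if (retryCount < maxRetries) {
                return true;
            }
            exitAction.run();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean exited = new AtomicBoolean(false);
        QuitOnExhaustionRetry policy = new QuitOnExhaustionRetry(3, () -> exited.set(true));
        int attempts = 0;
        while (policy.allowRetry(attempts)) {
            attempts++;
        }
        // After 3 allowed retries, the exit action has run exactly once.
        System.out.println("attempts=" + attempts + " exited=" + exited.get());
    }
}
```

In the PR itself the exit action stops the Druid lifecycle and calls System.exit(1); the sketch substitutes a flag so it can run safely.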
… On Dec 14, 2018, at 5:27 PM, Sascha Coenen ***@***.***> wrote:
Hi. I'm running Druid on Kubernetes and the fact that the Curator library is not dealing with connection disruptions properly is the one single remaining issue that prevents my setup from being resilient. So I love your motion as it is much needed. I'm not a committer, so I cannot comment on whether the approach chosen is sound, but in one way or another, something has to be done for sure.
As a side note, out of sheer curiosity I wonder whether there is any good reason to use Curator at all. I know that there are plans to move away from Zookeeper, but it is not Zookeeper that is causing all the issues - it is the Curator client library that is the culprit. I wonder why the Druid devs don't just throw out Curator and be done with it. It would be much easier than getting rid of Zookeeper, which they can still do, but as a migration path action, I would throw out Curator first.
The most natural thing to do would have been to fix the following Curator issue:
https://issues.apache.org/jira/browse/CURATOR-229
But this issue has been reported in 2015 and it still did not get fixed although it is a severe bug that is impeding many people.
Druid is also not using the most recent version of Curator if I'm not mistaken. Version 4.0.1 has been released in Feb 2018 while Druid master uses 4.0.0.
+1
@JsonProperty("authScheme")
private String authScheme = "digest";
…
@JsonProperty("quitOnConnectFail")
IMO "quit" is a little vague. Maybe "terminateDruidProcess".

Please add documentation for this property.

Fixed.
Added documentation.
private final Function<Void, Void> exitFunction;
…
public BoundedExponentialBackoffRetryWithQuit(Function<Void, Void> exitFunction, int baseSleepTimeMs, int maxSleepTimeMs, int maxRetries)
Why use Function<Void, Void> rather than Runnable?

Fixed line length.
Why use Function<Void, Void> rather than Runnable?
Initially, I used a Callable, but discovered that the method signature included a "throws" which played badly with the allowRetry method signature. I switched to Function because it doesn't suffer from that. I don't want to use Runnable because it would lead people to believe that it needs to be executed from a Thread, which it does not.

To me, Function<Void, Void> is more confusing. Could you please define an interface like "ExitAction"?

I switched it to your original choice of Runnable rather than write an interface that looked identical to Runnable/Callable.
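The signature trade-off discussed above can be seen in a small sketch: Callable#call declares "throws Exception", which forces try/catch inside a method like allowRetry that throws nothing itself, while Runnable does not. Class and method names here are illustrative, not Druid's code:

```java
// Illustrative comparison of exit-action shapes (not actual Druid code).
import java.util.concurrent.Callable;

public class ExitActionShapes {
    // With Callable, the checked exception must be handled at the call site.
    static boolean allowRetryWithCallable(Callable<Void> exit) {
        try {
            exit.call();
        } catch (Exception e) {
            throw new RuntimeException(e);  // forced wrapping
        }
        return false;
    }

    // With Runnable, there is no checked exception to deal with.
    static boolean allowRetryWithRunnable(Runnable exit) {
        exit.run();
        return false;
    }

    public static void main(String[] args) {
        System.out.println(allowRetryWithRunnable(() -> System.out.println("exiting")));
    }
}
```

Despite the name, nothing about Runnable requires a separate thread; it is just a no-argument, no-result, no-checked-exception functional interface.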
return shouldRetry;
}
…
public Function<Void, Void> getExitFunction()
I removed the getter. I didn't know that's how we did things. My apologies.
import java.util.function.Function;
…
public class BoundedExponentialBackoffRetryWithQuit extends BoundedExponentialBackoffRetry
Maybe use composition instead of inheritance?

Point taken, but I think this is simple, and if the base class changes its constructor, we'll have to refactor either way. I wanted this based off of the BoundedExponentialBackoffRetry logic, so I subclassed it.

Could you please add a Javadoc comment: "BoundedExponentialBackoffRetryWithQuit extends BoundedExponentialBackoffRetry for simplicity. It's not actually a BoundedExponentialBackoffRetry from the Liskov substitution principle point of view, but it doesn't matter in this code."
.ensembleProvider(ensembleProvider)
.sessionTimeoutMs(config.getZkSessionTimeoutMs())
.retryPolicy(new BoundedExponentialBackoffRetry(BASE_SLEEP_TIME_MS, MAX_SLEEP_TIME_MS, MAX_RETRIES))
.retryPolicy(config.getQuitOnConnectFail() ? new BoundedExponentialBackoffRetryWithQuit(exitFunction, BASE_SLEEP_TIME_MS, MAX_SLEEP_TIME_MS, MAX_RETRIES) : new BoundedExponentialBackoffRetry(BASE_SLEEP_TIME_MS, MAX_SLEEP_TIME_MS, MAX_RETRIES))
Line longer than 120 cols. Extract a variable.

This has been extracted to a variable, but I couldn't find a clean way to do the conditional on multiple lines, so I just did an if-else refactor.
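The if-else refactor described above can be sketched as follows; the policy classes and the selection method are stand-ins, not the actual Curator/Druid types:

```java
// Illustrative version of the extract-variable / if-else refactor; the
// RetryPolicy interface and both implementations are hypothetical stand-ins.
public class RetryPolicySelection {
    interface RetryPolicy { String name(); }

    static final class BackoffRetry implements RetryPolicy {
        public String name() { return "BoundedExponentialBackoffRetry"; }
    }

    static final class BackoffRetryWithQuit implements RetryPolicy {
        public String name() { return "BoundedExponentialBackoffRetryWithQuit"; }
    }

    // Extracting the policy into a variable and using if-else keeps every
    // line well under a 120-column limit, unlike the original ternary.
    static RetryPolicy choosePolicy(boolean quitOnConnectFail) {
        RetryPolicy retryPolicy;
        if (quitOnConnectFail) {
            retryPolicy = new BackoffRetryWithQuit();
        } else {
            retryPolicy = new BackoffRetry();
        }
        return retryPolicy;
    }

    public static void main(String[] args) {
        System.out.println(choosePolicy(true).name());
    }
}
```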
exConfig.getRestUriPath(),
exConfig.getPollingMs(),
new BoundedExponentialBackoffRetry(BASE_SLEEP_TIME_MS, MAX_SLEEP_TIME_MS, MAX_RETRIES)
config.getQuitOnConnectFail() ? new BoundedExponentialBackoffRetryWithQuit(exitFunction, BASE_SLEEP_TIME_MS, MAX_SLEEP_TIME_MS, MAX_RETRIES) : new BoundedExponentialBackoffRetry(BASE_SLEEP_TIME_MS, MAX_SLEEP_TIME_MS, MAX_RETRIES)
Line longer than 120 cols. Extract a variable.
public Void apply(Void someVoid)
{
log.error("Zookeeper can't be reached, forcefully stopping lifecycle...");
lifecycle.stop();
Are you sure that this function will actually run before the function configured in makeEnsembleProvider() terminates the process?

I'm not sure I understand the question. Can you please rephrase?

The EnsembleProvider created in the makeEnsembleProvider() method, where you also added an exit (but without stopping a lifecycle), is injected only in this method. So they are somehow related. I don't know what the EnsembleProvider is, or which termination action will kick in first.

When I was testing this, I could never make the makeEnsembleProvider(...)-created RetryPolicy actually fire. It was always the makeCurator(...)-created RetryPolicy that fired (and exited). I'm just covering all the bases.

I think the reason for this should be understood. What if, under some conditions, the other exit function configured in makeEnsembleProvider() suddenly kicks in first and the lifecycle is left unfinished?
At the very least, the exit function configured in makeEnsembleProvider() should log that as such, to allow easier analysis later. And it should be expressed in comments in that code that there are two functions configured, because nobody knows if the one configured in makeCurator() is sufficient.

It is clearly logged.
makeCurator()'s exit will look like this:
Zookeeper can't be reached, forcefully stopping lifecycle...
Zookeeper can't be reached, forcefully stopping virtual machine...
makeEnsembleProvider()'s exit will look like this:
Zookeeper can't be reached, forcefully stopping virtual machine...
Added @kaijianding please add such "important" tags (along with

I haven't read the patch, but just wanted to say that I am generally supportive of the idea of servers nuking themselves if really bad, and possibly unrecoverable, things happen. Like OOMEs and ZK connection loss.
- Add feature documentation
- Cosmetic refactors
- Variable extractions
- Remove getter
@JsonProperty("authScheme")
private String authScheme = "digest";
…
@JsonProperty("terminateDruidProcess")
I meant "terminateDruidProcessOnConnectFail"; just replace the "quit" part with "terminateDruidProcess".

Got it. Fixed.
|`druid.zk.service.user`|The username to authenticate with ZooKeeper. This is an optional property.|none|
|`druid.zk.service.pwd`|The [Password Provider](../operations/password-provider.html) or the string password to authenticate with ZooKeeper. This is an optional property.|none|
|`druid.zk.service.authScheme`|digest is the only authentication scheme supported.|digest|
|`druid.zk.service.terminateDruidProcess`|When set to 'true', changes behavior of Druid processes that rely on ZooKeeper to forcefully exit. If the connection to ZooKeeper fails (after exhausting all exponential backoff retries), the process will die with an errorlevel of 1.|false|
I think something like
"If set to 'true' and the connection to ZooKeeper fails (after exhausting all potential backoff retries), the Druid process terminates itself with exit code 1."
would be clearer.
log.error("Zookeeper can't be reached, forcefully stopping lifecycle...");
lifecycle.stop();
log.error("Zookeeper can't be reached, forcefully stopping virtual machine...");
System.exit(1);
If you really want to exit,

try {
  log.error("Zookeeper can't be reached, forcefully stopping lifecycle...");
  // Maybe wrap with call and log exception too
  lifecycle.stop();
  log.error("Zookeeper can't be reached, forcefully stopping virtual machine...");
} finally {
  System.exit(1);
}

would be more robust.
public Void apply(Void aVoid)
{
log.error("Zookeeper can't be reached, forcefully stopping virtual machine...");
System.exit(1);
try {
  log.error("Zookeeper can't be reached, forcefully stopping virtual machine...");
} finally {
  System.exit(1);
}

Switched to this pattern. I didn't think it was necessary since I've never seen a logger kill a JVM.
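Why the try/finally shape is more robust: the action in the finally block runs even if the preceding log call throws. A runnable sketch, with the exit call replaced by a flag so it can be demonstrated without killing the JVM (method names are illustrative):

```java
// Sketch of the try/finally robustness argument (not actual Druid code).
import java.util.concurrent.atomic.AtomicBoolean;

public class FinallyExitSketch {
    // Runs a possibly-throwing log action, guaranteeing the exit action fires.
    static void stopAndExit(Runnable logAction, Runnable exitAction) {
        try {
            logAction.run();
        } finally {
            exitAction.run();   // stands in for System.exit(1)
        }
    }

    public static void main(String[] args) {
        AtomicBoolean exited = new AtomicBoolean(false);
        try {
            stopAndExit(
                () -> { throw new RuntimeException("logger failed"); },
                () -> exited.set(true));
        } catch (RuntimeException expected) {
            // the demo swallows the simulated logging failure
        }
        System.out.println("exit still ran: " + exited.get());
    }
}
```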
Should this stop the lifecycle too? If this is getting hit, it means that the Curator killer is not getting hit, and so I think this exit would be unclean.
Also, did you figure out why this needs to kill the process too? I see the discussion here, #6740 (comment), but it doesn't really come to a conclusion about whether it's needed.
I'm not sure it is needed. Presumably if Exhibitor is down, the ZK processes managed by it are dead too, and I would expect the other killer will get hit? Or is this for a startup case where the other killer somehow doesn't get triggered? Regardless, could you also add a comment with the explanation?

@clintropolis I think the problem is that nobody knows the answers to your questions and nobody has the determination to invest the effort to research this question (yet). I don't think we should demand this from @michael-trelinski. I think the PR in its current form is a move in the right direction anyway.
However, @michael-trelinski could you please add a comment in makeEnsembleProvider() along the lines of // It's unknown whether this precaution is needed, see https://github.com/apache/incubator-druid/pull/6740#discussion_r243368737 for details?

Sure, I wasn't expecting a deep investigation. I was suggesting either adding a lifecycle.stop() to make sure it stops cleanly and adding a comment to explain that we aren't sure if it is needed, OR just removing it until a scenario arises where an Exhibitor-run ZooKeeper is down but Druid doesn't kill itself, and adding this block later when we find out it is needed.
Arguments to those provider methods are injected, so it could be available if you added it as an argument, e.g.

makeEnsembleProvider(CuratorConfig config, ExhibitorConfig exConfig, Lifecycle lifecycle)
{
  ...

In master and the branch/commit I'm using, makeEnsembleProvider is declared thusly:

@Provides
@LazySingleton
public EnsembleProvider makeEnsembleProvider(CuratorConfig config, ExhibitorConfig exConfig)

I see what you're getting at. Apologies.
However, if it wasn't there to begin with and we're all not sure why makeEnsembleProvider is even there, should I be mucking around with it by adding a Lifecycle?

I don't think there is a question of why makeEnsembleProvider exists; it provides the EnsembleProvider argument to makeCurator.
The only question I had is whether the quit-on-retry functionality needs to be wired into its failures as well as Curator's failures. I believe there should be no harm in using the injected lifecycle here, as far as I can tell.

@michael-trelinski could you please link this discussion from the comment? There is no superstition that code should be "self-contained" in Druid; there are many links from the code to GitHub. It would be easier for readers to navigate. Currently, readers would be forced to dig into Git history to follow this path.
- Switch from Function<Void,Void> to Runnable/Lambda
- try { … } finally { … }
public void setTerminateDruidProcessOnConnectFail(Boolean terminateDruidProcessOnConnectFail)
{
this.terminateDruidProcessOnConnectFail = terminateDruidProcessOnConnectFail == null ? false : terminateDruidProcessOnConnectFail;
Line longer than 120 cols. Please use if-else.
@michael-trelinski as a side note, please don't "mark conversations as resolved". I wish this GitHub feature could be turned off. As a reviewer I have to revisit each conversation anyway and verify myself that it's resolved; marking it as resolved just adds clicking work for me.
@Override
public boolean allowRetry(int retryCount, long elapsedTimeMs, RetrySleeper sleeper)
{
log.warn("Zookeeper can't be reached, retrying (retryCount = " + retryCount + " out of " + this.getN() + ")...");
Could you use a format string here:

log.warn("Zookeeper can't be reached, retrying (retryCount = %s out of %s)...", retryCount, this.getN());
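The suggestion above replaces string concatenation with a parameterized message. A small sketch of the resulting message, using plain String.format as a stand-in for the logger's format-argument support:

```java
// Illustrative only: String.format stands in for a logger that accepts
// format-string arguments, as the review suggestion above implies.
public class LogFormatSketch {
    static String retryMessage(int retryCount, int maxRetries) {
        return String.format(
            "Zookeeper can't be reached, retrying (retryCount = %s out of %s)...",
            retryCount, maxRetries);
    }

    public static void main(String[] args) {
        System.out.println(retryMessage(2, 5));
    }
}
```

Besides readability, deferring formatting to the logger avoids building the string at all when the log level is disabled.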
try {
  log.error("Zookeeper can't be reached, forcefully stopping lifecycle...");
  lifecycle.stop();
  log.error("Zookeeper can't be reached, forcefully stopping virtual machine...");
This should probably be System.err.println since log is probably dead after lifecycle stop. It won't explode, but it probably won't go anywhere either. See #6975; it will require adding @SuppressForbidden(reason = "System#err").

Also done.
- use System.err.println after lifecycle stops
System.err.println("Zookeeper can't be reached, forcefully stopping virtual machine...");
}
finally {
  System.exit(1);
For me it looks like a hack; shouldn't we try to recreate a CuratorFramework?

Correct me if I'm wrong, please: we could have whole Druid clusters restarted due to a ZK cluster shortage?

Correct me if I'm wrong, please: we could have whole Druid clusters restarted due to a ZK cluster shortage?

@egor-ryashin That's the point. It is disabled by default. A lot of us want to have the Java process die (so that an outside process can restart it, like other popular Linux daemons). Some scenarios cause it to not completely die, because a Zookeeper thread hangs on forever.
Our use case was this: DNS/IT services changed the IPs of the ZooKeeper cluster after the JVM had already cached them. Had the processes been able to die gracefully, a daemon would have caused the cluster to "self heal." This use case is not an exhaustive list.
Also, we noticed that queries gave bad data during these times. Sometimes, for some use cases (as is ours), no data is preferable to bad data.

For me it looks like a hack; shouldn't we try to recreate a CuratorFramework?

This would not have fixed the issue in our use case.

Our use case was this: DNS/IT services changed the IPs of the ZooKeeper cluster after the JVM had already cached them.

I wonder whether using the option networkaddress.cache.ttl can help in that case?

It probably can.
However, all foreseeable use cases cannot be determined at this time.

Another one may have presented itself today. I contend that having this in there would be a net positive.
https://groups.google.com/forum/#!topic/druid-user/ZK0VMaEIM1w
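For reference, networkaddress.cache.ttl is a JDK security property controlling how long successful name lookups are cached, in seconds. It must be set early, before the first lookup. A minimal sketch:

```java
// Sketch of the JVM DNS-cache setting mentioned above. The property takes
// effect only if set before the name service is first used; "0" disables
// caching of successful lookups, "-1" caches forever.
import java.security.Security;

public class DnsCacheTtlSketch {
    public static void main(String[] args) {
        Security.setProperty("networkaddress.cache.ttl", "60");  // 60 seconds
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

It can also be set in the JRE's java.security file, which avoids depending on application startup order.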
System.err.println("Zookeeper can't be reached, forcefully stopping virtual machine...");
}
finally {
  System.exit(1);
I believe you cannot write a unit test for that.

You're basically right.
Without adding a bunch of PowerMockito dependencies for catching a static System.exit(..) call (which I don't want to do), I simulated it in a unit test by throwing an unchecked RuntimeException (instead of System.exit(...)) from an identical block of code and catching that.
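The testing technique described above can be sketched like this: the exit path throws an unchecked exception in place of System.exit(1), so a test can observe it without killing the JVM. Names are illustrative, not the PR's actual test code:

```java
// Hypothetical sketch of testing an exit path without PowerMockito.
public class ExitPathTestSketch {
    // An unchecked exception stands in for System.exit(1) so the path
    // is observable from a test.
    static void exitPath() {
        try {
            System.err.println("Zookeeper can't be reached, forcefully stopping virtual machine...");
        } finally {
            throw new IllegalStateException("exit(1)");
        }
    }

    public static void main(String[] args) {
        boolean exitSimulated = false;
        try {
            exitPath();
        } catch (IllegalStateException e) {
            exitSimulated = "exit(1)".equals(e.getMessage());
        }
        System.out.println("exitSimulated=" + exitSimulated);
    }
}
```

An alternative on modern JVMs is installing a SecurityManager that intercepts exit calls, but that mechanism is deprecated in recent Java releases, so the exception substitution is the simpler route.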
clintropolis left a comment:
Sorry for the delay, LGTM overall 👍. I don't feel strongly enough about makeEnsembleProvider also stopping the lifecycle to hold this PR up, since I think overall it's a useful feature.
@leventov or @egor-ryashin any further comments on this PR?
private static final Logger log = new Logger(BoundedExponentialBackoffRetryWithQuitTest.class);
…
/*
(Optional) no reason why it shouldn't be a Javadoc comment.

Could you please link this discussion from the comment? There is no superstition that code should be "self-contained" in Druid; there are many links from the code to GitHub. It would be easier for readers to navigate. Currently, readers would be forced to dig into Git history to follow this path.
@leventov I don't understand what you mean.

I mean that comments in code should say "see https://github.com/apache/incubator-druid/pull/6740#discussion_r243368737" or, more generally, "see discussions in https://github.com/apache/incubator-druid/pull/6740" somewhere. In other words, the code should back-reference this PR.

@michael-trelinski this is the wrong place. This link should be part of the comment in CuratorModule. Specifically, it should extend your own comment:
// It's unknown whether or not this precaution is needed. Tests revealed that this path was never taken.

Apologies. It's part of the thread view for BoundedExponentialBackoffRetryWithQuitTest.java.
RetryPolicy retryPolicy;
if (config.getTerminateDruidProcessOnConnectFail()) {
// It's unknown whether or not this precaution is needed. Tests revealed that this path was never taken.
/** It's unknown whether or not this precaution is needed. Tests revealed that this path was never taken.
You cannot Javadoc inner variables in a method. Please make it an ordinary comment.
@clintropolis can you please restart the build (or advise)? I don't believe these errors are my fault, they seem IntelliJ-related on the TeamCity service.

@michael-trelinski you can try to merge master into your branch, it may help.
…to zookeeper_loss
This pull request addresses Proposal #6518. The main idea is that when ZooKeeper is unreachable after exhausting all retries, the process should forcefully exit so that a daemon monitor can restart it. This new behavior is disabled by default and enabled by a config file setting.
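For a cluster operator, enabling the behavior would look roughly like the fragment below. The property was renamed during review, so the final merged name may differ; check the ZooKeeper section of the Druid configuration docs before relying on it:

```properties
# Hypothetical runtime.properties fragment (property name as discussed in
# review; verify against the merged documentation):
druid.zk.service.terminateDruidProcessOnConnectFail=true
```

With this set, a process supervisor (systemd, Kubernetes, etc.) is expected to restart the Druid process after it exits with code 1.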