pdu-mn1 (Contributor) commented Aug 30, 2018

Add common retry functionality to table IO functions for data stores
that do not have native retry support. We use failsafe as the retry
library.
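To make the idea concrete, below is a minimal sketch of a retrying wrapper around an asynchronous table call, written with only the JDK rather than failsafe. `RetriableAsyncCall`, `withRetry`, and its parameters are hypothetical names for illustration. The PR's own classes are `RetriableReadFunction` and `RetriableWriteFunction`, and they delegate the retry loop to failsafe.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: retries an async table call with a fixed delay between attempts. */
final class RetriableAsyncCall {
  private RetriableAsyncCall() {}

  static <T> CompletableFuture<T> withRetry(
      Supplier<CompletableFuture<T>> call,  // issues one attempt of the table IO
      Predicate<Throwable> isRetriable,     // decides whether a failure is worth retrying
      int maxAttempts,                      // total attempts, including the first one
      long delayMs,                         // fixed wait before each retry
      ScheduledExecutorService scheduler) { // runs the delayed retries
    CompletableFuture<T> result = new CompletableFuture<>();
    attempt(call, isRetriable, maxAttempts, delayMs, scheduler, 1, result);
    return result;
  }

  private static <T> void attempt(
      Supplier<CompletableFuture<T>> call, Predicate<Throwable> isRetriable, int maxAttempts,
      long delayMs, ScheduledExecutorService scheduler, int attemptNo, CompletableFuture<T> result) {
    invoke(call).whenComplete((value, error) -> {
      if (error == null) {
        result.complete(value);
        return;
      }
      // Async stages may wrap the real failure in a CompletionException.
      Throwable cause = (error instanceof CompletionException && error.getCause() != null)
          ? error.getCause() : error;
      if (attemptNo < maxAttempts && isRetriable.test(cause)) {
        // Schedule the next attempt rather than blocking the calling thread.
        scheduler.schedule(
            () -> attempt(call, isRetriable, maxAttempts, delayMs, scheduler, attemptNo + 1, result),
            delayMs, TimeUnit.MILLISECONDS);
      } else {
        result.completeExceptionally(cause); // not retriable, or attempts exhausted
      }
    });
  }

  // Turns a synchronous throw from the supplier into a failed future.
  private static <T> CompletableFuture<T> invoke(Supplier<CompletableFuture<T>> call) {
    try {
      return call.get();
    } catch (RuntimeException e) {
      CompletableFuture<T> failed = new CompletableFuture<>();
      failed.completeExceptionally(e);
      return failed;
    }
  }
}
```

Scheduling retries on a separate executor keeps the caller's thread free, which matters for async table reads and writes that should not block a task.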

pdu-mn1 (Contributor Author) commented Aug 30, 2018

@prateekm please help review when you have a chance.

weisong44 (Member) left a comment

Thanks for the contribution, overall LGTM. A few minor comments.


import org.apache.samza.SamzaException;

import net.jodah.failsafe.RetryPolicy;

Member:

Ideally, the retry policy is just a POJO that captures retry parameters. This class should be independent of the implementation we chose (failsafe) and should not take a dependency on it.
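As an illustration of this suggestion, here is a minimal sketch of a library-agnostic policy object. The class name `RetryPolicySpec` and its fields are hypothetical. The point is that it depends only on the JDK, so a separate adapter can translate it into failsafe (or another library), which is the direction a later commit in this PR takes with its adapter class.

```java
import java.io.Serializable;
import java.time.Duration;
import java.util.Objects;

/**
 * Hypothetical library-agnostic retry policy: a plain, serializable holder of retry
 * parameters. It imports nothing from any retry library; translating it into a
 * concrete retry mechanism is left to a separate adapter class.
 */
final class RetryPolicySpec implements Serializable {
  private static final long serialVersionUID = 1L;

  private int maxAttempts = 3;                           // total attempts, including the first
  private Duration sleepTime = Duration.ofMillis(100);   // delay between attempts
  private Duration maxDuration = Duration.ofSeconds(30); // overall time budget for all attempts

  RetryPolicySpec withMaxAttempts(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
    return this;
  }

  RetryPolicySpec withSleepTime(Duration sleepTime) {
    this.sleepTime = Objects.requireNonNull(sleepTime);
    return this;
  }

  RetryPolicySpec withMaxDuration(Duration maxDuration) {
    this.maxDuration = Objects.requireNonNull(maxDuration);
    return this;
  }

  int getMaxAttempts() { return maxAttempts; }
  Duration getSleepTime() { return sleepTime; }
  Duration getMaxDuration() { return maxDuration; }
}
```

Because every field is a JDK type, the object stays `Serializable` without dragging failsafe classes into the serialized form.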

* Currently, the policy object can be translated into a {@link RetryPolicy} of the failsafe library.
*/
public class TableRetryPolicy implements Serializable {
enum BackoffType { NONE, FIXED, RANDOM, EXPONENTIAL }

Member:

Can we describe the behavior and related parameters of each?
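One way to answer this is to document each type next to the delay it produces. The sketch below is hypothetical: the parameter names (`sleepMs`, `randomMinMs`, `randomMaxMs`, `factor`, `maxSleepMs`) and the formulas are assumptions chosen for illustration, not the semantics defined in `TableRetryPolicy`.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical mirror of TableRetryPolicy.BackoffType, with each type's behavior documented. */
enum BackoffType {
  /** Retry immediately; uses no parameters. */
  NONE,
  /** Wait a constant sleepMs before every retry. */
  FIXED,
  /** Wait a uniformly random delay in [randomMinMs, randomMaxMs). */
  RANDOM,
  /** Wait sleepMs * factor^(retry - 1), capped at maxSleepMs. */
  EXPONENTIAL
}

/** Hypothetical backoff parameters; each field is read only by the types noted. */
final class BackoffParams {
  long sleepMs = 100;       // FIXED delay; base delay for EXPONENTIAL
  long randomMinMs = 50;    // RANDOM lower bound (inclusive)
  long randomMaxMs = 500;   // RANDOM upper bound (exclusive), must exceed randomMinMs
  double factor = 2.0;      // EXPONENTIAL growth per retry
  long maxSleepMs = 10_000; // EXPONENTIAL upper bound on any single delay

  /** Delay before the given retry, where retry = 1 is the first retry after the initial failure. */
  long delayMs(BackoffType type, int retry) {
    switch (type) {
      case NONE:
        return 0L;
      case FIXED:
        return sleepMs;
      case RANDOM:
        return ThreadLocalRandom.current().nextLong(randomMinMs, randomMaxMs);
      case EXPONENTIAL: {
        double grown = sleepMs * Math.pow(factor, retry - 1);
        return (long) Math.min(grown, (double) maxSleepMs);
      }
      default:
        throw new IllegalArgumentException("Unknown backoff type: " + type);
    }
  }
}
```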

* Wrapper of retry-related metrics common to both {@link RetriableReadFunction} and
* {@link RetriableWriteFunction}.
*/
class RetryMetrics {

Member:

Is it possible to get counts of requests that still fail after all retries?
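A counter for requests that are still failing once retrying stops would answer this. Below is a minimal sketch using plain `AtomicLong`s in place of Samza's metrics counters; `RetryCounters` and its method names are hypothetical. A later commit in this PR reports adding a permanent-failure metric along these lines.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for RetryMetrics, using AtomicLongs instead of a metrics registry. */
final class RetryCounters {
  final AtomicLong retryCount = new AtomicLong();             // every retry (not the first try)
  final AtomicLong successAfterRetryCount = new AtomicLong(); // failed at least once, then succeeded
  final AtomicLong permanentFailureCount = new AtomicLong();  // still failing when retrying stopped

  /** Call each time a new attempt is scheduled after a failure. */
  void onRetry() {
    retryCount.incrementAndGet();
  }

  /** Call once per request when it finishes; attempts includes the first try. */
  void onRequestDone(boolean succeeded, int attempts) {
    if (!succeeded) {
      // Either the error was not retriable or the retry budget ran out.
      permanentFailureCount.incrementAndGet();
    } else if (attempts > 1) {
      successAfterRetryCount.incrementAndGet();
    }
  }
}
```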


private AsyncFailsafe<?> failsafe() {
long startMs = System.currentTimeMillis();
return Failsafe.with(retryPolicy).with(retryExecutor)

Member:

Is this part reusable or do we have to create a new instance for every request?

pdu-mn1 (Contributor Author) commented Sep 7, 2018

We need a new instance per request: the start timestamp is unique to each request, and each failsafe object carries its own retry context, because retries are tracked per request rather than shared across all requests.
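The distinction drawn here can be sketched as follows: the policy parameters are immutable and shareable, while the start time and attempt count belong to a single request. `SharedRetryPolicy` and `RequestRetryState` are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and this does not reproduce failsafe's internals.

```java
/** Hypothetical immutable retry parameters, safe to share across all requests. */
final class SharedRetryPolicy {
  final int maxAttempts;
  final long maxDurationMs;

  SharedRetryPolicy(int maxAttempts, long maxDurationMs) {
    this.maxAttempts = maxAttempts;
    this.maxDurationMs = maxDurationMs;
  }

  /** Creates fresh per-request state whose clock starts at nowMs. */
  RequestRetryState start(long nowMs) {
    return new RequestRetryState(this, nowMs);
  }
}

/** Hypothetical mutable state owned by exactly one request. */
final class RequestRetryState {
  private final SharedRetryPolicy policy;
  private final long startMs; // unique per request, so it cannot live in the shared policy
  private int attempts;       // attempts made so far for this request only

  RequestRetryState(SharedRetryPolicy policy, long startMs) {
    this.policy = policy;
    this.startMs = startMs;
  }

  void recordAttempt() {
    attempts++;
  }

  /** True if both the attempt budget and the time budget allow another attempt. */
  boolean canAttemptAgain(long nowMs) {
    return attempts < policy.maxAttempts && nowMs - startMs < policy.maxDurationMs;
  }
}
```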

TableRetryPolicy writeRetryPolicy = null;

if (readRetryPolicy != null || writeRetryPolicy != null) {
retryExecutor = Executors.newSingleThreadScheduledExecutor(runnable -> {

Member:

Would it be better to share this executor across all tasks and tables?

pdu-mn1 (Contributor Author):

Makes sense. I'll make this a singleton
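A common way to get one shared executor per JVM is the initialization-on-demand holder idiom, sketched below. `RetryExecutorHolder` and the thread name are hypothetical. The single-thread scheduled executor with a custom thread factory mirrors the `Executors.newSingleThreadScheduledExecutor(runnable -> {` call quoted in this review; marking the thread as a daemon is an added assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

/** Hypothetical process-wide retry scheduler shared by all tasks and tables. */
final class RetryExecutorHolder {
  private RetryExecutorHolder() {}

  // The nested class is initialized (and the executor created) on first access only;
  // JVM class initialization makes this lazy and thread-safe without explicit locking.
  private static final class Holder {
    static final ScheduledExecutorService INSTANCE =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
          Thread t = new Thread(runnable, "table-retry-executor");
          t.setDaemon(true); // pending retries should not keep the JVM alive on shutdown
          return t;
        });
  }

  static ScheduledExecutorService get() {
    return Holder.INSTANCE;
  }
}
```

A single thread is enough only if it does little more than schedule the next attempt; heavy work inside retry callbacks would be serialized across every task and table sharing it.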

- Extract failsafe logic from TableRetryPolicy into an adapter class
- Add a unit test class for TableRetryPolicy + FailsafeAdapter
- Use the retry executor service as a singleton
- Add a permanent-failure metric

This allows the application to have a say in which exception types can be retried. An exception is retried if either the table function or the custom predicate says so.
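The rule in the last commit message, retry if either the table function or the custom predicate says so, is a logical OR of two predicates. Here is a minimal sketch with `java.util.function.Predicate`; `RetryDecision.combine` is a hypothetical helper, and the PR itself uses its own `RetryPredicate` type.

```java
import java.util.function.Predicate;

/** Hypothetical sketch: an exception is retried if either source deems it retriable. */
final class RetryDecision {
  private RetryDecision() {}

  static Predicate<Throwable> combine(Predicate<Throwable> tableFnRetriable,
                                      Predicate<Throwable> customPredicate) {
    // customPredicate may be null when the application did not configure one.
    return customPredicate == null ? tableFnRetriable : tableFnRetriable.or(customPredicate);
  }
}
```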

weisong44 (Member) left a comment

Overall LGTM

*/
public TableRetryPolicy withRetryOn(RetryPredicate isRetriable) {
Preconditions.checkNotNull(isRetriable);
this.isRetriable = isRetriable;

Member:

Shall we simply call it retryPredicate?

* @param isRetriable predicate for retriable exception identification
* @return this policy instance
*/
public TableRetryPolicy withRetryOn(RetryPredicate isRetriable) {

Member:

Shall we just call it withRetryPredicate?

xinyuiscool (Contributor) left a comment

LGTM. I have one minor comment.

A question about the table provider: is it created per task? I want to make sure a new readFn and writeFn instance is created for each task. Thanks.

pdu-mn1 (Contributor Author) commented Sep 14, 2018

@xinyuiscool Yes, the table and its associated read/write functions are created per task instance.

xinyuiscool (Contributor) commented

Thanks for answering my question. Please address all of @weisong44's feedback, and I will commit it once it's finalized.

asfgit closed this in e4719b4 on Sep 17, 2018.