Skip to content

Conversation

@markhopson
Copy link
Contributor

@markhopson markhopson commented Feb 16, 2021

This PR allows users to enable retry-behaviour for ECSOperator._start_task.

This feature was discussed briefly: here #13725

The design was lifted from hooks/base_google.py. There's 2 main differences with my AWS implementation ...

  1. the user can configure the retry with a param quota_retry (Dict) that is passed in to the ECSOperator

  2. the user can configure the retry without tenacity (e.g. passing in max=300 and multiplier=1 instead of tenacity.wait_exponential(multiplier=1, max=300)

@boring-cyborg boring-cyborg bot added area:webserver Webserver related Issues provider:amazon AWS/Amazon - related issues labels Feb 16, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 16, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@turbaszek turbaszek removed the request for review from ryanahamilton February 16, 2021 22:03
@markhopson markhopson requested a review from turbaszek February 17, 2021 14:52
@markhopson
Copy link
Contributor Author

@turbaszek Thanks for the feedback. I also saw some checks fail (i.e. CI Build / Postgres9.6,Py3.6, CI Build / MySQL5.7, Py3.6, CI Build / Sqlite Py3.6) but I'm not sure what to do to fix them. Can you please advise me? Thanks!

@markhopson
Copy link
Contributor Author

@turbaszek just a friendly bump to see what the next steps are here

from airflow.providers.amazon.aws.models.exceptions import ECSOperatorError
from airflow.utils.log.logging_mixin import LoggingMixin

ECS_QUOTA_ERROR_REASONS = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are ECS specific they shouldn't really be in the base_aws module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do here. It feels weird to move this to ecs.py if I have code in base_aws.py that catches ECSOperatorError. For now, I've moved this variable into the is_permissible_error() scope to clarify intent. Any suggestions are welcomed.

"""

def decorator_f(self):
quota_retry = getattr(self, 'quota_retry', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this parameter name -- it seems very specific to EcsOperator only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Hm I've changed it to retry_args to be more generic, but I'm open to any suggestions.

@markhopson markhopson requested a review from ashb February 28, 2021 04:01
@markhopson
Copy link
Contributor Author

@ashb Friendly bump here

@ashb
Copy link
Member

ashb commented Mar 6, 2021

Thanks, I'll take a look on Monday

return self.get_client_type("iam").get_role(RoleName=role)["Role"]["Arn"]

@staticmethod
def retry(fun: Callable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change this to

    def retry(should_retry: Callable[[Exception], bool], fun: Callable)

Then the retry_if_permissible_error can move out of BaseAWS in to ECSHook, used like this:

def should_retry(exception: Exception):
    """Check if exception is related to ECS resource quota (CPU, MEM)."""
    return isinstance(exception, ECSOperatorError) and any(
        quota_reason in failure['reason']
        for quota_reason in ['RESOURCE:MEMORY', 'RESOURCE:CPU']
        for failure in exception.failures
    )

...

    @AwsBaseHook.retry(should_retry)
    def _start_task(self):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I thought of this. I've made this change.

limit.
"""

def decorator_f(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def decorator_f(self):
@functools.wraps
def decorator_f(self, *args, **kwargs):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ashb
Copy link
Member

ashb commented Mar 8, 2021

@markhopson See the review I just left -- I think that gives us a way to have the base retry functionality in the base hook, without having to put any service specific login in there. WDYT?

@markhopson
Copy link
Contributor Author

@markhopson See the review I just left -- I think that gives us a way to have the base retry functionality in the base hook, without having to put any service specific login in there. WDYT?

@ashb Good tip. I like it. I've made the change.

@markhopson markhopson requested a review from ashb March 10, 2021 02:56
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@markhopson
Copy link
Contributor Author

@ashb Friendly bump.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good @markhopson

Could you fix up the static checks please?

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Mar 19, 2021
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.

@ashb
Copy link
Member

ashb commented Mar 19, 2021

TestECSOperator.test_execute_with_failures is now failing becuase the type of the exception it throws has changed -- I think that's fine, and the tests just need updating.

I haven't look in detail at the code/test though.

@ashb ashb merged commit 614be87 into apache:master Mar 26, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 26, 2021

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:webserver Webserver related Issues okay to merge It's ok to merge this PR as it does not require more tests provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants