Skip to content

Conversation

@dazza-codes
Copy link
Contributor

@dazza-codes dazza-codes commented Dec 13, 2019

Design / PR notes and RFC

  • open to commentary on the general design and approach applied in this PR
    • the intention is to preserve existing functionality, as is, with minor tweaks
  • AFAICT, the public API is backwards compatible, unless noted in UPDATING.md
    • most changes are refactoring private methods
    • this adds new features to the public API
  • for RFC, please ignore most details of the implementation, just add commentary to the conversation; if a slack discussion would help, ping me to setup a time (Pacific standard time)
  • if it mostly looks OK, go ahead and give it a review
    • LMK if it would be easier to split this out into stacked PRs, but hopefully it's clear enough, although large
    • LMK if this requires additional integration testing (and how to do that easily)

Jira

Related issues

Description

  • Here are some details about my PR:

This PR improves the batch job status polling by using exponential backoff with jitter, see

This PR creates a utility for generating and using custom waiters for the AWS batch service; see also:

The utility for creating custom waiters can accept a waiter config to generate a custom waiter model. This should allow users to specify any waiter model required for their use cases, which should satisfy https://issues.apache.org/jira/browse/AIRFLOW-6245 when the custom waiter can be plugged into the AWS batch operator.

Tests

  • My PR adds unit tests for the generator of custom waiters for the AWS batch service

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@mik-laj mik-laj added the provider:amazon AWS/Amazon - related issues label Dec 13, 2019
@dazza-codes dazza-codes force-pushed the aws-batch-waiters branch 3 times, most recently from afc1be7 to c59d351 Compare December 14, 2019 03:10
Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is too much changes. Could you please split the Pr into two separate:

  • move AWS from contrib to providers
  • add custom waiters

In this way it will be easier to review and chance that we will miss something will be lower ;)

@dazza-codes
Copy link
Contributor Author

dazza-codes commented Dec 16, 2019

@nuclearpinguin - this is already a stacked PR, please read the PR description (esp. WIP status) and linked PRs. If your interested, please take a look at #6764

@dazza-codes dazza-codes changed the title [WIP][AIRFLOW-6245] Add custom waiters for AWS batch jobs [WIP, staked on 6764][AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 16, 2019
@dazza-codes dazza-codes changed the title [WIP, staked on 6764][AIRFLOW-6245] Add custom waiters for AWS batch jobs [WIP, stacked on 6764][AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 16, 2019
@dazza-codes dazza-codes force-pushed the aws-batch-waiters branch 2 times, most recently from ec32c66 to dd7f211 Compare December 17, 2019 16:52
@dazza-codes
Copy link
Contributor Author

dazza-codes commented Dec 17, 2019

#6764 is merged - rebased this PR on latest master today

  • still WIP to consider and test new options for custom waiters (updates to WIP status are in the PR title and description)
  • pushed up [1915] Use delay function in waiter boto/botocore#1921 and not sure exactly how this PR and that one can change the landscape of waiting/polling on batch jobs, but this PR will likely stand on its own for Airflow custom options; both PRs need more work to sort out CI-test issues

@dazza-codes dazza-codes changed the title [WIP, stacked on 6764][AIRFLOW-6245] Add custom waiters for AWS batch jobs [WIP][AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 17, 2019
@dazza-codes dazza-codes force-pushed the aws-batch-waiters branch 3 times, most recently from 92b78f0 to c0bc140 Compare December 18, 2019 22:06
@dazza-codes dazza-codes changed the title [WIP][AIRFLOW-6245] Add custom waiters for AWS batch jobs [AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 18, 2019
@dazza-codes dazza-codes changed the title [AIRFLOW-6245] Add custom waiters for AWS batch jobs [WIP][AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 19, 2019
@dazza-codes dazza-codes force-pushed the aws-batch-waiters branch 9 times, most recently from 6fc9fe5 to 64c0232 Compare December 21, 2019 02:55
@dazza-codes dazza-codes changed the title [WIP][AIRFLOW-6245] Add custom waiters for AWS batch jobs [WIP][RFC][AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 21, 2019
@dazza-codes dazza-codes changed the title [WIP][RFC][AIRFLOW-6245] Add custom waiters for AWS batch jobs [RFC][AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 21, 2019
@dazza-codes
Copy link
Contributor Author

dazza-codes commented Dec 27, 2019

Rebased and fixed test imports after the merge of #6919 - only changes to imports, no substantive code changes required.

@dazza-codes dazza-codes requested a review from feluelle December 28, 2019 17:47
@dazza-codes dazza-codes changed the title [RFC][AIRFLOW-6245] Add custom waiters for AWS batch jobs [AIRFLOW-6245] Add custom waiters for AWS batch jobs Dec 28, 2019
@dazza-codes
Copy link
Contributor Author

dazza-codes commented Jan 6, 2020

Appreciate the reviews - thanks @feluelle. I've rebased on master (went OK) and pushed up some requested changes (some open comments remain).

@dazza-codes
Copy link
Contributor Author

Rebased on master and pushed again because some test failures were not related to this PR and might be fixed in the latest master.

@feluelle
Copy link
Member

feluelle commented Jan 7, 2020

I restarted the failed check since the error was a known one: travis machine ran out of memory -.-

@dazza-codes
Copy link
Contributor Author

Rebased and pushed again, travis failed one test with a timeout or something, not something specific to this PR.

- add AwsBatchWaiters
  - the waiters are based on botocore, but not yet
    available for AWS batch services
- refactor AwsBatchOperator:
  - use an optional waiters object to wait for
    batch job status indicators
  - split execute into submit_job and monitor_job
  - use job_id with an optional init-parameter;
    discard jobId and jobName (already has job_name)
  - inherit from AwsBatchClient
  - add notes to UPDATING.md
- extract class for AwsBatchClient
  - move responsibility for batch API calls and
    response parsing to this client
  - move responsibility for default wait and
    polling to this client
- rename BatchProtocol to AwsBatchProtocol [AIP-21]
  - test backward compatibility
  - add PROTOCOLS to tests/test_core_to_contrib.py
  - add notes to UPDATING.md
- split up polling for job status into steps:
  - poll for a JobExists
  - poll for a JobRunning
  - poll for a JobComplete
- use random jitter for wait-polling delays for
  high concurrency job polling
- modify the exponential backoff delay for the
  existing polling functions
- revise and update unit tests
@dazza-codes
Copy link
Contributor Author

Did another self-review and most things look OK and test OK. There is a minor revision to how the client polls the AWS Batch job description, so that it will fail-fast when it encounters most client errors (except one for too-many-requests, then it will retry). I've briefly considered creating an extract-class refactor to pull out an AwsBatchJob class that captures only the job monitoring responsibility (possibly a persistent state-machine for the job), but skipping it because this PR already trims down the operator enough and that work could follow this PR if it helps later on.

Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Work @dazza-codes ! 👍

@feluelle feluelle merged commit 78d8fe6 into apache:master Jan 8, 2020
@dazza-codes dazza-codes deleted the aws-batch-waiters branch January 8, 2020 15:20
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
- add AwsBatchWaiters
  - the waiters are based on botocore, but not yet
    available for AWS batch services
- refactor AwsBatchOperator:
  - use an optional waiters object to wait for
    batch job status indicators
  - split execute into submit_job and monitor_job
  - use job_id with an optional init-parameter;
    discard jobId and jobName (already has job_name)
  - inherit from AwsBatchClient
  - add notes to UPDATING.md
- extract class for AwsBatchClient
  - move responsibility for batch API calls and
    response parsing to this client
  - move responsibility for default wait and
    polling to this client
- rename BatchProtocol to AwsBatchProtocol [AIP-21]
  - test backward compatibility
  - add PROTOCOLS to tests/test_core_to_contrib.py
  - add notes to UPDATING.md
- split up polling for job status into steps:
  - poll for a JobExists
  - poll for a JobRunning
  - poll for a JobComplete
- use random jitter for wait-polling delays for
  high concurrency job polling
- modify the exponential backoff delay for the
  existing polling functions
- revise and update unit tests
@dazza-codes
Copy link
Contributor Author

In https://issues.apache.org/jira/browse/AIRFLOW-6245 - this work is attached to a 2.0.0 release. How can we request that this is included in the next 1.x release?

@eladkal
Copy link
Contributor

eladkal commented Apr 15, 2021

@dazza-codes for 1.10.X you should use https://pypi.org/project/apache-airflow-backport-providers-amazon/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants