[AIRFLOW-6245] Add custom waiters for AWS batch jobs #6811
There are too many changes. Could you please split the PR into two separate ones:
- move AWS from contrib to providers
- add custom waiters
That way it will be easier to review, and the chance that we miss something will be lower ;)
@nuclearpinguin - this is already a stacked PR; please read the PR description (esp. the WIP status) and the linked PRs. If you're interested, please take a look at #6764.
#6764 is merged - rebased this PR on latest master today
Rebased and fixed test imports after the merge of #6919 - only changes to imports, no substantive code changes required.
Appreciate the reviews - thanks @feluelle. I've rebased on master (went OK) and pushed up some requested changes (some open comments remain).
Rebased on master and pushed again because some test failures were not related to this PR and might be fixed in the latest master.
I restarted the failed check since the error was a known one: the Travis machine ran out of memory -.-
Rebased and pushed again; Travis failed one test with a timeout or something similar, nothing specific to this PR.
- add AwsBatchWaiters
- the waiters are based on botocore, but not yet
available for AWS batch services
- refactor AwsBatchOperator:
- use an optional waiters object to wait for
batch job status indicators
- split execute into submit_job and monitor_job
- use job_id with an optional init-parameter;
discard jobId and jobName (already has job_name)
- inherit from AwsBatchClient
- add notes to UPDATING.md
- extract class for AwsBatchClient
- move responsibility for batch API calls and
response parsing to this client
- move responsibility for default wait and
polling to this client
- rename BatchProtocol to AwsBatchProtocol [AIP-21]
- test backward compatibility
- add PROTOCOLS to tests/test_core_to_contrib.py
- add notes to UPDATING.md
- split up polling for job status into steps:
- poll for a JobExists
- poll for a JobRunning
- poll for a JobComplete
- use random jitter for wait-polling delays for
high concurrency job polling
- modify the exponential backoff delay for the
existing polling functions
- revise and update unit tests
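
For readers following the commit message, the three polling steps listed above (JobExists, JobRunning, JobComplete) could be sketched roughly as below. This is only an illustration, not the code in this PR: `poll_for_status`, `wait_for_job`, and the `get_status` callable are hypothetical names, and the status groupings assume the usual AWS Batch job states (SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED).

```python
import time
from typing import Callable, Optional, Sequence

# Assumed AWS Batch job states, grouped by the stage each poll is waiting for.
EXISTS_STATES = ("SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
                 "RUNNING", "SUCCEEDED", "FAILED")
RUNNING_STATES = ("RUNNING", "SUCCEEDED", "FAILED")
COMPLETE_STATES = ("SUCCEEDED", "FAILED")


def poll_for_status(
    get_status: Callable[[], Optional[str]],
    accepted: Sequence[str],
    max_attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns one of the accepted states."""
    for _ in range(max_attempts):
        status = get_status()
        if status in accepted:
            return status
        sleep(delay)
    raise TimeoutError(f"no status in {tuple(accepted)} after {max_attempts} attempts")


def wait_for_job(get_status: Callable[[], Optional[str]], **kwargs) -> str:
    """Hypothetical helper: poll for existence, then running, then completion."""
    poll_for_status(get_status, EXISTS_STATES, **kwargs)    # JobExists
    poll_for_status(get_status, RUNNING_STATES, **kwargs)   # JobRunning
    return poll_for_status(get_status, COMPLETE_STATES, **kwargs)  # JobComplete
```

Splitting the wait this way lets each stage use its own delay and attempt limit, since a job may sit in a queue far longer than it takes to appear in the API.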
Did another self-review and most things look OK and test OK. There is a minor revision to how the client polls the AWS Batch job description, so that it will fail-fast when it encounters most client errors (except one for too-many-requests, then it will retry). I've briefly considered creating an extract-class refactor to pull out an |
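
A rough sketch of the fail-fast idea described in this comment, under stated assumptions: `describe_with_retry` is a hypothetical name, the local `ClientError` class is a stand-in that only mimics the `response` attribute of `botocore.exceptions.ClientError`, and the `TooManyRequestsException` error code is assumed to be the throttling code of interest. It is not the code from this PR.

```python
import time
from typing import Any, Callable


class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (assumption: same `response` shape)."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def describe_with_retry(
    describe: Callable[[], Any],
    retries: int = 4,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry only on throttling; re-raise every other client error immediately."""
    attempt = 0
    while True:
        try:
            return describe()
        except ClientError as err:
            code = err.response.get("Error", {}).get("Code")
            # Fail fast on anything that is not a too-many-requests error,
            # and also once the retry budget is exhausted.
            if code != "TooManyRequestsException" or attempt >= retries:
                raise
            attempt += 1
            sleep(pause)
```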
feluelle
left a comment
Great Work @dazza-codes ! 👍
In https://issues.apache.org/jira/browse/AIRFLOW-6245 - this work is attached to a 2.0.0 release. How can we request that this is included in the next 1.x release?
@dazza-codes for 1.10.X you should use https://pypi.org/project/apache-airflow-backport-providers-amazon/ |
Design / PR notes and RFC
Jira
Related issues
Custom callable for waiter boto/botocore#1915
While developing this PR, thoughts about custom options for async waiters lingered in the background. This PR may not be the place to introduce that (it might follow on from this PR), but it's worth noting here that work on a base async operator is in [AIRFLOW-5567] BaseReschedulePokeOperator #6210
Description
This PR improves the batch job status polling by using exponential backoff with jitter.
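
As a minimal sketch of the general technique (not the delay formula used in this PR), exponential backoff with "full jitter" can be written as follows. The function name `backoff_delay` and the `base` and `cap` defaults are assumptions chosen for illustration.

```python
import random


def backoff_delay(tries: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Return a randomized delay (seconds) for the given retry count.

    The ceiling doubles with each try up to `cap`; the actual delay is drawn
    uniformly from [0, ceiling] so that many concurrent pollers spread out
    instead of hitting the API at the same moment.
    """
    ceiling = min(cap, base * 2 ** tries)
    return random.uniform(0.0, ceiling)
```

The jitter matters most under high concurrency: without it, jobs submitted together would all poll on the same schedule and are more likely to trigger API throttling.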
This PR creates a utility for generating and using custom waiters for the AWS batch service.
The utility for creating custom waiters can accept a waiter config to generate a custom waiter model. This should allow users to specify any waiter model required for their use cases, which should satisfy https://issues.apache.org/jira/browse/AIRFLOW-6245 when the custom waiter can be plugged into the AWS batch operator.
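
To illustrate what a waiter config might look like, here is a hedged sketch. The config values (delay, max attempts, acceptors) and the helper names `with_overrides` and `build_waiter` are assumptions for illustration, not the defaults or API of this PR. `botocore.waiter.WaiterModel` and `botocore.waiter.create_waiter_with_client` are existing botocore functions; botocore is imported lazily so the config helpers can be used without it.

```python
import copy
from typing import Any, Dict

# Illustrative waiter config in the botocore waiter-model format (version 2).
JOB_COMPLETE_CONFIG: Dict[str, Any] = {
    "version": 2,
    "waiters": {
        "JobComplete": {
            "delay": 300,
            "operation": "DescribeJobs",
            "maxAttempts": 100,
            "acceptors": [
                {"argument": "jobs[].status", "expected": "SUCCEEDED",
                 "matcher": "pathAll", "state": "success"},
                {"argument": "jobs[].status", "expected": "FAILED",
                 "matcher": "pathAny", "state": "failure"},
            ],
        }
    },
}


def with_overrides(config: Dict[str, Any], waiter_name: str,
                   delay: int, max_attempts: int) -> Dict[str, Any]:
    """Return a copy of the config with custom delay and attempt limits."""
    custom = copy.deepcopy(config)
    custom["waiters"][waiter_name]["delay"] = delay
    custom["waiters"][waiter_name]["maxAttempts"] = max_attempts
    return custom


def build_waiter(client: Any, config: Dict[str, Any], waiter_name: str) -> Any:
    """Create a botocore waiter for `client` from a custom waiter config."""
    from botocore import waiter as botocore_waiter  # requires botocore

    model = botocore_waiter.WaiterModel(config)
    return botocore_waiter.create_waiter_with_client(waiter_name, model, client)
```

Assuming boto3 is installed and credentials are configured, usage could look like `build_waiter(boto3.client("batch"), config, "JobComplete").wait(jobs=[job_id])`, where `job_id` is the ID of a submitted job.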