EC2 CreateInstance: terminate instances in on_kill #36828
Merged
This PR terminates EC2 instances that are still being created when a task instance is stopped (externally marked for retry) before the instance IDs were returned. It uses the `on_kill` method to shut down the EC2 instances.

Use case/problem:
The `EC2CreateInstanceOperator` is often used in a setup task (see https://airflow.apache.org/docs/apache-airflow/stable/howto/setup-and-teardown.html). If the task inside the DAG is cancelled, a potentially ongoing `EC2CreateInstanceOperator` task is killed. This is likely when the AWS initialization and/or the post-processing step take considerable time (large instance storage, long cloud-init processes) and `wait_for_completion` is `True`.

Without the `on_kill` cleanup code, the partially initialized instances (i.e. those whose instance IDs were not sent to XCom) will not be terminated by the teardown task.

This happened to me several times before I finally found the cause and fixed it.
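To illustrate the idea, here is a minimal sketch of the cleanup pattern. It is not the code in this PR and does not import Airflow or boto3: `FakeEC2Client`, `CreateInstanceSketch`, `TaskKilled`, and the `wait` callable are hypothetical stand-ins. Only the method names `run_instances` and `terminate_instances` and the attribute `_on_kill_instance_ids` mirror real names. The sketch stores the instance IDs before the long wait so that `on_kill` can terminate them if the task is stopped in between.

```python
class TaskKilled(Exception):
    """Hypothetical signal that the task was stopped while waiting."""


class FakeEC2Client:
    """Hypothetical stand-in for an EC2 client; records calls instead of contacting AWS."""

    def __init__(self):
        self.terminated = []
        self._counter = 0

    def run_instances(self, MinCount, MaxCount, **kwargs):
        instances = []
        for _ in range(MaxCount):
            self._counter += 1
            instances.append({"InstanceId": f"i-{self._counter:08d}"})
        return {"Instances": instances}

    def terminate_instances(self, InstanceIds):
        self.terminated.extend(InstanceIds)


class CreateInstanceSketch:
    """Simplified operator-like class (assumption: not the real EC2CreateInstanceOperator)."""

    def __init__(self, client, wait):
        self.client = client
        self.wait = wait  # callable that blocks until the instances are ready
        self._on_kill_instance_ids = []

    def execute(self):
        response = self.client.run_instances(MinCount=1, MaxCount=1)
        instance_ids = [inst["InstanceId"] for inst in response["Instances"]]
        # Remember the IDs *before* waiting, so on_kill can clean up
        # even though the IDs have not been returned (pushed to XCom) yet.
        self._on_kill_instance_ids = instance_ids
        self.wait(instance_ids)
        return instance_ids

    def on_kill(self):
        # Terminate whatever was started but never handed over to the teardown task.
        if self._on_kill_instance_ids:
            self.client.terminate_instances(InstanceIds=self._on_kill_instance_ids)
            self._on_kill_instance_ids = []


def interrupted_wait(instance_ids):
    # Simulates the task being stopped during a long wait_for_completion.
    raise TaskKilled()


client = FakeEC2Client()
op = CreateInstanceSketch(client, interrupted_wait)
try:
    op.execute()
except TaskKilled:
    op.on_kill()

print(client.terminated)  # ['i-00000001']
```

In this sketch the IDs are left in place after a successful `execute`; when they should be reset is a separate design decision.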
Alternative: Split the setup into many small tasks:
`EC2CreateInstance` as a setup (without `wait_for_completion`), wait on instance state, wait on cloud-init setup to finish, ... `EC2TerminateInstance` as teardown.

Questions:
`_on_kill_instance_ids` - what's coming after `execute` in the TI lifecycle?