Skip to content

Conversation

@dashton90
Copy link
Contributor

Adds two new operators to Amazon provider package: EC2HibernateInstanceOperator, and EC2RebootInstanceOperator

closes: #35703

@dashton90 dashton90 changed the title Add ec2 reboot and hibernate operators Add EC2HibernateInstanceOperator and EC2RebootInstanceOperator Nov 22, 2023
@dashton90
Copy link
Contributor Author

Apologies, I pushed a very bad rebase to the previous PR which requested review from a dozen contributors. Reopening a separate PR to keep things organized

@vincbeck
Copy link
Contributor

Apologies, I pushed a very bad rebase to the previous PR which requested review from a dozen contributors. Reopening a separate PR to keep things organized

Not a problem. Tagging @o-nikolas here since he reviewed your previous PR as well.

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, only some nits

dashton90 and others added 8 commits November 22, 2023 12:22
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good!

Just a few more things to fix

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
dashton90 and others added 4 commits November 22, 2023 20:06
@vincbeck vincbeck merged commit ca1202f into apache:main Nov 23, 2023
@ferruzzi
Copy link
Contributor

ferruzzi commented Nov 23, 2023

This merge broke the system tests; t4g.micro instances can not be hibernated. Did you test this out? If so, I don't suppose you remember which instance size worked for you?

botocore.exceptions.ClientError: An error occurred (UnsupportedHibernationConfiguration) when calling the RunInstances operation: The instance type does not support hibernation: t4g.micro. For information about supported instance types, see the Amazon EC2 Hibernation documentation.

[Edit] That may have come out more harsh than I intended. Thanks for the contribution, it's a great addition, we just need to sort out the instance size.

@ferruzzi
Copy link
Contributor

Looking through the code, it looks like you are checking for a hibernation option, but the exception you are raising raise AirflowException(f"Instance {instance['InstanceId']} is not configured for hibernation") is not making it into the system test's logs, so we may need to have another look at the unit tests as well.

@dashton90
Copy link
Contributor Author

I tested on an m4.large instance

And didn't come across as harsh at all. Please let me know how I can assist with the system tests.

@dashton90
Copy link
Contributor Author

As a separate note, are these tests available publicly? Would be happy to follow any additional steps before opening a PR in the future

@ferruzzi
Copy link
Contributor

What do you think about the unit tests? Do they need to be tweaked?

I tested on an m4.large instance

Cool. I'll see if I can find documentation around which sizes DO work, but at least we have a place to start now.

are these tests available publicly?

We (the AWS Airflow team) run most of the example_{foo} tests in the system test folder on a regular basis and report the results here. The logs aren't publicly available (yet??), unfortunately, but you can see a pass/fail for the last 10 runs there.

@dashton90
Copy link
Contributor Author

Cool. I'll see if I can find documentation around which sizes DO work, but at least we have a place to start now.

The supported instances are listed here

What do you think about the unit tests? Do they need to be tweaked?

If the logs are not making it then the tests should definitely be updated. I did add two unhappy path unit tests for trying to hibernate instances not configured for hibernation 1 2.

We (the AWS Airflow team) run most of the example_{foo} tests in the system test folder on a regular basis and report the results here. The logs aren't publicly available (yet??), unfortunately, but you can see a pass/fail for the last 10 runs there.

Amazing, I'll keep an eye on this page next time I contribute :)

@ferruzzi
Copy link
Contributor

The supported instances are listed here

You took the words out of my mouth fingers.

$ aws ec2 describe-instance-types --filters Name=hibernation-supported,Values=true --query "InstanceTypes[*].[InstanceType]" --output text | sort | grep "micro"
t2.micro
t3a.micro
t3.micro

EC2 instance types aren't my forte, but it looks like swapping the test over to t3.micro instance type should get it passing, just running on an x86 CPU vs an ARM CPU (I think). I'll run the test on the t3.micro and see if it passes.

@dashton90
Copy link
Contributor Author

Just re-reading through your comments:

Looking through the code, it looks like you are checking for a hibernation option, but the exception you are raising raise AirflowException(f"Instance {instance['InstanceId']} is not configured for hibernation") is not making it into the system test's logs, so we may need to have another look at the unit tests as well.

This check is only done when you run the EC2HibernateInstanceOperator. The example_ec2 DAG would never get to this operator since it is failing during instance creation.

If you'd like, I can update the example to use an m4 or t3 instance.

@ferruzzi
Copy link
Contributor

If you'd like, I can update the example to use an m4 or t3 instance.

I'm running it against my live AWS account. It is going need a few other changes, too. The AMI architecture needs to be adjusted here if we change from a t4g. When I do that it fails with "botocore.exceptions.ClientError: An error occurred (UnsupportedHibernationConfiguration) when calling the RunInstances operation: For hibernation, the root device volume must be encrypted." We're on the right track, though. Do you want to poke at it and get it sorted out, or do you want me to?

@ferruzzi
Copy link
Contributor

The example_ec2 DAG would never get to this operator since it is failing during instance creation.

Yup, good catch. My mistake.

@dashton90
Copy link
Contributor Author

Do you want to poke at it and get it sorted out, or do you want me to?

If you're already on it then I'll leave it to you. But happy to take over if you need another set of eyes

@ferruzzi
Copy link
Contributor

It looks like in order to get this working we're going to need to set up a KMS key and manually define the device mappings to configure an EBS volume. I won't be able to get this done until Monday. Should you feel like taking care of it before then, have at it and ping me in the PR.

This isn't directly applicable since it isn't using the boto3 API, but it should have the info we need, just in a different format.

@dashton90
Copy link
Contributor Author

Created #35839. I've tested the changes independently, but can I leave it to you to do the end-to-end tests?

@ferruzzi
Copy link
Contributor

Amazing, thanks! I'll have a look later today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add EC2RebootInstanceOperator and EC2HibernateInstanceOperator to Amazon Provider

4 participants