-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Update example_ec2 to Create Instance Which Supports Hibernation (#35790) #35839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ferruzzi tagging you since you flagged the bug. I did not create a CMK since AWS will use an Amazon managed key to encrypt the instance |
bd98196 to
c42c673
Compare
|
At a glance, it looks right. As mentioned, we'll need to adjust it to accept a KMS key to encrypt the EBS volume. Github makes it a bit of a pain to collaborate on a PR like this since the code still lives in your fork; it would be nice to be able to add a commit to your PR myself..... here's what needs to be done next: Replace Line 41 with the following block: then add a line between 97 and 98: Then you can use that in the config{}. I'm not positive on this part, but I /think/ you apply it by changing line 113 to I would need to test that last part out, but the first few steps are definitely right and will allow us to store the KMS key in SSM Parameter Store and fetch it for the test. Check out the RDS Export test for a working example of a KMS Key in a system test if I didn't make sense. Unfortunately, as you mentioned, once you do that it still needs one of us to actually test running it against the real service using the real KMS key. But if we merge this as it is, we'll need to make yet another PR to make those changes.... I'm not sure which is actually more convenient. |
|
I added you as a collaborator to my fork so you should be free to push any changes you need. But I can make the changes myself tonight or tomorrow if you don't get around to it. I skipped creating the KMS key because Amazon will default to using an Amazon managed key to encrypt the volume if you don't specify one. Figured it was less overhead than managing our own key, but happy to create one if that's the established practice. |
AH! I misunderstood, I thought you meant "Amazon" meaning my team specifically would have to add it. If the API call works without specifying one, then by all means let's skip that whole thing and keep the example/test as simple as we can. |
|
Alright. progress! I ran the proposed changes from this PR (without my KMS stuff) and it failed, but it's getting much further. Here's the current state: |
|
Huh, it looks like boto fails the terminate waiter if the instance is in the stopping state. No idea if that is a bug or the intended behaviour since instances which are stopping should be able to be terminated per the AWS docs Easiest fix is probably to add |
|
@ferruzzi
|
|
I am actually poking at it right now. When I ran it with the If it turns out that this is going to take as long as it appears it'll take, then I'll run it by the team and see what they think. I will propose (I'll have to get back to you on the decision) that we move it into a different system test which we flag not to run with the others. That way the code snippets can still be imported into the docs but it won't bog down the test. We have a couple of other system tests which are flagged as "do not run automatically" for a number of reasons and this seems like a viable use for that, IMHO. |
|
In response to my last comment, it looks like the consensus is to get this sorted out, then we (I) can split it into two tests and just run them in parallel. We're fine with the run time, but breaking it out might save a bit of time. I'm running into unrelated issues at the moment with Docker, but I'm poking at it as I can. |
|
Alright, I think I got my laptop's docker issue figured out and I ran the system test as you have it here, but with The other thing is purely cosmetic, but please move the |
|
Alright, I dialed it in. It passed with |
|
Done in 00e06ce |
|
I merged main into the branch and that got the CI tests to pass. 🚀 The only other thing I'd like to see is move the wait_for_completion and the max_retries out of the code snippets as I mentioned above so they don't clutter up the docs pages, but other than that I'm very happy with it. Can we get another pair of eyes? @o-nikolas @vincbeck @syedahsn |
|
+1 to moving |
| root_device_name = "/dev/xvda" | ||
|
|
||
| images = boto3.client("ec2").describe_images( | ||
| Filters=[ | ||
| {"Name": "description", "Values": [image_prefix]}, | ||
| {"Name": "architecture", "Values": ["arm64"]}, | ||
| {"Name": "architecture", "Values": ["x86_64"]}, | ||
| {"Name": "root-device-type", "Values": ["ebs"]}, | ||
| {"Name": "root-device-name", "Values": [root_device_name]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-graviton instances only support x86 architecture. The root device must be ebs backed in order to support hibernation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hibernation requires an x86 architecture and encrypted storage; encrypted storage requires EBS; EBS requires manually assigning mount points. It was a bit of a chain of requirements there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add comments? That'd be helpful I think :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's helpful context. I agree with @vincbeck, I'd love some comments here explaining that for future folks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments in 4089321
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @dashton90
| reboot_instance = EC2RebootInstanceOperator( | ||
| task_id="reboot_instace", | ||
| instance_ids=instance_id, | ||
| wait_for_completion=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this and below out of the docs snippet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in b8e6568
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
Moved everything out of the codeblock in b8e6568 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Your persistence is greatly appreciated on this one, it was a bit of a slog.
|
Congrats @dashton90 , thanks to your PR, the system test is now succeeding, see dashboard here |
A bug was introduced in the ec2_example DAG when adding the
EC2HibernateInstanceOperator. This PR updates the configuration of the example to create an instance which supports hibernation.Discussion begins here.
related: #35790