feat(aws): AMI architecture detection and cross-validation#665
feat(aws): AMI architecture detection and cross-validation#665ArangoGutierrez wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Add describeImageArch helper to query an AMI's architecture from EC2 and getInstanceTypeArch to query instance type supported architectures. Update resolveImageForNode to return architecture for all code paths including explicit ImageId. Cross-validate AMI architecture against instance type supported architectures in DryRun() to catch mismatches (e.g., arm64 AMI + x86_64 instance type) with a clear error message before instance creation fails. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
There was a problem hiding this comment.
Pull request overview
This PR adds AMI architecture detection and cross-validation to prevent architecture mismatches between AMIs and instance types. Previously, when an explicit AMI ID was provided, architecture detection was skipped, leading to cryptic boot failures when arm64 AMIs were used with x86_64 instance types (or vice versa). The PR implements three new helper functions to query architecture information from EC2 APIs and integrates a validation check in the DryRun() method to catch these mismatches early with clear error messages.
Changes:
- Added
describeImageArch()to query AMI architecture from EC2 DescribeImages API - Added
getInstanceTypeArch()to query supported architectures from EC2 DescribeInstanceTypes API - Added
normalizeArchToEC2()to convert architecture aliases (amd64/aarch64) to EC2 canonical forms (x86_64/arm64) - Enhanced
resolveImageForNode()to detect and return architecture for explicit ImageIds - Added architecture cross-validation in
DryRun()to catch AMI/instance type mismatches - Updated all mock EC2 responses to include ProcessorInfo with architecture data
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/provider/aws/image.go | Core implementation: added three new helper functions for architecture detection/normalization, updated resolveImageForNode to query architecture for explicit AMIs, updated ResolvedImage struct to include architecture, added architecture normalization in resolveOSToAMI and setLegacyAMI |
| pkg/provider/aws/dryrun.go | Added architecture compatibility validation that cross-checks AMI architecture against instance type supported architectures and returns a clear error on mismatch |
| pkg/provider/aws/image_test.go | Comprehensive unit tests for new functions including describeImageArch, getInstanceTypeArch, and DryRun scenarios for both architecture match and mismatch cases, plus assertion that architecture is always populated |
| pkg/provider/aws/mock_ec2_test.go | Updated default mock responses for DescribeInstanceTypes to include ProcessorInfo with supported architectures |
| pkg/provider/aws/aws_test.go | Updated mock responses in existing tests to include ProcessorInfo architecture data |
| pkg/provider/aws/aws_ginkgo_test.go | Updated mock responses in Ginkgo tests to include ProcessorInfo architecture data |
Pull Request Test Coverage Report for Build 22021789809Details
💛 - Coveralls |
|
Closing as superseded. The equivalent fixes were already merged into main via PRs #661-664:
Additionally, these fixes address downstream provisioning issues but do not resolve the actual EC2 |
Summary
describeImageArchhelper to query AMI architecture from EC2resolveImageForNodeto return architecture for explicit ImageId (was skipping all arch detection)getInstanceTypeArchto query instance type supported architecturesDryRun()Catches arm64 AMI + x86_64 instance type mismatches with a clear error message instead of cryptic boot failures.
Test plan
describeImageArchwith mock EC2 clientgetInstanceTypeArchwith mock EC2 clientDryRun()returning error on arch mismatchgo test ./pkg/provider/aws/... -vpasses