aks-preview: Support VMSS agent pool VM size resize via nodepool update #9732

Open
wenhug wants to merge 6 commits into Azure:main from wenhug:wenhuang/agentpool-vmsize-resize

Conversation

@wenhug
Contributor

@wenhug wenhug commented Mar 26, 2026

Summary

Enable changing the VM size (SKU) of an existing VMSS-based agent pool via az aks nodepool update --node-vm-size <new-size>.

When the user changes the VM size of a VMSS node pool, the AKS RP performs a rolling upgrade:

  1. Surge new nodes with the target VM size
  2. Cordon and drain old nodes
  3. Delete old nodes
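
The three steps above can be sketched as a small simulation. This is only an illustration of the surge, drain, delete ordering: the `Node` class, the `rolling_resize` function, and the `max_surge` parameter are hypothetical names invented here, and the AKS RP's actual implementation is not shown in this PR.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node; not an AKS or Kubernetes type.
    name: str
    vm_size: str
    schedulable: bool = True


def rolling_resize(nodes, target_size, max_surge=1):
    """Simulate replacing every node whose size differs from target_size."""
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1")
    pool = list(nodes)
    pending = [n for n in pool if n.vm_size != target_size]
    created = 0
    while pending:
        batch, pending = pending[:max_surge], pending[max_surge:]
        # 1. Surge: add one new node of the target size per old node in the batch.
        for _ in batch:
            pool.append(Node(f"surge-{created}", target_size))
            created += 1
        # 2. Cordon and drain: mark the old nodes unschedulable.
        for node in batch:
            node.schedulable = False
        # 3. Delete the drained old nodes from the pool.
        batch_ids = {id(n) for n in batch}
        pool = [n for n in pool if id(n) not in batch_ids]
    return pool


pool = rolling_resize(
    [Node("old-0", "Standard_D2s_v3"), Node("old-1", "Standard_D2s_v3")],
    "Standard_D4s_v3",
)
print([(n.name, n.vm_size) for n in pool])
```

The pool keeps its node count across the operation, and at no point in the loop are old nodes removed before their replacements exist.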

This is a preview feature that requires:

  • AFEC registration: Microsoft.ContainerService/AgentPoolVMSSResize
  • RP internal toggle: enable-agentpool-vmsize-resize (currently enabled for E2E + Canary)

The --node-vm-size parameter already existed on nodepool update for VirtualMachines pool autoscaler updates. This PR extends it to also work for VMSS pools.

Usage

# Resize VM size for a VMSS node pool
az aks nodepool update \
  -g MyResourceGroup \
  -n nodepool1 \
  --cluster-name MyManagedCluster \
  --node-vm-size Standard_D4s_v3

RP-side validation

The RP validates the resize request and blocks incompatible combinations:

  • DiskControllerType (SCSI vs NVMe)
  • CPU Architecture (x64 vs ARM64)
  • Confidential Computing (SNP)
  • Hypervisor Generation (V1 vs V2)
  • Resize combined with a K8s version upgrade or node count change
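
A rough sketch of what such a compatibility check could look like follows. It assumes that "incompatible" means the trait differs between the current and the target VM size; the `SkuTraits` class, its field names, and `resize_blockers` are hypothetical and are not taken from the RP code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuTraits:
    # Hypothetical summary of the VM size traits listed above.
    disk_controller: str    # "SCSI" or "NVMe"
    architecture: str       # "x64" or "ARM64"
    confidential: bool      # True for confidential computing (SNP) sizes
    hyperv_generation: str  # "V1" or "V2"


def resize_blockers(current, target, version_changed=False, count_changed=False):
    """Return human-readable reasons why a resize request would be rejected."""
    reasons = []
    for field in ("disk_controller", "architecture", "confidential", "hyperv_generation"):
        old, new = getattr(current, field), getattr(target, field)
        if old != new:
            reasons.append(f"{field} differs: {old} -> {new}")
    if version_changed:
        reasons.append("resize cannot be combined with a Kubernetes version upgrade")
    if count_changed:
        reasons.append("resize cannot be combined with a node count change")
    return reasons


current = SkuTraits("SCSI", "x64", False, "V2")
target = SkuTraits("NVMe", "x64", False, "V2")
for reason in resize_blockers(current, target):
    print(reason)
```

Returning every blocker at once, rather than failing on the first, is a design choice of this sketch so that a caller can report all problems in one error message.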

Changes

  • agentpool_decorator.py: Add update_vm_size() method for VMSS pools and integrate it into update_agentpool_profile_preview()
  • _params.py: Mark --node-vm-size as is_preview=True for nodepool update
  • _help.py: Update help text and add VMSS resize CLI example
  • test_agentpool_decorator.py: Add unit tests for update_vm_size
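
For readers who have not opened the diff, the behavior described for update_vm_size() might be sketched as below. This is a simplified stand-in, not the code from the PR: the `AgentPool` dataclass replaces the SDK model, the function takes its inputs as plain arguments instead of reading them from the decorator context, and leaving the size unchanged when no value is passed is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

VMSS = "VirtualMachineScaleSets"
VMS = "VirtualMachines"


@dataclass
class AgentPool:
    # Hypothetical stand-in for the agent pool model used by the decorator.
    pool_type: str
    vm_size: str


def update_vm_size(agentpool: AgentPool, node_vm_size: Optional[str]) -> AgentPool:
    """Apply --node-vm-size to a VMSS pool; leave VirtualMachines pools alone."""
    if agentpool.pool_type == VMS:
        # VirtualMachines pools handle VM size via the autoscaler update path.
        return agentpool
    if node_vm_size:
        # For VMSS pools, setting the size is what asks the RP for a rolling resize.
        agentpool.vm_size = node_vm_size
    return agentpool


pool = update_vm_size(AgentPool(VMSS, "Standard_D2s_v3"), "Standard_D4s_v3")
print(pool.vm_size)
```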

Test plan

  • Unit tests added for update_vm_size (both Standalone and ManagedCluster modes)
  • E2E test exists in AKS RP repo (Scenario_VMSS_VMSize_Resize)
  • Manual validation with preview AFEC registration

Copilot AI review requested due to automatic review settings March 26, 2026 21:14
@azure-client-tools-bot-prd

azure-client-tools-bot-prd bot commented Mar 26, 2026

✔️ Azure CLI Extensions Breaking Change Test
✔️ Non Breaking Changes

@azure-client-tools-bot-prd

Hi @wenhug,
Please write the description of changes which can be perceived by customers into HISTORY.rst.
If you want to release a new extension version, please update the version in setup.py as well.

@yonzhan
Collaborator

yonzhan commented Mar 26, 2026

Thank you for your contribution! We will review the pull request and get back to you soon.

@github-actions
Contributor

The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR.

Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).
After that please run the following commands to enable git hooks:

pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>

@github-actions
Contributor

CodeGen Tools Feedback Collection

Thank you for using our CodeGen tool. We value your feedback and would like to know how we can improve our product. Please take a few minutes to fill out our codegen survey.

@github-actions
Contributor

github-actions bot commented Mar 26, 2026

Hi @wenhug

Release Suggestions

Module: aks-preview

  • Update VERSION to 19.0.0b30 in src/aks-preview/setup.py

Notes

Contributor

Copilot AI left a comment

Pull request overview

This PR extends az aks nodepool update --node-vm-size to support resizing VMSS-based agent pools (preview), aligning CLI behavior with the RP’s rolling-replacement resize flow.

Changes:

  • Add a VMSS-aware update_vm_size() path to the nodepool update decorator and wire it into update_agentpool_profile_preview().
  • Mark --node-vm-size as a preview parameter for aks nodepool update and update CLI help text/examples accordingly.
  • Add unit tests validating update_vm_size() behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File: Description

  • src/aks-preview/azext_aks_preview/agentpool_decorator.py: Adds update_vm_size() and invokes it during preview nodepool update assembly.
  • src/aks-preview/azext_aks_preview/_params.py: Marks --node-vm-size as preview for aks nodepool update.
  • src/aks-preview/azext_aks_preview/_help.py: Updates help text and adds a VMSS resize example.
  • src/aks-preview/azext_aks_preview/tests/latest/test_agentpool_decorator.py: Adds unit tests for update_vm_size().

@wenhug
Contributor Author

wenhug commented Mar 26, 2026

Addressed the two Copilot review comments:

Comment 1 (blocker in update_auto_scaler_properties): Good catch! Removed the InvalidArgumentValueError that rejected --node-vm-size for VMSS pools. That guard was added when --node-vm-size was VMs-pool-only, but now VMSS pools support VM size resize via rolling upgrade.

Comment 2 (VMs pool test): Added a third test case verifying update_vm_size() is a no-op for VirtualMachines pools (handles both Standalone and ManagedCluster decorator modes).

Also added a HISTORY.rst entry per the bot's request.

@FumingZhang
Member

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

Member

@FumingZhang FumingZhang left a comment

lgtm

@FumingZhang
Member

Please resolve the merge conflict and rebase on (or merge from) main so that the CI checks pass, @wenhug.

Wen Huang and others added 3 commits April 7, 2026 21:19
Enable changing the VM size (SKU) of an existing VMSS-based agent pool
via `az aks nodepool update --node-vm-size <new-size>`. The RP performs
a rolling upgrade (surge new nodes, drain old, delete old) to replace
nodes with the new VM size.

This preview feature requires:
- AFEC registration: Microsoft.ContainerService/AgentPoolVMSSResize
- RP internal toggle: enable-agentpool-vmsize-resize

Changes:
- agentpool_decorator.py: add update_vm_size() for VMSS pools and call
  it in update_agentpool_profile_preview()
- _params.py: mark --node-vm-size as is_preview for nodepool update
- _help.py: update help text and add VMSS resize example
- test_agentpool_decorator.py: add unit tests for update_vm_size

Address review comments:

1. Remove the InvalidArgumentValueError in update_auto_scaler_properties()
   that blocked --node-vm-size for VMSS pools. This check was added when
   --node-vm-size only supported VirtualMachines pools, but now VMSS
   pools support VM size resize via rolling upgrade.

2. Add test case for VirtualMachines pool to verify update_vm_size() is
   a no-op (VMs pools handle VM size via the autoscaler update path).

3. Add HISTORY.rst entry for the new feature.

Add a live scenario test that creates a cluster with Standard_D2s_v3
and resizes the nodepool to Standard_D4s_v3 via `az aks nodepool update
--node-vm-size`, verifying the rolling upgrade completes successfully.

Addresses reviewer request for scenario test coverage with custom header
bypass for the AgentPoolVMSSResize feature flag.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

@wenhug wenhug force-pushed the wenhuang/agentpool-vmsize-resize branch from 2f7b0ad to e51f6fa Compare April 7, 2026 21:21

The test_update_agentpool_profile_preview tests mock all update methods
called by update_agentpool_profile_preview(). Adding update_vm_size()
to the call chain requires updating these mocks and assertions.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>

The scenario test requires real Azure credentials and cannot run in
CI's recording/playback mode. Add @live_only() decorator to skip it
in CI while still allowing manual live test runs.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
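
The effect of a live-only marker can be approximated with the standard library. The sketch below is a stand-in, not the decorator from the Azure CLI test SDK: the `live_only` function defined here and the `RUN_LIVE_TESTS` environment variable are hypothetical, and the real decorator may decide "live" differently.

```python
import os
import unittest


def live_only():
    """Stand-in decorator factory: skip the test unless live runs are enabled."""
    # RUN_LIVE_TESTS is a hypothetical switch chosen for this sketch.
    return unittest.skipUnless(
        os.environ.get("RUN_LIVE_TESTS") == "1",
        "requires real Azure credentials; set RUN_LIVE_TESTS=1 to run",
    )


class ResizeScenario(unittest.TestCase):
    @live_only()
    def test_vmss_vm_size_resize(self):
        # A real scenario test would create a cluster and resize the node pool here.
        self.fail("placeholder for calls that need a live subscription")


if __name__ == "__main__":
    unittest.main()
```

Because the condition is evaluated when the decorator is applied, the variable has to be set before the test module is imported.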
@wenhug
Contributor Author

wenhug commented Apr 7, 2026

@FumingZhang @yonzhan, I have addressed the merge conflicts.

Member

@FumingZhang FumingZhang left a comment

lgtm

@AKSCustomResourceGroupPreparer(
random_name_length=17, name_prefix="clitest", location="centraluseuap"
)
def test_aks_nodepool_update_vmss_vm_size_resize(
Member

Queued live test to validate the change.

Member

The live test failed with the following error. Is the custom feature header not working?

      raise HttpResponseError(response=response, model=error, error_format=ARMErrorFormat)

E azure.core.exceptions.HttpResponseError: (PropertyChangeNotAllowed) Changing property 'properties.vmSize' is not allowed.
E Code: PropertyChangeNotAllowed
E Message: Changing property 'properties.vmSize' is not allowed.
E Target: properties.vmSize

Contributor Author

What is the test environment? The RP code has not been released to production yet.

Member

Has it reached staging or EUAP? I have queued a test against EUAP, just to make sure the change works as expected.

The RP now gates VMSize resize by preview API version instead of AFEC
feature registration, so the AKSHTTPCustomFeatures header is no longer
needed.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Labels

AKS Auto-Assign Auto assign by bot

6 participants