Skip to content

Set Azure to only update metadata on BOOT_NEW_INSTANCE (SC-386)#1006

Merged
TheRealFalcon merged 2 commits into
canonical:mainfrom
TheRealFalcon:fix-azure-boot-events
Sep 3, 2021
Merged

Set Azure to only update metadata on BOOT_NEW_INSTANCE (SC-386)#1006
TheRealFalcon merged 2 commits into
canonical:mainfrom
TheRealFalcon:fix-azure-boot-events

Conversation

@TheRealFalcon
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon commented Sep 2, 2021

Proposed Commit Message

Set Azure to only update metadata on BOOT_NEW_INSTANCE

In #834, we refactored the handling of events for fetching new metadata.
Previously, in Azure's __init__, the BOOT event was added to the
update_events, so it was assumed that Azure required the standard BOOT
behavior, which is to apply metadata twice every boot: once during
local-init, then again during standard init phase.
https://github.com/canonical/cloud-init/blob/21.2/cloudinit/sources/DataSourceAzure.py#L356

However, this line was effectively meaningless. After the metadata was
fetched in local-init, it was then pickled out to disk. Because
"update_events" was a class variable, the EventType.BOOT was not
persisted into the pickle. When the pickle was then unpickled in the
init phase, metadata did not get re-fetched because EventType.BOOT was
not present, so Azure is effectely only BOOT_NEW_INSTANCE.

Fetching metadata twice during boot causes some issue for
pre-provisioning on Azure because updating metadata during
re-provisioning will cause cloud-init to poll for reprovisiondata again
in DataSourceAzure, which will infinitely return 404(reprovisiondata
is deleted from IMDS after health signal was sent by cloud-init during
init-local). This makes cloud-init stuck in 'init'

Additional Context

Note that the default update events are defined in the base class, so Azure doesn't need a separate definition here.

Found by Azure during SRU testing.

Test Steps

Run the updated integration test.

Additionally, manually verified with:

DEB_BUILD_OPTIONS=nocheck packages/bddeb -d
scp cloud-init_all.deb <launched_azure_instance>:/tmp/cloud-init.deb
ssh ubuntu@<azure_instance>
sudo dpkg -i /tmp/cloud-init.deb
cloud-init clean --reboot --logs
ssh ubuntu@<azure_instance>

Examine /var/log/cloud-init.log and verify Applying network configuration appears exactly once, then later we see:

Event Denied: scopes=['network'] EventType=boot-legacy
No network config applied. Neither a new instance nor datasource network update allowed

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@anhvoms
Copy link
Copy Markdown
Contributor

anhvoms commented Sep 2, 2021

LGTM

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

@blackboxsw , if you approve, don't merge right away. Looking at an integration test issue.

@TheRealFalcon TheRealFalcon added the wip Work in progress, do not land label Sep 2, 2021
In canonical#834, we refactored the handling of events for fetching new metadata.
Previously, in Azure's __init__, the BOOT event was added to the
update_events, so it was assumed that Azure required the standard BOOT
behavior, which is to apply metadata twice every boot: once during
local-init, then again during standard init phase.
https://github.com/canonical/cloud-init/blob/21.2/cloudinit/sources/DataSourceAzure.py#L356

However, this line was effectively meaningless. After the metadata was
fetched in local-init, it was then pickled out to disk. Because
"update_events" was a class variable, the EventType.BOOT was not
persisted into the pickle. When the pickle was then unpickled in the
init phase, metadata did not get re-fetched because EventType.BOOT was
not present, so Azure is effectely only BOOT_NEW_INSTANCE.

Fetching metadata twice during boot causes some issue for
pre-provisioning on Azure because updating metadata during
re-provisioning will cause cloud-init to poll for reprovisiondata again
in DataSourceAzure, which will infinitely return 404(reprovisiondata
is deleted from IMDS after health signal was sent by cloud-init during
init-local). This makes cloud-init stuck in 'init'
@TheRealFalcon TheRealFalcon removed the wip Work in progress, do not land label Sep 2, 2021
@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

I removed the integration test update. It conflicts with what I've done in #1001 so any necessary changes will go there. I posted my manual test steps, and that should be sufficient here.

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good. Want some manual test time on this thought tomorrow. base class update events look good at just EventType.BOOT_NEW_INSTANCE.

@TheRealFalcon TheRealFalcon changed the title Set Azure to only update metadata on BOOT_NEW_INSTANCE Set Azure to only update metadata on BOOT_NEW_INSTANCE (SC-386) Sep 3, 2021
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual testing upgrade clean and existing systems looking good and re-applying network only when expected on 21.3-11-g843eb94b-1~bddeb

@TheRealFalcon TheRealFalcon merged commit e69a887 into canonical:main Sep 3, 2021
@TheRealFalcon TheRealFalcon deleted the fix-azure-boot-events branch September 3, 2021 17:57
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Sep 17, 2021
In canonical#1006, we set Azure to apply networking config every
BOOT_NEW_INSTANCE because the BOOT_LEGACY option was causing problems
applying networking the second time per boot. However,
BOOT_NEW_INSTANCE is also wrong as Azure needs to apply networking
once per boot, during init-local phase.
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Sep 17, 2021
In canonical#1006, we set Azure to apply networking config every
BOOT_NEW_INSTANCE because the BOOT_LEGACY option was causing problems
applying networking the second time per boot. However,
BOOT_NEW_INSTANCE is also wrong as Azure needs to apply networking
once per boot, during init-local phase.
TheRealFalcon added a commit that referenced this pull request Sep 17, 2021
In #1006, we set Azure to apply networking config every
BOOT_NEW_INSTANCE because the BOOT_LEGACY option was causing problems
applying networking the second time per boot. However,
BOOT_NEW_INSTANCE is also wrong as Azure needs to apply networking
once per boot, during init-local phase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants