Skip to content

Allow disabling of network activation (SC-307)#1048

Merged
blackboxsw merged 7 commits into
canonical:mainfrom
TheRealFalcon:unbringup-interfaces
Oct 7, 2021
Merged

Allow disabling of network activation (SC-307)#1048
blackboxsw merged 7 commits into
canonical:mainfrom
TheRealFalcon:unbringup-interfaces

Conversation

@TheRealFalcon
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon commented Oct 5, 2021

Proposed Commit Message

Allow disabling of network activation

In #919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.

LP: #1938299

Additional Context

Test Steps

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@TheRealFalcon TheRealFalcon force-pushed the unbringup-interfaces branch 3 times, most recently from 2e1d348 to 24023c3 Compare October 5, 2021 23:31
@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

This covers the functionality and an integration test for the failure case. Still missing unit tests and documentation for this.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

I added a unit test.

@blackboxsw , curious if you have strong opinions about docs. This feels like something that should be documented, but we don't really have documentation for /etc/cloud/cloud.cfg. I could add a section for it, but this shouldn't be the only thing about it documented either.

@TheRealFalcon TheRealFalcon changed the title Allow disabling of network activation Allow disabling of network activation (SC-307) Oct 6, 2021
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a request on the integration test for a bit more verbose error reporting. I think maybe logs could be added to doc/rtd/topics/network-config.rst.

self.assertIn(log, self.stderr.getvalue())


class TestShouldBringUpInterfaces:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test coverage here. It's functional for the additional function created. Since main isn't covered well with unit tests, this makes things tough to instrument unit tests for this interaction so I think its fine we rely on the integration test here.

network_activator = activators.select_activator()
network_activator.bring_up_all_interfaces(network_state)
else:
LOG.debug("Not bringing up newly configured network interfaces")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good log tracers here for quick triage in failure cases if needed.

"""Test that the network is not activated during init mode."""
_setup_custom_image(session_cloud)
with session_cloud.launch() as client:
assert client.execute('systemctl is-active google-guest-agent').ok
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the event of failure the assert isn't going to be super helpful in triage. Can we do something like this?

Suggested change
assert client.execute('systemctl is-active google-guest-agent').ok
result = client.execute('systemctl is-active google-guest-agent')
if not result.ok:
raise AssertionError("google-guest-agent is not active:\n%s" % result.stdout)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheRealFalcon your commit message too should reference LP: #1938299

_setup_custom_image(session_cloud)
with session_cloud.launch() as client:
assert client.execute('systemctl is-active google-guest-agent').ok
result = client.execute('systemctl is-active google-guest-agent')
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be systemctl status google-guest-agent.service ? it retuns non-zero (3) in failed/dead state

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a quick note: whilst status works, I wonder if leveraging is-active a better way to go about it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utkarsh2102 , I think the idea was to get the additional information to stdout from the status call if it fails, vs just something like "inactive" when checking is-active. Why do you think is-active would be better?

In canonical#919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.

LP: #1938299
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent thank you for this @TheRealFalcon Validated integration test runtomorrow morn.
Integration test will cover our basic case for asserting google-guest-agent is running with the features and disable flag. Also confirmed failure path without. I think we might want a followup to ensure we setup default project-wide users for our project so we can confirm that they are also properly created. This is a bit broader in scope than we need to handle at the moment.

@blackboxsw blackboxsw merged commit 9c147e8 into canonical:main Oct 7, 2021
EricEdens added a commit to GoogleCloudPlatform/compute-image-tools that referenced this pull request Nov 19, 2021
…age (#1796)

This updates Ubuntu import to use the workaround from Canonical in canonical/cloud-init#1048.
@TheRealFalcon TheRealFalcon deleted the unbringup-interfaces branch December 7, 2022 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants