Consistently strip top-level network key (SC-963) by TheRealFalcon · Pull Request #1417 · canonical/cloud-init

TheRealFalcon · 2022-04-29T02:22:53Z

Proposed Commit Message

Consistently strip top-level network key

Our NetworkState parsing code expects no top level network key even
though it's part of our v1 and v2 standard. NoCloud and LXD datasources
had their own code to strip off the top-level network key, but it failed
for v2 configs. This commit simplifies and moves that code into the
parser so that it is consistent across all datasources.

LP: #1906187

Additional Context

See https://bugs.launchpad.net/cloud-init/+bug/1906187

blackboxsw · 2022-04-29T20:43:43Z

    """
    state = None
+    if "network" in net_config:
+        net_config = net_config["network"]


If we get a top-level network key we are off the typical expected path for providing network configuration. We probably should do some element of validation for the expected keys below "network". To avoid potentially changing behavior, I think we still need to minimally validate expected values below a top-level "network" key as our _maybe_remove_top_network function attempted.

Since we know valid network config can be either "network": {"config": "disabled"} or "network": {"version": [12], ...} let's at least validate that one of these structures is present, otherwise we use the unaltered net_config. I'd like to see some breadcrumb log when we net_config = net_config.pop("network"). Then we'd have something during triage that better informs us of the raw network-config that came in on a given platform.

Maybe this request of mine is just slightly adjusting the error message in parse_net_config_data to do a little better job of describing the failures on an invalid network-config I'd want to know if I have invalid config due to two failure cases:

missing a "version:" key

or that "config: <something_other_than_the_disabled_string>" was provided.

The 2nd case I could see happening if someone thinks network config needs to live under network: {config: {<my_v1_or_v2_config>} because of the generic name of the "config" key potentially confusing image creators.

Long-term, I think I'd like to also see network-config JSON schema in some form. But, you know, "when a developer has a hammer, everything looks like a nail".

If we get a top-level network key we are off the typical expected path for providing network configuration.

Not according to our networking config formats 😬 . I'm not sure why we ever considered it normal not have the top-level network key, but I consider the "normal" path of not having it to be the exception we're working around.

I noticed that we're not quite handling this early enough, so I pushed a new version that handles this little earlier (I still need to update tests).

About your concerns:

missing a "version:" key

That indirectly happens here: https://github.com/canonical/cloud-init/blob/main/cloudinit/net/__init__.py#L660

or that "config: <something_other_than_the_disabled_string>" was provided.

But <something_other_than_the_disabled_string> is valid, isn't it? V1 config is:

network: version: 1 config:

so config is valid there. Or am I misunderstanding what you're asking for?

I added a log of the network config when we raise the exception. Do you think that is good enough?

I definitely agree that a schema for networking is a good idea. We should create one.

Good move on earlier in stages, right before the is_disabled check makes sense to make sure we pop off the network key before checking "config": "disabled "

missing a "version:" key: Thanks for pointing out we are already validating and raising a descriptive error here, I was unaware of that still happening and that satisfies my concern

When I made the comment I was thinking specifically if a string value other than "disabled" were provided. Agreed that our "schema" should permit 'config: { type: [disabled, array]....} for this. Nothing really to do here, I was just thinking about checking isinstance(net_config.get("config"), str) and net_config["config"] != "disabled": raise RuntimeError("bogus net config value dude")

Yes let's make a network-config schema. https://warthogs.atlassian.net/browse/SC-969

blackboxsw · 2022-04-29T22:03:37Z

            "No valid network_state object created from network config. "
-            "Did you specify the correct version?"
+            "Did you specify the correct version? Network config:\n"
+            f"{net_config}"


blackboxsw · 2022-04-29T22:16:53Z

+            ncfg = self._remove_top_level_network_key(
+                available_cfgs[cfg_source]
+            )
            if net.is_disabled_cfg(ncfg):


Just in time for the "config: disabled" check. +1.

blackboxsw · 2022-04-29T22:58:37Z

    """
    state = None
+    if "network" in net_config:
+        net_config = net_config["network"]


Good move on earlier in stages, right before the is_disabled check makes sense to make sure we pop off the network key before checking "config": "disabled "

missing a "version:" key: Thanks for pointing out we are already validating and raising a descriptive error here, I was unaware of that still happening and that satisfies my concern

When I made the comment I was thinking specifically if a string value other than "disabled" were provided. Agreed that our "schema" should permit 'config: { type: [disabled, array]....} for this. Nothing really to do here, I was just thinking about checking isinstance(net_config.get("config"), str) and net_config["config"] != "disabled": raise RuntimeError("bogus net config value dude")

Yes let's make a network-config schema. https://warthogs.atlassian.net/browse/SC-969

Our NetworkState parsing code expects no top level network key even though it's part of our v1 and v2 standard. NoCloud and LXD datasources had their own code to strip off the top-level network key, but it failed for v2 configs. This commit simplifies and moves that code into the parser so that it is consistent across all datasources. LP: #1906187

* Stop using the CiTestCase base class * Use a common autouse fixture as a replacement of the setup method * Replace unittest assert calls to raw assert calls * Replace custom tmpdir implementation with the pytest fixture * parametrize the tests that test _find_networking_config to include a network config with a top-level network key and one without

blackboxsw

Thanks for this comprehensive fix and dropping some cut-n-paste for NoCloud/LXD.
Verified expected failure prior to this PR and fix afterward:

cat > net-with-top-level.yaml <<EOF
network:
  version: 2
  ethernets:
        eth0:
            dhcp4: true
EOF
cat > net-no-top-level.yaml <<EOF
version: 2
ethernets:
        eth0:
            dhcp4: true
EOF

lxc launch ubuntu-daily:jammy dev-j -c cloud-init.network-config="$(cat net-with-top-level.yaml)"
lxc exec dev-j cloud-init status --long
# See error network doesn't have version key"
- upgrade to this PR
lxc exec dev-j cloud-init clean reboot --logs
# no traceback/error

Our NetworkState parsing code expects no top level network key even though it's part of our v1 and v2 standard. NoCloud and LXD datasources had their own code to strip off the top-level network key, but it failed for v2 configs. This commit simplifies and moves that code into the parser so that it is consistent across all datasources. LP: #1906187

TheRealFalcon changed the title ~~Consistently strip top-level network key~~ Consistently strip top-level network key (SC-963) Apr 29, 2022

blackboxsw reviewed Apr 29, 2022

View reviewed changes

blackboxsw force-pushed the strip-network branch from 217e388 to b5ef430 Compare April 29, 2022 22:13

blackboxsw reviewed Apr 29, 2022

View reviewed changes

TheRealFalcon force-pushed the strip-network branch 2 times, most recently from 645f7a6 to 6c6300f Compare May 3, 2022 13:55

TheRealFalcon requested a review from blackboxsw May 3, 2022 13:58

TheRealFalcon added 3 commits May 4, 2022 09:44

round 2

1b45c09

TheRealFalcon force-pushed the strip-network branch from 6c6300f to b01edd7 Compare May 4, 2022 14:52

tmp_path not available in 3.6

1f56948

blackboxsw approved these changes May 5, 2022

View reviewed changes

blackboxsw merged commit d335866 into canonical:main May 5, 2022

This was referenced May 12, 2023

Top-level 'network' key results in error using v2 config #3815

Closed

Release 22.2 #3976

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consistently strip top-level network key (SC-963)#1417

Consistently strip top-level network key (SC-963)#1417
blackboxsw merged 4 commits into
canonical:mainfrom
TheRealFalcon:strip-network

TheRealFalcon commented Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

TheRealFalcon Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

blackboxsw Apr 29, 2022

Uh oh!

blackboxsw left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TheRealFalcon commented Apr 29, 2022

Proposed Commit Message

Additional Context

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blackboxsw left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants