Consistently strip top-level network key (SC-963)#1417
Conversation
| """ | ||
| state = None | ||
| if "network" in net_config: | ||
| net_config = net_config["network"] |
There was a problem hiding this comment.
If we get a top-level network key we are off the typical expected path for providing network configuration. We probably should do some element of validation for the expected keys below "network". To avoid potentially changing behavior, I think we still need to minimally validate expected values below a top-level "network" key as our _maybe_remove_top_network function attempted.
Since we know valid network config can be either "network": {"config": "disabled"} or "network": {"version": [12], ...} let's at least validate that one of these structures is present, otherwise we use the unaltered net_config. I'd like to see some breadcrumb log when we net_config = net_config.pop("network"). Then we'd have something during triage that better informs us of the raw network-config that came in on a given platform.
There was a problem hiding this comment.
Maybe this request of mine is just slightly adjusting the error message in parse_net_config_data to do a little better job of describing the failures on an invalid network-config I'd want to know if I have invalid config due to two failure cases:
- missing a "version:" key
- or that "config: <something_other_than_the_disabled_string>" was provided.
The 2nd case I could see happening if someone thinks network config needs to live under network: {config: {<my_v1_or_v2_config>} because of the generic name of the "config" key potentially confusing image creators.
Long-term, I think I'd like to also see network-config JSON schema in some form. But, you know, "when a developer has a hammer, everything looks like a nail".
There was a problem hiding this comment.
Maybe this request of mine is just slightly adjusting the error message in parse_net_config_data to do a little better job of describing the failures on an invalid network-config I'd want to know if I have invalid config due to two failure cases:
- missing a "version:" key
- or that "config: <something_other_than_the_disabled_string>" was provided.
The 2nd case I could see happening if someone thinks network config needs to live under network: {config: {<my_v1_or_v2_config>} because of the generic name of the "config" key potentially confusing image creators.
Long-term, I think I'd like to also see network-config JSON schema in some form. But, you know, "when a developer has a hammer, everything looks like a nail".
There was a problem hiding this comment.
If we get a top-level network key we are off the typical expected path for providing network configuration.
Not according to our networking config formats 😬 . I'm not sure why we ever considered it normal not have the top-level network key, but I consider the "normal" path of not having it to be the exception we're working around.
I noticed that we're not quite handling this early enough, so I pushed a new version that handles this little earlier (I still need to update tests).
About your concerns:
missing a "version:" key
That indirectly happens here: https://github.com/canonical/cloud-init/blob/main/cloudinit/net/__init__.py#L660
or that "config: <something_other_than_the_disabled_string>" was provided.
But <something_other_than_the_disabled_string> is valid, isn't it? V1 config is:
network:
version: 1
config:
so config is valid there. Or am I misunderstanding what you're asking for?
I added a log of the network config when we raise the exception. Do you think that is good enough?
I definitely agree that a schema for networking is a good idea. We should create one.
There was a problem hiding this comment.
- Good move on earlier in stages, right before the is_disabled check makes sense to make sure we pop off the network key before checking "config": "disabled "
- missing a "version:" key: Thanks for pointing out we are already validating and raising a descriptive error here, I was unaware of that still happening and that satisfies my concern
- When I made the comment I was thinking specifically if a string value other than "disabled" were provided. Agreed that our "schema" should permit 'config: { type: [disabled, array]....} for this. Nothing really to do here, I was just thinking about checking
isinstance(net_config.get("config"), str) and net_config["config"] != "disabled": raise RuntimeError("bogus net config value dude") - Yes let's make a network-config schema. https://warthogs.atlassian.net/browse/SC-969
217e388 to
b5ef430
Compare
| "No valid network_state object created from network config. " | ||
| "Did you specify the correct version?" | ||
| "Did you specify the correct version? Network config:\n" | ||
| f"{net_config}" |
| ncfg = self._remove_top_level_network_key( | ||
| available_cfgs[cfg_source] | ||
| ) | ||
| if net.is_disabled_cfg(ncfg): |
There was a problem hiding this comment.
Just in time for the "config: disabled" check. +1.
| """ | ||
| state = None | ||
| if "network" in net_config: | ||
| net_config = net_config["network"] |
There was a problem hiding this comment.
- Good move on earlier in stages, right before the is_disabled check makes sense to make sure we pop off the network key before checking "config": "disabled "
- missing a "version:" key: Thanks for pointing out we are already validating and raising a descriptive error here, I was unaware of that still happening and that satisfies my concern
- When I made the comment I was thinking specifically if a string value other than "disabled" were provided. Agreed that our "schema" should permit 'config: { type: [disabled, array]....} for this. Nothing really to do here, I was just thinking about checking
isinstance(net_config.get("config"), str) and net_config["config"] != "disabled": raise RuntimeError("bogus net config value dude") - Yes let's make a network-config schema. https://warthogs.atlassian.net/browse/SC-969
645f7a6 to
6c6300f
Compare
Our NetworkState parsing code expects no top level network key even though it's part of our v1 and v2 standard. NoCloud and LXD datasources had their own code to strip off the top-level network key, but it failed for v2 configs. This commit simplifies and moves that code into the parser so that it is consistent across all datasources. LP: #1906187
* Stop using the CiTestCase base class * Use a common autouse fixture as a replacement of the setup method * Replace unittest assert calls to raw assert calls * Replace custom tmpdir implementation with the pytest fixture * parametrize the tests that test _find_networking_config to include a network config with a top-level network key and one without
6c6300f to
b01edd7
Compare
blackboxsw
left a comment
There was a problem hiding this comment.
Thanks for this comprehensive fix and dropping some cut-n-paste for NoCloud/LXD.
Verified expected failure prior to this PR and fix afterward:
cat > net-with-top-level.yaml <<EOF
network:
version: 2
ethernets:
eth0:
dhcp4: true
EOF
cat > net-no-top-level.yaml <<EOF
version: 2
ethernets:
eth0:
dhcp4: true
EOF
lxc launch ubuntu-daily:jammy dev-j -c cloud-init.network-config="$(cat net-with-top-level.yaml)"
lxc exec dev-j cloud-init status --long
# See error network doesn't have version key"
- upgrade to this PR
lxc exec dev-j cloud-init clean reboot --logs
# no traceback/errorOur NetworkState parsing code expects no top level network key even though it's part of our v1 and v2 standard. NoCloud and LXD datasources had their own code to strip off the top-level network key, but it failed for v2 configs. This commit simplifies and moves that code into the parser so that it is consistent across all datasources. LP: #1906187
Proposed Commit Message
Additional Context
See https://bugs.launchpad.net/cloud-init/+bug/1906187