Fix IPv6 netmask format for sysconfig#1215
Conversation
55a488a to
a845c05
Compare
|
Hi @hjensas, Thanks for the PR! I was wondering why this sort of thing isn't a builtin in ipaddress. It seems the standard library doesn't provide the same helpers for IPv6 as IPv4. Per the docs:
A couple of questions/comments: Could you provide some additional context about how/where this fails? I think we want to add netmask validation. Right now as long as an address is valid IPv6, this implementation will pass values, even if it is an invalid netmask (consider an invalid mask like Also, can we avoid the pack/unpack step? I don't think it is necessary and it adds mental overhead to this code. It brings up questions like: "Is it really safe to unpack this as big endian even though I am on a little endian machine?" I think something like this would probably be preferable: |
Yes, it is a pity the library does not support this functionality for IPv6.
There is information in the LP. NetworkManager will ignore the routes, as it sees the entry as invalid when the netmask is in expanded format. Similarly networkscripts will call the 'ip -6 route' command which also complain that the input is invalid.
OK, this makes sense.
Also makes sense. I will update the patch. |
|
Hold on, there is already a ipv6_mask_to_net_prefix() method [1]. [1] cloud-init/cloudinit/net/network_state.py Line 1058 in bae9b11 |
I solved this by adding leading zeros until the length of the string is correct i.e 32/128 bits wide depending on ip version.
Also passing int > 128 passes, so does int > 32 for .... so I decided to try to fix these nits as well with some refactoring. Let me know what you think. |
I did a little bit of digging. This format is not valid a valid netmask notation for ipv6, hence the lack of support in openstack and python standard library. I need to do a little bit of digging to better understand the rendering code, but if this format is coming from netplan we probably need to push on netplan to use a valid format. Either way, the nits you uncovered need to be fixed, thanks for pointing them out. I'll try to make some time in the next couple of days to give this a proper review. |
| raise ValueError("netmask '%s' had only %d parts" % (mask, len(toks))) | ||
|
|
||
| return sum([bin(int(x)).count("1") for x in toks]) | ||
| return mask_to_net_prefix(mask, 32) |
There was a problem hiding this comment.
It's probably worth pointing out that the standard library implements the functionality required for ipv4_mask_to_net_prefix() already.
This could easily be:
def ipv4_mask_to_net_prefix(mask):
return ipaddress.IPv4Network(f"0.0.0.0/{mask}").prefixlen
consider:
>>> ipaddress.IPv4Network("0.0.0.0/255.255.254.0").prefixlen
23
There was a problem hiding this comment.
Yes, I'm aware of that. We can of course opt to use the functionality from the standard library in case the caller uses the ip version specific method.
My reasoning was that in cloud-init there are many places where mask_to_net_prefix() is called directly without the caller's knowledge if it is IPv6 or IPv4. I figured since mask_to_net_prefix() here works for both IPv4 and IPv6 these version specific methods can just as well use mask_to_net_prefix.
If you feel strongly about using the library, in ipv4_mask_to_net_prefix I can change it.
There was a problem hiding this comment.
NetworkManager will ignore the routes, as it sees the entry as invalid when the netmask is in expanded format.
I did a little bit of digging. This format is not valid a valid netmask notation for ipv6, hence the lack of support in openstack and python standard library. I need to do a little bit of digging to better understand the rendering code, but if this format is coming from netplan we probably need to push on netplan to use a valid format.
++ from me on changeing the source, but afaik this coming from opentack's network metadata.
See: https://docs.openstack.org/nova/latest/user/metadata.html#openstack-format-metadata
Their Schema: https://docs.openstack.org/nova/latest/_downloads/9119ca7ac90aa2990e762c08baea3a36/network_data.json
To quote the network_data.json schema:
"l3_ipv6_netmask": {
"$id": "#/definitions/l3_ipv6_netmask",
"type": "string",
"pattern": "^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7})|(::))$",
"title": "L3 IPv6 network mask",
"examples": [
"ffff:ffff:ffff:ffff::"
]
},
Openstack Nova is using 'netaddr', and it can do this:
>>> route_addr = netaddr.IPNetwork('2bd0::/63')
>>> route_addr
IPNetwork('2bd0::/63')
>>> route_addr.netmask
IPAddress('ffff:ffff:ffff:fffe::')
Here is the Nova code building the network_metadata, it shared code for IPv4 and IPv6.
https://opendev.org/openstack/nova/src/branch/master/nova/virt/netutils.py#L311-L319
FWIW, I added Openstack Compute (nova) to the LP: 1959148 since this is the source of the invalid netmask notatation. |
Here are a couple of references for this claim. [1] https://tools.ietf.org/id/draft-lubashev-ipv6-addr-mask-01.html |
|
|
||
| # Get the binary mask as string, remove `0b` prefix, convert to int | ||
| bin_mask = int(bin(int(netmask))[2:]) | ||
| # Zero fill left until correct length 32 for IPv4, 128 for IPv6 |
There was a problem hiding this comment.
I can't say I like relying on regex for checking the validity of a netmask, but you have included unit tests and I also don't see anything incorrect with this implementation so unless anybody else objects to it I'm fine with this.
There was a problem hiding this comment.
ack, I got the inspiration from ipaddress.
| return prefix | ||
| # If mask passed as integer, or convert to int succeded | ||
| if isinstance(mask, int): | ||
| if max_bits and not (0 <= mask <= max_bits): |
There was a problem hiding this comment.
I think that this logic is wrong. Here the converted integer value of an address is compared against the number of bits. I think this needs to be compared to 2^max_bits - 1 I don't see any tests that test the "int conversion" codepath, so I think it would be good to exercise this codepath either way.
If I understand this code path correctly, the intention is that "mask" passed in is an IPv4Address object, correct?
For reference:
>>> import ipaddress
>>> 2**32 - 1 == int(ipaddress.IPv4Address("255.255.255.255"))
True
Alternatively, if we aren't actually using this codepath anywhere (haven't checked) we could could just remove it.
There was a problem hiding this comment.
I think that this logic is wrong. Here the converted integer value of an address is compared against the number of bits. I think this needs to be compared to
2^max_bits - 1I don't see any tests that test the "int conversion" codepath, so I think it would be good to exercise this codepath either way.If I understand this code path correctly, the intention is that "mask" passed in is an
IPv4Addressobject, correct?
No, this method will raise TypeError if a IPv4Address or IPv6Address is passed.
In cloudinit/net/network_state.py:L1056 there is a condition on stror int instance, IPv4Address or IPv6Address is neither. However, we should add IPv4Address and IPv6Address to that condition so that these types are allowed.
The handling of int and str that can be converted to int here is to keep the functionality provided by the methods being refactored. The idea is that the caller does not know if the netmask it has is represented as a prefix or IPvXAddress string. In the case the value is already a prefix either int(:number:) for str(:number:)the methods will simply "echo" back the number as int.
For reference:
>>> import ipaddress >>> 2**32 - 1 == int(ipaddress.IPv4Address("255.255.255.255")) True
The (0 <= mask <= max_bits) logic is simply trying to make sure a prefix value passed is valid. e.g 0-32 for IPv4 and 0-128 for IPv6.
Alternatively, if we aren't actually using this codepath anywhere (haven't checked) we could could just remove it.
The mask_to_net_prefix is used a bit:
cloudinit/net/__init__.py:from cloudinit.net.network_state import mask_to_net_prefix
cloudinit/net/__init__.py: self.prefix = mask_to_net_prefix(prefix_or_mask)
cloudinit/net/network_state.py: prefix = mask_to_net_prefix(maybe_prefix)
cloudinit/net/network_state.py: prefix = mask_to_net_prefix(netmask)
cloudinit/net/network_state.py: return mask_to_net_prefix(mask, 32)
cloudinit/net/network_state.py: return mask_to_net_prefix(mask, 128)
cloudinit/net/network_state.py:def mask_to_net_prefix(mask, max_bits=None):
cloudinit/net/network_state.py: mask_dec = ipv4_mask_to_net_prefix(mask)
cloudinit/net/sysconfig.py: prefix_value = network_state.ipv6_mask_to_net_prefix(
cloudinit/sources/helpers/vmware/imc/config_nic.py: cidr = mask_to_net_prefix(netmask)
cloudinit/sources/DataSourceOpenNebula.py: prefix = str(net.mask_to_net_prefix(mask))
There was a problem hiding this comment.
No, this method will raise
TypeErrorif aIPv4AddressorIPv6Addressis passed. Incloudinit/net/network_state.py:L1056there is a condition onstrorintinstance,IPv4AddressorIPv6Addressis neither.
I missed that, thanks.
However, we should add
IPv4AddressandIPv6Addressto that condition so that these types are allowed.
The handling ofintandstrthat can be converted tointhere is to keep the functionality provided by the methods being refactored. The idea is that the caller does not know if the netmask it has is represented as aprefixor IPvXAddress string. In the case the value is already aprefixeitherint(:number:)forstr(:number:)the methods will simply "echo" back the number as int.
Alternatively, if we aren't actually using this codepath anywhere (haven't checked) we could could just remove it.
The
mask_to_net_prefixis used a bit:cloudinit/net/__init__.py:from cloudinit.net.network_state import mask_to_net_prefix cloudinit/net/__init__.py: self.prefix = mask_to_net_prefix(prefix_or_mask) cloudinit/net/network_state.py: prefix = mask_to_net_prefix(maybe_prefix) cloudinit/net/network_state.py: prefix = mask_to_net_prefix(netmask) cloudinit/net/network_state.py: return mask_to_net_prefix(mask, 32) cloudinit/net/network_state.py: return mask_to_net_prefix(mask, 128) cloudinit/net/network_state.py:def mask_to_net_prefix(mask, max_bits=None): cloudinit/net/network_state.py: mask_dec = ipv4_mask_to_net_prefix(mask) cloudinit/net/sysconfig.py: prefix_value = network_state.ipv6_mask_to_net_prefix( cloudinit/sources/helpers/vmware/imc/config_nic.py: cidr = mask_to_net_prefix(netmask) cloudinit/sources/DataSourceOpenNebula.py: prefix = str(net.mask_to_net_prefix(mask))
I know the function is used in several places, what I meant here is that if no calls to mask_to_net_prefix() with an integer type exist, then we could simplify the function by removing support for passing in non-strings. My reasoning is that, although functions that support multiple input types are convenient, they add debt in that they are more error prone / difficult to review / audit, so if we aren't using them it's better to simplify, that's all.
There was a problem hiding this comment.
I know the function is used in several places, what I meant here is that if no calls to
mask_to_net_prefix()with an integer type exist, then we could simplify the function by removing support for passing in non-strings. My reasoning is that, although functions that support multiple input types are convenient, they add debt in that they are more error prone / difficult to review / audit, so if we aren't using them it's better to simplify, that's all.
Ah, yes now I understand. And I fully agree!
If I remove int from the allowd types at cloudinit/net/network_state.py:L1056 and run unit tests it looks like only two tests fail:
FAILED tests/unittests/test_net.py::TestEniRoundTrip::test_ipv6_static_routes - TypeError: Invalid network mask '24'
FAILED tests/unittests/net/test_init.py::TestEphemeralIPV4Network::test_ephemeral_ipv4_network_with_prefix - TypeError: Invalid network mask '16'
Might not be so much work to handle this at the caller side instead.
Consider deprecate non-string, log a warning and then follow up with a patch to remove support for passing non-string in a future release?
There was a problem hiding this comment.
If I remove
intfrom the allowd types atcloudinit/net/network_state.py:L1056and run unit tests it looks like only two tests fail:FAILED tests/unittests/test_net.py::TestEniRoundTrip::test_ipv6_static_routes - TypeError: Invalid network mask '24' FAILED tests/unittests/net/test_init.py::TestEphemeralIPV4Network::test_ephemeral_ipv4_network_with_prefix - TypeError: Invalid network mask '16'Might not be so much work to handle this at the caller side instead.
I analyzed the Eni parser yesterday. It doesn't cast to an int anywhere. I think the parsed input from Eni will always be a string.
Since test_ephemeral_ipv4_network_with_prefix() seems to be generic with respect to datasource/renderer, I don't think that is suggesting that any parser requires using the integer argument, just exists for test coverage.
Consider deprecate non-string, log a warning and then follow up with a patch to remove support for passing non-string in a future release?
Since this function is internal to cloud-init, I don't think that it makes much sense to log a deprecation warning. Users don't have anything actionable to do about it (they don't have control over the type I don't think) so it would just be noise. Since it appears that the integer argument codepath is executed purely by tests, I think it makes more sense to just rip it out.
One more realization I had yesterday while reviewing this is that all calls to mask_to_net_prefix() in the code appear to be in a context where the ip address version is actually known. This means that a generic mask_to_net_prefix() isn't really a hard requirement - just a convenience. Since the ipv(4|6)_mask_to_net_prefix() implementations always call mask_to_net_prefix() with a max_bits value, they provide better validation. With this in mind, I would argue that we should probably replace mask_to_net_prefix() with the ipv(4|6)_ versions to get stronger validation, and perhaps change mask_to_net_prefix() to _mask_to_net_prefix().
There was a problem hiding this comment.
I removed mask_to_net_prefix() completely and refactored ipv(4|6)_mask_to_net_prefix to use ipaddress from the library when possible. I updated code in the few places where mask_to_net_prefix() was used directly to use the version specific methods.
AFICT, removing mask_to_net_prefix() turned out to be a lot less of a hassle than I thought it would be.
Thanks for suggesting it!
There was a problem hiding this comment.
@hjensas I think this approach looks much better, thanks!
4c453db to
2bc5b01
Compare
holmanb
left a comment
There was a problem hiding this comment.
Thanks for the work on this! I think this looks good.
@TheRealFalcon @blackboxsw thoughts?
holmanb
left a comment
There was a problem hiding this comment.
I think this still needs a couple more changes before it's ready to merge, so I'm setting this to "Request changes" for now.
This change converts the IPv6 netmask from the network_data.json[1]
format to the CIDR style, <IPv6_addr>/<prefix>.
Using an IPv6 address like ffff:ffff:ffff:ffff:: does not work with
NetworkManager, nor networkscripts.
NetworkManager will ignore the route, logging:
ifcfg-rh: ignoring invalid route at \
"::/:: via fd00:fd00:fd00:2::fffe dev $DEV" \
(/etc/sysconfig/network-scripts/route6-$DEV:3): \
Argument for "::/::" is not ADDR/PREFIX format
Similarly if using networkscripts, ip route fail with error:
Error: inet6 prefix is expected rather than \
"fd00:fd00:fd00::/ffff:ffff:ffff:ffff::".
Also a bit of refactoring ...
cloudinit.net.sysconfig.Route.to_string:
* Move a couple of lines around to reduce repeated code.
* if "ADDRESS" not in key -> continute, so that the
code block following it can be de-indented.
cloudinit.net.network_state:
* Refactors the ipv4_mask_to_net_prefix, ipv6_mask_to_net_prefix
removes mask_to_net_prefix methods. Utilize ipaddress library to
do some of the heavy lifting.
LP: #1959148
hjensas
left a comment
There was a problem hiding this comment.
I took your advice and removed mask_to_net_prefix() completely.
A lot of change from previous versions of the patch, so re-set and re-review.
holmanb
left a comment
There was a problem hiding this comment.
Thanks @hjensas. I think this PR has come together nicely. It makes use of standard library functionality in the default cases and the "workaround" code in ipv6_mask_to_net_prefix() only runs when nonstandard ipv6 prefix inputs are received. That workaround matches the standard library implementation for checking IPv4 netmasks, adapted for IPv6.
LGTM!
TheRealFalcon
left a comment
There was a problem hiding this comment.
Thanks all! Good work and good collaboration.
This change converts the IPv6 netmask from the network_data.json[1]
format to the CIDR style, <IPv6_addr>/.
Using an IPv6 address like ffff:ffff:ffff:ffff:: does not work with
NetworkManager, nor networkscripts.
NetworkManager will ignore the route, logging:
Similarly if using networkscripts, ip route fail with error:
This change also add network mask validation, if the network mask is not
valid a warning is logged and the route is ignored.
Also some slight re-factoring reducing repeated code lines, and dedent a
block of code.
LP: #1959148
Checklist: