[1/2] DHCP: Refactor dhcp client code #2122
Conversation
e4a14ec to
aab1652
Compare
| # Generally dhclient relies on dhclient-script PREINIT action to bring | ||
| # the link up before attempting discovery. Since we are using | ||
| # -sf /bin/true, we need to do that "link up" ourselves first. | ||
| subp.subp(["ip", "link", "set", "dev", interface, "up"], capture=True) |
There was a problem hiding this comment.
this is platform specific, and should use the distro class
There was a problem hiding this comment.
Good point. This is how it works currently on main, but you're right, this isn't going to work without iproute2. I think that means that much of ephemeral.py needs to be reworked too (which should finally be reasonable to do now that we're getting distros plumbed through that module).
| pid_file, | ||
| interface, | ||
| "-sf", | ||
| "/bin/true", |
There was a problem hiding this comment.
these options are, unfortunately, not available in freebsd, https://man.freebsd.org/cgi/man.cgi?query=dhclient&apropos=0&sektion=8&manpath=FreeBSD+14.0-CURRENT&arch=default&format=html
it's one of the many reasons why we're looking into migrating to dhcpcd
There was a problem hiding this comment.
I'll propose something that uses the equivalents available in freebsd
|
|
||
| @staticmethod | ||
| def get_dhclient_d(): | ||
| # find lease files directory |
There was a problem hiding this comment.
why isn't this just reaching into distro to ask where to find it?
There was a problem hiding this comment.
Good question. This is basically a direct copy/paste from DataSourceCloudStack, but modified to be in a class.
From the git history I see that:
- 6 years ago the network manager directory was added. I suspect that this is no longer required, since recently when digging around in upstream networkmanager code I recall seeing that isc-dhclient was dropped in favor of an internal implementation. A hasty grep of my local (ubuntu) NetworkManager-based system sees nothing that matches
dhcp-server-identifier, which is the whole point of this function. - 12 years ago Ubuntu support was added, Fedora was already supported
I'm a bit hesitant to split this up, however, because of the risk of accidentally breaking one distro or another by making this more exact. Unless someone wants to audit all of the distros, I'd be more comfortable leaving it as is for now.
| # dhclient-<iface>.leases, dhclient6.leases | ||
| # centos7: ('--' is not a typo) | ||
| # dhclient--<iface>.lease, dhclient6.leases | ||
| for fname in lease_files: |
There was a problem hiding this comment.
is this going to be moved to distro in the next revision of this pr?
There was a problem hiding this comment.
It could be, but I'm not going to prioritize it right now. Right now my focus is getting functionality correct without risking breakage of things that currently work.
| print("Machine is not a Vultr instance") | ||
| sys.exit(1) | ||
|
|
||
| # It should probably be safe to try to detect distro via stages.Init(), |
There was a problem hiding this comment.
@eb3095 I'm not quite sure what __main__ in this module gets used for, or if it is even still used at all. Would you mind looking at this section if you get a chance?
With this change, piping distros throughout the dhcp and ephemeral setup code means that vultr.get_metadata() will now require a distro object, which will add flexibility across distros. The approach in this PR uses the same strategy that cloud-init normally uses. Will that work here?
Based on the is_vultr() check above, I assume that this should be running on an image with cloud-init. Therefore, I assume that this approach would work, but I want to double check my assumption.
There was a problem hiding this comment.
Yeah this should work fine. main i believe was an early discussion about how to debug. I dont think we have ever really used it, but that was the intended purpose.
| print("Machine is not a Vultr instance") | ||
| sys.exit(1) | ||
|
|
||
| # It should probably be safe to try to detect distro via stages.Init(), |
There was a problem hiding this comment.
Yeah this should work fine. main i believe was an early discussion about how to debug. I dont think we have ever really used it, but that was the intended purpose.
blackboxsw
left a comment
There was a problem hiding this comment.
Quick first pass. Wanted to get my BSD manage_service question out there as it may involve a bit of refactor to retain previous behavior. I'll continue the review today.
| return latest_file | ||
|
|
||
|
|
||
| def parse_dhcp_server_from_lease_file(lease_file): |
There was a problem hiding this comment.
Is there opportunity to leverage parse_dhcp_lease_file here?
We could also speed up this code potentially by reversing the order of leases processed to break on the first case of dhcp-server-identifier instead of processing the whole file and keeping the last case. It may not matter much in practice as the lease file processed in most cases may only have 1 lease in it though.
There was a problem hiding this comment.
Is there opportunity to leverage parse_dhcp_lease_file here?
It looks like it. This code came from a different file and looks like a case of partial code duplication. For future code archeology I think that refactoring that to deduplicate code might be better in a separate (smaller) PR, since this one is already touching and changing a lot of things.
We could also speed up this code potentially...
Agreed. I'd like for that to be a separate PR for a cleaner git history, since it doesn't fit the scope of this change.
| return ["usermod", "-G", group_name, member_name] | ||
|
|
||
| def manage_service(self, action: str, service: str): | ||
| def manage_service(self, action: str, service: str, rcs=None): |
There was a problem hiding this comment.
Forgot the @classmethod
There was a problem hiding this comment.
I see it in the latest version of this branch. I'm guessing this got fixed in a later commit than the one that you reviewed.
| # the routes are not recreated. | ||
| subp.subp( | ||
| ["service", "dhclient", "stop", dhcp_interface], | ||
| rcs=[0, 1], | ||
| capture=True, | ||
| net.dhcp.IscDhclient.start_service( | ||
| dhcp_interface, distros.freebsd.Distro | ||
| ) |
There was a problem hiding this comment.
I don't think we can make this shift as-written. These definitions are no longer the same, the new IscDhclient.start_service method completely ignores the dhcp_interface because we are using distro.manage_interface and dropping the supplemental dhcp_interface argument in the refactor.
Instead of running:
service dhclient stop eth0
we now run only the following on FreeBSD
service dhclient stop
There was a problem hiding this comment.
Nice catch, thanks. This should be fixed in f1eb8ac.
|
|
||
| @staticmethod | ||
| def parse_dhcp_lease_file(lease_file): | ||
| """Parse the given dhcp lease file for the most recent lease. |
There was a problem hiding this comment.
This returns a list of all leases. Let's use type annotations on the function definitions here to guide our expectations about params and return types
And let's also either:
- fix the logic to return only one lease and correct the one call-site in ephemeral.py to assume most recent lease
-- or -- - Fix the docstr header
There was a problem hiding this comment.
Thanks. Still missing either docstr fix
| """Parse the given dhcp lease file for the most recent lease. | |
| """Parse the given dhcp lease file returning all leases as dicts. |
There was a problem hiding this comment.
Sorry, I must have accidentally dropped that in a rebase. Thanks!
340851b to
d8b6bbe
Compare
Thanks for the initial review @blackboxsw. I think that I have addressed all of your comments. Additionally I just pushed up the refactor (commit 9d16846) that makes network ops in ephemeral.py distro-agnostic (cc @igalic). |
blackboxsw
left a comment
There was a problem hiding this comment.
Looks good @holman. Just to make sure I'm not lost in refactor intents, if we can separate net_ops work from the IscDhclient work it'd make me more certain we aren't missing something in review.
|
|
||
| @staticmethod | ||
| def parse_dhcp_lease_file(lease_file): | ||
| """Parse the given dhcp lease file for the most recent lease. |
There was a problem hiding this comment.
Thanks. Still missing either docstr fix
| """Parse the given dhcp lease file for the most recent lease. | |
| """Parse the given dhcp lease file returning all leases as dicts. |
|
|
||
| @staticmethod | ||
| def parse_dhcp_lease_file(lease_file): | ||
| def parse_dhcp_lease_file(lease_file: str) -> list: |
There was a problem hiding this comment.
I think we may want to from typing import Any, Dict, List and have to following:
| def parse_dhcp_lease_file(lease_file: str) -> list: | |
| def parse_dhcp_lease_file(lease_file: str) -> List[Dict[str, Any]]: |
| # This is used by self.shutdown_command(), and can be overridden in | ||
| # subclasses | ||
| shutdown_options_map = {"halt": "-H", "poweroff": "-P", "reboot": "-r"} | ||
| net_ops = iproute2 |
There was a problem hiding this comment.
This net_ops work feels like a bolt-on ideal that represents a bit of scope creep for this PR. Would it be okay to separate this to another PR to ease review? Otherwise, we can spend a bit more time going through this in this PR but I worry a bit about missing something here in the refactor dance.
There was a problem hiding this comment.
That's fair. I'll break it out.
| ["service", "dhclient", "stop", dhcp_interface], | ||
| rcs=[0, 1], | ||
| capture=True, | ||
| net.dhcp.IscDhclient.start_service( |
There was a problem hiding this comment.
@holmanb it looks like this PR unintentionally inverts the operations, the intent here was to stop any existing dhcp services on any dhcp interfaces before starting them again (instead of just using the systemctl restart operation). But it looks like we are now starting and then stopping. Let's flip them back.
Thanks! However, Zach might be confused by the Github notification from your comment ;)
Nice attention to detail in the review. And I agree - breaking that out into a separate commit will make this a more coherent commit, in addition to helping review burden. I've moved that work here. I believe I've addressed all of your comments. |
blackboxsw
left a comment
There was a problem hiding this comment.
@holmanb LGTM, thanks for separating the PRs into something easier to review. Refactor looks good.
There was a problem hiding this comment.
hold merge on this for a moment, I want to check that Vultr invocation of Init I had missed.
I realize now this was just when calling python3 -m cloudinit.sources.DataSourceVultr. Nothing to see here. I'm good with us loading stages.Init in one-off test/debug invocations of the module itself.
There was a problem hiding this comment.
Clear, If we have to call stages.Init() to get the distro object for callers of __main__. I think that's ok. We could source this from the sources.pkl_load(/var/lib/cloud/instance/obj.pkl), but I think stages.Init() is probably an easier entry point.
Additional Context
Reviewers may find this pull request easiest to review commit by commit rather than all at once, since this change spans several files. Each commit is logically grouped - code changes and corresponding test updates share commits.
Test Steps
Boot cloud-init on a datasource that uses ephemeral network setup.
New Log:
Full log
Checklist: