
[1/2] DHCP: Refactor dhcp client code #2122

Merged
holmanb merged 11 commits into
canonical:main from
holmanb:holmanb/dhcpcd
Apr 19, 2023

@holmanb
Member

@holmanb holmanb commented Apr 13, 2023

Move isc-dhclient code to dhcp.py

In support of the upcoming deprecation of
isc-dhcp-client, this code refactors current
dhcp code into classes in dhcp.py. The
primary user-visible change should be the
addition of the following log:

dhcp.py[DEBUG]: DHCP client selected: dhclient

This code lays groundwork to enable
alternate implementations to live side by
side in the codebase to be selected with
distro-defined priority fallback. Note that
maybe_perform_dhcp_discovery() now selects
which dhcp client to call, and then runs the
corresponding client's dhcp_discovery()
method. Currently only class IscDhclient is
implemented; however, a yet-to-be-implemented
class Dhcpcd exists to test fallback behavior,
and it will be implemented in part two of
this series.
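A minimal sketch of the selection-with-fallback flow described above. The names IscDhclient, Dhcpcd, select_dhcp_client() and maybe_perform_dhcp_discovery() come from this PR, but the signatures here are hypothetical: the priority list, the injectable `which` lookup, and the placeholder result returned by dhcp_discovery() are assumptions for illustration, not cloud-init's actual code.

```python
# Hypothetical sketch of distro-priority DHCP client selection.
import logging
import shutil
from typing import Callable, Optional, Sequence, Type

LOG = logging.getLogger(__name__)


class DhcpClient:
    """Base class; subclasses name the executable they wrap."""

    client_name = ""

    def __init__(self, path: str):
        self.path = path

    def dhcp_discovery(self, interface: str) -> dict:
        raise NotImplementedError


class IscDhclient(DhcpClient):
    client_name = "dhclient"

    def dhcp_discovery(self, interface: str) -> dict:
        # Placeholder result; a real client would run dhclient and parse
        # the resulting lease file.
        return {"interface": interface, "client": self.client_name}


class Dhcpcd(DhcpClient):
    # Not implemented yet; present only so fallback ordering can be tested.
    client_name = "dhcpcd"


def select_dhcp_client(
    priority: Sequence[Type[DhcpClient]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> DhcpClient:
    """Return the first client, in distro priority order, that is installed."""
    for client_cls in priority:
        path = which(client_cls.client_name)
        if path:
            LOG.debug("DHCP client selected: %s", client_cls.client_name)
            return client_cls(path)
    raise RuntimeError("No supported DHCP client found")


def maybe_perform_dhcp_discovery(
    interface: str,
    priority: Sequence[Type[DhcpClient]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    client = select_dhcp_client(priority, which)
    LOG.debug("Performing a dhcp discovery on %s", interface)
    return client.dhcp_discovery(interface)
```

With priority [Dhcpcd, IscDhclient] on a host where only dhclient is installed, the dhcpcd lookup fails and IscDhclient is chosen, which is the fallback path the Dhcpcd placeholder exists to exercise.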

Part of this refactor includes shifting
dhclient service management from hardcoded
calls to the distro-defined manage_service()
method in the *BSDs. Future work is required
in this area to support multiple clients via
select_dhcp_client().

Additional Context

Reviewers may find this pull request easiest to review commit by commit rather than all at once, since this change spans several files. Each commit is logically grouped - code changes and corresponding test updates share commits.

Test Steps

Boot cloud-init on a datasource that uses ephemeral network setup.

tox -e integration-test -- tests/integration_tests/test_logging.py

New Log:

2023-04-13 18:00:28,773 - util.py[DEBUG]: Read 5 bytes from /sys/class/net/ens5/operstate
2023-04-13 18:00:28,773 - dhcp.py[DEBUG]: DHCP client selected: dhclient
2023-04-13 18:00:28,773 - dhcp.py[DEBUG]: Performing a dhcp discovery on ens5

Full log

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@holmanb holmanb force-pushed the holmanb/dhcpcd branch 2 times, most recently from e4a14ec to aab1652 April 13, 2023 18:59
@holmanb holmanb requested a review from igalic April 13, 2023 21:40
Collaborator

@igalic igalic left a comment

first look

Comment thread cloudinit/net/dhcp.py
# Generally dhclient relies on dhclient-script PREINIT action to bring
# the link up before attempting discovery. Since we are using
# -sf /bin/true, we need to do that "link up" ourselves first.
subp.subp(["ip", "link", "set", "dev", interface, "up"], capture=True)

Collaborator

this is platform-specific and should use the distro class

Member Author

Good point. This is how it works currently on main, but you're right, this isn't going to work without iproute2. I think that means that much of ephemeral.py needs to be reworked too (which should finally be reasonable to do now that we're getting distros plumbed through that module).
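One way to read the suggestion to "use the distro class": let each distro class name the tool used for link operations, so the DHCP code asks the distro for the command instead of hardcoding `ip link`. This is a hypothetical sketch; the class names Iproute2, BsdIfconfig, Distro and FreeBSDDistro and the link_up_cmd() method are illustrative, although the `net_ops` attribute name matches the `net_ops = iproute2` line that later appears in this PR's distros/__init__.py diff.

```python
# Hypothetical sketch: distro classes declare which tool performs link ops.
import subprocess
from typing import Callable, List


class Iproute2:
    """Link operations via iproute2 (typical on Linux)."""

    @staticmethod
    def link_up_cmd(interface: str) -> List[str]:
        return ["ip", "link", "set", "dev", interface, "up"]


class BsdIfconfig:
    """Link operations via ifconfig(8), as on the *BSDs."""

    @staticmethod
    def link_up_cmd(interface: str) -> List[str]:
        return ["ifconfig", interface, "up"]


class Distro:
    # Default for Linux distros; subclasses override as needed.
    net_ops = Iproute2


class FreeBSDDistro(Distro):
    net_ops = BsdIfconfig


def bring_link_up(
    distro: type,
    interface: str,
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Bring `interface` up with whatever tool the distro declares."""
    cmd = distro.net_ops.link_up_cmd(interface)
    run(cmd, check=True, capture_output=True)
    return cmd
```

Because `net_ops` is a class attribute, the lookup works whether the caller holds a distro instance or, as elsewhere in this PR, the distro class itself.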

Comment thread cloudinit/net/dhcp.py
pid_file,
interface,
"-sf",
"/bin/true",

Collaborator

these options are, unfortunately, not available in FreeBSD: https://man.freebsd.org/cgi/man.cgi?query=dhclient&apropos=0&sektion=8&manpath=FreeBSD+14.0-CURRENT&arch=default&format=html
it's one of the many reasons why we're looking into migrating to dhcpcd.

Member Author

I'll propose something that uses the equivalents available in FreeBSD.

Comment thread cloudinit/net/dhcp.py

@staticmethod
def get_dhclient_d():
# find lease files directory

Collaborator

why isn't this just reaching into distro to ask where to find it?

Member Author

Good question. This is basically a direct copy/paste from DataSourceCloudStack, but modified to be in a class.

From the git history I see that:

  • 6 years ago the NetworkManager directory was added. I suspect that this is no longer required, since recently when digging around in upstream NetworkManager code I recall seeing that isc-dhclient was dropped in favor of an internal implementation. A hasty grep of my local (Ubuntu) NetworkManager-based system finds nothing that matches dhcp-server-identifier, which is the whole point of this function.
  • 12 years ago Ubuntu support was added; Fedora was already supported.

I'm a bit hesitant to split this up, however, because of the risk of accidentally breaking one distro or another by making this more exact. Unless someone wants to audit all of the distros, I'd be more comfortable leaving it as is for now.
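For reference, the directory probe under discussion can be sketched as a first-match search over candidate directories, returning the first one that exists and is non-empty. The candidate paths below (including the NetworkManager one suspected above to be obsolete) and the non-empty check are assumptions based on this thread, not a copy of cloud-init's code.

```python
# Hypothetical sketch of a first-match lease directory search.
import os
from typing import Iterable, Optional

# Assumed search order; the NetworkManager entry is the one suspected
# above to be unnecessary on current systems.
CANDIDATE_LEASE_DIRS = (
    "/var/lib/dhclient",
    "/var/lib/dhcp",
    "/var/lib/NetworkManager",
)


def get_dhclient_d(
    candidates: Iterable[str] = CANDIDATE_LEASE_DIRS,
) -> Optional[str]:
    """Return the first candidate directory that exists and has entries."""
    for lease_d in candidates:
        if os.path.isdir(lease_d) and os.listdir(lease_d):
            return lease_d
    return None
```

Moving this into the distro classes would replace the shared tuple with a per-distro attribute; keeping one list avoids having to audit every distro up front, which is the risk raised above.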

Comment thread cloudinit/net/dhcp.py
# dhclient-<iface>.leases, dhclient6.leases
# centos7: ('--' is not a typo)
# dhclient--<iface>.lease, dhclient6.leases
for fname in lease_files:

Collaborator

is this going to be moved to distro in the next revision of this PR?

Member Author

It could be, but I'm not going to prioritize it right now. Right now my focus is getting functionality correct without risking breakage of things that currently work.
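The file-naming comment in the hunk above (dhclient-<iface>.leases, dhclient6.leases, and CentOS 7's dhclient--<iface>.lease) implies a filter-then-newest selection. A minimal sketch, assuming IPv6 lease files are skipped by their dhclient6 prefix and that the file with the newest modification time wins; the function name get_latest_lease() is illustrative.

```python
# Hypothetical sketch: pick the newest IPv4 dhclient lease file.
import os
from typing import Optional


def get_latest_lease(lease_d: str) -> Optional[str]:
    """Return the most recently modified IPv4 dhclient lease file, if any."""
    latest_file: Optional[str] = None
    latest_mtime = -1.0
    for fname in os.listdir(lease_d):
        # Skip DHCPv6 leases, e.g. dhclient6.leases.
        if fname.startswith("dhclient6"):
            continue
        # Accept both ".lease" (centos7) and ".leases" suffixes.
        if not fname.endswith((".lease", ".leases")):
            continue
        abs_path = os.path.join(lease_d, fname)
        mtime = os.path.getmtime(abs_path)
        if mtime > latest_mtime:
            latest_mtime = mtime
            latest_file = abs_path
    return latest_file
```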

print("Machine is not a Vultr instance")
sys.exit(1)

# It should probably be safe to try to detect distro via stages.Init(),

Member Author

@holmanb holmanb Apr 13, 2023

@eb3095 I'm not quite sure what __main__ in this module gets used for, or if it is even still used at all. Would you mind looking at this section if you get a chance?

With this change, piping distros throughout the dhcp and ephemeral setup code means that vultr.get_metadata() will now require a distro object, which will add flexibility across distros. The approach in this PR uses the same strategy that cloud-init normally uses. Will that work here?

Based on the is_vultr() check above, I assume that this should be running on an image with cloud-init. Therefore, I assume that this approach would work, but I want to double check my assumption.

Contributor

Yeah, this should work fine. __main__, I believe, came out of an early discussion about how to debug. I don't think we have ever really used it, but that was the intended purpose.

@jfroche jfroche mentioned this pull request Apr 14, 2023

Collaborator

@blackboxsw blackboxsw left a comment

Quick first pass. Wanted to get my BSD manage_service question out there as it may involve a bit of refactor to retain previous behavior. I'll continue the review today.

Comment thread cloudinit/net/dhcp.py Outdated
return latest_file


def parse_dhcp_server_from_lease_file(lease_file):

Collaborator

Is there opportunity to leverage parse_dhcp_lease_file here?

We could also speed up this code potentially by reversing the order of leases processed to break on the first case of dhcp-server-identifier instead of processing the whole file and keeping the last case. It may not matter much in practice as the lease file processed in most cases may only have 1 lease in it though.

Member Author

@holmanb holmanb Apr 17, 2023

Is there opportunity to leverage parse_dhcp_lease_file here?

It looks like it. This code came from a different file and looks like a case of partial code duplication. For future code archeology I think that refactoring that to deduplicate code might be better in a separate (smaller) PR, since this one is already touching and changing a lot of things.

We could also speed up this code potentially...

Agreed. I'd like for that to be a separate PR for a cleaner git history, since it doesn't fit the scope of this change.
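The speed-up suggested in this thread amounts to scanning leases newest-first and stopping at the first dhcp-server-identifier, which returns the same value as keeping the last match in a forward scan. A minimal sketch, assuming a simplified dhclient lease grammar (no nested braces inside a lease block); parse_dhcp_server() is a hypothetical name.

```python
# Hypothetical sketch: newest-first scan for the DHCP server identifier.
import re
from typing import Optional

# Captures the body of each "lease { ... }" block, across newlines.
LEASE_RE = re.compile(r"lease\s*\{(.*?)\}", re.DOTALL)
SERVER_RE = re.compile(r"option\s+dhcp-server-identifier\s+([^;\s]+)\s*;")


def parse_dhcp_server(content: str) -> Optional[str]:
    """Return the server identifier from the newest lease that has one."""
    for body in reversed(LEASE_RE.findall(content)):
        match = SERVER_RE.search(body)
        if match:
            return match.group(1)
    return None
```

Note that findall() still splits out every block; the early exit only saves the per-lease search, which, as noted above, rarely matters when the file holds a single lease.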

Comment thread cloudinit/distros/openbsd.py Outdated
return ["usermod", "-G", group_name, member_name]

def manage_service(self, action: str, service: str):
def manage_service(self, action: str, service: str, rcs=None):

Collaborator

Forgot the @classmethod

Member Author

I see it in the latest version of this branch. I'm guessing this got fixed in a later commit than the one that you reviewed.
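Why the decorator matters here: the freebsd.py change passes the distro class itself (distros.freebsd.Distro), not an instance, to IscDhclient.start_service(). When a method is called on a class, its first argument is bound automatically only if it is a classmethod. The two toy classes below are hypothetical and only demonstrate that binding difference.

```python
# Hypothetical sketch: calling manage_service on a class, not an instance.
from typing import List


class DistroWithoutClassmethod:
    def manage_service(self, action: str, service: str) -> List[str]:
        return ["service", service, action]


class DistroWithClassmethod:
    @classmethod
    def manage_service(cls, action: str, service: str) -> List[str]:
        return ["service", service, action]


def stop_dhclient(distro_cls: type) -> List[str]:
    # Mirrors code that receives a distro class and calls it directly.
    return distro_cls.manage_service("stop", "dhclient")
```

Without @classmethod, "stop" is bound to `self`, "dhclient" to `action`, and the call fails with a TypeError for the missing `service` argument.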

Comment thread cloudinit/net/freebsd.py
Comment on lines 53 to 56
# the routes are not recreated.
subp.subp(
["service", "dhclient", "stop", dhcp_interface],
rcs=[0, 1],
capture=True,
net.dhcp.IscDhclient.start_service(
dhcp_interface, distros.freebsd.Distro
)

Collaborator

I don't think we can make this shift as written. These definitions are no longer equivalent: the new IscDhclient.start_service method completely ignores dhcp_interface, because we are now using distro.manage_service and dropping the supplemental dhcp_interface argument in the refactor.

Instead of running:
service dhclient stop eth0
we now run only the following on FreeBSD:
service dhclient stop

Member Author

Nice catch, thanks. This should be fixed in f1eb8ac.
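A minimal sketch of one way to keep the per-interface argument: let manage_service() forward extra positional arguments to service(8), so the FreeBSD command becomes `service dhclient stop eth0` again. The `*extra_args` parameter and the injectable `runner` are assumptions for illustration; this is not necessarily how commit f1eb8ac fixed it.

```python
# Hypothetical sketch: forward an interface argument through manage_service.
import subprocess
from typing import Callable, List, Optional, Sequence


def run_cmd(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd, capture_output=True).returncode


class FreeBSDDistro:
    @classmethod
    def manage_service(
        cls,
        action: str,
        service: str,
        *extra_args: str,
        rcs: Optional[Sequence[int]] = None,
        runner: Callable[[List[str]], int] = run_cmd,
    ) -> List[str]:
        # service(8) takes the rc script name before the action; anything
        # after the action (e.g. an interface name) is passed through.
        cmd = ["service", service, action, *extra_args]
        rc = runner(cmd)
        allowed = rcs if rcs is not None else (0,)
        if rc not in allowed:
            raise RuntimeError(f"{cmd} exited with status {rc}")
        return cmd
```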

Comment thread cloudinit/net/dhcp.py Outdated

@staticmethod
def parse_dhcp_lease_file(lease_file):
"""Parse the given dhcp lease file for the most recent lease.

Collaborator

This returns a list of all leases. Let's use type annotations on the function definitions here to guide our expectations about params and return types.
And let's also either:

  • fix the logic to return only one lease and correct the one call-site in ephemeral.py to assume the most recent lease
    -- or --
  • fix the docstr header

Member Author

@holmanb holmanb Apr 17, 2023

Fixed in c65dfe7.

Collaborator

Thanks. Still missing either docstr fix

Suggested change
"""Parse the given dhcp lease file for the most recent lease.
"""Parse the given dhcp lease file returning all leases as dicts.

Member Author

Sorry, I must have accidentally dropped that in a rebase. Thanks!

@holmanb holmanb force-pushed the holmanb/dhcpcd branch 5 times, most recently from 340851b to d8b6bbe April 17, 2023 18:19
@holmanb
Member Author

holmanb commented Apr 17, 2023

Quick first pass. Wanted to get my BSD manage_service question out there as it may involve a bit of refactor to retain previous behavior. I'll continue the review today.

Thanks for the initial review @blackboxsw. I think that I have addressed all of your comments.

Additionally I just pushed up the refactor (commit 9d16846) that makes network ops in ephemeral.py distro-agnostic (cc @igalic).

Collaborator

@blackboxsw blackboxsw left a comment

Looks good @holman. Just to make sure I'm not lost in refactor intents, if we can separate net_ops work from the IscDhclient work it'd make me more certain we aren't missing something in review.

Comment thread cloudinit/net/dhcp.py Outdated

@staticmethod
def parse_dhcp_lease_file(lease_file):
def parse_dhcp_lease_file(lease_file: str) -> list:

Collaborator

I think we may want to add "from typing import Any, Dict, List" and have the following:

Suggested change
def parse_dhcp_lease_file(lease_file: str) -> list:
def parse_dhcp_lease_file(lease_file: str) -> List[Dict[str, Any]]:
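A minimal sketch of a parser matching the suggested signature and the corrected docstring (returning all leases as dicts). It assumes a simplified dhclient lease grammar in which each `lease { ... }` block holds `;`-terminated statements, and it drops the `option ` prefix from keys; parse_dhcp_leases() is a hypothetical helper split out so the parsing can be exercised without a file.

```python
# Hypothetical sketch; assumes a simplified dhclient lease grammar.
import re
from typing import Any, Dict, List

# Captures the body of each "lease { ... }" block, across newlines.
LEASE_RE = re.compile(r"lease\s*\{(?P<lease>.*?)\}", re.DOTALL)


def parse_dhcp_leases(content: str) -> List[Dict[str, Any]]:
    """Parse dhclient lease text, returning all leases as dicts."""
    leases: List[Dict[str, Any]] = []
    for body in LEASE_RE.findall(content):
        lease: Dict[str, Any] = {}
        for stmt in body.split(";"):
            # Trim whitespace and quotes, then drop the "option " keyword.
            stmt = stmt.strip().replace('"', "")
            if stmt.startswith("option "):
                stmt = stmt[len("option "):]
            if not stmt:
                continue
            key, _, value = stmt.partition(" ")
            lease[key] = value
        leases.append(lease)
    return leases


def parse_dhcp_lease_file(lease_file: str) -> List[Dict[str, Any]]:
    """Parse the given dhcp lease file returning all leases as dicts."""
    with open(lease_file, encoding="utf-8") as f:
        return parse_dhcp_leases(f.read())
```

A caller that wants the most recent lease would typically take the last element, since dhclient appends new leases to the end of the file.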

Comment thread cloudinit/distros/__init__.py Outdated
# This is used by self.shutdown_command(), and can be overridden in
# subclasses
shutdown_options_map = {"halt": "-H", "poweroff": "-P", "reboot": "-r"}
net_ops = iproute2

Collaborator

This net_ops work feels like a bolt-on idea that represents a bit of scope creep for this PR. Would it be okay to separate this into another PR to ease review? Otherwise, we can spend a bit more time going through this in this PR, but I worry a bit about missing something here in the refactor dance.

Member Author

@holmanb holmanb Apr 18, 2023

That's fair. I'll break it out.

Comment thread cloudinit/net/freebsd.py Outdated
["service", "dhclient", "stop", dhcp_interface],
rcs=[0, 1],
capture=True,
net.dhcp.IscDhclient.start_service(

Collaborator

@holmanb it looks like this PR unintentionally inverts the operations. The intent here was to stop any existing dhcp services on any dhcp interfaces before starting them again (instead of just using the systemctl restart operation), but it looks like we are now starting and then stopping. Let's flip them back.

Member Author

@holmanb holmanb Apr 18, 2023

Nice catch! Fixed.
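The intended order, per the review comment, is stop-then-start for each DHCP interface rather than a single restart operation (the freebsd.py hunk carries a code comment ending "the routes are not recreated"). A minimal sketch with a hypothetical RecordingDistro stand-in that records commands instead of running them; rcs=[0, 1] on the stop call mirrors the original subp call, which tolerated exit status 1.

```python
# Hypothetical sketch: per-interface stop-then-start of dhclient.
from typing import List, Optional, Sequence


class RecordingDistro:
    """Stand-in distro that records service calls instead of running them."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def manage_service(
        self,
        action: str,
        service: str,
        *extra_args: str,
        rcs: Optional[Sequence[int]] = None,
    ) -> None:
        self.calls.append([action, service, *extra_args])


def restart_dhclient(interfaces: Sequence[str], distro: RecordingDistro) -> None:
    for iface in interfaces:
        # Stop first; exit status 1 is tolerated, as in the original call.
        distro.manage_service("stop", "dhclient", iface, rcs=[0, 1])
        # Then start a fresh client on the same interface.
        distro.manage_service("start", "dhclient", iface)
```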

@holmanb
Member Author

holmanb commented Apr 18, 2023

Looks good @holman.

Thanks! However, Zach might be confused by the GitHub notification from your comment ;)

Just to make sure I'm not lost in refactor intents, if we can separate net_ops work from the IscDhclient work it'd make me more certain we aren't missing something in review.

Nice attention to detail in the review. And I agree - breaking that out into a separate commit will make this a more coherent commit, in addition to helping review burden. I've moved that work here.

I believe I've addressed all of your comments.

@holmanb holmanb requested a review from blackboxsw April 18, 2023 02:55

Collaborator

@blackboxsw blackboxsw left a comment

@holmanb LGTM, thanks for separating the PRs into something easier to review. Refactor looks good.

Collaborator

@blackboxsw blackboxsw left a comment

Hold merge on this for a moment; I want to check that Vultr invocation of Init I had missed.
I realize now this was just when calling python3 -m cloudinit.sources.DataSourceVultr. Nothing to see here. I'm good with us loading stages.Init in one-off test/debug invocations of the module itself.

Collaborator

@blackboxsw blackboxsw left a comment

Clear. If we have to call stages.Init() to get the distro object for callers of __main__, I think that's ok. We could source this from sources.pkl_load(/var/lib/cloud/instance/obj.pkl), but I think stages.Init() is probably an easier entry point.
