feat(vmware): Support network events#6063
Conversation
|
cc @PengpengSun |
ae2d751 to
429877d
Compare
429877d to
e9917c1
Compare
Do you think I should remove HOTPLUG if the transport is IMC then? |
e9917c1 to
7351b91
Compare
@PengpengSun I updated the PR so that |
d35e4a3 to
9be437d
Compare
@akutz Please keep HOTPLUG event support for IMC, although IMC hasn't supported live metadata update (which means delegating live customization cfg to cloud-init). Another feature to be done, when it's ready, HOTPLUG event will be supported by IMC transport directly. For now, guestinfo transport is in front of IMC and both transports requires VMware platform, so we are good. |
9be437d to
827a6ef
Compare
Done. |
31c0cc7 to
5e58f1c
Compare
TheRealFalcon
left a comment
There was a problem hiding this comment.
Just to be clear, by default and without any additional user data, you want cloud-init to
- Re-fetch all IMDS metadata every single boot, and
- Re-fetch network metadata every time there is a network add or remove event, including virtual network interfaces such as those used by docker
I'm asking generally because it seems to be a known issue that metadata sources aren't always available on the VMware platform. Is the current code to determine whether to used the cached datasource meant to handle these additional use cases?
In the past, item 2 has caused a critical bug on EC2 that resulted in restricting hotplug to specific drivers. Is this also a valid concern for VMware?
Except for the bit about Docker (see below, I will address this the same way Ec2 did), yeah. This is for a few reasons. When we backup/restore VMs and/or replicate them to other sites, we want to have them update their network configurations. However, today the only way we can do that is by changing the instance ID, which is bad, mmmm kay? It causes things like runcmds to run again and SSH host keys to SSH host key again. All bad stuff for the most part, at least from the user's perspective. Since we do most of this in a Kubernetes controller we don't know if the VM was restored and don't know if it was replicated, so we Cloud-Init to check the network config at boot to verify all is spiffy in who-ville. Let's say we knew about these events (restore, replication), is the user data section for configuring allowed events read at each boot? Could that be used via vendordata to tell the next boot to reconfigure networking?
The above was specific to the IMC transport. The metadata should always available via the other transports as long as it is not redacted. I will add detection to determine if the network data is in the metadata with something like: if "network" in self.metadataAnd change the supported events to
Good call. I'll use the same defaults and make the list of drivers configurable via a metadata option. Thanks @TheRealFalcon! |
Hi @akutz and @TheRealFalcon , What if there is persistent metadata in guestinfo and instance has the "second" boot, will it apply metadata again to this instance? especially network in metadata. If yes, it will overwrite network configuration which customer manually set between "first" and "second" boot.
Yes, for IMC transport, we don't want to instance re-fetch all IMDS metadata every single boot. My understanding is that with the current code to determine whether to used the cached datasource, if cached data_access_method == DATA_ACCESS_METHOD_IMC, cloud-init will fallback to cache but re-fetch all IMDS metadata.
|
@TheRealFalcon Is Cloud-Init smart enough to detect that the network configuration has not changed, even if we request the BOOT event? But even if Cloud-Init does configure it again, is that not the goal? If the network data has not changed, then even if it overwrites the network configuration, it is the same. |
@PengpengSun Let's sync Monday on the above. It sounds like you need me to update the PR to indicate the allowed events are |
It will refetch the configuration every boot and depending on the config being rendered, it may check the diff between the two configs to see if it needs to write out a new config. But you're correct; when the network data hasn't changed, it doesn't matter if we re-write the same config because the config is the same. |
Is this so bad though? I mean on your end that is. The cost on our end is what it is, and it's not network based, so an RPC call per boot is not the worst thing. But is it super expensive upstream of the data source? For example, does Cloud-Init need to transform the network v2 data we give back via the network renderer before it can do the diff you mentioned? If that is the case, then I can see how that adds to the cost. Would there be value in our DS keeping its own receipt to track whether or not the network has changed? The more I think about this, the more I wonder if we should set the SUPPORTED_EVENTS to everything but set the DEFAULT_EVENTS to just Still, VM Service is the future of vSphere, so it may not matter in the long run, and I like that we don't have to do anything with vendordata since we don't use it yet for anything. It's always there in the back pocket. Yet, I've been hesitant to use it since the docs have always indicated that vendordata can be blocked at the guest level, and that no one should use it if something is required to occur. |
@akutz If there is no cache, both IMC and guestinfo transports are ok to re-fetch metadata, the allowed events is at least |
I'm not really worried about the BOOT event cost-wise. It could add a second or two to boot, but otherwise there's not really a cost for cloud-init. Many other datasources have moved to doing this. I was more just double checking that the caching behavior has been considered and that there are no big blockers there. If things are ok from your end, then I'm ok with the change.
Yes, but that's fairly trivial time-wise. |
|
Hi @TheRealFalcon, I am really struggling to understand the difference between
However, if you look through the codebase, there are placed where
...boolean Never mind, I missed that dangling boolean
|
a9216c8 to
5f72f90
Compare
|
@TheRealFalcon I made the requested changes. @PengpengSun and I also decided that by default we will not activate the |
e7034c6 to
02ab924
Compare
TheRealFalcon
left a comment
There was a problem hiding this comment.
Looking pretty good! I left a few mostly minor inline comments. Are you wanting another review from a VMware person before we wrap this one up?
db7232f to
01727ae
Compare
c22cdc5 to
bcb2b1e
Compare
@PengpengSun is good with it, so once you are, please go ahead and merge. Thank you! |
This patch updates the DataSource for VMware to support network reconfiguration when the BOOT, BOOT_NEW_INSTANCE, and HOTPLUG events are received. Previously the datasource could only reconfigure the network if a new instance ID was detected. However, due to features like backup/restore, migrating VMs, etc., it was determined that it is valuable to support reconfiguring the network without changing the instance ID. This is because changing the instance ID also means running the per-instance configuration modules, such as regenerating the system's SSH host keys, which could lock out automation.
bcb2b1e to
787276c
Compare
|
Hi @akutz and @PengpengSun, I have a question about IMC transports, during the OS upgrade, cloud-init would clean cache when a Python version change is detected(#857), how could cloud-init re-fetch metadata? Based on my test, it is fallback to DataSourceNone (cloud-init 24.4), does this feature(#6063) support re-fetching metadata when using IMC transports?
|
|
Hi @xiachen-rh This feature(#6063) does NOT support re-fetching metadata on HOTPLUG for IMC transport, the reason is the data from IMC transport is not persistent, re-fetching metadata will not load data from IMC transport unless vm is customized along with HOTPLUG. |
Thanks @PengpengSun for your help, and good to know this setting, and I tested, cloud-init is disabled after OS upgrade reboot, I think it can fit the upgrade scenario well, we will document it, thanks! |
But don't we want cloud-init enabled after OS upgrade? I am confused. |
Proposed Commit Message
Additional Context
NATest Steps
Merge type
Fixes #5729