ds-identify: also discover LXD by DMI board_name = LXD #1311
|
@stgraber I'm seeing racy behavior: the virtio device files are not set up yet at systemd generator time. I'm not sure how we should resolve it, given that ds-identify runs so early, during the generator timeframe.
|
|
@blackboxsw is that an alternative you could use? |
@stgraber works great. Thanks for this pointer. I've updated the PR here with docs and integration test which asserts that both kvm and container can be detected appropriately during |
Does it guarantee the presence of an LXD source? Or does it just guarantee that you're running on LXD? |
|
It's got the same guarantee as the virtio port minus the udev race. It doesn't guarantee that the guest will spawn a lxd-agent process but neither did the virtio-serial device. |
Ultimately the only way cloud-init can detect an operable datasource at python DS discovery time is when In all lxd container/kvm cases, Maybe since this race condition with /dev/lxd/sock exists only on lxd kvm instances, we should limit this DMI board_name check to only where DI_VIRT = "kvm" too, to limit false positives:
dscheck_LXD() {
    [ -S /dev/lxd/sock ] && return ${DS_FOUND}
    if [ "$DI_VIRT" = "kvm" ]; then
        get_dmi_field board_name
        [ "$_RET" = "LXD" ] && return ${DS_FOUND}
    fi
    return ${DS_NOT_FOUND}
}
|
If you are running a guest on EC2 and something routes off the metadata service during boot, then cloud-init will fail. That's not really something to be concerned about. There are many ways things can fail. The goal is for ds-identify to correctly identify "I am running on a platform with LXD metadata service." If all systems that have 'LXD' in DMI information can be expected to talk over /dev/lxd/sock, then this is good enough. If there are systems where that is not the case, then it seems better to have the host present clear information.
"limit the false positives" is just "limit the bugs to some unlucky people". It's not really a great situation. |
Agreed. This failure mode you mention, "something routing off EC2 IMDS" or launching EC2 instances with "metadata access: disabled", is the same type of failure mode we can see in LXD when someone configures On EC2 with disabled IMDS at instance launch we see ds-identify detect the EC2 platform, yet DataSourceEc2.get_data ends up warning: and falling back to DataSourceNone when it doesn't find any matching active DS.
My phrasing was probably wrong above. The "false positive" I meant was only the ds-identify returning DS_FOUND even on containers which were launched with |
TheRealFalcon
left a comment
LGTM with a small docs nit.
@smoser, does Chad's explanation make sense? Do you still have concerns here?
VMs will not start lxd-agent.service in the systemd generator timeframe, which means /dev/lxd/sock will not exist yet on an LXD VM. For VM support, ds-identify will return DS_FOUND when /sys/class/dmi/id/board_name == "LXD", which exists at early boot regardless of LXD socket status.
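The two-step detection described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `dscheck_lxd_sketch`, the `SYS_ROOT` and `LXD_SOCK` variables, and the numeric values assigned to `DS_FOUND` and `DS_NOT_FOUND` are assumptions made so the example can run outside early boot.

```shell
#!/bin/sh
# Sketch only. The return codes and the SYS_ROOT/LXD_SOCK overrides are
# assumptions for this example, not ds-identify's real globals.
DS_FOUND=0
DS_NOT_FOUND=1
SYS_ROOT="${SYS_ROOT:-}"                # hypothetical prefix, handy for tests
LXD_SOCK="${LXD_SOCK:-/dev/lxd/sock}"   # hypothetical override of the socket path

dscheck_lxd_sketch() {
    # Containers: the socket already exists at generator time.
    [ -S "$LXD_SOCK" ] && return $DS_FOUND

    # VMs: the socket only appears once lxd-agent.service runs, so fall
    # back to the DMI board name, which sysfs exposes at early boot.
    board_file="$SYS_ROOT/sys/class/dmi/id/board_name"
    if [ -r "$board_file" ]; then
        # read may return non-zero if the file lacks a trailing newline,
        # but the variable is still assigned.
        read -r board_name < "$board_file" || :
        [ "$board_name" = "LXD" ] && return $DS_FOUND
    fi
    return $DS_NOT_FOUND
}
```

Pointing `SYS_ROOT` at a scratch directory that contains a fake `sys/class/dmi/id/board_name` file lets both branches be exercised without an actual LXD VM.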
|
I noticed this breaks the pattern of printing the DMI info we gather. My question is basically: "why do we break the pattern?", along with a proposal for how we might follow the pattern. If we don't care about this data or printing it for some reason (I assume this is just for debugging), feel free to disregard. Is something along these lines, perhaps: in addition to replacing |
|
Excellent review suggestion @holmanb and +1 on this sentiment, we frequently use /run/cloud-init/ds-identify.log to triage and debug errors in systemd generator time detection. So, having this output factored into that print is helpful for |
To aid in triage: read_dmi_board_name needs to print DMI board name. Also factor out is_socket_file to aid in unit test of LXD and LXD-kvm.
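As a rough illustration of why this factoring helps, the sketch below separates the socket test and the DMI read into small functions, so a unit test can stub one of them. Only the names `read_dmi_board_name` and `is_socket_file` come from the commit message; their bodies, the `check_lxd` wrapper, and the `DMI_DIR` and `LOG_FILE` variables are assumptions for this example rather than the merged implementation.

```shell
#!/bin/sh
# Sketch only. DMI_DIR, LOG_FILE and check_lxd are hypothetical names
# introduced for this example.
DMI_DIR="${DMI_DIR:-/sys/class/dmi/id}"
LOG_FILE="${LOG_FILE:-/dev/null}"

# Isolated so that a test can redefine it instead of creating a real socket.
is_socket_file() {
    [ -S "$1" ]
}

# Reads the board name and records it, so the value shows up in the
# debug log used for triage.
read_dmi_board_name() {
    DMI_BOARD_NAME=""
    if [ -r "$DMI_DIR/board_name" ]; then
        read -r DMI_BOARD_NAME < "$DMI_DIR/board_name" || :
    fi
    echo "DMI_BOARD_NAME=$DMI_BOARD_NAME" >> "$LOG_FILE"
}

check_lxd() {
    is_socket_file /dev/lxd/sock && return 0
    read_dmi_board_name
    [ "$DMI_BOARD_NAME" = "LXD" ]
}
```

A test can then redefine `is_socket_file` to always succeed (container case) or always fail (VM case) and point `DMI_DIR` at a temporary directory, covering both paths on any machine.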
88f3410 to b4f5896
|
@blackboxsw , is this ready for review again? |
|
@TheRealFalcon yes
Yes, no changes expected here. I think we were waiting on a response from @smoser about whether he objects to this approach, but I believe I explained its merit with respect to his original concerns. |