[kubevirt] Add auto-detection of the correct RHCOS base image #994
rmohr wants to merge 2 commits into openshift:main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: rmohr. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.
Force-pushed from f61bee4 to b9511a6
davidvossel
left a comment
hey, I started thinking about this more. One thing that this PR exposed to me is that we already have the ability to detect the openstack url correctly. I didn't realize that.
Since we have this ability, I think it would be good for us to go ahead and support using the openstack images with DataVolumeTemplates, so we have a somewhat practical path forward for users even if RFE-2501 doesn't land in 4.11
thoughts?
```go
func kubevirtMachineTemplateSpec(nodePool *hyperv1.NodePool, containerDiskChecksum string) *capikubevirt.KubevirtMachineTemplateSpec {
	template := nodePool.Spec.Platform.Kubevirt.NodeTemplate.DeepCopy()
	dataVolumeTemplates := template.Spec.DataVolumeTemplates
	for i, dv := range dataVolumeTemplates {
		if dv.Name == "containervolume" {
			if dv.Spec.Source != nil && dv.Spec.Source.Registry != nil &&
				dv.Spec.Source.Registry.URL != nil && strings.Contains(*dv.Spec.Source.Registry.URL, "{rhcos:version}") {
				image := strings.ReplaceAll(*dv.Spec.Source.Registry.URL, "{rhcos:version}", containerDiskChecksum)
				dataVolumeTemplates[i].Spec.Source.Registry.URL = &image
			}
		}
	}
	volumes := template.Spec.Template.Spec.Volumes
	for i, volume := range volumes {
		if volume.Name == "containervolume" {
			if volume.ContainerDisk != nil && strings.Contains(volume.ContainerDisk.Image, "{rhcos:version}") {
				volumes[i].ContainerDisk.Image = strings.ReplaceAll(volume.ContainerDisk.Image, "{rhcos:version}", containerDiskChecksum)
			}
		}
	}
```
While we're looking at this, I think it would be wise to go ahead and shore up the ability to use the openstack urls with DataVolumes so we have a solution we can support easily even without RFE-2501
To do that, can we make multiple magic template strings here?
- `{rhcos:openstack:url}` - used to template the full https:// url of the openstack qcow
- `{rhcos:openstack:checksum}` - used to template just the checksum part
- `{rhcos:version:major}` - used to template the major version in use
- `{rhcos:version:minor}` - used to template the minor version in use
Then we can keep everything exactly the way it is today. Someone could set the container disk image using the cli with
--containerdisk=quay.io/containerdisks/rhcos-{rhcos:major}.{rhcos:minor}:{rhcos:openstack:checksum}
and if someone wanted to use a DataVolumeTemplate that referenced the openstack qcow url directly, they'd construct the NodePool with a DataVolumeTemplate in the kubevirt platform bit that looked like this.
```yaml
dataVolumeTemplates:
- metadata:
    name: rhcos-dv
  spec:
    pvc:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    source:
      http:
        url: "{rhcos:openstack:url}"
```
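Expanding these proposed magic strings could be a plain string substitution; here is a minimal Go sketch (the `rhcosInfo` struct, field names, and `expandRHCOSTokens` function are illustrative, not part of the PR):

```go
package main

import (
	"fmt"
	"strings"
)

// rhcosInfo holds values that would be detected from the release payload.
// The struct and field names are illustrative only.
type rhcosInfo struct {
	openstackURL      string
	openstackChecksum string
	major, minor      string
}

// expandRHCOSTokens substitutes the proposed magic template strings
// with the detected values.
func expandRHCOSTokens(s string, info rhcosInfo) string {
	return strings.NewReplacer(
		"{rhcos:openstack:url}", info.openstackURL,
		"{rhcos:openstack:checksum}", info.openstackChecksum,
		"{rhcos:version:major}", info.major,
		"{rhcos:version:minor}", info.minor,
	).Replace(s)
}

func main() {
	info := rhcosInfo{
		openstackURL:      "https://example.com/rhcos-openstack.x86_64.qcow2.gz",
		openstackChecksum: "abc123",
		major:             "4",
		minor:             "11",
	}
	fmt.Println(expandRHCOSTokens(
		"quay.io/containerdisks/rhcos-{rhcos:version:major}.{rhcos:version:minor}:{rhcos:openstack:checksum}",
		info))
	// → quay.io/containerdisks/rhcos-4.11:abc123
}
```

Strings without any token pass through unchanged, so the same expansion can run over every containerdisk image and DataVolume source URL in the template.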
the big idea I'm trying to work with here is a way to get us where we want to be "consuming officially published container disks" while allowing us to have a stop gap for end users "consume openstack urls for DataVolumes" and a path for development to "unofficially consume container disks"
if we template all of this, then we can meet all those use cases in a way that doesn't make us have to reverse course necessarily.
I am undecided. At least `{rhcos:openstack:url}` probably makes sense, but I am not really a fan of http-based import. I would prefer to also import into datavolumes via a container disk. The raw conversion involved and the missing integrity checks are something I really don't like about datavolumes in general.
Regarding
--containerdisk=quay.io/containerdisks/rhcos-{rhcos:major}.{rhcos:minor}:{rhcos:openstack:checksum}
Could you explain what the purpose of the major and minor template parameters is? I can understand the checksum, because it gives the exactly matching disk. If I want a specific image, I can specify it with a clear tag. Templating major/minor would for instance mean that I don't know which version I install, but I assume that a dedicated repo exists for that specific version.
```go
	// TODO (nargaman): replace with official container image, after RFE-2501 is completed
	// As long as there is no official container image
	// The image must be provided by user
	// Otherwise it must fail
	if o.ContainerDiskImage != "" && o.ContainerDiskRepo != "" {
		return errors.New("only one of \"--containerdisk\" and \"--containerdisk-repository\" can be specified")
	}
	if o.ContainerDiskImage == "" && o.ContainerDiskRepo == "" {
		return errors.New("the container disk image for the Kubevirt machine must be provided by user (\"--containerdisk\" or \"containerdisk-repository\" flag)")
	}
	if o.ContainerDiskRepo != "" {
		o.ContainerDiskImage = o.ContainerDiskRepo + ":{rhcos:version}"
	}
```
maybe we can get away with not needing to specify a new container-disk-repo cli arg at all, and just simply use the templated values in the container-disk string.
There was a problem hiding this comment.
I am unsure. I think that --containerdisk-repository is much clearer than asking people to add various different template parameters. At least when finally dealing with air-gapped installations, it is much clearer to just point to the mirror repo. What do you think?
@rmohr to summarize our discussion about this issue: we'll move ahead with two templated values. That allows us to use the http import immediately and begin using the "unofficial" container disks as well.
Force-pushed from b9511a6 to 1b0e13d
✔️ Deploy Preview for hypershift-docs ready!
🔨 Explore the source changes: bd2113f
🔍 Inspect the deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/62276db50826e50007571258
😎 Browse the preview: https://deploy-preview-994--hypershift-docs.netlify.app/how-to/kubevirt-platform
```go
	// Validate platform specific input.
	var ami string
	if nodePool.Spec.Platform.Type == hyperv1.AWSPlatform {
		rhcosInfo := &rhcosDetails{}
```
@enxebre I guess this would be another interesting point where potentially every provider except "none" would have to diverge regarding what exactly to render in the machine template. Did you already think about a structure for how you want this to look?
I just hacked something in here at the moment.
@davidvossel I think the draft should now look like we agreed. Can you take a look?
davidvossel
left a comment
@rmohr makes sense, the only thing I question is if we should support default behavior that automatically uses a datavolume+openstack rhcos url when people perform a hypershift create cluster
ideally, i'd like for this just to work out of the box today with hypershift create cluster kubevirt -name my-cluster if possible without needing to know anything about disks.
Do you think using the openstack url + datavolume would be a practical default at this point?
```diff
-	if opts.KubevirtPlatform.ContainerDiskImage == "" {
-		return errors.New("the container disk image for the Kubevirt machine must be provided by user (\"--containerdisk\" flag)")
+	if opts.KubevirtPlatform.ContainerDiskImage != "" && opts.KubevirtPlatform.ContainerDiskRepo != "" {
+		return errors.New("only one of \"--containerdisk\" and \"--containerdisk-repository\" can be specified")
+	}
+	if opts.KubevirtPlatform.ContainerDiskImage == "" && opts.KubevirtPlatform.ContainerDiskRepo == "" {
+		return errors.New("the container disk image for the Kubevirt machine must be provided by user (\"--containerdisk\" or \"--containerdisk-repository\" flag)")
+	}
+	if opts.KubevirtPlatform.ContainerDiskRepo != "" {
+		opts.KubevirtPlatform.ContainerDiskImage = fmt.Sprintf("%v:%v", opts.KubevirtPlatform.ContainerDiskRepo, util.RHCOSOpenStackChecksumParameter)
```
if neither the container disk nor the repo is provided, can we just default to using the datavolume http import?
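The fallback being discussed can be sketched as flag handling alone; the function name and return shape below are my own, not the PR's:

```go
package main

import "fmt"

// resolveRootDiskSource decides where the root disk comes from:
// an explicit container disk image, a repository templated with the
// checksum token, or (by default) a DataVolume http import of the
// templated openstack URL. Names are illustrative.
func resolveRootDiskSource(containerDiskImage, containerDiskRepo string) (image string, useHTTPImport bool, err error) {
	switch {
	case containerDiskImage != "" && containerDiskRepo != "":
		return "", false, fmt.Errorf("only one of --containerdisk and --containerdisk-repository can be specified")
	case containerDiskImage != "":
		// explicit image wins, used verbatim
		return containerDiskImage, false, nil
	case containerDiskRepo != "":
		// repo given: tag it with the checksum token for later expansion
		return containerDiskRepo + ":{rhcos:openstack:checksum}", false, nil
	default:
		// neither flag set: default to the DataVolume http import
		return "", true, nil
	}
}

func main() {
	img, httpImport, _ := resolveRootDiskSource("", "quay.io/containerdisks/rhcos")
	fmt.Println(img, httpImport)
}
```

With this shape, `hypershift create cluster kubevirt` without any disk flags would take the default branch and fall through to the http import.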
Force-pushed from 1b0e13d to 7aa5d4a
Makes sense, working on it 👍
Force-pushed from 3ba071a to 10c7d55
davidvossel
left a comment
getting close. I just have reservations about the modifications to the nodepool api. Maybe we can get what we're looking for using the hypershift cli with the existing api?
```diff
 			Disks: []kubevirtv1.Disk{
 				{
-					Name: "containervolume",
+					Name: util.RHCOSMagicVolumeName,
 					DiskDevice: kubevirtv1.DiskDevice{
 						Disk: &kubevirtv1.DiskTarget{
 							Bus: "virtio",
```
I think we need to do the same sort of disk/volume defaulting here as in the api/fixtures/example.go ? It's unclear to me why we have two locations where it looks like the VirtualMachineInstance is specified for the create command.
I was experimenting here a little bit with the API. The idea is that people and the hypershift command-line tool no longer specify the root disk at all by default; it just gets added automatically. As a fallback, the rhcos volume can be overridden completely. But instead of having to always completely override it, a RootVolume section is added, similar to AWS, which only exposes size and storage class.
```go
	// RootVolume specifies configuration for the root volume of node instances.
	//
	// +optional
	RootVolume *KubeVirtRootVolume `json:"rootVolume,omitempty"`
```
can we get away with not adding this RootVolume info to the nodepool?
Would it be possible to just have the size and storage class as cmd options that render the NodeTemplate with the expected size/storage class on the datavolumetemplate?
```go
	// RootVolume specifies configuration for the root volume of node instances.
	//
	// +optional
	RootVolume *KubeVirtRootVolume `json:"rootVolume,omitempty"`
```
can we get away with not adding this RootVolume info to the nodepool?
Would it be possible to just have the size and storage class as cmd options that render the NodeTemplate with the expected size/storage class on the datavolumetemplate?
weird this exact same comment got posted twice. Not sure how that happened.
Force-pushed from 10c7d55 to 0d3820e
@davidvossel PTAL
davidvossel
left a comment
/lgtm
This looks great. Maybe it's time to revise the title and description to make it a real non-WIP PR?
I'll check this code out and give it a run locally at some point soon to make sure the default workflow works as expected with the datavolume support. The kubevirt e2e lanes will verify the container disk flow.
```go
	if (opts.KubevirtPlatform.ContainerDiskImage != "" && opts.KubevirtPlatform.ContainerDiskRepo != "") && (opts.KubevirtPlatform.RootVolumeSize > 0 || opts.KubevirtPlatform.RootVolumeStorageClass != "") {
		return errors.New("container disk options (\"--containerdisk*\" flag) can not be used together with customized root volume options (\"--root-volume-*\" flags)")
	}
```
I think this is fine for now. We could technically support this combination in the future if we wanted to though.
davidvossel
left a comment
/lgtm cancel
found a minor issue during testing.
```go
	} else {
		dataVolume := defaultDataVolume()
		setDataVolumeDefaults(&dataVolume, o)
		exampleTemplate.NodeTemplate.Spec.DataVolumeTemplates = []kubevirtv1.DataVolumeTemplateSpec{dataVolume}
	}
	return exampleTemplate
```
you need to set an entry in the exampleTemplate.NodeTemplate.Spec.Template.Spec.Volumes for this data volume
We need three things
- DataVolumeTemplate
- Volume
- Disk
right now this just lacks the volume part. I discovered this while testing.
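For reference, the three pieces must reference each other by name inside the VM template; a minimal sketch (names illustrative, sources omitted for brevity):

```yaml
spec:
  dataVolumeTemplates:        # 1. DataVolumeTemplate: imports and owns the image
  - metadata:
      name: rhcos-dv
    spec: {}                  # source/pvc details omitted here
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rhcos       # 3. Disk: attaches the volume to the guest
            disk:
              bus: virtio
      volumes:
      - name: rhcos           # 2. Volume: the missing piece noted above
        dataVolume:
          name: rhcos-dv      # must match the DataVolumeTemplate name
```

Without the `volumes` entry, the disk has nothing to bind to and the VMI spec is rejected.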
@rmohr I'm not familiar with the kubevirt images here, but wanted to mention that since 4.10 the RHCOS live-iso image is shipped in the release payload via the
@hardys could you point me to the images which are in there? Is it identical to the openstack rhcos image? In general, kubevirt would probably have to support a different disk source type to leverage it. Containerdisks are the canonical way to get images in kubevirt and they have a certain layout.
Prepares the hypershift cli to auto-detect the required rhcos version
from the cluster release payload. Since this is not known at cluster
creation time, template parameters are introduced:
`{rhcos:openstack:checksum}` for container disks, and
`{rhcos:openstack:url}` for datavolume imports.
The hypershift cli is extended in the following way:
* By default, the hypershift CLI now adds a datavolume which will
  http-import the auto-detected rhcos image later on.
* `--containerdisk-repository` can be specified to let the cluster
  figure out the right rhcos version when containerdisks should be
  used. Example:
  `--containerdisk-repository=quay.io/containerdisks/rhcos`.
* `--root-volume-size` and `--root-volume-storage-class` can be used to
  customize the datavolume.
Finally, the nodepool-create and the cluster-create commands now share the
same kubevirt template to remove the duplication.
Signed-off-by: Roman Mohr <rmohr@redhat.com>
First, the OpenStack RHCOS image belonging to the current release is
detected. Then, if `{rhcos:openstack:checksum}` or
`{rhcos:openstack:url}` are found in the machine template, they are
replaced with the corresponding detected RHCOS version on machine
creation.
A new `ValidRHCOSImage` condition is added, which has the same function
as `ValidAMI` but reports the actually detected RHCOS version. It can
later be used by a broad variety of platforms, not just AWS.
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Force-pushed from 0d3820e to bd2113f
@rmohr I think it only contains the live-iso image currently, with some scripts to extract the ISO (potentially with modifications performed via coreos-installer). It's not identical to the openstack qcow image, but AFAIK the main difference (other than the ISO/QCOW format) is the default platform that decides which ignition data source is enabled; that could be set via another copy script. Maybe this isn't a good fit for kubevirt, but I thought it worth mentioning, since baremetal IPI recently moved from the openstack qcow image to this release payload ISO image, and it potentially makes lifecycling the bootimage over multiple versions a lot easier.
@rmohr: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
As far as I can see, what we have in common is that we are shipping content in containers. KubeVirt just expects a fully working and configured RHCOS, like for openstack, and it can't modify the image on the fly. There is a well-established and popular format on the kubevirt side which we want to integrate with.
/hold
The openstack url download is not really working. Going now for the more final flow. Sorting out details in coreos/stream-metadata-go#41
@rmohr: PR needs rebase.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now please do so.
/lifecycle stale
@rmohr just doing some housekeeping, please reopen if we still want to pursue this
@enxebre: Closed this PR.
👍 was great working a little bit on hypershift with you.
What this PR does / why we need it:
The RHCOS version is now automatically detected based on the release payload. In order to leverage this new information, new ways exist to inject a proper RHCOS image into the NodePool:
Full automatic injection
In this case users simply don't specify a root disk on the NodePool template. Hypershift will figure out the right RHCOS version and inject a DataVolume which imports the correct RHCOS version from the official RHCOS openstack download sources. The new command-line flags `--root-volume-storage-class` and `--root-volume-size` can be used to fill the new `KubeVirtRootVolume` API parts of the platform. If a working storage provider is present, a cluster can now be created like this:
or in a more advanced example
ContainerDisk
The checksum is used to cross-reference the release payload version with the quay.io/containerdisks/rhcos version.
Checks for containerDisks or DataVolumeTemplates which are named `rhcos` and have `{rhcos:openstack:checksum}` set in their container source, and replaces it with the discovered rhcos container version.
As an example:
`quay.io/containerdisks/rhcos:{rhcos:openstack:checksum}` would be modified to `quay.io/containerdisks/rhcos:mytag`, if `mytag` were discovered in the release payload.
On the `hypershift` client, a new flag called `--containerdisk-repository` was added to the kubevirt platform, which can be pointed to any rhcos repository (e.g. `quay.io/containerdisks/rhcos`) which contains containerdisks tagged with the `sha256sum` of the openstack qcow2 image of rhcos. A minimal example to make use of this feature is:
Customized HTTP DataVolume import
Checks for a DataVolumeTemplate which is named `rhcos` and has `{rhcos:openstack:url}` set in its http source, and replaces it with the discovered rhcos download url.
Which issue(s) this PR fixes (optional, use `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, where issue_number might be a GitHub issue, or a Jira story):
Fixes #
Checklist