update catalog image unpacking process#2
update catalog image unpacking process#2anik120 merged 3 commits intooperator-framework:mainfrom everettraven:feature/processing-speedup
Conversation
to use a Job resource to unpack image contents in parallel. This increases the speed in which catalogsources are processed and children resources are created. Signed-off-by: Bryce Palmer <everettraven@gmail.com>
| return ctrl.Result{}, err | ||
| } | ||
|
|
||
| // TODO: Can we create these resources in parallel using goroutines? |
There was a problem hiding this comment.
Wasn't entirely sure if it would be worth exploring this so figured I would pose this as question - Should we explore this route?
There was a problem hiding this comment.
Did some testing regarding this, and splitting the Package and BundleMetadata creation into 2 goroutines only resulted in an ~6 second increase in resource creation. I suspect that breaking the BundleMetadata creations into multiple goroutines we may make a more significant gain since it is the larger portion of creation requests.
There was a problem hiding this comment.
After attempting to break the BundleMetadata CR creation into multiple goroutines, it appears that did not result in any extra speed increase.
Due to these brief findings, I think using coroutines is adding additional complexity with minimal benefit and is probably best to leave out for now.
anik120
left a comment
There was a problem hiding this comment.
Great work on this Bryce 🎉
I left some comments, and the only comment about the PR structure I have is that when there's a PR that requires a bunch of related changes, it's better for reviewing if there are multiple commits, instead of one commit carrying of all the changes.
For example, instead of having this section in the PR commit message's body:
A brief overview of the changes:
A couple new functions in catalogsource_controller.go and removal of the containerdregistry approach
go mod tidy related updates
Introduction of a new UnpackPhaseError type that is used to recognize when a Job pod is not yet in the Succeeded state.
A couple manifest updates to ensure the controller has the proper permissions to run as expected
A couple helper scripts for testing:
reset.sh deletes and recreates a KinD cluster and then applies the necessary manifests to get everything up and running
timetest.sh can be used by running time ./timetest.sh to evaluate the time it takes for the sample CatalogSource CR to have all its child resources created on the cluster.
If the commit is divided into several commits that are logical units of the changes, maybe something like
commit 1: update dependencies (go mod changes): commit message explains why this is needed
commit 2: code changes to controller (including RBAC changes)
commit 3: test/bootstrap scripts
then it's easier to review commit by commit + you get more of an opportunity to present the logical units of your changes in the commits themselves rather than needing to explain them in a comment in the PR (which for eg doesn't show up if you do git blame).
| - resources: | ||
| - pods | ||
| verbs: | ||
| - create | ||
| - delete | ||
| - get | ||
| - list | ||
| - patch | ||
| - update | ||
| - watch | ||
| - apiGroups: | ||
| - batch | ||
| resources: | ||
| - jobs | ||
| verbs: | ||
| - create | ||
| - delete | ||
| - get | ||
| - list | ||
| - patch | ||
| - update | ||
| - watch | ||
| - apiGroups: | ||
| - batch | ||
| resources: | ||
| - pods/logs | ||
| verbs: | ||
| - create | ||
| - delete | ||
| - get | ||
| - list | ||
| - patch | ||
| - update | ||
| - watch |
There was a problem hiding this comment.
Do we need ALL of these? I.e it's better we start with the minimal set of required permission and then work our way up to everything we need
There was a problem hiding this comment.
Good point. Just a side note, this file isn't even used in the workflow for spinning this PoC up - I'll just revert these changes for now.
| resources: | ||
| - 'jobs' | ||
| verbs: | ||
| - '*' |
There was a problem hiding this comment.
Probably should just start off with just create
There was a problem hiding this comment.
Set verbs to create, get, list, watch in 85b94cb
| Cfg *rest.Config | ||
| } | ||
|
|
||
| const opmImage = "quay.io/operator-framework/opm:v1.26" |
There was a problem hiding this comment.
This should ideally be configurable via a flag to the binary that is then parsed, instead of hardcoding it in code.
There was a problem hiding this comment.
Made it configurable via a flag in the controller manager binary in commit f7e8156
| ## Loop until all packages are created (170) | ||
| packages=0 | ||
|
|
||
| until [ $packages -eq 170 ] |
There was a problem hiding this comment.
A nit but good hygiene to include a timeout here. Even though we're using a tagged image (that I own), if we use this concept of testing in the implementation repo where we'll have more official images to test, we must include a timeout since the underlying content of these images can always change (unless we use image shas to pin these down).
There was a problem hiding this comment.
Fair point. This script was really just thrown together for verifying how fast the resources were created. I'm honestly not sure how useful it is on it's own so I'll probably just remove it altogether.
| @@ -0,0 +1,6 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Any reason this is not just a makefile target?
There was a problem hiding this comment.
Honestly forgot this was included in the PR and was more so meant as an easy way for me to reset the kind cluster. If this is something we think would be useful I can create a Makefile target.
Otherwise I can just remove this script altogether.
There was a problem hiding this comment.
That's fine. We'll need to do a makefile targets clean up anyway but that can be punted for when we're in the "not PoC anymore" stage.
Signed-off-by: Bryce Palmer <everettraven@gmail.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
…ster (#209) * Add Ingress overlay to access `Catalog` contents outside of the cluster Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com> * Address review feedback Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com> * Address review feedback #1 Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com> * Address feedback #2 Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com> --------- Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com>
Description:
Updates the catalog image unpacking process to use a
Jobfor unpacking the image contents to STDOUT usingopm render <catalog-image>and then scraping the pod logs to use during the reconciliation process and creation of child resources.A brief overview of the changes:
catalogsource_controller.goand removal of the containerdregistry approachgo mod tidyrelated updatesUnpackPhaseErrortype that is used to recognize when a Job pod is not yet in the Succeeded state.reset.shdeletes and recreates a KinD cluster and then applies the necessary manifests to get everything up and runningtimetest.shcan be used by runningtime ./timetest.shto evaluate the time it takes for the sample CatalogSource CR to have all its child resources created on the cluster.Motivations:
Additional Context:
Both the
docker.io/anik120/rukpak-packageserver:latestanddocker.io/anik120/catalogsource-controller:latestimages have been updated with these changes.