Skip to content

[RELEASE-1.25][SRVCOM-1857] Upgrade unavailable/deprecated K8s APIs#1710

Merged
matzew merged 3 commits into
openshift-knative:release-1.25from
skonto:deprecated_apis_1.25
Sep 20, 2022
Merged

[RELEASE-1.25][SRVCOM-1857] Upgrade unavailable/deprecated K8s APIs#1710
matzew merged 3 commits into
openshift-knative:release-1.25from
skonto:deprecated_apis_1.25

Conversation

@skonto
Copy link
Copy Markdown
Collaborator

@skonto skonto commented Sep 17, 2022

Proposed Changes

  • Similar to Add DowngradePodDisruptionBudget to modify API version of PodDisruptionBudget #1571 but extending it more and no need to modify existing manifests (just an alternative).

  • If we are on 1.24+ (we want to avoid warnings on 4.11 too):

    a) upgrade: PodDisruptionBudget and HPA (the real problem for the latter will be on 1.26 but let's update anyway to avoid warnings)

    b) in Eventing tests use the latest CronJob

  • This will allow 1.25 to install on 4.12 and then users can upgrade to 1.26 to move further to 4.13 (1.26). In 1.26 we will need to fix HPA also at the Serving branches 1.5+, otherwise autoscaler-hpa will fail too.

  • Without this when we run on 1.25 (4.12) we will get the following (tested upstream operator on minikube since we don't have yet OCP 4.12 with K8s 1.25, see discussion):

$ kubectl get pods -n knative-eventing
NAME                                  READY   STATUS    RESTARTS   AGE
eventing-controller-596fbfcd7-hcsgp   1/1     Running   0          13m

$ kubectl get KnativeEventing knative-eventing -n knative-eventing -oyaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operator.knative.dev/v1beta1","kind":"KnativeEventing","metadata":{"annotations":{},"name":"knative-eventing","namespace":"knative-eventing"}}
  creationTimestamp: "2022-09-17T21:46:05Z"
  finalizers:
  - knativeeventings.operator.knative.dev
  generation: 1
  name: knative-eventing
  namespace: knative-eventing
  resourceVersion: "1597"
  uid: f27a2183-3dcd-456e-82eb-0438d7acb028
status:
  conditions:
  - lastTransitionTime: "2022-09-17T21:46:05Z"
    status: Unknown
    type: DependenciesInstalled
  - lastTransitionTime: "2022-09-17T21:46:05Z"
    status: Unknown
    type: DeploymentsAvailable
  - lastTransitionTime: "2022-09-17T21:46:22Z"
    message: 'Install failed with message: no matches for kind "PodDisruptionBudget"
      in version "policy/v1beta1"'
    reason: Error
    status: "False"
    type: InstallSucceeded
  - lastTransitionTime: "2022-09-17T21:46:22Z"
    message: 'Install failed with message: no matches for kind "PodDisruptionBudget"
      in version "policy/v1beta1"'
    reason: Error
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-09-17T21:46:05Z"
    status: "True"
    type: VersionMigrationEligible
  observedGeneration: 1

$ oc logs knative-operator-5db994494f-kbk6r | grep error
...
  {"severity":"ERROR","timestamp":"2022-09-17T21:49:18.03001213Z","logger":"knative-operator","caller":"controller/controller.go:566","message":"Reconcile error","knative.dev/pod":"knative-operator-5db994494f-kbk6r","knative.dev/controller":"knative.dev.operator.pkg.reconciler.knativeeventing.Reconciler","knative.dev/kind":"operator.knative.dev.KnativeEventing","knative.dev/traceid":"cf787e6f-0b08-4149-8da4-7157008a5f5e","knative.dev/key":"knative-eventing/knative-eventing","duration":"8.937510645s","error":"failed to apply non rbac manifest: no matches for kind \"PodDisruptionBudget\" in version \"policy/v1beta1\"","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/pkg@v0.0.0-20220601013938-bac16f2394ce/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20220601013938-bac16f2394ce/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20220601013938-bac16f2394ce/controller/controller.go:491"}

@openshift-ci openshift-ci Bot requested review from aliok and mgencur September 17, 2022 22:08
@skonto skonto requested review from matzew, nak3 and pierDipi and removed request for aliok September 17, 2022 22:09
@skonto skonto changed the title [WIP][RELEASE-1.25][SRVCOM-1857] Upgrade deprecated apis [WIP][RELEASE-1.25][SRVCOM-1857] Upgrade deprecated/unavailable apis Sep 17, 2022
@skonto skonto changed the title [WIP][RELEASE-1.25][SRVCOM-1857] Upgrade deprecated/unavailable apis [WIP][RELEASE-1.25][SRVCOM-1857] Upgrade unavailable/deprecated K8s APIs Sep 17, 2022
}
}

// CheckMinimumVersion checks if the version in the arg meets the requirement or not.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the requirement is a little hard to understand.
Can we perhaps link to its definition?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah will do it is a relic of the past.

- fs.StringVar(&c.Kubeconfig, "kubeconfig", os.Getenv("KUBECONFIG"),
- "Path to a kubeconfig. Only required if out-of-cluster.")
-
+ if fs.Lookup("kubeconfig") == nil {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required due to double vendoring for controller runtime and go client.

capabilities:
drop:
- all
- ALL
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using all will not work. Warnings still appear. Need capital letters for capabilities.

Comment thread templates/csv.yaml
- all

- ALL
seccompProfile:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik this is available since 1.19 so we should not have issues on previous versions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The policy/v1beta1 API version of PodDisruptionBudget will no longer be served in v1.25.
// The autoscaling/v2beta2 API version of HorizontalPodAutoscaler will no longer be served in v1.26
// TODO: When we move away from releases that bring v1beta1 we can remove this part
if err := CheckMinimumVersion(d, "1.24.0"); err == nil {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picking 1.24 instead of 1.25 here to avoid warnings 4.11 and a scenario where restricted policy is enforced.

@skonto skonto changed the title [WIP][RELEASE-1.25][SRVCOM-1857] Upgrade unavailable/deprecated K8s APIs [RELEASE-1.25][SRVCOM-1857] Upgrade unavailable/deprecated K8s APIs Sep 19, 2022
Comment on lines +56 to +69
case "Deployment":
deployment := &appsv1.Deployment{}
if err := scheme.Scheme.Convert(u, deployment, nil); err != nil {
return fmt.Errorf("failed to convert Unstructured to Deployment: %w", err)
}
obj := deployment
podSpec := &deployment.Spec.Template.Spec
containers := podSpec.Containers
for i := range containers {
setPodSecurityContext(&containers[i])
}
if err := scheme.Scheme.Convert(obj, u, nil); err != nil {
return err
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventing needs stateful set support as well.

I can log an issue and I can follow up if you don't want to do it in this PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can do it.

Copy link
Copy Markdown
Collaborator Author

@skonto skonto Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I dont see any manifests with StatefulSet.Could you point me to them? Didn't see them in audit reports.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are coming in a future release, I thought this will land on main first.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok wanted to unblock testing that is why I did it on 1.25 first.

// TODO: When we move away from releases that bring v1beta1 we can remove this part
if err := CheckMinimumVersion(d, "1.24.0"); err == nil {
transformers = append(transformers, UpgradePodDisruptionBudget(), UpgradeHorizontalPodAutoscaler(), SetSecurityContextForAdmissionController())
}
Copy link
Copy Markdown
Member

@pierDipi pierDipi Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other case, should we try to do the opposite (downgrading)?

Copy link
Copy Markdown
Collaborator Author

@skonto skonto Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned in the description that this is an alternative. I see upgrading easier because otherwise we will have to patch 1.4 midstream branches (more work). Does not make sense to only patch the downloaded manifests in S-O repo you have to fix the mid stream ones too to be consistent. We can downgrade on main if that works better but dont see any benefits.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a manifest is at v1 (upgraded version) but the k8s version doesn't understand v1 because it was introduced in a future release.

Copy link
Copy Markdown
Collaborator Author

@skonto skonto Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if we have midstream manifests that are already upgraded yes. Do we have any? For Serving we might have in future branches eg 1.5+. But that is a concern for the main patch I guess.

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

Two timeouts for not getting a claim another infra failure :(

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

/retest

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

infra issues.

: Run multi-stage test operator-e2e-aws-ocp-411 expand_less | 1h0m0s
-- | --
{  failed to wait for the created cluster claim to become ready: timed out waiting for the condition}

level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

/retest

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

Ok need to avoid setting it for 4.6, I accidentally set it for kube-rbac-proxy always by default. Will fix shortly.

                    {
                        "lastTransitionTime": "2022-09-19T13:42:53Z",
                        "lastUpdateTime": "2022-09-19T13:42:53Z",
                        "message": "pods \"activator-66947549dc-lq4vg\" is forbidden: unable to validate against any security context constraint: [pod.metadata.annotations.container.seccomp.security.alpha.kubernetes.io/kube-rbac-proxy: Forbidden: seccomp may not be set]",
                        "reason": "FailedCreate",
                        "status": "True",
                        "type": "ReplicaFailure"
                    },

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

/retest

2 similar comments
@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

/retest

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

/retest

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

/retest

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

Error from server: etcdserver: request timed out

/retest

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 19, 2022

@matzew @pierDipi I think we can stamp it.

@matzew
Copy link
Copy Markdown
Member

matzew commented Sep 20, 2022

/test 4.6-upstream-e2e-mesh-aws-ocp-46

@matzew
Copy link
Copy Markdown
Member

matzew commented Sep 20, 2022

@matzew @pierDipi I think we can stamp it.

I am positive too, I have seen 4.6 infra (cluster pool) issues before on my others. 4.11 works fine...

I gave it one more retry

@matzew
Copy link
Copy Markdown
Member

matzew commented Sep 20, 2022

INFO[2022-09-20T06:26:35Z] Releasing cluster claims for test upstream-e2e-mesh-aws-ocp-46 

/lgtm
/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Sep 20, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matzew, skonto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +520 to +527
securityContext:
allowPrivilegeEscalation: false
runAsNonRoot: true
capabilities:
drop:
- ALL
seccompProfile:
type: RuntimeDefault
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we see if we can land those on upstream? Or directly on midstream first ?

Copy link
Copy Markdown
Collaborator Author

@skonto skonto Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1.26 yes that is also the goal for 1.25 we want to move fast. That is why I didnt start with main branch.
Upstream there is a ticket for fixing stuff for 1.8 let's see what we can port back.

@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 0 against base HEAD 75853f5 and 2 for PR HEAD 0f22fbd in total

@matzew
Copy link
Copy Markdown
Member

matzew commented Sep 20, 2022

@skonto

Is this known, or infra (b/c of timeout)?:

    service.go:55: Error creating ksvc: Post "https://api.ci-ocp-4-11-amd64-aws-us-east-1-kmznv.hive.aws.ci.openshift.org:6443/apis/serving.knative.dev/v1/namespaces/serverless-tests/services": dial tcp: lookup api.ci-ocp-4-11-amd64-aws-us-east-1-kmznv.hive.aws.ci.openshift.org on 172.30.0.10:53: read udp 10.130.128.78:36372->172.30.0.10:53: i/o timeout

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 20, 2022

It is infra afaik, cluster access times out occasionally.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Sep 20, 2022

@skonto: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.6-upstream-e2e-mesh-aws-ocp-46 0f22fbd link false /test 4.6-upstream-e2e-mesh-aws-ocp-46
ci/prow/4.11-operator-e2e-aws-ocp-411 0f22fbd link true /test 4.11-operator-e2e-aws-ocp-411

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@skonto
Copy link
Copy Markdown
Collaborator Author

skonto commented Sep 20, 2022

/retest

@matzew matzew merged commit 0c21095 into openshift-knative:release-1.25 Sep 20, 2022
openshift-merge-robot pushed a commit that referenced this pull request Oct 26, 2022
* [RELEASE-1.25][SRVCOM-1857] Upgrade unavailable/deprecated K8s APIs (#1710)

* upgrade from deprecated apis

* psec

* fix rbac proxy

* remove seccomprofile (#1723)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants