OCPBUGS-61175: Let cluster-storage-operator to grant NetworkPolicies#6930

Draft
mpatlasov wants to merge 1 commit into openshift:main from mpatlasov:OCPBUGS-61175-NetworkPolicies-RBAC-for-cso-in-hostedcontrolplane

Conversation

@mpatlasov
Contributor

What this PR does / why we need it:

The Manila CSI driver controller pods run in a custom namespace (openshift-manila-csi-driver), so the Manila CSI driver operator must create a NetworkPolicy for them explicitly. cluster-storage-operator is the component responsible for granting the Manila CSI driver operator the permissions to do so, but it can only grant them if it holds the proper permissions itself.
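
The requirement in the last sentence follows from how Kubernetes RBAC prevents privilege escalation: a subject can create or update a Role only if it already holds every permission contained in that Role (or has the `escalate` verb on roles). Below is a minimal Python sketch of that check, not the API server's implementation. The rule dictionaries, the `expand` and `can_grant` helpers, and the "before" rule set are hypothetical and exist only for illustration; the verb list is taken from the diff in the test output on this page. Wildcards and resourceNames are not modeled.

```python
# Simplified model of RBAC escalation prevention (illustration only).
# A rule is a dict with "apiGroups", "resources" and "verbs" lists.

NETWORK_POLICY_RULE = {
    "apiGroups": ["networking.k8s.io"],
    "resources": ["networkpolicies"],
    "verbs": ["watch", "list", "get", "create", "delete", "patch", "update"],
}


def expand(rule):
    """Expand a rule into a set of (apiGroup, resource, verb) triples."""
    return {
        (group, resource, verb)
        for group in rule["apiGroups"]
        for resource in rule["resources"]
        for verb in rule["verbs"]
    }


def can_grant(held_rules, requested_rules):
    """Return True if every requested permission is already held."""
    held = set()
    for rule in held_rules:
        held |= expand(rule)
    requested = set()
    for rule in requested_rules:
        requested |= expand(rule)
    return requested <= held


# Hypothetical "before" state: the granting operator holds unrelated rules only.
before = [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]}]
# Hypothetical "after" state: the NetworkPolicy rule has been added to its role.
after = before + [NETWORK_POLICY_RULE]

print(can_grant(before, [NETWORK_POLICY_RULE]))  # False
print(can_grant(after, [NETWORK_POLICY_RULE]))   # True
```

In this model the grant only succeeds once the granting side's own rule set includes the NetworkPolicy rule, which is the gap this PR aims to close for cluster-storage-operator.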

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Oct 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented Oct 3, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@mpatlasov mpatlasov changed the title Let cluster-storage-operator to grant NetworkPolicies OCPBUGS-61175: Let cluster-storage-operator to grant NetworkPolicies Oct 3, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Oct 3, 2025
@openshift-ci-robot

@mpatlasov: This pull request references Jira Issue OCPBUGS-61175, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels Oct 3, 2025
@mpatlasov
Contributor Author

/test all

@openshift-ci
Contributor

openshift-ci Bot commented Oct 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mpatlasov
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cwbotbot

cwbotbot commented Oct 3, 2025

Test Results

e2e-aks

Failed Tests

Total failed tests: 16

  • TestAutoscaling
  • TestAutoscaling/ValidateHostedCluster
  • TestAzureScheduler
  • TestAzureScheduler/ValidateHostedCluster
  • TestCreateCluster

... and 11 more failed tests

e2e-aws

Failed Tests

Total failed tests: 22

  • TestAutoscaling
  • TestAutoscaling/ValidateHostedCluster
  • TestCreateCluster
  • TestCreateCluster/ValidateHostedCluster
  • TestCreateClusterCustomConfig

... and 17 more failed tests

@mpatlasov
Contributor Author

/retest-required

@mpatlasov mpatlasov force-pushed the OCPBUGS-61175-NetworkPolicies-RBAC-for-cso-in-hostedcontrolplane branch from 691b297 to a9a15ba Compare October 4, 2025 00:28
@mpatlasov
Contributor Author

/test all

…cies

The Manila CSI driver controller pods run in a custom namespace (`openshift-manila-csi-driver`). Hence, the Manila CSI driver operator must
create a NetworkPolicy for them explicitly. cluster-storage-operator is the component responsible for granting the Manila CSI driver
operator the permissions to do so, but it can only grant them if it holds the proper permissions itself.
@mpatlasov mpatlasov force-pushed the OCPBUGS-61175-NetworkPolicies-RBAC-for-cso-in-hostedcontrolplane branch from a9a15ba to 122485a Compare October 6, 2025 01:52
@mpatlasov
Contributor Author

/test all

@mpatlasov
Contributor Author

/retest-required

1 similar comment
@mpatlasov
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented Mar 10, 2026

@mpatlasov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 122485a | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-kubevirt-aws-ovn-reduced | 122485a | link | true | /test e2e-kubevirt-aws-ovn-reduced |
| ci/prow/e2e-aws-upgrade-hypershift-operator | 122485a | link | true | /test e2e-aws-upgrade-hypershift-operator |
| ci/prow/e2e-aws | 122485a | link | true | /test e2e-aws |
| ci/prow/e2e-aks | 122485a | link | true | /test e2e-aks |
| ci/prow/e2e-aks-4-21 | 122485a | link | true | /test e2e-aks-4-21 |
| ci/prow/unit | 122485a | link | true | /test unit |
| ci/prow/e2e-azure-self-managed | 122485a | link | true | /test e2e-azure-self-managed |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot

Stale PRs are closed after 21d of inactivity.

If this PR is still relevant, comment to refresh it or remove the stale label.
Mark the PR as fresh by commenting /remove-lifecycle stale.

If this PR is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2026
@hypershift-jira-solve-ci

Single commit PR — all runs tested the same commit 122485a. The PR updated fixtures for: default (root), IBMCloud, and TechPreviewNoUpgrade — but missed GCP and AROSwift platform-specific fixtures. This is the root cause of the unit test failure, and is a code defect in the PR itself.

Test Failure Analysis Complete

Job Information

  • Prow Jobs: 7 failing checks on PR #6930 (OCPBUGS-61175: Let cluster-storage-operator to grant NetworkPolicies)
  • Build IDs: 1975661416908066816 (e2e-aks), 1975661416954204160 (e2e-aws), 1975661417012924416 (e2e-aws-upgrade), 1975661417046478848 (e2e-kubevirt), 2031419706811879424 (e2e-azure-self-managed)
  • PR Commit: 122485a9f851f20d3a7173b19ae306f6a2088c68
  • PR Author: mpatlasov
  • PR Status: WIP / Stale (open since 2025-10-03; current commit pushed 2025-10-06)

Test Failure Analysis

Error

--- FAIL: TestControlPlaneComponents (43.76s)
    hostedcontrolplane_controller_test.go:1286: got diff between expected and actual result:
        file: .../testdata/cluster-storage-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml
        diff:
        + 	- apiGroups:
        + 	  - networking.k8s.io
        + 	  resources:
        + 	  - networkpolicies
        + 	  verbs:
        + 	  - watch, list, get, create, delete, patch, update

    hostedcontrolplane_controller_test.go:1286: got diff between expected and actual result:
        file: .../testdata/cluster-storage-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml
        (same diff as above)

FAIL	github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane	55.305s

Summary

The PR adds networking.k8s.io/networkpolicies RBAC permissions to the cluster-storage-operator role, updating the source role definition and 3 of 5 test fixtures. However, it missed updating 2 platform-specific test fixtures (GCP and AROSwift), causing TestControlPlaneComponents to fail. The 4 older e2e jobs (e2e-aws, e2e-aks, e2e-aws-upgrade-hypershift-operator, e2e-kubevirt-aws-ovn-reduced) have expired artifacts and cannot be analyzed, but they ran against the same commit. The e2e-azure-self-managed job failed due to CI infrastructure resource exhaustion (Azure quota lease timeout), not a code issue. The 2 Konflux enterprise-contract checks are stale failures from October 2025 that have not been re-run.

Root Cause

The PR modifies control-plane-operator/controllers/hostedcontrolplane/v2/assets/cluster-storage-operator/role.yaml to add NetworkPolicy RBAC permissions. The TestControlPlaneComponents test compares rendered manifests against golden fixture files stored per platform variant. The PR updated 3 fixtures:

  1. testdata/cluster-storage-operator/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml (default)
  2. testdata/cluster-storage-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml
  3. testdata/cluster-storage-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml

But missed 2 platform-specific fixtures:

  1. testdata/cluster-storage-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml
  2. testdata/cluster-storage-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml

The test output explicitly shows the diff — the new networking.k8s.io/networkpolicies rule block is present in the rendered output but absent from the GCP and AROSwift fixture files, causing a golden-file mismatch.
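
The golden-file mismatch described above can be illustrated with a short Python sketch. It is not the HyperShift test helper (the real test is written in Go); the `check_golden` function, the in-memory fixture map, and the shortened rule text are assumptions made for illustration, with the map keys mirroring the five fixture variants named in this report.

```python
# Toy golden-file comparison: the rendered manifest is compared against one
# stored fixture per platform variant, and every stale fixture is reported.

RENDERED = """\
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
"""

# Hypothetical in-memory stand-ins for the per-variant fixture files.
fixtures = {
    "default": RENDERED,
    "IBMCloud": RENDERED,
    "TechPreviewNoUpgrade": RENDERED,
    "GCP": "",       # not regenerated, so the new rule is absent
    "AROSwift": "",  # not regenerated, so the new rule is absent
}


def check_golden(rendered, fixtures, update=False):
    """Return the sorted variants whose fixture differs from the rendered output.

    With update=True the stale fixtures are overwritten in place, which is the
    usual meaning of an UPDATE-style switch in golden-file tests.
    """
    stale = sorted(name for name, want in fixtures.items() if want != rendered)
    if update:
        for name in stale:
            fixtures[name] = rendered
    return stale


print(check_golden(RENDERED, fixtures))               # ['AROSwift', 'GCP']
print(check_golden(RENDERED, fixtures, update=True))  # ['AROSwift', 'GCP']
print(check_golden(RENDERED, fixtures))               # []
```

The first call reports the two variants that were not regenerated; after one pass with `update=True`, the comparison comes back clean for every variant.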

Regarding the other failures:

  • e2e-azure-self-managed (build 2031419706811879424): Failed with failed to acquire lease for "hypershift-azure-quota-slice": resources not found after waiting the full 2-hour timeout. All 20 Azure quota slices were in use. This is a CI infrastructure resource contention issue — the test never ran.
  • e2e-aws, e2e-aks, e2e-aws-upgrade-hypershift-operator, e2e-kubevirt-aws-ovn-reduced (builds 197566141*): GCS artifacts have been garbage-collected (jobs ran ~6 months ago). Specific failure reasons are unrecoverable, but they ran against the same commit with the fixture bug.
  • Konflux enterprise-contract (2 checks): Ran on 2025-10-06 and have not been re-triggered since. These are stale and unrelated to the code change.

Recommendations

  1. Update the missing test fixtures — Run UPDATE=true go test ./control-plane-operator/controllers/hostedcontrolplane/... locally to regenerate all fixture files, as the test output itself suggests. This will update the GCP and AROSwift fixtures with the new NetworkPolicy permissions block.

  2. Re-trigger all CI jobs — After pushing the fixture fix, run /retest to get fresh results on all e2e jobs. The expired jobs need re-execution regardless.

  3. Remove WIP label — The PR is marked do-not-merge/work-in-progress and lifecycle/stale. Once fixtures are fixed and tests pass, remove the WIP designation to enable merge.

  4. Konflux checks will re-run automatically on the next push, resolving the stale enterprise-contract failures.

Evidence

| Evidence | Detail |
| --- | --- |
| Failing test | TestControlPlaneComponents in control-plane-operator/controllers/hostedcontrolplane |
| Unit test build log | gs://test-platform-results/.../pull-ci-openshift-hypershift-main-unit/2031419711840849920/build-log.txt ("result":"FAILURE") |
| Missing fixture (GCP) | testdata/cluster-storage-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml: missing networking.k8s.io/networkpolicies rule |
| Missing fixture (AROSwift) | testdata/cluster-storage-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_storage_operator_role.yaml: missing networking.k8s.io/networkpolicies rule |
| PR files changed | 4 files: role.yaml + 3 fixtures (default, IBMCloud, TechPreviewNoUpgrade); GCP and AROSwift omitted |
| e2e-azure-self-managed | failed to acquire lease for "hypershift-azure-quota-slice": resources not found (infrastructure resource exhaustion, not code-related) |
| e2e-aws, e2e-aks, e2e-aws-upgrade, e2e-kubevirt | GCS artifacts expired (builds from ~6 months ago); failure reason unrecoverable |
| Konflux checks | Stale from 2025-10-06, never re-triggered; not code-related |
| PR commit | Single commit 122485a9f851f20d3a7173b19ae306f6a2088c68 tested across all runs |

Labels

  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
