[occm] Tag load balancers with cluster identity to prevent name collisions#3103

Open
enginrect wants to merge 1 commit into kubernetes:master from enginrect:occm-cluster-id-tag

Conversation

@enginrect

What this PR does / why we need it:

OCCM identifies an existing Octavia load balancer for a Service by name on
the first reconcile (via getLoadbalancerByName). The name format
kube_service_<cluster-name>_<namespace>_<service> defaults to a
<cluster-name> of kubernetes, so two Kubernetes clusters in the same
OpenStack project that happen to use the default cluster-name and have
Services with identical namespace/name produce identical load balancer
names. Octavia does not enforce uniqueness of names, so OCCM in cluster B
ends up adopting and overwriting cluster A's load balancer. This has been
reported repeatedly (see #2241, #2571, #2624) and the standing guidance
"set a unique --cluster-name" is correct but does not actually defend
against the failure mode.

This PR adds a stable Kubernetes cluster identifier - the UID of the
kube-system namespace - as a load balancer tag of the form
kube_cluster_id_<uid>. Lookup behaviour:

  • LBs that carry the matching kube_cluster_id_<our-uid> tag are kept.
  • LBs that carry no kube_cluster_id_* tag fall back to the legacy
    behaviour (preserves existing deployments and externally-created LBs).
  • LBs that carry only foreign kube_cluster_id_* tags are treated as
    NotFound, with a warning. OCCM will then create its own load balancer
    rather than overwriting one that belongs to another cluster.
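The three rules above can be sketched roughly as follows (the type and function names here are illustrative, not the PR's actual identifiers; the real code operates on Octavia load balancer objects):

```go
package main

import (
	"fmt"
	"strings"
)

const clusterIDTagPrefix = "kube_cluster_id_"

// loadBalancer is a stand-in for the Octavia LB object; only Tags matter here.
type loadBalancer struct {
	Name string
	Tags []string
}

// filterByClusterID applies the three lookup rules: keep LBs tagged with
// our cluster ID, keep untagged LBs (legacy behaviour), and drop LBs
// tagged only for a foreign cluster.
func filterByClusterID(lbs []loadBalancer, clusterUID string) []loadBalancer {
	ourTag := clusterIDTagPrefix + clusterUID
	var kept []loadBalancer
	for _, lb := range lbs {
		hasClusterTag, matchesUs := false, false
		for _, tag := range lb.Tags {
			if strings.HasPrefix(tag, clusterIDTagPrefix) {
				hasClusterTag = true
				if tag == ourTag {
					matchesUs = true
				}
			}
		}
		if !hasClusterTag || matchesUs {
			kept = append(kept, lb)
		}
		// LBs carrying only foreign cluster-id tags are skipped; the
		// caller then treats the lookup as NotFound and creates a new LB.
	}
	return kept
}

func main() {
	lbs := []loadBalancer{
		{Name: "ours", Tags: []string{"kube_cluster_id_uid-a"}},
		{Name: "legacy", Tags: nil},
		{Name: "foreign", Tags: []string{"kube_cluster_id_uid-b"}},
	}
	for _, lb := range filterByClusterID(lbs, "uid-a") {
		fmt.Println(lb.Name)
	}
	// prints "ours" then "legacy"; "foreign" is filtered out
}
```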

The cluster UID is read once at controller-manager start-up. If the
lookup fails (RBAC denial, missing namespace, etc.) the safeguard is
disabled and OCCM falls back to the legacy name-based behaviour, so the
change is strictly additive. Pre-existing load balancers also gain the
kube_cluster_id_* tag during the next reconciliation.

Which issue this PR fixes (if applicable):
fixes #3102

Special notes for reviewers:

  • Backward compatibility:
    • Load balancers without any kube_cluster_id_* tag keep the previous
      behaviour. They are tagged on the next successful reconcile.
    • Load balancers looked up via the existing
      loadbalancer.openstack.org/load-balancer-id annotation (i.e. on
      every reconcile after the first one) go through GetLoadbalancerByID,
      which is unaffected.
  • New RBAC: get on namespaces is added to both the manifest
    ClusterRole (manifests/controller-manager/cloud-controller-manager-roles.yaml)
    and the helm chart (charts/openstack-cloud-controller-manager/templates/clusterrole.yaml).
    If the verb is unavailable the safeguard simply degrades to the legacy
    behaviour with a warning log; OCCM does not refuse to start.
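The added rule presumably looks like the following fragment (a sketch; the exact placement in the manifest and chart templates may differ):

```yaml
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get"]
```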
  • Octavia API >= v2.5 (Stein) is required for the tag feature. This is
    already gated by svcConf.supportLBTags and behaves as before on older
    clouds.
  • New unit tests:
    • TestFilterLoadBalancersByClusterID covers the matching, legacy,
      foreign-only, and mixed cases.
    • TestFetchClusterUID covers happy path and graceful degradation
      (missing namespace, forbidden) with a fake clientset.

How to verify manually:

go test ./pkg/openstack/...

A reproduction of the original failure mode (two clusters in the same
project, same --cluster-name, same Service ns/name) is described in
#3102.

Release note:

[openstack-cloud-controller-manager] Octavia load balancers now carry a
stable cluster-identity tag (`kube_cluster_id_<kube-system-uid>`) so OCCM
will no longer adopt a load balancer that belongs to a different
Kubernetes cluster sharing the same OpenStack project, even when the load
balancer name collides. Pre-existing load balancers gain the tag on the
next reconcile; load balancers without the tag keep the previous
behaviour. The cloud-controller-manager ClusterRole gains `get` on
`namespaces`.

[occm] Tag load balancers with cluster identity to prevent name collisions

OCCM constructs Octavia load balancer names as
kube_service_<cluster-name>_<namespace>_<service>. When two Kubernetes
clusters share the same OpenStack project and use the same
--cluster-name (default "kubernetes"), services with identical
namespace/name produce identical load balancer names. Octavia does not
enforce uniqueness on load balancer names, so OCCM's first-time
name-based lookup can adopt and overwrite a load balancer that actually
belongs to a different cluster (see issues kubernetes#2241, kubernetes#2571, kubernetes#2624).

This commit adds a stable Kubernetes cluster identifier - the UID of
the kube-system namespace - as a load balancer tag of the form
kube_cluster_id_<uid>. getLoadbalancerByName now ignores load balancers
that carry a cluster-id tag for a different cluster and falls back to
the legacy behaviour for load balancers without any cluster-id tag, so
existing deployments keep working unchanged. Pre-existing load
balancers gain the new tag during the next reconciliation.

The cluster UID is read once at controller-manager start-up via the
kube-system namespace; failure to read it (RBAC denial, missing
namespace) is non-fatal and disables the safeguard, falling back to
legacy name-based lookup. The cloud-controller-manager ClusterRole and
the helm chart gain "get" on namespaces.

Made-with: Cursor
@k8s-ci-robot added the release-note label (denotes a PR that will be considered when it comes time to generate release notes) on Apr 30, 2026
@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 30, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: enginrect / name: enginrect (123ffe4)

@k8s-ci-robot
Contributor

Welcome @enginrect!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @enginrect. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kayrus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Apr 30, 2026
@k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA), and removed the cncf-cla: no label, on Apr 30, 2026
@enginrect
Author

Hi @kayrus @stephenfin @zetaab — first-time contributor here. This PR addresses the long-standing cross-cluster LB collision issue (refs #2241, #2571, #2624) with an additive, backward-compatible kube_cluster_id_<kube-system-uid> tag on Octavia load balancers. Lookups now reject LBs tagged for a different cluster, fall back to legacy behaviour for untagged LBs, and tag pre-existing LBs on the next reconcile. The safeguard degrades gracefully (warning log + legacy behaviour) if the new get on namespaces RBAC is not granted, so the change is strictly additive.

Could one of you take a look and add /ok-to-test when convenient? The failing "Lint Charts" check is unrelated to this PR — it is a pre-existing repository-policy issue on master where the workflow uses unpinned action tags, and it currently fails on every PR.

Thanks!


Labels

  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[occm] Cross-cluster load balancer name collision when multiple Kubernetes clusters share an OpenStack project

2 participants