[occm] Tag load balancers with cluster identity to prevent name collisions#3103
[occm] Tag load balancers with cluster identity to prevent name collisions#3103enginrect wants to merge 1 commit intokubernetes:masterfrom
Conversation
…sions OCCM constructs Octavia load balancer names as kube_service_<cluster-name>_<namespace>_<service>. When two Kubernetes clusters share the same OpenStack project and use the same --cluster-name (default "kubernetes"), services with identical namespace/name produce identical load balancer names. Octavia does not enforce uniqueness on load balancer names, so OCCM's first-time name-based lookup can adopt and overwrite a load balancer that actually belongs to a different cluster (see issues kubernetes#2241, kubernetes#2571, kubernetes#2624). This commit adds a stable Kubernetes cluster identifier - the UID of the kube-system namespace - as a load balancer tag of the form kube_cluster_id_<uid>. getLoadbalancerByName now ignores load balancers that carry a cluster-id tag for a different cluster and falls back to the legacy behaviour for load balancers without any cluster-id tag, so existing deployments keep working unchanged. Pre-existing load balancers gain the new tag during the next reconciliation. The cluster UID is read once at controller-manager start-up via the kube-system namespace; failure to read it (RBAC denial, missing namespace) is non-fatal and disables the safeguard, falling back to legacy name-based lookup. The cloud-controller-manager ClusterRole and the helm chart gain "get" on namespaces. Made-with: Cursor
|
|
|
Welcome @enginrect! |
|
Hi @enginrect. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @kayrus @stephenfin @zetaab — first-time contributor here. This PR addresses the long-standing cross-cluster LB collision issue (refs #2241, #2571, #2624) with an additive, backward-compatible Could one of you take a look and add Thanks! |
What this PR does / why we need it:
OCCM identifies an existing Octavia load balancer for a Service by name on
the first reconcile (via
getLoadbalancerByName). The name formatkube_service_<cluster-name>_<namespace>_<service>defaults to a<cluster-name>ofkubernetes, so two Kubernetes clusters in the sameOpenStack project that happen to use the default cluster-name and have
Services with identical namespace/name produce identical load balancer
names. Octavia does not enforce uniqueness of names, so OCCM in cluster B
ends up adopting and overwriting cluster A's load balancer. This has been
reported repeatedly (see #2241, #2571, #2624) and the standing guidance
"set a unique
--cluster-name" is correct but does not actually defendagainst the failure mode.
This PR adds a stable Kubernetes cluster identifier - the UID of the
kube-systemnamespace - as a load balancer tag of the formkube_cluster_id_<uid>. Lookup behaviour:kube_cluster_id_<our-uid>tag are kept.kube_cluster_id_*tag fall back to the legacybehaviour (preserves existing deployments and externally-created LBs).
kube_cluster_id_*tags are treated asNotFound, with a warning. OCCM will then create its own load balancer
rather than overwriting one that belongs to another cluster.
The cluster UID is read once at controller-manager start-up. If the
lookup fails (RBAC denial, missing namespace, etc.) the safeguard is
disabled and OCCM falls back to the legacy name-based behaviour, so the
change is strictly additive. Pre-existing load balancers also gain the
kube_cluster_id_*tag during the next reconciliation.Which issue this PR fixes(if applicable):
fixes #3102
Special notes for reviewers:
kube_cluster_id_*tag keep the previousbehaviour. They are tagged on the next successful reconcile.
loadbalancer.openstack.org/load-balancer-idannotation (i.e. onevery reconcile after the first one) go through
GetLoadbalancerByID,which is unaffected.
getonnamespacesis added to both the manifestClusterRole (
manifests/controller-manager/cloud-controller-manager-roles.yaml)and the helm chart (
charts/openstack-cloud-controller-manager/templates/clusterrole.yaml).If the verb is unavailable the safeguard simply degrades to the legacy
behaviour with a warning log; OCCM does not refuse to start.
already gated by
svcConf.supportLBTagsand behaves as before on olderclouds.
TestFilterLoadBalancersByClusterIDcovers the matching, legacy,foreign-only, and mixed cases.
TestFetchClusterUIDcovers happy path and graceful degradation(missing namespace, forbidden) with a fake clientset.
How to verify manually:
go test ./pkg/openstack/...A reproduction of the original failure mode (two clusters in the same
project, same
--cluster-name, same Service ns/name) is described in#3102.
Release note: