Skip to content

update the clusteridentifier in the field for helm values!#345

Open
Rupam-It wants to merge 16 commits into
mainfrom
rupam/cluster_identifier_token
Open

update the clusteridentifier in the field for helm values!#345
Rupam-It wants to merge 16 commits into
mainfrom
rupam/cluster_identifier_token

Conversation

@Rupam-It
Copy link
Copy Markdown
Contributor

@Rupam-It Rupam-It commented Apr 1, 2026

By default, every time you install or reinstall ZXPorter, it calls CreateClusterToken on the DevZero platform — which creates a brand new
cluster entry
. This means if you uninstall and reinstall ZXPorter on the same physical cluster, the DevZero dashboard shows it as a different
cluster.
Cluster Identity solves this. By giving your cluster a stable identifier, ZXPorter calls ReattachCluster instead, which finds or
creates
a cluster by that identifier. Reinstalling ZXPorter always connects back to the same cluster entry in the DevZero platform.

for more insight : clusterIdentifier


Summary by Gitar

  • Refactoring:
    • Introduced resolveNamespace() helper to centralize and DRY up namespace detection across controller methods.
  • Storage cleanup:
    • Removed identifier parameter from persistClusterToken, persistClusterTokenToConfigMap, and persistClusterTokenToSecret as cluster identifiers are now exclusively managed via dedicated identity secrets.

This will update automatically on new commits.

Comment thread helm-chart/zxporter/templates/configmap.yaml Outdated
Comment thread internal/transport/dakr_client.go Outdated
@Rupam-It Rupam-It changed the title make changes in helm value and env variable and then do make changes … update the clusteridentifier in the field for helm values! Apr 6, 2026
@Rupam-It Rupam-It marked this pull request as ready for review April 6, 2026 14:29
Comment thread internal/controller/custom.go Outdated
@Rupam-It Rupam-It force-pushed the rupam/cluster_identifier_token branch from 5f42d45 to d8796bc Compare April 23, 2026 08:05
@Rupam-It Rupam-It force-pushed the rupam/cluster_identifier_token branch from 90b3942 to 2ab6d9a Compare April 23, 2026 16:39
Comment thread docs/cluster-identity.md
@@ -0,0 +1,198 @@
# Cluster Identity: Stable Cluster Identification Across Reinstalls
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this.

Comment thread api/v1/collectionpolicy_types.go Outdated
// If ClusterToken is not provided but PATToken is, the system will exchange it for a cluster token
PATToken string `json:"patToken,omitempty"`

// ClusterIdentifier is an optional stable DNS-label identifier for the cluster.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if stable dns label or uuid?

}
tempClient := transport.NewDakrClient(dakrURL, "", c.Log)

// Always use ReattachCluster.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ReattachCluster fails (any error other than CodeNotFound-with-identifier), the code logs and falls through with an empty ClusterToken. The old ExchangePATForClusterToken
path did return fmt.Errorf(...) on failure. This is a regression — the operator will start in a broken state instead of failing loudly.


// Resolve clusterIdentifier from the well-known cluster identity Secret.
// The secret name is fixed — users never need to configure it.
identifier, secretErr := c.readClusterIdentifierFromSecret(ctx, clusterIdentitySecretName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth looking into this from this angle maybe:

// Secret exists but key is empty → fatal
return "", fmt.Errorf("cluster identity Secret %q exists but CLUSTER_IDENTIFIER key is missing or empty...")

This is called before anything else in initializeTelemetryComponents, and the error is returned immediately (return secretErr). If the Secret was created manually with a typo,
or the key was accidentally cleared, the operator refuses to start entirely. Given the operator is the sole writer of this Secret, this error case is an operational footgun. A
warning + clearing the in-memory state and continuing (letting the operator re-register) would be safer.

Comment thread internal/controller/custom.go Outdated
if dakrURL == "" {
dakrURL = "https://dakr.devzero.io"
}
tempClient := transport.NewDakrClient(dakrURL, "", c.Log)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm why move away from factory into tnrasport?

Comment thread internal/transport/dakr_client.go Outdated
// ReattachCluster registers or reattaches a cluster, returning (token, clusterIdentifier, error).
// Pass clusterIdentifier=nil on the first call; the backend assigns and returns a UUID.
// Pass clusterIdentifier=&uuid on subsequent calls to reattach the same cluster.
func (c *RealDakrClient) ReattachCluster(ctx context.Context, patToken string, clusterIdentifier *string, clusterName, k8sProvider string) (string, string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also one thing worth keeping in mind:

If ReattachCluster gets CodeNotFound, retries with nil, gets a new UUID, then crashes before persistClusterIdentifierToIdentitySecret completes — on the next restart the
operator reads the old UUID from the identity Secret, hits CodeNotFound again, and creates yet another new cluster. With enough restarts this accumulates orphan cluster
entries. This is an inherent at-least-once delivery trade-off, but the crash window is wide (several lines of code including a network call between clearing the identifier and
persisting the new one)

not sure if we can do somethiung about it or not.

) (string, *gen.ClusterSnapshot, error)
// telemetry_logger.TelemetryLogSender sends a batch of log entries to Dakr
telemetry_logger.TelemetryLogSender
// ExchangePATForClusterToken exchanges a PAT token for a cluster token
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becaues this was removed whole ExchangePATForClusterToken if we still have it in zxporter somewhere or code around it is dead code? maybe worth checking and removing (only from zxporter not from services)

@@ -0,0 +1,4 @@
# The cluster identity Secret is created and managed automatically by the operator at runtime.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm empty file with just comment to be rendered with k8s? not sure that is good idea?

Copy link
Copy Markdown
Contributor

@Tzvonimir Tzvonimir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes + lint issue + test failing

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 8, 2026

Code Review ✅ Approved 4 resolved / 4 findings

Implements cluster identity persistence via dedicated secrets to ensure stable cluster reconnections. Resolved redundant namespace logic and cluster identifier management inconsistencies.

✅ 4 resolved
Bug: ReattachCluster is never called despite documentation claiming it

📄 internal/controller/custom.go:327-341 📄 internal/util/env.go:259-262 📄 helm-chart/zxporter/values.yaml:32-35
The comments in env.go:260 and values.yaml:33 state that when ClusterIdentifier is set alongside PATToken, ReattachCluster is called instead of ExchangePATForClusterToken. However, the PAT exchange logic in custom.go (the else branch around line 328) still unconditionally calls ExchangePATForClusterToken regardless of whether ClusterIdentifier is set. Furthermore, ReattachCluster does not appear to exist in the DakrClient interface. This means the documented behavior is not implemented — the identifier is persisted and recovered, but never actually used during token exchange.

Bug: CLUSTER_IDENTIFIER not guarded by useSecretForToken condition

📄 helm-chart/zxporter/templates/configmap.yaml:4-7
The new CLUSTER_IDENTIFIER entry in the ConfigMap template is rendered unconditionally, while CLUSTER_TOKEN and PAT_TOKEN are both gated behind {{- if not .Values.zxporter.useSecretForToken }}. When useSecretForToken is true, CLUSTER_IDENTIFIER will still be written to the ConfigMap (as an empty string), which is inconsistent — tokens are in a Secret but the identifier leaks into the ConfigMap. It should follow the same conditional pattern as the other token-related fields.

Quality: Duplicated header-setting logic in ReattachCluster

📄 internal/transport/dakr_client.go:653-665
The new ReattachCluster method (lines 653-665) copies the exact same header-setting block from ExchangePATForClusterToken (lines 618-631): Authorization, HeaderClient, HeaderOperatorType, HeaderOperatorVersion, HeaderOperatorGitSHA. This duplication means any future header change must be updated in both places, risking drift.

Quality: Namespace resolution logic duplicated 6+ times without helper

📄 internal/controller/custom.go:749-756 📄 internal/controller/custom.go:786-793
The POD_NAMESPACE resolution pattern (env var → serviceaccount file → hardcoded default) is copy-pasted in readClusterIdentifierFromSecret, persistClusterIdentifierToIdentitySecret, persistClusterTokenToConfigMap, persistClusterTokenToSecret, and other functions. The two new functions in this diff (readClusterIdentifierFromSecret at L749-756 and persistClusterIdentifierToIdentitySecret at L786-793) add two more copies. This makes maintenance error-prone — e.g., if the default namespace changes, every copy must be updated.

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants