
Helm operator: in some rare cases, the Helm operator incorrectly determines that a release is not installed #4296

@mikeshng

Description


What did you do?

I am looking at the Helm operator manager sync code that determines whether a release is installed: https://github.com/operator-framework/operator-sdk/blob/v1.2.0/internal/helm/release/manager.go#L113-#L122
It relies on an installed Helm release always having a release record in the deployed state available in storage.
However, looking at the Helm code, this does not seem to hold in at least one case.
During a Helm upgrade, Helm appears to move the current release into another state before recording the new release as deployed. This creates a very small timing window in which the Helm operator might incorrectly determine that the release is not installed:
https://github.com/helm/helm/blob/v3.4.1/pkg/action/upgrade.go#L345-#L346
https://github.com/helm/helm/blob/v3.4.1/pkg/action/upgrade.go#L137
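To make the timing window concrete, here is a small Python simulation (not Helm or operator-sdk code). The Storage class, the upgrade() function and the observe() callback are hypothetical; the status strings mirror Helm's release statuses, but the exact ordering of writes is a simplified assumption based on the lines linked above. Real upgrades also run hooks, may wait for resources, and persist records through a storage driver. The observe() callback stands in for a concurrent reconcile that asks "is there a deployed release?" between writes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Release:
    name: str
    version: int
    status: str  # "deployed", "pending-upgrade", "superseded", ...


class Storage:
    """In-memory stand-in for Helm release storage (hypothetical)."""

    def __init__(self) -> None:
        self.records: Dict[Tuple[str, int], Release] = {}

    def record(self, rel: Release) -> None:
        # Store a copy so every status change is an explicit write.
        self.records[(rel.name, rel.version)] = replace(rel)

    def deployed(self, name: str) -> Optional[Release]:
        for rel in self.records.values():
            if rel.name == name and rel.status == "deployed":
                return replace(rel)
        return None


def is_installed(storage: Storage, name: str) -> bool:
    # Naive check: "installed" means a record with status "deployed" exists.
    return storage.deployed(name) is not None


def upgrade(storage: Storage, name: str,
            observe: Callable[[str], None]) -> None:
    """Simplified upgrade ordering (assumption); observe() mimics a concurrent reconcile."""
    current = storage.deployed(name)
    if current is None:
        raise ValueError(f"no deployed release named {name!r}")
    new = Release(name, current.version + 1, "pending-upgrade")
    storage.record(new)
    observe("new revision pending")
    current.status = "superseded"
    storage.record(current)
    observe("old revision superseded")  # no "deployed" record exists here
    new.status = "deployed"
    storage.record(new)
    observe("new revision deployed")


store = Storage()
store.record(Release("demo", 1, "deployed"))
seen: List[Tuple[str, bool]] = []
upgrade(store, "demo", lambda step: seen.append((step, is_installed(store, "demo"))))
for step, installed in seen:
    print(f"{step}: installed={installed}")
```

In this model, the naive check reports installed=False only for the observation taken between superseding the old revision and marking the new one as deployed, which is the gap described above.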

The consequence seems to be that when the Helm operator thinks the release is not installed, it tries to perform an install. That install might fail, in which case the operator performs an uninstall as a rollback, potentially causing data loss even if the release is recovered in the next reconcile:

https://github.com/operator-framework/operator-sdk/blob/v1.2.0/internal/helm/controller/reconcile.go#L174-#L179
https://github.com/operator-framework/operator-sdk/blob/v1.2.0/internal/helm/release/manager.go#L170-#L175
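The failure path can be sketched with another toy Python model. Cluster, InstallError and reconcile() are hypothetical names, not operator-sdk APIs; the assumption is that the install fails because the release already exists, and that the rollback uninstalls by release name, which removes everything the existing release owns.

```python
from typing import Dict, List, Set


class InstallError(Exception):
    """Raised by the simulated install when it cannot proceed."""


class Cluster:
    """Toy cluster tracking which resources each release owns (hypothetical)."""

    def __init__(self) -> None:
        self.releases: Dict[str, Set[str]] = {}
        self.log: List[str] = []

    def install(self, name: str, resources: Set[str]) -> None:
        if name in self.releases:
            # Assumption: the install fails because the release already exists.
            raise InstallError(f"release {name!r} already exists")
        self.releases[name] = set(resources)
        self.log.append(f"install {name}")

    def uninstall(self, name: str) -> None:
        # Removes every resource owned by the release, whatever created it.
        self.releases.pop(name, None)
        self.log.append(f"uninstall {name}")


def reconcile(cluster: Cluster, name: str, resources: Set[str],
              looks_installed: bool) -> None:
    """Hypothetical reconcile step loosely mirroring the flow linked above."""
    if looks_installed:
        cluster.log.append(f"upgrade {name}")
        return
    try:
        cluster.install(name, resources)
    except InstallError as err:
        cluster.log.append(f"install failed: {err}")
        # Rollback of the failed install: uninstall by release name. If the
        # release was in fact installed, this deletes its live resources.
        cluster.uninstall(name)


cluster = Cluster()
cluster.releases["demo"] = {"statefulset/db", "pvc/db-data"}  # already installed
# Stale answer, e.g. read during the upgrade gap: "not installed".
reconcile(cluster, "demo", {"statefulset/db", "pvc/db-data"}, looks_installed=False)
print(cluster.log)
print(cluster.releases)
```

The point of the model is that a single stale "not installed" answer is enough to turn a healthy release into an empty one; a later reconcile can reinstall the chart, but any data held by the deleted resources is not restored by that.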

What did you expect to see?

When the Helm release is installed, the Helm operator should correctly determine that it is installed, or at least double-check before performing the install-then-uninstall rollback.

What did you see instead?

In some extremely rare cases, the Helm operator incorrectly determines that an already installed release is not installed.

Environment

Operator type:
/language helm

Kubernetes cluster type:
Kind

$ operator-sdk version
Master branch

Possible Solution

Could we double-check against the CR status and verify that deployedRelease is not populated, to confirm that the release is indeed not installed?
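One possible reading of this proposal, sketched in Python: decide_action() is a hypothetical helper, and the status dict shape (a deployedRelease entry with name and manifest) is an assumption about how the CR status looks, not a confirmed API. When storage reports no deployed release but the CR status still records one, the sketch retries later instead of installing.

```python
from typing import Any, Dict, Optional


def decide_action(storage_has_deployed: bool, cr_status: Dict[str, Any]) -> str:
    """Hypothetical decision helper for the double-check proposed above.

    storage_has_deployed: result of the existing storage lookup.
    cr_status: the custom resource's status as a plain dict (assumed shape).
    """
    if storage_has_deployed:
        return "upgrade-or-reconcile"
    deployed: Optional[Dict[str, Any]] = cr_status.get("deployedRelease")
    if deployed:
        # Storage and CR status disagree: the release was deployed before,
        # so this may be the transient upgrade gap. Retry later instead of
        # installing (and possibly rolling back with an uninstall).
        return "requeue"
    return "install"


status = {"deployedRelease": {"name": "demo", "manifest": "kind: ConfigMap"}}
print(decide_action(False, status))
print(decide_action(False, {}))
```

Requeueing rather than treating the release as installed is a deliberate trade-off in this sketch: the CR status can itself be stale (for example, if the release was removed out of band), so a retry lets storage become the source of truth again once the upgrade has finished.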

Metadata


Assignees

No one assigned

Labels

kind/bug: Categorizes issue or PR as related to a bug.
language/helm: Issue is related to a Helm operator project.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
