Originally raised on Slack: https://cloud-native.slack.com/archives/CLAJ40HV3/p1622138440000400
What happened?
We are using the helm-controller’s HelmRelease resource (v2beta1) together with the Nginx Ingress Controller. All the Pods of the Nginx Ingress Controller were down when we upgraded a chart that included an Ingress resource; the upgrade failed because the call to the Nginx Ingress Controller's webhook failed, and the rollback remediation then failed for the same reason. This meant that the HelmRelease was stuck even though we have retries: -1 set for upgrade remediation. Looking at the code, it seems that if remediation fails once, the helm-controller stops retrying forever:
helm-controller/controllers/helmrelease_controller.go, line 320 in 577925c:

    // Fail if the previous release attempt remediation failed.
This same situation could happen for any webhook that becomes temporarily unavailable.
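For reference, our remediation configuration looks roughly like this (a minimal sketch; the resource name, chart, and repository names are illustrative, not our exact manifests):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: my-app              # illustrative name
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: my-app         # illustrative chart name
      sourceRef:
        kind: HelmRepository
        name: my-repo       # illustrative repository name
  upgrade:
    remediation:
      retries: -1           # retry "indefinitely" -- yet a failed rollback still wedges the release
      strategy: rollback
```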
Suggested solution
I understand that retrying can be unsafe in some cases, but for most cases it is, in my opinion, better to keep retrying than to require manual intervention (after all, the chart in question had no issues; it was just an unfortunately timed upgrade).
We thought about changing the remediation strategy to uninstall, but this is not ideal as it would cause downtime for our application (and would also delete CRDs, potentially causing more problems).
Ideally, we could set no remediation strategy and just attempt the upgrade indefinitely (this feels more closely aligned with the GitOps methodology), so I think a good solution would be to implement a none upgrade remediation strategy which always succeeds, allowing upgrade retries to continue.
It seems this was possible using the helm-operator, so maybe there was a reason for removing this feature: https://docs.fluxcd.io/projects/helm-operator/en/stable/helmrelease-guide/rollbacks/#enabling-retries-of-rolled-back-releases
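A hypothetical spec using the proposed strategy might look like the sketch below. To be clear, the none value does not exist today; it is the suggestion of this issue:

```yaml
spec:
  upgrade:
    remediation:
      retries: -1
      strategy: none  # proposed: skip rollback/uninstall, keep retrying the upgrade
```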
In general, the controller shouldn't be deciding what to do based on its own status; rather, it should use the state of the world and the desired state in the spec to make a decision (see the Kubernetes API Conventions). Currently this controller is not able to recreate its status, since it depends on observing the result of a particular action:
helm-controller/controllers/helmrelease_controller.go, line 661 in 577925c:

    func (r *HelmReleaseReconciler) handleHelmActionResult(ctx context.Context,
Alternatives
Create a CronJob that runs helm rollback on HelmReleases whose rollback has failed. Of course this is not ideal; remediation logic should stay within the controller.
Run a CronJob to remove conditions for stuck HelmReleases.
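As a sketch of the first workaround, a CronJob could periodically roll back a known-stuck release. Everything here is an illustrative assumption (names, image tag, schedule, and the hard-coded release); a real version would need RBAC for HelmReleases and the Helm storage backend, and would inspect HelmRelease conditions to find failed rollbacks rather than targeting a fixed release:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: helmrelease-unstick       # illustrative name
  namespace: default
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: helm-remediator   # assumed SA with the required RBAC
          restartPolicy: OnFailure
          containers:
            - name: rollback
              image: alpine/helm:3.6.0          # illustrative image tag
              command:
                - /bin/sh
                - -c
                # Illustrative: roll back one hard-coded release; real logic would
                # discover stuck HelmReleases from their status conditions.
                - helm rollback my-app --namespace default
```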