Workqueues are not used to handle all informer notifications

/area API
/kind bug

## Expected Behavior

Controllers that reconcile multiple resources should reconcile all resources without blocking other controllers.

## Actual Behavior

Secondary resources block other controllers during reconcile and retry.

## Detailed Explanation

Currently each controller reconciles the primary resource plus one or more secondary resources. For example, the configuration controller reconciles configurations with `enqueueConfiguration`, plus revisions with `addRevisionEvent` (which updates the configuration status with the latest revision names).

Each primary reconcile loop enqueues update notifications from the informer into a workqueue, then runs 2 worker goroutines processing those updates. The workqueue contains retry logic to allow for retries when reconcile fails.

None of the secondary reconcile loops use the above strategy. Each reconcile loop processes update notifications directly without enqueueing them in a workqueue. If I'm understanding https://github.com/kubernetes/client-go correctly, there are several issues with this:

- Retries are done [by the informer](https://github.com/kubernetes/client-go/blob/8296aa170883e4e6041188593c48e5deb97fee67/tools/cache/shared_informer.go#L548) rather than [the workqueue](https://github.com/kubernetes/client-go/blob/master/util/workqueue/default_rate_limiters.go), with different retry behavior for each.
- Reconcile blocks informers. Informers are global per process and all controllers run in the same process, so secondary reconcile handlers block all other controllers receiving events from that informer, and could possibly cause informers to miss watch updates (I don't know enough about the internals of watches and informers to say for sure).
- Informer retries block the informer until the handler processes successfully, which could be never. In this case, no controller would receive another update from that informer.

## Proposed fix(es)

Add a workqueue for each secondary resource reconciled by a controller. Each resource type needs a separate workqueue with separate queue workers, unless the workqueue setup is changed to allow different resource types in a single workqueue.

Alternately, move all secondary resource logic into the primary controller. For example, the code in [`addRevisionEvent`](https://github.com/elafros/elafros/blob/179301595e3548c872cbc9a5f18a348a48e8df5e/pkg/controller/configuration/configuration.go#L385) in configuration controller could be moved to revision controller, where it could take advantage of the existing workqueue for revision updates.

A third alternative is to move secondary resource logic to separate special-purpose controllers. In the configuration controller example, `addRevisionEvent` would move to a new `revision-readiness` controller (name TBD). This helps preserve controller simplicity by defining them as a single reconciliation loop instead of a single controlled resource. In this solution, the 4 current Elafros controllers would be split into 10 controllers, but IMO the intent of each controller would be easier to understand.

## Plea for expert help

This issue is deeper than my Kubernetes knowledge extends, so I'd like to get some advice from Kubernetes maintainers before deciding its importance and correct solution.

/cc @mattmoor @tcnghia 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workqueues are not used to handle all informer notifications #823

Expected Behavior

Actual Behavior

Detailed Explanation

Proposed fix(es)

Plea for expert help

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Workqueues are not used to handle all informer notifications #823

Description

Expected Behavior

Actual Behavior

Detailed Explanation

Proposed fix(es)

Plea for expert help

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions