Cleanup Bus/ClusterBus reconciler #420
Conversation
- only start the needed informer for either Bus or ClusterBus, we don't need both
- consolidate the createOrUpdateBus and createOrUpdateClusterBus methods
- use BusReference to check if the updated bus is the reconciled bus

Refs knative#418
@neosab good catch, updated

/lgtm
@neosab: changing LGTM is restricted to assignees, and only knative/eventing repo collaborators may be assigned issues.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
```go
r.bus = clusterBus.DeepCopy()
if !equality.Semantic.DeepEqual(r.bus.GetSpec(), previousBus.GetSpec()) {
	// stash the new bus on the reconciler while retaining the old bus
	bus, r.bus = r.bus, bus
```
r.bus is no longer a DeepCopy of the bus we get back from the busesLister, but during create we make a deepcopy in this same scenario (https://github.com/knative/eventing/pull/420/files#diff-d2cf608fb7c41df730d64ac95f391e7aR313). I don't see why the DeepCopy is needed (am I missing something?) but I imagine these two codepaths should operate the same way.
The bus resource on the reconciler should never be mutated. The general pattern is to create a copy before mutating a resource. The DeepCopy here was an extra layer of protection. With the switch to use GenericBus there is no longer a consistent implementation of DeepCopy to call.
I tried to add DeepCopy() GenericBus to the GenericBus interface, but couldn't get the type checker to accept the generated DeepCopy methods on Bus and ClusterBus since they don't return GenericBus but a struct that implements GenericBus.
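The type-checker limitation described here is Go's lack of covariant return types. A minimal, hypothetical sketch (simplified stand-ins, not the actual knative types) of why a generated `DeepCopy() *Bus` can't satisfy an interface that declares `DeepCopy() GenericBus`:

```go
package main

import "fmt"

// GenericBus is a simplified stand-in for the real interface. If it
// also declared `DeepCopy() GenericBus`, *Bus below would NOT satisfy
// it: Go requires exact method signature matches, and `DeepCopy() *Bus`
// is not the same signature even though *Bus implements GenericBus.
type GenericBus interface {
	GetSpec() string
}

// Bus is a concrete type with a generated-style deepcopy method.
type Bus struct{ Spec string }

func (b *Bus) GetSpec() string { return b.Spec }

// DeepCopy returns *Bus, mirroring what code generators emit.
func (b *Bus) DeepCopy() *Bus {
	out := *b
	return &out
}

func main() {
	b := &Bus{Spec: "original"}
	c := b.DeepCopy()
	c.Spec = "mutated" // mutating the copy leaves the original intact
	fmt.Println(b.Spec, c.Spec)
}
```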
To make the behavior more consistent we could call DeepCopy() before passing the bus into createOrUpdateBus(), or drop the other (hopefully) redundant DeepCopy calls so it's obvious a copy needs to be created before mutation. Thoughts?
Ah, makes sense genericbus would cause issues there. Either way seems reasonable to me, as long as we end up with consistency.
Now calling .DeepCopy() before passing the bus to createOrUpdateBus().
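For illustration, a minimal hypothetical sketch of the copy-before-mutate pattern settled on above (the `Bus` type and `createOrUpdateBus` here are simplified stand-ins, not the actual knative code):

```go
package main

import "fmt"

// Bus is a stand-in for a resource returned by an informer lister.
// Lister-cached objects must never be mutated in place.
type Bus struct {
	Name   string
	Params map[string]string
}

// DeepCopy mimics a generated deepcopy: the returned object shares
// no mutable state with the receiver.
func (b *Bus) DeepCopy() *Bus {
	out := &Bus{Name: b.Name, Params: map[string]string{}}
	for k, v := range b.Params {
		out.Params[k] = v
	}
	return out
}

// createOrUpdateBus mutates its argument, so callers must pass a copy.
func createOrUpdateBus(bus *Bus) {
	bus.Params["reconciled"] = "true"
}

func main() {
	cached := &Bus{Name: "stub", Params: map[string]string{}}
	createOrUpdateBus(cached.DeepCopy()) // copy first; cache untouched
	fmt.Println(len(cached.Params))      // → 0
}
```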
neosab
left a comment
Ah, I just meant to say this PR looks good to me 👍
Minor question but lgtm

/check-cla
/lgtm |
```go
r.bus = clusterBus.DeepCopy()
if !equality.Semantic.DeepEqual(r.bus.GetSpec(), previousBus.GetSpec()) {
	// stash the new bus on the reconciler while retaining the old bus
	bus, r.bus = r.bus, bus
```
Why is this stash needed? How does it work when there are multiple worker threads?
The workqueue guarantees that the same resource will never be emitted concurrently, so that should handle any race concerns.
We capture the bus resource for a few reasons:
- in order to resolve parameters for Channels and Subscriptions
- to know if a Channel is backed by the particular bus instance
- to record Events against
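For illustration, the swap on the quoted line can be sketched in isolation (a simplified stand-in, not the actual reconciler type):

```go
package main

import "fmt"

// Reconciler stashes the bus it reconciles; a stand-in for the real type.
type Reconciler struct{ bus string }

// stash swaps the incoming bus onto the reconciler and returns the
// previous value, mirroring `bus, r.bus = r.bus, bus`: the reconciler
// holds the new bus, while the old one remains available locally
// (e.g. to diff specs against).
func (r *Reconciler) stash(bus string) (previous string) {
	bus, r.bus = r.bus, bus
	return bus
}

func main() {
	r := &Reconciler{bus: "old-spec"}
	previous := r.stash("new-spec")
	fmt.Println(previous, r.bus) // old-spec new-spec
}
```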
The same resource will not be emitted concurrently, but IIUC with threadiness > 1 there is still the case of different resources being reconciled simultaneously:
- resource A enters worker thread A
- resource B enters worker thread B
- resource A and B race to set r.bus
- What is the value of r.bus?

Is this scenario possible?
Serving and the Feed controller have context.Context objects plumbed through the entire Reconcile loop now, and it's reasonable to expect this will be a general pattern in Knative controllers. We may be able to eliminate the race (if there is one) by stashing the bus resource in a context.Context. That also makes the single-call scope of the bus stash clear. I'm intentionally ignoring any opinions on the design of context.Context for now. 😁
A reconciler is for a single bus resource. Resource A and B will never race to set r.bus because only A or B can be for the bus that is being reconciled (otherwise there's a major bug in workqueue). When r.bus is updated, the kind, apiVersion, name and namespace of the previous and future value are identical. We set the value to pick up spec mutations.
#422 adds all provisioned Channels and/or Subscriptions (as needed) to the workqueue when the Bus params change. This will cause them to be reprocessed.
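To illustrate the single-bus filtering described above, a minimal hypothetical sketch (the `BusReference` type here is a simplified stand-in for the one named in the PR description, not the actual implementation):

```go
package main

import "fmt"

// BusReference identifies a bus by namespace and name; a simplified
// stand-in for the real type.
type BusReference struct {
	Namespace, Name string
}

// Bus is a stand-in for the reconciled resource.
type Bus struct {
	Namespace, Name, Spec string
}

// Reconciler handles exactly one bus; update events for every other
// bus are discarded, so r.bus is never set for two different resources.
type Reconciler struct {
	ref BusReference
	bus *Bus
}

// updateBus stashes the new value only when the event is for the
// reconciled bus.
func (r *Reconciler) updateBus(b *Bus) bool {
	if (BusReference{Namespace: b.Namespace, Name: b.Name}) != r.ref {
		return false // some other bus; ignore
	}
	r.bus = b
	return true
}

func main() {
	r := &Reconciler{ref: BusReference{Namespace: "default", Name: "stub"}}
	fmt.Println(r.updateBus(&Bus{Namespace: "default", Name: "other"}))             // false
	fmt.Println(r.updateBus(&Bus{Namespace: "default", Name: "stub", Spec: "v2"})) // true
}
```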
@scothis and I talked about this offline. Restating the above a bit: the Reconciler will have all Bus update events in its workqueue, but discards update events for all Buses except one. That happens at https://github.com/knative/eventing/pull/420/files#diff-d2cf608fb7c41df730d64ac95f391e7aR542.
The Reconciler's workqueue has multiple object types in it. This is why the Reconciler needs threadiness even though it only ever reconciles a single Bus. This is not conventional. I don't know enough about the implementation to say whether the convention should apply, but I'd like to see a comment in the code that acknowledges and briefly explains this break from convention.
So tl;dr my understanding is that this isn't a race condition even though it looks like one. I'd like to see a comment in the code acknowledging that as well.
grantr
left a comment
/lgtm
/approve
/hold
Ok, we're on the same page now. Thanks for explaining your reasoning @scothis.
These comments describe the contents of workqueue keys incorrectly:
https://github.com/knative/eventing/pull/420/files#diff-d2cf608fb7c41df730d64ac95f391e7aR381
https://github.com/knative/eventing/pull/420/files#diff-d2cf608fb7c41df730d64ac95f391e7aR393
https://github.com/knative/eventing/pull/420/files#diff-d2cf608fb7c41df730d64ac95f391e7aR417
These are issues that are only related to this PR in the sense that I was confused by them while reviewing, so if you want to address them in a separate PR that's fine (you can cancel the hold in that case). Or if you want to address them now I'll lgtm again.
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grantr, scothis

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/hold cancel |
Proposed Changes
- only start the needed informer for either Bus or ClusterBus, we don't need both
- consolidate the createOrUpdateBus and createOrUpdateClusterBus methods
- use BusReference to check if the updated bus is the reconciled bus

Refs #418
Release Note
/assign @greghaynes