Refine bus monitor #88

Merged

google-prow-robot merged 22 commits into knative:master from scothis:bus-monitor on Jun 25, 2018

Conversation

@scothis (Contributor) commented Jun 13, 2018

Proposed Changes

  • Allow the monitor handler to return an error so the event is requeued (see the sketch below)
  • Filter handler calls to resources for the monitored bus
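
A minimal sketch of the requeue-on-error behavior from the first bullet, assuming the standard client-go workqueue pattern; the field and method names here are illustrative, not necessarily the PR's actual code:

package buses

import "k8s.io/client-go/util/workqueue"

// Monitor sketch: only the fields needed to show the requeue behavior.
type Monitor struct {
	workqueue   workqueue.RateLimitingInterface
	syncHandler func(key string) error
}

// A non-nil error from the handler puts the key back on the rate-limited
// queue so the event is retried; success clears its rate-limit history.
func (m *Monitor) processWorkItem(key string) {
	if err := m.syncHandler(key); err != nil {
		m.workqueue.AddRateLimited(key)
		return
	}
	m.workqueue.Forget(key)
}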

@google-prow-robot google-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 13, 2018
Comment thread pkg/buses/monitor.go Outdated
}

func (m *Monitor) removeBus(namespace string, name string) error {
	// nothing to do
Contributor:

Should this mark this monitor helper as "unusable"?

Comment thread pkg/buses/monitor.go
func (m *Monitor) isChannelKnown(subscription channelsv1alpha1.Subscription) bool {
	channelKey := makeChannelKeyFromSubscription(subscription)
	summary := m.getChannelSummary(channelKey)
	return summary != nil && summary.Channel != nil
Contributor:

Why the need for summary.Channel != nil? (Similar checks appear in other places.) It seems this should already be satisfied by the map lookup.

Contributor Author (@scothis):

In the case where a subscription resource is loaded before the channel resource, there won't be a channel in the map yet.

Contributor:

Isn't that the summary != nil part? My point is: can summary be non-nil but with its Channel member nil? I would expect not.
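
A minimal sketch of the situation @scothis describes, reusing the channelSummary, channelKey, and subscriptionKey names that appear elsewhere in this thread (the helper below is illustrative): when a Subscription is observed before its Channel, the cache entry exists but its Channel field is still nil, so both checks are needed.

// The cache entry can exist before the Channel resource itself is seen.
type channelSummary struct {
	Channel       *channelsv1alpha1.Channel
	Subscriptions map[subscriptionKey]*channelsv1alpha1.Subscription
}

// Illustrative helper: when a Subscription arrives first, an entry is
// created so the subscription has somewhere to live, but Channel stays
// nil until the Channel resource is observed.
func (m *Monitor) getOrCreateChannelSummary(key channelKey) *channelSummary {
	summary, ok := m.cache[key]
	if !ok {
		summary = &channelSummary{
			Subscriptions: map[subscriptionKey]*channelsv1alpha1.Subscription{},
		}
		m.cache[key] = summary
	}
	return summary
}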

@scothis (Contributor Author) commented Jun 19, 2018

@vaikas-google We think this is about ready to go. Can you take a look?

@evankanderson (Member):

> > who do you see provisioning a new Bus, a cluster operator or an application developer?
>
> If I had to pick either the developer or operator persona as the bus deployer, I'd hedge and say it falls squarely in the middle of the two, but ultimately, if forced, I'd lean towards developer (though that may be my own bias as a developer).
>
> I expect most clusters will have a single bus that handles the vast majority of channels, with maybe one or two other exotic buses to utilize some special capability. For example, I don't expect a cluster to have both the Pub/Sub and Kafka buses deployed, as their capabilities are so similar.

Interesting -- I'd have thought that provisioning a bus would fall squarely on the operator. Would you expect the developer or the operator to provision the Kafka cluster in the case of the Kafka bus?

@scothis (Contributor Author) commented Jun 21, 2018

> Interesting -- I'd have thought that provisioning a bus would fall squarely on the operator. Would you expect the developer or the operator to provision the Kafka cluster in the case of the Kafka bus?

That's a fair point. I'd expect the operator to be actively involved with deploying Kafka since that's certainly not set-and-forget.

@scothis (Contributor Author) commented Jun 21, 2018

@vaikas-google @evankanderson what are the next steps here?

@vaikas (Contributor) commented Jun 21, 2018

I think, in the name of continuing the prototyping work, figuring out the rough shape of the APIs involved, and getting the data plane wired into bindings, I'd be fine with getting this merged. However, in parallel we should probably set aside some time to think through what the control plane API looks like and what belongs there and what doesn't. I could certainly see separate steps:

  1. Create the bus [say, deploy kafka using helm, or whatever]
  2. There's a controller that then knows about this specific bus
  3. CRDs then only represent control interfaces necessary for the wiring

Seems like there are some concerns with the current definitions, so we should have some design discussions about the direction we all feel comfortable with, but I'm also sensitive to making some progress with the understanding that these things might have to change. @evankanderson thoughts?

@evankanderson (Member):

@grantr pointed me to https://kubernetes.io/docs/concepts/storage/storage-classes/, which seems to be addressing a similar problem around many possible implementations of storage for persistent volumes in kubernetes.

Since my concern is actually about #66, rather than this PR, I don't necessarily want to block getting this in. But I'd like to open a discussion about how to refactor the existing bus definitions towards a CRD which might look more like the StorageClass style, e.g.

apiVersion: eventing.knative.dev/v1alpha1
kind: Bus
metadata:
  name: prod-kafka
provisioner: kafka
parameters:  # A map[string]string
  connectString: "mykafka..."
  reclaimAfter: "20d"
  ...

@evankanderson (Member):

@mattmoor also had some good observations this morning that a Bus feels like it is more likely to be a cluster-wide resource (thinking about Kafka or the equivalent), rather than a per-namespace resource.

If a Bus is cluster-wide, that also suggests provisioning by an operator. It seems like Channels should probably be per-namespace, which means that the Channel reference to the bus should include both a namespace and a name, unless the names are explicitly cluster-wide (i.e. scope: Cluster on the Bus CRD definition).

@evankanderson (Member):

Thanks for the explanation of the monitor -- it wasn't clear that it was actually a controller framework.

I think my main question/point was that it should be possible to have multiple controllers managing resources of the same type, as long as ownership is clearly defined between controllers. (I.e. you could have a Kafka controller and a PubSub controller both watching Channels or Buses, as long as it's clear from the resource definitions which one should act on that particular resource.) We'd probably also want a webhook to keep resources from changing from one controller regime to another without a delete.
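
A minimal sketch of the webhook check suggested above, assuming a Channel spec with a Bus field naming its controller regime; the function and the import path are illustrative, not the actual eventing API:

package buses

import (
	"fmt"

	channelsv1alpha1 "github.com/knative/eventing/pkg/apis/channels/v1alpha1" // assumed path
)

// A Channel may not move between controller regimes (buses); users must
// delete and recreate it instead.
func validateChannelUpdate(old, updated *channelsv1alpha1.Channel) error {
	if old.Spec.Bus != updated.Spec.Bus {
		return fmt.Errorf("channel %q cannot change bus from %q to %q; delete and recreate it instead",
			updated.Name, old.Spec.Bus, updated.Spec.Bus)
	}
	return nil
}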

Comment thread pkg/buses/monitor.go
	subscriptionsSynced      cache.InformerSynced
	cache                    map[channelKey]*channelSummary
	provisionedChannels      map[channelKey]*channelsv1alpha1.Channel
	provisionedSubscriptions map[subscriptionKey]*channelsv1alpha1.Subscription
Contributor:

What are the cache, provisionedChannels, and provisionedSubscriptions maps used for? Is the information in them recoverable if the monitor pod is restarted?

Contributor Author (@scothis):

Yes, these caches are idempotent and will repopulate from the informer when the pod restarts.

@grantr (Contributor) commented Jun 21, 2018

> keeps a queryable cache for fast runtime lookups in the dispatcher.

Is this cache different from an informer? Informers are already caches that automatically stay current with Kubernetes state and can be queried by name and selector (see the generated Listers).
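
For illustration, the generated Lister query @grantr refers to might look like the following; the import paths are assumed, but the Channels(namespace).List(selector) shape is the standard client-go codegen convention:

package buses

import (
	"k8s.io/apimachinery/pkg/labels"

	channelsv1alpha1 "github.com/knative/eventing/pkg/apis/channels/v1alpha1"  // assumed path
	listers "github.com/knative/eventing/pkg/client/listers/channels/v1alpha1" // assumed path
)

// Listers read from the informer's in-memory cache, which watches keep
// current; there is no API-server round trip per query.
func listChannels(lister listers.ChannelLister, namespace string) ([]*channelsv1alpha1.Channel, error) {
	return lister.Channels(namespace).List(labels.Everything())
}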

@scothis (Contributor Author) commented Jun 21, 2018

I'm in full agreement that the model proposed here needs further iteration. #66 was a greenfield spike; this PR applies the lessons from implementing the PubSub (#98) and Kafka (#99) buses. With these PRs I feel much more comfortable that we can iterate on the model quickly while testing our assumptions with running code.

The notion of StorageClasses as a model for a bus is something we had talked about in the context of riff; it's interesting to see the same idea arise organically. One possible downside of following that model is that storage is often a tradeoff between speed and cost; many of the classes are simply named fast and slow. Messaging is a much richer domain with several other dimensions and tradeoffs. I need to read more about how storage classes are implemented.

The more I think about it, the more I agree that a Bus is primarily an operator construct. There is a direct analogy between a message broker and a database. In the service broker world, the operator decides which services and configurations are available, while the developer consumes those services.

> I think my main question/point was that it should be possible to have multiple controllers managing resources of the same type, as long as ownership is clearly defined between controllers. (I.e. you could have a Kafka controller and a PubSub controller both watching Channels or Buses, as long as it's clear from the resource definitions which one should act on that particular resource.) We'd probably also want a webhook to keep resources from changing from one controller regime to another without a delete.

I completely agree. The monitor is configured for a single bus, but receives all channels and subscriptions via the informers. The first thing the monitor does is filter out channels that are for other buses and subscriptions that are for other channels (see the sketch below). The handler funcs on the monitor are only fired for types that the bus needs to handle. There's a bit of extra complexity in this implementation to handle the case of a channel switching buses, or a subscription switching channels. I agree that's a case we should disallow, and we can simplify the monitor accordingly.
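
A minimal sketch of that filtering, reusing isChannelKnown from the snippet above; the Spec.Bus field and the monitor's busName field are assumptions about the resource shape, not confirmed by this PR:

// Resources for other buses are dropped before any handler func fires.
func (m *Monitor) shouldHandleChannel(channel *channelsv1alpha1.Channel) bool {
	// Only channels that name this monitor's bus are handled.
	return channel.Spec.Bus == m.busName
}

func (m *Monitor) shouldHandleSubscription(subscription *channelsv1alpha1.Subscription) bool {
	// Only subscriptions whose channel belongs to this bus are handled.
	return m.isChannelKnown(*subscription)
}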

@scothis (Contributor Author) commented Jun 22, 2018

> Is this cache different from an informer? Informers are already caches that automatically stay current with Kubernetes state and can be queried by name and selector (see the generated Listers).

There are two key purposes behind these caches:

  1. allow a bus to look up all active subscribers for a channel (see the sketch after this list). Since a channel can have 0..N subscriptions, reading from the informer cache is more than an identity lookup. We don't have labels on the resources that would make for an effective selector, but even if we did, I doubt we'd want to run that query for every request in the data plane.

  2. tracking provisioned resources helps with cleanup after a resource is deleted. Since there are potentially multiple buses operating on channels and subscriptions, the monitor figures out whether the deleted resource was one that it needs to care about. This gets even more useful if there are params/arguments for a channel that are applicable during deletion (like preserving messages in a queue instead of discarding them).

    Based off the conversation we had in slack today, an alternate approach would be to set a key on the bus' status for each channel that is provisioned (and the channel-subscription relationship). This would require that we ensure the master resource would never be deleted until all dependent resources have been deleted, otherwise we could leak resources.
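
A minimal sketch of the first purpose, reusing getChannelSummary and the key types from the snippets above; the method name is illustrative. The dispatcher resolves a channel's subscribers with a single map lookup per request:

// Subscriptions returns the active subscriptions for a channel without
// scanning the informer cache on every data-plane request.
func (m *Monitor) Subscriptions(key channelKey) []channelsv1alpha1.Subscription {
	summary := m.getChannelSummary(key)
	if summary == nil || summary.Channel == nil {
		return nil
	}
	subscriptions := make([]channelsv1alpha1.Subscription, 0, len(summary.Subscriptions))
	for _, subscription := range summary.Subscriptions {
		subscriptions = append(subscriptions, *subscription)
	}
	return subscriptions
}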

Comment thread pkg/buses/monitor.go Outdated
// apply arguments
if arguments != nil {
	for _, arg := range *arguments {
		if _, ok := known[arg.Name]; ok {
Contributor:

The condition is inverted. It should read if !ok.

Contributor Author (@scothis):

fixed
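
For reference, the corrected form of the excerpt with the condition flipped as suggested:

// apply arguments
if arguments != nil {
	for _, arg := range *arguments {
		if _, ok := known[arg.Name]; !ok {
			// handle the argument (body as in the original diff, not shown here)
		}
	}
}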

Comment thread pkg/buses/monitor.go
m.bus = bus

glog.Info("Starting workers")
// Launch two workers to process Bus resources
Contributor:

The comment shouldn't say "two"; the number of workers depends on the value of threadiness.


@markfisher (Contributor):

/lgtm

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 22, 2018
@ericbottard (Contributor):

> @mattmoor also had some good observations this morning that a Bus feels like it is more likely to be a cluster-wide resource (thinking about Kafka or the equivalent), rather than a per-namespace resource.
>
> If a Bus is cluster-wide, that also suggests provisioning by an operator. It seems like Channels should probably be per-namespace, which means that the Channel reference to the bus should include both a namespace and a name, unless the names are explicitly cluster-wide (i.e. scope: Cluster on the Bus CRD definition).

Not sure this is exactly what you're referring to, but indeed in the Kafka Bus implementation, for example, I craft the Kafka topic name (which can be seen as an implementation detail from the channel user's perspective) as a combination of the channel name and namespace.

This way, assuming there is one Bus Custom Resource instance for one Kafka broker instance (certainly both the responsibility of an operator), channels from any namespace can use it without clashing.

If, however, we envision that many Bus Custom Resource instances can share the same backing broker, then we'll want to factor in the name (and possibly the namespace, if we keep it namespaced) of the Bus itself.
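
A minimal sketch of that naming scheme; the separator and the function name are illustrative, not the actual Kafka bus code:

package buses

import "fmt"

// topicName combines a channel's namespace and name so that same-named
// channels in different namespaces map to distinct Kafka topics.
func topicName(namespace, channel string) string {
	return fmt.Sprintf("%s.%s", namespace, channel)
}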

IMO the "compromise" that makes the most sense is:

  • Bus is cluster scoped
  • Channels and Subscriptions are namespaced
  • There is the implicit assumption that there is one backing broker per Bus

This is certainly something we had to consider working with riff, and there is a whole range of scenarios to cover.

@google-prow-robot google-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2018
@vaikas (Contributor) commented Jun 25, 2018

/lgtm
/approve

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2018
@google-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: scothis, vaikas-google

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-prow-robot google-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2018
@google-prow-robot google-prow-robot merged commit ea71285 into knative:master Jun 25, 2018
@scothis scothis deleted the bus-monitor branch June 25, 2018 22:57

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
