Skip to content

Initial error handling support#1949

Merged
knative-prow-robot merged 10 commits into
knative:masterfrom
lionelvillard:dlc
Oct 29, 2019
Merged

Initial error handling support#1949
knative-prow-robot merged 10 commits into
knative:masterfrom
lionelvillard:dlc

Conversation

@lionelvillard
Copy link
Copy Markdown
Contributor

@lionelvillard lionelvillard commented Sep 23, 2019

Helps #1573 and #1493

Proposed Changes

  • Add DeliverySpec and DeliveryStatus types
  • Add DeliverySpec to Channelable
  • Add DeliveryStatus to ChannelableStatus

Baby step towards proper error handling.
Design document: https://docs.google.com/document/d/1qRrzGoHJQO-oc5p-yRK8IRfugd-FM_PXyM7lN5kcqks/edit#

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Sep 23, 2019
@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 23, 2019
@lionelvillard
Copy link
Copy Markdown
Contributor Author

/hold

Holding while the design is under review.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2019
@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 25, 2019
@lionelvillard lionelvillard force-pushed the dlc branch 2 times, most recently from 1747896 to 3e9bb46 Compare September 25, 2019 15:20
Comment thread docs/delivery/README.md Outdated

### Dead-Letter Channel

Channel implementations might leverage the existing native error handling support they provide, usually a dead letter channel, to forward failed messages to the error sink. In that case, the error sink might be realized by creating a subscription on the error channel.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan still to expose this channel as a channel in the status?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes only if the channel supports it.

@lionelvillard lionelvillard changed the title Extend Channelable with DLC support Initial error handling support Sep 25, 2019
Comment thread docs/delivery/README.md Outdated
* to channel subscribers.
* to source sink.
* to broker/triggers.
* Be able to identify a message couldn’t be delivered (Observability)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/identify a message/identify messages ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

events ?

Comment thread docs/delivery/README.md Outdated
* to source sink.
* to broker/triggers.
* Be able to identify a message couldn’t be delivered (Observability)
* Be able to leverage existing native error handling mechanisms (eg. dead letter queues).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native: could we be more specific here? that native means "platform native", if available or the like ?

Comment thread docs/delivery/README.md
* to broker/triggers.
* Be able to identify a message couldn’t be delivered (Observability)
* Be able to leverage existing native error handling mechanisms (eg. dead letter queues).
* Be able to redirect of error'ed events from a channel.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

events? or messages? let's be consistent, and I think event is the right term, over message(s) /cc @nachocano

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with using event everywhere.

Comment thread docs/delivery/README.md Outdated

### Dead-Letter Channel

Channel implementations might leverage the existing native error handling support they provide, usually a dead letter channel, to forward failed messages to the error sink. In that case, the error sink might be realized by creating a subscription on the error channel.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Channel implementations might leverage the existing native error handling support they provide, usually a dead letter channel

Could we add a link to the EIP definition?

and perhaps add something like

Knative Channel implementations may leverage existing platform native error handling support they might provide, like a a [_Dead Letter Channel_](https://www.enterpriseintegrationpatterns.com/patterns/messaging/DeadLetterChannel.html), to forward failed messages from their _Dead Letter Channel_ to the configured error sink. 

Comment thread docs/delivery/README.md Outdated

Typically channel implementations and event sources retry sending messages before redirecting them to the error sink.
While there are many different ways to implement the retry logic
(immediate retry, retry queue, etc...), implementations usually
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to somewhat categories / spec this ? Right now, it's not really know if channels/sources do that (and how often by default)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe changing the wording would help.

Channel implementations and event sources should retry ...

?

Comment thread docs/delivery/README.md Outdated

### Delivery Specification

The goal of this delivery specification is to formally define the vocabulary related to capabilities defined above (error sink, dead-letter queues and retry) to provide consistency across all Knative event sources, channels and brokers.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knative event sources, channels implementations and brokers

?

Comment thread docs/delivery/README.md Outdated
)
```

Channel, brokers and event sources are not required to support all this capabilities and are free to add more delivery options.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Channel implementations, brokers and ....

Comment thread docs/delivery/README.md Outdated

### Exposing underlying DLC

Channels supporting dead letter queue should advertise it in their status.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we mix wording here. DLC / DLQ

EIP calls it "Dead Letter Channel", let's stay with that ?

@lionelvillard
Copy link
Copy Markdown
Contributor Author

/hold cancel
/assign @vaikas-google

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2019
Copy link
Copy Markdown
Contributor

@vaikas vaikas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for whipping this into shape! Couple of comments.

Comment thread docs/delivery/README.md Outdated
// More information on Duration format: https://www.ietf.org/rfc/rfc3339.txt
//
// For linear policy, backoff delay is the time interval between retries.
// For exponential policy , backoff delay is backoffDelay*10^<numberOfRetries>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm more accustomed to using base 2 for the backoff (so just double the delay each time instead of 10x it), but do not feel super strongly about this.
Thought? Do we want to cap this after say 10 times or something. This can be a todo later guided by experience.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this from the k8s go client. Now I see it has changed (or I was not looking at the right place): https://github.com/kubernetes/client-go/blob/master/util/workqueue/default_rate_limiters.go#L65

Fixing...

Comment thread docs/delivery/README.md Outdated
```go

// DeliverStatus contains the Status of an object supporting delivery options.
type DeliverStatus struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have DeliverySpec is DeliverStatus a typo? I'd expect them to be consistent.

)

// DeliverStatus contains the Status of an object supporting delivery options.
type DeliverStatus struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, I think this should be DeliveryStatus for consistency?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

Comment thread docs/delivery/README.md

Note that multiple copies of the same event can be sent to the error sink due to multiple subscription failures.

Brokers might decide to change the event type before reposting the failed event into the broker. This could be done by having a special error sink specific to broker.
Copy link
Copy Markdown
Contributor

@sayanh sayanh Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean brokers will aggregate the events(in case of multiple copies) based on id? Why brokers will change the event type? If done, the subscribers will be impacted. Can you give an example where it makes sense to change the event type?

@alexander-alvarez
Copy link
Copy Markdown

@lionelvillard any way to get insight into the design document for those of us outside of the sharing circle?

This capability is a deal breaker for us to trial and later adopt knative for particular use cases

@lionelvillard
Copy link
Copy Markdown
Contributor Author

Not sure how to do that. @vaikas-google ?

@vaikas
Copy link
Copy Markdown
Contributor

vaikas commented Oct 29, 2019

easiest would to join the knative-users@ google group, will give access to the documents in general.
https://github.com/knative/community

@vaikas
Copy link
Copy Markdown
Contributor

vaikas commented Oct 29, 2019

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2019
@knative-prow-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lionelvillard, vaikas-google

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2019
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2019
@vaikas
Copy link
Copy Markdown
Contributor

vaikas commented Oct 29, 2019

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2019
@knative-prow-robot knative-prow-robot merged commit cb91d30 into knative:master Oct 29, 2019
@lionelvillard lionelvillard deleted the dlc branch August 28, 2020 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants