Initial error handling support #1949
Conversation
|
/hold Holding while the design is under review. |
1747896 to
3e9bb46
|
|
||
| ### Dead-Letter Channel | ||
|
|
||
| Channel implementations might leverage the existing native error handling support they provide, usually a dead letter channel, to forward failed messages to the error sink. In that case, the error sink might be realized by creating a subscription on the error channel. |
Is the plan still to expose this channel as a channel in the status?
Yes, but only if the channel supports it.
| * to channel subscribers. | ||
| * to source sink. | ||
| * to broker/triggers. | ||
| * Be able to identify a message couldn’t be delivered (Observability) |
s/identify a message/identify messages ?
| * to source sink. | ||
| * to broker/triggers. | ||
| * Be able to identify a message couldn’t be delivered (Observability) | ||
| * Be able to leverage existing native error handling mechanisms (eg. dead letter queues). |
native: could we be more specific here, e.g. say that native means "platform native", if available, or the like?
| * to broker/triggers. | ||
| * Be able to identify a message couldn’t be delivered (Observability) | ||
| * Be able to leverage existing native error handling mechanisms (eg. dead letter queues). | ||
| * Be able to redirect of error'ed events from a channel. |
events? or messages? let's be consistent, and I think event is the right term, over message(s) /cc @nachocano
I'm fine with using event everywhere.
|
|
||
| ### Dead-Letter Channel | ||
|
|
||
| Channel implementations might leverage the existing native error handling support they provide, usually a dead letter channel, to forward failed messages to the error sink. In that case, the error sink might be realized by creating a subscription on the error channel. |
Channel implementations might leverage the existing native error handling support they provide, usually a dead letter channel
Could we add a link to the EIP definition?
and perhaps add something like
Knative Channel implementations may leverage existing platform native error handling support they might provide, like a [_Dead Letter Channel_](https://www.enterpriseintegrationpatterns.com/patterns/messaging/DeadLetterChannel.html), to forward failed messages from their _Dead Letter Channel_ to the configured error sink.
|
|
||
| Typically channel implementations and event sources retry sending messages before redirecting them to the error sink. | ||
| While there are many different ways to implement the retry logic | ||
| (immediate retry, retry queue, etc...), implementations usually |
Do we want to categorize / specify this somewhat? Right now, it's not really known whether channels/sources do that (and how often by default).
Maybe changing the wording would help.
Channel implementations and event sources should retry ...
?
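For illustration, the "retry, then redirect to the error sink" behavior discussed here could look roughly like the sketch below. It is not taken from the PR: `Event`, `Sender`, `deliver`, and `alwaysFail` are hypothetical names, the retry count is a plain integer, and the wait between attempts is left out to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical stand-in for the event being delivered.
type Event struct{ ID string }

// Sender delivers an event to a destination (a subscriber or an error sink).
type Sender func(Event) error

// alwaysFail simulates a subscriber that is never reachable.
func alwaysFail(Event) error { return errors.New("subscriber unavailable") }

// deliver tries the subscriber once plus up to `retries` more times.
// If every attempt fails and an error sink is configured, the event is
// forwarded there. It returns the number of attempts made against the
// subscriber, whether the event was handed to the error sink, and the
// final error (nil if some destination accepted the event).
func deliver(e Event, subscriber, errorSink Sender, retries int) (int, bool, error) {
	attempts := 0
	var lastErr error
	for attempts <= retries {
		attempts++
		if lastErr = subscriber(e); lastErr == nil {
			return attempts, false, nil
		}
	}
	if errorSink == nil {
		// No error sink configured: report the last delivery error.
		return attempts, false, lastErr
	}
	return attempts, true, errorSink(e)
}

func main() {
	sink := func(e Event) error {
		fmt.Println("error sink received", e.ID)
		return nil
	}
	attempts, redirected, err := deliver(Event{ID: "evt-1"}, alwaysFail, sink, 2)
	fmt.Println(attempts, redirected, err)
}
```

Whether retrying is a "should" or a "must", and what the default retry count is, remains the open question raised in this thread.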
|
|
||
| ### Delivery Specification | ||
|
|
||
| The goal of this delivery specification is to formally define the vocabulary related to capabilities defined above (error sink, dead-letter queues and retry) to provide consistency across all Knative event sources, channels and brokers. |
Knative event sources, channels implementations and brokers
?
| ) | ||
| ``` | ||
|
|
||
| Channel, brokers and event sources are not required to support all this capabilities and are free to add more delivery options. |
Channel implementations, brokers and ....
|
|
||
| ### Exposing underlying DLC | ||
|
|
||
| Channels supporting dead letter queue should advertise it in their status. |
we mix wording here. DLC / DLQ
EIP calls it "Dead Letter Channel", let's stay with that ?
|
/hold cancel |
vaikas
left a comment
Thanks for whipping this into shape! Couple of comments.
| // More information on Duration format: https://www.ietf.org/rfc/rfc3339.txt | ||
| // | ||
| // For linear policy, backoff delay is the time interval between retries. | ||
| // For exponential policy , backoff delay is backoffDelay*10^<numberOfRetries> |
nit: I'm more accustomed to using base 2 for the backoff (so just double the delay each time instead of 10x it), but do not feel super strongly about this.
Thought: do we want to cap this after, say, 10 retries or something? This can be a TODO for later, guided by experience.
I got this from the k8s go client. Now I see it has changed (or I was not looking at the right place): https://github.com/kubernetes/client-go/blob/master/util/workqueue/default_rate_limiters.go#L65
Fixing...
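To make the base-2 suggestion concrete, here is a minimal sketch of how the two policies could be computed. It is an illustration under assumptions, not the PR's code: `BackoffPolicy`, `nextDelay`, and the `maxDelay` cap are hypothetical names, and the cap itself is only the idea floated above, not something the diff defines. Linear is treated as a constant interval between retries, as the diff comment describes.

```go
package main

import (
	"fmt"
	"time"
)

// BackoffPolicy mirrors the "linear" and "exponential" policies named in
// the diff; the Go identifiers are hypothetical.
type BackoffPolicy string

const (
	Linear      BackoffPolicy = "linear"
	Exponential BackoffPolicy = "exponential"
)

// nextDelay returns the wait before retry number `retry` (0-based).
// Linear: always `base`. Exponential: base * 2^retry.
// In both cases the result is capped at maxDelay (an assumed cap).
func nextDelay(policy BackoffPolicy, base, maxDelay time.Duration, retry int) time.Duration {
	d := base
	if policy == Exponential {
		// Stop doubling once the cap is reached, so d cannot grow unbounded.
		for i := 0; i < retry && d < maxDelay; i++ {
			d *= 2
		}
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

func main() {
	for retry := 0; retry < 6; retry++ {
		fmt.Println(retry, nextDelay(Exponential, time.Second, 10*time.Second, retry))
	}
}
```

Doubling matches the behavior of the client-go rate limiter linked above; capping by maximum delay rather than by retry count is one possible reading of the "cap this" suggestion.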
| ```go | ||
|
|
||
| // DeliverStatus contains the Status of an object supporting delivery options. | ||
| type DeliverStatus struct { |
We have DeliverySpec; is DeliverStatus a typo? I'd expect them to be consistent.
| ) | ||
|
|
||
| // DeliverStatus contains the Status of an object supporting delivery options. | ||
| type DeliverStatus struct { |
same here, I think this should be DeliveryStatus for consistency?
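A sketch of what the consistently named pair might look like, combined with the earlier point that the dead letter channel is only advertised when the channel supports it. Everything here other than the `DeliverySpec` and `DeliveryStatus` type names is an assumption for illustration: the fields, `ChannelRef`, and `describe` are hypothetical and do not reflect the actual definitions in the diff.

```go
package main

import "fmt"

// ChannelRef is a hypothetical, simplified reference to a channel object.
type ChannelRef struct {
	Kind string
	Name string
}

// DeliverySpec holds delivery options. The fields are assumed for this sketch.
type DeliverySpec struct {
	ErrorSinkURI string // where failed events are redirected (assumed field)
	Retry        int    // number of retries before redirecting (assumed field)
}

// DeliveryStatus is named to match DeliverySpec. The pointer is nil when
// the implementation has no dead letter channel to advertise.
type DeliveryStatus struct {
	DeadLetterChannel *ChannelRef // assumed field
}

// describe reports whether a status advertises a dead letter channel.
func describe(s DeliveryStatus) string {
	if s.DeadLetterChannel == nil {
		return "no dead letter channel advertised"
	}
	return fmt.Sprintf("dead letter channel: %s/%s", s.DeadLetterChannel.Kind, s.DeadLetterChannel.Name)
}

func main() {
	fmt.Println(describe(DeliveryStatus{}))
	fmt.Println(describe(DeliveryStatus{DeadLetterChannel: &ChannelRef{Kind: "Channel", Name: "dlc"}}))
}
```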
|
|
||
| Note that multiple copies of the same event can be sent to the error sink due to multiple subscription failures. | ||
|
|
||
| Brokers might decide to change the event type before reposting the failed event into the broker. This could be done by having a special error sink specific to broker. |
Did you mean that brokers will aggregate the events (in case of multiple copies) based on id? Why would brokers change the event type? If they do, the subscribers will be impacted. Can you give an example where it makes sense to change the event type?
|
@lionelvillard is there any way to get insight into the design document for those of us outside of the sharing circle? This capability is a deal breaker for us to trial and later adopt Knative for particular use cases. |
|
Not sure how to do that. @vaikas-google ? |
|
The easiest would be to join the knative-users@ Google group, which will give access to the documents in general. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lionelvillard, vaikas-google. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
Helps #1573 and #1493
Proposed Changes
Baby step towards proper error handling.
Design document: https://docs.google.com/document/d/1qRrzGoHJQO-oc5p-yRK8IRfugd-FM_PXyM7lN5kcqks/edit#