Clean-up of Sequence#1031
Conversation
Signed-off-by: Doug Davis <dug@microsoft.com>
|
Another problem with the ordering is that as the spec is currently written - nothing is going to actually comply with it. Let's consider one of the most popular message brokers - Kafka. Kafka has an offset, which is a number that starts at zero and goes up by one for each event. So, we could put this in the sequence. Except, Kafka partitions topics, and each partition has its own offset. So, you can't use the sequence for ordering all events in that case. More generally, for any message broker that is able to scale topics beyond one node, this is going to be the case, because a single sequence for an entire topic is simply not possible at scale, scaling requires partitioning and partitioning implies multiple independent sequences. I just think that as it stands, we have a spec that no real world implementations can or will conform to, unless they constrain themselves to only handle topics on a single node and not be able to scale beyond one node. |
| can determine the order of the events via a simple string-compare type of | ||
| operation. This means that it might be necessary for the value to include | ||
| some kind of padding (e.g. leading zeros in the case of the value being the | ||
| string representation of an integer). |
There was a problem hiding this comment.
Consider pushing them towards hex if they want numeric
There was a problem hiding this comment.
If the sequencetype is removed, this limits each source to producing just a single sequence at the same time. I could imagine that in addition to a generic sequencetype Integer, there could be domain specific sequence types distinguishing different scenarios or processes. A single source should not be restricted to generating events belonging to a single sequence. There could even be one sequence per combination of source and subject.
There was a problem hiding this comment.
This is interesting because it sounds like you want to use "sequenceType" as a form of differentiator of sequences. While I'm interpreting it as a refinement of the "type" of the sequence... meaning, int vs string vs something else. And so I see the possible values of "sequenceType" as being very limited - in the same way type systems are usually limited. In your model, I think "sequenceType" values would basically be unlimited and almost app/stream specific. I'm not sure that was the intent... but I could be wrong. Am I following you correctly?
There was a problem hiding this comment.
Yes, I agree that my interpretation of sequencetype might not be what was originally intended. I just wonder, if we really want to limit each unique source to exactly one sequence? Of course, there is some freedom in how the value for sequence is generated, but a generic intermediary will always sort events from one unique source using the sequence attribute.
I see two ways to extend this:
- Add an attribute
sequenceidthat allows to distinguish between multiple sequences of thesource. - State that sources must be sufficiently fine-grained to always only emit one sequence. You could achieve this by adding the
sequenceidas an additional path segment to thesource.
There was a problem hiding this comment.
@BenBeattieHood @jroper any thoughts on @deissnerk's last comment?
I don't think the sequence attribute must be mapped 1:1 to a topic or a partition. First of all the attribute is particularly useful, if your infrastructure does not guarantee the event order. In addition, the partitioning in Kafka uses the partition key to make sure that all events belonging to the same sequence (because they have the same partition key) are stored in the same partition. The opposite is not true. Not all events inside a single partition need to belong to the same sequence. |
|
@jroper your comment is interesting because as I read the spec (and this PR) I tend to think that it's pretty vague (purposely so) w.r.t. the values of "sequence". Meaning, the choice of the value can be based on just about anything that the sender thinks is appropriate. So, calculating a value that takes into account a multi-node sequence is perfectly acceptable. I think the only constraint in the spec/PR is the events are linked to a "unique event source". Can you elaborate on why you think the current text doesn't allow for the type of ordering you're looking for? |
|
Thanks for the thought that has gone into refining this. I think it's a good adjustment, thanks @duglin. @jroper (as @deissnerk says) sequence is per aggregate instance, rather than per topic. Eg. topic = 'sensors', where each aggregate would be each individual sensor; in reality each sensor creates an ordered sequence of events - this 'sequence' value just captures the order so that consumers can correctly interpret events over unordered transport. Take an example of a light sensor. It emits two types of events: 'light is on' and 'light is off'. If a consumer received the following events from an unordered transport, what should be the final state? ['off', 'on', 'on', 'off', 'off', 'off', 'on', 'off'...] etc And so sequence can actually be a requirement when using unordered transport. In addition to this, if the producer is able to guarantee contiguity, then sequence can also be helpful for integrity - the consumer can scale statelessly and still guarentee that consumed events are only applied once, and that no events are missed during consumption. Lastly, this utility of sequence (if contiguous) for guaranteeing no events are missed during consumption also allows for non-guaranteed transport, which is another important tool for scaling. So all-in-all, there's a few aspects of sequence that both fit well with scaling (eg Kafka's partitioning whole aggregates, as @deissnerk pointed out), as well as actually aid scaling (unordered transport and non-guaranteed transport). |
|
Approved on the 7/21 call |
Alternative to #1026
@BenBeattieHood @jroper comments?
@jroper I know you'd like to remove the "lexicographical" stuff, but I couldn't bring myself to do that yet... baby steps :-)
Signed-off-by: Doug Davis dug@microsoft.com
Fixes #923
Proposed Changes
Release Note