Skip to content

Clean-up of Sequence#1031

Merged
duglin merged 1 commit into
cloudevents:mainfrom
duglin:sequence-part2
Jul 21, 2022
Merged

Clean-up of Sequence#1031
duglin merged 1 commit into
cloudevents:mainfrom
duglin:sequence-part2

Conversation

@duglin
Copy link
Copy Markdown
Collaborator

@duglin duglin commented Jul 13, 2022

Alternative to #1026

@BenBeattieHood @jroper comments?

@jroper I know you'd like to remove the "lexicographical" stuff, but I couldn't bring myself to do that yet... baby steps :-)

Signed-off-by: Doug Davis dug@microsoft.com

Fixes #923

Proposed Changes

  • Remove SequenceType from Sequence extension

Release Note

Remove the SequenceType attribute from the Sequence extension to simplify things. Also added text to make it clear that "padding" might be needed to ensure the values can be compared properly.

Signed-off-by: Doug Davis <dug@microsoft.com>
@jroper
Copy link
Copy Markdown
Contributor

jroper commented Jul 13, 2022

Another problem with the ordering is that as the spec is currently written - nothing is going to actually comply with it. Let's consider one of the most popular message brokers - Kafka. Kafka has an offset, which is a number that starts at zero and goes up by one for each event. So, we could put this in the sequence. Except, Kafka partitions topics, and each partition has its own offset. So, you can't use the sequence for ordering all events in that case.

More generally, for any message broker that is able to scale topics beyond one node, this is going to be the case, because a single sequence for an entire topic is simply not possible at scale, scaling requires partitioning and partitioning implies multiple independent sequences.

I just think that as it stands, we have a spec that no real world implementations can or will conform to, unless they constrain themselves to only handle topics on a single node and not be able to scale beyond one node.

can determine the order of the events via a simple string-compare type of
operation. This means that it might be necessary for the value to include
some kind of padding (e.g. leading zeros in the case of the value being the
string representation of an integer).
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider pushing them towards hex if they want numeric

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sequencetype is removed, this limits each source to producing just a single sequence at the same time. I could imagine that in addition to a generic sequencetype Integer, there could be domain specific sequence types distinguishing different scenarios or processes. A single source should not be restricted to generating events belonging to a single sequence. There could even be one sequence per combination of source and subject.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting because it sounds like you want to use "sequenceType" as a form of differentiator of sequences. While I'm interpreting it as a refinement of the "type" of the sequence... meaning, int vs string vs something else. And so I see the possible values of "sequenceType" as being very limited - in the same way type systems are usually limited. In your model, I think "sequenceType" values would basically be unlimited and almost app/stream specific. I'm not sure that was the intent... but I could be wrong. Am I following you correctly?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that my interpretation of sequencetype might not be what was originally intended. I just wonder, if we really want to limit each unique source to exactly one sequence? Of course, there is some freedom in how the value for sequence is generated, but a generic intermediary will always sort events from one unique source using the sequence attribute.
I see two ways to extend this:

  1. Add an attribute sequenceid that allows to distinguish between multiple sequences of the source.
  2. State that sources must be sufficiently fine-grained to always only emit one sequence. You could achieve this by adding the sequenceid as an additional path segment to the source.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenBeattieHood @jroper any thoughts on @deissnerk's last comment?

@deissnerk
Copy link
Copy Markdown
Contributor

@jroper

More generally, for any message broker that is able to scale topics beyond one node, this is going to be the case, because a single sequence for an entire topic is simply not possible at scale, scaling requires partitioning and partitioning implies multiple independent sequences.

I don't think the sequence attribute must be mapped 1:1 to a topic or a partition. First of all the attribute is particularly useful, if your infrastructure does not guarantee the event order. In addition, the partitioning in Kafka uses the partition key to make sure that all events belonging to the same sequence (because they have the same partition key) are stored in the same partition. The opposite is not true. Not all events inside a single partition need to belong to the same sequence.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 14, 2022

@jroper your comment is interesting because as I read the spec (and this PR) I tend to think that it's pretty vague (purposely so) w.r.t. the values of "sequence". Meaning, the choice of the value can be based on just about anything that the sender thinks is appropriate. So, calculating a value that takes into account a multi-node sequence is perfectly acceptable. I think the only constraint in the spec/PR is the events are linked to a "unique event source".

Can you elaborate on why you think the current text doesn't allow for the type of ordering you're looking for?

@BenBeattieHood
Copy link
Copy Markdown

BenBeattieHood commented Jul 14, 2022

Thanks for the thought that has gone into refining this. I think it's a good adjustment, thanks @duglin.

@jroper (as @deissnerk says) sequence is per aggregate instance, rather than per topic. Eg. topic = 'sensors', where each aggregate would be each individual sensor; in reality each sensor creates an ordered sequence of events - this 'sequence' value just captures the order so that consumers can correctly interpret events over unordered transport.

Take an example of a light sensor. It emits two types of events: 'light is on' and 'light is off'. If a consumer received the following events from an unordered transport, what should be the final state? ['off', 'on', 'on', 'off', 'off', 'off', 'on', 'off'...] etc And so sequence can actually be a requirement when using unordered transport. 

In addition to this, if the producer is able to guarantee contiguity, then sequence can also be helpful for integrity - the consumer can scale statelessly and still guarentee that consumed events are only applied once, and that no events are missed during consumption.

Lastly, this utility of sequence (if contiguous) for guaranteeing no events are missed during consumption also allows for non-guaranteed transport, which is another important tool for scaling.

So all-in-all, there's a few aspects of sequence that both fit well with scaling (eg Kafka's partitioning whole aggregates, as @deissnerk pointed out), as well as actually aid scaling (unordered transport and non-guaranteed transport).

@duglin duglin mentioned this pull request Jul 21, 2022
@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 21, 2022

Approved on the 7/21 call

@duglin duglin merged commit fd720c7 into cloudevents:main Jul 21, 2022
@duglin duglin deleted the sequence-part2 branch July 21, 2022 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"sequence" extension - integer encoded but lexicographically-orderable string?

4 participants