Skip to content

Suggest padding for ints#1026

Closed
duglin wants to merge 2 commits into
cloudevents:mainfrom
duglin:sequence
Closed

Suggest padding for ints#1026
duglin wants to merge 2 commits into
cloudevents:mainfrom
duglin:sequence

Conversation

@duglin
Copy link
Copy Markdown
Collaborator

@duglin duglin commented Jun 21, 2022

Fixes #923

Signed-off-by: Doug Davis dug@microsoft.com

Release Note

Make sequence lexicographically-orderable, sometimes and don't ignore whitespace

Fixes cloudevents#923

Signed-off-by: Doug Davis <dug@microsoft.com>
@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jun 26, 2022

@BenBeattieHood @jroper Any thoughts on this?

@BenBeattieHood
Copy link
Copy Markdown

Thanks @duglin :) I appreciate the additional work, but I think the original is clearer - sequence is a string, and used to order the events - therefore it's inferable that integers are required to be zero-padded. Zero padding integers might be worth calling out specifically, but I don't think the additional text around recommendations/'should's helps make this point clearer in this case. And including commentary on whitespace might conflict with formatting (eg yml vs json) and so should be defined in the format instead.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 5, 2022

@BenBeattieHood thanks for responding. Got a question for ya.... while I agree that requiring leading zeros makes things easier because then (basically) sequenceType can be ignored and people can always just compare "sequence" using strcmp, what do you see sequenceType actually being used for then? I mean, it might be kind of interesting to know that it can be thought of an an "int" but for the purpose of ensuring ordering that's not really necessary, right?

@c-pius
Copy link
Copy Markdown
Contributor

c-pius commented Jul 6, 2022

what do you see sequenceType actually being used for then?

FYI: we for instance considered using the sequenceType to also define what the sequence value relates to. E.g., is it valid comparing sequence for a given subject, a given event type, across subjects and event types , ... But we discarded that as we thought this is not the real intent of sequenceType.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 6, 2022

@BenBeattieHood see if the new edits look better

Signed-off-by: Doug Davis <dug@microsoft.com>
@BenBeattieHood
Copy link
Copy Markdown

BenBeattieHood commented Jul 7, 2022

@duglin I like the new revision - it seems much clearer to me. Thank you! 

In answer to your question, I guess in the case where one runs out of padding on an int, or moves from one string encoding to another, then sequenceType can help transmit this change. But yep, it's probable that it's an overengineering, and just removing sequenceType will be simpler.

@c-pius I believe sequence is only ever necessary across a single aggregate instance (ie across all events from a single sensor, or across all events from a single domain entity instance). Sequence isn't really important across anything else, as its really there to ensure events are consumed in the same order as they are emitted from an event source instance. (more info on this here)

@jroper
Copy link
Copy Markdown
Contributor

jroper commented Jul 7, 2022

My thoughts are that sequence should be opaque, and use case specific. I don't expect any clients to actually use it to try and sort events, most messaging providers are capable of delivering in order within certain constraints. What I do expect is it to be used for offset based consumption restarting, so when extending consumption out to a device via a TCP connection, eg, gRPC or WebSockets, and the connection is interrupted the sequence can be used to indicate what the last event you consumed was so the stream can be resumed. In that case, the value is completely opaque.

I'm not a fan of the lexicographical ordering requirement. Consider event sources that come from Cassandra, if you want an ordered sequence in Cassandra, best practice is to use time based UUIDs. These produce a stable ordering based on time, but are not lexicographically sortable. We use these in for our event sourced journal in Akka when running on Cassandra.

Many other distributed databases, eg, Spanner or Yugabyte, have a transaction timestamp that is commonly used to order events. These are usually 64 bit ints and they don't increase by one. So the requirement to increase by one excludes event sources from those databases too.

Another thing to consider is that an event source might not want to reveal its throughout, as this could be confidential business data, and so instead of emitting a plain sequence number, encrypted sequence numbers might make more sense, which obviously are not lexicographically sortable.

So, I would prefer to make sequence opaque, with any meaning taken from it being committed out of band.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 7, 2022

@BenBeattieHood any thoughts on @jroper's comment?

Copy link
Copy Markdown
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a reasonable fudge for the moment, but if we ever want to actually standardize sequence/sequencetype, I think we'll need a bit more work.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 7, 2022

Based on the above, it seems to me that this PR at least clarifies the current state of things... yes Virginia you need to pad things. But there are still some questions around things like opacity of "sequence" and "lexicographical ordering" that might be good to discuss in a follow-up issue/PR.
Net: I'm going to propose that we merge this PR "as is" and deal with the other stuff later.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 7, 2022

@markpeek need to address the "wrapping" issue too - not sure what to do about negative numbers

@duglin duglin changed the title Suggest padding for ints - sometimes Suggest padding for ints Jul 7, 2022
@duglin duglin mentioned this pull request Jul 13, 2022
@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 21, 2022

I'm going to suggest we adopt #1031 instead of this PR during today's call.

@duglin
Copy link
Copy Markdown
Collaborator Author

duglin commented Jul 21, 2022

Per 7/21 call, closing in favor of #1031

@duglin duglin closed this Jul 21, 2022
@duglin duglin deleted the sequence branch July 21, 2022 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"sequence" extension - integer encoded but lexicographically-orderable string?

5 participants