What correlation should Cloud Events support? #195

@inlined

We've talked about correlation in a few PRs (#128, #138, and other discussions I can't find based on sequence numbers).

I'd like to move those discussions here so we can debate scope and approach in one place and keep each related PR appropriately scoped.

I propose that there are two kinds of correlation possible:

  1. Sequence correlation. Events may be related to others by ordering or causality. Ordered sequence correlation may be achieved using the eventTimestamp and source fields, though limited timestamp precision and clock skew may introduce errors; a vector clock would fix this if we wanted to officially support the use case. Causality sequencing would require a new context attribute like "causedBy", or a weaker-sounding property like "precededBy" that could handle sequence correlation as well. The gotcha with these attributes is that they cause head-of-queue blocking, and I'm not sure what a system should do if the precededBy event were never received.
  2. Attribute correlation. Events could expose attributes, possibly redundant with fields inside data, that are explicitly transparent to routing software. A subscription could choose to receive a limited event stream by filtering for matching IDs only, or could enforce affinity based on that ID.
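
To make the head-of-queue concern in (1) concrete, here is a minimal Python sketch of a consumer-side buffer that honors a "precededBy" attribute. Everything in it is an assumption for illustration: "precededBy" is only the name floated above, and the `eventID` key, the dict-shaped events, and the `PrecededByBuffer` class are hypothetical and not part of any spec.

```python
from collections import defaultdict


class PrecededByBuffer:
    """Hypothetical buffer: releases an event only after the event named in
    its "precededBy" attribute has been released."""

    def __init__(self):
        self.delivered = set()            # IDs already released to the consumer
        self.waiting = defaultdict(list)  # predecessor ID -> events blocked on it

    def receive(self, event):
        """Accept one event and return the events that can now be delivered."""
        predecessor = event.get("precededBy")
        if predecessor is not None and predecessor not in self.delivered:
            # Head-of-queue blocking: hold the event until its predecessor shows up.
            self.waiting[predecessor].append(event)
            return []
        released = []
        ready = [event]
        while ready:
            current = ready.pop(0)
            self.delivered.add(current["eventID"])
            released.append(current)
            # Anything that was waiting on this event is now unblocked.
            ready.extend(self.waiting.pop(current["eventID"], []))
        return released

    def blocked(self):
        """Events still held because their predecessor has not been seen."""
        return [e for events in self.waiting.values() for e in events]


buf = PrecededByBuffer()
a = {"eventID": "a"}
b = {"eventID": "b", "precededBy": "a"}
c = {"eventID": "c", "precededBy": "b"}
out_c = buf.receive(c)  # held: "b" not delivered yet
out_b = buf.receive(b)  # held: "a" not delivered yet
out_a = buf.receive(a)  # releases a, then b, then c
```

If "a" were never received, "b" and "c" would sit in `blocked()` indefinitely. The sketch deliberately has no timeout or drop policy, because that is exactly the open question.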

In my mental model, if one were to use SQL over a stream of CloudEvents, case (1) is an ORDER BY clause and case (2) allows a WHERE or GROUP BY clause. An individual query could compose (1) and multiple instances of (2).

Note that I think it is inappropriate for (2) to actually pre-determine the correlation. The actual GROUP BY or WHERE clause should be part of the query, not part of the data structure.
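
As a rough illustration of that mental model, the Python sketch below treats a list of dicts as the event stream. The `query` helper, the `orderId` attribute, and the sample events are all hypothetical. It also assumes eventTimestamp values are UTC strings in one fixed format so that string order matches time order, which sidesteps the precision and clock-skew caveats from (1). The point is only that the event exposes the attribute, while the WHERE and GROUP BY choices arrive with the query.

```python
from collections import defaultdict


def query(events, where=None, group_by=None):
    """WHERE (optional predicate), ORDER BY eventTimestamp, then optional GROUP BY."""
    selected = [e for e in events if where is None or where(e)]
    ordered = sorted(selected, key=lambda e: e["eventTimestamp"])
    if group_by is None:
        return ordered
    groups = defaultdict(list)
    for e in ordered:
        groups[e.get(group_by)].append(e)
    return dict(groups)


# Sample events; "orderId" stands in for any exposed correlation attribute.
events = [
    {"eventID": "3", "source": "/orders", "eventTimestamp": "2018-05-01T12:00:03Z", "orderId": "o-1"},
    {"eventID": "1", "source": "/orders", "eventTimestamp": "2018-05-01T12:00:01Z", "orderId": "o-1"},
    {"eventID": "2", "source": "/orders", "eventTimestamp": "2018-05-01T12:00:02Z", "orderId": "o-2"},
]

# Case (2) as a WHERE clause, composed with case (1) ordering.
only_o1 = query(events, where=lambda e: e.get("orderId") == "o-1")

# Case (2) as a GROUP BY clause over the same stream.
by_order = query(events, group_by="orderId")
```

A different subscriber could group the same events by source instead, without any change to the events themselves.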
