Add identityLabels attribute#258
Conversation
| * Type: `Map` | ||
| * Description: A list of "key-value" pairs providing information about the | ||
| event, which are used by an event-based application. The "key-value" pairs | ||
| MUST be in a flat structure. The value MUST be a string. There MUST be no |
There was a problem hiding this comment.
I think the sentence about "flat structure" is redundant and can be removed since the value MUST be a string implies it. If not, then I'm not sure what "flat structure" means and it may need to be defined.
There was a problem hiding this comment.
isn't "MUST be a string" even more restrictive than "flat structure" since it also precludes numeric or boolean values?
| ``` JSON | ||
| application-properties | ||
| { | ||
| "building address": "12 Main Street, Copenhagen, Denmark, 123456", |
There was a problem hiding this comment.
I would suggest that we don't allow spaces in the key names (just alphanumeric) otherwise serializing these as http headers will be tricky.
There was a problem hiding this comment.
Doug, any other restriction on the key names?
| "building address": "12 Main Street, Copenhagen, Denmark, 123456", | ||
| "floor": "2", | ||
| "apartmentId": "345", | ||
| "sensor-location": "room1.window2" |
There was a problem hiding this comment.
If we adopt @clemensv's proposal for serializing maps then the HTTP header for this would be:
CE-application-properties-sensor-location - seems kind of long. And yes I'm going to revisit my "white whale" here.... :-) I don't see why CE-application-properties-sensor-location would be easier to parse or understand than just CE-sensor-location. Nor does the description text here explain why I would use this property instead of an extension, in fact, given the length of these names I would opt for the shorter one every time just to not be annoyed. There is nothing that would prevent this property from being an extension and work the exact same way, so we're giving people two choices for almost the same thing (w/o a clear reason to use each), and I think that's going to lead to confusion and interop isuses when the receiver expects them in one but the sender puts it in the other by mistake.
| "key-value" pair to correlate multiple events (from different event sources) | ||
| associated with a serverless application workflow. | ||
| * Constraints: | ||
| * REQUIRED |
There was a problem hiding this comment.
Leaving a question here that I also brought up in the call, so that it can be clarified in the text: If my application doesn't need to correlate events, what should go in this field, since it's required for the event?
There was a problem hiding this comment.
I don't think this should be a REQUIRED attribute since not all events need this - for example, we didn't use/need this in our demo at KubeCon so I'm not sure why it needs to be required.
There was a problem hiding this comment.
An event could be used by multiple applications. If one of the applications needs to correlate events and the information is not in the event, then that application can not work. Since the event producer and the serverless application or serverless platform are different entities, the event producer may not be able to predict how the applications or platforms will use it. To support those applications or platforms that need to correlate events, we should require the event producer to put the info in the event attribute
There was a problem hiding this comment.
The "required" is not from the event's perspective, but from the application/platform's perspective, which need to use the info for correlation purpose.
There was a problem hiding this comment.
not every application is going to be correlating events, and even when correlation is required, the source (and eventType) attribute will often be sufficient
There was a problem hiding this comment.
Mark: Yes not every application is going to correlate events. But if one application does it, the information MUST be there. Can we guarantee the information will be in the source attribute ?
There was a problem hiding this comment.
I remember we discussed this before. It is mandatory but can be empty
There was a problem hiding this comment.
If it can be empty, then I don't see the need to require an empty bag.
|
@cathyhongzhang given that we are potentially moving extensions to the top level #225 can you explain why this is needed? Couldn't functionality like this simply be provided in a custom extension? To me this honestly seems duplicative of the original |
| [RFC 2046](https://tools.ietf.org/html/rfc2046) | ||
| * For Media Type examples see [IANA Media Types](http://www.iana.org/assignments/media-types/media-types.xhtml) | ||
|
|
||
| ### application-properties |
There was a problem hiding this comment.
If this is intended to be a general purpose bucket for custom key/value pairs that are not official spec attributes or extension attributes, I actually think "properties" was a better name. The name "application-properties" implies that the properties describe the application, but most of the examples that have been discussed refer to properties that describe the source of the event and/or the event itself.
That said, I'm personally in favor of allowing arbitrary properties as top-level attributes since the extensions have been elevated and therefore serialization implementations etc will already need to deal with the classification of top-level keys (core, extensions, or custom).
| "key-value" pair to correlate multiple events (from different event sources) | ||
| associated with a serverless application workflow. | ||
| * Constraints: | ||
| * REQUIRED |
There was a problem hiding this comment.
this should be OPTIONAL since not all events will carry additional information beyond the source, eventType, and eventId attributes (which are all REQUIRED)
(in many cases, the "source" attribute alone would provide sufficient information for correlating multiple events "from different event sources")
There was a problem hiding this comment.
@markfisher's comment is something I was going to bring up on the call but we ran out of time. @cathyhongzhang's your examples mentioned during the call really made it sound like you were providing more information about "the source", which made me wonder why the "source" attribute wouldn't provide (at least some of) this information?
There was a problem hiding this comment.
I am not sure if the source will for sure provide the needed info for correlation. But I agree that if the source does provide the info, the info can be used for correlating multiple events.
|
I'd like to look at this differently, rather than asking to explain when someone would use one extensibility point over another, I'd like to understand what use-case breaks if For example, what wouldn't work if the CloudEvent came into an event consumer with an HTTP header of: instead of: In order for someone to do something meaningful with this data, they would need to know about the key name in advance (even if its dynamically specified). So,why does the presence of |
|
What if the event does not have HTTP header or some other transport header? If it is a storage event or database event, where should these information go? BTW, I have modified the description. Would you like to take a look at it to see if it provides better clarification? |
|
Brain: I understand your confusion. I have modified the description. Would you like to take a look at it to see if it provides better clarification? |
|
@cathyhongzhang I was using HTTP just as an example, I think the same rules/concerns apply no matter what transport we use. In all cases the only difference between how "extensions" and "application-properties" appear would be purely syntactical. The real purpose behind any of our properties is in its definition, not in its location. IMO location-specific semantics only makes sense when the exact same property name can (and should) appear in more than one location at the same time and each use implies a different set of semantics. For example, do we want to encourage/allow this type of CloudEvent: Having "room" appear in 3 different locations with potentially 3 different semantics is, IMO, asking for an interop issue to happen. I think it would be much cleaner to only have one definition of "room" and if we really did need to have 3 different meanings of "room" then each should be more explicit in their naming so that they can all live at the same location w/o conflicts in the serialization. |
| duplication of keys. | ||
| * Constraints: | ||
| * REQUIRED, but can be empty if all unique identification information is | ||
| already provided in the source etc. attributes. |
There was a problem hiding this comment.
Is this still considered as REQUIRED given that it can be empty?
| already provided in the source etc. attributes. | ||
| * Examples: | ||
| ``` JSON | ||
| correlationProperties |
There was a problem hiding this comment.
I can understand the needs to put correlation in the spec but I still worry about this "open" format that puts no restriction on what can be put in this key-value map. Is there no way that we can provide pre-defined fields? If there is no way, then this does not feel like something that can be defined in a spec.
|
@cathyhongzhang can you fix the DCO issue? You need to make sure all of your commits are signed. Also, can you elaborate on how you see the receiver of this information using it? By that I mean, since there could be 1000 labels in this bag do you expect the receiver to have a list of well-known labels it will search for and process (and ignore the rest)? |
|
I'm still struggling with this new bag of extensions. Often people look at their particular use cases as special and therefore some "generic mechanism" isn't good enough for their needs. But, of course, this can't possibly be true otherwise we'd end up with multiple copies of the generic mechanism just with each use case's name/type associated with it. Today it's "correlation", but tomorrow someone else will come up with their own special reason for why they need to put all properties related to their use case into a special bag of extensions because, after all, if correlation did it why can't they? So, my questions:
Sorry for all of the question, but I think this will really help us develop the right text to explain why this bag is here and what its purpose in life is, and why its extensions different from all other extensions. |
|
@duglin I see application-properties or correlationLabels as properties that are specific for some application domain. They only make sense in the context of a specific application or deployment. I would not see them as extension to the eventing domain and therefore not as candidates for standardization. The analogy to the extension of HTTP headers may be problematic, as HTTP is point-to-point, while events do not have a destination and may be propagated into environments, the event producer might not have foreseen. |
|
@brianneisler I have updated the PR to add a better description on what labels should go into this attribute. Hope now it is clear that it is not a place holder for any random key-value pairs as the case with the "extension" bag. Correlation labels are not vendor specific extensions or any transport specific extension. It is a common functionality that are needed by many serverless applications that involve multiple events from different event sources. Without the info, those serverless applications can not work. I think it is a core attribute of the CloudEvents. |
I do not understand this. If we're not defining the labels as part of the spec (which we're not) then they must be extensions. @cathyhongzhang can you please explain how you see these properties being used? I know they're for correlation but how does a consumer know what properties in this bag to look at when its doing the correlation? All of them or just certain ones? And if just some, how does it determine which ones? Does each app pick which ones it cares about? |
|
|
@duglin I think the consensus of the F2F meeting is that we will define the correlationLabel (previously is named application-properties) in the spec. |
|
@cathyhongzhang On 2), thanks and that is exactly what I assumed, which tells me that no matter where these properties sit (top-level, in "extensions" or in "correlationLabels") all that matters is the name and the infrastructure can find it because they're looking for a very specific name. So if we collapsed everything down to one spot then they can find "sensor-location" at that one spot, it doesn't need to be under "correlationLabels" for them to find it. If there's a concern about conflicting names then the same concern is true not matter where it sits since at each of those locations its an open-ended bag of properties that could have more than one piece of code populating it. Multiple bags/locations doesn't avoid the issue - it just introduces more confusion. |
|
@duglin Your questions apply to some other attributes in the spec. Why is eventTime in the spec? Why is contentType in the spec? |
|
@cathyhongzhang I think you're mixing the definition of the labels with the definition of a bag for the labels. These are two very different topics. If, as a group, we believe that some correlationLabel (e.g. Its purpose (correlation, identification of the source, identification of the shape of the data, etc...) doesn't matter to me - its all about how often we think it will be used that determines it. ie. All the time == REQUIRED, most of the time == OPTIONAL, all others == extension. But that's about the definition of the labels/properties On the topic of "the bag", any of the correlationLabels/properties could be used for more than just correlation - for example, any of them might be used for routing as well. Do we expect people to duplicate it into multiple bags? I hope not. And how would the event source even know this in advance to put it in both spots? It will put each property into a bag based on its own view of how things "are meant to be used", which may, or may not, align with the consumer's view. To ensure maximum interop there should be just one place to put the data and let the receiver of it decide how it'll be used - all we need to do is make it easy to find. |
|
@duglin If we do not define a clear place in the standard spec for the event producer to put these event identification information, then most probably the event producer would forget or do not know how to put the info into the event. The result is that many serverless applications and use cases will not be supported. But if we define the format and type of the identification labels clearly in a well-defined attribute, then it will help both the event producer and the event consumer. Due to the diverse nature of the events, the identification labels themselves can not be predefined, and thus need to be grouped into a bag. But that does not mean these identification information/labels are not important or not common enough for serverless applications and should not be added into the spec. As to the duplication concern, since these are event identification labels, they should not be duplicated into other places. Why does anyone want to duplicate the same info into multiple places? To avoid the duplication, I will modify the PR to clarify that correlation is just one usage of the labels. |
|
@cathyhongzhang defining a bag will not guarantee or even encourage people to use it because the bag by itself is meaningless. Its the data that goes in the bag that's meaningful and if the producer knows what info it needs to put in the bag (and why) then it'll do it no matter where we tell them to put those properties - bag or no bag. re: duplication - if you try to put a semantic meaning on a bag and there's a property that might be used for more than one purpose you have no choice but to duplicate it because you don't know for which purpose the consumer will look at that property. Some may view it for correlation purposes and some may view it for routing - the producer has no idea how it will be used. You're making a lot of assumptions about things that I just don't think can be guaranteed - and if they can be then you're talking about a very well defined producer and consumer that know about each other and I would then (again) claim we don't need a bag because those same properties can be put in 'extensions' and they both will know exactly where they live. Show me an example CloudEvent that will not work with the correlationLabels as extensions? And why it won't work if the consumer (as you said) knows which labels to look for. |
|
@duglin I am afraid that I can not agree with your points. A well-defined attribute in the spec (does not matter whether it is in bag format or not bag format) will make both the producer's life and the consumer's life easier. The producer will know where to put the identification info and in what format to put it. The consumer will know where to look for it and how to decode it. Why do you want to duplicate the identification labels in multiple bags because they are used for multiple purposes? If we define the place to put the identification information (it is not restricted to correlation usage), it does not make sense to duplicate them in other random places. I am sorry about the previous attribute name which might confuse you. The whole purpose of defining this in the spec is because the producer and the consumer do not know each other. You are putting your assumption onto me :-) |
|
@duglin I believe it would be very useful to have a optional top-level attribute that is used by producers and consumers to identity a set of events that are related. This seems to be common behavior that can be helpful for several different use-cases. This "well-known" attribute would certainly help interoperability as developers would not need to use their own bespoke extension attribute. |
|
@cathyhongzhang i think one of your commits is missing the signing stuff. If you rebase/squash it down to one commit then i think it should be happy. Could you update the other docs to pick up the new name too? |
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
|
Expanding on my prior comments, I propose to close this PR without adopting it, and instead explicitly move extensibility (including extensibility with Map-typed attributes as shown here) to the top-level of the CE with event formats introducing appropriate extensibility constructs as required by their respective specific needs. See #289 and PoC here https://github.com/clemensv/cloudeventsformatdemo |
|
I think it will benefit event producers, event consumers, and application developers to add a clearly defined identityLabels bag syntax and semantics at the top level in the spec.md. so that event producers have a consistent place to put the identity key-value pairs and event consumers just need to parse that bag to get the required info. Otherwise different event producers could use different names for this bag and the event consumers will have to parse different bags for different events to get the required identity key-value pairs. It will also be complicated if the serverless application developers need to specify the correlation labels. |
|
@cathyhongzhang you are correct that having more than one bag may lead to confusion - which is exactly why all of the properties that you think should go into the "identificationLabels" bag should just be put with all other extensions. Then there's just one place for people to look and there's no confusion as to whether or not each is for "identification" purpose or some other purpose and put into other bags by mistake (or in addition to the one you're proposing). Consumer just look for the properties they're interested in wherever we decide extensions should be place - no confusion at all. |
|
@duglin I think in previous meeting discussions, the group agree that the spec should allow top-level attributes that are in the format of bags. That is why this attribute is proposed. Having an identityLabel bag with well-defined type will enable the event producer to organize event identity labels and have a central place to put them in, the transport to easily map to its format, the consumer to easily decode them or ignore them, and the serverless app developer to build correlation table. Without this bag, these labels will be mixed with all other event extension attributes, which is confusing and will lead to proliferation of top-level attributes. Just to clarify this bag is only for identityLabels of the event, not for other information of the event. |
n3wscott
left a comment
There was a problem hiding this comment.
I think this change misses the mark for broad appeal. I would not recommend inclusion into the top level spec.
|
|
||
| ### extensions | ||
|
|
||
| ### idendityLabels |
There was a problem hiding this comment.
2 problems: 1 - there is a typo: identity. 2 - This name is too prescriptive to the problem you are trying to solve. You want to add labels and use them as identity, but at the top level there is only a subset of events that require identity. But all events could opt into labeling.
just call it labels
There was a problem hiding this comment.
Will correct the typo. Thanks. Labels sounds too generic and the event producer can put anything inside this bag, which defeats the purpose of this attribute. This attribute is optional, so if an event producer does not have additional identity labels, then the event will not have this attribute field.
There was a problem hiding this comment.
This solves the special case of identity, but not the general case of labeling.
| See the [Extensions](extensions.md) document for a list of possible | ||
| attributes. | ||
| * Description: This is a place for custom key-value pairs a producer or | ||
| middleware wants to include to provide additional identity information |
There was a problem hiding this comment.
A producer would include, the middleware would inject.
There was a problem hiding this comment.
I would argue the action for both is inject.
| See the [Extensions](extensions.md) document for a list of possible | ||
| attributes. | ||
| * Description: This is a place for custom key-value pairs a producer or | ||
| middleware wants to include to provide additional identity information |
There was a problem hiding this comment.
identity is an attribute (identity labels belong to this attribute category). The identify labels inside this bag can be used for many different use cases. For example, a serverless app for burglary detection can use it, a serverless app for travel request/approval can use it, etc..
There was a problem hiding this comment.
Yes, I understand the examples, and I am saying they are not enough to justify addition to the top level of the spec.
| * Description: This is a place for custom key-value pairs a producer or | ||
| middleware wants to include to provide additional identity information | ||
| about the event. The syntax and semantics of each specific label | ||
| inside this map are open for each event type producer to define, |
There was a problem hiding this comment.
for each event type producer to define
seems like in saying this, you have lost the value of the identity aspect if that is what this is for.
There was a problem hiding this comment.
We have to leave what goes inside this identity bag open since different types of events could have different labels and we can not predict all the labels. What we know is that an event could have an identity bag of labels, which serverless platforms (we can view the platform as an event consumer) need to support many serverless applications.
| about the event. The syntax and semantics of each specific label | ||
| inside this map are open for each event type producer to define, | ||
| but the keys of this map are of type string and | ||
| the values are restricted to be of simple types only. |
There was a problem hiding this comment.
simple types
better to say key is a string, value is a string, and add type, which tells the processor how to process the value.
| * Examples: | ||
| * authorization data | ||
| ``` JSON | ||
| properties |
There was a problem hiding this comment.
properties being a map is still a bad idea. See my previous comments.
There was a problem hiding this comment.
OK. Existing spec defines a list of types that can be used by the attributes. Map is the closest one for the need of this attribute, that is why there is further clarification inside the description on the type of the value.
There was a problem hiding this comment.
if these labels are a map of LabelKey: LabelValue then adding metadata on the label is impossible without complex logic to point back to the key.
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
|
This seems very similar to #161 where I offered |
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
Signed-off-by: cathyhongzhang <cathy.h.zhang@huawei.com>
|
+1 to @clemensv's comment on closing this with no action. I think #301 covers what's needed. For any particular property, its ability to be used for "identification" purposes is not controlled by whether its is wrapped in an "identityLabels" bag, but rather by its definition/semantic meaning. Therefore the wrapper become not only unnecessary but potentially an interop problem if people are not clear on when something does, or does not, fall into any particular bag's category. We can avoid this entirely by just not having the bag at all. |
|
This PR is replaced by the other PR |
Signed-off-by: cathyhongzhang cathy.h.zhang@huawei.com