Currently, binary values are mapped to base64-encoded strings in JSON. The problem is that, when converting JSON back to a CloudEvent, there's no reliable way to know whether a JSON string is a CloudEvents String, or a CloudEvents Binary that has been base64-encoded. Because the behaviour is unspecified, different systems may handle it in different ways (e.g., some might always treat strings as strings, while others might use heuristics to decide whether something looks like a base64 value), which creates an opportunity for incompatibilities between systems.
Now, for known attributes that are Binary, that type information can be used to decode the value. But for unknown attributes (e.g., extension attributes that a particular implementation does not understand but may pass along as is), there's no way to handle them correctly. The same is true for Any-typed attributes.
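To make the ambiguity concrete, here is a minimal Python sketch. The attribute name `ext` and the sample value `deadbeef` are made up for illustration; the only assumption about the format is the one stated above, that Binary values are written as base64 strings.

```python
import base64
import json

# A String-typed value whose text happens to be valid base64.
text_value = "deadbeef"

# A Binary-typed value: the six bytes that base64-encode to that same text.
binary_value = base64.b64decode(text_value)

# Serialise both under the current mapping (Binary -> base64 string).
wire_from_text = json.dumps({"ext": text_value})
wire_from_binary = json.dumps(
    {"ext": base64.b64encode(binary_value).decode("ascii")}
)

# The two events are byte-for-byte identical on the wire, so a receiver
# that does not already know the type of "ext" cannot tell them apart.
print(wire_from_text == wire_from_binary)
```

A receiver that guesses "looks like base64, so decode it" would corrupt the String case, and one that never decodes would corrupt the Binary case.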
I propose therefore that we change the way binary values are encoded, such that binary values are encoded using a JSON object with a single member with a special name, eg:
"someanytypedattribute": {
  "__ce-binary": "... <base64ed content here> ..."
}
Of course, this would prevent encoding any Maps or JSON values that had a key of __ce-binary, but the risk of that ever occurring in the real world is next to zero. If the above were adopted, it would always be unambiguous whether a particular field is binary or a string. The same encoding could be used for the data, which would mean decoders would not have to rely on complex heuristics based on the content type to determine whether the value is meant to be binary or text. Producers could make whatever decision they want as to whether the data is binary, text or JSON, and consumers would always be able to consume it correctly, without the risk of accidentally treating base64-encoded bytes as a string, or vice versa.
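A rough sketch of what an encoder and decoder for this could look like, in Python. The helper names `encode_value` and `decode_value` are hypothetical, and treating only an object with exactly one `__ce-binary` member as binary is my assumption about how the rule would be worded, not something any spec defines today.

```python
import base64
import json

BINARY_KEY = "__ce-binary"  # wrapper member name from the proposal above


def encode_value(value):
    """Wrap bytes in a single-member object; pass everything else through."""
    if isinstance(value, (bytes, bytearray)):
        return {BINARY_KEY: base64.b64encode(bytes(value)).decode("ascii")}
    return value


def decode_value(value):
    """Unwrap a single-member __ce-binary object back into bytes."""
    if (
        isinstance(value, dict)
        and set(value) == {BINARY_KEY}
        and isinstance(value[BINARY_KEY], str)
    ):
        return base64.b64decode(value[BINARY_KEY], validate=True)
    return value


# Round trip: a binary value and a look-alike string stay distinct.
event = {"binattr": encode_value(b"\x00\x01\x02"), "strattr": encode_value("deadbeef")}
decoded = {k: decode_value(v) for k, v in json.loads(json.dumps(event)).items()}
print(decoded)
```

With this, the decoder needs no knowledge of the attribute's declared type and no content-type heuristics: strings always stay strings, and only the wrapper object is turned back into bytes.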