Skip to content

Extensibility vs Forward Compatibility vs Structured data in JSON and binary formats #294

@rachelmyers

Description

@rachelmyers

This is a summary of what we've encountered while trying to support CloudEvents at Google. This is less an argument for or against property bags and more a summary of the concerns we have at Google and the tradeoffs the working group needs to make.

Proto disambiguation:

  • “Proto" roughly means two things.
    • There's a human readable ".proto" file in the Protobuf Interface Definition Language that is used to generate multiple event consumer libraries, encodings, etc.
    • One of those encodings is the proto binary format, and that’s the other thing “Proto” is used to mean.
  • To avoid confusion, I’m going to use “proto lang” to refer to the protobuf IDL, and “proto binary” or “binary format” to refer to the encoding in proto.
  • And these two meanings translate to the two concerns we have:
    • Proto binary has the concern of semantic versioning. The go example in this repo shows that adding a new attribute to the spec can be a breaking change, even in JSON.
      • We probably don’t want to create a JSON that can only be safely used with a SDK, because the WG would have to make sure there are high quality up to date SDKs in all languages.
    • For that reason we think that promotion of an extension is a breaking change.
  • Proto lang generates JSON specs
    • Every valid proto definition has corresponding valid JSON, but not all valid JSON has a corresponding proto definition.
    • The JSON library that a .proto file would generate would have no way to access JSON fields which weren't defined in the .proto file.
    • We’re asking to keep the extensions bag, because we’re trying to keep the CloudEvents json compliant with the Proto Lang defined JSON. We know that the JSON is clunkier and this makes the promotion process for JSON-only systems more complicated.
  • See the Protobuf Event Format for CloudEvents for more about Proto.

Forward compatibility

Forward compatibility and the semantic versioning that we use for future versions is less flexible in binary formats

  • Assumptions:
    • We want to be able to add new attributes to the spec in the future without breaking event consumers that still use the old version.
    • In semantic versioning, major changes are breaking, minor versions are non-breaking, and patch level versions are non-breaking bug fixes.
  • For event consumers using JSON implementations of the spec, when new top level attributes are added to future iterations of the spec
    • JSON keys are strings
    • If keys are uniquely named, there will not be collisions
    • Values are all json primitives. The WG hasn’t been concerned about specifying that a value will be an int64 number or date encoded as a string.
    • A future iteration of the spec that adds new main properties will be a non-breaking change for existing JSON event consumers, and only require incrementing the minor version
      • For example, an event consumer that’s consuming CE v1.0 JSON events can accept all new top level attributes of CE v1.1.
        • Applications won’t use the new attributes, but can now start
        • Gateways can pass the attributes through
  • Why should event consumers using proto binary not use “unknown fields” for extensions?
    • It’s not a good solution for forwards compatibility; we picked a random high number for the integer key to avoid collisions instead of the normal low-index integer.
    • It wouldn’t be convertible to JSON if it only has an int ID
    • It’s very hard to use two extensions; an event consumer would have to personally write a fourth proto lang that defines the base spec and the two extensions.
    • If the extensions are in a property bag instead of unknown fields, then you can’t move them from property bags to the known field in a minor change. (That’s why this is prompted by removing the extensions bag.)

The Extensibility problem:

  • Assumption:
    • We want to make the spec extensible by allowing arbitrary attributes, either by allowing extensions in a property bag or at the top level
  • If we provide extensibility by allowing arbitrary attributes in a property bag:
    • For event consumers using a binary format CEs, extensions can be passed through or used without special handling
      • For example, a vendor-specific extension, sampledRate is added to the extensions property bag. Event consumers have assigned extensions to be a struct, which has been specifically designed to handle arbitrary keys and values.
      • In the case of converting the binary format to JSON, a lot of work has gone into handling the conversion for structs, and we can take advantage of this
    • For event consumers using a JSON format CEs, extensions in the property bag are used or passed through without special handling.
      • It will be passed through by gateways in the same format
      • It will be deserialized in applications as any other valid JSON is deserialized
    • The path to promoting an extension to a top level attribute is a breaking change, and the process is the same for both binary and JSON formats.
      • For example, if sampledRate becomes so widely used that most CloudEvents are using it, and the WG decides to promote it to a top level attribute,
        • Event consumers that want to be backwards compatible and continue to accept CE v1.0 will continue to check for sampledRate in the extensions bag.
        • To support CE v2.0, event consumers will also need to look for sampledRate as a main spec level property.
      • Avoiding the breaking changes is a motivation to move away from the property bag and put all extensions at the top level
  • If we make the spec extensible by allowing arbitrary attributes at the top level:
    • Event consumers using JSON-only implementations:
      • Can handle arbitrary top-level attributes, as long as they are uniquely named
      • The promotion path is seamless; event consumers looking for a particular attribute will see no change between the attribute being an extension and being a normal spec attribute.
    • Event consumers supporting binary formats:
      • Cannot easily handle arbitrary top-level attributes
        • The keys for binary formats are integers, not strings
          • If an event consumer assigns an integer to an unknown attribute, it risks causing a collision, assigning the same integer to different attributes.
        • A possible workaround would be that the event consumer receives a JSON CE, assigns the known attributes to a integer-keyed attribute in the binary format, and assigns unknown attributes to a property bag in the binary format
          • This has the following problems:
            • It would require special casing CloudEvents from the way the binary format normally generates json.
            • In special casing it, we introduce security risks and the likelihood of mistakes
            • We decrease the likelihood that we can support CloudEvents across our hosted products and open source libraries
      • Alternatively, event consumers supporting binary formats could drop unknown top-level attributes
        • If the event consumer is a gateway, it won’t pass the unknown attributes through
        • If the event consumer is an application, which wasn’t using the unknown attributes, so won’t miss them.
        • For example, if two different CEs each include a different top level vendor-specific extension that is not defined in the spec, sampledRate and distributedTracing, the event ocnsumer doesn’t make this available.
        • This alternative is like saying “the solution to extensibility is that we don’t allow it”

Conclusion

  • What happens if the extensions bag is removed, and extensions become top level properties?
    • If Google publishes a .proto file so that we can support Cloud Events on gRPC, that proto lang will define both proto binary and JSON bindings. This means Google will be unable to avoid fracturing the spec.
    • Special casing CloudEvents for each service will be costly.
  • So we have two requests.
    • First, as a group we understand the conflict between forward compatibility, extensibility, and structured data, highlighted by Thomas’ example repo, and decide which design goals we value and what we’re willing to sacrifice. That is, that every new property to the spec be breaking change that increments the major version. And if the WG wants to sacrifice support for structs, we need to know that the goal of the spec is to be JSON-specific sooner rather than later.
    • Secondly, that json implementations of the CE spec be compatible with proto lang , which allows Google to avoid introducing a second JSON spec by virtue of having endorsed a .proto file. We can handle arbitrary attributes in expected property bags.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions