KAFKA-19634: Formalize nullable and non-nullable type distinctions in protocol specification#20614
KAFKA-19634: Formalize nullable and non-nullable type distinctions in protocol specification#20614chia7712 merged 24 commits intoapache:trunkfrom
Conversation
junrao
left a comment
There was a problem hiding this comment.
@DL1231 : Thanks for the updated PR. A few more comments.
-
Regarding the implementation of the nullable vs non-nullable types. We use 3 different approaches. (a) For bytes, we implement two independent classes BYTES and NULLABLE_BYTES. (b) For array, we use one class ArraryOf, which takes a nullable param. (c) For schema, we implement NULLABLE_SCHEMA as a subclass of SCHEMA. Is it possible to pick one approach to implement all nullable types in a consistent way? Perhaps (b) or (c) is a bit better since it allows more code sharing.
-
In the generated html, could we introduce notations for 4 different types of arrays (nullable vs non-nullable, compact vs non-compact)?
-
This is an existing issue and can probably be done in a separate PR. All static classes in Field except TaggedFieldsSection are not really being used. We can probably remove them.
|
@junrao : Thanks for your review.
I think (c) might be more suitable, as it not only allows for more code reuse but also enables better separation of logic between nullable and non-nullable types.
How about adding the array type after the []? For example: ConsumerGroupHeartbeat Response (Version: 0) => throttle_time_ms error_code error_message member_id member_epoch heartbeat_interval_ms assignment _tagged_fields
throttle_time_ms => INT32
error_code => INT16
error_message => COMPACT_NULLABLE_STRING
member_id => COMPACT_NULLABLE_STRING
member_epoch => INT32
heartbeat_interval_ms => INT32
assignment => NULLABLE_STRUCT [topic_partitions]COMPACT_ARRAY _tagged_fields
topic_partitions => STRUCT topic_id [partitions]COMPACT_ARRAY _tagged_fields
topic_id => UUID
partitions => INT32
Filed KAFKA-19822 to track this case.
I agree that we probably don't need to. As you rightly pointed out, an empty request or response serves no purpose. |
junrao
left a comment
There was a problem hiding this comment.
@DL1231 : Thanks for the updated. PR.
I think (c) might be more suitable, as it not only allows for more code reuse but also enables better separation of logic between nullable and non-nullable types.
What do you think about addressing this issue in a separate PR? The changes required to modify the implementation of all nullable types might be a bit more involved.
Sounds good.
topic_partitions => STRUCT topic_id [partitions]COMPACT_ARRAY _tagged_fields
How about we use [T], [T]?, (T) and (T)? to represent array, nullable array, compacted array and nullable compacted array, respectively?
Also, could we add the STRUCT keyword to the top level schema in the generated html?
Finally, could you rebase the PR to pick up a fix for flaky test #20713?
|
@junrao : Thanks for your review. Filed KAFKA-19833 to track this issue.
The generated HTML looks like this: ConsumerGroupHeartbeat Response (Version: 0) => STRUCT throttle_time_ms error_code error_message member_id member_epoch heartbeat_interval_ms assignment
throttle_time_ms => INT32
error_code => INT16
error_message => COMPACT_NULLABLE_STRING
member_id => COMPACT_NULLABLE_STRING
member_epoch => INT32
heartbeat_interval_ms => INT32
assignment => NULLABLE_STRUCT (topic_partitions)
topic_partitions => STRUCT topic_id (partitions)
topic_id => UUID
partitions => INT32
|
| for (Iterator<StructSpec> iter = structRegistry.commonStructs(); iter.hasNext(); ) { | ||
| StructSpec struct = iter.next(); | ||
| generateSchemas(struct.name(), struct, message.struct().versions()); | ||
| generateSchemas(struct.name(), struct, message.struct().versions(), Versions.NONE); |
There was a problem hiding this comment.
This is a bit problematic. A shared schema could be used by multiple fields. Some of them can be nullable and some others can be non-nullable. Not sure what's the best approach to address this issue. One potential way is to only support Schema for now. The generated code already handles null just with Schema. So far, for non-generated code usage, it seems that there hasn't been a need for a nullable schema. So, we could punt on that until there is a need.
There was a problem hiding this comment.
Thanks for pointing this out. Filed KAFKA-19870 to track it.
There was a problem hiding this comment.
@DL1231 : This one is important. So, I think we need to get this part right in this PR, instead of a followup one.
There was a problem hiding this comment.
So, we could punt on that until there is a need.
Sorry, I misunderstood your point earlier. I will address this issue asap.
There was a problem hiding this comment.
A shared schema could be used by multiple fields. Some of them can be nullable and some others can be non-nullable.
@junrao Pardon me, I may be misunderstanding you comment, but IIRC, the common struct does not support nullable property. So using Version.None should be good in this case
There was a problem hiding this comment.
@junrao If Y is nullable, then declare that field as new NullableSchema(X);
if Z is non-nullable, then reference X directly.
X, by default, should be declared as new Schema(). WDYT?
There was a problem hiding this comment.
Or, we could reject the json file if the common struct is used in both nullable and non-nullable definition. I think this may be reasonable since it should not be “common” if it has different definitions.
There was a problem hiding this comment.
Or, we could reject the json file if the common struct is used in both nullable and non-nullable definition. I think this may be reasonable since it should not be “common” if it has different definitions.
This seems arbitrary. If we allow a struct field to be null, it seems that we should allow it regardless of how the struct is defined.
If Y is nullable, then declare that field as new NullableSchema(X);
This feels awkward to me. The generated code explicitly generates code that handles nulls. So, NullableSchema(X) is unnecessary and will likely confuse people.
There was a problem hiding this comment.
The current method parameter uses Version.None by default, indicating that the common struct only supports Schema.
Should we keep the existing logic unchanged?
There was a problem hiding this comment.
If Y is nullable, then declare that field as new NullableSchema(X);
if Z is non-nullable, then reference X directly.
X, by default, should be declared as new Schema().
@DL1231 : Thinking a bit more. I feel the above solution that you proposed probably works the best. We will need to change the constructor of NullableSchema to take a Schema. We will use this approach for both shared and non-shared schema when generating the classes.
| for (Iterator<StructSpec> iter = structRegistry.commonStructs(); iter.hasNext(); ) { | ||
| StructSpec struct = iter.next(); | ||
| generateSchemas(struct.name(), struct, message.struct().versions()); | ||
| generateSchemas(struct.name(), struct, message.struct().versions(), Versions.NONE); |
There was a problem hiding this comment.
If Y is nullable, then declare that field as new NullableSchema(X);
if Z is non-nullable, then reference X directly.
X, by default, should be declared as new Schema().
@DL1231 : Thinking a bit more. I feel the above solution that you proposed probably works the best. We will need to change the constructor of NullableSchema to take a Schema. We will use this approach for both shared and non-shared schema when generating the classes.
|
@junrao Thanks for the review. I have updated the PR. The generated HTML looks like this: ConsumerGroupHeartbeat Response (Version: 0) => { throttle_time_ms error_code error_message member_id member_epoch heartbeat_interval_ms assignment }
throttle_time_ms => INT32
error_code => INT16
error_message => COMPACT_NULLABLE_STRING
member_id => COMPACT_NULLABLE_STRING
member_epoch => INT32
heartbeat_interval_ms => INT32
assignment => ?{ (topic_partitions) }
topic_partitions => { topic_id (partitions) }
topic_id => UUID
partitions => INT32
|
| RECORDS, COMPACT_RECORDS, new ArrayOf(STRING), new CompactArrayOf(COMPACT_STRING)}; | ||
| RECORDS, COMPACT_RECORDS, NULLABLE_RECORDS, COMPACT_NULLABLE_RECORDS, | ||
| new ArrayOf(STRING), new CompactArrayOf(COMPACT_STRING), ArrayOf.nullable(STRING), CompactArrayOf.nullable(STRING), | ||
| new Schema(), new NullableSchema(new Schema())}; |
There was a problem hiding this comment.
This is an existing issue. For COMPACT_BYTES and COMPACT_NULLABLE_BYTES, could you add a space in front of "Then N bytes follow. ?
Also, for all Array types, could we add a period at the end of the documentation to be consistent?
There was a problem hiding this comment.
Thanks very much for your detailed and patient review. I have updated the PR—please take another look when you have time.
yes, will take a look later! |
|
|
||
| @Override | ||
| public String leftBracket() { | ||
| return "?{"; |
There was a problem hiding this comment.
Given that the Array types documentation include the symbol, should it also be included in the documentation for consistency?
There was a problem hiding this comment.
Thanks for the review, I have updated the PR. PTAL
|
the flaky is already traced by https://issues.apache.org/jira/browse/KAFKA-18952 |
…protocol specification (apache#20614) This patch introduces a clear separation between nullable and non-nullable data structures. The key changes include: 1. Differentiates between nullable and non-nullable versions of `RECORDS`, `COMPACT_RECORDS`, and `Schema` types. 2. Adds explicit nullable type names for `ArrayOf` and `CompactArrayOf`. 3. Introduces a new, concise syntax for representing types: - `{}` for struct, `?{}` for nullable struct - `[T]` for array, `?[T]` for nullable array - `(T)` for compact array, `?(T)` for nullable compact array 4. Declares shared schemas as non-nullable `Schema` by default. A field that references a shared schema and is nullable must be explicitly declared as a new `NullableSchema(X)`. 5. Add UTs to verify the consistency between schema and message serialization. Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…protocol specification (apache#20614) This patch introduces a clear separation between nullable and non-nullable data structures. The key changes include: 1. Differentiates between nullable and non-nullable versions of `RECORDS`, `COMPACT_RECORDS`, and `Schema` types. 2. Adds explicit nullable type names for `ArrayOf` and `CompactArrayOf`. 3. Introduces a new, concise syntax for representing types: - `{}` for struct, `?{}` for nullable struct - `[T]` for array, `?[T]` for nullable array - `(T)` for compact array, `?(T)` for nullable compact array 4. Declares shared schemas as non-nullable `Schema` by default. A field that references a shared schema and is nullable must be explicitly declared as a new `NullableSchema(X)`. 5. Add UTs to verify the consistency between schema and message serialization. Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This patch introduces a clear separation between nullable and
non-nullable data structures. The key changes include:
RECORDS,COMPACT_RECORDS, andSchematypes.ArrayOfandCompactArrayOf.{}for struct,?{}for nullable struct[T]for array,?[T]for nullable array(T)for compact array,?(T)for nullable compact arraySchemaby default. A fieldthat references a shared schema and is nullable must be explicitly
declared as a new
NullableSchema(X).serialization.
Reviewers: Jun Rao junrao@gmail.com, Chia-Ping Tsai
chia7712@gmail.com