MINOR: Fixing null handling in ValueAndTimestampSerializer #7679
Conversation
Since `ValueAndTimestampSerializer` wraps an unknown `Serializer`, the output of that `Serializer` can be `null`, in which case the line

```java
.allocate(rawTimestamp.length + rawValue.length)
```

will throw a `NullPointerException`. This pull request returns `null` instead.
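A minimal sketch of the fix described above (hypothetical class and method names, not the actual Kafka source): the wrapping serializer concatenates the raw timestamp and the raw value, and must propagate `null` instead of dereferencing it.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed fix: if the inner serializer
// produced null bytes (a tombstone), return null instead of letting
// rawValue.length throw a NullPointerException.
public class ValueAndTimestampSerializerSketch {
    public static byte[] serialize(final byte[] rawTimestamp, final byte[] rawValue) {
        if (rawValue == null) {
            return null; // inner serializer produced a tombstone
        }
        return ByteBuffer
                .allocate(rawTimestamp.length + rawValue.length)
                .put(rawTimestamp)
                .put(rawValue)
                .array();
    }
}
```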
|
I'm not quite sure what the etiquette is for drawing attention to a pull request here. Thanks |
|
Why would this be valid? Returning `null` does not seem right to me. To be fair, Kafka "core" does not really specify how serialization has to work; however, because of compacted topics and tombstone handling, Kafka Streams is more opinionated and requires a `null`-to-`null` mapping between objects and bytes. It seems we missed adding a unit test for this class -- can you create one? |
|
I'm not sure whether what I'm doing is correct or some sort of abuse, but as you said, there's no explicit contract for serialization. So I'm doing something like the following:

```java
byte[] serialize(String topic, StoredValue<T> data) {
    if (data.isEmpty()) return null;
    else return ...; // serialize T
}

StoredValue<T> deserialize(String topic, byte[] data) {
    if (data == null) return StoredValue.empty();
    else return new Present(...); // deserialize T
}
```

The point being that an empty `StoredValue` is represented as `null` bytes on the wire. Given the code above, the current behavior of `ValueAndTimestampSerializer` is to throw a `NullPointerException`. Is this a legitimate use-case? |
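A self-contained, runnable variant of the pattern sketched in this comment, using `Optional<String>` in place of the hypothetical `StoredValue<T>` (the class name here is invented for illustration): the empty value maps to `null` bytes, and `null` bytes map back to the empty value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Sketch of the serde pattern described above: an empty value is
// serialized to null bytes, and null bytes deserialize back to the
// empty value.
public class OptionalStringSerde {
    public static byte[] serialize(final String topic, final Optional<String> data) {
        if (data.isEmpty()) {
            return null; // empty value becomes null bytes
        }
        return data.get().getBytes(StandardCharsets.UTF_8);
    }

    public static Optional<String> deserialize(final String topic, final byte[] data) {
        if (data == null) {
            return Optional.empty(); // null bytes become the empty value
        }
        return Optional.of(new String(data, StandardCharsets.UTF_8));
    }
}
```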
|
@mjsax, a reminder, in case this slipped through the cracks... Thanks |
|
Thanks for the reminder. I see what you are saying, but it won't fit into the KS "eco system". If you have a "tombstone", KS can only process the tombstone correctly if the value is `null`. Not sure if there is a way to avoid that. \cc @guozhangwang @bbejeck @vvcephei Do you have any good idea / advice how to handle this case? |
|
Since serialization is user-customizable, Kafka Streams cannot assume what a given serde does, but it generally relies on these rules: 1) a `null` object serializes to `null` bytes; 2) a non-`null` object serializes to non-`null` bytes; 3) `null` bytes deserialize to a `null` object.
@ncreep for your case, if the passed-in object is `null`, the serde would be skipped and `null` bytes would be returned -- would that be sufficient? But I think for the `ValueAndTimestamp` serde, maybe we have some gaps in respecting the above rules, causing unexpected results. It is worth double-checking whether some fixes are needed. |
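The mapping rules under discussion, applied on the serialization path, can be sketched as a small wrapper (a hypothetical helper, not part of Kafka's API):

```java
import java.util.function.Function;

// Sketch of the null-mapping rules discussed above:
//   rule 1: a null object serializes to null bytes (serializer skipped);
//   rule 2: a non-null object must not serialize to null bytes.
public class NullContractCheck {
    public static <T> byte[] checkedSerialize(final Function<T, byte[]> serializer, final T value) {
        if (value == null) {
            return null; // rule 1: skip the serializer for tombstones
        }
        final byte[] bytes = serializer.apply(value);
        if (bytes == null) {
            // rule 2 violated: this is the case under discussion
            throw new IllegalStateException("non-null object serialized to null bytes");
        }
        return bytes;
    }
}
```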
|
Thanks @mjsax and @guozhangwang for your responses. In the logic described by @guozhangwang, number 3. would break my use-case, since in the client code I want to convert `null` bytes into a non-`null` (empty) object. The current issue is an instance of breaking 2.. So the question is whether it's agreed that non-`null` objects can still be serialized to `null` bytes. |
|
@ncreep Thanks for the detailed description of your use case. As for 3), it is a principle at the Kafka broker side that `null` bytes indicate a deletion (tombstone) on compacted topics. As for 2), yes I think we do need to fix the current behavior, and your PR looks good to me. cc @mjsax . |
|
Okay, thanks. |
|
I am still wondering (because of issue (3)) if we should allow it (ie, if we allow it, don't we mask an actual issue)? If @ncreep's user code serializes a non-`null` object to `null` bytes, downstream those bytes are indistinguishable from a real tombstone. I am just not 100% sure if this change is "safe" or not atm. I think for plain consumers/producers users can do whatever they want, but in KS we should not allow serializing a non-`null` object to `null`. |
|
The current PR only touches the serializer: when the inner serializer returns `null` bytes (indicating a delete tombstone), we skip wrapping the `null` bytes with the raw timestamp bytes (which would throw an NPE). I think this is the right fix to do. Beyond that, I agree @ncreep's use case of serializing non-`null` objects to `null` bytes should be adjusted (as I recommended in my previous comment). |
But this does (or should) never happen in Kafka Streams -- if we have a `null` value (a tombstone), we don't call the serializer in the first place. The issue only happens because there is no `null`-to-`null` mapping in the inner serializer: a non-`null` object is serialized to `null` bytes.
Hence, the violation of the `null`-to-`null` mapping pattern is (from my point of view) the root cause of the issue. My concern is that we would mask the root cause, which could lead to other problems in different parts of the code. |
|
@mjsax, ignoring for a moment my opinion about the validity of my use-case: I think that, as a client of the code being discussed, the error-reporting in this case makes it more difficult to debug the root cause. If you want to enforce the `null`-to-`null` mapping, a bare `NullPointerException` is not a helpful way to signal the violation. Back to me, who wants my use-case to work: I would think that there should be a clear separation between the in-memory representation of a value and its serialized representation. It's a "coincidence" that `null` bytes serve both as the tombstone marker and as a possible serialized form of a value. I can also report that, from my (non-extensive) experience, the pattern has worked for me in practice. |
|
I agree that we could throw a more informative exception. I also agree that it would be great to separate the representation of objects from their serialized format -- however, this is something that affects the whole system, not just Kafka Streams. Hence, if we want to change it, we need to start at the broker level: instead of representing tombstones as `null` bytes, the broker would need some other mechanism. Hence, don't get my objection on this PR the wrong way. I totally agree with you that the end-state you desire makes sense. I only disagree that we should merge this PR, as it does not get us closer to this end state but might only introduce subtle bugs that are even harder to debug for end-users. |
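The "more informative exception" mentioned here could look something like the following (a hypothetical sketch with invented names, not the actual change made in this PR):

```java
// Sketch: instead of letting ByteBuffer.allocate throw a bare
// NullPointerException, fail with a message naming the violated
// contract so users can find the offending serializer.
public class InformativeNullCheck {
    public static byte[] requireNonNullBytes(final byte[] rawValue) {
        if (rawValue == null) {
            throw new IllegalStateException(
                    "Inner serializer returned null bytes for a non-null value; "
                    + "Kafka Streams expects only null values to map to null bytes.");
        }
        return rawValue;
    }
}
```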
|
Like @mjsax mentioned, if we want to stop using `null` as the tombstone indicator, that change has to start at the broker level. Back to this PR: I think it still makes sense, as `serialize` is a public interface, so in the future we may add other callers than its own overloads. OR: if we think this function should not be public, we should just make it private, and then we do not need this check. |
|
I'm trying to achieve the same goal as @ncreep and I faced the same problem. I'm using Scala and I want to use `Option` to model possibly-absent values, serializing `None` to `null` bytes. |
|
Personally, I am not convinced. But I won't block this PR any longer. To repeat my original review comment: can we add a unit test for this case (please also add a comment explaining why we want/need to support this)? |
|
Thanks @mjsax and @guozhangwang for taking the time to discuss this pull request. I've added tests and comments. |
|
Thanks a lot @ncreep! Sorry for the long discussion and thanks a lot for contributing! @guozhangwang Should we cherry-pick this to |