KAFKA-3832: Kafka Connect's JSON Converter never outputs a null value #6027
hachikuji merged 3 commits into apache:trunk
Conversation
Force-pushed from ef9d91c to d90d831
gunnarmorling
left a comment
Some typo fixes inline. Not sure whether that PID file should be in there?
Force-pushed from d90d831 to 3a65687
Hey @gunnarmorling,
yes, that happened by mistake! All fixed now, thanks so much for the early review!
@renatomefi, thanks for trying to address this long-standing issue. Even though I agree that returning a null key or value is more desirable than returning an envelope with null schema and value, we need to be careful about backward compatibility and about unnecessarily changing behavior that people might be relying upon. Prior to this, the JsonConverter always returned a non-null envelope, while now it's possible that the key/value is null. One might argue that converters may always return a null key/value, and that this change doesn't stray from that. But it is possible that some people have set up pipelines defined with the current behavior in mind, and introducing these changes might break such pipelines. Have you thought about this? If so, why would we think it's safe to change the behavior? Would it be more appropriate to introduce this behavior via a new config property? The latter will of course require a KIP, but the purpose of a KIP is to make sure that changes are intentional and that all compatibility concerns have been addressed or at least properly considered.
That's a good point, but OTOH I don't think this should stop us from fixing bugs (and hindering topic compaction is quite troublesome IMO). Especially since it seems quite simple to restore the old behavior using a (sink-side) SMT that converts a null value into the previously emitted empty envelope structure, unless I'm missing anything? So I'd conclude that it's acceptable to make this change, accompanied by a note in the migration notes on how to restore the old behavior using an SMT.
Yes, that'd be a tad simpler than implementing a custom SMT. I'd only suggest having it default to the new, correct behavior, while sticking to the old one through opt-in. I.e. the emitted messages should be "correct" by default, while the "compatible" behavior is enabled through opt-in.
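The sink-side compatibility transform suggested here can be modeled like this. This is a hedged, language-agnostic Python sketch of the transform logic only, not a real Kafka Connect `Transformation` implementation (which would be Java); the function name is illustrative. It turns a tombstone (null value) back into the empty envelope the old JsonConverter emitted:

```python
import json

# The envelope the old JsonConverter emitted for a null value.
OLD_EMPTY_ENVELOPE = {"schema": None, "payload": None}

def restore_old_envelope(record_value):
    """Sketch of the compatibility SMT: if the record value is a
    tombstone (None), substitute the legacy empty-envelope bytes;
    otherwise pass the value through unchanged."""
    if record_value is None:
        return json.dumps(OLD_EMPTY_ENVELOPE).encode("utf-8")
    return record_value

print(restore_old_envelope(None))        # b'{"schema": null, "payload": null}'
print(restore_old_envelope(b'{"x": 1}')) # unchanged
```

Applied on the sink side, this would let pipelines that depend on the old non-null envelopes keep working while the converter itself emits proper tombstones.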
Hello @rhauch, thanks so much for your considerations!
Your points are really good, and I did take this into consideration. I can't argue that people might not be doing custom things, but on the source and sink connector side, compatibility is kept, since the ConnectRecord instance stays the same after this change. The problem would be for traditional consumers, which might indeed be relying on the current empty schema/value. In that case I'd suggest introducing this change in a new minor or even major release, or, if it's too dangerous, we could indeed make it an option. I've shipped an extended JsonConverter with this behavior so we can move on at my company (for future issue surfers): https://github.com/usabilla/kafka-connect-json-JsonConverter Please let me know where I should go from here: creating a KIP, or waiting to merge this in another, bigger release.
rhauch
left a comment
LGTM. Thanks again, @renatomefi, for submitting this fix, and for being patient while I looked into the change to satisfy my own concerns. I looked at quite a few connectors, just to see how they handled null schemas and values, and I don't think this will really affect how (or whether) they handle null values. The code still returns a non-null SchemaAndValue even if it gets a null serialized value or the old JSON version of that.
@ewencp or @hachikuji, would you mind reviewing and merging if/when it's acceptable to you? Thanks!
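The deserialization behavior described in this review (a null serialized value still yielding a non-null SchemaAndValue) can be sketched as follows. This is a simplified Python model of the logic, not the actual Java `JsonConverter.toConnectData` code; `SchemaAndValue` here is a stand-in namedtuple:

```python
import json
from collections import namedtuple

# Simplified stand-in for Connect's SchemaAndValue (illustrative only).
SchemaAndValue = namedtuple("SchemaAndValue", ["schema", "value"])

def to_connect_data(serialized):
    # A null serialized value is treated the same as the legacy
    # {"schema": null, "payload": null} envelope, so the result is
    # always a non-null SchemaAndValue and sink connectors are unaffected.
    if serialized is None:
        return SchemaAndValue(None, None)
    envelope = json.loads(serialized)
    return SchemaAndValue(envelope.get("schema"), envelope.get("payload"))
```

Both a tombstone and the old empty envelope deserialize to `SchemaAndValue(None, None)`, which is why existing connectors keep working.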
hachikuji
left a comment
Thanks for the patch. Left a couple minor comments.
Force-pushed from 3a65687 to 07a274e
Force-pushed from 07a274e to 143cdf6
hachikuji
left a comment
LGTM. Thanks for the fix!
…#6027) When using the Connect `JsonConverter`, it's impossible to produce tombstone messages, thus impacting the compaction of the topic. This patch allows the converter, with and without schemas, to output a NULL byte value in order to produce a proper tombstone message. When converting this data back into a Connect record, the approach is the same as when the payload looks like `{ "schema": null, "payload": null }`; this way the sink connectors maintain their functionality and the backward-compatibility impact is reduced. Reviewers: Gunnar Morling <gunnar.morling@googlemail.com>, Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
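The serialization change in this commit message can be sketched as follows. This is a simplified Python model of the before/after logic, not the actual `JsonConverter` Java code; function names are illustrative:

```python
import json

def from_connect_data_old(schema, value):
    # Old behavior: even a null value was wrapped in an envelope,
    # so the broker never saw a tombstone and compaction kept the key.
    return json.dumps({"schema": schema, "payload": value}).encode("utf-8")

def from_connect_data_new(schema, value):
    # Patched behavior: a null value serializes to a null byte value,
    # producing a proper tombstone for log compaction.
    if value is None:
        return None
    return json.dumps({"schema": schema, "payload": value}).encode("utf-8")
```

With a compacted topic, a null-valued record is what signals the broker to remove earlier records for the same key, which is why the old non-null envelope prevented compaction from ever cleaning those keys up.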
@renatomefi By the way, I've added you as a contributor on the project JIRA. This allows you to assign tasks to yourself in the future. Thanks for the contribution!
Context
https://issues.apache.org/jira/browse/KAFKA-3832
When using the Connect JsonConverter it's impossible to produce tombstone messages, thus impacting the compaction of the topic.
Solution
Allow the converter, with and without schemas, to output a NULL byte value in order to produce a proper tombstone message.
When converting this data into a Connect record, the approach is the same as when the payload looks like
`{ "schema": null, "payload": null }`; this way the sink connectors maintain their functionality and the backward-compatibility impact is reduced.
Tests
The first commit only introduces the fix and shows that possibly no BC break is introduced; please help me confirm that's the case.
The second commit introduces tests for the fix.
Notes
The fix could also be done in
`fromConnectData`, but since the last step in the changed path is `convertToJsonWithEnvelope`, fixing it there prevents future internal calls from also being faulty.
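The placement rationale above can be sketched like so. This is an illustrative Python model with hypothetical function names mirroring the Java ones, not the actual implementation: putting the null check in the innermost step means every caller that reaches that path, not just the public entry point, gets the tombstone behavior.

```python
import json

def convert_to_json_with_envelope(schema, value):
    # Innermost step: the null check lives here, so any internal
    # caller that reaches this path also produces tombstones.
    if value is None:
        return None
    return {"schema": schema, "payload": value}

def from_connect_data(schema, value):
    # Outer entry point: delegates, then serializes non-null envelopes.
    envelope = convert_to_json_with_envelope(schema, value)
    if envelope is None:
        return None
    return json.dumps(envelope).encode("utf-8")
```

Had the check been placed only in the outer `from_connect_data`, a future internal caller of the envelope step would silently reintroduce the non-tombstone behavior.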