Skip to content

KAFKA-3832: Kafka Connect's JSON Converter never outputs a null value#6027

Merged
hachikuji merged 3 commits intoapache:trunkfrom
renatomefi:KAFKA-3832
Dec 28, 2018
Merged

KAFKA-3832: Kafka Connect's JSON Converter never outputs a null value#6027
hachikuji merged 3 commits intoapache:trunkfrom
renatomefi:KAFKA-3832

Conversation

@renatomefi
Copy link
Copy Markdown
Contributor

@renatomefi renatomefi commented Dec 12, 2018

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Context

https://issues.apache.org/jira/browse/KAFKA-3832

When using the Connect JsonConverter it's impossible to produce tombstone messages, thus impacting the compaction of the topic.

Solution

Allow the converter with and without schemas to output a NULL byte value in order to have a proper tombstone message
When it's regarding to get this data into a connect record, the approach is the same as when the payload looks like "{ "schema": null, "payload": null }", this way the sink connectors can maintain their functionality and reduces the BCC

Tests

The first commit only introduces the fix and shows that possibly no BC break is introduced; Please help me confirm that's the case
The second commit introduce tests for the fix

Notes

The fix could also be done at fromConnectData, but I felt like since the last path within the change is convertToJsonWithEnvelope this could avoid future internal calls to be also be faulty

@renatomefi renatomefi changed the title Kafka 3832: Kafka Connect's JSON Converter never outputs a null value KAFKA-3832: Kafka Connect's JSON Converter never outputs a null value Dec 12, 2018
@renatomefi renatomefi force-pushed the KAFKA-3832 branch 2 times, most recently from ef9d91c to d90d831 Compare December 13, 2018 11:48
Copy link
Copy Markdown
Contributor

@gunnarmorling gunnarmorling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typo fixes inline. Not sure whether that PID file should be in there?

Comment thread connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java Outdated
Comment thread connect/json/src/test/java/org/apache/kafka/connect/json/JsonConverterTest.java Outdated
Comment thread connect/json/src/test/java/org/apache/kafka/connect/json/JsonConverterTest.java Outdated
@renatomefi
Copy link
Copy Markdown
Contributor Author

hey @gunnarmorling

Some typo fixes inline. Not sure whether that PID file should be in there?

Yes that happened by mistake! All fixed now, thanks so much for the early review!

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented Dec 17, 2018

@renatomefi, thanks for trying to address this long standing issue.

Even though I agree returning a null key or value is more desirable than returning an envelope with null schema and value, we need to be careful about backward compatibility and unnecessarily changing behavior that people might be relying upon. Previous to this, the JsonConverter always returned a non-null envelope, while now it's possible that the key/value is null. One might argue that the converters may always return a null key/value, and that this change doesn't stray from that. But it is possible that some people have set up pipelines that are defined with the current behavior in mind, and that introducing these changes might break such pipelines.

Have you thought about this? If so, why would we think it's safe to change the behavior? Would it be more appropriate to introduce this behavior via a new config property? The latter will of course require a KIP, but the purpose of a KIP is to make sure that changes are intentional and that all compatibility concerns have been addressed or at least properly considered.

@gunnarmorling
Copy link
Copy Markdown
Contributor

But it is possible that some people have set up pipelines that are defined with the current behavior in mind, and that introducing these changes might break such pipelines.

That's a good point, but OTOH I don't think this should stop us from fixing bugs (and hindering topic compaction is quite troublesome IMO). Esp. it seems quite simple to restore the old behavior using an (sink side) SMT that converts a null value into the previously emitted empty envelope structure, unless I miss anything? So I'd conclude that it's acceptable to do this change, accompanied by a note on how to restore the old behavior using an SMT in the migration notes.

via a new config property?

Yes, that'd be a tad simpler than implementing a custom SMT. I'd only suggest to have it default to the new, correct behavior, while sticking to the old one through opt-in. I.e. the emitted messages should be "correct" by default while enabling the "compatible" behavior through opt-in.

@renatomefi
Copy link
Copy Markdown
Contributor Author

Hello @rhauch, thanks so much for your considerations!

Have you thought about this? If so, why would we think it's safe to change the behavior? Would it be more appropriate to introduce this behavior via a new config property? The latter will of course require a KIP, but the purpose of a KIP is to make sure that changes are intentional and that all compatibility concerns have been addressed or at least properly considered.

Indeed your points are really good, I did take it in consideration, I can't really argue that people might be doing custom things, but on the line of the source and sink connectors the compatibility is kept since the ConnectRecord instance stays the same after this change, the problem would be for traditional consumers which might indeed, be relying on the current empty schema/value.
As @gunnarmorling said, the lack tombstone is a big deal, thus if we can't fix that I think the documentation should be aligned since so far it seems like it should just work.

In that case I'd suggest this change to be introduced in a new minor or even major, r if it's too dangerous indeed we could make it an option.

I've shipped an extended JsonConverter with this behavior so we can move one at my company (for future issue surfers): https://github.com/usabilla/kafka-connect-json-JsonConverter

Please let me know where should I go from here, if creating a KIP or waiting to merge this another bigger release.

Copy link
Copy Markdown
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks again, @renatomefi, for submitting this fix, and being patient while I looked into the change to satisfied my own concerns. I looked at quite a few connectors, just to see how they handled null schemas and values, and I don't think this will really affect how (or whether) they handle null values. The code still returns a non-null SchemaAndValue even if it gets a null serialized value or the old JSON version of that.

@ewencp or @hachikuji, would you mind reviewing and merging if/when it's acceptable to you? Thanks!

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. Left a couple minor comments.

Comment thread connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java Outdated
Comment thread connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java Outdated
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

@hachikuji hachikuji merged commit 964e2c5 into apache:trunk Dec 28, 2018
hachikuji pushed a commit that referenced this pull request Dec 28, 2018
…#6027)

When using the Connect `JsonConverter`, it's impossible to produce tombstone messages, thus impacting the compaction of the topic. This patch allows the converter with and without schemas to output a NULL byte value in order to have a proper tombstone message. When it's regarding to get this data into a connect record, the approach is the same as when the payload looks like `"{ "schema": null, "payload": null }"`, this way the sink connectors can maintain their functionality and reduces the BCC.

Reviewers: Gunnar Morling <gunnar.morling@googlemail.com>, Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
hachikuji pushed a commit that referenced this pull request Dec 28, 2018
…#6027)

When using the Connect `JsonConverter`, it's impossible to produce tombstone messages, thus impacting the compaction of the topic. This patch allows the converter with and without schemas to output a NULL byte value in order to have a proper tombstone message. When it's regarding to get this data into a connect record, the approach is the same as when the payload looks like `"{ "schema": null, "payload": null }"`, this way the sink connectors can maintain their functionality and reduces the BCC.

Reviewers: Gunnar Morling <gunnar.morling@googlemail.com>, Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
@hachikuji
Copy link
Copy Markdown
Contributor

@renatomefi By the way, I've added you as a contributor on the project JIRA. This allows you to assign tasks to yourself in the future. Thanks for the contribution!

@renatomefi renatomefi deleted the KAFKA-3832 branch December 28, 2018 19:32
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…apache#6027)

When using the Connect `JsonConverter`, it's impossible to produce tombstone messages, thus impacting the compaction of the topic. This patch allows the converter with and without schemas to output a NULL byte value in order to have a proper tombstone message. When it's regarding to get this data into a connect record, the approach is the same as when the payload looks like `"{ "schema": null, "payload": null }"`, this way the sink connectors can maintain their functionality and reduces the BCC.

Reviewers: Gunnar Morling <gunnar.morling@googlemail.com>, Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants