[WIP] KAFKA-7157: Connect TimestampConverter SMT doesn't handle null values #6446
Nimfadora wants to merge 2 commits into apache:trunk from
Conversation
@rhauch could you please review this PR?

@ewencp as this SMT is authored by you, could you please take a look at these changes?
rhauch left a comment
Thanks, @Nimfadora. I like your general approach to this bug, but I think it would be easier to merge if the changes were smaller, with fewer modified lines. If we were just applying this to trunk, this might be okay. But we actually want to backport this, and minimizing the changes might also help make it more clear that the behavior is only changing when the timestamps are null. WDYT?
    final Map<String, Object> value = requireMap(rawValue, PURPOSE);
    final HashMap<String, Object> updatedValue = new HashMap<>(value);
    updatedValue.put(config.field, convertTimestamp(value.get(config.field)));
    return newRecord(record, null, updatedValue);
I understand that the if block always returns, and so an else block is unnecessary. But because we likely want to backport this, and because we'd like to minimize the changes to help ensure the behavior remains the same for non-null timestamps, it's probably worth not removing the else block and keeping the original indentation.
    }
    // Value is Struct, only its single field should be converted
    if (rawValue == null) {
        Schema updatedSchema = updateSchema(originalSchema);
This is not using a cached schema, which means that a new schema has to be built for every record that uses some original schema A but has a null timestamp field. Wouldn't it make sense to cache this updated schema? And, since the updated schema is not a function of the record, we should be able to look up the updated schema in the cache, or build and cache it, before dealing with the raw value.
This might also mean that there are fewer lines changed, especially if we keep the else block and previous indentation (as mentioned before, even though strictly speaking the else is unnecessary), which might help us ensure that the behavior doesn't change for non-null timestamps.
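The caching pattern suggested above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual Connect code: `Schema` here is a simplified stand-in record, and `buildUpdatedSchema` is a hypothetical placeholder for rebuilding the schema with the converted field marked optional. Connect's own SMTs typically keep such a per-input-schema cache (e.g. a `SynchronizedCache`) so the rebuild happens once per distinct schema rather than once per record.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SchemaCacheSketch {
    // Simplified stand-in for Connect's Schema, for illustration only.
    record Schema(String name, boolean optional) {}

    // Cache of original schema -> updated schema. Because the updated schema
    // depends only on the original schema, not on the record, it can be
    // looked up (or built and cached) before inspecting the raw value.
    private final Map<Schema, Schema> schemaUpdateCache = new ConcurrentHashMap<>();

    Schema updatedSchemaFor(Schema original) {
        return schemaUpdateCache.computeIfAbsent(original, this::buildUpdatedSchema);
    }

    private Schema buildUpdatedSchema(Schema original) {
        // Hypothetical rebuild: same schema, but with the field made optional.
        return new Schema(original.name(), true);
    }

    public static void main(String[] args) {
        SchemaCacheSketch s = new SchemaCacheSketch();
        Schema a = new Schema("A", false);
        // Second lookup returns the same cached instance; no rebuild per record.
        System.out.println(s.updatedSchemaFor(a) == s.updatedSchemaFor(a));
    }
}
```

The key point of the suggestion is that the cache key is the original schema itself, so records sharing a schema (with or without null timestamp fields) pay the schema-building cost only once.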
    Struct updatedValue = applyValueWithSchema(value, updatedSchema);
    return newRecord(record, updatedSchema, updatedValue);

private Schema updateSchema(Schema originalSchema) {
Again, if we're only going to call this from one location (see previous comment), maybe it's better not to pull this logic out, to minimize changes.
This fix was actually implemented by #7070, which referenced this initial implementation. KAFKA-7157 has been resolved. Closing this PR.
Goal
Introduce null-value handling to TimestampConverter SMT.
Details
The existing org.apache.kafka.connect.transforms.TimestampConverter does not handle null values. When a null value is passed to the SMT, an NPE is thrown. This PR introduces null value handling for this SMT.
- schemaless null value will result in a null record value
- schemaless null complex object will result in a null record value
- null struct (has schema) will result in a null record value with an optional struct schema
- null struct (has schema) field will result in a record value with a null field value and an optional struct schema for that field

Important
We consider that the original schema, when the value is null, will carry the optional modifier. Maybe we should be smarter and decide on the value of the optional modifier based on the field's actual nullability.
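The schemaless rules above can be sketched as follows. This is a hypothetical, simplified stand-in, not the actual TimestampConverter code: the real SMT works with Connect's `SchemaAndValue` types, while here `convertTimestamp` and `applySchemaless` are toy versions that only demonstrate the null short-circuit behavior.

```java
import java.util.HashMap;
import java.util.Map;

class NullHandlingSketch {
    // Toy stand-in for the SMT's timestamp conversion; null fields stay null.
    static Object convertTimestamp(Object fieldValue) {
        return fieldValue == null ? null : "converted:" + fieldValue;
    }

    // Schemaless case: either the whole record value or just the target
    // field may be null, and neither should throw an NPE.
    static Map<String, Object> applySchemaless(Map<String, Object> value, String field) {
        if (value == null) {
            return null; // null record value passes through unchanged
        }
        Map<String, Object> updated = new HashMap<>(value);
        updated.put(field, convertTimestamp(value.get(field)));
        return updated;
    }

    public static void main(String[] args) {
        System.out.println(applySchemaless(null, "ts")); // null, no NPE
        Map<String, Object> v = new HashMap<>();
        v.put("ts", 1530000000L);
        System.out.println(applySchemaless(v, "ts").get("ts"));
    }
}
```

The struct-with-schema cases follow the same shape: a null struct short-circuits to a null value (paired with an optional schema), and a null field inside a struct is carried through as null rather than dereferenced.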
Testing
Unit tests are provided.
Committer Checklist (excluded from commit message)