Skip to content

KIP-145: Add SMTs, HeaderFrom, DropHeaders and InsertHeader#9549

Merged
tombentley merged 5 commits intoapache:trunkfrom
tombentley:KIP-145-add-missing-smt
Apr 16, 2021
Merged

KIP-145: Add SMTs, HeaderFrom, DropHeaders and InsertHeader#9549
tombentley merged 5 commits intoapache:trunkfrom
tombentley:KIP-145-add-missing-smt

Conversation

@tombentley
Copy link
Copy Markdown
Member

@tombentley tombentley commented Nov 3, 2020

These SMTs were originally specified in KIP-145 but never implemented at the time.

HeaderTo is not included since its original specification doesn't deal with the fact that there can be >1 header with the same name, but a field can only have a single value (while you could use an array, that doesn't work if the headers for the given name had different schemas).

@tombentley
Copy link
Copy Markdown
Member Author

@kkonstantine would you be able to review this? These SMTs were originally specified in KIP145 but never implemented. It seems they were forgotten about.

@C0urante maybe you'd also like to review?

@C0urante
Copy link
Copy Markdown
Contributor

C0urante commented Nov 9, 2020

Thanks for reaching out @tombentley! Happy to take a look.

For reference, maybe you could add a link to https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect#KIP145ExposeRecordHeadersinKafkaConnect-Transformations to the description?

Will begin reviewing shortly.

Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good! Will check out the tests after the core functional changes are settled.

Comment on lines 57 to 71
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to refer to the MOVE.name and COPY.name fields declared in the Operation enum below instead of using the literal "move" and "copy" strings in this section.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MOVE.name is not a compile time constant (at least the compiler doesn't see it that way). I factored out constants to avoid the duplication of literals.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why duplicate headers here? According to the Header class's Javadocs, the collection should be mutable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't completely sure whether a transformation was allowed to mutate headers, since it has to create a new ConnectRecord. Take this code from WorkerSourceTask as an example:

            final SourceRecord record = transformationChain.apply(preTransformRecord);
            final ProducerRecord<byte[], byte[]> producerRecord = convertTransformedRecord(record);
            if (producerRecord == null || retryWithToleranceOperator.failed()) {
                counter.skipRecord();
                commitTaskRecord(preTransformRecord, null);
                continue;
            }

See how preTransformRecord can be used after the transformation chain has been applied? I think it would be incorrect for commitTaskRecord to commit the original record but with headers which had been mutated by transformations, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's fair. Does seem to be the pattern followed by the other out-of-the-box transformations as well; probably best to continue to follow that pattern.

I'm a little unnerved by this though, since as far as I can tell it's not publicly documented and so it's possible people writing their own transformations may be violating this implicit rule.

Out of scope, so I've filed KAFKA-10720 to track the need for possible documentation improvements.

Comment on lines 129 to 157
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fairly expensive to perform for every record. Do you think it might make sense to perform some caching, similarly to what's done in the ReplaceField transform?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here RE: duplication

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here RE: duplication

@tombentley
Copy link
Copy Markdown
Member Author

@C0urante thanks for the review, some really helpful comments there.

Comment on lines 334 to 343
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little silly to run this for every iteration of the parameterized test 😛
I'm guessing there isn't an easy way to run this only once?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, parameterized tests do have their faults. AFAIK the only way to handle this using Junit 4 is to have two tests. You can arrange to have both tests in the same source file so both parameterised and non-parameterised tests are still easily discoverable (e.g. using Enclosed), but I've never seen that done in practice and it looks rather fiddly. Since these tests execute pretty quickly, I'm inclined to think we should just live with it. The real solution might just be to adopt Junit 5.

Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few small comments but none of them are blockers. Thanks Tom!

@tombentley
Copy link
Copy Markdown
Member Author

@kkonstantine or perhaps @rhauch please could one of you take a look?

@tombentley
Copy link
Copy Markdown
Member Author

@kkonstantine, @rhauch did you get a chance to look at this?

@tombentley tombentley force-pushed the KIP-145-add-missing-smt branch from e426dbf to 1a79ffd Compare November 26, 2020 17:40
Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks or the PR! Good to see these SMTs finally being added.
I've made a first quick pass (haven't looked at HeaderFrom yet) and left a few suggestions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this transformation!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Headers is a LinkedList, remove() has to iterate the whole list each time. I wonder if we could instead start from an empty headers list and add the headers not being removed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, extra line

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, extra line

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enforce these fields are not null?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that makes sense @mimaison, but then for consistency shouldn't we make DropHeaders.headers, HeaderFrom.headers and HeaderFrom.fields reject empty lists?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we reuse MOVE_OPERATION and COPY_OPERATION here?

@tombentley tombentley force-pushed the KIP-145-add-missing-smt branch from 1a79ffd to a211f1e Compare April 7, 2021 11:22
@tombentley
Copy link
Copy Markdown
Member Author

@mimaison I went ahead and enforce non-nullness and non-emptiness, if you want to do another pass?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I don't think it's reachable by users, should we have a message here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is impossible due to the ConfigDef.ValidString.in(MOVE_OPERATION, COPY_OPERATION), so this is really an assertion failure. The line number in the stacktrace would be enough to track it down if it ever did happen due to a later refactoring, so imho an error message is of no value. But I'm happy to add one if you like.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's fair enough. Thanks

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tom, it looks good!

@mimaison
Copy link
Copy Markdown
Member

Not sure why the build had failed, I've rekicked it.

…Header

These SMTs were originally specified in KIP-145 but never implemented
at the time.

HeaderTo is not included since its original specification doesn't deal with
the fact that there can be >1 header with the same name, but a field can only
have a single value (which could be an array, but not if the headers for
the given name had different schemas).
@tombentley tombentley force-pushed the KIP-145-add-missing-smt branch from a211f1e to c7ef2dc Compare April 16, 2021 08:15
@tombentley
Copy link
Copy Markdown
Member Author

Failing tests are those reported in KAFKA-12629, KAFKA-12284 and KAFKA-9295:

Build / JDK 11 and Scala 2.13 / testCreateClusterAndCreateAndManyTopicsWithManyPartitions() – kafka.server.RaftClusterTest32s
Build / JDK 8 and Scala 2.12 / testReplication() – org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest1m 32s
Build / JDK 8 and Scala 2.12 / testReplicationWithEmptyPartition() – org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest1m 26s
Build / JDK 8 and Scala 2.12 / testOneWayReplicationWithAutoOffsetSync() – org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest1m 23s
Build / JDK 8 and Scala 2.12 / testReplication() – org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest1m 27s
Build / JDK 8 and Scala 2.12 / testReplicationWithEmptyPartition() – org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationTest1m 16s
Build / JDK 8 and Scala 2.12 / shouldInnerJoinMultiPartitionQueryable – org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinMultiIntegrationTest

All these pass using the relevant JDKs on my machine. Merging to trunk.

@tombentley tombentley merged commit f8f1769 into apache:trunk Apr 16, 2021
@tombentley tombentley deleted the KIP-145-add-missing-smt branch April 16, 2021 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants