Skip to content

KAFKA-10304: refactor MM2 integration tests#9224

Merged
mimaison merged 1 commit intoapache:trunkfrom
ning2008wisc:kafka-10304
Jan 14, 2021
Merged

KAFKA-10304: refactor MM2 integration tests#9224
mimaison merged 1 commit intoapache:trunkfrom
ning2008wisc:kafka-10304

Conversation

@ning2008wisc
Copy link
Copy Markdown
Contributor

@ning2008wisc ning2008wisc commented Aug 26, 2020

The main proposals of this PR:
(1) extract the common functions into a base class MirrorConnectorsIntegrationBaseTest
(2) test in SSL-enabled cluster

@ning2008wisc ning2008wisc force-pushed the kafka-10304 branch 5 times, most recently from 7341a27 to 37ff7fd Compare September 9, 2020 02:12
@ning2008wisc ning2008wisc changed the title improve MM2 unit tests re-factor MM2 integration tests Sep 9, 2020
@ning2008wisc ning2008wisc changed the title re-factor MM2 integration tests refactor MM2 integration tests Sep 9, 2020
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

propose TestUtils to be the central place to host common functions that will be used by integration tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the simple move from MirrorConnectorsIntegrationTest with generalization of connector class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mostly copy-paste from MirrorConnectorsIntegrationTest

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check for topic config sync, since the topic created on primary cluster has a "cleanup.policy" config

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when creating test-topic-1 topic on primary cluster, add a topic config. Later on, we will check if the config is synced from primary to backup cluster.

@ning2008wisc ning2008wisc changed the title refactor MM2 integration tests KAFKA-10304: refactor MM2 integration tests Sep 9, 2020
@ning2008wisc
Copy link
Copy Markdown
Contributor Author

very appreciated for any feedback on what to test additionally and how to get close to the real scenario. Some of the ideas come from https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ConnectWorkerIntegrationTest.java

@ning2008wisc ning2008wisc force-pushed the kafka-10304 branch 5 times, most recently from 80d32a0 to 2c3cbe7 Compare September 11, 2020 06:39
@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@ryannedolan @mimaison when possible, very appreciated for your attentions and feedback :) Thanks

@mimaison
Copy link
Copy Markdown
Member

mimaison commented Nov 9, 2020

@ning2008wisc I've not forgotten this PR, I just haven't had time to do reviews yet :(

@mimaison
Copy link
Copy Markdown
Member

If we have to control the complexity, I would prefer to drop testWithBrokerRestart and keep MirrorConnectorsIntegrationSSLTest, as it makes sense to run simple validation in SSL setup.

I think that would be a good idea. We can try to introduce this test in a follow up PR.

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I've made another pass.

I've left a bunch of minor comments but I'd like to highlight bigger issues on the following 2 points:

  • SSL tests: Even though MM2 does not do anything special with SSL (nor SASL) I think it's fine adding tests for SSL. However this should not cause such a large change. In this PR, SSL has leaked in all classes. Instead we want an SSL class that just changes the configuration.

    For a good example, see https://github.com/apache/kafka/blob/trunk/core/src/test/scala/integration/kafka/api/SslConsumerTest.scala. If you can get it in a similar shape, then it makes sense to cover SSL

  • testWithBrokerRestart: Testing semantics of connectors is tricky. I don't think the current test does it properly at the moment. Have a look at the comments I left on the test. Maybe we should focus on the other issues first and have another attempt in a different PR. WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have SSL specific methods here? Could we move all the SSL bits into the SSL class?

We have fields for the configurations. So we could set them accordingly (without or without SSL) in each concrete class. Then in the base class, we just use the fields to create the clusters without having to know if it's SSL or not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move all the SSL bits into the SSL class?
I believe yes

We have fields for the configurations. So we could set them accordingly (without or without SSL) in each concrete class. Then in the base class, we just use the fields to create the clusters without having to know if it's SSL or not.
I agree

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By catching Throwable, aren't we silencing the error and not immediately failing the test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is in Integration test, if we run into catch, the test should fail immediately, right? If you have suggestions on what to put here, I am happy to follow :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just let the Exception flow, that will automatically fail the test

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we are only enabling SSL in one of the 2 clusters?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The primary reasons are: (1) I have seen some MM2 use cases which mirror data from in-house Kafka cluster to public hosted Kafka. For example, https://issues.apache.org/jira/browse/KAFKA-10704 So this setup (non-SSL -> SSL) seems to represent a certain coverage of use cases. (2) There are 4 combinations between SSL and non-SSL, so adding all combinations are great, but also verbose. Proving MM2 can work with one SSL-enabled cluster should be convincing enough to claim "MM2 can mirror in SSL case".

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this field in both concrete class. Can we move it to the base class instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I am thinking in future, if we have test for MirrorSinkConnector (which will also extend the MirrorConnectorsIntegrationBaseTest class), it may be more flexible to have each extended test class to define what is the list of connectors to boot up?

I am also fine to move CONNECTOR_LIST into the base class now and move it out later on when we have MirrorSinkConnector

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this class be abstract?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Collections.singletonMap()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use StringDeserializer.class.getName()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is not specific to SSL, why do we have it ?
Should instead all tests in MirrorConnectorsIntegrationTest go to the base class so we run them both with and without SSL?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't MM2/Connect using at least once by default? ie, the producer in the runtime can cause duplicates.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - it is a typo, should be "at least once". I am removing this test from this PR

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this test is doing.
Why do we need background clients instead of producing upfront and consuming the data mirrorred at the end of the test?
It looks like we are testing the primary->backup scenario but we are restarting the backup cluster. The source connector should not interact with the backup cluster.

Copy link
Copy Markdown
Contributor Author

@ning2008wisc ning2008wisc Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need background clients instead of producing upfront and consuming the data mirrorred at the end of the test?

I think I answered the above question as the comments in the code. Please see below,

// the purpose of background producer and consumer is to better test the failure case
// to avoid serialization (produce -> broker restart -> consumer) by decoupling the
// producer, embedded kafka and consumer. Since the consumer offsets are stored in
// kafka topic at backup cluster and offset commit is periodic, when kafka broker (at backup)
// restarts, duplicate records will be consumed, meaning "at least once" delivery guarantee

I believe this test has unique value and is more close to the realistic scenario, since the producer -> kafka -> consumer are always running on different machines, rather than within one process or thread.

Due to the complexity limitation, I am happy to remove this test for now

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

ning2008wisc commented Nov 20, 2020

Thanks @mimaison for your suggestive and insightful feedback.

Regarding to your 2 major concerns, I agree and I believe there exists a feasible solution, especially for a very lean SSL test. I will update this PR early next week by addressing all your major and minor comments.

@ning2008wisc ning2008wisc force-pushed the kafka-10304 branch 2 times, most recently from f2d4125 to bd6ee1a Compare November 23, 2020 05:01
@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@mimaison Thanks again for your previous detailed review. I updated the PR to resolve the exact 2 concerns you raised.

Very appreciated for your another review!

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ning2008wisc for the updates.
It's looking much much better. There are still a few things that need to be improved to merge but it's almost there.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use allConfigs.get() to find the configs we want instead of searching for them?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allConfigs is a java.util.Collection object and seems does not directly support get() https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just let the Exception flow, that will automatically fail the test

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

#9224 (comment)

I think the try catch block is needed for several Exception.

/kafka/connect/mirror/src/test/java/org/apache/kafka/connect/mirror/integration/MirrorConnectorsIntegrationSSLTest.java:41: error: unreported exception IOException; must be caught or declared to be thrown
        sslConfig = TestSslUtils.createSslConfig(false, true, Mode.SERVER, TestUtils.tempFile(), "testCert");

/kafka/connect/mirror/src/test/java/org/apache/kafka/connect/mirror/integration/MirrorConnectorsIntegrationSSLTest.java:42: error: unreported exception GeneralSecurityException; must be caught or declared to be thrown
            sslConfig = TestSslUtils.createSslConfig(false, true, Mode.SERVER, TestUtils.tempFile(), "testCert");

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented Dec 4, 2020

@ning2008wisc, @mimaison: Please see https://issues.apache.org/jira/browse/KAFKA-10811 and #9698 for a required fix to the MirrorConnectorsIntegrationTest to prevent the testReplication() method from prematurely terminating the builds. Please be sure this (or an equivalent) is incorporated into this refactoring.

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@rhauch if your pr is merged first, I will be happy to rebase on #9698

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @ning2008wisc. You've now addressed the most major issues. I've left a few more comments suggesting small improvements. A
Can you also rebase on trunk to resolve the conflict?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but the code that created the consumer should close it. If I call waitForConsumingAllRecords(), I'd not expect it to close my consumer instance.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be addressed

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

Hello @mimaison I addressed all your of comments, please take the final review.

@ning2008wisc ning2008wisc force-pushed the kafka-10304 branch 2 times, most recently from a4af47b to cd77b18 Compare December 9, 2020 01:36
@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented Dec 9, 2020

@ning2008wisc, https://issues.apache.org/jira/browse/KAFKA-10811 and #9698 have been merged (to multiple branches).

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@rhauch I incorporated #9698 in my latest version here

@mimaison
Copy link
Copy Markdown
Member

mimaison commented Dec 9, 2020

@ning2008wisc There are a few checkstyle failures:

> Task :connect:runtime:checkstyleTest
[ant:checkstyle] [ERROR] /Users/mickael/github-ws/kafka/connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java:458:30: ',' is not followed by whitespace. [WhitespaceAfter]
[ant:checkstyle] [ERROR] /Users/mickael/github-ws/kafka/connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java:459:30: ',' is not followed by whitespace. [WhitespaceAfter]
[ant:checkstyle] [ERROR] /Users/mickael/github-ws/kafka/connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java:460:30: ',' is not followed by whitespace. [WhitespaceAfter]

You can run ./gradlew connect:runtime:test to reproduce

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@mimaison thanks for pointing out. Fixed the checkstyle, sorry for just running the check for connect/mirror

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the updates! It looks like we lost some logic that was added recently in MirrorConnectorsIntegrationTest.java .
Can you take a look and make sure we don't lose any logic?

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@mimaison thanks so much for your careful review. I checked #9698 again and make sure we do not miss anything at this moment.

@ning2008wisc
Copy link
Copy Markdown
Contributor Author

@mimaison thanks so much for all your efforts on reviewing. Really appreciated if you may have time before end of this year to do 1-2 final reviews to merge this PR.

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ning2008wisc for all the updates! LGTM

@mimaison mimaison merged commit 2cde6f6 into apache:trunk Jan 14, 2021
@ning2008wisc
Copy link
Copy Markdown
Contributor Author

Thanks @mimaison so much for your multiple rounds of valuable feedback and comments which greatly improving the code quality of this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants