Skip to content

KAFKA-12229: reset to original class loader after connector stop #9942

Merged
kkonstantine merged 5 commits intoapache:trunkfrom
showuon:KAFKA-12229
Jan 26, 2021
Merged

KAFKA-12229: reset to original class loader after connector stop #9942
kkonstantine merged 5 commits intoapache:trunkfrom
showuon:KAFKA-12229

Conversation

@showuon
Copy link
Copy Markdown
Member

@showuon showuon commented Jan 21, 2021

java.lang.NullPointerException at 
org.apache.kafka.connect.mirror.MirrorSourceConnector.listTopics(MirrorSourceConnector.java:348) at 
org.apache.kafka.connect.mirror.MirrorSourceConnector.findSourceTopicPartitions(MirrorSourceConnector.java:192) at 
org.apache.kafka.connect.mirror.MirrorSourceConnectorTest.testRefreshTopicPartitionsTopicOnTargetFirst(MirrorSourceConnectorTest.java:222)

After days of investigation, I finally found the root cause of the test failure reason: class loader.

The issue is quite weird, we mocked the method, but still call the real method, and cause the NPE. Digging into the Mockito, found it's not about JUnit 5, it's because of the class loader. In Mockito, we relies on the class loader to generate the proxy instance (source) to intercept the method call, and if the class loader is not expected, we'll generate the wrong proxy instance (with wrong class path). We set the class loader during connector start to resolve conflicting dependencies (KIP-146), so we should set it back to the original class loader after connector stop in tests (EmbeddedConnectCluster is only used in tests) for the following Mockito works as expected.

So, there's an interference of integration tests with unit tests when Connect integration tests run before the MM2 unit tests, and that will cause the Mockito used in unit tests not work as expected.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 21, 2021

@chia7712 @ijuma @guozhangwang , please take a look at the PR. Thanks.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 21, 2021

All tests passed.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 25, 2021

@chia7712 @ijuma , please help review this PR. Thanks.

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @showuon
I'd like to understand a bit better what you've observed.
Are you saying there's an interference of integration tests with unit tests when Connect integration tests run before the MM2 unit tests? If that's the case, I think this connection is missing from the description. Have you noticed how often that happens (every time or occasionally)?

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 25, 2021

@kkonstantine , thanks for the comments. Answer your questions below:

  1. Have you noticed how often that happens (every time or occasionally)?
    --> I'd say, it happens often. I just checked the kafka-trunk-jdk11, it failed the test cases from build KAFKA-2744: Commit source task offsets after task is completely stopped to ensure no additional messages are processed during the offset commit when stopping tasks for rebalancing. #423 ~ # 431 (the latest). And also happens a lot in kafka-trunk-jdk8 and kafka-trunk-jdk15. And sometimes it didn't happen is because the unit tests run faster than the integration tests.

  2. Are you saying there's an interference of integration tests with unit tests when Connect integration tests run before the MM2 unit tests? If that's the case, I think this connection is missing from the description.
    --> Yes, you're right. There's an interference of integration tests with unit tests when Connect integration tests run before the MM2 unit tests. I've added in the description.

Thank you very much.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 25, 2021

So, to easily reproduce it, you can try to start the EmbeddedConnectCluster before the tests in MirrorSourceConnectorTest. What I did is to copy the beforeEach startClusters() method from MirrorConnectorsIntegrationBaseTest and paste into the MirrorSourceConnectorTest. So you'll find even we mock the method, the real methods still got called.

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @showuon
The fix makes sense. Given that we use a util method to swap the classloader in Connect in all the places that this happens, I've added an optional suggestion that is equivalent with your solution for better tracking.

Otherwise, LGTM

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 26, 2021

@kkonstantine , thanks for the comments and updates.
Failed tests are not related to my changes.

Test Result (2 failures / +2)

    Build / JDK 11 / org.apache.kafka.clients.consumer.KafkaConsumerTest.testCloseWithTimeUnit()
    Build / JDK 11 / org.apache.kafka.clients.consumer.internals.FetcherTest.testEarlierOffsetResetArrivesLate()

@kkonstantine
Copy link
Copy Markdown
Contributor

Indeed @showuon
jdk8 and 15 are green. Thanks for the non-obvious fix.
LGTM

@kkonstantine kkonstantine merged commit 45b7a0a into apache:trunk Jan 26, 2021
@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 26, 2021

The latest kafka-trunk-jdk8, kafka-trunk-jdk11, kafka-trunk-jdk15 are not seeing these failed tests anymore. YEAH~

ijuma added a commit to ijuma/kafka that referenced this pull request Jan 26, 2021
…e-allocations-lz4

* apache-github/trunk: (562 commits)
  MINOR: remove unused code from MessageTest (apache#9961)
  MINOR: Fix visibility of Log.{unflushedMessages, addSegment} methods (apache#9966)
  KAFKA-12229: Restore original class loader in integration tests using EmbeddedConnectCluster during shutdown  (apache#9942)
  KAFKA-12190: Fix setting of file permissions on non-POSIX filesystems (apache#9947)
  MINOR: Remove `toStruct` and `fromStruct` methods from generated protocol classes (apache#9960)
  MINOR: Fix typo in Utils#toPositive (apache#9943)
  MINOR: MessageUtil: remove some deadcode (apache#9931)
  MINOR: Update zstd-jni to 1.4.8-2 (apache#9957)
  MINOR: Revert assertion in MockProducerTest (apache#9956)
  MINOR: Optimize assertions in unit tests (apache#9955)
  MINOR: Tag `RaftEventSimulationTest` as `integration` and tweak it (apache#9925)
  MINOR: Update to Gradle 6.8.1 (apache#9953)
  MINOR: A few small group coordinator cleanups (apache#9952)
  MINOR: Upgrade ducktape to version 0.8.1  (apache#9933)
  MINOR: fix record time in test shouldWipeOutStandbyStateDirectoryIfCheckpointIsMissing (apache#9948)
  MINOR: Restore interrupt status when closing (apache#9863)
  KAFKA-10357: Extract setup of repartition topics from Streams partition assignor (apache#9848)
  KAFKA-12212; Bump Metadata API version to remove `ClusterAuthorizedOperations` fields (KIP-700) (apache#9945)
  MINOR: log 2min processing summary of StreamThread loop (apache#9941)
  MINOR: Drop enable.metadata.quorum config (apache#9934)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants