Skip to content

KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins#7315

Merged
rhauch merged 36 commits intoapache:trunkfrom
gharris1727:classloader-fix
Oct 17, 2019
Merged

KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins#7315
rhauch merged 36 commits intoapache:trunkfrom
gharris1727:classloader-fix

Conversation

@gharris1727
Copy link
Copy Markdown
Contributor

@gharris1727 gharris1727 commented Sep 9, 2019

During plugin class loading, instance creation, and configuration
change the current thread's context ClassLoader
to the PluginClassLoader. This prevents issues where
static initialization loads the wrong resources, or
fails to find resources that are inside the plugin.
This change impacts the following classes:
Plugins, DelegatingClassLoader, Worker.

In order to verify the original bug and verify the fix,
a group of test plugins for collecting data are included in this commit:

  • AlwaysThrowException shows that exceptions are captured
  • Sampling is for basic initialization and field aliasing
  • SamplingConverter takes samples during converter method calls
  • SamplingHeaderConverter takes samples during header converter method calls
  • SamplingConfigProvider takes samples during config provider method calls
  • SamplingConfigurable takes samples during configurable method calls
  • ServiceLoaderPlugin samples the active classloader for service loaded classes

These plugins are stored in the resources directory.
Each plugin has it's own source tree, and can have arbitrary code.
The plugins are compiled into class files in a separate temporary directory.
They are then packaged into a jar in a temporary file
which is deleted after the process exits.
This jar is then exposed for use in plugin.path, and tests
can access to the plugin class names through constants.

These changes will be backported to all versions of AK which have plugin isolation (introduced in 0.11.0.0):

  • 0.11.0.4
  • 1.0.3
  • 1.1.2
  • 2.0.2
  • 2.1.2
  • 2.2.2
  • 2.3.1
  • 2.4

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

* Add test/resources/test-plugins with source for test plugins
* Add TestPlugins for building these test plugins for unit tests
* Change Plugins PluginsTest and DelegatingClassLoaderTest to include TestPlugins
* Add test cases for addressing isolation bugs
* Change PluginsTest to isolate the Plugins instance

Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
@gharris1727
Copy link
Copy Markdown
Contributor Author

cc @rhauch @kkonstantine @C0urante for review

Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @gharris1727! Don't let the number of comments fool you, this looks pretty good. Mostly have questions/suggestions about testing methodology.

@C0urante
Copy link
Copy Markdown
Contributor

Also, can you update the description to include a list of feature branches that this change should be backported to?

Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
@gharris1727
Copy link
Copy Markdown
Contributor Author

This patch breaks the build entirely, it can't even complete most of the times:
https://gist.github.com/gharris1727/b40831fd2c175f798397753a81bb27c8

@gharris1727
Copy link
Copy Markdown
Contributor Author

Also, can you update the description to include a list of feature branches that this change should be backported to?

I don't know what branches this can be backported to, how can I tell?

@C0urante
Copy link
Copy Markdown
Contributor

C0urante commented Sep 19, 2019

I don't know what branches this can be backported to, how can I tell?

Which AK versions are affected by the bug that this PR aims to address? If I recall correctly, it's been present since plugin isolation was added to Kafka; if that's the case, then we'll want to backport these changes to every feature branch that includes plugin isolation.

Signed-off-by: Greg Harris <gregh@confluent.io>
…plugin

Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good!

Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments on the recent changes

Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
@gharris1727
Copy link
Copy Markdown
Contributor Author

@C0urante it appears that the jenkins builder results are inconclusive.
51c6f57 was the commit with finally and 0df3a96 was the build without.
How flaky is the trunk build normally? I'm currently 2/5 for build failures I can't immediately explain.

@C0urante
Copy link
Copy Markdown
Contributor

@gharris1727 none of the failures with the 51c6f57 build came from Connect; it's pretty unlikely the finally block had any impact on those.
Might be a result of how you're running these tests (in IDE vs. directly from the command line)? Either way, I'd personally say it's probably fine to include the finally block in these changes for now and if we notice a significant uptick in flaky build failures (which, sadly, there's already a few of) we can investigate further.

Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good, just one more comment from the latest changes.

@C0urante
Copy link
Copy Markdown
Contributor

I think I've contributed as much as I can as a reviewer and, apart from the following open items, LGTM:

  • Use of a finally block to guarantee a compare/swap with the old class loader at the end of method calls
  • Modifications to the behavior of the DelegatingClassLoader.connectorLoader(String connectorOrAlias) method
  • Small question about testing methodology for service loading in plugins

Nice job, @gharris1727!

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gharris1727
I did a first pass during which I focused only on the actual code (no testing yet) and I've commented on a few things on the periphery of your fix. Addressing those first might help focusing on the actual changes.

I'll follow up with a second pass on the core of your fix and I'll also review the new tests, that seem quite valuable overall.

Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
…ted.

Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
@gharris1727
Copy link
Copy Markdown
Contributor Author

I found another bug where all of the plugins that are found with a ServiceLoader (my example was the ConfigProvider) were statically initialized with the wrong classloader. I've applied the fix to the DelegatingClassLoader, and added a test to confirm that ConfigProvider plugins are properly isolated.

I think this brings us up to 6 additional call sites where we swap out the ClassLoader, when not doing so would be a bug.
I also think that the code for configuring these plugins inside these temporary swaps is relatively concise now, and may not warrant a big refactor, so i'll put up the non-refactored diff for review.

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot's of commits after my first reviews. I'm retracting my approval and will have to take another look

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented Oct 11, 2019

@gharris1727 wrote:

I also think that the code for configuring these plugins inside these temporary swaps is relatively concise now, and may not warrant a big refactor, so i'll put up the non-refactored diff for review.

I agree. Let's not refactor this and focus on a minimal-sized fix that we can backport. Also note that we started using Java 8 in AK 2.0, so we need to avoid using lambdas and function refs in the PR unless we want to create an alternative PR for 1.x and 0.x branches.

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performed another pass after the new commits were added.
This LGTM now.

@rhauch rhauch changed the title KAFKA-8340: Use PluginClassLoader while statically initializing plugins KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins Oct 15, 2019
Copy link
Copy Markdown
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @gharris1727. This is good stuff and quite thorough. I have a few suggestions for this PR for merging into trunk and 2.4, but I think it's going to be challenging to backport this PR as-is beyond 2.0 because of some of the features that were introduced since 2.0 and because we use Java 8 only starting with 2.0.

Having said that, let's focus on the things that we can fix immediately, try to backport this as far back as feasible, and then deal with earlier branches via separate PRs targeted to specific branches (which we'll try to backport as far back as possible, etc.).

log.debug("Configuring the header converter with configuration keys:{}{}", System.lineSeparator(), converterConfig.keySet());
plugin.configure(converterConfig);

HeaderConverter plugin;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HeaderConverter and this method don't exist prior to AK 1.1.

Map<String, Object> configProviderConfig = config.originalsWithPrefix(configPrefix);
plugin.configure(configProviderConfig);

ConfigProvider plugin;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigProvider was introduced in AK 2.0, so this change can't be backported to earlier branches.

…lass

Signed-off-by: Greg Harris <gregh@confluent.io>
Copy link
Copy Markdown
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the quick turnaround, @gharris1727! This is an excellent PR.

@rhauch rhauch merged commit ff68b60 into apache:trunk Oct 17, 2019
rhauch pushed a commit that referenced this pull request Oct 17, 2019
…lizing plugins (#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
rhauch pushed a commit that referenced this pull request Oct 17, 2019
…lizing plugins (#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
rhauch pushed a commit that referenced this pull request Oct 17, 2019
…lizing plugins (#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
rhauch pushed a commit that referenced this pull request Oct 17, 2019
…lizing plugins (#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
rhauch pushed a commit that referenced this pull request Oct 17, 2019
…lizing plugins (#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
sdandu-gh added a commit to confluentinc/kafka that referenced this pull request Nov 9, 2019
* KAFKA-8649: send latest commonly supported version in assignment (apache#7425)

Reviewers: Matthias J. Sax <matthias@confluent.io>

* KAFKA-8262, KAFKA-8263: Fix flaky test `MetricsIntegrationTest` (apache#6922) (apache#7457)

- Timeout occurred due to initial slow rebalancing.
- Added code to wait until `KafkaStreams` instance is in state RUNNING to check registration of metrics and in state NOT_RUNNING to check deregistration of metrics.
- I removed all other wait conditions, because they are not needed if `KafkaStreams` instance is in the right state.

Reviewers: Guozhang Wang <wangguoz@gmail.com>

* MINOR: clarify wording around fault-tolerant state stores (apache#7510)

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>

* MINOR: Port changes from trunk for test stability to 2.3 branch (apache#7424)

Reviewer: Matthias J. Sax <matthias@confluent.io>

* KAFKA-9014: Fix AssertionError when SourceTask.poll returns an empty list (apache#7491)

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>

* KAFKA-8945/KAFKA-8947: Fix bugs in Connect REST extension API (apache#7392)

Fix bug in Connect REST extension API caused by invalid constructor parameter validation, and update integration test to play nicely with Jenkins

Fix instantiation of TaskState objects by Connect framework.

Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <mageshn@confluent.io>, Randall Hauch <rhauch@gmail.com>

* KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins (apache#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>

* MINOR: log reason for fatal error in locking state dir (apache#7534)

Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>

* KAKFA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (apache#7511)

The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes that there are ongoing fetch requests, but in fact there are none. This may be caused by the heartbeat thread concurrently handling a disconnection event just after the fetcher thread submits a request which would cause the Fetcher to enter an invalid state where it believes it has ongoing requests to the disconnected node but in fact it does not. This is due to a thread safety issue in the Fetcher where it was possible for the ordering of the modifications to the nodesWithPendingFetchRequests to be incorrect - the Fetcher was adding it after the listener had already been invoked, which would mean that pending node never gets removed again.

This PR addresses that thread safety issue by ensuring that the pending node is added to the nodesWithPendingFetchRequests before the listener is added to the future, ensuring the finally block is called after the node is added.

Reviewers: Tom Lee, Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>

* Fix bug in AssignmentInfo#encode and add additional logging (apache#7545)

Same as apache#7537
but targeted at 2.3 for cherry-pick
Reviewers: Bill Bejeck <bbejeck@gmail.com>

* Bump version to 2.3.1

* Update versions to 2.3.2-SNAPSHOT

* HOTFIX: fix bug in VP test where it greps for the wrong log message (apache#7643)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants