Skip to content

(WIP) Support explicit dependencies between extensions.#16968

Closed
georgew5656 wants to merge 19 commits intoapache:masterfrom
georgew5656:supportExtensionDependencies
Closed

(WIP) Support explicit dependencies between extensions.#16968
georgew5656 wants to merge 19 commits intoapache:masterfrom
georgew5656:supportExtensionDependencies

Conversation

@georgew5656
Copy link
Copy Markdown
Contributor

Allow specifying extensions between extensions without having to explicitly install jars in multiple places.

Description

Currently, in order to have extension A depend on extension B, you have to explicitly include extension B as a build dependency in extension A's pom.xml. This cause the jar for extension B to be included in extension A's directory and loaded as a module.

This can lead to some weird behavior e.g. the following bug linked from #16929 caused by druid-kafka-extraction-namespace depending on druid-lookups-cached-global.

There is logic in ExtensionsLoader.tryAdd that checks whether a module has already been loaded during initialization and skips it if it already has been loaded.

This is a problem when both druid-kafka-extraction-namespace and druid-lookups-cached-global are specified because they both load NamespaceExtractionModule.

If druid-kafka-extraction-namespace is specified first, both NamespaceExtractionModule and KafkaExtractionNamespaceModule are loaded by the druid-kafka-extraction-namespace classloader, and the druid-lookups-cached-global classloader doesn't load anything since NamespaceExtractionModule was already loaded. This is fine because the features of druid-lookups-cached-global are served through the module NamespaceExtractionModule being loaded in druid-kafka-extraction-namespace. (this is essentially the same behavior as just loading druid-kafka-extraction-namespace, and this is why loading both extensions in this order works)

If druid-lookups-cached-global is specified first, NamespaceExtractionModule is loaded by the druid-lookups-cached-global class loader. The druid-kafka-extraction-namespace classloader will only load KafkaExtractionNamespaceModule because NamespaceExtractionModule has already been loaded. This is a problem because kafka lookups rely on classes bound in NamespaceExtractionModule that it can't access (because NamespaceExtractionModule is only bound in the druid-lookups-cached-global classloader).

to get around this issue, my proposed fix is to allow chaining classloaders without having to actually copy jars. e.g. extension A won't actually have extension B's jar (we include the dependency as provided in the pom.xml). when extension A is loading classes, it will try to use extension B's classloader to find classes it can't find.

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

We currently support two kinds of classloaders for each extension (extension-first or not). hadoop always uses the non extension-first classloader and other druid services check the druid.extensions.useExtensionClassloaderFirst property. this made it a little more annoying to implement chained classloading (see ExtensionLoader), but I didn't think we could deprecate this behavior so I left it in. I had to add a StandardClassLoader class because we were just using a regular UrlClassLoader object for the non extension-first classloader. In both StandardClassLoader and ExtensionFirstClassLoader i added the logic to check other extension's classloaders for classes. this is the part of the code i'd like some more detailed feedback on.

I added a root level extension-dependencies.json file to specify inter-extension dependencies. the kafka/global cached lookup dependency is included here. When building druid bundles, maven copies this file into the extensions/ directory so the PullDependencies command can read it. I was hoping to just have the PullDependencies command to copy the json over instead but the Main command that runs PullDependencies actually needs extensions to be loaded (https://github.com/apache/druid/blob/master/services/src/main/java/org/apache/druid/cli/Main.java#L98), so this doesn't work.
Since PullDependencies can delete the extensions/ directory before copying jars in, I had to update the logic to replace the extension-dependencies.json file after the extensions/ directory has been repopulated.

The logic to actually set dependencies from extension-dependencies.json in extension classloaders is in ExtensionsLoader.

I tested this with druid-kafka-extraction-namespace and druid-lookups-cached-global and everything seems to work okay.

I haven't added any unit tests yet because I want to get feedback on whether this approach seems reasonable.

Release note

  • support explicit extension dependencies in druid
Key changed/added classes in this PR
  • ExtensionLoader
  • ExtensionFirstClassLoader
  • StandardClassLoader
  • PullDependencies

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@georgew5656 georgew5656 changed the title Support explicit dependencies between extensions. (WIP) Support explicit dependencies between extensions. Aug 28, 2024
@georgew5656
Copy link
Copy Markdown
Contributor Author

closing in favor of #16973

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant