Move multi-stage-query module out of extension#18394

Merged
cryptoe merged 6 commits into apache:master from kfaraz:msq_core_module on Aug 21, 2025

@kfaraz kfaraz commented Aug 12, 2025

Description

The Druid MSQ engine offers a convenient way to perform SQL-based batch ingestion, export, etc.

Given the importance and heavy adoption of MSQ, this patch converts MSQ into a core capability of Druid
rather than an extension, as was always intended.

Changes

  • Move multi-stage-query module out from extensions-core
  • Add MSQ modules to the Cli* classes of applicable services
  • Update dependencies as needed
  • Remove mentions of extension from example common.runtime.properties files
  • Update embedded tests to not load MSQ extension modules
  • Address cyclic dependencies
    • Change {@link} to {@code} in javadocs of PeonProcessingBufferProvider,
      IndexerProcessingBufferProvider and IndexerResourcePermissionMapper.
    • Move LiveCatalogResolver from druid-catalog to multi-stage-query. This class is used in MSQInsertTest.
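The Javadoc change listed above can be illustrated with a minimal sketch. The class and type names below (BufferProviderSketch, MsqWorkerSketch) are hypothetical and are not taken from the patch; the point is only that a {@code} tag renders a type name as plain monospace text, whereas a {@link} tag asks the Javadoc tool to resolve the type, which typically means the defining module has to be visible from the referencing one.

```java
// Illustrative only: the class and type names here are hypothetical.
final class BufferProviderSketch
{
  /**
   * Returns the buffer size used by {@code MsqWorkerSketch}.
   *
   * <p>The code tag above prints the type name in monospace without asking
   * the Javadoc tool to resolve it, so this class needs no import of, or
   * dependency on, whichever module defines that type. A link tag would
   * instead require the type to be resolvable from this module.
   */
  static int bufferSizeBytes()
  {
    return 1024;
  }

  public static void main(String[] args)
  {
    System.out.println(bufferSizeBytes());
  }
}
```

The trade-off is that the rendered documentation loses the clickable cross-reference in exchange for breaking the dependency cycle.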

Pending

  • Once initial feedback is received, the docs will also be updated to remove mentions of the multi-stage-query extension.

Release note

  • Multi-stage-query (MSQ engine) is now a core capability of Druid rather than an extension.
  • Remove druid-multi-stage-query from druid.extensions.loadList in common.runtime.properties.

Upgrade notes

While upgrading, remove the extension druid-multi-stage-query from druid.extensions.loadList.
Future Druid versions will fail to start if it is still listed, since they will not be able to locate this extension.
For the time being, the entry is simply ignored.
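As a rough operator-side sketch of the upgrade step, the helper below rewrites a druid.extensions.loadList value without the MSQ entry. It is not part of the patch: the class name LoadListCleaner is hypothetical, and the code assumes the value is a JSON-style array of plain quoted extension names with no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Druid.
final class LoadListCleaner
{
  private static final String MSQ_EXTENSION = "druid-multi-stage-query";

  // Matches each double-quoted name in the list (assumes no escaped quotes).
  private static final Pattern QUOTED_NAME = Pattern.compile("\"([^\"]*)\"");

  /** Returns the load list with the MSQ extension removed, keeping the original order. */
  static String removeMsq(String loadListValue)
  {
    final List<String> kept = new ArrayList<>();
    final Matcher matcher = QUOTED_NAME.matcher(loadListValue);
    while (matcher.find()) {
      final String name = matcher.group(1);
      if (!MSQ_EXTENSION.equals(name)) {
        kept.add("\"" + name + "\"");
      }
    }
    return "[" + String.join(", ", kept) + "]";
  }

  public static void main(String[] args)
  {
    System.out.println(
        removeMsq("[\"druid-hdfs-storage\", \"druid-multi-stage-query\"]")
    );
  }
}
```

For a handful of config files, editing the property by hand is just as reasonable; the sketch only makes explicit which entry has to go.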


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@Akshat-Jain Akshat-Jain reopened this Aug 13, 2025
@github-actions github-actions Bot added the Area - Batch Ingestion, Area - Dependencies, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Aug 13, 2025
@Akshat-Jain Akshat-Jain reopened this Aug 13, 2025
@github-actions github-actions Bot added the GHA label Aug 17, 2025
@adarshsanjeev

Thanks for this change! This would really help while making a lot of changes in MSQ. I am taking a look at this now.

Regarding the codeQL failures, most of these seem to be false alerts from tests, which can be silenced in the check, but I do see some of them (like Comparison of narrow type with wide type in loop condition at multi-stage-query/src/main/java/org/apache/druid/msq/statistics/QuantilesSketchKeyCollector.java) as minor fixes which we could clean up now.

@adarshsanjeev adarshsanjeev left a comment

Seems okay, aside from the checkstyle issues.

kfaraz commented Aug 21, 2025

Thanks a lot for the review, @adarshsanjeev !

Given that the check failures are pre-existing, I am going ahead with the merge of this PR, as I want this commit to be just about the migration.

The checks can be addressed in a follow-up if needed.

@@ -193,9 +193,18 @@ public void initializeExtensionFilesToLoad()
    if (toLoad == null) {
      extensionsToLoad = rootExtensionsDir.listFiles();
    } else {
      final LinkedHashSet<String> validExtensionsToLoad = new LinkedHashSet<>(toLoad);
      if (validExtensionsToLoad.remove("druid-multi-stage-query")) {

@cryptoe cryptoe Aug 21, 2025

Generally it feels a bit awkward having this, since configs which are set are not being honored, but I guess this might be the cleanest unclean way to support upgrading clusters without manual intervention.

@kfaraz (Contributor Author)

Yeah, didn't quite like it myself. That's why the initial commits didn't have this change.
But as you said, this is the simplest path forward.
We can probably remove this after a few Druid releases, since we have started giving out the warning message.
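The behavior discussed in this thread can be sketched in isolation as follows. Only the LinkedHashSet copy and the remove("druid-multi-stage-query") call come from the diff excerpt above; the class name, method name, and warning text are assumptions for illustration, and the real code presumably uses Druid's own logger rather than System.err.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Standalone sketch; names other than the extension ID are hypothetical.
final class ExtensionLoadListFilter
{
  private static final String MSQ_EXTENSION = "druid-multi-stage-query";

  /** Returns the configured extensions in order, dropping the MSQ entry with a warning. */
  static Set<String> filter(List<String> toLoad)
  {
    // LinkedHashSet keeps the configured order and drops duplicate names.
    final LinkedHashSet<String> validExtensionsToLoad = new LinkedHashSet<>(toLoad);
    if (validExtensionsToLoad.remove(MSQ_EXTENSION)) {
      // Assumed wording; the actual warning message is not shown in the excerpt.
      System.err.println(
          "WARN: '" + MSQ_EXTENSION + "' is now part of core Druid and is ignored."
          + " Remove it from druid.extensions.loadList."
      );
    }
    return validExtensionsToLoad;
  }

  public static void main(String[] args)
  {
    System.out.println(filter(Arrays.asList("druid-hdfs-storage", MSQ_EXTENSION)));
  }
}
```

Ignoring the entry rather than failing keeps upgraded clusters starting without manual edits, at the cost of a config value that is silently not honored, which is the awkwardness raised above.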

@cryptoe cryptoe merged commit 1168cf0 into apache:master Aug 21, 2025
69 of 70 checks passed
@kfaraz kfaraz deleted the msq_core_module branch August 21, 2025 06:08

kfaraz commented Aug 21, 2025

Thanks for the review, @cryptoe !

@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025