Skip to content

Use vector cursors when possible in MSQ#18305

Merged
gianm merged 11 commits intoapache:masterfrom
adarshsanjeev:msq-shim-vector
Jul 31, 2025
Merged

Use vector cursors when possible in MSQ#18305
gianm merged 11 commits intoapache:masterfrom
adarshsanjeev:msq-shim-vector

Conversation

@adarshsanjeev
Copy link
Copy Markdown
Contributor

@adarshsanjeev adarshsanjeev commented Jul 22, 2025

Add a "shim" cursor implementation, which is an adapter from Cursor to Vector Cursor. This allows vector cursors to be used as normal cursors, by simply reading a vector of values and stepping through it as the cursor advances.

Shim cursors have been used in MSQ ScanQueryFrameProcessor to use vector cursors when possible.

Also does some minor refactoring around MSQTestBase, to make MSQ tests actually use the modules registered by the getCoreModule() call in the test framework.

Main changes

  • Added ShimCursor to adapt VectorCursor to the Cursor interface. (processing/src/main/java/org/apache/druid/segment/shim/ShimCursor.java)
  • Introduced ShimColumnSelectorFactory to create appropriate cursors. (processing/src/main/java/org/apache/druid/segment/shim/ShimColumnSelectorFactory.java)
  • Added specific adapters for vectorized selectors:
    • ShimMultiValueDimensionSelector for multi-value dimension vector selectors. (processing/src/main/java/org/apache/druid/segment/shim/ShimMultiValueDimensionSelector.java)
    • ShimNumericColumnValueSelector for numeric vector selectors. (processing/src/main/java/org/apache/druid/segment/shim/ShimNumericColumnValueSelector.java)
    • ShimObjectColumnValueSelector for object vector selectors. (processing/src/main/java/org/apache/druid/segment/shim/ShimObjectColumnValueSelector.java)

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Batch Ingestion Area - Segment Format and Ser/De Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jul 22, 2025
@adarshsanjeev adarshsanjeev marked this pull request as ready for review July 28, 2025 05:26
{
if (hasMultipleValues) {
Object object = getObject();
ArrayList arrayList = (ArrayList) object;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is brittle, it assumes that the underlying object is always going to be an ArrayList. At least use plain List. You should also handle the case here where the underlying object is null or String. I think this can happen with some underlying selectors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added branches for these cases. I haven't been able to find a test case that that returns a string here, so I haven't added one yet.

{
Object object = getObject();
if (hasMultipleValues) {
ArrayList arrayList = (ArrayList) object;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as getRow, you should handle null, String, and List for the underlying objects.

@gianm gianm merged commit ebd6c33 into apache:master Jul 31, 2025
133 of 134 checks passed
gianm added a commit to gianm/druid that referenced this pull request Aug 5, 2025
Follow-up to apache#18305. Fixes a bug where calling ShimCursor's
makeDimensionSelector on a non-string type would throw an error.
Now, it has behavior similar to QueryableIndexCursor:

- Uses ValueTypes.makeNumericWrappingDimensionSelector on numeric types,
  casting the numbers to strings.
- Returns a nil selector for array and complex types.

The new tests verify selector compatibility more extensively.
cryptoe pushed a commit that referenced this pull request Aug 6, 2025
Follow-up to #18305. Fixes a bug where calling ShimCursor's
makeDimensionSelector on a non-string type would throw an error.
Now, it has behavior similar to QueryableIndexCursor:

- Uses ValueTypes.makeNumericWrappingDimensionSelector on numeric types,
  casting the numbers to strings.
- Returns a nil selector for array and complex types.

The new tests verify selector compatibility more extensively.
@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Segment Format and Ser/De

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants