
Support non time order in MSQ compaction #17318

Merged
kfaraz merged 2 commits into apache:master from gargvishesh:support-non-time-order-in-msq-compaction on Nov 27, 2024

@gargvishesh
Contributor

Description

#16849 added support for sorting segments by non-time columns. This PR extends that support to MSQ compaction. Specifically, if forceSegmentSortByTime is set to false in the data schema (either in the user-supplied compaction config or in the inferred schema), the following steps are taken:

  • Skip explicitly adding __time as the first column of the dimension schema, since it is already part of the schema.
  • Ensure that column mappings place __time in the order specified by the schema.
  • Set forceSegmentSortByTime in the MSQ query context.
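The dimension-schema and context steps above can be modeled roughly as follows. This is a hypothetical, simplified sketch, not Druid's implementation: TimeOrderSketch, dimensionSchema and queryContext are illustrative names, and using "forceSegmentSortByTime" as the context key is an assumption based on the wording of the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the compaction-side decisions described above.
// All names here are illustrative assumptions; this is NOT Druid's actual code.
final class TimeOrderSketch {
  // Same literal as Druid's ColumnHolder.TIME_COLUMN_NAME.
  static final String TIME_COLUMN = "__time";

  // Step 1: derive the dimension schema handed to the MSQ spec.
  static List<String> dimensionSchema(List<String> dimensions, boolean forceSegmentSortByTime) {
    List<String> columns = new ArrayList<>();
    if (forceSegmentSortByTime) {
      // Sorted by time: __time is added explicitly as the first column.
      columns.add(TIME_COLUMN);
      for (String dim : dimensions) {
        if (!TIME_COLUMN.equals(dim)) {
          columns.add(dim); // skip any listed __time to avoid a duplicate entry
        }
      }
    } else {
      // Not sorted by time: __time is assumed to be listed already, at the
      // position chosen in the schema, so the declared order is kept unchanged.
      columns.addAll(dimensions);
    }
    return columns;
  }

  // Step 3: copy the base context and record the flag (the key name is an assumption).
  static Map<String, Object> queryContext(Map<String, Object> baseContext, boolean forceSegmentSortByTime) {
    Map<String, Object> context = new HashMap<>(baseContext);
    context.put("forceSegmentSortByTime", forceSegmentSortByTime);
    return context;
  }
}
```

With forceSegmentSortByTime set to false, a dimension list such as country, __time, city passes through untouched, so the segment sort order follows the schema rather than starting with __time.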

The PR also adds previously missing unit tests that verify the MSQ spec generated for nested and auto-type columns.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • been tested in a test Druid cluster.

github-actions bot added the labels Area - Batch Ingestion and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Oct 10, 2024
: ColumnHolder.TIME_COLUMN_NAME;
ColumnMapping timeColumnMapping = new ColumnMapping(timeColumn, ColumnHolder.TIME_COLUMN_NAME);
if (dataSchema.getDimensionsSpec().isForceSegmentSortByTime()) {
// When not sorted by time, the __time column is missing from dimensionsSpec
Contributor


Suggested change
- // When not sorted by time, the __time column is missing from dimensionsSpec
+ // When sorted by time, the __time column is missing from dimensionsSpec
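The corrected comment describes a branch on forceSegmentSortByTime when building column mappings. Below is a minimal sketch of that branch under stated assumptions: the ColumnMapping record is a hypothetical stand-in for Druid's class of the same name, buildMappings and TIME_COLUMN are illustrative names, and timeColumn is whichever query column feeds __time (in the excerpt it falls back to ColumnHolder.TIME_COLUMN_NAME). It mirrors the described behavior, not the merged code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ordering column mappings; not Druid's actual implementation.
final class ColumnMappingSketch {
  static final String TIME_COLUMN = "__time";

  // Hypothetical stand-in: maps a query column to a segment output column.
  record ColumnMapping(String queryColumn, String outputColumn) {}

  static List<ColumnMapping> buildMappings(
      String timeColumn, List<String> dimensions, boolean forceSegmentSortByTime) {
    ColumnMapping timeMapping = new ColumnMapping(timeColumn, TIME_COLUMN);
    List<ColumnMapping> mappings = new ArrayList<>();
    if (forceSegmentSortByTime) {
      // When sorted by time, the __time column is missing from dimensionsSpec,
      // so its mapping is placed first.
      mappings.add(timeMapping);
    }
    for (String dim : dimensions) {
      if (TIME_COLUMN.equals(dim)) {
        if (!forceSegmentSortByTime) {
          // When not sorted by time, __time is mapped at its declared position.
          mappings.add(timeMapping);
        }
        // Otherwise the time mapping was already added above; skip the duplicate.
      } else {
        // Regular dimensions map to an output column of the same name.
        mappings.add(new ColumnMapping(dim, dim));
      }
    }
    return mappings;
  }
}
```

Placing the time mapping by position, rather than always first, is presumably what lets the output column order follow the schema when time sorting is disabled.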

kfaraz merged commit 5333c53 into apache:master on Nov 27, 2024
adarshsanjeev added this to the 32.0.0 milestone on Jan 16, 2025


3 participants