
Fix backward compatibility issues in WindowOperatorQueryFrameProcessorFactory and WindowOperatorQueryFrameProcessor #17433

Merged
cryptoe merged 1 commit into apache:master from Akshat-Jain:make-WindowOperatorQueryFrameProcessorFactory-backward-compatible
Oct 30, 2024


@Akshat-Jain
Contributor

Description

As part of the GlueingPartitioningOperator changes in #17038, we removed 2 fields from WindowOperatorQueryFrameProcessorFactory: maxRowsMaterializedInWindow and partitionColumnNames. This introduces a backward incompatibility when the MSQ controller has the Glueing PR changes but the worker doesn't.

This PR adds those fields back to ensure backward compatibility.
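
As a rough illustration of why restoring the fields helps, here is a stdlib-only toy model of the wire payload. It is a sketch under assumptions, not Druid's actual serialization code: `FactoryWireSketch`, its methods, and the idea that an older worker rejects a payload missing either key are all hypothetical, and the key names are assumed to match the field names mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; the real factory is serialized by Druid's JSON layer.
final class FactoryWireSketch
{
  // Payload shape after this PR: the two deprecated keys are written again
  // so that readers which still expect them keep working.
  static Map<String, Object> serializeWithDeprecatedFields(
      int maxRowsMaterializedInWindow,
      List<String> partitionColumnNames
  )
  {
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("maxRowsMaterializedInWindow", maxRowsMaterializedInWindow);
    json.put("partitionColumnNames", partitionColumnNames);
    return json;
  }

  // Payload shape with the two fields removed (the incompatible case).
  static Map<String, Object> serializeWithoutDeprecatedFields()
  {
    return new LinkedHashMap<>();
  }

  // Assumption: an older worker needs both keys to be present.
  static boolean oldWorkerAccepts(Map<String, Object> json)
  {
    return json.containsKey("maxRowsMaterializedInWindow")
           && json.containsKey("partitionColumnNames");
  }
}
```

In this model, a payload built by `serializeWithDeprecatedFields` passes `oldWorkerAccepts`, while one built by `serializeWithoutDeprecatedFields` does not.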

Even after adding the 2 fields back, if the controller has the Glueing PR changes but the workers don't, we run into another issue: the controller sends the operatorFactoryList with the new operators, but the workers aren't aware of them (GlueingPartitioningOperator and PartitionSortOperator). This causes the following error:

org.apache.druid.rpc.HttpResponseException: Server error [400 Bad Request]; body: {"error":"Please make sure to load all the necessary extensions and jars with type 'glueingPartition' on 'druid/indexer' service. Could not resolve type id 'glueingPartition' as a subtype of `org.apache.druid.query.operator.OperatorFactory` known type ids = [naivePartition, naiveSort, scan, window] (for POJO property 'operatorList')


This PR handles this by moving the operator transformation logic (NaiveSortOperator -> NaivePartitioningOperator -> WindowOperator to GlueingPartitioningOperator -> PartitionSortOperator -> WindowOperator) from the WindowOperatorQueryKit layer to the WindowOperatorQueryFrameProcessor layer. This allows each worker to run either the older operator chain (if it is on an older version without the Glueing PR changes) or the new operator chain (if it has the Glueing PR changes).
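
The worker-side rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `OperatorKind` and `WorkerSideOperatorRewrite` are hypothetical stand-ins for Druid's operator factories, and the pairwise replacement rule is an assumption derived from the two chains named in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the operator factories; only the operator names come from the PR text.
enum OperatorKind
{
  NAIVE_SORT, NAIVE_PARTITIONING, GLUEING_PARTITIONING, PARTITION_SORT, WINDOW
}

final class WorkerSideOperatorRewrite
{
  // Replaces each adjacent (NAIVE_SORT, NAIVE_PARTITIONING) pair with
  // (GLUEING_PARTITIONING, PARTITION_SORT). Every other operator is copied unchanged.
  // Only a worker that has the newer code would call this; an older worker
  // simply runs the chain it received.
  static List<OperatorKind> toGlueingChain(List<OperatorKind> wireChain)
  {
    List<OperatorKind> rewritten = new ArrayList<>();
    int i = 0;
    while (i < wireChain.size()) {
      boolean isSortThenPartition =
          i + 1 < wireChain.size()
          && wireChain.get(i) == OperatorKind.NAIVE_SORT
          && wireChain.get(i + 1) == OperatorKind.NAIVE_PARTITIONING;
      if (isSortThenPartition) {
        rewritten.add(OperatorKind.GLUEING_PARTITIONING);
        rewritten.add(OperatorKind.PARTITION_SORT);
        i += 2;
      } else {
        rewritten.add(wireChain.get(i));
        i++;
      }
    }
    return rewritten;
  }
}
```

Because the controller keeps sending only operators that older workers already know, the wire format stays compatible, and each worker decides locally which chain to execute.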

Test Plan

To test the compatibility scenarios, I ran 2 indexers on my local setup and validated queries for the following cases:

  1. Indexer1 (controller) is on older version, indexer2 (some subset of workers) is on newer version.
  2. Indexer1 (controller) is on newer version, indexer2 (some subset of workers) is on older version.

Release note

We are marking 2 fields as deprecated for window query execution in the MSQ task engine. They will be removed in a future release of Druid, so upgrade plans should include an intermediate upgrade to a version that contains these backward compatibility changes.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

github-actions bot added the labels Area - Batch Ingestion and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Oct 29, 2024
cryptoe merged commit 63c91ad into apache:master on Oct 30, 2024
jtuglu1 pushed a commit to jtuglu1/druid that referenced this pull request on Nov 20, 2024
adarshsanjeev added this to the 32.0.0 milestone on Jan 16, 2025