Skip to content

Fix an issue with passing order by and limit to realtime tasks#15301

Merged
cryptoe merged 4 commits intoapache:masterfrom
adarshsanjeev:msq-realtime-orderby-fix
Nov 2, 2023
Merged

Fix an issue with passing order by and limit to realtime tasks#15301
cryptoe merged 4 commits intoapache:masterfrom
adarshsanjeev:msq-realtime-orderby-fix

Conversation

@adarshsanjeev
Copy link
Copy Markdown
Contributor

While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns.

If the query specifies a non time column, the query is planned as it is supported by MSQ. However, this throws an exception when passed to real time tasks once as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks.

  • Fixes a bug with MSQ while reading data from real time tasks with non time ordering

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Nov 1, 2023
}).map(List::toArray);
}

private static ScanQuery prepareScanQuery(@NotNull ScanQuery scanQuery)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a Javadoc for how we are preparing the scan query.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


private static ScanQuery prepareScanQuery(@NotNull ScanQuery scanQuery)
{
if (ScanQuery.Order.NONE.equals(scanQuery.getTimeOrder())) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can optimize this further - Consider if we have a scan query without any ordering but with a limit. In that case, we don't need to set the limit to 0 to fetch the results. We can keep the limit as is.
Therefore instead of checking the timeOrder, we'd also need to check the ordering inside, and if its empty or not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've updated the condition

@LakshSingla LakshSingla added the Bug label Nov 1, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Nov 1, 2023
Copy link
Copy Markdown
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I wonder if we can further optimize the change by planning the native query using #15241 and the MSQ query as is. However it would be difficult to generate the two plans and pass it to MSQ.

@cryptoe cryptoe merged commit 22443ab into apache:master Nov 2, 2023
adarshsanjeev added a commit to adarshsanjeev/druid that referenced this pull request Nov 2, 2023
…e#15301)

While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns.

If the query specifies a non time column, the query is planned as it is supported by MSQ. However, this throws an exception when passed to real time tasks once as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks.

    Fixes a bug with MSQ while reading data from real time tasks with non time ordering
cryptoe pushed a commit that referenced this pull request Nov 2, 2023
… (#15308)

While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns.

If the query specifies a non time column, the query is planned as it is supported by MSQ. However, this throws an exception when passed to real time tasks once as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks.

    Fixes a bug with MSQ while reading data from real time tasks with non time ordering
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
…e#15301)

While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns.

If the query specifies a non time column, the query is planned as it is supported by MSQ. However, this throws an exception when passed to real time tasks once as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks.

    Fixes a bug with MSQ while reading data from real time tasks with non time ordering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants