Add benchmark suite for MSQ window functions #17377

Merged
cryptoe merged 3 commits into apache:master from Akshat-Jain:msq-wf-benchmarks
Oct 30, 2024

@Akshat-Jain Akshat-Jain commented Oct 18, 2024

Description

This PR adds a benchmark suite for MSQ window function queries.

A sample run on my local machine gave the following results:

Benchmark                                                         (maxNumTasks)  (rowsPerSegment)  Mode  Cnt      Score      Error  Units
MSQWindowFunctionsBenchmark.windowWithoutGroupBy                              2          20000000  avgt    5  96681.604 ± 2425.579  ms/op
MSQWindowFunctionsBenchmark.windowWithoutGroupBy                              5          20000000  avgt    5  94676.305 ± 4012.108  ms/op
MSQWindowFunctionsBenchmark.windowWithoutSorting                              2          20000000  avgt    5  17515.970 ±  498.066  ms/op
MSQWindowFunctionsBenchmark.windowWithoutSorting                              5          20000000  avgt    5  15996.262 ± 1552.218  ms/op
MSQWindowFunctionsBenchmark.multipleWindows                                   2          20000000  avgt    5  63215.499 ± 4722.604  ms/op
MSQWindowFunctionsBenchmark.multipleWindows                                   5          20000000  avgt    5  69287.847 ± 4985.326  ms/op
MSQWindowFunctionsBenchmark.windowWithHighCardinalityPartitionBy              2          20000000  avgt    5  69469.122 ± 3016.019  ms/op
MSQWindowFunctionsBenchmark.windowWithHighCardinalityPartitionBy              5          20000000  avgt    5  70951.896 ± 4343.354  ms/op
MSQWindowFunctionsBenchmark.windowWithLowCardinalityPartitionBy               2          20000000  avgt    5    507.584 ±  600.999  ms/op
MSQWindowFunctionsBenchmark.windowWithLowCardinalityPartitionBy               5          20000000  avgt    5    413.795 ±   38.195  ms/op
MSQWindowFunctionsBenchmark.windowWithSorting                                 2          20000000  avgt    5  16682.792 ±  239.561  ms/op
MSQWindowFunctionsBenchmark.windowWithSorting                                 5          20000000  avgt    5  16422.890 ±  225.643  ms/op

(Note: The above run was done with the changes from #17373 applied, as otherwise I was running into the "Channel has no capacity" issue.)
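
One way to read the table above is to compare the maxNumTasks=2 and maxNumTasks=5 rows of each benchmark. The following is a minimal sketch of that arithmetic, assuming the Error column is the half-width of the confidence interval that JMH reports around Score. The class and method names are hypothetical and are not part of the PR; only two rows are copied in, and the rest can be added the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: compares the maxNumTasks=2 and maxNumTasks=5 rows of the sample run.
// The class and its methods are hypothetical helpers, not part of the benchmark suite.
final class BenchmarkComparisonSketch
{
  /** Relative change of {@code after} versus {@code before}, in percent. */
  static double percentChange(double before, double after)
  {
    return (after - before) / before * 100.0;
  }

  /** True if the intervals [a - aErr, a + aErr] and [b - bErr, b + bErr] overlap. */
  static boolean intervalsOverlap(double a, double aErr, double b, double bErr)
  {
    return Math.abs(a - b) <= aErr + bErr;
  }

  public static void main(String[] args)
  {
    // Each value is {score@2, error@2, score@5, error@5} in ms/op, copied from the table.
    Map<String, double[]> rows = new LinkedHashMap<>();
    rows.put("windowWithoutGroupBy", new double[]{96681.604, 2425.579, 94676.305, 4012.108});
    rows.put("multipleWindows", new double[]{63215.499, 4722.604, 69287.847, 4985.326});

    for (Map.Entry<String, double[]> entry : rows.entrySet()) {
      double[] r = entry.getValue();
      System.out.printf(
          "%-22s change=%+6.2f%%  intervalsOverlap=%b%n",
          entry.getKey(),
          percentChange(r[0], r[2]),
          intervalsOverlap(r[0], r[1], r[2], r[3])
      );
    }
  }
}
```

With only five measurement iterations per row, overlapping intervals mean the sample run by itself does not separate the two maxNumTasks settings for that benchmark.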


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added labels on Oct 18, 2024: Area - Batch Ingestion, Area - Querying, Area - Dependencies, Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262)
Comment thread benchmarks/pom.xml
Comment thread extensions-core/multi-stage-query/pom.xml
Comment thread sql/src/test/java/org/apache/druid/sql/calcite/SqlTestFrameworkConfig.java Outdated

@adarshsanjeev adarshsanjeev left a comment

Overall, this looks good to me. A few comments.

runStep.run();
}

BaseExecuteQuery execStep = (BaseExecuteQuery) runSteps.get(runSteps.size() - 1);

Contributor

I'm not very familiar with this. Why is the change required to run all run steps for this benchmark (and not others)?

Contributor Author

Apparently, queryTestBuilder.results() didn't support MSQ, since it only ran the execute step and not the ExtractResultsFactory step.

So, without this change, we were getting the query ID instead of the query results. The ExtractResultsFactory step (added as a custom runner when declaring the QueryTestBuilder) takes the query ID, contacts the Overlord client, and fetches the actual results.

As for "(and not others)?": in the regular MSQ tests, we call testBuilder().run(), which already handled both steps.
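
The difference between running only the execute step and running every step can be sketched with a toy model. Everything below is hypothetical and exists only to illustrate the flow described in this reply: the class, the step constants, the hard-coded query ID, and the in-memory map standing in for the Overlord lookup are assumptions, not Druid APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of the two-step flow; none of these names are Druid classes or methods.
final class RunStepsSketch
{
  // Stand-in for the results that the real extraction step fetches through the Overlord client.
  static final Map<String, String> RESULTS_BY_QUERY_ID =
      Collections.singletonMap("query-1", "[[row1], [row2]]");

  // Step 1: "execute" submits the query and yields only a query ID.
  static final UnaryOperator<String> EXECUTE = sql -> "query-1";

  // Step 2: "extract results" turns that query ID into the actual rows.
  static final UnaryOperator<String> EXTRACT_RESULTS = RESULTS_BY_QUERY_ID::get;

  /** Runs every step in order, feeding each step's output into the next one. */
  static String runAll(List<UnaryOperator<String>> steps, String input)
  {
    String current = input;
    for (UnaryOperator<String> step : steps) {
      current = step.apply(current);
    }
    return current;
  }

  public static void main(String[] args)
  {
    String sql = "SELECT 1";
    // Only the execute step: the caller ends up with the query ID.
    System.out.println(runAll(Arrays.asList(EXECUTE), sql));
    // Both steps: the caller ends up with the rows.
    System.out.println(runAll(Arrays.asList(EXECUTE, EXTRACT_RESULTS), sql));
  }
}
```

In this model, stopping after the first step is what leaves the caller holding a query ID, which mirrors the behavior described for the benchmark before the change.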

@cryptoe cryptoe merged commit 21e7e5c into apache:master Oct 30, 2024
jtuglu1 pushed a commit to jtuglu1/druid that referenced this pull request Nov 20, 2024
* Add benchmark suite for MSQ window functions

* Fix inspection checks

* Address review comment: Rename method
@adarshsanjeev adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025