-
Notifications
You must be signed in to change notification settings - Fork 3.8k
MSQ: Add limitHint to global-sort shuffles. #16911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -65,5 +65,4 @@ public int partitionCount() | |
| { | ||
| return numPartitions; | ||
| } | ||
|
|
||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
|
|
||
| package org.apache.druid.msq.kernel; | ||
|
|
||
| import com.fasterxml.jackson.annotation.JsonInclude; | ||
|
|
||
| /** | ||
| * {@link JsonInclude} filter for {@link ShuffleSpec#limitHint()}. | ||
| * | ||
| * This API works by "creative" use of equals. It requires warnings to be suppressed | ||
| * and also requires spotbugs exclusions (see spotbugs-exclude.xml). | ||
| */ | ||
| @SuppressWarnings({"EqualsAndHashcode", "EqualsHashCode"}) | ||
| public class LimitHintJsonIncludeFilter | ||
Check failureCode scanning / CodeQL Inconsistent equals and hashCode
Class LimitHintJsonIncludeFilter overrides [equals](1) but not hashCode.
|
||
| { | ||
| @Override | ||
| public boolean equals(Object obj) | ||
| { | ||
| return obj instanceof Long && (Long) obj == ShuffleSpec.UNLIMITED; | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,8 @@ | |
| }) | ||
| public interface ShuffleSpec | ||
| { | ||
| long UNLIMITED = -1; | ||
|
|
||
| /** | ||
| * The nature of this shuffle: hash vs. range based partitioning; whether the data are sorted or not. | ||
| * | ||
|
|
@@ -68,4 +70,17 @@ public interface ShuffleSpec | |
| * @throws IllegalStateException if kind is {@link ShuffleKind#GLOBAL_SORT} with more than one target partition | ||
| */ | ||
| int partitionCount(); | ||
|
|
||
| /** | ||
| * Limit that can be applied during shuffling. This is provided to enable performance optimizations. | ||
| * | ||
| * Implementations may apply this limit to each partition individually, or may apply it to the entire resultset | ||
| * (across all partitions). Either approach is valid, so downstream logic must handle either one. | ||
| * | ||
| * Implementations may also ignore this hint completely, or may apply a limit that is somewhat higher than this hint. | ||
| */ | ||
| default long limitHint() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was wondering if there's any merit in separating the limit hint into a POJO, and having both limit and offset baked into it. Something like I was thinking of use cases when we want to pushdown this limit into other portions of MSQ's stack and want to distinguish between rows [0..offset) (thrown away) and rows [offset, offset + limit) (kept)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think offset can be pushed down? Limit can be pushed down when sorting because if some row is in the top N globally, it must also be in the first N rows of whichever partition it appears in. But with offset, if for example you have |
||
| { | ||
| return UNLIMITED; | ||
| } | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, how does this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's this API: https://fasterxml.github.io/jackson-annotations/javadoc/2.9/com/fasterxml/jackson/annotation/JsonInclude.Include.html#CUSTOM
It's kind of goofy, but it's the only tool Jackson provides us for keeping the serialized JSON clean other than "include non-null", "include non-default", and "include non-empty".