Very supportive of this in general!

Closing/opening to restart TeamCity.
…eath-destroyer-of-select
Not sure why it is failing TeamCity consistently with the same errors in files I didn't modify; I merged the latest master in to see if that changes things.
I did not add any special handling for this, so it will show the default error: I guess I could add a

Ah, the TeamCity failures are actually legitimate. The two errors are methods that only the select query engine was using: a variant of

Ok, added back a All it cost was adding 235 lines of code 😞
gianm left a comment
+1 on the idea of removing the select query; it's been deprecated for a while, causes clusters to crash (ouch), and the scan query is good enough now to mostly replace it. Suggested some wording changes.
> We encourage you to use the [Scan query](../querying/scan-query.md) type rather than Select whenever possible.
> In situations involving larger numbers of segments, the Select query can have very high memory and performance overhead.
> The native Druid Select query has been removed in favor of the [Scan query](../querying/scan-query.md), which should
The `>` makes this look weird when rendered (as a note/quote block). Also, the sentences read strangely together. I'd suggest:
Older versions of Apache Druid (incubating) included a Select query type. Since Druid 0.17.0, it has been removed and replaced by the Scan query, which offers improved memory usage and performance. This solves issues that users had with Select queries causing Druid to run out of memory or slow down.
The Scan query has a different syntax, but supports many of the features of the Select query, including time ordering and limiting. Scan does not include the Select query's pagination feature; however, in many cases pagination is unnecessary with Scan due to its ability to return a virtually unlimited number of results in one call.
Updated with your suggestion 👍
|`druid.broker.cache.useResultLevelCache`|true, false|Enable result level caching on the Broker.|false|
|`druid.broker.cache.populateResultLevelCache`|true, false|Populate the result level cache on the Broker.|false|
|`druid.broker.cache.resultLevelCacheLimit`|positive integer|Maximum size of query response that can be cached.|`Integer.MAX_VALUE`|
|`druid.broker.cache.unCacheable`|All druid query types|All query types to not cache.|`["groupBy", "select"]`|
…eath-destroyer-of-select
gianm left a comment
LGTM. Leaving open for the next committer to merge, as this needs one more design review.
When writing the release notes, make sure to mention that users of Druid SQL that do select queries must upgrade first to 0.15.0 or 0.16.0, then to 0.17.0, because 0.15.0 is the version that switched SQL selects to Scan. Upgrading directly from 0.14 to 0.17 will cause SQL selects to not work right during the rolling upgrade.
jihoonson left a comment
@clintropolis nice work! I like removing stuff.
I can not tell if you already do this from the diff but I think not: would it make sense to have a useful error message if someone does try to issue a select query explaining that it has been removed?
I'm wondering whether we really need this. Should we remove SelectQuery at some point anyway because we cannot keep all legacy classes forever? A nice error message is always good for Druid users, but I guess this may not be needed since we will call out in the release notes and keep the doc for removed select query.
  */
 @JsonTypeName("select")
-public class SelectQuery extends BaseQuery<Result<SelectResultValue>>
+public class SelectQuery implements Query<Object>
I am in this camp as well. Originally the PR just removed it all, but I guess this is a nicer user experience. I do think we could remove this placeholder in a release or two after the next one.

I would suggest keeping the error message for at least a few releases. The Select query, even though flawed, was somewhat popular, and I think this message will reduce confusion in the user base.
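To illustrate the placeholder approach discussed in this thread, here is a minimal, self-contained sketch. It is not the code from this PR: the registry, the `Query` interface, and the error wording are all hypothetical stand-ins, and it avoids Druid and Jackson entirely. The idea it shows is that the removed type name stays registered, but only so that it can fail with a targeted message rather than a generic "unknown type" error.

```java
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: a simplified, hypothetical stand-in for a query-type
// registry. None of these names are Druid's actual classes or methods.
final class RemovedQueryTypeSketch {

    /** Minimal query abstraction used only for this illustration. */
    interface Query {
        String run();
    }

    // Hypothetical wording; the real message in the PR may differ.
    static final String SELECT_REMOVED_MESSAGE =
        "The 'select' query type has been removed. Use the 'scan' query instead.";

    /** Placeholder factory: the type name is still recognized, but only to fail clearly. */
    static Query select(Map<String, Object> spec) {
        throw new UnsupportedOperationException(SELECT_REMOVED_MESSAGE);
    }

    /** Stand-in for a query type that is still supported. */
    static Query scan(Map<String, Object> spec) {
        return () -> "scan over " + spec.get("dataSource");
    }

    static final Map<String, Function<Map<String, Object>, Query>> REGISTRY = Map.of(
        "scan", RemovedQueryTypeSketch::scan,
        "select", RemovedQueryTypeSketch::select
    );

    /** Looks up the factory for the spec's "queryType" and builds the query. */
    static Query parse(Map<String, Object> spec) {
        String type = String.valueOf(spec.get("queryType"));
        Function<Map<String, Object>, Query> factory = REGISTRY.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown query type: " + type);
        }
        return factory.apply(spec);
    }

    public static void main(String[] args) {
        System.out.println(parse(Map.of("queryType", "scan", "dataSource", "wikipedia")).run());
        try {
            parse(Map.of("queryType", "select", "dataSource", "wikipedia"));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trade-off is the one raised above: the placeholder costs a little code to keep around, but a user who still sends a select query is pointed to scan instead of seeing a default deserialization error.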
Description
This PR removes the Select query type in favor of the Scan query. Since the introduction of the time-ordered scan query in #7133, our documentation has recommended that people use scan queries instead of select because of the heavy memory impact of the select query, and Druid SQL has not generated select queries at all since #7373. Functionally, scan is equivalent apart from the paging functionality; however, the more efficient nature of the scan query makes paging unnecessary.
Additionally, I feel that removing the choice between two very similar query types should be more user friendly, and it improves code hygiene and lowers the maintenance burden of the query processing layer on our end.
This is obviously incompatible with previous releases, so it will need to be called out prominently in the release notes should we merge this PR.
This PR has: