Skip to content

Errors take 3#14004

Merged
cheddar merged 17 commits intoapache:masterfrom
imply-cheddar:errors-take-3
Jun 19, 2023
Merged

Errors take 3#14004
cheddar merged 17 commits intoapache:masterfrom
imply-cheddar:errors-take-3

Conversation

@imply-cheddar
Copy link
Copy Markdown
Contributor

@imply-cheddar imply-cheddar commented Mar 31, 2023

This is the 3rd take at creating a unified Exception in the DruidException. Given that I had input on the previous two attempts, I figured I'd take the next stab at offering something that matches the input that I had provided.

DruidException and ErrorResponse have class level javadoc discussing how it's all setup and why, so I would recommend starting there. Then branch out and look at some of the points where DruidExceptions are created (DruidPlanner and various of the other places).

There is a compatibility layer in this PR that tries to continue mimicing the old way that QueryException was getting serialized out, but the current PR treats that as best-effort. Specifically, anything that started its life as a DruidException will be an "unknown" error type when serialized back as a QueryException. The error message should make it through, but the type characterization will be different. This should only happen in the middle of roll-forward/roll-back processes, so it will be short-lived and, given that it preserves the message, is hopefully a decent compromise.

Additionally, as I was adjusting tests, I got tired of the SQL tests taking roughly 5 minutes to run through the entire suite, so I looked into why it was doing that and through moving some initialization around and fixing some time-based async tests to be based on latches instead, was able to get that timing down to ~1.5 minutes, so some of those changes are mixed in here as well.

Release note

A new unified exception for surfacing errors to users has been introduced. It is partially compatible with the old way of reporting error messages, response codes remain the same, all fields that previously existed on the response will continue to exist and be populated, including the errorMessage. But some error messages have changed to be more consumable by humans and some cases will have the message restructured. Any clients that are actively looking at the error message to make decisions on what to do should explore the changes to the error messages that they care about, but if all a client looks at is the response code, there should be no impact.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Comment thread server/src/main/java/org/apache/druid/server/QueryResultPusher.java Fixed
clientNoTrailingSlash = DriverManager.getConnection(
StringUtils.maybeRemoveTrailingSlash(server.url),
CalciteTests.TEST_SUPERUSER_NAME,
"druid"

Check failure

Code scanning / CodeQL

Hard-coded credential in API call

Hard-coded value flows to [sensitive API call](1).
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the design of DruidException as an internal holder for error stuff seems like a nice improvement over the current QueryException stuff, and splitting the serde out of it should allow stuff to handle things in a much more flexible manner depending on the channel the exception is going to be pushed through (nice friendly messages for users and different stuff for logs, etc).

while looking over stuff i got a bit distracted with individual error messages, so feel free to ignore most of those comments since the things largely aren't new error messages, just minor adjustments to existing things

* possible to make a message that will make sense to a different persona. Generally speaking, there is a hope
* that only DEFENSIVE error messages will target this persona.
*/
DEVELOPER
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this effectively "file a bug report"?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. this is where I assumed we can have IAE or ISE thrown. Would there be equivalents of IAE, ISE that are implementation of DruidException?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be "file a bug" or it might just be "this error message is completely impenetrable, we know that, but don't have the time to make it so that we can make a useful error message, so we just mark it as intended for the DEVELOPER audience and move on". It would be great if there are no errors for this persona (or if they are basically all just DEFENSIVE exceptions)...

Comment on lines +31 to +36
* Represents an error condition exposed to the user and/or operator of Druid. Given that a DruidException is intended
* to be delivered to the end user, it should generally never be caught. DruidExceptions are generated at terminal
* points where the operation that was happening cannot make forward progress. As such, the only reason to catch a
* DruidException is if the code has some extra context that it wants to add to the message of the DruidException using
* {@link #prependAndBuild(String, Object...)}. If code wants to catch and handle an exception instead, it should not
* be using the DruidException.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice description but I feel like it doesn't quite answer the questions I have as a developer who might be wanting to use this (apologies if this was covered in previous discussions/iterations of this PR...)

in what types of places should I as a developer throws these errors instead of like ISE or IAE? Should these only be used at surface layers, or is it fine for like a historical to throw one of these too? If I do throw it on a historical, should a broker do stuff to it (like merge errors from multiple hosts perhaps or decorate any additional context?) Are these intended to be used for both synchronous http responses and also async error status values that might be user facing? Just for queries or also other APIs and maybe someday ingestion tasks too?

I think clearing up some of these things will help make this actually get traction (this was also imo one of the harder parts about QueryException as well, like .. when should i actually throw one instead of throwing something else)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ISE, IAE will be thrown in places where the exception is likely due to a code bug as opposed to something that a user did. For example, in case one comes across an enum that is unknown or a type that is unknown. Maybe we have a different superclass for those exceptions. That superclass should encourage developers to include as much diagnostic information in the exception message as much as possible.

IAE, ISE --> 5xx
DruidException -> 4xx

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, IAE/ISE would disappear and all be replaced by DruidException. We can/definitely would continue to have rules that map IAE/ISE into DruidException (probably as a DEVELOPER-focused persona exception), but in the fullness of time, we'd probably want IAE/ISE to be replaced with DruidException instances.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is still a use case for them maybe for low level exceptions that don't have the full picture because they are missing some context to provide a meaningful exception at the surface. In these cases I expect the code area which does have the context will catch IAE/ISE/UOE etc and then extract the details and decorate with additional context such as column and table name, etc.

That seems preferable to me than having to push all of this information down all of the way to the bottom, particularly in more 'hot' areas of the code since passing arguments isn't completely free. Though, I suspect some of these cases are probably better to use more specific exceptions dedicated to a purpose, or I guess we could still throw DruidException but catch it at a higher level to decorate with the additional context, so I think we'll just have to see how it works out in practice.


import java.util.function.Function;

public class DMatchers
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what is DMatchers? (javadoc or better name needed?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going away from the PR and coming back to it, I have no clue what it is either :P Will look into it and update.

Comment on lines +109 to +113
private static DruidException unbound(SqlDynamicParam param)
{
return InvalidSqlInput.exception("No value bound for parameter (position [%s])", param.getIndex() + 1);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this and the rel parameterizer shuttle share?

for (Object element : list) {
if (element == null) {
throw new IAE("An array parameter cannot contain null values");
throw InvalidSqlInput.exception("An array parameter [%s] cannot contain null values", posn + 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this message seems a bit confusing, should it be ordered in some manner so it reads like "parameter in position [n] is an array but has null values which are illegal for some reason" (side note: why aren't they allowed i wonder?)

case NULL:
if (!literal.isNull()) {
throw new UnsupportedSQLQueryException("Query has a non-null constant but is of NULL type.");
throw InvalidSqlInput.exception("Expected a NULL literal, but got non-null constant [%s]", literal);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this isn't new... but can this.. happen? seems strange

Comment on lines +129 to +134
throw InvalidSqlInput.exception(
"Expression [%s] evaluates to an unsupported value [%s], expected something that"
+ " can be a Double. Consider casting with 'CAST(<col> AS BIGINT)'",
druidExpression.getExpression(),
exprResultDouble
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i know this isn't new, but these messages barely feel like user errors since it happens when calcite is trying to reduce some constant expression and we blow it for one reason or another. Looking back this seems to originally come from #11409 for the one doing numbers and then copied to arrays in #11968

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it as the expression has a return type and that return type isn't matching the input of another expression or something like that. Which is why it had the suggestion to cast... But, I was mostly just keeping what was in existence around rather than creating anything new.


package org.apache.druid.error;

public class InvalidSqlInput extends InvalidInput
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it weird this is defined in processing but sql is not here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is weird that sql is not in processing (see what I did there? :P)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, that isn't really possible without also merging druid-server into processing, or alternatively splitting druid-sql operators and stuff into processing and putting the schema and higher level stuff that builds on broker stuff in druid-server (which might also be nice, but would make testing slightly more painful maybe?)

}
}

public abstract static class Failure
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: javadocs

Copy link
Copy Markdown
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some commentary that I had forgotten to publish. I think steering developers to think about persona is a good step in the right direction. I also think that similarly, we should be able to have developers also write an action associated with the error message. For example, if router can't find any brokers, we log

org.apache.druid.java.util.common.ISE: No default server found!`

This doesn't really offer much help to a new user. The error itself could have been better rewritten. But it would also be nice if it says, go and check your broker's health.

Comment on lines +31 to +36
* Represents an error condition exposed to the user and/or operator of Druid. Given that a DruidException is intended
* to be delivered to the end user, it should generally never be caught. DruidExceptions are generated at terminal
* points where the operation that was happening cannot make forward progress. As such, the only reason to catch a
* DruidException is if the code has some extra context that it wants to add to the message of the DruidException using
* {@link #prependAndBuild(String, Object...)}. If code wants to catch and handle an exception instead, it should not
* be using the DruidException.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ISE, IAE will be thrown in places where the exception is likely due to a code bug as opposed to something that a user did. For example, in case one comes across an enum that is unknown or a type that is unknown. Maybe we have a different superclass for those exceptions. That superclass should encourage developers to include as much diagnostic information in the exception message as much as possible.

IAE, ISE --> 5xx
DruidException -> 4xx

* possible to make a message that will make sense to a different persona. Generally speaking, there is a hope
* that only DEFENSIVE error messages will target this persona.
*/
DEVELOPER
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. this is where I assumed we can have IAE or ISE thrown. Would there be equivalents of IAE, ISE that are implementation of DruidException?

/**
* Indicates some unsupported behavior was requested. TODO
*/
UNSUPPORTED(501),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is UNSUPPORTED a 501? shouldn't this be a 400 (Bad request)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

501 == "Not Implemented" i.e. it's using functionality that could work but hasn't been implemented yet. You could argue that it's user-error for using something that's not implemented yet. Or you could argue that it's the server's fault for not implementing the thing.

IIRC, this is pretty seldom used and it's mostly used in SQL planning stuffs for things that should exist as far as SQL is concerned, but don't yet.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so lets say someone issues a UNION query with de-duplicating while only supports UNION ALL. would you consider that as a 501 or 400? Such planning errors are very common. I am concerned about the error code for such scenarios because a cluster admin might have set an alert on 5xx but not on 4xx. Since the former will indicate the server being bad while the latter indicates the user being bad.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm fine with both. The 501 code is actually just a hold-over from the QueryException rather than something I'm actually introducing. If anything, reviewing this change has caused it to be more clear that we are returning 501s in some instances, leading to a question of "WTF", why? Which I think is a totally valid question ;).

Fwiw, I have a TODO in there because this is something I wanted to revisit myself. At this point, the only usages of this in the code in this PR are

https://github.com/apache/druid/pull/14004/files#diff-483e089932b8faa5ca9669607eab7fcba883d27e2899a7c9fba2f37afb9de97bR599-R602

Which is what happens when we get a query planning failure and no message at all. I don't know if/when this actually happens, so I'm unclear on whether it's actually meaningful. That said, with the switch to the DruidException, we should be able to eliminate the whole plannerContext().getPlanningError() thing that is the source of potentially not having a message (and also is the source of red-herring error messages more often than not), which should, in turn, eliminate the need for this category. Regardless, it needs to exist to maintain what QueryException is doing...

@imply-cheddar
Copy link
Copy Markdown
Contributor Author

@abhishekagarwal87

On your suggestion about encouraging an action be included. You gave a great example that explains what this PR is all about. Specifically, think of this error message

org.apache.druid.java.util.common.ISE: No default server found!

If someone said that it was "Persona.USER", I would hope that the reviewer would be like, "uhhh, why will a user know what a default server is? Your message is not well aligned with your persona" and ask for the message to be updated.

If I were to adjust that exception to this new model and not change the message, I would likely make it Persona.DEVELOPER because it's so obtuse, only a developer would be able to actually make sense of what it means. Of course, the exception is actually happening at a point where there is some behavior that is meaningful for the end user.

We could adjust the message to be an ADMIN persona message and say

Router unable to find a broker, check that brokers are up and active

Or a USER persona message and say

Cannot find a queryable server, contact your cluster administrator to validate that all services are operational

Both of these include an action, like you suggest, but that action becomes apparent and clear once we actually identify the persona. My hope is that by identifying the persona, reviewers will start to validate that the exception messages align with the intended persona and we all police each other for creating good error messages. I.e. I'm hoping that just identifying the persona is enough to help us do better.

Alternatively, for USER-facing messages, I could add the requirement for yet-another field which is "potential actions" (or something similar) which should be populated with a message describing actions that can be taken. I'd personally like to keep it to just a message and code-review based policing. But I'm also willing to add the field if I'm in the minority...

Tagging other people whose name also appears in the conversation part of this PR to see if there are opinions one way or the other: @gianm @kfaraz @clintropolis

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

Now I am also wondering how we will pick a persona :) For example, in this case, I would like to tell the user that you really need to go and check the logs of the broker service. The action of checking error logs is something that an admin persona can do. But most people when they are setting up druid for the first time, are end-users as well as an admin. The error occurs directly in response to something that an end-user is doing.

I can return a user-persona-facing error saying that "Contact your cluster administrator to make sure that services are operational" but it's difficult for new users to figure out where to go from there.

So, in this case, maybe we return an error message meant for the admin persona even though this error is in user-initiated request-response path. We leave it to cluster admins if they want to filter messages that match admin persona, in case they are not exposing druid directly. For a new user setting up druid for first time, however they will get all these messages unfiltered. If you have something similar in mind, we should ideally document that developers should avoid adding potentially sensitive information in messages that are meant for user persona.

@imply-cheddar
Copy link
Copy Markdown
Contributor Author

@abhishekagarwal87 once again, you've provided a great example! Essentially, you went through the thought process that I'm hoping we can just do as a part of every exception message that we generate.

In this case, making it ADMIN-focused and having more details about server names and stuff makes a lot of sense to me. And the argument that someone who is trying to "hide" Druid should be able to siphon off this message makes a lot of sense to me too. So, I think that I'm now convinced that the message in question should be targeting the ADMIN persona and specifically mention that you should check that the brokers are up and running.

To me, this type of conversation, back-and-forth and your eventual conclusion is the whole goal of the adjustments to the error messages. And, once we change that exception to be ADMIN focused, the code will include the fact that this is an ADMIN-focused message. So any future person who comes across it can clearly see the intentions of who gets the message and why, if they feel the message needs updating, they can have their own internal dialog and motivate their feelings by either saying it should target a different persona OR it needs more details to target the current persona.

So anyway, I think that yes, you are thinking along the same lines that I'm thinking at this point. I'm all for adding documentation about whatever, but I'm unclear on where you think it would be added for the most impact. Ultimately, this has to become part of the culture of the project rather than be a pure documentation thing, but I can document it anywhere you think it can help ;)

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

Re culture, I agree. I think we add these bits in the APIs javadocs, that should be more than enough. And then reviewers can just say, "The error message isn't so great. please read the javadocs for reference"

Comment on lines +422 to +435
() -> new DataSchema(
"",
parser,
new AggregatorFactory[]{
new DoubleSumAggregatorFactory("metric1", "col1"),
new DoubleSumAggregatorFactory("metric2", "col2"),
},
new ArbitraryGranularitySpec(
Granularities.DAY,
ImmutableList.of(Intervals.of("2014/2015"))
),
null,
jsonMapper
));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [DataSchema.DataSchema](1) should be avoided because it has been deprecated.
Comment on lines +455 to +462
() -> new DataSchema(
dataSource,
Collections.emptyMap(),
null,
null,
null,
jsonMapper
)

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [DataSchema.DataSchema](1) should be avoided because it has been deprecated.
@imply-cheddar
Copy link
Copy Markdown
Contributor Author

Okay, had a clean bill. Aside from some static checks. Going to push those up. @abhishekagarwal87 @clintropolis I'd love to get this merged today or tomorrow if possible, please do what you need to do to feel comfortable approving.

image

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor questions, but I'll go ahead and throw a +1 on this since it seems an overall improvement 🤘

Comment on lines +31 to +36
* Represents an error condition exposed to the user and/or operator of Druid. Given that a DruidException is intended
* to be delivered to the end user, it should generally never be caught. DruidExceptions are generated at terminal
* points where the operation that was happening cannot make forward progress. As such, the only reason to catch a
* DruidException is if the code has some extra context that it wants to add to the message of the DruidException using
* {@link #prependAndBuild(String, Object...)}. If code wants to catch and handle an exception instead, it should not
* be using the DruidException.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is still a use case for them maybe for low level exceptions that don't have the full picture because they are missing some context to provide a meaningful exception at the surface. In these cases I expect the code area which does have the context will catch IAE/ISE/UOE etc and then extract the details and decorate with additional context such as column and table name, etc.

That seems preferable to me than having to push all of this information down all of the way to the bottom, particularly in more 'hot' areas of the code since passing arguments isn't completely free. Though, I suspect some of these cases are probably better to use more specific exceptions dedicated to a purpose, or I guess we could still throw DruidException but catch it at a higher level to decorate with the additional context, so I think we'll just have to see how it works out in practice.

}

@Nullable
private static String nullOrString(Object o)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a similar existing method, Evals.asString though its primarily used for processing data for expressions and query time stuff, so i don't feel super strongly about using it here or not


package org.apache.druid.error;

public class InvalidSqlInput extends InvalidInput
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, that isn't really possible without also merging druid-server into processing, or alternatively splitting druid-sql operators and stuff into processing and putting the schema and higher level stuff that builds on broker stuff in druid-server (which might also be nice, but would make testing slightly more painful maybe?)

-->

WARNING WARNING
TODO: this README has not been adjusted to align with the current code
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still true? (maybe it should be refreshed to align with the current code before we merge?)

also any possible way to add something like a 'tl;dr' to summarize the important parts? 😛


import java.util.Map;

public class ErrorResponseTest
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since ErrorResponse is an 'over the wire' thingo, should serde tests be here? (or did i miss them somewhere?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, they are covered primarily in the server tests. This test also covers it some (by validating asMap, which is what eventually gets serialized), but this test was created primarily just to get coverage inside of the processing package, it doesn't actually validate much more than what the other tests already do from the server package.

final Response response = expectSynchronousRequestFlow(SIMPLE_TIMESERIES_QUERY);
Assert.assertEquals(Status.INTERNAL_SERVER_ERROR.getStatusCode(), response.getStatus());

final ErrorResponse entity = (ErrorResponse) response.getEntity();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i guess this is the serde test i was looking for?

Comment on lines +323 to +325
try {
throw e;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to just make all of these instanceof checks, especially since some of the catch still have instanceof?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, I think I started doing that and then stopped for some reason, I don't remember why though.

@clintropolis clintropolis added Ease of Use Area - Operations Area - Dev For items related to the project itself, like dev docs and checklists, but not CI labels Jun 16, 2023
@clintropolis
Copy link
Copy Markdown
Member

also, should there be any web-console changes to go with the adjustments to error response?

Copy link
Copy Markdown
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 since I don't have major blocker comments and this PR is prone to conflicts. @clintropolis does have a point around console changes. Ideally, there shouldn't be any because if there are, we likely broke compatibility somewhere. Or console is doing things it shouldn't do.

"It cannot be used directly on a column or on a scalar expression.");
assertQueryIsUnplannable(
"SELECT DS_TUPLE_DOUBLES_INTERSECT(NULL, NULL) FROM foo",
"DS_TUPLE_DOUBLES_INTERSECT can only be used on aggregates. " +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw we had "Possible error" as a prefix in query planning errors because they are in a way best-effort guesses. During planning phase, we record instances of the planner trying to do something unsupported. and if there is nothing else to show, we just return that instance as an error. But its possible that planner didn't really need to go down that path or it tried to same very fancy planning and we end up surfacing that error instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I think that this got replaced with "Query planning failed for unknown reason, our best guess is this"

"insert into foo1 select __time, dim1 , count(*) as cnt from foo where dim1 is not null group by 1, 2 clustered by dim1"
)
.setExpectedValidationErrorMatcher(invalidSqlContains(
"LUSTERED BY found before PARTITIONED BY, CLUSTERED BY must come after the PARTITIONED BY clause"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"LUSTERED BY found before PARTITIONED BY, CLUSTERED BY must come after the PARTITIONED BY clause"
"CLUSTERED BY found before PARTITIONED BY, CLUSTERED BY must come after the PARTITIONED BY clause"

Comment on lines 1077 to 1079
CoreMatchers.instanceOf(DruidException.class),
ThrowableMessageMatcher.hasMessage(CoreMatchers.containsString(
"INSERT and REPLACE queries cannot have a LIMIT unless PARTITIONED BY is \"ALL\""))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could these be simplified into invalidSqlContains as well? It's find if that refactoring was a bit too much and you gave up on it after a few :P

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could. I'm not sure which, but you are either seeing stuff left behind when I first started converting tests and hadn't yet build the Matcher to make it simpler (and I missed it in going back to simplify). OR, with fixing the conflicts after 2 months dormant, I will admit that I didn't have all of the context still in my head and perhaps did something a bit rote. Either way, it's much simpler to switch to the invalidSqlContains().

"LATEST() aggregator depends on __time column"))
new DruidExceptionMatcher(DruidException.Persona.ADMIN, DruidException.Category.INVALID_INPUT, "adhoc")
.expectMessageIs(
"Query planning failed for unknown reason, our best guess is this "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Query planning failed for unknown reason, most likely the reason is this"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm anti that messaging because it pretends like the following error message has a high likelihood of being the cause. My experience has told me that this is rarely the case and most likely this error message means "something happened, I'm gonna tell you something that is just as likely to be wrong as it is right, good luck". So I don't want to pretend like it's giving people a good answer when it generally is not.

.setQueryContext(context)
.setExpectedExecutionErrorMatcher(CoreMatchers.allOf(
CoreMatchers.instanceOf(UnsupportedSQLQueryException.class),
CoreMatchers.instanceOf(DruidException.class),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we use one exception class to rule them all, were there places where there was exception-class-specific handling? I am wondering how you dealt with those. Or maybe I will find out myself as I move below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically various of the SQL exceptions get caught and used to tell the volcano planner that it generated a bad query and to keep looking. None of those are converted to DruidException yet. Also, most of the places that were throwing QueryException are still throwing QueryException because we need a release that has the DruidException in it before we can force those forward.

So... really, any place that potentially has other uses just didn't get converted. We will need to go back and review them with the understanding that we now have an exception whose sole purposes is to be delivered back to the user and decide if the current handling is good.

/**
* A generic exception thrown by Druid.
*/
@Deprecated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add a note that it's deprecated in favor of the other DruidException?


import org.apache.druid.query.QueryException;

public class QueryExceptionCompat extends DruidException.Failure
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class could use some javadocs.

paul-rogers and others added 11 commits June 19, 2023 12:05
Parser errors, validation errors, ad-hoc errors
Moved common error codes into DruidException methods
Revisions from review comments
Incorporates review feedback
This adds a singular DruidException that is intended
to be used to represent all terminal states and error
messages that should result in responses delivered to
the user.  Javadocs exist on DruidException itself.
@cheddar cheddar merged commit cfd07a9 into apache:master Jun 19, 2023
@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jun 19, 2023

The last run of the tests was near complete (there processing tests for jdk11 and jdk17 still running, everything else, including ITs had passed), so I went ahead and merged instead of risk another set of conflicts cropping up and causing yet another CI run.

@vogievetsky vogievetsky added the Needs web console change Backend API changes that would benefit from frontend support in the web console label Jun 21, 2023
abhishekagarwal87 pushed a commit that referenced this pull request Jul 17, 2023
This PR catches the console up to all the backend changes for Druid 27

Specifically:

Add page information to SqlStatementResource API #14512
Allow empty tiered replicants map for load rules #14432
Adding Interactive API's for MSQ engine #14416
Add replication factor column to sys table #14403
Account for data format and compression in MSQ auto taskAssignment #14307
Errors take 3 #14004
AmatyaAvadhanula pushed a commit to AmatyaAvadhanula/druid that referenced this pull request Jul 17, 2023
This PR catches the console up to all the backend changes for Druid 27

Specifically:

Add page information to SqlStatementResource API apache#14512
Allow empty tiered replicants map for load rules apache#14432
Adding Interactive API's for MSQ engine apache#14416
Add replication factor column to sys table apache#14403
Account for data format and compression in MSQ auto taskAssignment apache#14307
Errors take 3 apache#14004
abhishekagarwal87 pushed a commit that referenced this pull request Jul 17, 2023
This PR catches the console up to all the backend changes for Druid 27

Specifically:

Add page information to SqlStatementResource API #14512
Allow empty tiered replicants map for load rules #14432
Adding Interactive API's for MSQ engine #14416
Add replication factor column to sys table #14403
Account for data format and compression in MSQ auto taskAssignment #14307
Errors take 3 #14004

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
This PR catches the console up to all the backend changes for Druid 27

Specifically:

Add page information to SqlStatementResource API apache#14512
Allow empty tiered replicants map for load rules apache#14432
Adding Interactive API's for MSQ engine apache#14416
Add replication factor column to sys table apache#14403
Account for data format and compression in MSQ auto taskAssignment apache#14307
Errors take 3 apache#14004
@vogievetsky vogievetsky removed the Needs web console change Backend API changes that would benefit from frontend support in the web console label Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Dev For items related to the project itself, like dev docs and checklists, but not CI Area - Operations Ease of Use

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants