Conversation
19596ed to
27a13f7
Compare
| clientNoTrailingSlash = DriverManager.getConnection( | ||
| StringUtils.maybeRemoveTrailingSlash(server.url), | ||
| CalciteTests.TEST_SUPERUSER_NAME, | ||
| "druid" |
Check failure
Code scanning / CodeQL
Hard-coded credential in API call
clintropolis
left a comment
There was a problem hiding this comment.
the design of DruidException as an internal holder for error stuff seems like a nice improvement over the current QueryException stuff, and splitting the serde out of it should allow stuff to handle things in a much more flexible manner depending on the channel the exception is going to be pushed through (nice friendly messages for users and different stuff for logs, etc).
while looking over stuff i got a bit distracted with individual error messages, so feel free to ignore most of those comments since the things largely aren't new error messages, just minor adjustments to existing things
| * possible to make a message that will make sense to a different persona. Generally speaking, there is a hope | ||
| * that only DEFENSIVE error messages will target this persona. | ||
| */ | ||
| DEVELOPER |
There was a problem hiding this comment.
is this effectively "file a bug report"?
There was a problem hiding this comment.
ok. this is where I assumed we can have IAE or ISE thrown. Would there be equivalents of IAE, ISE that are implementation of DruidException?
There was a problem hiding this comment.
It might be "file a bug" or it might just be "this error message is completely impenetrable, we know that, but don't have the time to make it so that we can make a useful error message, so we just mark it as intended for the DEVELOPER audience and move on". It would be great if there are no errors for this persona (or if they are basically all just DEFENSIVE exceptions)...
| * Represents an error condition exposed to the user and/or operator of Druid. Given that a DruidException is intended | ||
| * to be delivered to the end user, it should generally never be caught. DruidExceptions are generated at terminal | ||
| * points where the operation that was happening cannot make forward progress. As such, the only reason to catch a | ||
| * DruidException is if the code has some extra context that it wants to add to the message of the DruidException using | ||
| * {@link #prependAndBuild(String, Object...)}. If code wants to catch and handle an exception instead, it should not | ||
| * be using the DruidException. |
There was a problem hiding this comment.
Nice description but I feel like it doesn't quite answer the questions I have as a developer who might be wanting to use this (apologies if this was covered in previous discussions/iterations of this PR...)
in what types of places should I as a developer throws these errors instead of like ISE or IAE? Should these only be used at surface layers, or is it fine for like a historical to throw one of these too? If I do throw it on a historical, should a broker do stuff to it (like merge errors from multiple hosts perhaps or decorate any additional context?) Are these intended to be used for both synchronous http responses and also async error status values that might be user facing? Just for queries or also other APIs and maybe someday ingestion tasks too?
I think clearing up some of these things will help make this actually get traction (this was also imo one of the harder parts about QueryException as well, like .. when should i actually throw one instead of throwing something else)
There was a problem hiding this comment.
I think that ISE, IAE will be thrown in places where the exception is likely due to a code bug as opposed to something that a user did. For example, in case one comes across an enum that is unknown or a type that is unknown. Maybe we have a different superclass for those exceptions. That superclass should encourage developers to include as much diagnostic information in the exception message as much as possible.
IAE, ISE --> 5xx
DruidException -> 4xx
There was a problem hiding this comment.
Generally speaking, IAE/ISE would disappear and all be replaced by DruidException. We can/definitely would continue to have rules that map IAE/ISE into DruidException (probably as a DEVELOPER-focused persona exception), but in the fullness of time, we'd probably want IAE/ISE to be replaced with DruidException instances.
There was a problem hiding this comment.
i think there is still a use case for them maybe for low level exceptions that don't have the full picture because they are missing some context to provide a meaningful exception at the surface. In these cases I expect the code area which does have the context will catch IAE/ISE/UOE etc and then extract the details and decorate with additional context such as column and table name, etc.
That seems preferable to me than having to push all of this information down all of the way to the bottom, particularly in more 'hot' areas of the code since passing arguments isn't completely free. Though, I suspect some of these cases are probably better to use more specific exceptions dedicated to a purpose, or I guess we could still throw DruidException but catch it at a higher level to decorate with the additional context, so I think we'll just have to see how it works out in practice.
|
|
||
| import java.util.function.Function; | ||
|
|
||
| public class DMatchers |
There was a problem hiding this comment.
nit: what is DMatchers? (javadoc or better name needed?)
There was a problem hiding this comment.
Going away from the PR and coming back to it, I have no clue what it is either :P Will look into it and update.
| private static DruidException unbound(SqlDynamicParam param) | ||
| { | ||
| return InvalidSqlInput.exception("No value bound for parameter (position [%s])", param.getIndex() + 1); | ||
| } | ||
|
|
There was a problem hiding this comment.
nit: should this and the rel parameterizer shuttle share?
| for (Object element : list) { | ||
| if (element == null) { | ||
| throw new IAE("An array parameter cannot contain null values"); | ||
| throw InvalidSqlInput.exception("An array parameter [%s] cannot contain null values", posn + 1); |
There was a problem hiding this comment.
hmm, this message seems a bit confusing, should it be ordered in some manner so it reads like "parameter in position [n] is an array but has null values which are illegal for some reason" (side note: why aren't they allowed i wonder?)
| case NULL: | ||
| if (!literal.isNull()) { | ||
| throw new UnsupportedSQLQueryException("Query has a non-null constant but is of NULL type."); | ||
| throw InvalidSqlInput.exception("Expected a NULL literal, but got non-null constant [%s]", literal); |
There was a problem hiding this comment.
i know this isn't new... but can this.. happen? seems strange
| throw InvalidSqlInput.exception( | ||
| "Expression [%s] evaluates to an unsupported value [%s], expected something that" | ||
| + " can be a Double. Consider casting with 'CAST(<col> AS BIGINT)'", | ||
| druidExpression.getExpression(), | ||
| exprResultDouble | ||
| ); |
There was a problem hiding this comment.
hmm, i know this isn't new, but these messages barely feel like user errors since it happens when calcite is trying to reduce some constant expression and we blow it for one reason or another. Looking back this seems to originally come from #11409 for the one doing numbers and then copied to arrays in #11968
There was a problem hiding this comment.
I took it as the expression has a return type and that return type isn't matching the input of another expression or something like that. Which is why it had the suggestion to cast... But, I was mostly just keeping what was in existence around rather than creating anything new.
|
|
||
| package org.apache.druid.error; | ||
|
|
||
| public class InvalidSqlInput extends InvalidInput |
There was a problem hiding this comment.
is it weird this is defined in processing but sql is not here?
There was a problem hiding this comment.
Yes, it is weird that sql is not in processing (see what I did there? :P)
There was a problem hiding this comment.
heh, that isn't really possible without also merging druid-server into processing, or alternatively splitting druid-sql operators and stuff into processing and putting the schema and higher level stuff that builds on broker stuff in druid-server (which might also be nice, but would make testing slightly more painful maybe?)
| } | ||
| } | ||
|
|
||
| public abstract static class Failure |
abhishekagarwal87
left a comment
There was a problem hiding this comment.
some commentary that I had forgotten to publish. I think steering developers to think about persona is a good step in the right direction. I also think that similarly, we should be able to have developers also write an action associated with the error message. For example, if router can't find any brokers, we log
org.apache.druid.java.util.common.ISE: No default server found!`
This doesn't really offer much help to a new user. The error itself could have been better rewritten. But it would also be nice if it says, go and check your broker's health.
| * Represents an error condition exposed to the user and/or operator of Druid. Given that a DruidException is intended | ||
| * to be delivered to the end user, it should generally never be caught. DruidExceptions are generated at terminal | ||
| * points where the operation that was happening cannot make forward progress. As such, the only reason to catch a | ||
| * DruidException is if the code has some extra context that it wants to add to the message of the DruidException using | ||
| * {@link #prependAndBuild(String, Object...)}. If code wants to catch and handle an exception instead, it should not | ||
| * be using the DruidException. |
There was a problem hiding this comment.
I think that ISE, IAE will be thrown in places where the exception is likely due to a code bug as opposed to something that a user did. For example, in case one comes across an enum that is unknown or a type that is unknown. Maybe we have a different superclass for those exceptions. That superclass should encourage developers to include as much diagnostic information in the exception message as much as possible.
IAE, ISE --> 5xx
DruidException -> 4xx
| * possible to make a message that will make sense to a different persona. Generally speaking, there is a hope | ||
| * that only DEFENSIVE error messages will target this persona. | ||
| */ | ||
| DEVELOPER |
There was a problem hiding this comment.
ok. this is where I assumed we can have IAE or ISE thrown. Would there be equivalents of IAE, ISE that are implementation of DruidException?
| /** | ||
| * Indicates some unsupported behavior was requested. TODO | ||
| */ | ||
| UNSUPPORTED(501), |
There was a problem hiding this comment.
why is UNSUPPORTED a 501? shouldn't this be a 400 (Bad request)
There was a problem hiding this comment.
501 == "Not Implemented" i.e. it's using functionality that could work but hasn't been implemented yet. You could argue that it's user-error for using something that's not implemented yet. Or you could argue that it's the server's fault for not implementing the thing.
IIRC, this is pretty seldom used and it's mostly used in SQL planning stuffs for things that should exist as far as SQL is concerned, but don't yet.
There was a problem hiding this comment.
so lets say someone issues a UNION query with de-duplicating while only supports UNION ALL. would you consider that as a 501 or 400? Such planning errors are very common. I am concerned about the error code for such scenarios because a cluster admin might have set an alert on 5xx but not on 4xx. Since the former will indicate the server being bad while the latter indicates the user being bad.
There was a problem hiding this comment.
TBH, I'm fine with both. The 501 code is actually just a hold-over from the QueryException rather than something I'm actually introducing. If anything, reviewing this change has caused it to be more clear that we are returning 501s in some instances, leading to a question of "WTF", why? Which I think is a totally valid question ;).
Fwiw, I have a TODO in there because this is something I wanted to revisit myself. At this point, the only usages of this in the code in this PR are
Which is what happens when we get a query planning failure and no message at all. I don't know if/when this actually happens, so I'm unclear on whether it's actually meaningful. That said, with the switch to the DruidException, we should be able to eliminate the whole plannerContext().getPlanningError() thing that is the source of potentially not having a message (and also is the source of red-herring error messages more often than not), which should, in turn, eliminate the need for this category. Regardless, it needs to exist to maintain what QueryException is doing...
|
On your suggestion about encouraging an action be included. You gave a great example that explains what this PR is all about. Specifically, think of this error message If someone said that it was "Persona.USER", I would hope that the reviewer would be like, "uhhh, why will a user know what a default server is? Your message is not well aligned with your persona" and ask for the message to be updated. If I were to adjust that exception to this new model and not change the message, I would likely make it We could adjust the message to be an ADMIN persona message and say Or a USER persona message and say Both of these include an action, like you suggest, but that action becomes apparent and clear once we actually identify the persona. My hope is that by identifying the persona, reviewers will start to validate that the exception messages align with the intended persona and we all police each other for creating good error messages. I.e. I'm hoping that just identifying the persona is enough to help us do better. Alternatively, for USER-facing messages, I could add the requirement for yet-another field which is "potential actions" (or something similar) which should be populated with a message describing actions that can be taken. I'd personally like to keep it to just a message and code-review based policing. But I'm also willing to add the field if I'm in the minority... Tagging other people whose name also appears in the conversation part of this PR to see if there are opinions one way or the other: @gianm @kfaraz @clintropolis |
|
Now I am also wondering how we will pick a persona :) For example, in this case, I would like to tell the user that you really need to go and check the logs of the broker service. The action of checking error logs is something that an admin persona can do. But most people when they are setting up druid for the first time, are end-users as well as an admin. The error occurs directly in response to something that an end-user is doing. I can return a user-persona-facing error saying that "Contact your cluster administrator to make sure that services are operational" but it's difficult for new users to figure out where to go from there. So, in this case, maybe we return an error message meant for the admin persona even though this error is in user-initiated request-response path. We leave it to cluster admins if they want to filter messages that match admin persona, in case they are not exposing druid directly. For a new user setting up druid for first time, however they will get all these messages unfiltered. If you have something similar in mind, we should ideally document that developers should avoid adding potentially sensitive information in messages that are meant for user persona. |
|
@abhishekagarwal87 once again, you've provided a great example! Essentially, you went through the thought process that I'm hoping we can just do as a part of every exception message that we generate. In this case, making it ADMIN-focused and having more details about server names and stuff makes a lot of sense to me. And the argument that someone who is trying to "hide" Druid should be able to siphon off this message makes a lot of sense to me too. So, I think that I'm now convinced that the message in question should be targeting the ADMIN persona and specifically mention that you should check that the brokers are up and running. To me, this type of conversation, back-and-forth and your eventual conclusion is the whole goal of the adjustments to the error messages. And, once we change that exception to be ADMIN focused, the code will include the fact that this is an ADMIN-focused message. So any future person who comes across it can clearly see the intentions of who gets the message and why, if they feel the message needs updating, they can have their own internal dialog and motivate their feelings by either saying it should target a different persona OR it needs more details to target the current persona. So anyway, I think that yes, you are thinking along the same lines that I'm thinking at this point. I'm all for adding documentation about whatever, but I'm unclear on where you think it would be added for the most impact. Ultimately, this has to become part of the culture of the project rather than be a pure documentation thing, but I can document it anywhere you think it can help ;) |
|
Re culture, I agree. I think we add these bits in the APIs javadocs, that should be more than enough. And then reviewers can just say, "The error message isn't so great. please read the javadocs for reference" |
27a13f7 to
949d545
Compare
| () -> new DataSchema( | ||
| "", | ||
| parser, | ||
| new AggregatorFactory[]{ | ||
| new DoubleSumAggregatorFactory("metric1", "col1"), | ||
| new DoubleSumAggregatorFactory("metric2", "col2"), | ||
| }, | ||
| new ArbitraryGranularitySpec( | ||
| Granularities.DAY, | ||
| ImmutableList.of(Intervals.of("2014/2015")) | ||
| ), | ||
| null, | ||
| jsonMapper | ||
| )); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
| () -> new DataSchema( | ||
| dataSource, | ||
| Collections.emptyMap(), | ||
| null, | ||
| null, | ||
| null, | ||
| jsonMapper | ||
| ) |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
|
Okay, had a clean bill. Aside from some static checks. Going to push those up. @abhishekagarwal87 @clintropolis I'd love to get this merged today or tomorrow if possible, please do what you need to do to feel comfortable approving.
|
clintropolis
left a comment
There was a problem hiding this comment.
a few minor questions, but I'll go ahead and throw a +1 on this since it seems an overall improvement 🤘
| * Represents an error condition exposed to the user and/or operator of Druid. Given that a DruidException is intended | ||
| * to be delivered to the end user, it should generally never be caught. DruidExceptions are generated at terminal | ||
| * points where the operation that was happening cannot make forward progress. As such, the only reason to catch a | ||
| * DruidException is if the code has some extra context that it wants to add to the message of the DruidException using | ||
| * {@link #prependAndBuild(String, Object...)}. If code wants to catch and handle an exception instead, it should not | ||
| * be using the DruidException. |
There was a problem hiding this comment.
i think there is still a use case for them maybe for low level exceptions that don't have the full picture because they are missing some context to provide a meaningful exception at the surface. In these cases I expect the code area which does have the context will catch IAE/ISE/UOE etc and then extract the details and decorate with additional context such as column and table name, etc.
That seems preferable to me than having to push all of this information down all of the way to the bottom, particularly in more 'hot' areas of the code since passing arguments isn't completely free. Though, I suspect some of these cases are probably better to use more specific exceptions dedicated to a purpose, or I guess we could still throw DruidException but catch it at a higher level to decorate with the additional context, so I think we'll just have to see how it works out in practice.
| } | ||
|
|
||
| @Nullable | ||
| private static String nullOrString(Object o) |
There was a problem hiding this comment.
there is a similar existing method, Evals.asString though its primarily used for processing data for expressions and query time stuff, so i don't feel super strongly about using it here or not
|
|
||
| package org.apache.druid.error; | ||
|
|
||
| public class InvalidSqlInput extends InvalidInput |
There was a problem hiding this comment.
heh, that isn't really possible without also merging druid-server into processing, or alternatively splitting druid-sql operators and stuff into processing and putting the schema and higher level stuff that builds on broker stuff in druid-server (which might also be nice, but would make testing slightly more painful maybe?)
| --> | ||
|
|
||
| WARNING WARNING | ||
| TODO: this README has not been adjusted to align with the current code |
There was a problem hiding this comment.
is this still true? (maybe it should be refreshed to align with the current code before we merge?)
also any possible way to add something like a 'tl;dr' to summarize the important parts? 😛
|
|
||
| import java.util.Map; | ||
|
|
||
| public class ErrorResponseTest |
There was a problem hiding this comment.
since ErrorResponse is an 'over the wire' thingo, should serde tests be here? (or did i miss them somewhere?)
There was a problem hiding this comment.
yeah, they are covered primarily in the server tests. This test also covers it some (by validating asMap, which is what eventually gets serialized), but this test was created primarily just to get coverage inside of the processing package, it doesn't actually validate much more than what the other tests already do from the server package.
| final Response response = expectSynchronousRequestFlow(SIMPLE_TIMESERIES_QUERY); | ||
| Assert.assertEquals(Status.INTERNAL_SERVER_ERROR.getStatusCode(), response.getStatus()); | ||
|
|
||
| final ErrorResponse entity = (ErrorResponse) response.getEntity(); |
There was a problem hiding this comment.
ah i guess this is the serde test i was looking for?
| try { | ||
| throw e; | ||
| } |
There was a problem hiding this comment.
any reason not to just make all of these instanceof checks, especially since some of the catch still have instanceof?
There was a problem hiding this comment.
not really, I think I started doing that and then stopped for some reason, I don't remember why though.
|
also, should there be any web-console changes to go with the adjustments to error response? |
abhishekagarwal87
left a comment
There was a problem hiding this comment.
+1 since I don't have major blocker comments and this PR is prone to conflicts. @clintropolis does have a point around console changes. Ideally, there shouldn't be any because if there are, we likely broke compatibility somewhere. Or console is doing things it shouldn't do.
| "It cannot be used directly on a column or on a scalar expression."); | ||
| assertQueryIsUnplannable( | ||
| "SELECT DS_TUPLE_DOUBLES_INTERSECT(NULL, NULL) FROM foo", | ||
| "DS_TUPLE_DOUBLES_INTERSECT can only be used on aggregates. " + |
There was a problem hiding this comment.
btw we had "Possible error" as a prefix in query planning errors because they are in a way best-effort guesses. During planning phase, we record instances of the planner trying to do something unsupported. and if there is nothing else to show, we just return that instance as an error. But its possible that planner didn't really need to go down that path or it tried to same very fancy planning and we end up surfacing that error instead.
There was a problem hiding this comment.
ok. I think that this got replaced with "Query planning failed for unknown reason, our best guess is this"
| "insert into foo1 select __time, dim1 , count(*) as cnt from foo where dim1 is not null group by 1, 2 clustered by dim1" | ||
| ) | ||
| .setExpectedValidationErrorMatcher(invalidSqlContains( | ||
| "LUSTERED BY found before PARTITIONED BY, CLUSTERED BY must come after the PARTITIONED BY clause" |
There was a problem hiding this comment.
| "LUSTERED BY found before PARTITIONED BY, CLUSTERED BY must come after the PARTITIONED BY clause" | |
| "CLUSTERED BY found before PARTITIONED BY, CLUSTERED BY must come after the PARTITIONED BY clause" |
| CoreMatchers.instanceOf(DruidException.class), | ||
| ThrowableMessageMatcher.hasMessage(CoreMatchers.containsString( | ||
| "INSERT and REPLACE queries cannot have a LIMIT unless PARTITIONED BY is \"ALL\"")) |
There was a problem hiding this comment.
could these be simplified into invalidSqlContains as well? It's find if that refactoring was a bit too much and you gave up on it after a few :P
There was a problem hiding this comment.
Yes, it could. I'm not sure which, but you are either seeing stuff left behind when I first started converting tests and hadn't yet build the Matcher to make it simpler (and I missed it in going back to simplify). OR, with fixing the conflicts after 2 months dormant, I will admit that I didn't have all of the context still in my head and perhaps did something a bit rote. Either way, it's much simpler to switch to the invalidSqlContains().
| "LATEST() aggregator depends on __time column")) | ||
| new DruidExceptionMatcher(DruidException.Persona.ADMIN, DruidException.Category.INVALID_INPUT, "adhoc") | ||
| .expectMessageIs( | ||
| "Query planning failed for unknown reason, our best guess is this " |
There was a problem hiding this comment.
How about "Query planning failed for unknown reason, most likely the reason is this"
There was a problem hiding this comment.
I'm anti that messaging because it pretends like the following error message has a high likelihood of being the cause. My experience has told me that this is rarely the case and most likely this error message means "something happened, I'm gonna tell you something that is just as likely to be wrong as it is right, good luck". So I don't want to pretend like it's giving people a good answer when it generally is not.
| .setQueryContext(context) | ||
| .setExpectedExecutionErrorMatcher(CoreMatchers.allOf( | ||
| CoreMatchers.instanceOf(UnsupportedSQLQueryException.class), | ||
| CoreMatchers.instanceOf(DruidException.class), |
There was a problem hiding this comment.
As we use one exception class to rule them all, were there places where there was exception-class-specific handling? I am wondering how you dealt with those. Or maybe I will find out myself as I move below.
There was a problem hiding this comment.
Technically various of the SQL exceptions get caught and used to tell the volcano planner that it generated a bad query and to keep looking. None of those are converted to DruidException yet. Also, most of the places that were throwing QueryException are still throwing QueryException because we need a release that has the DruidException in it before we can force those forward.
So... really, any place that potentially has other uses just didn't get converted. We will need to go back and review them with the understanding that we now have an exception whose sole purposes is to be delivered back to the user and decide if the current handling is good.
| /** | ||
| * A generic exception thrown by Druid. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
can you also add a note that it's deprecated in favor of the other DruidException?
|
|
||
| import org.apache.druid.query.QueryException; | ||
|
|
||
| public class QueryExceptionCompat extends DruidException.Failure |
There was a problem hiding this comment.
this class could use some javadocs.
Parser errors, validation errors, ad-hoc errors
Moved common error codes into DruidException methods Revisions from review comments
Incorporates review feedback
This adds a singular DruidException that is intended to be used to represent all terminal states and error messages that should result in responses delivered to the user. Javadocs exist on DruidException itself.
6ffbb9d to
9d50c1e
Compare
|
The last run of the tests was near complete (there processing tests for jdk11 and jdk17 still running, everything else, including ITs had passed), so I went ahead and merged instead of risk another set of conflicts cropping up and causing yet another CI run. |
This PR catches the console up to all the backend changes for Druid 27 Specifically: Add page information to SqlStatementResource API #14512 Allow empty tiered replicants map for load rules #14432 Adding Interactive API's for MSQ engine #14416 Add replication factor column to sys table #14403 Account for data format and compression in MSQ auto taskAssignment #14307 Errors take 3 #14004
This PR catches the console up to all the backend changes for Druid 27 Specifically: Add page information to SqlStatementResource API apache#14512 Allow empty tiered replicants map for load rules apache#14432 Adding Interactive API's for MSQ engine apache#14416 Add replication factor column to sys table apache#14403 Account for data format and compression in MSQ auto taskAssignment apache#14307 Errors take 3 apache#14004
This PR catches the console up to all the backend changes for Druid 27 Specifically: Add page information to SqlStatementResource API #14512 Allow empty tiered replicants map for load rules #14432 Adding Interactive API's for MSQ engine #14416 Add replication factor column to sys table #14403 Account for data format and compression in MSQ auto taskAssignment #14307 Errors take 3 #14004 Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
This PR catches the console up to all the backend changes for Druid 27 Specifically: Add page information to SqlStatementResource API apache#14512 Allow empty tiered replicants map for load rules apache#14432 Adding Interactive API's for MSQ engine apache#14416 Add replication factor column to sys table apache#14403 Account for data format and compression in MSQ auto taskAssignment apache#14307 Errors take 3 apache#14004

This is the 3rd take at creating a unified Exception in the DruidException. Given that I had input on the previous two attempts, I figured I'd take the next stab at offering something that matches the input that I had provided.
DruidExceptionandErrorResponsehave class level javadoc discussing how it's all setup and why, so I would recommend starting there. Then branch out and look at some of the points where DruidExceptions are created (DruidPlanner and various of the other places).There is a compatibility layer in this PR that tries to continue mimicing the old way that QueryException was getting serialized out, but the current PR treats that as best-effort. Specifically, anything that started its life as a DruidException will be an "unknown" error type when serialized back as a QueryException. The error message should make it through, but the type characterization will be different. This should only happen in the middle of roll-forward/roll-back processes, so it will be short-lived and, given that it preserves the message, is hopefully a decent compromise.
Additionally, as I was adjusting tests, I got tired of the SQL tests taking roughly 5 minutes to run through the entire suite, so I looked into why it was doing that and through moving some initialization around and fixing some time-based async tests to be based on latches instead, was able to get that timing down to ~1.5 minutes, so some of those changes are mixed in here as well.
Release note
A new unified exception for surfacing errors to users has been introduced. It is partially compatible with the old way of reporting error messages, response codes remain the same, all fields that previously existed on the response will continue to exist and be populated, including the errorMessage. But some error messages have changed to be more consumable by humans and some cases will have the message restructured. Any clients that are actively looking at the error message to make decisions on what to do should explore the changes to the error messages that they care about, but if all a client looks at is the response code, there should be no impact.
This PR has: