feat: update timeout error UX#10274
Conversation
ktmud
left a comment
There was a problem hiding this comment.
Great feature! Added a couple of comments.
There was a problem hiding this comment.
nit: why is "Chart Owners" in title case?
There was a problem hiding this comment.
This is as defined in @Steejay's specs, which would you prefer here?
There was a problem hiding this comment.
I can go either way if we are considering Chart Owner a proper noun for Superset and will be consistent everywhere.
There was a problem hiding this comment.
@etr2460 I interpreted Chart Owners as the title of a role so I defined it w title case.
There was a problem hiding this comment.
Instead of two big code chunks, how about creating two subcomponents SimpleTimeoutErrorMessage and DetailedTimeoutErrorMessage?
There was a problem hiding this comment.
Let me look into simplifying this a bit more. I might pull some things out into smaller, more reusable components. It's a bit tricky, because depending on where the error is being rendered, it's supposed to look different (sometimes using a modal, sometimes expanding inline)
There was a problem hiding this comment.
Should this be more structural? I'm imagining a errors.json somewhere that could potentially be shared by both the frontend and doc generator:
{
"DATASOURCE_TOO_LARGE": {
"code": 1001,
"title": "The database is too large to query",
"description": "It's likely your datasource has grown too large to run the current
query, and is timing out. You can resolve this by reducing the size of your datasource or by modifying your query to only process a subset of your data."
},
"DATASOURCE_OVERLOAD": {
"code": 1002,
....
}
}
There was a problem hiding this comment.
+1 this. Error messages should be defined in API level. So that API user (that use json response without js rendered chart) should get the same message when some query went wrong.
Front-end should handle how to render error by its code, not define what is this error.
There was a problem hiding this comment.
So there are a couple things being conflated here I think. First off, an idea of API error codes. These are already consistently defined and described on both the backend and the frontend here:
https://github.com/apache/incubator-superset/blob/master/superset/errors.py#L24
Part of the problem is that each of these errors from a software perspective could have multiple possible root causes and mitigations from the user's perspective. In this case, a timeout error can have multiple different root causes and mitigations, and these root causes/mitigations could apply to many different software errors too. We could add another field onto the error payload coming from the backend to include root causes and mitigations, but I also want to ensure we have freedom on the frontend to render these errors in whatever rich format we see fit. Thus why the software's idea of errors exists on the backend, but the user interpretations live in the frontend.
Finally, as to making the errors more structural: I agree this can be defined better, I'll restructure the TS object. However, I'm a bit anxious about investing a bunch of time into autogenning docs from it since we just voted to change doc systems a couple days ago. It might be better to wait for the doc migration, then to build an autogenned doc page with error details
There was a problem hiding this comment.
That makes sense. Didn't think it was necessary to build autogen right now either.
1c4437d to
cc26a70
Compare
|
Do you have specifications on what are errors, what are warnings etc? |
0c3ebcf to
8aa7e32
Compare
There was a problem hiding this comment.
I'm wondering whether we can just extend the attributes of SupersetError directly on SupersetException and use a custom JSON encoder to serialize it. This way you remove a layer of abstraction and the need to map the arguments for any new errors your might add in the future,
There was a problem hiding this comment.
Hmm, possibly. I'm not super familiar with what the plan is for exception handling on the Python side, but having the SupersetExceptions automatically contain the error properties might make sense.
The reason I shied away from it originally was so that it was very clear what attributes the clients need to render the errors and what parts are server specific. But maybe that's not a big issue. Maybe @john-bodley has thoughts?
There was a problem hiding this comment.
@ktmud could you expand (possibly with an example) on your thoughts? Are you looking for more of a generic container for handling error properties?
There was a problem hiding this comment.
I was thinking maybe subclasses of SupersetException itself can handle encoding of error code and messages. Here's an example:
class SupersetException(Exception, SupersetAlert):
def __init__(...) -> None:
SupersetAlert.__init__(
self, code=code, message=message, description=description, extra=extra
)
Exception.__init__(self, self.message)
def json_error_response(error: SupersetException) -> FlaskResponse:
return jsonify({ "error": error.to_json() })8aa7e32 to
00f8ecd
Compare
Codecov Report
@@ Coverage Diff @@
## master #10274 +/- ##
==========================================
+ Coverage 65.57% 70.46% +4.88%
==========================================
Files 600 602 +2
Lines 32101 32177 +76
Branches 3244 3263 +19
==========================================
+ Hits 21051 22672 +1621
+ Misses 10869 9401 -1468
+ Partials 181 104 -77
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
I am thinking the logic for this may be triggered by, should come from a standard, general json format generated from backend.
front-end should not decide what error is caused by what reason. Front-end timeout limit is defined in config, it's not something decided by front-end.
There was a problem hiding this comment.
I've introduced an error_codes mapping within extra that gets passed down from the backend. You're still free to alter it on the frontend as needed, but it consolidates potential root causes to the backend with the more developer helpful error_type
24c85f7 to
c44b8aa
Compare
There was a problem hiding this comment.
Hmm, just realized a bigger question: are there cases that the backend will be able to tell which is the actual reason? If not, why can't they be the same error code?
## Error 1000 - Timeout communicating with datasource.
Your datasource may be too large to query or was under an unusual load.
There was a problem hiding this comment.
It's also possible to remove such mapping altogether:
class SupersetError(IntEnum):
BACKEND_TIMEOUT = 1001
class SupersetException(Exception):
code = None
def __init__(self, message=None, description=None):
if not self.code:
raise RuntimeError('Must not raise SupersetException directly')
error_name = SupersetError(self.code).name
self.message = message or _(f'error.{error_name}.message')
self.description = description or _(f'error.{error_name}.message')
class SupersetTimeoutException(SupersetException):
code = SupersetErrorEnum.BACKEND_TIMEOUTThen add the detailed message and explanation in i18n.
There was a problem hiding this comment.
I think this again points to the differentiation between the actual error and potential root causes. In this case, Error 1001 might be the root cause for many different errors (a timeout error, a generic DB error, a 503/504 from the db engine, etc.). That was the reason for splitting them into two different structures. However, I agree it's getting into a zone of too many different abstraction layers with both the error_type and the error_codes...
@Steejay, this is something where your thoughts would be helpful: e.g. what relationship do you see between error types and the in product user facing error codes?
There was a problem hiding this comment.
The goal of the error code is to suggest a path to a solution by highlighting the potential sources. Ideally we'd like to design a message that can pinpoint the source but this can't always be achieved so instead we try to provide as much context as possible to let the technical user troubleshoot. But youre right, the same error source can lead to different errors.
There was a problem hiding this comment.
I've expanded my thoughts a little and here's a more detailed example: https://gist.github.com/ktmud/59ccbdf6c96570e1ffe41ee8e5ecd479
Let me know if anything was unclear.
Even if an error may represent different root causes, we can return a generic error code and potential causes in extras if needed. If the backend is able to identify the root cause, then cool, we return a more specific error code. But there doesn't have to be another layer of abstraction with all the mappings.
There was a problem hiding this comment.
Yes, I agree error codes and possible root causes shouldn't be mixed together as suggesting root causes and providing remedies are more of a documentation issue rather than something the program cares about.
I've removed the description field in my example code, but other parts of the code still feel like an improvement to me though. There really is no need to have two separate entities for exception catching and messaging.
As for the mapping between symptoms and root causes, maybe we can extend the errors.json file I suggested earlier and add a causes field?
{
"issues": {
"DATASOURCE_TOO_LARGE": {
"code": 1001,
"title": "The database is too large to query",
"description": "It's likely your datasource has grown too large to run the current query, and is timing out. You can resolve this by reducing the size of your datasource or by modifying your query to only process a subset of your data."
},
"DATASOURCE_OVERLOAD": {
"code": 1002,
// ....
},
},
"causes": {
"BACKEND_TIMEOUT_ERROR": ["DATASOURCE_TOO_LARGE", "DATASOURCE_OVERLOAD"],
}
}There was a problem hiding this comment.
I think we might need separate entities for exception catching and error messaging. One example is for async queries in sql lab. We would catch an exception from the query failing, but then only want to write a subset of attributes to the query table for returning to the client later. Then when the client polls for query status, we're returning an error payload within that request, but no exception is occuring. They seem like 2 different entities to me, and I'm not sure how much is improved by having a complex inheritance tree consisting of both native Python Exceptions and UI specific payload classes
There was a problem hiding this comment.
They are indeed two entities, one is runtime exception, another is the result payload used by the API. But it doesn't mean the payload cannot be generated by exception itself.
The improvement is that you don't have to manually generate the payload every time an exception occurs. You just store relevant information as top-level attributes on the Exception and some general error handlers will return a consistent payload whenever needed. The payload doesn't have to be UI specific. We can certainly return different payloads for storage and API by adding arguments to the to_json method, if that's what we want.
I just find it weird that you had to initialize two classes in order to raise an API-conforming exception. We already use class inheritance for Exceptions and used the pattern extensively for superset.commands.exceptions, so I'm not sure an additional mixin to serialize the exception is that much of complexity.
There was a problem hiding this comment.
hmm, i haven't seen the superset.commands.exceptions module yet, i'll look at that and see how other parts of the code handles errors. This has been a wild west of sorts on the backend side, and I've mainly been focused on making sure whatever happens on the backend, the frontend gets something consistent. If there's a more standard approach now (i think it was discussed in SIP-41) then i'm happy to follow it
ktmud
left a comment
There was a problem hiding this comment.
LGTM with a caveat of some potential future refactoring:
- Create an extended
SupersetErrorAlertcomponent that renders conformingSupersetErrorin consistent format and makeTimeoutErrorMessagereusable. - (Optional) Consolidate
SupersetErrorandSupersetExceptionin the backend.
There was a problem hiding this comment.
@ktmud could you expand (possibly with an example) on your thoughts? Are you looking for more of a generic container for handling error properties?
db3d101 to
d041659
Compare
|
@graceguo-supercat and @ktmud, sorry for the delayed response here. I think this is in a good place for final review now. Some significant changes from before:
Thanks for taking the time to be diligent here and get this PR into a better state than it started. Hopefully it should be good to go! |
|
Are the constructs of how the errors are defined still a blocking issue? If so SIP-41 describes the somewhat current state of how errors are handled by the backend and there is clearly a number of inconsistencies which need to be addressed. I wonder whether it makes sense to flush out the architecture of error constructs in SIP-41 as the problem is far-reaching and beyond just the scope of the issues/concerns raised here. |
ktmud
left a comment
There was a problem hiding this comment.
Yes, never intended to let the discussions regarding error handling in the backend block this PR. Sorry for not stamping earlier!
There was a problem hiding this comment.
do you mind use "-" not case in css class?
graceguo-supercat
left a comment
There was a problem hiding this comment.
one lint comment for css class name. otherwise LGTM!
d041659 to
e41361f
Compare

SUMMARY
Implements new error designs within Superset, starting with timeout errors. This standardizes all timeout errors, frontend and backend, to the UI component. Future errors will be built following this same pattern. Also adds documentation pointers for potential root causes so users can unblock themselves
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TEST PLAN
CI + unit tests
Set the timeouts to 1 second and see the new error component across Dashboards, Explore, and SQL Lab. Ensure everything renders correctly when there are no errors.
ADDITIONAL INFORMATION
to: @graceguo-supercat @ktmud @nytai @rusackas
cc: @Steejay