@fjetter
Member

@fjetter fjetter commented Aug 10, 2021

Users sometimes struggle to get all the relevant information when exceptions occur in server code. If an exception occurs that is not directly linked to a task, there is currently no mechanism to forward it to the user. The only option is to rely on logging, which, depending on the deployment, can be difficult to work with.

In an attempt to make this more accessible, this PR forwards all server-related exceptions to the client. There is still an open discussion about how to deal with exceptions in general, since they are currently handled and logged by tornado (#5184); I consider that out of scope for this PR.

This PR offers an unpolished but working implementation as a basis for discussing how to expose this to the user.

This draft offers

  • programmatic access to the exceptions by storing them in a dictionary on the client side (source, exception, traceback, timestamp)
  • the scheduler logs exceptions as events so that they could be visualized on our dashboard (out of scope)
  • to manage spam, the client can filter exceptions by type
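
To make the first and last points concrete, here is a minimal sketch of what such a client-side store with type-based filtering could look like. It is not the code in this PR: `ExceptionRecord`, `ExceptionStore`, and the `ignore` parameter are hypothetical names chosen for illustration only.

```python
import time
import traceback
from dataclasses import dataclass


@dataclass
class ExceptionRecord:
    """One forwarded exception (hypothetical layout, mirroring the fields above)."""
    source: str               # address of the server that raised
    exception: BaseException
    traceback: str            # formatted traceback text
    timestamp: float


class ExceptionStore:
    """Hypothetical client-side container; not part of distributed."""

    def __init__(self, ignore=()):
        # Exception types (including subclasses) the client does not want to keep.
        self.ignore = tuple(ignore)
        # Maps source address -> list of records received from that source.
        self.records = {}

    def add(self, source, exc):
        """Store `exc` unless its type is filtered; return True if it was stored."""
        if isinstance(exc, self.ignore):
            return False
        tb = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        record = ExceptionRecord(source, exc, tb, time.time())
        self.records.setdefault(source, []).append(record)
        return True
```

Filtering on type with `isinstance` also drops subclasses of the ignored types, which may or may not be what users expect; this is part of the filtering question raised below.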

Open questions

  • Do we need filtering? If so, does it need to be more sophisticated, e.g. filtering on messages, similar to Python warnings?
  • Do we want to filter on the servers directly instead of on the client?
  • Configuration via distributed.yaml or kwargs? Both?
  • ?

@fjetter
Member Author

fjetter commented Aug 13, 2021

Summary of an offline discussion

  • Exceptions can be a special kind of event in our event system. Getting this in would require some refactoring of how we use tornado, see also [Idea/Draft/Proposal] Exception handling for server exceptions #5184
  • Forwarding / reporting to the client the way this PR does should not be necessary. It creates a weird special case for exceptions, and we'd have a hard time with filtering or otherwise giving access to the data
  • Instead of plain forwarding, a Client.subscribe_to_topic(topic: str, handler: callable) would allow us to register a client with the scheduler. Every event on the given topic would then be forwarded to the client. The event handler on the client would then allow users to filter, log or otherwise deal with exceptions and events.
  • This way users could also emit custom events and listen to them, e.g. by calling get_worker().log_event("custom-metric", 42) during task execution and subscribing to that topic. See also https://distributed.dask.org/en/latest/logging.html#structured-logs
  • Future work can also include these events in dashboards
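
The subscription idea can be illustrated with a toy, in-process stand-in for the scheduler. This is only a sketch of the routing logic under discussion: `ToyScheduler` is a hypothetical class, there is no networking involved, and the method names simply mirror the ones proposed above rather than any existing distributed API.

```python
from collections import defaultdict, deque


class ToyScheduler:
    """Hypothetical in-process stand-in for the scheduler's event system."""

    def __init__(self, maxlen=1000):
        # Per-topic bounded history, so late consumers (e.g. a dashboard)
        # could still read recent events.
        self.events = defaultdict(lambda: deque(maxlen=maxlen))
        # Per-topic list of handlers registered by clients.
        self.subscribers = defaultdict(list)

    def subscribe_to_topic(self, topic, handler):
        """Register `handler` to be called with every new event on `topic`."""
        self.subscribers[topic].append(handler)

    def log_event(self, topic, msg):
        """Record the event and forward it to the topic's subscribers only."""
        self.events[topic].append(msg)
        for handler in self.subscribers.get(topic, ()):
            handler(msg)


# Usage: the handler decides what to do with each event (filter, log, store).
seen = []
scheduler = ToyScheduler()
scheduler.subscribe_to_topic("custom-metric", seen.append)
scheduler.log_event("custom-metric", 42)       # forwarded to the handler
scheduler.log_event("other-topic", "ignored")  # recorded, but nobody subscribed
```

With this shape, exceptions need no special case: they would be just another topic, and any filtering lives in the user's handler.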

@fjetter
Member Author

fjetter commented Aug 18, 2021

Closed in favour of #5217

@fjetter fjetter closed this Aug 18, 2021