feat: log errors but continue running analysis #3137
Open
mikewilli wants to merge 11 commits into
28142bb to 47e1aa9
Looks like integration tests are failing, moving back to draft for now...
With discrete metrics, we should be able to continue execution on failure. The core of this functionality is to collect the Dask task results by looping over `as_completed` and handling errors individually, instead of calling `client.gather`. Additionally, each task needs to return a falsy, empty result when it hits an expected non-critical error (that is, one where the other metric tasks can still continue). Otherwise, because of how Dask task dependencies work, Dask will see the error and cancel all future tasks. Returning an empty object from the metric queries lets us pull out the ones that actually completed and continue by computing their statistics, while dropping the failed ones. Before returning the empty object, the task itself logs the error so it can be resolved later.
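A minimal sketch of the pattern described above, not the code in this PR. To keep it runnable without a cluster, it uses the standard library's `concurrent.futures.as_completed` as a stand-in for Dask's `as_completed`; the names `run_metric_query`, `collect_results`, and the metric names are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def run_metric_query(name):
    """Hypothetical metric task: return rows, or an empty list on an expected error."""
    try:
        if name == "bad_metric":
            raise ValueError(f"query failed for {name}")
        return [1.0, 2.0, 3.0]
    except ValueError:
        # Log inside the task so the failure can be fixed later, then return
        # a falsy result instead of raising, so dependent work is not cancelled.
        logger.exception("metric %s failed; skipping", name)
        return []


def collect_results(metric_names):
    """Collect results one at a time so one failure does not abort the rest."""
    completed, failed = {}, []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(run_metric_query, n): n for n in metric_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                result = future.result()
            except Exception:
                # Unexpected errors are handled per task rather than for the batch.
                logger.exception("unexpected error in %s", name)
                failed.append(name)
                continue
            if result:
                completed[name] = result
            else:
                failed.append(name)
    return completed, failed


completed, failed = collect_results(["metric_a", "bad_metric", "metric_b"])
# Statistics are computed only for the metrics that actually completed.
statistics = {name: sum(rows) / len(rows) for name, rows in completed.items()}
```

The key design choice is that failure is signalled by a falsy return value rather than by an exception, so the collecting loop can separate completed metrics from failed ones with a plain truthiness check.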