feat: log errors but continue running analysis #3137

Open: mikewilli wants to merge 11 commits into main from dask-error-handling-2

Conversation

@mikewilli (Contributor) commented Apr 22, 2026

With discrete metrics, we should be able to continue execution when an individual metric fails. The core of this change is to collect the dask task results by looping over as_completed and handling errors individually, instead of calling client.gather. Additionally, each task needs to return a falsy, empty result when it hits an expected non-critical error (i.e., one after which other metric tasks can still continue). Otherwise, because of the way dask task dependencies work, dask will see the error and cancel all future tasks. Returning an empty object from the metric queries lets us pull out the ones that actually completed and continue by computing their statistics, while dropping the failed ones. Before returning the empty object, the task itself logs the error so it can be resolved later.
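The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses the standard library's `concurrent.futures` as a stand-in for a dask client so that it runs without a cluster, and `ExpectedMetricError`, `run_metric_query`, `collect_results`, and the metric names are all hypothetical. It also does not model dask's task-dependency cancellation; it only shows the two halves of the approach: a task that logs and returns an empty result on an expected error, and a collector that loops over `as_completed` and handles each future on its own.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


class ExpectedMetricError(Exception):
    """Hypothetical marker for non-critical, per-metric failures."""


def run_metric_query(metric: str) -> dict:
    """Hypothetical metric task: returns a result, or {} on an expected failure."""
    try:
        if metric == "broken_metric":  # simulate one failing metric
            raise ExpectedMetricError(f"query failed for {metric}")
        return {"metric": metric, "rows": 3}
    except ExpectedMetricError:
        # Log inside the task so the failure can be resolved later, then
        # return a falsy result instead of letting the exception propagate.
        logger.exception("metric %s failed; continuing with the others", metric)
        return {}


def collect_results(metrics: list[str]) -> list[dict]:
    """Collect results one future at a time instead of gathering them all."""
    completed = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(run_metric_query, m): m for m in metrics}
        for future in as_completed(futures):
            try:
                result = future.result()
            except Exception:
                # Unexpected errors are handled per task, not for the whole run.
                logger.exception("unexpected failure for %s", futures[future])
                continue
            if result:  # drop the falsy placeholders left by failed metrics
                completed.append(result)
    return completed


results = collect_results(["active_hours", "broken_metric", "retention"])
```

Here `results` holds only the two metrics that completed, so downstream statistics can be computed on them while the failed metric is dropped. With dask, the same loop shape would presumably use the client's `submit` and `distributed.as_completed` in place of the thread pool.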

@mikewilli mikewilli marked this pull request as ready for review May 11, 2026 18:50
@mikewilli mikewilli requested a review from scholtzan May 11, 2026 18:50
@mikewilli mikewilli force-pushed the dask-error-handling-2 branch from 28142bb to 47e1aa9 Compare May 11, 2026 18:57
@mikewilli mikewilli marked this pull request as draft May 11, 2026 19:03
@mikewilli (Contributor, Author) commented:

Looks like the integration tests are failing; moving this back to draft for now.

@mikewilli mikewilli marked this pull request as ready for review May 11, 2026 22:00
@mikewilli mikewilli marked this pull request as draft May 11, 2026 22:03
@mikewilli mikewilli marked this pull request as ready for review May 13, 2026 16:16