@nathanmonteleone

https://simplifi.atlassian.net/browse/INT-11129

I know the ticket talks about a race condition, but after poking at it more today I'm not sure a race condition is actually what's causing the problem. A couple of other things seem to affect stability, at least when running the integration tests:

  • First, we were trying to restart the WorkerSupervisor's DynamicSupervisor when handling a termination. I'm not sure this actually broke anything, but it was messy, so I fixed it.

  • Added worker_supervisor max_restart and max_seconds options. The defaults for these were definitely too low, and that was causing problems (see the comments in the code).
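The reason low limits hurt can be shown with a small model of supervisor restart intensity. This is a minimal Python sketch, not the project's code. `RestartIntensity` and `record_restart` are hypothetical names. It assumes the OTP rule that a supervisor gives up once more than `max_restarts` restarts happen within `max_seconds`, and it uses Elixir's DynamicSupervisor defaults (3 restarts within 5 seconds) as the default values. Under those defaults, four crashes within a few seconds are enough to trip the limit.

```python
from collections import deque


class RestartIntensity:
    """Sliding-window restart budget, modeled on OTP supervisor intensity.

    Hypothetical illustration: the supervisor tolerates at most
    `max_restarts` restarts within the last `max_seconds`; one more
    restart inside that window means it gives up (in OTP, it terminates
    its children and then itself).
    """

    def __init__(self, max_restarts=3, max_seconds=5):
        self.max_restarts = max_restarts
        self.max_seconds = max_seconds
        self._restarts = deque()  # timestamps (seconds) of recent restarts

    def record_restart(self, now):
        """Record a restart at time `now`; return False once the budget is exceeded."""
        self._restarts.append(now)
        # Forget restarts that are older than the window.
        while self._restarts and now - self._restarts[0] > self.max_seconds:
            self._restarts.popleft()
        return len(self._restarts) <= self.max_restarts


burst = [0, 1, 2, 3]  # four crashes in under five seconds

default = RestartIntensity()  # 3 restarts / 5 s, the DynamicSupervisor defaults
print([default.record_restart(t) for t in burst])  # [True, True, True, False]

raised = RestartIntensity(max_restarts=10, max_seconds=5)
print([raised.record_restart(t) for t in burst])  # [True, True, True, True]
```

The same four crashes spread across more than five seconds each would never trip either limit, because older restarts fall out of the window. A burst of failures is what exhausts the default budget.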

…er_supervisor max_restart and max_seconds options.
@nathanmonteleone nathanmonteleone merged commit 052b256 into main Apr 9, 2025
3 checks passed
@nathanmonteleone nathanmonteleone deleted the max_restarts branch April 9, 2025 14:29
3 participants