Skip to content

Conversation

@fjetter
Copy link
Member

@fjetter fjetter commented Nov 15, 2022

Closes #7312

There are weird race conditions possible if start and stop is running concurrently. I think we should just prohibit doing this and lock it similarly to how we lock start. I ended up introducing similar close_unsafe methods as I did for the start_unsafe back in #5910

@fjetter fjetter force-pushed the server_close_concurrently branch from 4bc4965 to e8737b8 Compare November 15, 2022 18:46
Comment on lines -947 to 938
init_result_q.close()
# If we hit an exception here we need to wait for a least
# one interval for the outside to pick up this message.
# Otherwise we arrive in a race condition where the process
# cleanup wipes the queue before the exception can be
# properly handled. See also
# WorkerProcess._wait_until_connected (the 3 is for good
# measure)
sync_sleep(cls._init_msg_interval * 3)
init_result_q.join_thread()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a drive by. I can break this out in anothe rPR if necessary

Comment on lines 869 to 871
async def close_unsafe(
self, timeout: float | None, reason: str | None, **kwargs: Any
) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy with the signature but that's the way it is. Specifically the Nanny and the Worker have very custom signatures making this a bit ugly

@github-actions
Copy link
Contributor

github-actions bot commented Nov 15, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±       0         15 suites  ±0   11h 34m 58s ⏱️ + 4h 42m 30s
  3 223 tests +       4    3 092 ✔️  -      42    87 💤 +  3    38 +  37    6 🔥 +  6 
21 961 runs   - 1 842  20 875 ✔️  - 2 018  819 💤  - 90  246 +245  21 🔥 +21 

For more details on these failures and errors, see this check.

Results for commit d214271. ± Comparison against base commit 5b6351a.

♻️ This comment has been updated with latest results.

@fjetter
Copy link
Member Author

fjetter commented Nov 16, 2022

Ok this approach does not work. The start method is calling close again which would require a reentrant lock. I'll rewrite the start/stop mechanics s.t. we don't need locks

@fjetter fjetter closed this Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Restart can kill a worker

1 participant