-
Notifications
You must be signed in to change notification settings - Fork 0
Assignments complete callback. Fix race condition in WorkerSupervisor. Fix error in WorkerSupervisor supervision tree. Improve lifecycle callback documentation. #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ror in the supervision tree of WorkerSupervisor.
|
|
||
| config :logger, | ||
| handle_sasl_reports: false, | ||
| handle_sasl_reports: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to leave this in for now -- if the integration tests fail in CI, I'd like to be able to get a dump of process starts/deaths.
| processes by the consumer group manager. | ||
| """ | ||
| use Supervisor, restart: :transient | ||
| use Supervisor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restart: transient here affects how the module-based supervisor (WorkerSupervisor) is treated by its parent. That's actually not what I intended at all -- if WorkerSupervisor dies I want it to automatically restart.
The intent was for the child DynamicSupervisor not to restart automatically. That needs to be specified below when we create it.
| # Make sure the DynamicSupervisor itself is truly cleaned up from the Supervisor's perspective, | ||
| # so that it will restart reliably | ||
| _ = Supervisor.terminate_child(module_supervisor, :worker_dynamic_supervisor) | ||
| if dynamic_worker_supervisor != :undefined do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If for some reason the dynamic_worker_supervisor isn't alive at this point (ex. because we got that transient "coordinator not available error" and it prevented creation or killed off the supervision tree), we should skip the rest so that we don't crash out the top-level supervisor process.
|
Switching this to a draft while I figure out how to start the DynamicSupervisor correctly |
| {DynamicSupervisor, :start_link, | ||
| [[name: {:via, ElsaRegistry, {registry(connection), :worker_dynamic_supervisor}}]]}, | ||
| restart: :transient | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had simply made a mistake here -- the tuple form of a child spec does not actually set the ID from the supervisor's perspective. You have to use the struct form for that.
I believe the upshot of this mistake was that our calls to Supervisor.terminate_child below, were just failing silently.
…icSupervisor. It was causing a race condition, and I think having multiple points of ownership just muddies things anyway.
|
🎉 |
…lsa_fi into assignments_complete_callback
This should now completely fix the following issue:
#15
(Internal issue tracking): https://simplifi.atlassian.net/browse/INT-11499
https://simplifi.atlassian.net/browse/INT-11494
Two changes in this one:
First, added a callback that you can get when a batch of assignments is complete. This should make it easier to write stable integration tests, i.e. you can subscribe to a topic using
latestfor the offset, wait for assignments complete, and then publish.Second, reworked WorkerSupervisor's supervision tree:
restart: :transient-- it was simply in the wrong place before, the intent all along was to have the underlying DynamicSupervisor to be transient, not the module-based Supervisor.{:error, :already_started}from Supervisor.restart_child. This could have also been fixed with an explicit call toElsaRegistry.unregister_name/2, but there's no benefit to having this transient particular process in the registry anyway. It just muddies up the picture really.