Clean rexi stream workers when coordinator process is killed #1824
Merged
nickva merged 2 commits into apache:master on Dec 20, 2018
71d50f2 to 2139ab7
Streams functionality is fairly isolated from the rest of the utils module, so move it to its own module. This is mostly in preparation for adding a stream workers cleaner process.
12de5e4 to e15c19c
davisp (Member) requested changes on Dec 20, 2018
The pdict deletion is necessary. Let's take a quick look around replacements, maybe banging a message over to the cleanup process that'll update its Workers variable. If that turns out to be overly complicated, it's not a huge issue, as it should be a fairly rare occurrence in the first place.
e15c19c to 8ce8abd
davisp reviewed on Dec 20, 2018
8ce8abd to 3d47ea1
Sometimes fabric coordinators end up getting brutally terminated [1], and in that case they might never process their `after` clause, where their remote rexi workers are killed. Those workers are left lingering, keeping databases active for up to 5 minutes at a time.

To prevent that from happening, let coordinators which use streams spawn an auxiliary cleaner process. This process monitors the main coordinator and, if it dies, ensures the remote workers are killed, freeing resources immediately. In order not to send twice the number of kill messages during a normal exit, fabric_util:cleanup() stops the auxiliary process before continuing.

[1] One instance is when the ddoc cache is refreshed: https://github.com/apache/couchdb/blob/master/src/ddoc_cache/src/ddoc_cache_entry.erl#L236
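The cleaner idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the module name `cleaner_sketch`, the functions `start/1`, `stop/1`, and `demo/0`, and the `stop` message are all hypothetical, and local pids killed with `exit/2` stand in for remote rexi workers.

```erlang
-module(cleaner_sketch).
-export([start/1, stop/1, demo/0]).

%% Spawn an auxiliary process that monitors the calling coordinator.
%% Workers is a list of local pids standing in for remote rexi workers
%% (an assumption made to keep the sketch self-contained).
start(Workers) ->
    Coordinator = self(),
    spawn(fun() ->
        Ref = erlang:monitor(process, Coordinator),
        loop(Ref, Workers)
    end).

%% Called by the coordinator during a normal exit, before it kills the
%% workers itself, so that kill messages are not sent twice.
stop(Cleaner) ->
    Cleaner ! stop,
    ok.

loop(Ref, Workers) ->
    receive
        stop ->
            erlang:demonitor(Ref, [flush]),
            ok;
        {'DOWN', Ref, process, _Pid, _Reason} ->
            %% The coordinator died without running its own cleanup.
            [exit(W, kill) || W <- Workers],
            ok
    end.

%% Kill a coordinator abruptly and report how its worker went down.
demo() ->
    Parent = self(),
    Worker = spawn(fun() -> receive never -> ok end end),
    Coordinator = spawn(fun() ->
        _Cleaner = start([Worker]),
        Parent ! ready,
        receive never -> ok end
    end),
    %% Wait until the cleaner has been spawned before killing.
    receive ready -> ok end,
    WRef = erlang:monitor(process, Worker),
    exit(Coordinator, kill),
    receive
        {'DOWN', WRef, process, Worker, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

The monitor is set up inside the cleaner itself. If the coordinator is already dead by then, `erlang:monitor/2` still delivers a `'DOWN'` message (with reason `noproc`), so the workers are killed either way. In the sketch, `cleaner_sketch:demo()` kills the coordinator with `exit(Coordinator, kill)` and returns the exit reason observed for the worker.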
3d47ea1 to 41757cd