Clean rexi stream workers when coordinator process is killed #1824

Merged
nickva merged 2 commits into apache:master from cloudant:clean-rexi-stream-workers on Dec 20, 2018

@nickva commented Dec 20, 2018

Sometimes fabric coordinators are brutally terminated [1], in which case they
may never run the `after` clause that kills their remote rexi workers. Those
workers then linger, keeping databases active for up to 5 minutes at a time.

To prevent this, coordinators that use streams now spawn an auxiliary cleaner
process. The cleaner monitors the main coordinator and, if the coordinator
dies, ensures the remote workers are killed, freeing resources immediately.
To avoid sending twice as many kill messages on a normal exit,
fabric_util:cleanup() stops the auxiliary process before continuing.

[1] One instance is when the ddoc cache is refreshed:
https://github.com/apache/couchdb/blob/master/src/ddoc_cache/src/ddoc_cache_entry.erl#L236
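
The monitor-and-clean pattern described above could be sketched roughly as follows. This is an illustration only, not the code from this PR: the module name `cleaner_sketch` and all of its functions are hypothetical, and the workers are local processes killed with `exit/2`, standing in for remote rexi workers.

```erlang
-module(cleaner_sketch).
-export([demo/0, spawn_cleaner/2, stop_cleaner/1]).

%% Spawn an auxiliary process that watches Coordinator and remembers
%% which Workers to kill if the coordinator dies unexpectedly.
spawn_cleaner(Coordinator, Workers) ->
    spawn(fun() ->
        %% If Coordinator is already dead, the monitor delivers an
        %% immediate 'DOWN' message, so no death can be missed.
        Ref = erlang:monitor(process, Coordinator),
        cleaner_loop(Ref, Workers)
    end).

cleaner_loop(Ref, Workers) ->
    receive
        {'DOWN', Ref, process, _Pid, _Reason} ->
            %% The coordinator died without cleaning up: kill the workers.
            [exit(W, kill) || W <- Workers],
            ok;
        stop ->
            %% Normal exit path: the coordinator kills the workers itself,
            %% so the cleaner just goes away without sending anything.
            erlang:demonitor(Ref, [flush]),
            ok
    end.

%% Called by the coordinator during its normal cleanup.
stop_cleaner(Cleaner) ->
    Cleaner ! stop,
    ok.

%% A process that blocks until it is killed.
idle() ->
    receive never_sent -> ok end.

%% Brutally kill a coordinator and check that its workers are cleaned up.
demo() ->
    Workers = [spawn(fun idle/0) || _ <- lists:seq(1, 3)],
    Refs = [erlang:monitor(process, W) || W <- Workers],
    Parent = self(),
    Coordinator = spawn(fun() ->
        _Cleaner = spawn_cleaner(self(), Workers),
        Parent ! ready,
        idle()
    end),
    receive ready -> ok end,
    %% The kill signal cannot be trapped, so no `after` clause would run.
    exit(Coordinator, kill),
    [receive
         {'DOWN', R, process, _, killed} -> ok
     after 1000 ->
         exit(timeout)
     end || R <- Refs],
    all_workers_killed.
```

In this sketch `cleaner_sketch:demo()` kills the coordinator and then waits for every worker to go down. On the normal exit path, a coordinator would call `stop_cleaner/1` first, so that only one set of kill messages is sent.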

@nickva force-pushed the clean-rexi-stream-workers branch 4 times, most recently from 71d50f2 to 2139ab7, December 20, 2018 17:33
Streams functionality is fairly isolated from the rest of the utils module, so
move it to its own module. This is mostly in preparation for adding a stream
workers cleaner process.
@nickva force-pushed the clean-rexi-stream-workers branch 2 times, most recently from 12de5e4 to e15c19c, December 20, 2018 17:42

@davisp left a comment

The pdict deletion is necessary. Let's take a quick look at replacements, maybe banging a message over to the cleanup process that'll update its Workers variable. If that turns out to be overly complicated, it's not a huge issue, as it should be a fairly rare occurrence in the first place.

@nickva force-pushed the clean-rexi-stream-workers branch from e15c19c to 8ce8abd, December 20, 2018 17:56
@nickva force-pushed the clean-rexi-stream-workers branch from 8ce8abd to 3d47ea1, December 20, 2018 18:24
@nickva force-pushed the clean-rexi-stream-workers branch from 3d47ea1 to 41757cd, December 20, 2018 18:24

@davisp left a comment

+1

@nickva merged commit 632f303 into apache:master Dec 20, 2018
@nickva deleted the clean-rexi-stream-workers branch December 20, 2018 20:41
@wohali mentioned this pull request Feb 7, 2019