Skip to content

Allow forking task runners to terminate and pickup their tasks.#1521

Closed
drcrallen wants to merge 2 commits intoapache:masterfrom
metamx:forkingTaskRunnerRecoverable
Closed

Allow forking task runners to terminate and pickup their tasks.#1521
drcrallen wants to merge 2 commits intoapache:masterfrom
metamx:forkingTaskRunnerRecoverable

Conversation

@drcrallen
Copy link
Copy Markdown
Contributor

  • Communication between MM and peon is accomplished via file system stuff. (adding and deleting files)
  • Force killing a peon is only available on systems which understand the kill command with -15 and -9 as options
  • Currently does not follow prior behavior. There is no option to force tasks to shutdown on MM exit
  • Adds ShutdownResource which adds an endpoint which shuts down the service (currently only used by peon)
  • Add unit tests for ForkingTaskRunner. These take a while to run because they spawn up JVMs.

This allows upgrading the middle manager without killing peons.

Requires #1524

@drcrallen
Copy link
Copy Markdown
Contributor Author

As an alternative, I could fork this off to its own task runner type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name should be more specific about what it actually is: cachedThreadPool? unboundedThreadPool?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its actually not needed in this PR anymore, I'll remove it. But it probably should have been cachedThreadPool

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jul 14, 2015

Is there anything that makes sure the RTR doesn't fail the tasks when the MM goes offline? Normally it does that as soon as the ephemeral task status announcements disappear, which would happen immediately.

@drcrallen
Copy link
Copy Markdown
Contributor Author

@gianm : no not yet

@drcrallen drcrallen force-pushed the forkingTaskRunnerRecoverable branch 2 times, most recently from 40bc654 to 5227553 Compare July 14, 2015 22:10
@drcrallen drcrallen force-pushed the forkingTaskRunnerRecoverable branch from 5227553 to 284273e Compare July 16, 2015 18:52
* Communication between MM and peon is accomplished via file system stuff. (adding and deleting files)
* Force killing a peon is only available on systems which understand the `kill` command with -15 and -9 as options
* Currently does not follow prior behavior. There is no option to force tasks to shutdown on MM exit
* Adds ShutdownResource which adds an endpoint which shuts down the service (currently only used by peon)
* Add unit tests for ForkingTaskRunner. These take a while to run because they spawn up JVMs so they are @ignore'd by default
* Add tools.jar dependency for indexing-service for ForkingTaskRunner
@drcrallen drcrallen force-pushed the forkingTaskRunnerRecoverable branch from 284273e to 8830f65 Compare July 16, 2015 19:52
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jul 17, 2015

@gianm @drcrallen just following up with a status update here

@drcrallen
Copy link
Copy Markdown
Contributor Author

@fjy I have 1 TODO @ https://github.com/druid-io/druid/pull/1521/files#diff-d19c7676e29a15ee2e34062c8e4ee6eaR947 which I'm hoping to resolve today. Also I need more testing of Remote Task Runner when MM is offline to make sure it doesn't fail the task. I haven't put a version # on this PR and I still put the Discussion tag to try and help globally communicate it's status.

@drcrallen
Copy link
Copy Markdown
Contributor Author

I also haven't figured out a good solution for #1521 (comment) yet

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jul 18, 2015

@drcrallen you may want to propose this in the forum of talk about it at the next dev sync

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jul 18, 2015

@drcrallen or just ping @gianm as I'm sure he's thought about it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to call lifecycle.shutdown() somewhere here or register it as a shutdown hook. I don't see lifecycle being used in this class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did at one point due to how peons were shutting down, but now it is all done through shutdown hooks and this will be removed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, yeah, just saw the other PR with the update to CliPeon .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants