Skip to content

Add suspend|resume|terminate all supervisors endpoints.#6272

Merged
jon-wei merged 10 commits intoapache:masterfrom
QiuMM:shutkis
Oct 11, 2018
Merged

Add suspend|resume|terminate all supervisors endpoints.#6272
jon-wei merged 10 commits intoapache:masterfrom
QiuMM:shutkis

Conversation

@QiuMM
Copy link
Copy Markdown
Member

@QiuMM QiuMM commented Aug 30, 2018

Every time I update my druid cluster, I need to shut down all KIS supervisors first. It is quite inconvenient to shut down one by one. So I add these APIs.

@QiuMM QiuMM changed the title Showdown all supervisors API. Add showdown all supervisors API. Sep 11, 2018
@clintropolis
Copy link
Copy Markdown
Member

Hi @QiuMM, I'm not sure of your operating procedure, but #6234 adds true suspend/resume functionality to supervisors which might more closely line up with the behavior you want (so you don't have to resubmit the spec for each supervisor, just toggle the tasks off and then back on). However, it still only operates on a single supervisor.

Additionally it deprecates the 'shutdown' api call in favor of a new 'terminate' call, which we thought more closely aligned with it's current behavior which sets a 'tombstone' version of the supervisor spec, requiring re-submitting the working spec to start operating again.

Perhaps you would like to modify this PR to have all 3 "all" endpoints, maybe something like

  • /druid/indexer/v1/supervisor/suspend
  • /druid/indexer/v1/supervisor/resume
  • /druid/indexer/v1/supervisor/terminate (if this is really the behavior you want)

@QiuMM
Copy link
Copy Markdown
Member Author

QiuMM commented Sep 14, 2018

@clintropolis cool of #6234. Before your PR, every time I did some changes and went to update druid nodes, I used shutdown API to terminate all supervisors first. This cause stored supervisors will not be loaded when I restart druid nodes because shutdown API insert a NoopSupervisorSpec. Suspend/resume functionality are what I really want!

I'll modify this PR to have 3 "all" endpoints just as what you suggested.

@QiuMM QiuMM changed the title Add showdown all supervisors API. Add suspend|resume|terminate all supervisors API. Sep 20, 2018
@QiuMM QiuMM changed the title Add suspend|resume|terminate all supervisors API. Add suspend|resume|terminate all supervisors endpoints. Sep 20, 2018
@QiuMM
Copy link
Copy Markdown
Member Author

QiuMM commented Sep 20, 2018

I have added 3 "all" endpoints and tweaked shutdown to terminate in SupervisorResourceTest(because shutdown is deprecated). @clintropolis can you help review this PR.

}

@POST
@Path("/suspendAll")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like the All suffix is necessary for these methods, it seems implied since the url path doesn't specify a supervisor id, and doesn't appear to be a common convention in other API paths that do things to multiple things. That said, the base path for the supervisor api probably should have been /druid/indexer/v1/supervisors instead of /druid/indexer/v1/supervisor so it would have read a bit better, but that seems more painful than useful to change

successes.add(id);
}
});
return Response.ok(ImmutableMap.of(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would probably be nice to have a consistent response between these 3 api methods, in the case of suspending/resuming all i think only the supervisors which changed state are interesting, since a supervisor already being in the requested state isn't really an error

@POST
@Path("/suspendAll")
@Produces(MediaType.APPLICATION_JSON)
public Response specSuspendAll()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think these methods should have 'spec' prefix since they don't apply to an individual spec like the ones that do, maybe just suspendAll and resumeAll and suspendOrResumeAll.

{
return asLeaderWithSupervisorManager(
manager -> {
Set<String> supervisorIds = manager.getSupervisorIds();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this whole block should be pushed down into SupervisorManager as suspendOrResumeAllSupervisors, and return a list of supervisor ids that had a state change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, for a hypothetical suspendOrResumeAllSupervisors method, it might also be nice to extract the actual logic from suspendOrResumeSupervisor into another method, possiblySuspendOrResumeSupervisor that doesn't have the precondition checks, similar to the function of possiblyStopAndRemoveSupervisorInternal, and call that method from both suspendOrResumeSupervisor and suspendOrResumeSupervisors.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also want to put the whole operation under a lock so behavior is a bit more consistent

Copy link
Copy Markdown
Member Author

@QiuMM QiuMM Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return a list of supervisor ids that had a state change

I think there is no need to return these ids since these all endpoints are idempotent operations. Just return OK would be fine. And I think users do not care about which supervisor has a state change, they just need to know they have changed all supervisors to suspened state or running state. Besides, return these ids would make the code a little complex and ugly ):

{
return asLeaderWithSupervisorManager(
manager -> {
Set<String> supervisorIds = manager.getSupervisorIds();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be pushed down into SupervisorManager into a terminateAll method.

@QiuMM
Copy link
Copy Markdown
Member Author

QiuMM commented Sep 24, 2018

@clintropolis really appreciate for your review, I'm busy with other things and I'll get to your comments as soon as possible.

@QiuMM
Copy link
Copy Markdown
Member Author

QiuMM commented Sep 26, 2018

@clintropolis sorry for the late, you can go on to review this PR. Thanks!

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed review, overall LGTM 🤘, just a few minor comments

{
return Collections.singletonList("datasource1");
}
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be removed, previously this test was first ensuring that a running spec would correctly be suspended, and then that an already suspended spec would result an error, now this only tests the latter behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I removed this is that I have moved the condition spec.get().isSuspended() == suspend into SupervisorManager#possiblySuspendOrResumeSupervisorInternal and modified the implementation of SupervisorResource#specSuspendOrResume to keep consistent with the implementation of SupervisorResource#terminate. After done these, I have to modify the test code, then I found there was no need to keep this. You can check the whole SupervisorResourceTest#testSpecSuspend then you'll find I have tested both behaviors.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now 👍

{
return Collections.singletonList("datasource1");
}
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

} else {
Optional<SupervisorSpec> spec = manager.getSupervisorSpec(id);
final Response.Status status = spec.isPresent() ? Response.Status.BAD_REQUEST : Response.Status.NOT_FOUND;
final String errMsg = spec.isPresent() ? StringUtils.format(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too much stuff for a ternary assignment,

Response.Status status;
String errMsg;
if (spec.isPresent()) {
...

would be easier to follow and wouldn't double check spec.isPresent

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return asLeaderWithSupervisorManager(
manager -> {
manager.suspendOrResumeAllSupervisors(suspend);
return Response.ok(ImmutableMap.of("status", "success")).build();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it could be useful to return the list of spec ids which successfully had a state change for this method and terminate all, but I don't feel strongly about it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have stated the reason why I did not return these ids above. For your convenience, I paste here:

I think there is no need to return these ids since these all endpoints are idempotent operations. Just return OK would be fine. And I think users do not care about which supervisor has a state change, they just need to know they have changed all supervisors to suspened state or running state. Besides, return these ids would make the code a little complex and ugly ):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

#### Suspend All Supervisors

```
POST /druid/indexer/v1/supervisor/suspendAll
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I still aesthetically prefer /druid/indexer/v1/supervisor/suspend, etc. over /druid/indexer/v1/supervisor/suspendAll, etc, but that's just a matter of personal preference, up to you whether or not to change it 😄

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for me if change to /druid/indexer/v1/supervisors/suspend , but just as you said before, that seems more painful than useful to change. So I keep this All suffix. Besides, in PR #6185 All is also be used in endpoint path.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't suggesting supervisors with the s since yeah that's a more dramatic change, just leaving out the All, but looking around at the rest of the APIs some more, and things already aren't terribly consistent with things like pluralization, etc., anyway so I think basically whatever is probably fine as long as there are docs and stuff does what is intended 😜

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🤘

{
return Collections.singletonList("datasource1");
}
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now 👍

return asLeaderWithSupervisorManager(
manager -> {
manager.suspendOrResumeAllSupervisors(suspend);
return Response.ok(ImmutableMap.of("status", "success")).build();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

#### Suspend All Supervisors

```
POST /druid/indexer/v1/supervisor/suspendAll
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't suggesting supervisors with the s since yeah that's a more dramatic change, just leaving out the All, but looking around at the rest of the APIs some more, and things already aren't terribly consistent with things like pluralization, etc., anyway so I think basically whatever is probably fine as long as there are docs and stuff does what is intended 😜

@QiuMM
Copy link
Copy Markdown
Member Author

QiuMM commented Oct 2, 2018

@clintropolis thanks for your review 👍

@QiuMM
Copy link
Copy Markdown
Member Author

QiuMM commented Oct 8, 2018

@gianm can you help review this pr, I think it can be merged.

```
POST /druid/indexer/v1/supervisor/suspendAll
```
Suspend all supervisors at a time.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think "Suspend all supervisors at once" is better, same for similar usages below

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed just now, thanks 👍

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Oct 11, 2018

@QiuMM Reviewed just now, LGTM, had a small comment on the docs

@jon-wei jon-wei merged commit f8f4526 into apache:master Oct 11, 2018
@dclim dclim added this to the 0.13.0 milestone Oct 11, 2018
return Response.status(Response.Status.BAD_REQUEST)
if (manager.suspendOrResumeSupervisor(id, suspend)) {
Optional<SupervisorSpec> spec = manager.getSupervisorSpec(id);
return Response.ok(spec.get()).build();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calls Optional.get() without a check

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to do so, because manager.suspendOrResumeSupervisor(id, suspend) return true which means the supervisorSpec is not absent.

@QiuMM QiuMM deleted the shutkis branch November 7, 2018 17:14
gianm added a commit to gianm/druid that referenced this pull request Jul 9, 2019
The endpoints added in apache#6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
gianm added a commit that referenced this pull request Jul 9, 2019
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
gianm added a commit to implydata/druid-public that referenced this pull request Jul 9, 2019
…e#8044)

The endpoints added in apache#6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
clintropolis pushed a commit that referenced this pull request Jul 24, 2019
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants