Add suspend|resume|terminate all supervisors endpoints.#6272
Add suspend|resume|terminate all supervisors endpoints.#6272jon-wei merged 10 commits intoapache:masterfrom
Conversation
|
Hi @QiuMM, I'm not sure of your operating procedure, but #6234 adds true suspend/resume functionality to supervisors which might more closely line up with the behavior you want (so you don't have to resubmit the spec for each supervisor, just toggle the tasks off and then back on). However, it still only operates on a single supervisor. Additionally it deprecates the 'shutdown' api call in favor of a new 'terminate' call, which we thought more closely aligned with it's current behavior which sets a 'tombstone' version of the supervisor spec, requiring re-submitting the working spec to start operating again. Perhaps you would like to modify this PR to have all 3 "all" endpoints, maybe something like
|
|
@clintropolis cool of #6234. Before your PR, every time I did some changes and went to update druid nodes, I used I'll modify this PR to have 3 "all" endpoints just as what you suggested. |
|
I have added 3 "all" endpoints and tweaked |
| } | ||
|
|
||
| @POST | ||
| @Path("/suspendAll") |
There was a problem hiding this comment.
I don't feel like the All suffix is necessary for these methods, it seems implied since the url path doesn't specify a supervisor id, and doesn't appear to be a common convention in other API paths that do things to multiple things. That said, the base path for the supervisor api probably should have been /druid/indexer/v1/supervisors instead of /druid/indexer/v1/supervisor so it would have read a bit better, but that seems more painful than useful to change
| successes.add(id); | ||
| } | ||
| }); | ||
| return Response.ok(ImmutableMap.of( |
There was a problem hiding this comment.
I think it would probably be nice to have a consistent response between these 3 api methods, in the case of suspending/resuming all i think only the supervisors which changed state are interesting, since a supervisor already being in the requested state isn't really an error
| @POST | ||
| @Path("/suspendAll") | ||
| @Produces(MediaType.APPLICATION_JSON) | ||
| public Response specSuspendAll() |
There was a problem hiding this comment.
nit: I don't think these methods should have 'spec' prefix since they don't apply to an individual spec like the ones that do, maybe just suspendAll and resumeAll and suspendOrResumeAll.
| { | ||
| return asLeaderWithSupervisorManager( | ||
| manager -> { | ||
| Set<String> supervisorIds = manager.getSupervisorIds(); |
There was a problem hiding this comment.
I think maybe this whole block should be pushed down into SupervisorManager as suspendOrResumeAllSupervisors, and return a list of supervisor ids that had a state change.
There was a problem hiding this comment.
Additionally, for a hypothetical suspendOrResumeAllSupervisors method, it might also be nice to extract the actual logic from suspendOrResumeSupervisor into another method, possiblySuspendOrResumeSupervisor that doesn't have the precondition checks, similar to the function of possiblyStopAndRemoveSupervisorInternal, and call that method from both suspendOrResumeSupervisor and suspendOrResumeSupervisors.
There was a problem hiding this comment.
Might also want to put the whole operation under a lock so behavior is a bit more consistent
There was a problem hiding this comment.
return a list of supervisor ids that had a state change
I think there is no need to return these ids since these all endpoints are idempotent operations. Just return OK would be fine. And I think users do not care about which supervisor has a state change, they just need to know they have changed all supervisors to suspened state or running state. Besides, return these ids would make the code a little complex and ugly ):
| { | ||
| return asLeaderWithSupervisorManager( | ||
| manager -> { | ||
| Set<String> supervisorIds = manager.getSupervisorIds(); |
There was a problem hiding this comment.
I think this should be pushed down into SupervisorManager into a terminateAll method.
|
@clintropolis really appreciate for your review, I'm busy with other things and I'll get to your comments as soon as possible. |
|
@clintropolis sorry for the late, you can go on to review this PR. Thanks! |
clintropolis
left a comment
There was a problem hiding this comment.
Sorry for the delayed review, overall LGTM 🤘, just a few minor comments
| { | ||
| return Collections.singletonList("datasource1"); | ||
| } | ||
| }; |
There was a problem hiding this comment.
I don't think this should be removed, previously this test was first ensuring that a running spec would correctly be suspended, and then that an already suspended spec would result an error, now this only tests the latter behavior.
There was a problem hiding this comment.
The reason why I removed this is that I have moved the condition spec.get().isSuspended() == suspend into SupervisorManager#possiblySuspendOrResumeSupervisorInternal and modified the implementation of SupervisorResource#specSuspendOrResume to keep consistent with the implementation of SupervisorResource#terminate. After done these, I have to modify the test code, then I found there was no need to keep this. You can check the whole SupervisorResourceTest#testSpecSuspend then you'll find I have tested both behaviors.
| { | ||
| return Collections.singletonList("datasource1"); | ||
| } | ||
| }; |
| } else { | ||
| Optional<SupervisorSpec> spec = manager.getSupervisorSpec(id); | ||
| final Response.Status status = spec.isPresent() ? Response.Status.BAD_REQUEST : Response.Status.NOT_FOUND; | ||
| final String errMsg = spec.isPresent() ? StringUtils.format( |
There was a problem hiding this comment.
I think this is too much stuff for a ternary assignment,
Response.Status status;
String errMsg;
if (spec.isPresent()) {
...
would be easier to follow and wouldn't double check spec.isPresent
| return asLeaderWithSupervisorManager( | ||
| manager -> { | ||
| manager.suspendOrResumeAllSupervisors(suspend); | ||
| return Response.ok(ImmutableMap.of("status", "success")).build(); |
There was a problem hiding this comment.
It seems like it could be useful to return the list of spec ids which successfully had a state change for this method and terminate all, but I don't feel strongly about it
There was a problem hiding this comment.
I have stated the reason why I did not return these ids above. For your convenience, I paste here:
I think there is no need to return these ids since these all endpoints are idempotent operations. Just return OK would be fine. And I think users do not care about which supervisor has a state change, they just need to know they have changed all supervisors to suspened state or running state. Besides, return these ids would make the code a little complex and ugly ):
| #### Suspend All Supervisors | ||
|
|
||
| ``` | ||
| POST /druid/indexer/v1/supervisor/suspendAll |
There was a problem hiding this comment.
I think I still aesthetically prefer /druid/indexer/v1/supervisor/suspend, etc. over /druid/indexer/v1/supervisor/suspendAll, etc, but that's just a matter of personal preference, up to you whether or not to change it 😄
There was a problem hiding this comment.
It's fine for me if change to /druid/indexer/v1/supervisors/suspend , but just as you said before, that seems more painful than useful to change. So I keep this All suffix. Besides, in PR #6185 All is also be used in endpoint path.
There was a problem hiding this comment.
Ah, I wasn't suggesting supervisors with the s since yeah that's a more dramatic change, just leaving out the All, but looking around at the rest of the APIs some more, and things already aren't terribly consistent with things like pluralization, etc., anyway so I think basically whatever is probably fine as long as there are docs and stuff does what is intended 😜
| { | ||
| return Collections.singletonList("datasource1"); | ||
| } | ||
| }; |
| return asLeaderWithSupervisorManager( | ||
| manager -> { | ||
| manager.suspendOrResumeAllSupervisors(suspend); | ||
| return Response.ok(ImmutableMap.of("status", "success")).build(); |
| #### Suspend All Supervisors | ||
|
|
||
| ``` | ||
| POST /druid/indexer/v1/supervisor/suspendAll |
There was a problem hiding this comment.
Ah, I wasn't suggesting supervisors with the s since yeah that's a more dramatic change, just leaving out the All, but looking around at the rest of the APIs some more, and things already aren't terribly consistent with things like pluralization, etc., anyway so I think basically whatever is probably fine as long as there are docs and stuff does what is intended 😜
|
@clintropolis thanks for your review 👍 |
|
@gianm can you help review this pr, I think it can be merged. |
| ``` | ||
| POST /druid/indexer/v1/supervisor/suspendAll | ||
| ``` | ||
| Suspend all supervisors at a time. |
There was a problem hiding this comment.
nit: I think "Suspend all supervisors at once" is better, same for similar usages below
|
@QiuMM Reviewed just now, LGTM, had a small comment on the docs |
| return Response.status(Response.Status.BAD_REQUEST) | ||
| if (manager.suspendOrResumeSupervisor(id, suspend)) { | ||
| Optional<SupervisorSpec> spec = manager.getSupervisorSpec(id); | ||
| return Response.ok(spec.get()).build(); |
There was a problem hiding this comment.
Calls Optional.get() without a check
There was a problem hiding this comment.
There is no need to do so, because manager.suspendOrResumeSupervisor(id, suspend) return true which means the supervisorSpec is not absent.
The endpoints added in apache#6272 were missing authorization checks. This patch removes the bulk methods from SupervisorManager, and instead has SupervisorResource run the full list through filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk methods from SupervisorManager, and instead has SupervisorResource run the full list through filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
…e#8044) The endpoints added in apache#6272 were missing authorization checks. This patch removes the bulk methods from SupervisorManager, and instead has SupervisorResource run the full list through filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk methods from SupervisorManager, and instead has SupervisorResource run the full list through filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
Every time I update my druid cluster, I need to shut down all KIS supervisors first. It is quite inconvenient to shut down one by one. So I add these APIs.