Skip to content

Introduce ability to scale supervisor through overlord's REST API#16350

Closed
ac9817 wants to merge 5 commits intoapache:masterfrom
ac9817:ability-to-scale-supervisor
Closed

Introduce ability to scale supervisor through overlord's REST API#16350
ac9817 wants to merge 5 commits intoapache:masterfrom
ac9817:ability-to-scale-supervisor

Conversation

@ac9817
Copy link
Copy Markdown
Contributor

@ac9817 ac9817 commented Apr 29, 2024

Description

This PR introduces the ability to change the task count on the supervisor without the need to resubmit the spec. This could be used in cases where an user want to scale the supervisor depending on some external factors in his environment. In order to use the current post call, we need to do an extra get call for the current spec and it could also change in between those calls.


Key changed/added classes in this PR
  • SupervisorResource
  • SeekableStreamSupervisorSpec

Release Notes:

Introduces a new Supervisor API to update the task count with out getting to know the complete supervisor spec.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@ac9817 ac9817 force-pushed the ability-to-scale-supervisor branch from 79913b2 to 2da3532 Compare June 25, 2024 15:27
@ac9817 ac9817 force-pushed the ability-to-scale-supervisor branch from 2da3532 to d981e27 Compare June 25, 2024 15:29
Comment on lines +563 to +567
@POST
@Path("/{id}/setTaskCount")
@Produces(MediaType.APPLICATION_JSON)
@ResourceFilters(SupervisorResourceFilter.class)
public Response setTaskCount(@PathParam("id") final String id, @QueryParam("taskCount") final Integer taskCount)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this via a new API that is scoped to just the taskCount field in the supervisor spec, can you update this API to be a PATCH request on .../supervisors/{id}

That would give us the flexibility to atomically update any field in the supervisor spec instead of needing a new API for each supervisor spec modification

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this would be better than what is currently being done in this PR.

But I am not sure why we need a separate API at all.
Couldn't this be an optimization in the existing specPost API itself?

  • Post a supervisor spec
  • If a supervisor already exists for the given id, we compare the spec against its current value.
  • If the change is only in task count, don't go via the normal route and invoke updateTaskCount method instead.
  • Even if we use the PATCH API route, we would still need a similar diff mechanism.

cc: @adithyachakilam , @suneet-s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to avoid an extra GET call to get the actual existing spec to send to POST call. Also, the idea was to try libraries like https://github.com/java-json-tools/json-patch to support editing of any fields dynamically. But a standard PATCH controller seems like not availbale.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the harm in doing a GET before the POST? It is not a very expensive operation.
In fact, it is desirable to know the current state before we try to update it.

and it could also change in between those calls.

This is always a possibility. Some other user could update the task count right after we updated it. Having a separate API does not protect against concurrent updates. Updating a supervisor spec is typically an admin action and is also audited. The likelihood of concurrent updates should be very low.

Also, the idea was to try libraries like https://github.com/java-json-tools/json-patch to support editing of any fields dynamically.

Being able to update separate fields is a valid requirement, but the library we end up using should not dictate the contract of our APIs.

@adithyachakilam , Could you please share some more details on the challenges with the current API?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a very expensive operation.

It does end up being expensive when we build automation solutions on top of it.

Some other user could update the task count right after we updated it.

Sure, In that case two sequential updates are done. But when we do a GET and a POST a valid user update made in between might be overridden.

but the library we end up using should not dictate the contract of our APIs.

It does follow of RFC 6902 (JSON Patch) and RFC 7386 (JSON Merge Patch) which are standard for patch requests merge specification.

@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Jul 2, 2024

@adithyachakilam , please add more details to the PR description:

  • why is the current API insufficient
  • what is the approach that you have taken
  • any other concerns
  • release note

@ac9817 ac9817 force-pushed the ability-to-scale-supervisor branch from 62926a5 to df408cd Compare July 2, 2024 02:55
@ac9817 ac9817 force-pushed the ability-to-scale-supervisor branch from df408cd to d270ff2 Compare July 2, 2024 03:12
Adithya Chakilam added 2 commits July 1, 2024 23:06
@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Jul 2, 2024

Had an offline discussion with @adithyachakilam .
IIUC, the changes in this PR are just trying to update the current running task count of a supervisor. I must have misunderstood this part earlier.

If this is the requirement, then as @suneet-s suggests, we should have a separate API as follows:

  • It should ideally not be a PATCH since we are not really updating the persisted supervisor object itself, just its current running behaviour.
  • It should be something like POST /v1/supervisor/{id}/runConfig or something
  • The payload should be a new class SupervisorRunConfig which would currently have only the task count field. We may add other fields to this later.
  • It should not need to touch or even use the existing specPost method
  • The implementation should just propagate this new config object to the relevant Supervisor which can update its task count dynamically as needed.

@adithyachakilam , please let me know if this meets your needs.

@ac9817
Copy link
Copy Markdown
Contributor Author

ac9817 commented Jul 2, 2024

@kfaraz On another thought, We do want to make this as an persisted update. So it would not just be a runConfig but it does change the state of the supervisor. The ideal way to go forward is to introduce a PATCH request but jsr311-api doesn't provide one. I see these ways forward:

approach 1: (current state of the PR)

  • We need to have a separate api to update the task count. POST /v1/supervisor/{id}/updateTaskCount with some body to just update the task count.

approach 2:

Is there a way to implement an actual PATCH endpoint since that required annotations are not provided by jsr311-api ?

cc: @suneet-s

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Sep 1, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label Sep 1, 2024
@github-actions
Copy link
Copy Markdown

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants