
[Cluster launcher] [Azure] add option for eviction policy in azure VM template#45397

Merged: rickyyx merged 1 commit into ray-project:master from ambi-robotics:brijen/delete-template on Jun 3, 2024

@bthananjeyan (Contributor) commented May 17, 2024

Why are these changes needed?

This is necessary to give users the option to pick an eviction policy (Delete vs Deallocate) when nodes in the cluster are pre-empted.

This exposes an additional option that can be set in the node_config attribute in the cluster YAML file.
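As a rough illustration of what such a setting might look like, here is a minimal Python sketch that builds a `node_config` mapping for a spot worker node. The key names `azure_arm_parameters`, `vmSize`, `priority`, and `evictionPolicy`, and the helper `make_spot_node_config`, are assumptions made for illustration and are not confirmed by this PR; only the two policy values (Delete and Deallocate) come from the description above.

```python
# Illustrative sketch only. The key names below are assumptions, not the
# confirmed schema of the Ray Azure cluster launcher.
VALID_EVICTION_POLICIES = {"Delete", "Deallocate"}


def make_spot_node_config(vm_size: str, eviction_policy: str) -> dict:
    """Build a hypothetical node_config for a spot node with an eviction policy."""
    if eviction_policy not in VALID_EVICTION_POLICIES:
        raise ValueError(
            f"eviction_policy must be one of {sorted(VALID_EVICTION_POLICIES)}, "
            f"got {eviction_policy!r}"
        )
    return {
        "azure_arm_parameters": {   # assumed key name
            "vmSize": vm_size,      # assumed key name
            "priority": "Spot",     # assumed key name; marks the node as a spot VM
            "evictionPolicy": eviction_policy,  # assumed key name
        }
    }


# Hypothetical worker node that is deleted (not deallocated) when pre-empted.
worker_node_config = make_spot_node_config("Standard_D2s_v3", "Delete")
```

In an actual cluster YAML file the same mapping would sit under the node type's `node_config` attribute; the exact key to use should be checked against the merged template.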

Related issue number

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [x] This PR is not tested :(

@bthananjeyan (Contributor, Author) commented:

@architkulkarni @ericl @hongchaodeng please let me know if you have any feedback on this PR, or if there is someone else who should be assigned to review it. Thanks!

@bthananjeyan bthananjeyan force-pushed the brijen/delete-template branch 3 times, most recently from 3967ace to 729bdca Compare May 28, 2024 18:55
@anyscalesam added the triage (Needs triage, e.g. priority, bug/not-bug, and owning component) and core (Issues that should be addressed in Ray Core) labels May 29, 2024
Signed-off-by: bthananjeyan <brijen@ambirobotics.com>
@bthananjeyan bthananjeyan force-pushed the brijen/delete-template branch from 729bdca to 0d8fc30 Compare May 31, 2024 17:13
@bthananjeyan (Contributor, Author) commented:

@architkulkarni @ericl @hongchaodeng bumping this PR in case it was buried. Please let me know if you have any feedback, or if someone else can be assigned to it. I don't have the ability to assign reviewers.

@rickyyx rickyyx enabled auto-merge (squash) June 3, 2024 19:33
@github-actions bot added the go label (add ONLY when ready to merge, run all tests) Jun 3, 2024
@rickyyx rickyyx merged commit e38e576 into ray-project:master Jun 3, 2024
@bthananjeyan bthananjeyan deleted the brijen/delete-template branch June 3, 2024 22:55
richardsliu pushed a commit to richardsliu/ray that referenced this pull request Jun 12, 2024
… template (ray-project#45397)

Signed-off-by: bthananjeyan <brijen@ambirobotics.com>
Signed-off-by: Richard Liu <ricliu@google.com>
@kekulai-fredchang (Contributor) commented Jul 31, 2024

FYI, I get this error:

Code: InvalidParameter Message: Eviction policy can be set only on Azure Spot Virtual Machines. For more information, see http://aka.ms/AzureSpot/errormessages.

This was patched, for users who do not deploy all nodes as spot instances, in #46199.

I hit this error because I assumed the head node should not be a spot instance, since you want that node to be stable.

Is the most cost-effective practice to keep all nodes as spot instances (even the head node)?

When I set all nodes (including the head) to be spot instances, Ray boots up the cluster fine, since the eviction policy then applies to every node.

Thanks
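One way to avoid the InvalidParameter error on mixed clusters is to send an eviction policy only for nodes that are actually spot instances. The sketch below shows that idea in Python. It is not the fix from #46199 or Ray's implementation: the helper `drop_eviction_policy_for_non_spot` and the key names `azure_arm_parameters`, `priority`, and `evictionPolicy` are assumptions made for illustration.

```python
import copy


def drop_eviction_policy_for_non_spot(node_config: dict) -> dict:
    """Return a copy of node_config with the eviction policy removed unless
    the node is configured as a spot instance.

    All key names here are assumed for illustration, not taken from Ray.
    """
    cleaned = copy.deepcopy(node_config)  # leave the caller's dict untouched
    params = cleaned.get("azure_arm_parameters", {})
    if params.get("priority") != "Spot":
        # Azure rejects an eviction policy on non-spot VMs, so drop it.
        params.pop("evictionPolicy", None)
    return cleaned


# Hypothetical on-demand head node that accidentally carries an eviction policy.
head = {"azure_arm_parameters": {"vmSize": "Standard_D2s_v3", "evictionPolicy": "Delete"}}
# Hypothetical spot worker node; its eviction policy is kept.
worker = {"azure_arm_parameters": {"priority": "Spot", "evictionPolicy": "Delete"}}

head_clean = drop_eviction_policy_for_non_spot(head)
worker_clean = drop_eviction_policy_for_non_spot(worker)
```

With this approach the head node can stay on-demand for stability while worker nodes remain spot instances with their chosen eviction policy.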

@MKLepium commented:

The pull request was closed because I hadn't tested it, and when I got around to testing it, it turned out it didn't work.

#46198 is my original issue, where I tracked the problem when it first arose.

