
[Data][CI] Stop running all ML tests on Data premerge#60066

Merged
aslonnie merged 8 commits into ray-project:master from DeborahOlaboye:data/ci-optimize-ml-tests on Jan 22, 2026


DeborahOlaboye (Contributor) commented on Jan 12, 2026

Description

This PR reduces CI time for Data-only PRs: changes to `python/ray/data/` no longer trigger the full set of ML/Train tests, which were running unnecessarily.

Related issues

Closes #59780

Contribution by Gittensor, learn more at https://gittensor.io/

DeborahOlaboye requested a review from a team as a code owner on January 12, 2026 21:40

gemini-code-assist (bot) left a comment

Code Review

This pull request aims to reduce CI time for Data-only PRs by preventing ML/train tests from running unnecessarily. The approach involves modifying the test rules to trigger a new, smaller data test group for data-related file changes, and tagging specific tests as data_integration. While the overall strategy is sound, I've found a critical issue in the implementation of the new Buildkite CI step that will cause test failures. My review includes a detailed explanation and a suggested fix for this issue.
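The rule change described in this review can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical rule format in which path prefixes map to test tags; `Rule`, `RULES_BEFORE`, `RULES_AFTER`, and `triggered_tags` are illustrative names, not Ray's actual CI test-rules file or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: changed files under `prefixes` trigger `tags`."""

    prefixes: tuple[str, ...]
    tags: frozenset[str]


# Illustrative rule sets only; not Ray's real test-rules configuration.
RULES_BEFORE = [
    Rule(("python/ray/data/",), frozenset({"data", "ml", "train"})),
    Rule(("python/ray/train/",), frozenset({"ml", "train"})),
]
RULES_AFTER = [
    # Data changes now trigger Data tests plus a small data_integration group.
    Rule(("python/ray/data/",), frozenset({"data", "data_integration"})),
    Rule(("python/ray/train/",), frozenset({"ml", "train"})),
]


def triggered_tags(changed_files, rules):
    """Return the union of tags from every rule matching any changed file."""
    tags = set()
    for path in changed_files:
        for rule in rules:
            # str.startswith accepts a tuple of candidate prefixes.
            if path.startswith(rule.prefixes):
                tags |= rule.tags
    return tags


changed = ["python/ray/data/dataset.py"]
print(sorted(triggered_tags(changed, RULES_BEFORE)))  # ['data', 'ml', 'train']
print(sorted(triggered_tags(changed, RULES_AFTER)))   # ['data', 'data_integration']
```

Under these hypothetical rules, a Data-only change stops pulling in the `ml` and `train` groups, while changes under `python/ray/train/` still trigger them.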

ray-gardener (bot) added the data, devprod, and community-contribution labels on Jan 13, 2026
DeborahOlaboye force-pushed the data/ci-optimize-ml-tests branch from 8567f7d to 9346f9b on January 13, 2026 07:12

bveeramani (Member) left a comment

Sorry for the delay! Just left a few comments

DeborahOlaboye (Contributor, Author):

> Sorry for the delay! Just left a few comments

No problem. I've made changes based on your feedback; please take another look.

bveeramani (Member):

@DeborahOlaboye looks like there are some conflicts. Would you mind resolving them?

Signed-off-by: DeborahOlaboye <deboraholaboye@gmail.com>
DeborahOlaboye force-pushed the data/ci-optimize-ml-tests branch from 579fbab to 6c7e03a on January 20, 2026 09:41

DeborahOlaboye (Contributor, Author):

> @DeborahOlaboye looks like there are some conflicts. Would you mind resolving them?

Conflicts have been resolved. Thank you for pointing that out.

bveeramani (Member):

LGTM, ty

bveeramani added the go label (add ONLY when ready to merge, run all tests) on Jan 21, 2026
bveeramani requested a review from aslonnie on January 21, 2026 05:34

Review thread on these lines of the Buildkite pipeline change:

```yaml
--except-tags data_integration
depends_on: [ "mlgpubuild-multipy", "forge" ]

- label: ":train: ml: data integration tests"
```
Contributor:

do we need to split by cpu/gpu?

Member:

We originally did, but I recommended collapsing them into a single step for now for simplicity since there are only 5-ish tests.

Contributor:

@bveeramani There are some other tests that use Ray Data (e.g. `test_xgboost_trainer`). By not including the tag, are you OK with the tradeoff of those tests only running in postmerge?

Member:

Yeah. I ran the proposal by my team, and we're aligned.
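The tradeoff agreed on in this thread can be sketched in Python. The sketch assumes a hypothetical test inventory and a hypothetical `select` helper whose `except_tags` parameter mimics the `--except-tags data_integration` option in the pipeline excerpt; the `only_tags` parameter, the test names other than `test_xgboost_trainer`, and every tag assignment are illustrative assumptions, not Ray's real test definitions.

```python
# Hypothetical test inventory (test name -> tags); illustrative only.
TESTS = {
    "test_hypothetical_data_ingest": {"team:ml", "data_integration"},
    "test_hypothetical_train_loop": {"team:ml"},
    "test_xgboost_trainer": {"team:ml"},  # uses Ray Data but is not tagged
}


def select(tests, only_tags=None, except_tags=frozenset()):
    """Return tests carrying any of `only_tags` (if given) and none of `except_tags`."""
    chosen = []
    for name, tags in tests.items():
        if only_tags is not None and not (tags & only_tags):
            continue
        if tags & except_tags:
            continue
        chosen.append(name)
    return sorted(chosen)


# The regular ML step skips tagged tests; the new step runs only tagged tests,
# so each test lands in exactly one step.
ml_step = select(TESTS, except_tags={"data_integration"})
data_integration_step = select(TESTS, only_tags={"data_integration"})

# Assumption: a Data-only PR triggers just the data integration step in
# premerge, while postmerge still runs both steps.
premerge_for_data_pr = data_integration_step
postmerge = sorted(ml_step + data_integration_step)

print(ml_step)  # ['test_hypothetical_train_loop', 'test_xgboost_trainer']
print(data_integration_step)  # ['test_hypothetical_data_ingest']
```

Because the ML step excludes exactly the tests the data integration step selects, no test runs twice; the cost is that an untagged Ray Data consumer such as `test_xgboost_trainer` is only exercised postmerge.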

matthewdeng (Contributor) left a comment

thanks

aslonnie merged commit 191d6e4 into ray-project:master on Jan 22, 2026
6 checks passed
jinbum-kim pushed a commit to jinbum-kim/ray that referenced this pull request Jan 29, 2026

## Description
This PR reduces CI time for Data-only PRs by ensuring that changes to
`python/ray/data/` no longer trigger all ML/train tests unnecessarily.

## Related issues
Closes ray-project#59780

Contribution by Gittensor, learn more at https://gittensor.io/

---------

Signed-off-by: DeborahOlaboye <deboraholaboye@gmail.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: jinbum-kim <jinbum9958@gmail.com>
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026

Labels

- community-contribution: Contributed by the community
- data: Ray Data-related issues
- devprod
- go: add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data][CI] Stop running all ML tests on Data premerge

4 participants