Skip to content

[ci/docker] Move CI + Docker to Python 3.8 [build_base]#35040

Merged
krfricke merged 94 commits intoray-project:masterfrom
krfricke:ci/python-3-8
Jun 15, 2023
Merged

[ci/docker] Move CI + Docker to Python 3.8 [build_base]#35040
krfricke merged 94 commits intoray-project:masterfrom
krfricke:ci/python-3-8

Conversation

@krfricke
Copy link
Copy Markdown
Contributor

@krfricke krfricke commented May 4, 2023

Why are these changes needed?

Implementing #34863, this PR updates our CI and Docker images to use Python 3.8 as the default Python version.

It also updates CUDA version numbers and removes stale CUDA images.

The compatibility test becomes and explicit Python 3.7 compatibility test. Currently the tests we are running are (with some filters):

  • python/ray/tests/horovod
  • python/ray/tests/lightgbm
  • python/ray/tests/xgboost
  • python/ray/tests/ray_lightning
  • python/ray/tests/ml_py37_compat (keras + torch)
  • python/ray/air
  • python/ray/train (AIR tests)
  • python/ray/data (AIR tests)

We may want to add more tests here from other libraries, e.g. rllib (cc @sven1977 @kouroshHakha) and Serve (cc @iycheng @edoakes)

Technical details

Due to a bug in libstdc++ (e.g. tracked here: #33858), the Ray processes sometimes error out. It looks like this occurred rarely in Python 3.7, but after the switch to Python 3.8 we see it more often in some of our tests.

Note that this is an ephemeral failure and not due to specific tests. For instance, we saw test_trial_runner_3.py fail regularly, but it was always different tests. It seemed that just by the sheer amount of tests, and the quick succession of starting/stopping Ray clusters, one of these tests was bound to fail.

For those tests, we introduced an intermediate solution using pytest reruns, namely the --reruns 3 option. This specifies reruns on the subtest level, as opposed to bazels test-suite level. The reruns option was able to make these flaky tests green. However, this undermines our flaky test tracking, as actually flaky tests will be detected less often. After moving to Ubuntu 22 and a new stdlibc++, we should remove these subtest-reruns.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Kai Fricke added 4 commits May 5, 2023 11:18
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Kai Fricke added 8 commits May 5, 2023 14:40
ax
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
@krfricke krfricke added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label May 8, 2023
Signed-off-by: Kai Fricke <kai@anyscale.com>
@krfricke krfricke removed the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label May 8, 2023
@ollie-iterators
Copy link
Copy Markdown

The link error should be fixed by merging in recent changes from the main branch

Kai Fricke added 5 commits June 12, 2023 22:36
@krfricke
Copy link
Copy Markdown
Contributor Author

Failing tests are also failing on master

@krfricke krfricke added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jun 13, 2023
@aslonnie aslonnie self-requested a review June 13, 2023 13:37
Copy link
Copy Markdown
Collaborator

@aslonnie aslonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think I am the owner of any of these changes, and you probably do not need my approval. I am generally confused about why many of the changes are required for python version change. I feel that they at least need more explanations to justify.

for example, if we need to change some of the example code when upgrade from python 3.7 to 3.8. should we worry that the examples do not work on python 3.9, 3.10, which we claim to support and have container images that are released already?

I think most of the changes are not really due to python interpreter upgrade, but mostly due to library version changes of ray's dependencies. because the dependency version resolving for each python version will be different for each library (due to the python_version constraints in requirements.txt files), how do we know if ray works on python 3.9 / 3.10 ?

ENV PATH "/home/ray/anaconda3/bin:$PATH"
ARG DEBIAN_FRONTEND=noninteractive
ARG PYTHON_VERSION=3.7.16
ARG PYTHON_VERSION=3.8.16
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they have 3.8.17 already

Comment thread python/ray/air/tests/test_integration_mlflow.py Outdated
Comment thread python/ray/data/tests/test_ecosystem.py
Comment thread python/ray/tests/ml_py37_compat/tune_hvd_keras.py Outdated
Comment thread python/ray/tests/test_cli.py
frame = pd.DataFrame()
for result in results:
frame = frame.append(result.metrics_dataframe, ignore_index=True)
frame = pd.concat([frame, result.metrics_dataframe], ignore_index=True)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what change caused this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df.append has been deprecated and removed in e.g. pandas 1.5.x, which will be installed in some combinations of our requirements (1.3.5 is the last py3.7 release and still supported the deprecated API)

Comment on lines +10 to +11
modin==0.12.1; python_version < '3.8'
modin==0.12.1; python_version >= '3.8'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the same versions? why need the version split?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/ray-project/ray/pull/35040/files#r1220435414 - we will remove the version pin in a follow-up (which will also upgrade pandas). This is to make sure that the py 3.7 version pin remains in that follow-up.

Comment thread python/requirements/ml/requirements_tune.txt Outdated
# Requires decord which is unavailable for arm64
gluoncv==0.10.5.post0; platform_machine != "arm64"
gpy==1.10.0
gpytorch==1.8.1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is gpytorch added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC newer versions (which are available for py 3.8) would lead some tests to fail

Comment thread python/requirements/ml/requirements_tune.txt Outdated
Copy link
Copy Markdown
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! If you could add a comment about the Modin dependency issue, that'd be awesome.

Kai Fricke added 5 commits June 14, 2023 08:49
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Copy link
Copy Markdown
Contributor

@richardliaw richardliaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good

Copy link
Copy Markdown
Contributor

@amogkam amogkam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also update the corresponding documentation (installing ray guide) and the dockerhub info that the default python version is now 3.8 and not 3.7

@krfricke krfricke merged commit 7d3bc5e into ray-project:master Jun 15, 2023
@krfricke krfricke deleted the ci/python-3-8 branch June 15, 2023 07:18
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…35040)

Implementing ray-project#34863, this PR updates our CI and Docker images to use Python 3.8 as the default Python version.

It also updates CUDA version numbers and removes stale CUDA images.

The compatibility test becomes and explicit Python 3.7 compatibility test. Currently the tests we are running are (with some filters):

- python/ray/tests/horovod
- python/ray/tests/lightgbm
- python/ray/tests/xgboost
- python/ray/tests/ray_lightning
- python/ray/tests/ml_py37_compat (keras + torch)
- python/ray/air
- python/ray/train (AIR tests)
- python/ray/data (AIR tests)

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.