TensorFlow and Horovod test#122

Closed
casparvl wants to merge 9 commits into EESSI:main from casparvl:tf_libtest

casparvl commented Jul 5, 2021

I recreated the TensorFlow tests from #106 in the library-of-tests format, reusing the same standard hooks that I set up for e.g. GROMACS.

To run this test case:

  • Make sure that tests/reframe is in your PYTHONPATH (so that eessi_utils is found)
  • Adjust the attached settings.py for your system (adapt the partition names and the number of CPU cores, sockets, and GPUs)
  • Make sure you have a new enough ReFrame installation (3.6.2 or newer should work)
  • Make sure you have a flat module naming scheme (the find_modules logic assumes one). Note that it does not have to be the EESSI software stack per se: it could be some locally installed modules, provided that module av Horovod and module av TensorFlow return the respective Horovod and TensorFlow modules.
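
To illustrate why the flat naming scheme matters, here is a minimal sketch of name-based module matching. It is not the find_modules implementation from eessi_utils; the function find_flat_modules and the module names in the list are hypothetical, chosen only to show that with names of the form <name>/<version>, a simple match on the part before the slash is enough.

```python
def find_flat_modules(available, name):
    """Return the modules whose software name equals `name`.

    Assumes a flat naming scheme in which every module is listed
    as '<name>/<version>' (hypothetical helper, for illustration).
    """
    matches = []
    for module in available:
        # Split at the first slash only; the version part may itself
        # mention other software (e.g. a TensorFlow suffix on Horovod).
        software, sep, version = module.partition('/')
        if sep and version and software == name:
            matches.append(module)
    return sorted(matches)


# Illustrative module list, as `module av` might report it on some system.
available = [
    'TensorFlow/2.3.1-foss-2020a-Python-3.8.2',
    'Horovod/0.21.3-foss-2020a-TensorFlow-2.3.1-Python-3.8.2',
    'GROMACS/2020.4-foss-2020a-Python-3.8.2',
]

tensorflow_modules = find_flat_modules(available, 'TensorFlow')
horovod_modules = find_flat_modules(available, 'Horovod')
```

Matching on the part before the first slash keeps the Horovod module, whose version string mentions TensorFlow, out of the TensorFlow results.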

After that, simply run e.g.

PYTHONPATH=$PYTHONPATH:$(pwd) reframe --config-file=config/settings_cartesius.py --checkpath eessi-checks/applications/tensorflow2.py -r --performance-report

in the software-layer/tests/reframe directory.
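
For orientation, the file passed via --config-file is a Python module defining a site_configuration dictionary. The fragment below is a rough sketch of the parts you would adapt (partition names, CPU cores, sockets, GPUs). It is not the settings file attached to this PR: the system name, hostnames, access options, and all counts are placeholders, and the helper summarize_partitions is hypothetical. Check the keys against the ReFrame configuration reference for your version before relying on it.

```python
# Placeholder ReFrame site configuration; every name and count is an assumption.
site_configuration = {
    'systems': [
        {
            'name': 'example_cluster',
            'descr': 'Placeholder system description',
            'hostnames': ['login.*'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'cpu',            # adapt: partition name
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',
                    'access': ['-p cpu'],     # adapt: scheduler options
                    'environs': ['builtin'],
                    'processor': {
                        'num_cpus': 24,       # adapt: CPU cores per node
                        'num_sockets': 2,     # adapt: sockets per node
                        'num_cpus_per_socket': 12,
                    },
                },
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',
                    'access': ['-p gpu'],
                    'environs': ['builtin'],
                    'processor': {
                        'num_cpus': 24,
                        'num_sockets': 2,
                        'num_cpus_per_socket': 12,
                    },
                    # adapt: number of GPUs per node
                    'devices': [{'type': 'gpu', 'num_devices': 4}],
                },
            ],
        },
    ],
    'environments': [
        {'name': 'builtin', 'cc': 'cc', 'cxx': '', 'ftn': ''},
    ],
}


def summarize_partitions(config):
    """Return (system:partition, cpus, sockets, gpus) tuples for a quick sanity check."""
    rows = []
    for system in config['systems']:
        for part in system['partitions']:
            proc = part.get('processor', {})
            gpus = sum(
                dev.get('num_devices', 0)
                for dev in part.get('devices', [])
                if dev.get('type') == 'gpu'
            )
            rows.append((
                f"{system['name']}:{part['name']}",
                proc.get('num_cpus'),
                proc.get('num_sockets'),
                gpus,
            ))
    return rows


summary = summarize_partitions(site_configuration)
```

Printing summary before a run is a cheap way to confirm that the core, socket, and GPU counts match the actual nodes.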

Caspar van Leeuwen added 2 commits December 17, 2021 15:41
…le. Set binding if mpirun is used. Update config file for magic castle to contain the relevant items

boegel commented Jun 5, 2023

@casparvl Is this still relevant, especially with EESSI/test-suite#38?

I've just introduced 2021.12 and 2023.04 branches, and would like to get rid of the main branch.


casparvl commented Apr 2, 2024

Closing this PR. We now have a native TensorFlow distributed test through EESSI/test-suite#38. This PR (#122) has a TensorFlow test that uses Horovod for distribution. That is useful if you want to test Horovod, but we can always re-implement it later in the EESSI test suite.

casparvl closed this Apr 2, 2024