Gromacs ReFrame test for EESSI, with separate library test by casparvl · Pull Request #115 · EESSI/software-layer

casparvl · 2021-06-10T12:58:24Z

Before running ReFrame with the test in this PR:

- make sure that tests/reframe is in your PYTHONPATH
- adjust the attached settings.py for your system (adapt partition names, number of CPU cores, number of GPUs, etc.)
- make sure you have a new enough ReFrame installation (with 3.5.0 the required keyword didn't work properly for me; 3.6.2 did)

This test should run on the EESSI software stack, but would also run on a locally installed GROMACS. Skipping GPU tests on CPU partitions (and vice versa) only works as long as the module name contains cuda as a substring whenever it is a GPU module, so it probably won't work in a hierarchical module system.
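To make the limitation above concrete, here is a minimal sketch of a substring-based check. It is not the code from this PR: the function name and all module names are hypothetical, and lower-casing the name before matching is an assumption of this sketch.

```python
def requires_cuda(module_name: str) -> bool:
    """Guess whether a module is a GPU build from its name alone.

    Assumption of this sketch: GPU builds carry "cuda" somewhere in the
    module name, e.g. via a toolchain name such as fosscuda.
    """
    return "cuda" in module_name.lower()


# Hypothetical module names, for illustration only.
examples = [
    "GROMACS/2020.4-fosscuda-2020a",  # flat naming, GPU toolchain
    "GROMACS/2020.1-foss-2020a",      # flat naming, CPU toolchain
    "GROMACS/2020.4",                 # hierarchical naming: toolchain not in the name
]
for name in examples:
    print(f"{name}: requires_cuda={requires_cuda(name)}")
```

The last name shows why a hierarchical module system defeats the check: the toolchain is no longer part of the module name, so a GPU build would be classified as a CPU build.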

casparvl · 2021-06-10T13:03:03Z

Probably best to squash this commit when merging. I've done a lot of playing around before I got to this point... ;-)

jjotero

Looks pretty good! Just one comment on the variables though. In order to use the required keyword in the tests, the class attribute must be declared first with the variable builtin as

nsteps = variable(int)

If you don't give this a value, it is set as required by default. If you set a regular Python attribute as required, you won't get all the implicit checking.
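The difference between a declared variable and a plain attribute can be illustrated with a toy descriptor. This is only a model of the behaviour described above (required unless a value is given, plus type checking); it is not ReFrame's implementation, and the class names Variable and LibraryTest are made up for this sketch.

```python
class Variable:
    """Toy stand-in for a declared test variable (not ReFrame's code)."""

    _UNSET = object()  # sentinel meaning "no default, so the value is required"

    def __init__(self, type_, value=_UNSET):
        self.type_ = type_
        self.default = value

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = obj.__dict__.get(self.name, self.default)
        if value is Variable._UNSET:
            raise AttributeError(f"required variable {self.name!r} was never set")
        return value

    def __set__(self, obj, value):
        # Reject values of the wrong type at assignment time.
        if not isinstance(value, self.type_):
            raise TypeError(
                f"{self.name!r} expects {self.type_.__name__}, "
                f"got {type(value).__name__}"
            )
        obj.__dict__[self.name] = value


class LibraryTest:
    nsteps = Variable(int)                       # no value: required
    requires_cuda = Variable(bool, value=False)  # has a default


test = LibraryTest()
print(test.requires_cuda)      # the default is used
try:
    test.nsteps                # never set, so this raises
except AttributeError as err:
    print(err)

test.nsteps = 10000            # accepted: an int
try:
    test.nsteps = "many"       # rejected: wrong type
except TypeError as err:
    print(err)
```

A plain Python attribute would accept the string silently, which is the implicit checking that gets lost.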

jjotero · 2021-06-11T09:10:45Z

+    num_tasks_per_node = required
+    num_cpus_per_task = required
+    nsteps = required
+    modules = required


I wouldn't set modules as required in here, since it is not used anywhere else in this class. I think the required can go in the derived class.

Hm, fair point. My idea when designing this class was to create a library test for just running a GROMACS module. But indeed, nothing specific is in this library that would make this a 'run-only' test. The person deriving from the test might technically build GROMACS as part of the test, and this would still be a valid library test to derive from. I wouldn't recommend it, but you're right: there's no use in forcing the use of a module here.

No, wait, let me get back to this: I derived the library test from RunOnlyRegressionTest. Doesn't that essentially make a module required?

Not necessarily. One could have installed gromacs in the system manually. As long as the executable gmx_mpi is there, this test can run.

Fair point. So used to managing things with modules that I sometimes forget what happens on non-HPC systems =)

jjotero · 2021-06-11T09:13:44Z

+    }
+    maintainers = ['casparvl']
+
+    @rfm.run_before('run')


Since you're using version 3.6.2, you can already write the run before/after hooks as follows

Suggested change

-    @rfm.run_before('run')
+    @run_before('run')

jjotero · 2021-06-11T09:23:17Z

+
+    @rfm.run_after('init')
+    def requires_gpu(self):
+        self.requires_cuda = False


You can move this line out to the class body and declare requires_cuda as a test variable as

requires_cuda = variable(bool, value=False)

This will give you type-checking and set it as False by default.

jjotero · 2021-06-11T10:06:05Z

+    num_tasks = required
+    num_tasks_per_node = required
+    num_cpus_per_task = required
+    nsteps = required


This variable needs to be declared first (num_tasks, num_tasks_per_node and so on were declared in the rfm.RegressionTest class already.)

Suggested change

-    nsteps = required
+    nsteps = variable(int)

I've created an issue (reframe-hpc/reframe#2011) to raise an error when this is attempted.

Ah, I see! I indeed copied what you did for num_tasks in reframe-hpc/reframe#1996, but failed to realize that this was actually a special case, since that variable was already declared as part of the ReFrame framework. Cool, ok, I'll replace it with variable to get the type checking :)

jjotero · 2021-06-11T10:18:54Z

+    @rfm.run_after('setup')
+    def check_gpu_presence(self):
+        self.gpu_list = [ dev.num_devices for dev in self.current_partition.devices if dev.device_type == 'gpu' ]


It feels to me that you might be using this hook in many other tests too. If so, you can move this check_gpu_presence function into a hook utility module and then bind the external function into the class as a hook. To bind it, you can either do

import reframe as rfm
import my_hook_utils as hu

class MyTest(rfm.RunOnlyRegressionTest):
    ...
    @run_after('setup')
    def check_gpu_presence(self):
        hu.check_gpu_presence(self)

or as a one-liner

import reframe as rfm
import my_hook_utils as hu

class MyTest(rfm.RunOnlyRegressionTest):
    ...
    run_after('setup')(bind(hu.check_gpu_presence))

Oh, I like this idea! And it gives me another idea too: I can do the same thing for the requires_cuda test. The regex makes a pretty strong assumption (namely that cuda is in the name), and that only works for EB toolchains like fosscuda and intelcuda. It would fail e.g. with gompic. Moving it out of the test and into a utils kind of module allows us to easily improve and/or adjust that check in a single place if things change in the future.
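As a rough sketch of what such a utility module could contain, the snippet below keeps the GPU counting and the skip decision in plain functions that a hook can call. Device and Partition are hypothetical stand-ins for the partition information ReFrame provides (only device_type and num_devices are taken from the diff above); gpu_count and skip_reason are names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    """Stand-in for one device entry of a partition (hypothetical)."""
    device_type: str
    num_devices: int


@dataclass
class Partition:
    """Stand-in for the partition a test is currently set up on."""
    name: str
    devices: List[Device] = field(default_factory=list)


def gpu_count(partition: Partition) -> int:
    """Total number of GPUs declared for the partition (0 on CPU-only nodes)."""
    return sum(dev.num_devices for dev in partition.devices
               if dev.device_type == 'gpu')


def skip_reason(partition: Partition, requires_cuda: bool) -> Optional[str]:
    """Return why the test should be skipped here, or None if it should run."""
    gpus = gpu_count(partition)
    if requires_cuda and gpus == 0:
        return f"GPU test on CPU-only partition {partition.name!r}"
    if not requires_cuda and gpus > 0:
        return f"CPU test on GPU partition {partition.name!r}"
    return None


cpu_part = Partition('cpu')
gpu_part = Partition('gpu', [Device('gpu', 4)])
for part in (cpu_part, gpu_part):
    for needs_cuda in (False, True):
        print(part.name, needs_cuda, skip_reason(part, needs_cuda))
```

Keeping the decision in one function means a better way of detecting GPU modules only has to change how requires_cuda is computed, not every test.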

…eperate hooks and utility functions for EESSI so that I can reuse them in other tests. Makes the test much cleaner.
2. Change the scale parameter to also include nsteps and num_nodes. This gets rid of the conditional if statements that set these before.
3. Define nsteps as a variable, so that we have type checking.
4. Get rid of the leading rfm. in hook calls.
5. Get rid of the requirement for modules in the test library and move it to the child class that implements the actual test. In theory, a run-only regression test could run on a system where a user simply has the gmx_mpi command available from a system path.
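For readers following along, the task layout described in this PR's commit messages (one task per GPU, with num_cpu_cores/num_gpus threads each, for GPU runs) can be sketched as below. The function name and the example numbers are hypothetical, and the CPU branch (one task per core, one thread each) is an assumption of this sketch rather than something stated in the PR.

```python
def task_layout(num_cpus: int, num_gpus: int, num_nodes: int, use_gpu: bool) -> dict:
    """Derive task and thread counts for one run from the partition properties."""
    if use_gpu:
        if num_gpus < 1:
            raise ValueError("a GPU run needs at least one GPU per node")
        tasks_per_node = num_gpus             # one task per GPU
        cpus_per_task = num_cpus // num_gpus  # threads per task
    else:
        tasks_per_node = num_cpus             # assumption: one task per core
        cpus_per_task = 1
    return {
        'num_tasks_per_node': tasks_per_node,
        'num_cpus_per_task': cpus_per_task,
        'num_tasks': tasks_per_node * num_nodes,
        'variables': {'OMP_NUM_THREADS': str(cpus_per_task)},
    }


# Hypothetical node with 24 cores and 4 GPUs, run on 2 nodes.
print(task_layout(num_cpus=24, num_gpus=4, num_nodes=2, use_gpu=True))
print(task_layout(num_cpus=24, num_gpus=0, num_nodes=2, use_gpu=False))
```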

jjotero · 2021-06-15T08:23:57Z

This looks pretty good to me now! One last comment on the hook-importing from hooks.py module. If you think that you're gonna have a few hooks that will always get used together in the test, you can bundle them up into a mixin class and then just do multiple inheritance on the tests. For example:

class RequiresGpusAndCuda(rfm.RegressionMixin):
    # Skip testing GPU-based modules on CPU-based nodes
    @run_after('setup')
    def skip_gpu_test_on_cpu_nodes(self):
        hooks.skip_gpu_test_on_cpu_nodes(self)

    # Skip testing CPU-based modules on GPU-based nodes
    # (though these would run fine, one is usually not interested in them)
    @run_after('setup')
    def skip_cpu_test_on_gpu_nodes(self):
        hooks.skip_cpu_test_on_gpu_nodes(self)

and you feed this into your derived GROMACS test with multiple inheritance.

This is more of a matter of taste than anything though. I think that both multiple inheritance and explicit hook binding work absolutely fine in this example.

casparvl · 2021-06-15T08:58:15Z

Interesting approach with the multiple inheritance. It makes the tests even more compact, but also slightly less descriptive. I think for now, I'd prefer the more verbose and descriptive approach of keeping the hooks in explicitly, rather than through inheritance.

boegel

I wouldn't include the input file itself, only a script to download it from somewhere, and a SHA256 checksum to ensure we got the right file.
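A download-and-verify helper along these lines could look like the sketch below. It is only an illustration: the function names are invented here, and the URL and checksum in the usage note are placeholders, not the real location of the input file.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_input(url, dest, expected_sha256):
    """Download url to dest unless it already exists, then verify the checksum."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    actual = sha256sum(dest)
    if actual != expected_sha256:
        raise ValueError(
            f"checksum mismatch for {dest}: expected {expected_sha256}, got {actual}"
        )
    return dest
```

It would be called as fetch_input('https://example.org/path/to/input.tpr', 'src/input.tpr', '<sha256 of the file>'), with the real URL and checksum filled in. Because an existing file is not downloaded again, repeated test runs only pay for the checksum.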

boegel · 2021-11-24T16:32:29Z

We should also have a short README file, both in the tests/reframe subdirectory and in the test-specific subdirectory (<whatever>/gromacs/), with some basic general information + test-specific info (short description of input files, type of run, etc.)

casparvl · 2021-12-01T09:14:38Z

+                    'launcher': 'srun',
+                    'access':  ['-p cpu'],
+                    'environs': ['builtin'],
+                    'processor': {


This can now be taken out, since ReFrame supports autodetection of CPU architecture (not yet of GPU arch though).
https://reframe-hpc.readthedocs.io/en/stable/configure.html?highlight=auto%20detection#auto-detecting-processor-information
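A settings.py fragment without the processor block might then look roughly like this. It is a sketch, not the file attached to this PR: the system name, hostname pattern, module system, GPU partition and GPU count are hypothetical, and it is not a complete configuration. GPU devices still have to be listed by hand, since their architecture is not auto-detected.

```python
# Fragment of a ReFrame settings.py; names and counts are placeholders.
site_configuration = {
    'systems': [
        {
            'name': 'example_system',
            'descr': 'Example cluster',
            'hostnames': ['login.*'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'cpu',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['-p cpu'],
                    'environs': ['builtin'],
                    # No 'processor' entry: left to ReFrame's auto-detection.
                },
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['-p gpu'],
                    'environs': ['builtin'],
                    # GPUs are still declared explicitly.
                    'devices': [{'type': 'gpu', 'num_devices': 4}],
                },
            ],
        },
    ],
}
```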

casparvl · 2021-12-01T10:43:00Z

After discussing with Victor: it might be better to build an EESSI test implementation on top of the CSCS GROMACS library test: https://github.com/eth-cscs/reframe/blob/master/hpctestlib/sciapps/gromacs/benchmarks.py

This library test implements multiple test cases (6 different input files for now). We probably want to run 1 test case in CI, but maybe more in the monitoring.

casparvl · 2022-09-08T11:27:11Z

Closing this PR, as we have one on top of the CSCS test library now: #156

casparvl and others added 30 commits November 9, 2020 10:33

Initial try of mpi hello world test

db3758e

Merge branch 'master' of github.com:EESSI/software-layer

fc7e8a0

Made the config do something on both our systems...

d7aa7e2

Changed config to submit to short queue for faster testing. Added san…

0d71815

…ity check to mpi job

Make the test system-independent

3d79f66

Make it system independent

8636c4c

Use a flexible num_tasks_per_node by defining it in config/system_pro…

c6a62a3

…perties.py

remove pychaches

eb94473

Removed logfile

0b446e7

Some cleanup

bee7398

Use flexible task count

545d9ed

Trying to develop a GROMACS test that runs with the EESSI container. …

11011a7

…For now, first trying to see if we can do a directory listing. Once that works, I can easily replace with the actual gromacs call.

Added a test that will just ls into the container, to see if we get t…

bd2c8ec

…he expected result

Updated gromacs test. It seems to run in parallel, and returns the co…

30cc2bd

…rrect exectuable with gmx_mpi, but no output is printed. Looking at the error output, It seems MPI is throwing an error. Still need to resolve that...

made it work with our ancient tmod...

1c335c6

Could only get the MPI in the container to work with pmix, since PMI2…

c81f95f

… requires the OpenMPI to be compiled with - the same - pmi2

Added sanity check for gromacs

49857b6

Wrote down some todo, so that I don't forget...

ebabe02

Gromacs test works now, but requires setup of an alien cache to run m…

94cedcb

…ultiple EESSI containers...

Put script for shared alien cache under version control...

5ffcd5c

made test parameterized. Todo: create seperate container and native t…

fd613f7

…ests

Added tags

45f5977

Made specific class for container test. Todo: do the same for CVMFS n…

6d1f892

…ative test

deleted non-needed files for GROMACS

2f574ee

Somehow I failed to get the right version of the GROMACS test in this…

fabf21e

… branch... well, here it is...

For now, tag this as a CPU only test. For GPU, probably only the modu…

02309e8

…le name would be different

Tag must be a set

cf7610b

Changed tag to singlenode, so we can destinguish between single node …

90f5826

…and single core later on

Updated gromacs for ReFrame 3.5.0 to run a task count based on the nu…

fb5016b

…m_cpus specified for the current ReFrame partition, as specified in ReFrame's settings.py

[ECT] [cart,lisa] [ReFrame-3.5.0.eb] [production] Testing if I can bu…

718cd89

…ild ReFrame 3.5.0

Caspar van Leeuwen and others added 11 commits May 4, 2021 11:38

Updated gromacs check to use parameter instead of parameterized_test

455d639

Added Gromacs PRACE testcase A in library test format, including 'sys…

7ee97e1

…tem-specific' implementation for EESSI

Set correct number of tasks for GPU runs, to one per GPU. Set correct…

5c8c1e2

… number of threads for GPU runs, to num_cpu_cores/num_gpus, and set OMP_NUM_THREADS for this case

Moved setting OMP_NUM_THREADS to library test. Disable running CPU ba…

bebeaf1

…sed tests on GPU nodes. Removed python caches and src directory from the EESSI based test (src is now provided with the library test).

Set valid systems based on what is returned by find_modules. Otherwis…

e3c72d8

…e, tests get duplicated

Added more clear description of how the test decides to run CPU/GPU t…

3f01a3d

…ests

Added more clear description of how num_tasks, num_tasks_per_node and…

520007d

… cpus_per_task are set.

Merge branch 'EESSI:main' into gromacs_libtest

a6d945a

Removed dummy file

b9af81e

Merge branch 'gromacs_libtest' of github.com:casparvl/software-layer …

76eeda9

…into gromacs_libtest

Add an example settings.py that works with the gromacs.py test in thi…

11f91a7

…s PR

jjotero reviewed Jun 11, 2021

Caspar van Leeuwen added 3 commits June 14, 2021 17:44

Clarified comment on commented ReFrame version requirement

261a106

Clarified error message

28878cf

casparvl closed this Jul 5, 2021

casparvl reopened this Jul 5, 2021

boegel requested changes Nov 24, 2021

casparvl commented Dec 1, 2021

Comment thread tests/reframe/eessi_utils/hooks.py

casparvl closed this Sep 8, 2022
