Skip to content

Gromacs ReFrame test for EESSI, with seperate library test#115

Closed
casparvl wants to merge 44 commits intoEESSI:mainfrom
casparvl:gromacs_libtest
Closed

Gromacs ReFrame test for EESSI, with seperate library test#115
casparvl wants to merge 44 commits intoEESSI:mainfrom
casparvl:gromacs_libtest

Conversation

@casparvl
Copy link
Copy Markdown
Collaborator

@casparvl casparvl commented Jun 10, 2021

Before running ReFrame with the test in this PR

  • make sure that tests/reframe is in your PYTHONPATH
  • adjust the attached settings.py for your system (adapt partition names, number of CPU cores, number of GPUs etc)
  • make sure you have a new enough ReFrame installation (with 3.5.0 the required keyword didn't work properly for me, 3.6.2 did)

This test should run on the EESSI software stack, but would also run on a locally installed GROMACS. Skipping of GPU tests on CPU partitions and vice versa only works as long as the module name contains cuda as substring whenever it is a GPU module (i.e. probably won't work in a hierarchical module system).

casparvl and others added 30 commits November 9, 2020 10:33
…For now, first trying to see if we can do a directory listing. Once that works, I can easily replace with the actual gromacs call.
…rrect exectuable with gmx_mpi, but no output is printed. Looking at the error output, It seems MPI is throwing an error. Still need to resolve that...
… requires the OpenMPI to be compiled with - the same - pmi2
…m_cpus specified for the current ReFrame partition, as specified in ReFrame's settings.py
@casparvl
Copy link
Copy Markdown
Collaborator Author

Probably best to squash this commit when merging. I've done a lot of playing around before I got to this point... ;-)

Copy link
Copy Markdown

@jjotero jjotero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! Just one comment on the variables though. In order to use the required keyword in the tests, the class attribute must be declared first with the variable builtin as

nsteps = variable(int)

If you don't give this a value, it is set as required by default. If you set a regular Python attribute as required, you won't get all the implicit checking.

num_tasks_per_node = required
num_cpus_per_task = required
nsteps = required
modules = required
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't set modules as required in here, since it is not used anywhere else in this class. I think the required can go in the derived class.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, fair point. My idea when designing this class was to create a library test for just running a GROMACS module. But indeed, nothing specific is in this library that would make this a 'run-only' test. The person deriving from the test might technically build GROMACS as part of the test, and this would still be a valid library test to derive from. I wouldn't recommend it, but you're right: there's no use in forcing the use of a module here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, wait, let me get back to this: I derived the library test from RunOnlyRegressionTest. Doesn't that essentially make a module required?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. One could have installed gromacs in the system manually. As long as the executable gmx_mpi is there, this test can run.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. So used to managing things with modules that I sometimes forget what happens on non-HPC systems =)

}
maintainers = ['casparvl']

@rfm.run_before('run')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're using version 3.6.2, you can already write the run before/after hooks as follows

Suggested change
@rfm.run_before('run')
@run_before('run')

Comment thread tests/reframe/eessi-checks/applications/gromacs.py

@rfm.run_after('init')
def requires_gpu(self):
self.requires_cuda = False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this line out to the class body and declare requires_cuda as a test variable as

    requires_cuda = variable(bool, value=False)

This will give you type-checking and set it as False by default.

num_tasks = required
num_tasks_per_node = required
num_cpus_per_task = required
nsteps = required
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable needs to be declared first (num_tasks, num_tasks_per_node and so on were declared in the rfm.RegressionTest class already.)

Suggested change
nsteps = required
nsteps = variable(int)

I've created an issue (reframe-hpc/reframe#2011) to raise an error when this is attempted.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see! I indeed copied what you did for num_tasks in reframe-hpc/reframe#1996 , but failed to realize that this was actually a special case since the variable was already declared as part of the ReFrame framework. Cool, ok, I'll replace by variable to get the type checking :)

Comment thread tests/reframe/eessi-checks/applications/gromacs.py
Comment on lines +47 to +49
@rfm.run_after('setup')
def check_gpu_presence(self):
self.gpu_list = [ dev.num_devices for dev in self.current_partition.devices if dev.device_type == 'gpu' ]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me that you might be using this hook in many other tests too. If so, you can move this check_gpu_presence function into a hook utility module and then bind the external function into the class as a hook. To bind it, you can either do

import my_hook_utils as hu

class MyTest(rfm.RunOnlyTest):
    ...
    @run_after('setup')
    def check_gpu_presence(self):
        hu.check_gpu_presence(self)

or as a one-liner

import my_hook_utils as hu

class MyTest(rfm.RunOnlyTest):
    ...
    run_after('setup')(bind(hu.check_gpu_presence))

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I like this idea! And it gives me another idea to: I can do the same thing for the requires_cuda test. The regex makes some pretty strong assumption (namely that cuda is in the name), but that only works for EB toolchains like fosscuda and intelcuda. It would fail e.g. with gompic. Moving it out of the test and into a utils kind of thing allows us to easily improve and/or adjust that check in a single place if things change in the future.

Caspar van Leeuwen added 3 commits June 14, 2021 17:44
…eperate hooks and utility functions for EESSI so that I can reuse them in other tests. Make the test much clearner. 2. Change scale parameter to now also include the nsteps and num_nodes. This gets rid of the conditional if statements that set this before. 3. Define nsteps as a variable, so that we have type checking. 4. Get rid of leading rfm. in hook calls. 5. Get rid of requirement for modules in the test library and move it to the child class that implements the actual test. In theory, a run-only regression test could run on a system where a users simply has the gmx_mpi command available from a system path.
@jjotero
Copy link
Copy Markdown

jjotero commented Jun 15, 2021

This looks pretty good to me now! One last comment on the hook-importing from hooks.py module. If you think that you're gonna have a few hooks that will always get used together in the test, you can bundle them up into a mixin class and then just do multiple inheritance on the tests. For example:

class RequiresGpusAndCuda(rfm.RegressionMixin):
    # Skip testing GPU-based modules on CPU-based nodes    
    @run_after('setup')
    def skip_gpu_test_on_cpu_nodes(self):
        hooks.skip_gpu_test_on_cpu_nodes(self)

    # Skip testing CPU-based modules on GPU-based nodes
    # (though these would run fine, one is usually not interested in them)
    @run_after('setup')
    def skip_cpu_test_on_gpu_nodes(self):
       hooks.skip_cpu_test_on_gpu_nodes(self)

and you feed this into your derived GROMACS test with multiple inheritance.

This is more of a matter of taste than anything though. I think that both multiple inheritance and explicit hook binding work absolutely fine in this example.

@casparvl
Copy link
Copy Markdown
Collaborator Author

Interesting approach with the multiple inheritance. It makes the tests even more compact, but also slightly less descriptive. I think for now, I'd prefer the more verbose and descriptive approach of keeping the hooks in explicitely, rather than through inheritance.

@casparvl casparvl closed this Jul 5, 2021
@casparvl casparvl reopened this Jul 5, 2021
Copy link
Copy Markdown
Contributor

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't include the input file itself, only a script to download it from somewhere, and a SHA256 checksum to ensure we got the right file.

@boegel
Copy link
Copy Markdown
Contributor

boegel commented Nov 24, 2021

We should also have a short README file, both in the test/reframe subdirectory, and into the test-specific subdirectory (<whatever>/gromacs/), with some basic general information + test-specific info (short description of input files, type of run, etc.)

'launcher': 'srun',
'access': ['-p cpu'],
'environs': ['builtin'],
'processor': {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can now be taken out, since ReFrame supports autodetection of CPU architecture (not yet of GPU arch though).
https://reframe-hpc.readthedocs.io/en/stable/configure.html?highlight=auto%20detection#auto-detecting-processor-information

Comment thread tests/reframe/eessi_utils/hooks.py
@casparvl
Copy link
Copy Markdown
Collaborator Author

casparvl commented Dec 1, 2021

Discussing with Victor, it might be better to build an EESSI test implementation on top of their GROMACS library test: https://github.com/eth-cscs/reframe/blob/master/hpctestlib/sciapps/gromacs/benchmarks.py

This library tests implements multiple tests (6 different input files for now). We probably want to run 1 test case in CI, but maybe more in the monitoring.

@casparvl
Copy link
Copy Markdown
Collaborator Author

casparvl commented Sep 8, 2022

Closing this PR, as we have one on top of the CSCS test library now: #156

@casparvl casparvl closed this Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants