Gromacs ReFrame test for EESSI, with separate library test #115
casparvl wants to merge 44 commits into EESSI:main from casparvl:gromacs_libtest
…ity check to mpi job
…For now, first trying to see if we can do a directory listing. Once that works, I can easily replace with the actual gromacs call.
…he expected result
…rrect executable with gmx_mpi, but no output is printed. Looking at the error output, it seems MPI is throwing an error. Still need to resolve that...
… requires the OpenMPI to be compiled with - the same - pmi2
…ultiple EESSI containers...
… branch... well, here it is...
…le name would be different
…and single core later on
…m_cpus specified for the current ReFrame partition, as specified in ReFrame's settings.py
…ild ReFrame 3.5.0
…tem-specific' implementation for EESSI
… number of threads for GPU runs, to num_cpu_cores/num_gpus, and set OMP_NUM_THREADS for this case
…sed tests on GPU nodes. Removed python caches and src directory from the EESSI based test (src is now provided with the library test).
…e, tests get duplicated
… cpus_per_task are set.
…into gromacs_libtest
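One of the commits above sets the number of threads for GPU runs to num_cpu_cores/num_gpus and exports OMP_NUM_THREADS accordingly. A minimal sketch of that arithmetic follows. The function names are hypothetical, and rounding down with integer division is an assumption; the commit message does not say how non-divisible core counts are handled.

```python
def omp_threads_for_gpu_run(num_cpu_cores: int, num_gpus: int) -> int:
    """Threads per GPU: CPU cores divided evenly over GPUs.

    Assumption: integer division (rounding down), never fewer than 1 thread.
    """
    if num_gpus < 1:
        raise ValueError('a GPU run needs at least one GPU')
    return max(1, num_cpu_cores // num_gpus)


def gpu_run_env(num_cpu_cores: int, num_gpus: int) -> dict:
    """Environment variables to export for the GPU run (hypothetical helper)."""
    threads = omp_threads_for_gpu_run(num_cpu_cores, num_gpus)
    return {'OMP_NUM_THREADS': str(threads)}


if __name__ == '__main__':
    # A node with 72 cores and 4 GPUs gets 18 threads per GPU.
    print(gpu_run_env(72, 4))
```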
Probably best to squash these commits when merging. I've done a lot of playing around before I got to this point... ;-)
jjotero left a comment
Looks pretty good! Just one comment on the variables, though. To use the required keyword in the tests, the class attribute must first be declared with the variable builtin:
    nsteps = variable(int)
If you don't give this a value, it is set as required by default. If you set a regular Python attribute as required, you won't get all the implicit checking.
    num_tasks_per_node = required
    num_cpus_per_task = required
    nsteps = required
    modules = required
I wouldn't set modules as required in here, since it is not used anywhere else in this class. I think the required can go in the derived class.
Hm, fair point. My idea when designing this class was to create a library test for just running a GROMACS module. But indeed, nothing specific is in this library that would make this a 'run-only' test. The person deriving from the test might technically build GROMACS as part of the test, and this would still be a valid library test to derive from. I wouldn't recommend it, but you're right: there's no use in forcing the use of a module here.
No, wait, let me get back to this: I derived the library test from RunOnlyRegressionTest. Doesn't that essentially make a module required?
Not necessarily. One could have installed gromacs in the system manually. As long as the executable gmx_mpi is there, this test can run.
Fair point. So used to managing things with modules that I sometimes forget what happens on non-HPC systems =)
    }
    maintainers = ['casparvl']

    @rfm.run_before('run')
Since you're using version 3.6.2, you can already write the run before/after hooks as follows:
    - @rfm.run_before('run')
    + @run_before('run')
    @rfm.run_after('init')
    def requires_gpu(self):
        self.requires_cuda = False
You can move this line out to the class body and declare requires_cuda as a test variable:
    requires_cuda = variable(bool, value=False)
This will give you type-checking and set it to False by default.
    num_tasks = required
    num_tasks_per_node = required
    num_cpus_per_task = required
    nsteps = required
This variable needs to be declared first (num_tasks, num_tasks_per_node and so on were declared in the rfm.RegressionTest class already.)
    - nsteps = required
    + nsteps = variable(int)
I've created an issue (reframe-hpc/reframe#2011) to raise an error when this is attempted.
Ah, I see! I indeed copied what you did for num_tasks in reframe-hpc/reframe#1996, but failed to realize that this was actually a special case, since that variable was already declared as part of the ReFrame framework. Cool, ok, I'll replace it with variable to get the type checking :)
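The difference between a plain attribute and a declared variable can be illustrated without ReFrame itself. The sketch below is a plain-Python stand-in and not ReFrame's implementation: TypedVariable is a hypothetical descriptor that mimics the two behaviours discussed in this thread, namely that a declared variable without a value counts as required until it is set, and that assignments are type-checked.

```python
_UNSET = object()  # sentinel meaning "no value given, so the variable is required"


class TypedVariable:
    """Hypothetical stand-in for a declared test variable (not ReFrame code)."""

    def __init__(self, type_, value=_UNSET):
        self.type_ = type_
        self.default = value

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = obj.__dict__.get(self.name, self.default)
        if value is _UNSET:
            raise AttributeError(f'required variable {self.name!r} has not been set')
        return value

    def __set__(self, obj, value):
        # Type check on every assignment, which a plain attribute would not do.
        if not isinstance(value, self.type_):
            raise TypeError(
                f'{self.name!r} expects {self.type_.__name__}, '
                f'got {type(value).__name__}'
            )
        obj.__dict__[self.name] = value


class LibraryTest:
    nsteps = TypedVariable(int)                       # required: no default
    requires_cuda = TypedVariable(bool, value=False)  # optional: defaults to False


class DerivedTest(LibraryTest):
    def __init__(self, nsteps):
        self.nsteps = nsteps  # the derived test supplies the required value
```

Reading LibraryTest().nsteps raises an AttributeError because nothing set it, while DerivedTest('1000') raises a TypeError because a string was assigned to an int variable.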
    @rfm.run_after('setup')
    def check_gpu_presence(self):
        self.gpu_list = [dev.num_devices for dev in self.current_partition.devices if dev.device_type == 'gpu']
It feels to me that you might be using this hook in many other tests too. If so, you can move this check_gpu_presence function into a hook utility module and then bind the external function into the class as a hook. To bind it, you can either do

    import reframe as rfm
    import my_hook_utils as hu

    class MyTest(rfm.RunOnlyRegressionTest):
        ...
        @run_after('setup')
        def check_gpu_presence(self):
            hu.check_gpu_presence(self)

or as a one-liner

    import reframe as rfm
    import my_hook_utils as hu

    class MyTest(rfm.RunOnlyRegressionTest):
        ...
        run_after('setup')(bind(hu.check_gpu_presence))
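As a rough idea of what such a utility function could contain, the sketch below reproduces the list comprehension from the diff as a standalone helper. Device and Partition are hypothetical stand-ins that only carry the attributes used in the diff (device_type, num_devices, devices), so the example runs without ReFrame. The helper names gpu_counts and has_gpus are assumptions as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    """Stand-in for a partition device entry (hypothetical, not ReFrame's class)."""
    device_type: str
    num_devices: int


@dataclass
class Partition:
    """Stand-in for the current partition, holding only its devices."""
    devices: List[Device] = field(default_factory=list)


def gpu_counts(partition) -> List[int]:
    """Number of devices for every 'gpu' entry, as in the diff above."""
    return [dev.num_devices for dev in partition.devices
            if dev.device_type == 'gpu']


def has_gpus(partition) -> bool:
    """True if the partition declares at least one GPU."""
    return sum(gpu_counts(partition)) > 0


if __name__ == '__main__':
    gpu_part = Partition(devices=[Device('gpu', 4)])
    cpu_part = Partition()
    print(has_gpus(gpu_part), has_gpus(cpu_part))
```

Keeping the logic in a function that takes the partition as an argument means it can be checked in isolation and then called from a hook in any test.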
Oh, I like this idea! And it gives me another idea too: I can do the same thing for the requires_cuda test. The regex makes a pretty strong assumption (namely that cuda is in the name), and that only works for EB toolchains like fosscuda and intelcuda. It would fail e.g. with gompic. Moving it out of the test and into a utils kind of thing allows us to easily improve and/or adjust that check in a single place if things change in the future.
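The name-based check and its limitation can be sketched as a small utility function. This is an illustration only: the function name is_cuda_module, the case-insensitive match, and the module names in the usage lines are assumptions, not taken from the PR.

```python
import re

# Assumption: a GPU build is recognised purely by 'cuda' appearing in the
# module name, matched case-insensitively.
_CUDA_PATTERN = re.compile(r'cuda', re.IGNORECASE)


def is_cuda_module(module_name: str) -> bool:
    """Return True if the module name suggests a CUDA-enabled build."""
    return _CUDA_PATTERN.search(module_name) is not None


if __name__ == '__main__':
    # Illustrative module names, not taken from the PR.
    for name in ('GROMACS/2020.1-fosscuda-2019b',
                 'GROMACS/2020.1-foss-2020a',
                 'GROMACS/2019.3-gompic-2019b'):
        print(name, is_cuda_module(name))
```

The gompic name is reported as a non-CUDA module even though gompic is a CUDA-enabled toolchain, which is exactly the weakness described above. With the check in one place, it can later be replaced by something more robust without touching the tests.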
…eperate hooks and utility functions for EESSI so that I can reuse them in other tests. Make the test much cleaner.
2. Change scale parameter to now also include the nsteps and num_nodes. This gets rid of the conditional if statements that set this before.
3. Define nsteps as a variable, so that we have type checking.
4. Get rid of leading rfm. in hook calls.
5. Get rid of requirement for modules in the test library and move it to the child class that implements the actual test. In theory, a run-only regression test could run on a system where a user simply has the gmx_mpi command available from a system path.
This looks pretty good to me now! One last comment on the hook-importing from …

    class RequiresGpusAndCuda(rfm.RegressionMixin):
        # Skip testing GPU-based modules on CPU-based nodes
        @run_after('setup')
        def skip_gpu_test_on_cpu_nodes(self):
            hooks.skip_gpu_test_on_cpu_nodes(self)

        # Skip testing CPU-based modules on GPU-based nodes
        # (though these would run fine, one is usually not interested in them)
        @run_after('setup')
        def skip_cpu_test_on_gpu_nodes(self):
            hooks.skip_cpu_test_on_gpu_nodes(self)

and you feed this into your derived GROMACS test with multiple inheritance. This is more a matter of taste than anything, though. I think that both multiple inheritance and explicit hook binding work absolutely fine in this example.
Interesting approach with the multiple inheritance. It makes the tests even more compact, but also slightly less descriptive. For now, I'd prefer the more verbose and descriptive approach of keeping the hooks in explicitly, rather than through inheritance.
boegel left a comment
I wouldn't include the input file itself, only a script to download it from somewhere, and a SHA256 checksum to ensure we got the right file.
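A download-and-verify step along these lines could look roughly like the sketch below. The function names and the URL and checksum in the usage comment are placeholders, since the PR does not say where the input file would be hosted. The sketch uses only the Python standard library (hashlib and urllib.request).

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_input(url, dest, expected_sha256) -> Path:
    """Download url to dest (unless dest already exists) and verify its checksum.

    Raises ValueError if the file does not match the expected SHA256.
    """
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    actual = sha256sum(dest)
    if actual != expected_sha256.lower():
        raise ValueError(
            f'checksum mismatch for {dest}: expected {expected_sha256}, got {actual}'
        )
    return dest


# Usage (placeholder URL and checksum, to be replaced with the real ones):
# fetch_input('https://example.org/path/to/input.tpr', 'input.tpr', '<sha256>')
```

Skipping the download when the file already exists keeps repeated test runs from fetching the input again, while the checksum is still verified on every call.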
We should also have a short README file, both in the |
    'launcher': 'srun',
    'access': ['-p cpu'],
    'environs': ['builtin'],
    'processor': {
This can now be taken out, since ReFrame supports autodetection of CPU architecture (not yet of GPU arch though).
https://reframe-hpc.readthedocs.io/en/stable/configure.html?highlight=auto%20detection#auto-detecting-processor-information
Discussing with Victor, it might be better to build an EESSI test implementation on top of their GROMACS library test (https://github.com/eth-cscs/reframe/blob/master/hpctestlib/sciapps/gromacs/benchmarks.py). This library test implements multiple tests (6 different input files for now). We probably want to run 1 test case in CI, but maybe more in the monitoring.
Closing this PR, as we have one on top of the CSCS test library now: #156
Before running ReFrame with the test in this PR:
… tests/reframe is in your PYTHONPATH
… 3.5.0 the required keyword didn't work properly for me, 3.6.2 did)
This test should run on the EESSI software stack, but would also run on a locally installed GROMACS. Skipping of GPU tests on CPU partitions and vice versa only works as long as the module name contains cuda as a substring whenever it is a GPU module (i.e. it probably won't work in a hierarchical module system).