Skip to content

Adding sysimage generation to mpi tests#48

Merged
fverdugo merged 15 commits intorelease-0.2from
adding_sysimage_generation_to_mpi_tests
Oct 22, 2021
Merged

Adding sysimage generation to mpi tests#48
fverdugo merged 15 commits intorelease-0.2from
adding_sysimage_generation_to_mpi_tests

Conversation

@amartinhuertas
Copy link
Copy Markdown
Member

@amartinhuertas amartinhuertas commented Oct 21, 2021

Solve Medium Priority Task in #39

IMPORTANT NOTE: This PR MUST BE reviewed after #47

@amartinhuertas amartinhuertas changed the base branch from master to release-0.2 October 21, 2021 03:30
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 21, 2021

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (73de176) to head (120e0a4).

Additional details and impacted files
@@             Coverage Diff             @@
##           release-0.2     #48   +/-   ##
===========================================
  Coverage         0.00%   0.00%           
===========================================
  Files               33      33           
  Lines             3205    3205           
===========================================
  Misses            3205    3205           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@fverdugo fverdugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. Introduces considerable complexity and it has not a positive impact in the test times. In fact, by looking the gh actions it seems to even lead to slower times.

Perhaps separating the sequential and mpi tests is enough for now.

@amartinhuertas
Copy link
Copy Markdown
Member Author

I am not sure about this. Introduces considerable complexity and it has not a positive impact in the test times. In fact, by looking the gh actions it seems to even lead to slower times.

Yes, by now it does not improve much. If we want to perform tests with different values of the number of processors (apart from 1 and 4), this will be very helfpul.

@amartinhuertas
Copy link
Copy Markdown
Member Author

I am not sure about this. Introduces considerable complexity and it has not a positive impact in the test times. In fact, by looking the gh actions it seems to even lead to slower times.

Besides, we will be evaluating the ability to precompile GridapDistributed.jl. Such action for clusters is a must.

…into adding_sysimage_generation_to_mpi_tests
@fverdugo
Copy link
Copy Markdown
Member

Besides, we will be evaluating the ability to precompile GridapDistributed.jl. Such action for clusters is a must.

yes, but precompilation needs to be done in the final app, because we cannot anticipate which other packages is going to call the user.

With PackageCompiler generating a system image is straight forward once you have the the final app. I think, it is not our job to provide a precompilaiton mechanism having PackageCompiler at our disposal.

In any case, perhaps it is a good idea to create a new repo GridapBenchmarks or whatever, where we can add both serial and parallel benchs. And we definitively want precompilation scripts in that repo.

@amartinhuertas
Copy link
Copy Markdown
Member Author

I think it makes a lot of sense to test the workflow that you are going to face when dealing with a cluster. The repo of apps is going to be abandoned, and not tested frequently (we have already many other experiences of similar repos).

@fverdugo
Copy link
Copy Markdown
Member

I think it makes a lot of sense to test the workflow that you are going to face when dealing with a cluster. The repo of apps is going to be abandoned, and not tested frequently (we have already many other experiences of similar repos).

This is a very good point! but as the sysimage is created now, it is not testing the actual workflow that one needs to follow in a cluster. To check the workflow, you need an app, i.e., you need a (or some) "main" function(s) inside a Package and then create a sys image for those functions.

We have several options:

  1. Move the "main" functions of the tests inside the src dir and then create a system image by calling them on a single MPI task
  2. Create a dummy "test app" in the test dir and move the "main" test functions to the src dir of the "test app" and then precompile the "test app".

I would go for 2 definitively, even though it has some more boilerplate code, since it actually mimics what a user needs to do in a cluster.

What do you think?

@amartinhuertas
Copy link
Copy Markdown
Member Author

yes, but precompilation needs to be done in the final app, because we cannot anticipate which other packages is going to call the user.

I agree the solution that we currently have in this branch is not definitive, in the sense that it is not testing exactly what will happen with the actual app, with all dependencies etc.. But it may happen that even with the current solution we detect a failure of precompilation after a commit. This can give us some early useful information, that would be otherwise very hard to obtain if the issue happens when you deploy the software in the cluster.

@fverdugo
Copy link
Copy Markdown
Member

and BTW the precompilation scrips should be in the "test app" (i.e. the final app) as usual

@amartinhuertas
Copy link
Copy Markdown
Member Author

What do you think?

Ok. Let us go for 2. Can we use the sysimage of the test app to accelerate the mpi tests or would you avoid that?

@amartinhuertas
Copy link
Copy Markdown
Member Author

@fverdugo ... all tests are now passing. PR ready to review/merge.

The CI script generates the TestApp.so sysimage, and then leverages it to run the MPI tests with 1x and 4x MPI tasks. The generation of the image takes 21 min according to github CI. On the other hand, the parallel MPI tests runtime is missleading in Gitlub CI: https://github.com/gridap/GridapDistributed.jl/runs/3970159570?check_suite_focus=true I think it only measures the time spent in the runtests.jl script, not the time spent in the mpirun processes forked out of it.

@fverdugo fverdugo merged commit d7ccea9 into release-0.2 Oct 22, 2021
@fverdugo fverdugo deleted the adding_sysimage_generation_to_mpi_tests branch October 22, 2021 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants