Parallel hyperoptimization with MongoDB #1921
Conversation
Hi @Cmurilochem, do we absolutely need mongodb for this? And if so, is there no pip package (or, at worst, a conda-forge package)? Using the defaults channel introduces licensing problems. (If there's no other solution so be it, but we can't add it to the conda recipe.)
Hi @scarlehoff. Thanks for your help. I created a test for parallel hyperopt and wanted to make it run. The only way is to use `mongodb`. If you so suggest (mainly to avoid adding one more dependency apart from `pymongo`), …
The problem of the dependency is a separate one (if needed, we can add it). But wouldn't it be possible to use it from conda-forge?

Edit: otherwise we simply don't add it to the conda recipe, and if one wants to run with mongodb they will have to procure it by themselves. It's not a big problem, I just hoped the conda-forge version worked, but I see it is failing...
Yes... my test with the conda-forge package failed.

Edit: It worked, surprisingly!
Why do you say it's due to communication? That should be very minimal. It seems to me it's memory usage; more than one just doesn't fit on one GPU. In the screenshot you posted, you can see that the memory usage is close to 100%. Or... is that the tensorflow thing where it just reserves all the memory it can?
I'm still confused by this plot. First of all, the parallel/sequential is always referring to parallelization in trials, right? Not in replicas?
Tensorflow allocates all of the GPU memory for itself. You need to do this to control it: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
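For context, the setting from the linked guide looks roughly like this (standard TensorFlow API, shown only as a reminder of what the option does):

```python
import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand instead of reserving the
# whole device up front; must be called before any GPU has been initialized.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```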
Thanks Juan, I remember we did this before. I quickly checked the effect on performance: with 100 replicas and the production runcard it made it 1% slower, which may just be random variation. I don't know if it will solve the problems here, but maybe it makes sense to just always use this, @scarlehoff? I can make a small separate PR for it if you agree.
Hi @APJansen and @scarlehoff. Thanks for your help. I had a spelling mistake while setting … Just to illustrate the idea: the output of the …
Ah great :) Looks promising, and it's actually still roughly twice as fast as one worker per GPU? When I was testing for #1936 I saw that GPU usage is often around 90% at 100 replicas, but still (with the changes there) I was able to run 500 replicas as well, with better than linear scaling in the number of replicas. So that's not a limit somehow. So perhaps we'll be able to run with even more than 2 per GPU.


Aim
This PR aims to implement parallel hyperoptimization using `MongoDB` databases and mongo workers. This will enable us to calculate several trials simultaneously.
Strategy
Similarly to `FileTrials`, the main idea is to implement a `MongoFileTrials` class that inherits from hyperopt's `MongoTrials`. This new `MongoFileTrials` class will then be the one we instantiate before calling hyperopt `fmin` (a rough sketch is given after the task list below).

Tasks
- `MongoFileTrials`
- `n3fit` command and `HyperScanner`/`hyper_scan_wrapper` adapted to allow for parallel evaluation of `fmin` trials
- `MongoDB` and `pymongo` as dependencies
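As a rough illustration of the strategy (not the PR's actual implementation; the constructor arguments and defaults are made up for the example), the wrapper could look something like:

```python
from hyperopt.mongoexp import MongoTrials

# Hypothetical sketch of a FileTrials-like wrapper around hyperopt's
# MongoTrials: it only builds the mongo URL pointing at a running mongod
# instance and defers everything else to the parent class.
class MongoFileTrials(MongoTrials):
    def __init__(self, db_host="localhost", db_port=27017,
                 db_name="hyperopt-db", exp_key=None, **kwargs):
        mongo_url = f"mongo://{db_host}:{db_port}/{db_name}/jobs"
        super().__init__(mongo_url, exp_key=exp_key, **kwargs)
```

An instance of this class is then passed to `fmin` as its `trials` argument, so that the evaluation of trials is delegated to the mongo workers.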
Usage

Local Machine (for simple tests only)
First, make sure that you have `MongoDB` installed, either via `conda` (not sure if available in the latest `conda` version) or `apt-get`/`brew`. Also `pymongo` is necessary, but this can be easily installed via `pip` (it has already been added as a dependency).

In the latest version of the code in this PR, `n3fit` is adapted to run automatically (by internal subprocessing) both `mongod` (that generates `MongoDB` databases) and `hyperopt-mongo-worker` (that launches mongo workers).
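The internal subprocessing described above can be sketched as follows (paths, ports and the database name are placeholders, not the PR's actual code; only the `mongod` and `hyperopt-mongo-worker` command lines are standard):

```python
import os
import subprocess

# Start a mongod database and a pool of hyperopt-mongo-worker processes,
# roughly as n3fit is described to do internally. All values are placeholders.
db_path, db_port, db_name = "hyperopt_db", 27017, "hyperopt-db"
num_workers = 4

os.makedirs(db_path, exist_ok=True)
mongod = subprocess.Popen(
    ["mongod", "--dbpath", db_path, "--port", str(db_port)]
)
workers = [
    subprocess.Popen(
        ["hyperopt-mongo-worker", f"--mongo=localhost:{db_port}/{db_name}"]
    )
    for _ in range(num_workers)
]
```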
To run parallel hyperopts with `n3fit`, do:

where `N` defines the number of mongo workers you want to launch in parallel. Indeed, `N` will define the number of trials we are calculating simultaneously. If you want to restart jobs, make sure you have `dir_output_name` in your current path and do:

Snellius
Here is a complete Slurm script showing how we would run a hyperopt experiment in parallel on Snellius (including restarts if needed):
This would be run by doing:
Here, each of the selected mongo workers (4) sees and runs on one separate GPU, as implemented here. In this run, we are then calculating 4 trials in parallel.
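A minimal sketch of how such a one-worker-per-GPU assignment can be achieved (hypothetical, not necessarily how the linked code does it): each worker process is launched with `CUDA_VISIBLE_DEVICES` restricted to a single device.

```python
import os
import subprocess

# Pin each hyperopt-mongo-worker to one GPU via CUDA_VISIBLE_DEVICES.
# With workers_per_gpu = 2 the same mapping gives two workers per device.
num_gpus, workers_per_gpu = 4, 1
db_port, db_name = 27017, "hyperopt-db"  # placeholder connection details

workers = []
for i in range(num_gpus * workers_per_gpu):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(i // workers_per_gpu))
    workers.append(
        subprocess.Popen(
            ["hyperopt-mongo-worker", f"--mongo=localhost:{db_port}/{db_name}"],
            env=env,
        )
    )
```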
We could also set up our experiment to run 2 mongo workers on each GPU (8 trials in parallel), e.g., by using `N_MONGOWORKERS=8` in the script above. In this case, we would observe:

Performance assessment
Local Machine
I have just made a very quick test on my local PC to assess the possible performance improvement with parallel hyperopts. I used the `hyper-quickcard.yml` card from `n3fit/tests/regression` (with minor modifications) and ran it for 10 trials and 2 replicas, varying the number of simultaneously launched mongo workers. The results are summarised in the figure below:

The results look encouraging a priori.
Snellius
For the Snellius tests, I employed the Slurm script above as a model and a more complete runcard.txt. I ran 10 trials with 2 replicas with varying numbers of mongo workers. The final results (after several fine-tunings of the code) are plotted in the figure below:
The figure shows the variation of the total wall-clock run time of each job as a function of the number of launched mongo workers. The idea here is that each mongo worker is responsible for one trial in hyperopt, so the more mongo workers we launch, the more trials we calculate simultaneously.
I also tested the possibility of launching more than 1 mongo worker per GPU; see the right (light grey) part of the figure. This is actually where we observe the best performance improvement. As seen, a job with 8 mongo workers (2 per GPU) is nearly ~8x faster than a serial hyperopt.