Skip to content

Prioritize task execution by recipe order#4

Merged
mattiarighi merged 11 commits intodevelopmentfrom
task_priority
Sep 23, 2019
Merged

Prioritize task execution by recipe order#4
mattiarighi merged 11 commits intodevelopmentfrom
task_priority

Conversation

@bouweandela
Copy link
Member

Closes #3

@mattiarighi
Copy link
Contributor

That would be great!

@mattiarighi mattiarighi added the enhancement New feature or request label Jun 7, 2019
@mattiarighi mattiarighi self-requested a review June 7, 2019 05:57
@mattiarighi
Copy link
Contributor

Is this still a draft or can I already try it?

This feature would help me a lot to get more robust number for the CPU time, which I want to report in the paper.

@bouweandela
Copy link
Member Author

You can already try it, i think it works, but I need to change it a bit so the priority of the input variables used for derivation is correctly set and I need to add unit tests.

@bouweandela bouweandela marked this pull request as ready for review June 28, 2019 10:08
@mattiarighi
Copy link
Contributor

I finally found some time to test this.

Unfortunately the NCL scripts are crashing due to missing variable_info:

NCL: fatal:Undefined identifier: (variable_info) is undefined, can't continue

That's strange, since the variable_info is correctly defined in <var>_info.ncl files.

main_log.txt
NCL log for ta850

@mattiarighi
Copy link
Contributor

Anything I can do to help with this?
It would really be a great feature to run some reliable performance tests.

@mattiarighi
Copy link
Contributor

mattiarighi commented Sep 18, 2019

To fully make use of this feature, it would be useful to write the time required by each task in the debug log (this is done only for the provenance at the moment). In this way, the user can execute the recipe once and then reorder the tasks according to their runtimes.

Is that possible?

@mattiarighi
Copy link
Contributor

I tried to reorder the diagnostics in recipe_perfemtrics_CMIP5.yml, putting the slowest (sm and taZONAL) at the beginning, and testing with 16 parallel tasks.

The task are now started in the order listed in the recipe, but this helps only for the preprocessor part of the diagnostics. See the list below: the two tasks to derive the variable sm for the first diagnostic are started as first, but the other preprocessing of sm and the diagnostic script itself sm/grading are called later, in a random order. The same happens with taZONAL.

Starting task sm/sm_derive_input_sm in process [13499]
Starting task sm/sm_derive_input_mrsos in process [13498]
Starting task taZONAL/ta in process [13500]
Starting task ta850/ta in process [13501]
Starting task ta200/ta in process [13502]
Starting task ta30/ta in process [13503]
Starting task ta5/ta in process [13504]
Starting task ua850/ua in process [13505]
Starting task ua200/ua in process [13506]
Starting task va850/va in process [13507]
Starting task va200/va in process [13508]
Starting task zg500/zg in process [13509]
Starting task hus400/hus in process [13510]
Starting task tas/tas in process [13511]
Starting task ts/ts in process [13512]
Starting task pr/pr in process [13513]
Starting task clt/clt in process [13499]
Starting task ta200/cycle in process [13502]
Starting task ta850/cycle in process [13501]
Starting task ta850/grading in process [13504]
Starting task ta200/grading in process [13503]
Starting task ta30/cycle in process [13507]
Starting task ta5/cycle in process [13508]
Starting task sm/sm in process [13498]
Starting task va850/grading in process [13501]
Starting task va200/grading in process [13502]
Starting task ua200/grading in process [13506]
Starting task ua850/grading in process [13505]
Starting task rlut/rlut in process [13508]
Starting task hus400/grading in process [13510]
Starting task rsut/rsut in process [13507]
Starting task lwcre/lwcre_derive_input_rlutcs in process [13503
Starting task lwcre/lwcre_derive_input_rlut in process [13504]
Starting task swcre/swcre_derive_input_rsutcs in process [13510
Starting task zg500/grading in process [13509]
Starting task swcre/swcre_derive_input_rsut in process [13501]
Starting task od550aer/od550aer in process [13502]
Starting task od870aer/od870aer in process [13506]
Starting task abs550aer/abs550aer in process [13505]
Starting task pr/grading in process [13513]
Starting task ts/grading in process [13512]
Starting task clt/latlon in process [13499]
Starting task clt/grading in process [13509]
Starting task sm/grading in process [13498]
Starting task od550lt1aer/od550lt1aer in process [13513]
Starting task tas/latlon in process [13511]
Starting task abs550aer/grading in process [13505]
Starting task tas/grading in process [13512]
Starting task od870aer/grading in process [13506]
Starting task toz/toz_derive_input_ps in process [13506]
Starting task toz/toz_derive_input_toz in process [13505]
Starting task toz/toz_derive_input_tro3 in process [13505]
Starting task od550lt1aer/grading in process [13506]
Starting task toz/toz in process [13513]
Starting task od550aer/grading in process [13505]
Starting task toz/grading_global in process [13502]
Starting task toz/grading_shpolar in process [13509]
Starting task rsut/grading in process [13506]
Starting task rlut/grading in process [13512]
Starting task lwcre/lwcre in process [13505]
Starting task swcre/swcre in process [13511]
Starting task lwcre/latlon in process [13513]
Starting task lwcre/grading in process [13509]
Starting task swcre/latlon in process [13502]
Starting task swcre/grading in process [13507]
Starting task taZONAL/zonal in process [13504]
Starting task collect/RMSD in process [13498]
Starting task collect/taylor in process [13508]

So this PR helps a lot and should be merged, but it's probably not addressing #3 completely.
The results of 5 benchmark tests with and without this feature are shown below: there is definitely an improvement, but the variability is still there.

Without task priority:
0:25:33.422961
0:23:45.208164
0:21:42.018953
0:23:50.358184
0:19:26.837539
mean=22m 51s
median=23m 45s

With task priority:
0:17:19.614452
0:17:47.433762
0:22:01.984059
0:22:51.531649
0:25:54.753893
mean=21m 11s
median=22m 1s

@bouweandela
Copy link
Member Author

I added the run time and task priority to the info message, this should help with the ordering.

the two tasks to derive the variable sm for the first diagnostic are started as first, but the other preprocessing of sm and the diagnostic script itself sm/grading are called later, in a random order. The same happens with taZONAL.

This is because the current implementation should start as many tasks as configured in config-user.yml, starting with the tasks with the lowest priority. However, it can only start a task if all of it's ancestors are ready. Could you post the results above, but including the task completed messages? Because then we can see if there is a problem or tasks are simply waiting for ancestor tasks.

@mattiarighi
Copy link
Contributor

Because then we can see if there is a problem or tasks are simply waiting for ancestor tasks.

I think they are simply waiting for ancestors:

esmvalcore._recipe:1165 Creating preprocessor task sm/sm
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1thr10' task for variable 'sm'
esmvalcore._recipe:1181 Creating diagnostic task sm/grading
esmvalcore._recipe:1165 Creating preprocessor task taZONAL/ta
esmvalcore._recipe:779 Creating preprocessor 'ppALL' task for variable 'ta'
esmvalcore._recipe:1181 Creating diagnostic task taZONAL/zonal
esmvalcore._recipe:1165 Creating preprocessor task ta850/ta
esmvalcore._recipe:779 Creating preprocessor 'pp850' task for variable 'ta'
esmvalcore._recipe:1181 Creating diagnostic task ta850/cycle
esmvalcore._recipe:1181 Creating diagnostic task ta850/grading
esmvalcore._recipe:1165 Creating preprocessor task ta200/ta
esmvalcore._recipe:779 Creating preprocessor 'pp200' task for variable 'ta'
esmvalcore._recipe:1181 Creating diagnostic task ta200/cycle
esmvalcore._recipe:1181 Creating diagnostic task ta200/grading
esmvalcore._recipe:1165 Creating preprocessor task ta30/ta
esmvalcore._recipe:779 Creating preprocessor 'pp30' task for variable 'ta'
esmvalcore._recipe:1181 Creating diagnostic task ta30/cycle
esmvalcore._recipe:1165 Creating preprocessor task ta5/ta
esmvalcore._recipe:779 Creating preprocessor 'pp5' task for variable 'ta'
esmvalcore._recipe:1181 Creating diagnostic task ta5/cycle
esmvalcore._recipe:1165 Creating preprocessor task ua850/ua
esmvalcore._recipe:779 Creating preprocessor 'pp850' task for variable 'ua'
esmvalcore._recipe:1181 Creating diagnostic task ua850/grading
esmvalcore._recipe:1165 Creating preprocessor task ua200/ua
esmvalcore._recipe:779 Creating preprocessor 'pp200' task for variable 'ua'
esmvalcore._recipe:1181 Creating diagnostic task ua200/grading
esmvalcore._recipe:1165 Creating preprocessor task va850/va
esmvalcore._recipe:779 Creating preprocessor 'pp850' task for variable 'va'
esmvalcore._recipe:1181 Creating diagnostic task va850/grading
esmvalcore._recipe:1165 Creating preprocessor task va200/va
esmvalcore._recipe:779 Creating preprocessor 'pp200' task for variable 'va'
esmvalcore._recipe:1181 Creating diagnostic task va200/grading
esmvalcore._recipe:1165 Creating preprocessor task zg500/zg
esmvalcore._recipe:779 Creating preprocessor 'pp500' task for variable 'zg'
esmvalcore._recipe:1181 Creating diagnostic task zg500/grading
esmvalcore._recipe:1165 Creating preprocessor task hus400/hus
esmvalcore._recipe:779 Creating preprocessor 'pp400' task for variable 'hus'
esmvalcore._recipe:1181 Creating diagnostic task hus400/grading
esmvalcore._recipe:1165 Creating preprocessor task tas/tas
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV2' task for variable 'tas'
esmvalcore._recipe:1181 Creating diagnostic task tas/latlon
esmvalcore._recipe:1181 Creating diagnostic task tas/grading
esmvalcore._recipe:1165 Creating preprocessor task ts/ts
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1x1' task for variable 'ts'
esmvalcore._recipe:1181 Creating diagnostic task ts/grading
esmvalcore._recipe:1165 Creating preprocessor task pr/pr
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1' task for variable 'pr'
esmvalcore._recipe:1181 Creating diagnostic task pr/grading
esmvalcore._recipe:1165 Creating preprocessor task clt/clt
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV2' task for variable 'clt'
esmvalcore._recipe:1181 Creating diagnostic task clt/latlon
esmvalcore._recipe:1181 Creating diagnostic task clt/grading
esmvalcore._recipe:1165 Creating preprocessor task rlut/rlut
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1' task for variable 'rlut'
esmvalcore._recipe:1181 Creating diagnostic task rlut/grading
esmvalcore._recipe:1165 Creating preprocessor task rsut/rsut
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1' task for variable 'rsut'
esmvalcore._recipe:1181 Creating diagnostic task rsut/grading
esmvalcore._recipe:1165 Creating preprocessor task lwcre/lwcre
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1' task for variable 'lwcre'
esmvalcore._recipe:1181 Creating diagnostic task lwcre/latlon
esmvalcore._recipe:1181 Creating diagnostic task lwcre/grading
esmvalcore._recipe:1165 Creating preprocessor task swcre/swcre
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1' task for variable 'swcre'
esmvalcore._recipe:1181 Creating diagnostic task swcre/latlon
esmvalcore._recipe:1181 Creating diagnostic task swcre/grading
esmvalcore._recipe:1165 Creating preprocessor task od550aer/od550aer
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV2thr10' task for variable 'od550aer'
esmvalcore._recipe:1181 Creating diagnostic task od550aer/grading
esmvalcore._recipe:1165 Creating preprocessor task od870aer/od870aer
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1thr10' task for variable 'od870aer'
esmvalcore._recipe:1181 Creating diagnostic task od870aer/grading
esmvalcore._recipe:1165 Creating preprocessor task abs550aer/abs550aer
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1thr10' task for variable 'abs550aer'
esmvalcore._recipe:1181 Creating diagnostic task abs550aer/grading
esmvalcore._recipe:1165 Creating preprocessor task od550lt1aer/od550lt1aer
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV1thr10' task for variable 'od550lt1aer
esmvalcore._recipe:1181 Creating diagnostic task od550lt1aer/grading
esmvalcore._recipe:1165 Creating preprocessor task toz/toz
esmvalcore._recipe:779 Creating preprocessor 'ppNOLEV2thr10' task for variable 'toz'
esmvalcore._recipe:1181 Creating diagnostic task toz/grading_global
esmvalcore._recipe:1181 Creating diagnostic task toz/grading_shpolar
esmvalcore._recipe:1181 Creating diagnostic task collect/RMSD
esmvalcore._recipe:1181 Creating diagnostic task collect/taylor
esmvalcore._task:228 Starting task sm/sm_derive_input_sm in process [39054]
esmvalcore._task:228 Starting task sm/sm_derive_input_mrsos in process [39053]
esmvalcore._task:228 Starting task ta200/ta in process [39057]
esmvalcore._task:228 Starting task taZONAL/ta in process [39055]
esmvalcore._task:228 Starting task ta850/ta in process [39056]
esmvalcore._task:228 Starting task ta30/ta in process [39058]
esmvalcore._task:228 Starting task ua850/ua in process [39060]
esmvalcore._task:228 Starting task ta5/ta in process [39059]
esmvalcore._task:228 Starting task ua200/ua in process [39061]
esmvalcore._task:228 Starting task va850/va in process [39062]
esmvalcore._task:228 Starting task va200/va in process [39063]
esmvalcore._task:228 Starting task zg500/zg in process [39064]
esmvalcore._task:228 Starting task tas/tas in process [39066]
esmvalcore._task:228 Starting task hus400/hus in process [39065]
esmvalcore._task:228 Starting task ts/ts in process [39067]
esmvalcore._task:228 Starting task pr/pr in process [39068]
esmvalcore._task:232 Successfully completed task sm/sm_derive_input_sm in 0:00:05.636452
esmvalcore._task:228 Starting task clt/clt in process [39054]
esmvalcore._task:232 Successfully completed task ta200/ta in 0:04:04.544591
esmvalcore._task:232 Successfully completed task ta30/ta in 0:04:08.847918
esmvalcore._task:228 Starting task ta200/cycle in process [39057]
esmvalcore._task:232 Successfully completed task ta850/ta in 0:04:16.483031
esmvalcore._task:232 Successfully completed task ta5/ta in 0:04:17.468716
esmvalcore._task:228 Starting task ta200/grading in process [39058]
esmvalcore._task:228 Starting task ta850/cycle in process [39056]
esmvalcore._task:228 Starting task ta850/grading in process [39059]
esmvalcore._task:232 Successfully completed task va200/va in 0:04:58.335266
esmvalcore._task:232 Successfully completed task sm/sm_derive_input_mrsos in 0:04:58.635926
esmvalcore._task:228 Starting task sm/sm in process [39053]
esmvalcore._task:565 Collecting provenance of task ta200/cycle took 4.9 seconds
esmvalcore._task:232 Successfully completed task ta200/cycle in 0:00:48.758161
esmvalcore._task:232 Successfully completed task ua850/ua in 0:05:02.055424
esmvalcore._task:232 Successfully completed task va850/va in 0:05:04.358697
esmvalcore._task:565 Collecting provenance of task ta850/cycle took 3.1 seconds
esmvalcore._task:232 Successfully completed task ta850/cycle in 0:00:39.646499
esmvalcore._task:228 Starting task ta5/cycle in process [39057]
esmvalcore._task:228 Starting task ta30/cycle in process [39063]
esmvalcore._task:232 Successfully completed task ua200/ua in 0:05:11.792592
esmvalcore._task:228 Starting task ua850/grading in process [39060]
esmvalcore._task:228 Starting task ua200/grading in process [39062]
esmvalcore._task:228 Starting task va850/grading in process [39056]
esmvalcore._task:228 Starting task va200/grading in process [39061]
esmvalcore._task:565 Collecting provenance of task ta200/grading took 22.0 seconds
esmvalcore._task:232 Successfully completed task ta200/grading in 0:01:13.152322
esmvalcore._task:228 Starting task rlut/rlut in process [39058]
esmvalcore._task:565 Collecting provenance of task ta5/cycle took 1.4 seconds
esmvalcore._task:232 Successfully completed task ta5/cycle in 0:00:36.916506
esmvalcore._task:228 Starting task rsut/rsut in process [39057]
esmvalcore._task:232 Successfully completed task hus400/hus in 0:05:48.245311
esmvalcore._task:565 Collecting provenance of task ta30/cycle took 3.1 seconds
esmvalcore._task:232 Successfully completed task ta30/cycle in 0:00:41.679714
esmvalcore._task:228 Starting task hus400/grading in process [39065]
esmvalcore._task:565 Collecting provenance of task ta850/grading took 19.4 seconds
esmvalcore._task:232 Successfully completed task ta850/grading in 0:01:26.148020
esmvalcore._task:228 Starting task lwcre/lwcre_derive_input_rlut in process [39063]
esmvalcore._task:228 Starting task lwcre/lwcre_derive_input_rlutcs in process [39059]
esmvalcore._task:565 Collecting provenance of task ua850/grading took 18.9 seconds
esmvalcore._task:232 Successfully completed task ua850/grading in 0:01:09.082623
esmvalcore._task:565 Collecting provenance of task hus400/grading took 12.7 seconds
esmvalcore._task:232 Successfully completed task hus400/grading in 0:00:38.767516
esmvalcore._task:228 Starting task swcre/swcre_derive_input_rsutcs in process [39060]
esmvalcore._task:565 Collecting provenance of task va850/grading took 18.5 seconds
esmvalcore._task:232 Successfully completed task va850/grading in 0:01:08.480406
esmvalcore._task:228 Starting task swcre/swcre_derive_input_rsut in process [39065]
esmvalcore._task:565 Collecting provenance of task va200/grading took 18.3 seconds
esmvalcore._task:232 Successfully completed task va200/grading in 0:01:08.271151
esmvalcore._task:565 Collecting provenance of task ua200/grading took 22.1 seconds
esmvalcore._task:232 Successfully completed task ua200/grading in 0:01:13.242600
esmvalcore._task:228 Starting task od550aer/od550aer in process [39056]
esmvalcore._task:228 Starting task od870aer/od870aer in process [39061]
esmvalcore._task:228 Starting task abs550aer/abs550aer in process [39062]
esmvalcore._task:232 Successfully completed task zg500/zg in 0:06:51.626924
esmvalcore._task:228 Starting task zg500/grading in process [39064]
esmvalcore._task:232 Successfully completed task pr/pr in 0:07:18.379317
esmvalcore._task:228 Starting task pr/grading in process [39068]
esmvalcore._task:232 Successfully completed task clt/clt in 0:07:28.813079
esmvalcore._task:228 Starting task clt/latlon in process [39054]
esmvalcore._task:232 Successfully completed task sm/sm in 0:02:44.742266
esmvalcore._task:232 Successfully completed task ts/ts in 0:07:49.006200
esmvalcore._task:565 Collecting provenance of task pr/grading took 11.5 seconds
esmvalcore._task:232 Successfully completed task pr/grading in 0:00:23.078429
esmvalcore._task:228 Starting task sm/grading in process [39053]
esmvalcore._task:228 Starting task ts/grading in process [39067]
esmvalcore._task:228 Starting task clt/grading in process [39068]
esmvalcore._task:232 Successfully completed task tas/tas in 0:08:08.162402
esmvalcore._task:228 Starting task tas/latlon in process [39066]
esmvalcore._task:565 Collecting provenance of task zg500/grading took 34.0 seconds
esmvalcore._task:232 Successfully completed task zg500/grading in 0:01:23.038467
esmvalcore._task:232 Successfully completed task abs550aer/abs550aer in 0:01:41.828397
esmvalcore._task:228 Starting task abs550aer/grading in process [39062]
esmvalcore._task:228 Starting task tas/grading in process [39064]
esmvalcore._task:232 Successfully completed task od870aer/od870aer in 0:01:53.178152
esmvalcore._task:228 Starting task od870aer/grading in process [39061]
esmvalcore._task:565 Collecting provenance of task ts/grading took 14.7 seconds
esmvalcore._task:232 Successfully completed task ts/grading in 0:00:43.950529
esmvalcore._task:228 Starting task od550lt1aer/od550lt1aer in process [39067]
esmvalcore._task:565 Collecting provenance of task abs550aer/grading took 10.8 seconds
esmvalcore._task:232 Successfully completed task abs550aer/grading in 0:00:20.176235
esmvalcore._task:228 Starting task toz/toz_derive_input_tro3 in process [39062]
esmvalcore._task:565 Collecting provenance of task od870aer/grading took 8.7 seconds
esmvalcore._task:232 Successfully completed task od870aer/grading in 0:00:18.127593
esmvalcore._task:228 Starting task toz/toz_derive_input_toz in process [39061]
esmvalcore._task:232 Successfully completed task toz/toz_derive_input_toz in 0:00:05.703561
esmvalcore._task:228 Starting task toz/toz_derive_input_ps in process [39061]
esmvalcore._task:232 Successfully completed task toz/toz_derive_input_ps in 0:00:26.008192
esmvalcore._task:232 Successfully completed task toz/toz_derive_input_tro3 in 0:00:39.86671
esmvalcore._task:228 Starting task toz/toz in process [39061]
esmvalcore._task:565 Collecting provenance of task tas/grading took 20.8 seconds
esmvalcore._task:232 Successfully completed task tas/grading in 0:01:11.869811
esmvalcore._task:232 Successfully completed task od550aer/od550aer in 0:03:08.528291
esmvalcore._task:228 Starting task od550aer/grading in process [39062]
esmvalcore._task:565 Collecting provenance of task clt/grading took 40.2 seconds
esmvalcore._task:232 Successfully completed task clt/grading in 0:02:08.791384
esmvalcore._task:232 Successfully completed task od550lt1aer/od550lt1aer in 0:01:29.131070
esmvalcore._task:228 Starting task od550lt1aer/grading in process [39064]
esmvalcore._task:565 Collecting provenance of task od550aer/grading took 5.4 seconds
esmvalcore._task:232 Successfully completed task od550aer/grading in 0:00:20.038011
esmvalcore._task:565 Collecting provenance of task tas/latlon took 24.4 seconds
esmvalcore._task:232 Successfully completed task tas/latlon in 0:02:02.380069
esmvalcore._task:565 Collecting provenance of task od550lt1aer/grading took 7.4 seconds
esmvalcore._task:232 Successfully completed task od550lt1aer/grading in 0:00:15.829468
esmvalcore._task:232 Successfully completed task toz/toz in 0:01:00.576568
esmvalcore._task:228 Starting task toz/grading_shpolar in process [39067]
esmvalcore._task:228 Starting task toz/grading_global in process [39056]
esmvalcore._task:565 Collecting provenance of task toz/grading_shpolar took 0.9 seconds
esmvalcore._task:232 Successfully completed task toz/grading_shpolar in 0:00:05.120651
esmvalcore._task:565 Collecting provenance of task toz/grading_global took 1.8 seconds
esmvalcore._task:232 Successfully completed task toz/grading_global in 0:00:08.139052
esmvalcore._task:232 Successfully completed task lwcre/lwcre_derive_input_rlut in 0:05:11.8
esmvalcore._task:232 Successfully completed task rlut/rlut in 0:05:52.664351
esmvalcore._task:232 Successfully completed task rsut/rsut in 0:05:46.073557
esmvalcore._task:228 Starting task rlut/grading in process [39068]
esmvalcore._task:228 Starting task rsut/grading in process [39062]
esmvalcore._task:232 Successfully completed task swcre/swcre_derive_input_rsut in 0:05:21.5
esmvalcore._task:565 Collecting provenance of task rsut/grading took 10.8 seconds
esmvalcore._task:232 Successfully completed task rsut/grading in 0:00:34.957410
esmvalcore._task:565 Collecting provenance of task rlut/grading took 12.6 seconds
esmvalcore._task:232 Successfully completed task rlut/grading in 0:00:38.731840
esmvalcore._task:232 Successfully completed task lwcre/lwcre_derive_input_rlutcs in 0:06:29
esmvalcore._task:228 Starting task lwcre/lwcre in process [39066]
esmvalcore._task:232 Successfully completed task swcre/swcre_derive_input_rsutcs in 0:05:59
esmvalcore._task:228 Starting task swcre/swcre in process [39064]
esmvalcore._task:232 Successfully completed task lwcre/lwcre in 0:00:41.841030
esmvalcore._task:232 Successfully completed task swcre/swcre in 0:00:37.604242
esmvalcore._task:228 Starting task lwcre/latlon in process [39061]
esmvalcore._task:232 Successfully completed task taZONAL/ta in 0:13:33.553678
esmvalcore._task:228 Starting task lwcre/grading in process [39067]
esmvalcore._task:228 Starting task swcre/latlon in process [39056]
esmvalcore._task:228 Starting task swcre/grading in process [39063]
esmvalcore._task:228 Starting task taZONAL/zonal in process [39058]
esmvalcore._task:565 Collecting provenance of task lwcre/grading took 18.6 seconds
esmvalcore._task:232 Successfully completed task lwcre/grading in 0:00:42.642806
esmvalcore._task:565 Collecting provenance of task swcre/grading took 17.8 seconds
esmvalcore._task:232 Successfully completed task swcre/grading in 0:00:40.752216
esmvalcore._task:565 Collecting provenance of task lwcre/latlon took 27.2 seconds
esmvalcore._task:232 Successfully completed task lwcre/latlon in 0:01:16.173953
esmvalcore._task:565 Collecting provenance of task swcre/latlon took 23.9 seconds
esmvalcore._task:232 Successfully completed task swcre/latlon in 0:01:29.538106
esmvalcore._task:565 Collecting provenance of task taZONAL/zonal took 17.4 seconds
esmvalcore._task:232 Successfully completed task taZONAL/zonal in 0:02:08.590593
esmvalcore._task:565 Collecting provenance of task clt/latlon took 16.7 seconds
esmvalcore._task:232 Successfully completed task clt/latlon in 0:08:11.166146
esmvalcore._task:565 Collecting provenance of task sm/grading took 766.7 seconds
esmvalcore._task:232 Successfully completed task sm/grading in 0:15:56.372579
esmvalcore._task:228 Starting task collect/RMSD in process [39057]
esmvalcore._task:565 Collecting provenance of task collect/RMSD took 0.5 seconds
esmvalcore._task:232 Successfully completed task collect/RMSD in 0:00:04.861220
esmvalcore._task:228 Starting task collect/taylor in process [39065]
esmvalcore._task:565 Collecting provenance of task collect/taylor took 0.3 seconds
esmvalcore._task:232 Successfully completed task collect/taylor in 0:00:02.592613

This is a nice example, since sm/grading starts relatively late and the runtime is longer than average (27m35s).

@bouweandela
Copy link
Member Author

bouweandela commented Sep 18, 2019

I think they are simply waiting for ancestors:

In that case, I don't think there is any way in which changing the order could help any further with reducing runtime. Or do you have ideas?

@mattiarighi
Copy link
Contributor

Or do you have ideas?

No, but I wonder why we don't get the 68 tasks of this recipe always executed in the same order.

This is the starting order of the tasks in the 5 tests I've run with recipe_perfmetrics_CMIP5.yml.

task1.txt
task2.txt
task3.txt
task4.txt
task5.txt

The first 16 in the list are basically always the same (since I run with 16 parallel tasks). But afterwards the order changes quite randomly. For example, the bottleneck task sm/grading is at position 44, 48, 44, 52, and 43 in the five runs, which affects the total runtime.

Maybe I should try to look at the other tasks as well and try to optimize the order further.

Should we mention this new feature somewhere in the documentation?

@bouweandela
Copy link
Member Author

The first 16 in the list are basically always the same (since I run with 16 parallel tasks).

That is to be expected, since they will be the highest priority tasks without ancestors, this is always the same

But afterwards the order changes quite randomly. For example, the bottleneck task sm/grading is at position 44, 48, 44, 52, and 43 in the five runs, which affects the total runtime.

This is probably due to variation in the run-time of the ancestor tasks. This variation can be caused by waiting for a shared resource. This could be lots of things, it could be other preprocessing tasks if you're running too many tasks on the same node, or it could be access to storage affected by other user's demands...

Maybe I should try to look at the other tasks as well and try to optimize the order further.

Yes, that might help, but it really depends on what is causing the variation.

Should we mention this new feature somewhere in the documentation?

Good point, I'll add documentation to this pull request.

@mattiarighi
Copy link
Contributor

Thanks for clarifying!
Let's merge this once documentation is done.
I will then open a PR in ESMValTool with the optimized recipe_perfmetrics_CMIP5.yml.

@bouweandela
Copy link
Member Author

@mattiarighi I ended up adding a lot more documentation than planned, because it looked like the concept of tasks was not explained anywhere, so it was difficult to talk about their priority.

@mattiarighi
Copy link
Contributor

I ended up adding a lot more documentation than planned, because it looked like the concept of tasks was not explained anywhere, so it was difficult to talk about their priority.

Well done, thanks!
Just a few minor suggestions, feel free to ignore them if you don't agree.

@bouweandela
Copy link
Member Author

Just a few minor suggestions, feel free to ignore them if you don't agree.

Changed the text accordingly

@mattiarighi mattiarighi merged commit c754a4e into development Sep 23, 2019
@mattiarighi mattiarighi deleted the task_priority branch September 23, 2019 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prioritize task execution by recipe order

2 participants