I thought I'd cross-post a bit from future.batchtools to here (see futureverse/future.batchtools#23 for the original request). We use SLURM, which, like many batch systems, supports "array jobs" -- basically two-level job hierarchies. These are implemented, in part, so batch systems don't get overwhelmed by people submitting thousands of jobs.
My understanding is that future isn't quite there yet when it comes to e.g. hierarchical looping. To move forward on this, there seems to be an easy fix: from future's point of view, the job x array structure is simply flattened. For example, if an array can hold at most 100 jobs and you have 250 things to loop through, you just distribute the iterations across 3 jobs (loop ids 1:100, 101:200, 201:250). I would think this "embarrassingly easy" way of handling array jobs could keep the current API and just require the backend settings to be set with e.g. array.jobs=TRUE and max.array.jobs=100.
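To make the flattening idea concrete, here is a minimal base-R sketch of the index arithmetic only. It does not touch future or batchtools; `chunk_indices()` and `process_one()` are hypothetical names made up for illustration, and `max_size` just stands in for the proposed (not existing) `max.array.jobs` setting.

```r
# Hypothetical helper: split the loop ids 1..n into consecutive chunks
# of at most `max_size` ids each (one chunk per array job).
chunk_indices <- function(n, max_size) {
  idx <- seq_len(n)
  # ceiling(idx / max_size) labels each id with its chunk number
  unname(split(idx, ceiling(idx / max_size)))
}

# Hypothetical stand-in for the real per-iteration work
# (e.g. processing one satellite image).
process_one <- function(i) i^2

chunks <- chunk_indices(250, 100)

# Each chunk would be submitted as one array job; here a plain lapply()
# stands in for the scheduler, and the inner sapply() for the array tasks.
results <- lapply(chunks, function(ids) sapply(ids, process_one))

# Flatten back so the caller sees one result per original loop id.
flat <- unlist(results)
```

With 250 iterations and a limit of 100, `chunks` holds the ids 1:100, 101:200 and 201:250, and `flat` has one entry per original iteration, so the two-level structure stays invisible to the calling code.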
A more complicated solution would be to begin supporting truly hierarchical looping structures, almost like a nested for loop (outer loop = job, inner loop = array). Of course this is more complicated conceptually.
In the meantime, we'd love to use future for a problem we're trying to solve (batch processing 1000s of satellite images) but we can't because our HPC limits the number of jobs we can run at once.