Slurm utils design #107

# Discussion with @markfuge and @g-braeunlich 

## Main use cases
- Dataset generation: call `optimize` or `simulate` over many configurations. This currently uses the config and design factories to create a `parameter_space` using `slurm.Args`.
- Starting gradient descent from different initial points: only the `starting_point` of the optimization changes; all other parameters stay the same.
- Running things other than `optimize` or `simulate`: render or even custom code could run inside the job (e.g., using a callback), with MapReduce-like logic.
- Grouping small runs into a bigger job to reduce Slurm scheduling overhead. This might be useful for ML model evaluations.
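
The grouping and MapReduce-like ideas above could be sketched roughly as follows. This is only an illustration under assumptions: `chunked`, `run_chunk`, `reduce_results`, and the toy `parameter_space` are hypothetical names, not the existing API, and the callback stands in for `optimize`, `simulate`, or custom code.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence


def chunked(items: Sequence[Any], chunk_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start : start + chunk_size]


def run_chunk(callback: Callable[[Any], Any], chunk: Iterable[Any]) -> list[Any]:
    """Map step: what a single Slurm job would do with its share of the runs."""
    return [callback(args) for args in chunk]


def reduce_results(per_job_results: Iterable[list[Any]]) -> list[Any]:
    """Reduce step: flatten the per-job result lists, preserving order."""
    return [result for job in per_job_results for result in job]


# Hypothetical parameter space: only the starting point varies between runs.
parameter_space = [{"starting_point": float(i), "steps": 100} for i in range(10)]

# 10 runs grouped 4 per job give 3 jobs instead of 10.
jobs = list(chunked(parameter_space, chunk_size=4))

# The lambda is a placeholder for the real work done inside each job.
results = reduce_results(
    run_chunk(lambda args: args["starting_point"] ** 2, chunk) for chunk in jobs
)
```

Here everything runs in one process; in a real setup each chunk would be handed to its own Slurm job and the reduce step would run once all of them have finished.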

## Additional features
- Specify a per-job time limit for Slurm so that runs that take too long (bad simulations) are cut off. This timeout is job-specific and Euler-specific.
- Failure tracking: how do we check which run failed, and how do we trace it back to the config/parameters/args that triggered it? What we want for each job: job ID, failure reason (error or timeout), and the args used to run it.
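
One possible shape for these two features, as a sketch only: `RunStatus`, `RunRecord`, `failed_runs`, and `slurm_time_limit` are hypothetical names introduced for illustration. The only Slurm-specific assumption is that `sbatch --time` accepts the `days-hours:minutes:seconds` format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class RunStatus(Enum):
    COMPLETED = "completed"
    ERROR = "error"
    TIMEOUT = "timeout"


@dataclass
class RunRecord:
    """What we want to know about each run: job id, outcome, and its args."""

    job_id: str
    status: RunStatus
    args: dict[str, Any]
    message: str = ""


def failed_runs(records: list[RunRecord]) -> list[RunRecord]:
    """Return the runs that ended in an error or a timeout."""
    return [r for r in records if r.status is not RunStatus.COMPLETED]


def slurm_time_limit(seconds: int) -> str:
    """Format a per-job time limit as D-HH:MM:SS for `sbatch --time`."""
    if seconds <= 0:
        raise ValueError("time limit must be positive")
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}-{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up example records; job ids use the <array_job>_<task> convention.
records = [
    RunRecord("123_0", RunStatus.COMPLETED, {"starting_point": 0.0}),
    RunRecord("123_1", RunStatus.TIMEOUT, {"starting_point": 1.0}),
    RunRecord("123_2", RunStatus.ERROR, {"starting_point": 2.0}, "solver diverged"),
]
to_rerun = [r.args for r in failed_runs(records)]
```

Keeping the args inside the record means a failed run can be resubmitted directly from `to_rerun` without going back to the original parameter space.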

## Found bugs
- Job array size limit on Euler: job arrays that exceed it get kicked out by Euler.
- The reduce node gets OOM-killed (workaround: ask for a bigger node for the reduce step).
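
A possible workaround for the array size limit is to split one large array into several smaller submissions. The sketch below assumes the limit is known and passed in as `max_array_size` (the actual value on Euler is not stated here) and that each sub-array restarts its indices at 0, since Slurm's `MaxArraySize` also bounds the largest task index. `split_job_array` is a hypothetical helper.

```python
def split_job_array(n_tasks: int, max_array_size: int) -> list[tuple[int, str]]:
    """Split `n_tasks` into (offset, array_spec) pairs, one per submission.

    `array_spec` is meant for `sbatch --array=<array_spec>`. Inside the job,
    the global task index is offset + SLURM_ARRAY_TASK_ID.
    """
    if n_tasks < 0:
        raise ValueError("n_tasks must be non-negative")
    if max_array_size < 1:
        raise ValueError("max_array_size must be at least 1")
    submissions = []
    for offset in range(0, n_tasks, max_array_size):
        size = min(max_array_size, n_tasks - offset)
        submissions.append((offset, f"0-{size - 1}"))
    return submissions


# Placeholder limit of 1000 tasks per array, chosen only for illustration.
submissions = split_job_array(n_tasks=2500, max_array_size=1000)
```

Each pair would turn into one `sbatch` call, with the offset passed to the job (for example as an environment variable or argument) so that it can recover which entry of the parameter space it should run.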


