This repository provides a comprehensive guide to using the Slurm Workload Manager for running and managing jobs on High-Performance Computing (HPC) platforms. It covers essential Slurm commands, advanced features, and includes example scripts tailored to various use cases.
- Basic Slurm Commands: Learn the foundational commands to submit, monitor, and manage jobs.
- Job Dependencies: Understand how to define dependencies between jobs for complex workflows.
- Job Arrays: Simplify large-scale, repetitive tasks with job arrays.
- Advanced Topics: Explore additional techniques to enhance job management efficiency.
The repository includes ready-to-use Slurm job scripts for different scenarios, such as:
- Simple batch job submission
- Running parallel jobs
- Submitting job arrays
- Submitting jobs with dependencies
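A minimal batch script for the first scenario might look like the following sketch. The job name, output pattern, and resource values are illustrative placeholders, not requirements of any particular cluster:

```shell
#!/bin/bash
#SBATCH --job-name=hello          # name shown by squeue
#SBATCH --output=hello_%j.out     # %j expands to the job ID
#SBATCH --ntasks=1                # a single task
#SBATCH --time=00:05:00           # 5-minute wall-time limit
#SBATCH --mem=1G                  # 1 GB of memory

# The body is ordinary shell; Slurm only reads the #SBATCH lines above.
MSG="Hello from $(hostname)"
echo "$MSG"
```

Submit it with `sbatch script.sh`; Slurm writes stdout to the file named by `--output`.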
sinfo: Display compute partition and node information
sbatch: Submit a job script for remote execution
srun: Launch parallel tasks (job steps) for MPI jobs
salloc: Allocate resources for an interactive job
squeue: Display status of jobs and job steps
sprio: Display job priority information
scancel: Cancel pending or running jobs
sstat: Display status information for running jobs
sacct: Display accounting information for past jobs
seff: Display job efficiency information for past jobs
scontrol: Display or modify Slurm configuration and state
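A typical submit-monitor-review session using these commands might look like the sketch below (it requires a live Slurm cluster; `job.sh` and the job ID `12345` are placeholders):

```shell
sbatch job.sh                 # submit; prints "Submitted batch job 12345"
squeue -u "$USER"             # show your pending and running jobs
scancel 12345                 # cancel the job if needed
sacct -j 12345 --format=JobID,State,Elapsed,MaxRSS   # accounting after it ends
```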
Add #SBATCH --dependency=<type> to the job script, or pass it on the command line:
sbatch --dependency=<type> script.sh
sbatch --dependency=afterok:job_id script.sh (start only if job_id completed successfully)
sbatch --dependency=afternotok:job_id script.sh (start only if job_id failed)
sbatch --dependency=afterany:job_id script.sh (start once job_id ends, regardless of exit state)
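Dependencies are commonly chained by capturing the first job's ID and feeding it to the next submission. A sketch of a two-step pipeline (requires a live Slurm cluster; `step1.sh` and `step2.sh` are placeholder scripts):

```shell
# --parsable makes sbatch print just the job ID, which is easy to capture.
jid=$(sbatch --parsable step1.sh)

# step2 is held in the queue until step1 finishes successfully.
sbatch --dependency=afterok:"$jid" step2.sh
```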
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily.
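A sketch of a job-array script: Slurm sets SLURM_ARRAY_TASK_ID for each task, which is commonly mapped to a per-task input file. The array range, file naming, and the fallback default (used only so the script also runs outside Slurm) are illustrative:

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --output=array-demo_%A_%a.out   # %A = array job ID, %a = task ID
#SBATCH --array=1-10                    # launch tasks with IDs 1..10
#SBATCH --time=00:10:00

# Each task reads its own ID and picks a distinct input file from it.
# (The :-1 default is only a fallback for running outside Slurm.)
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
INPUT="input_${TASK_ID}.txt"
echo "Task ${TASK_ID} processing ${INPUT}"
```

One sbatch call submits all ten tasks; Slurm schedules them independently.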
This tutorial was heavily inspired by the YouTube lecture:
Slurm Job Management
by the University of Southern California.