Summary
The attributes.queue_name field in the PSI/J JobSpec is ignored by the NERSC adapter. Regardless of the value provided (gpu_regular, regular, shared, etc.), all jobs are submitted with Slurm QOS gpu_debug, which has a 30-minute wall time limit.
This effectively caps all GPU jobs submitted through the API to 30 minutes, even though the user's account and the requested QOS allow much longer wall times.
Evidence
Job submitted via the API with queue_name: "gpu_regular":
```
$ scontrol show job 49622634 | grep QOS
   Priority=69119 Nice=0 Account=m3792_g QOS=gpu_debug
```
Actual Slurm QOS limits on Perlmutter (from sacctmgr show qos):
| QOS         | MaxWall    |
|-------------|------------|
| gpu_debug   | 00:30:00   |
| gpu_regular | 2-00:00:00 |
| gpu_shared  | 2-00:00:00 |
The user's account (m3792_g) has no per-association MaxWall limit (sacctmgr show assoc shows empty MaxWall), so the 30-min cap comes entirely from the API forcing gpu_debug.
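The QOS limits above can also be pulled in a machine-parsable form. A minimal sketch; the `sample` data below is a transcription of the table above, and the commented `sacctmgr` invocation is the kind of live query that would produce equivalent pipe-delimited output on Perlmutter:

```shell
# On Perlmutter, a live query would look like:
#   sacctmgr -nP show qos gpu_debug,gpu_regular,gpu_shared format=Name,MaxWall
# Sample output transcribed from the table above:
sample='gpu_debug|00:30:00
gpu_regular|2-00:00:00
gpu_shared|2-00:00:00'
# Print each QOS alongside its MaxWall limit.
echo "$sample" | awk -F'|' '{ print $1, $2 }'
```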
Reproduction
```
POST /api/v1/compute/job/6d00f875-dfc1-4a41-9309-456c5f2048df
{
  "executable": "/path/to/script.sh",
  "resources": {"node_count": 1, "gpu_cores_per_process": 4},
  "attributes": {
    "queue_name": "gpu_regular",
    "account": "m3792_g",
    "duration": 5400
  }
}
```
Expected: Job submitted with sbatch -q gpu_regular -A m3792_g -t 01:30:00
Actual: Job submitted with QOS=gpu_debug, fails with QOSMaxWallDurationPerJobLimit for any duration > 1800s
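The 1800 s boundary follows directly from the gpu_debug MaxWall. A small sketch of the seconds-to-Slurm-walltime conversion involved; the `to_slurm_time` helper is hypothetical, for illustration only, and is not part of the adapter:

```shell
# Hypothetical helper: convert a PSI/J duration (seconds) to Slurm's HH:MM:SS.
to_slurm_time() {
  local s=$1
  printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}
to_slurm_time 5400   # 01:30:00 -> exceeds gpu_debug's 00:30:00 MaxWall, so the job is rejected
to_slurm_time 1800   # 00:30:00 -> the effective cap under the forced gpu_debug QOS
```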
Workaround
Limit wall time to 1800s (30 min). For longer GPU jobs, submit directly via sbatch on Perlmutter, bypassing the API.
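For the direct-sbatch workaround, a batch script along these lines should work. This is a sketch mirroring the reproduction request above (1 node, 4 GPUs, 90-minute wall time); the `-C gpu` constraint and the script path are assumptions, not a verified submission:

```shell
#!/bin/bash
# Direct Slurm submission on Perlmutter, bypassing the API.
# Assumed values mirror the reproduction request above.
#SBATCH -q gpu_regular
#SBATCH -A m3792_g
#SBATCH -C gpu
#SBATCH -N 1
#SBATCH --gpus-per-node=4
#SBATCH -t 01:30:00

srun /path/to/script.sh
```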
Environment
- NERSC Perlmutter
- API: https://api.iri.nersc.gov/api/v1
- Date: 2026-03-03