Launcher won't run in parallel on different cluster #64

@rsbrennan

I'm having trouble using launcher on a different Slurm cluster. The same setup runs on TACC without problems.

After installing, I can successfully run simple jobs as long as they run sequentially rather than in parallel.

The problems start when I specify LAUNCHER_RMI=SLURM. When I try to run jobs in parallel, launcher hangs and repeatedly prints the error attached here: launcher_error.txt. The attachment shows only one instance of the error; it is repeated until the job times out.

The error stems from line 308 in the paramrun file, where launcher auto-retries the ssh submission of each job. The jobs are never submitted. This problem may be specific to the design of the cluster I'm using (at Michigan State University). I'm curious whether others have successfully used launcher elsewhere and whether there are any tips for getting things running.
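One way to tell a cluster ssh restriction apart from a launcher problem would be to check, from inside a job, whether non-interactive ssh to the allocated nodes works at all. The following is only a diagnostic sketch and not part of launcher. It assumes, as the error suggests, that paramrun starts tasks over ssh. The function name check_ssh_hosts and the SSH_CMD override are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Diagnostic sketch (not part of launcher): report whether non-interactive
# ssh works for each host given as an argument.
# SSH_CMD is a hypothetical override for the ssh binary, used for illustration.

check_ssh_hosts() {
    local ssh_cmd="${SSH_CMD:-ssh}"
    local failed=0
    local host
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # ConnectTimeout keeps an unreachable node from stalling the check.
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Inside a Slurm job, expand the compact node list and test every node.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    check_ssh_hosts $(scontrol show hostnames "$SLURM_JOB_NODELIST")
fi
```

If every node reports FAIL when this runs inside the batch script, the cluster probably does not permit key-based ssh between compute nodes, and the retry loop would be expected to spin until the job times out.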

This isn't an issue with my job scripts as they run fine on TACC.

The job file echoes hello world, and my launcher file is below:

#!/bin/bash

#SBATCH -J ustacks_launcher
#SBATCH --mem 250M
#SBATCH -n 10
#SBATCH -N 1
#SBATCH -o test_%j.out
#SBATCH -e test_%j.err
#SBATCH -t 00:10:00

#------------------------------------------------------

export LAUNCHER_DIR=~/launcher
export LAUNCHER_WORKDIR=`pwd`
export LAUNCHER_JOB_FILE=default_work_file
export LAUNCHER_RMI=SLURM

$LAUNCHER_DIR/paramrun
