Description
Hello, and thank you for providing these profiles. I have been using the slurm one for years, and it has enabled all of our pipelines to run really well.
This past week I noticed an issue that I am unsure how to diagnose. According to the error log, jobs are submitted to the cluster as expected and singularity is activated (Activating singularity image...), and then the job fails with an error. There is no other information in the err/out files to indicate what happened. I have contacted our HPC admins; they confirmed that singularity was working on the given node and cannot see any particular reason on their end for the failure. If I allow restarts in the profile config (restart-times: 2), the job is resubmitted and seems to work, although it sometimes requires more than one restart.
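For reference, the restart workaround is a single line in the profile config. This is a minimal sketch, not my full config: only the `restart-times` value comes from my setup, and the `config.yaml` file name in the comment is an assumption about where the profile keeps its options.

```yaml
# config.yaml of the slurm profile (file name assumed for illustration)
# Resubmit a failed job up to 2 times before reporting it as failed.
restart-times: 2
```

As far as I understand, this should be equivalent to passing `--restart-times 2` on the snakemake command line.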
Have you seen this before? Is it likely to be a server-side problem rather than an issue with the profile, such as the submission losing track of the job?
Thanks for your time,
Jonah.