From dae4643f2c27482b5b2ecfb4c4e6396f9f0b73d6 Mon Sep 17 00:00:00 2001
From: Terry Kong
Date: Sun, 23 Mar 2025 14:28:58 -0700
Subject: [PATCH] fix: ray.sub race condition when overlapping srun commands on same node

trying a different approach

Signed-off-by: Terry Kong
---
 ray.sub | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/ray.sub b/ray.sub
index a600d96063..7258aec045 100644
--- a/ray.sub
+++ b/ray.sub
@@ -59,7 +59,11 @@ ip_head=$head_node_ip:$port
 # First we start the head of the ray cluster on one of the physical nodes
 # Set GPU/CPU resources to 0 to avoid scheduling on the head node
+ head_cmd=$(cat <