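We are still running into port collisions because not all Ray ports are fixed; this is more apparent on runs with more nodes. In each failure below, a randomly assigned component port (dashboard_agent_grpc or metrics_export) landed inside the worker port range of 53001 to 53257. Here are some outputs from the current Slurm script: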
#ValueError: Ray component worker_ports is trying to use a port number 53058 that is used by other components.
#Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 53058, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 64678, 'metrics_export': 54151, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
# ValueError: Ray component worker_ports is trying to use a port number 53156 that is used by other components.
# Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 64894, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 35083, 'metrics_export': 53156, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
#ValueError: Ray component worker_ports is trying to use a port number 53225 that is used by other components.
#Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 60967, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 55382, 'metrics_export': 53225, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
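
To make the pattern in these errors easier to check, here is a minimal sketch that parses a "Port information" line and lists the components whose port falls inside the worker port range. The function name `find_worker_range_collisions` and the trimmed sample string are illustrative only (not part of Ray or the Slurm script), and the sketch assumes the log keeps the dict format shown above.

```python
import ast
import re

# Sample taken from the first error above, trimmed to the keys with integer ports.
SAMPLE_LINE = (
    "#Port information: {'client_server': 10001, 'dashboard': 8265, "
    "'dashboard_agent_grpc': 53058, 'dashboard_agent_http': 52365, "
    "'runtime_env_agent': 64678, 'metrics_export': 54151, "
    "'worker_ports': '257 ports from 53001 to 53257'}"
)


def find_worker_range_collisions(log_line):
    """Return {component: port} for ports that fall inside the worker range."""
    # Keep only the dict literal; the '#Port information:' prefix is dropped.
    info = ast.literal_eval(log_line[log_line.index("{"):])

    # The worker range is reported as text, e.g. '257 ports from 53001 to 53257'.
    match = re.search(r"from (\d+) to (\d+)", info["worker_ports"])
    if match is None:
        raise ValueError("worker_ports entry is not in the expected format")
    low, high = int(match.group(1)), int(match.group(2))

    # Entries reported as 'random' are strings and are skipped here.
    return {
        name: port
        for name, port in info.items()
        if isinstance(port, int) and low <= port <= high
    }


if __name__ == "__main__":
    print(find_worker_range_collisions(SAMPLE_LINE))
```

Running the same check on the other two errors points at metrics_export (53156 and 53225). Assuming the Slurm script starts nodes with `ray start`, one way to avoid the overlap would be to pin these components with flags such as `--metrics-export-port` and `--dashboard-agent-grpc-port` to values outside the range set by `--min-worker-port` and `--max-worker-port`.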