Sandbox timeouts several minutes into training, and "422 Unprocessable Entity" #9

@arcyleung

Description

When running run_skyrl_agent_oh7b_s1.sh, I see timeouts on the verl pipeline side, even after increasing remote_runtime_api_timeout from 10s to 30s.
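The issue does not show how the pipeline issues its remote-runtime calls. As a hedged sketch, assuming each call can be awaited as an asyncio coroutine, one way to tolerate occasional slow responses is to retry a timed-out call with exponential backoff instead of only raising the timeout. `call_with_retries` and `make_request` are hypothetical names, not SkyRL or OpenHands APIs.

```python
import asyncio
import random


async def call_with_retries(make_request, timeout_s=30.0, max_attempts=4, base_delay_s=1.0):
    """Await make_request() with a per-attempt timeout, retrying with backoff.

    make_request is a hypothetical zero-argument coroutine function standing in
    for a remote-runtime API call; it is not part of SkyRL or OpenHands.
    """
    for attempt in range(max_attempts):
        try:
            # wait_for cancels the pending call if it exceeds timeout_s.
            return await asyncio.wait_for(make_request(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Exponential backoff (base, 2*base, 4*base, ...) plus up to 10% jitter
            # so that many workers do not retry in lockstep.
            delay = base_delay_s * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, 0.1 * delay))
```

Retries only help if the slowness is transient; if the server is saturated, retrying adds load, so this would likely need to be paired with a cap on concurrent requests.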
Eventually, the SkyRL-OpenHands remote runtime server fails to create new containers and returns a 500 error. We have plenty of CPUs (224 cores) available for OpenHands, so I'm not sure what is causing this. Any suggestions would be appreciated.
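Core count alone does not rule out the server being overwhelmed by many simultaneous container-creation requests. A common mitigation is to cap how many creation requests are in flight at once. The sketch below assumes creation can be awaited as a coroutine; `SandboxCreationLimiter` and `create_fn` are hypothetical names, and a suitable cap would have to be found empirically.

```python
import asyncio


class SandboxCreationLimiter:
    """Caps how many sandbox/container creations are in flight at once.

    Hypothetical helper: create_fn stands in for whatever call asks the
    remote runtime server to start a container.
    """

    def __init__(self, max_concurrent: int):
        # Construct inside a running event loop (e.g. within an async main)
        # so the semaphore is bound to the loop that uses it.
        self._sem = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak_in_flight = 0  # highest concurrency observed, useful for logging

    async def create(self, create_fn, *args):
        # Requests beyond max_concurrent wait here instead of hitting the server.
        async with self._sem:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return await create_fn(*args)
            finally:
                self.in_flight -= 1
```

For example, inside an async function, `limiter = SandboxCreationLimiter(max_concurrent=8)` followed by `await asyncio.gather(*(limiter.create(create_fn, i) for i in range(64)))` would keep at most 8 creation requests outstanding while still launching all 64.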