Open
Submissions stuck in "Scoring" status
- Submissions stuck in "Scoring" state instead of "Failed" when the compute worker crashes
The comment here gives a useful explanation of why this problem happens and how to solve it:
Related issues:
- submission in "Scoring" status for multiple hours on default queue #1184
- Similarly, the status appears to get stuck at "Preparing" when a failure occurs during that phase.
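A minimal sketch of the behaviour these reports ask for, assuming a hypothetical `set_status` callback that pushes the status to the server (this is not the actual compute_worker.py API). Every phase runs inside a single try/except, so any unhandled exception, including an `OSError(28, 'No space left on device')` like the one in the log below, moves the submission to "Failed" instead of leaving it in "Preparing" or "Scoring". The phase names and the final "Finished" status are assumptions for illustration.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("compute_worker")


def run_submission(
    steps: Dict[str, Callable[[], None]],
    set_status: Callable[[str], None],
) -> str:
    """Run each phase in insertion order and always end in a terminal status.

    `steps` maps a status name (e.g. "Preparing", "Scoring") to the callable
    doing that phase's work; `set_status` is a hypothetical hook that pushes
    the new status to the server.
    """
    current = None
    try:
        for phase, step in steps.items():
            current = phase
            set_status(phase)  # report the phase before starting its work
            step()
    except Exception:
        # Log the traceback and the phase it happened in, then make sure the
        # submission is not left in an intermediate status.
        logger.exception("Submission failed during %s", current)
        set_status("Failed")
        return "Failed"
    set_status("Finished")
    return "Finished"
```

This only covers exceptions raised inside the task. If the worker process itself is killed (for example by the OOM killer), the except branch never runs, so a server-side timeout on submissions stuck in "Preparing" or "Scoring" would likely still be needed.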
Example failure during "Preparing":
[2025-09-18 11:25:05,234: ERROR/ForkPoolWorker-2] Task compute_worker_run[fd956bf5-3e2d-4168-ab48-f0896dc80993] raised unexpected: OSError(28, 'No space left on device')
Traceback (most recent call last):
[...]
OSError: [Errno 28] No space left on device
Duplication of submission files
Original issue:
Directory structure problem
Docker pull failing
- Docker pull fails with the following error:
Pull for image: codalab/codalab-legacy:py39 returned a non-zero exit code! Check if the docker image exists on docker hub.
Related issues:
- submission in "Scoring" status for multiple hours on default queue #1184
- Solution is always running #1278
- Submission stuck on scoring status (Twice) #1263
Solution:
- To get more logs, we need to update compute_worker.py so that it writes more information to the logger (More logs when docker pull fails in compute_worker.py #1283).
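A sketch of the extra logging proposed in #1283, assuming the worker shells out to the `docker` CLI (the real compute_worker.py may use a Docker SDK instead). It captures the exit code, stdout and stderr of the pull and logs all of them when the pull fails. `pull_image` and its `cmd` override are hypothetical; `cmd` exists only so the function can be exercised without Docker installed.

```python
import logging
import subprocess
from typing import List, Optional

logger = logging.getLogger("compute_worker")


def pull_image(image: str, cmd: Optional[List[str]] = None) -> bool:
    """Pull `image`; on failure, log everything the command printed."""
    args = cmd if cmd is not None else ["docker", "pull", image]
    try:
        result = subprocess.run(args, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable itself is missing (e.g. docker not installed).
        logger.error("Could not run %r: executable not found", args[0])
        return False
    if result.returncode != 0:
        # Keep the existing message but add the exit code and full output,
        # so "check if the image exists" is not the only hint available.
        logger.error(
            "Pull for image: %s returned a non-zero exit code (%d)!\n"
            "stdout:\n%s\nstderr:\n%s",
            image, result.returncode, result.stdout, result.stderr,
        )
        return False
    logger.info("Pulled image %s", image)
    return True
```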
Logs in the wrong place
- Docker pull errors during scoring are written to the ingestion stderr instead of the scoring stderr (Docker pull in scoring #1204)
Solved by: Show error in scoring std_err #1214
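One way to avoid this kind of misrouting is to pass the current phase explicitly wherever an error is written, rather than assuming ingestion. The sketch below assumes a hypothetical in-memory `PhaseLogs` store keyed by phase and stream, plus a hypothetical `report_pull_failure` helper; the actual fix in #1214 may be structured differently.

```python
from collections import defaultdict
from typing import DefaultDict, List


class PhaseLogs:
    """Hypothetical per-phase log store: one stdout/stderr pair per phase."""

    def __init__(self) -> None:
        self._streams: DefaultDict[str, List[str]] = defaultdict(list)

    def write(self, phase: str, stream: str, message: str) -> None:
        # Key on (phase, stream) so a scoring error never lands in ingestion stderr.
        self._streams[f"{phase}_{stream}"].append(message)

    def read(self, phase: str, stream: str) -> str:
        # .get() avoids creating empty entries for phases that logged nothing.
        return "\n".join(self._streams.get(f"{phase}_{stream}", []))


def report_pull_failure(logs: PhaseLogs, phase: str, image: str, returncode: int) -> None:
    """Write the pull error to the stderr of the phase that triggered the pull."""
    logs.write(
        phase, "stderr",
        f"Pull for image: {image} returned a non-zero exit code ({returncode})!",
    )
```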
No hostname in server status when status is "Preparing"
- The "Preparing" status means that the worker is downloading the data and programs needed to run the submission. The hostname should already appear on the server status page during this phase, but it does not.
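A minimal sketch of the expected behaviour, assuming a hypothetical `update_server` callback and payload shape: the worker sends its hostname (from Python's standard `socket.gethostname()`) in the same update that sets the status to "Preparing", so the server status page can show it before any download starts.

```python
import socket
from typing import Callable, Dict


def start_preparing(update_server: Callable[[Dict[str, str]], None]) -> Dict[str, str]:
    """Report "Preparing" together with this worker's hostname.

    Called before any dataset or program download begins, so the hostname is
    known to the server for the whole "Preparing" phase.
    """
    payload = {"status": "Preparing", "hostname": socket.gethostname()}
    update_server(payload)
    return payload
```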