
Conversation

@bbearce (Collaborator) commented Apr 18, 2024

Reviewers

@Didayolo, @ihsaan-ullah

Summary

This PR transitions us from pip to Poetry for dependency management. I only bumped the Python version to 3.9, and not even for all Dockerfiles; I ran into issues with Python 3.10, since too many packages would need to change at once. I think the only realistic way to do this is one package at a time, slowly bumping up the Python version over time. For now I really just switched us from pip to Poetry while keeping the same package versions, except for ipdb, which I upgraded slightly because Python 3.9 required it.
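For reference, the general Poetry-in-Docker pattern looks roughly like the sketch below. This is a simplified illustration, not the literal contents of the Dockerfiles in this PR, which may differ in base image, flags, and layering.

    FROM python:3.9
    # Install Poetry itself, then install the locked dependencies directly into the
    # container's Python environment (no separate virtualenv inside the image).
    RUN pip install poetry
    COPY pyproject.toml poetry.lock ./
    RUN poetry config virtualenvs.create false \
        && poetry install --no-interaction --no-ansi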

Dockerfiles updated:

  • Dockerfile
    docker build --no-cache -f Dockerfile -t codabench-django:latest ./
  • Dockerfile.compute_worker
    docker build --no-cache -f Dockerfile.compute_worker -t codabench-compute_worker:latest ./
  • Dockerfile.compute_worker_gpu
    docker build --no-cache -f Dockerfile.compute_worker_gpu -t codabench-compute_worker_gpu:latest ./
  • Dockerfile.flower
    docker build --no-cache -f Dockerfile.flower -t codabench-flower:latest ./

PS: Do we use Dockerfile.celery at all?

Issues this PR resolves

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCi tests are passing
  • Ready to merge

@Didayolo changed the title from "Issue 1413" to "Update Python using Poetry (Issue #1413)" on Apr 19, 2024
@Didayolo (Member)

Nice progress. This is a fundamental change that needs to be tested thoroughly before merging.

@ihsaan-ullah (Collaborator)

Nice work, @bbearce. I see that there is Poetry for the compute worker. Will this affect the setup of a compute worker?
https://github.com/codalab/codabench/wiki/Compute-Worker-Management---Setup

@Didayolo (Member)

Also, what is the file compute_worker/poetry.lock?

@Didayolo (Member) commented Jun 11, 2024

  • This currently runs on https://codabench-test.lri.fr/

  • We also need to test the compute workers

  • Building the GPU compute worker Dockerfile did NOT work

  • We are supposed to remove the requirements.txt files in this PR

@bbearce (Collaborator, Author) commented Jun 17, 2024

I was working on this over the weekend. A couple of questions:

  • I had to delete basically every Docker image to build the GPU one. Could we request maybe 50 GB total for this VM (157.136.249.152)?
  • The compute worker and the GPU worker both start from the same Python 3.9 base image, and I believe that is for consistency. However, installing the CUDA drivers and the NVIDIA toolkit is harder that way. The Dockerfile I'm currently working on builds to about 8 GB, but it starts from ubuntu:20.04. Can we start from Ubuntu, or better yet, from an nvidia/cuda image? Adding Python afterwards is easy, and I could even integrate pyenv, though I don't want to go too crazy.
    • The obvious con is that it wouldn't share a base image with the CPU compute worker, unless we switched that to ubuntu:20.04 as well (and just skipped the CUDA installation).

What do you think?

@ihsaan-ullah (Collaborator)

If you want to use a CUDA image, you can check this one, which we are using in another project:

https://github.com/FAIR-Universe/HEP-Challenge/blob/master/docker/Dockerfile

@bbearce (Collaborator, Author) commented Jun 17, 2024

Exactly. The more I think about it, this is great, but how closely should the GPU worker and the CPU worker match, Dockerfile-wise? My take is that you'd want them to be as similar as possible, and I'm worried that using an NVIDIA image for the CPU worker doesn't make sense, so the two would come from different bases. Does anyone else think having both workers start from Ubuntu is too low-level?

PS: I'm totally down to use an NVIDIA image for the GPU worker and a Python or Ubuntu image for the CPU worker, if folks don't think the different bases matter much. It would greatly simplify the GPU setup, effectively letting us skip the CUDA install altogether.

@Didayolo (Member)

Quoting @bbearce: "I'm totally down to use an NVIDIA image for the GPU worker and a Python or Ubuntu image for the CPU worker, if folks don't think the different bases matter much."

Yeah, I guess we can go for this.

@Didayolo (Member) commented Jun 19, 2024

We updated the compute worker Docker images:

  • tag test for the CPU version
  • tag gpu for the GPU version

The CPU one is currently running submissions on the test server without problems.

We are in the process of testing the GPU version. Bug:

Traceback (most recent call last):
  File "/usr/bin/celery", line 5, in <module>
    from celery.__main__ import main
ModuleNotFoundError: No module named 'celery'
Traceback (most recent call last):
  File "/usr/bin/celery", line 5, in <module>
    from celery.__main__ import main
ModuleNotFoundError: No module named 'celery'
Traceback (most recent call last):
  File "/usr/bin/celery", line 5, in <module>
    from celery.__main__ import main
ModuleNotFoundError: No module named 'celery'
Traceback (most recent call last):
  File "/usr/bin/celery", line 5, in <module>
    from celery.__main__ import main
ModuleNotFoundError: No module named 'celery'
Traceback (most recent call last):
  File "/usr/bin/celery", line 5, in <module>
    from celery.__main__ import main
ModuleNotFoundError: No module named 'celery'
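A quick sanity check for whether celery is actually installed inside the image (a hypothetical diagnostic; the image tag here is the local one from the build commands in the PR description and may differ from the deployed tag):

    docker run --rm codabench-compute_worker_gpu:latest python -c "import celery; print(celery.__version__)"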

@Didayolo (Member) commented Jun 19, 2024

Testing the GPU compute worker, I get this error on a submission of the "GPU test" bundle:

[2024-06-19 18:02:02,324: INFO/MainProcess] compute-worker@88d49e766880 ready.
[2024-06-19 18:02:02,325: INFO/MainProcess] Received task: compute_worker_run[ef4800b6-9ff8-4626-8702-260f99d2c8b5]  
[2024-06-19 18:02:02,426: INFO/ForkPoolWorker-1] Received run arguments: {'user_pk': 7, 'submissions_api_url': 'https://www.codabench.org/api', 'secret': 'fd3e705e-4a4f-4777-9507-fede640bda15', 'docker_image': 'codalab/codalab-legacy:gpu', 'execution_time_limit': 600, 'id': 71224, 'is_scoring': False, 'prediction_result': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/prediction_result/2024-06-19-1718819984/6d99e685bc54/prediction_result.zip?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=34B6XI%2FOee1gMw1L8p88gx5UNw8%3D&content-type=application%2Fzip&Expires=1718906385', 'ingestion_only_during_scoring': False, 'program_data': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/dataset/2024-06-19-1718819979/0b3a8502929c/submission.zip?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=Q4%2FLO4Va4yIINYzRn2JSCcz%2BgBQ%3D&Expires=1718906385', 'prediction_stdout': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-19-1718819985/87a4537c79e3/prediction_stdout.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=BHhGAaPiK%2BBz0VgIyuBkzyy3hOM%3D&content-type=application%2Fzip&Expires=1718906385', 'prediction_stderr': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-19-1718819985/ea6d07365e4b/prediction_stderr.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=5G2icoWHEBZgMLXJrbSxAAUKsGA%3D&content-type=application%2Fzip&Expires=1718906385', 'prediction_ingestion_stdout': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-19-1718819985/a890fb3b7207/prediction_ingestion_stdout.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=FSOuv3o9UGq4hjRnpJTa8jKe8Nk%3D&content-type=application%2Fzip&Expires=1718906385', 'prediction_ingestion_stderr': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-19-1718819985/e030e979fdb9/prediction_ingestion_stderr.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=oFdZi6FjiY1BckMc%2BoXct6l%2BwpY%3D&content-type=application%2Fzip&Expires=1718906385'}
[2024-06-19 18:02:02,427: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/71224/ with data = {'status': 'Preparing', 'status_details': None, 'secret': 'fd3e705e-4a4f-4777-9507-fede640bda15'}
[2024-06-19 18:02:02,702: INFO/ForkPoolWorker-1] Submission updated successfully!
[2024-06-19 18:02:02,702: INFO/ForkPoolWorker-1] Checking if cache directory needs to be pruned...
[2024-06-19 18:02:02,703: INFO/ForkPoolWorker-1] Cache directory does not need to be pruned!
[2024-06-19 18:02:02,703: INFO/ForkPoolWorker-1] Getting bundle https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/dataset/2024-06-19-1718819979/0b3a8502929c/submission.zip?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=Q4%2FLO4Va4yIINYzRn2JSCcz%2BgBQ%3D&Expires=1718906385 to unpack @ program
[2024-06-19 18:02:02,777: INFO/ForkPoolWorker-1] Beginning MD5 checksum of submission: /codabench/tmp42i5mcin/bundles/tmpctn6s1_z
[2024-06-19 18:02:02,777: INFO/ForkPoolWorker-1] Checksum result: 573f142dfb0c45b19c063131db595bb5
[2024-06-19 18:02:02,777: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/71224/ with data = {'md5': '573f142dfb0c45b19c063131db595bb5', 'secret': 'fd3e705e-4a4f-4777-9507-fede640bda15'}
[2024-06-19 18:02:02,870: INFO/ForkPoolWorker-1] Submission updated successfully!
[2024-06-19 18:02:02,870: INFO/ForkPoolWorker-1] Running pull for image: codalab/codalab-legacy:gpu
[2024-06-19 18:02:02,873: INFO/ForkPoolWorker-1] Destroying submission temp dir: /codabench/tmp42i5mcin
[2024-06-19 18:02:02,876: ERROR/ForkPoolWorker-1] Task compute_worker_run[ef4800b6-9ff8-4626-8702-260f99d2c8b5] raised unexpected: FileNotFoundError(2, 'No such file or directory')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/compute_worker.py", line 115, in run_wrapper
    run.prepare()
  File "/compute_worker.py", line 803, in prepare
    self._get_container_image(self.container_image)
  File "/compute_worker.py", line 367, in _get_container_image
    container_engine_pull = check_output(cmd)
  File "/usr/local/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/local/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.9/subprocess.py", line 1837, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-docker'

This error comes from the new Dockerfile; with the older Docker image it does not appear. Indeed, we are now using the new NVIDIA Container Toolkit, so we should update the code to stop calling nvidia-docker.
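A sketch of the direction (assuming the pull in _get_container_image currently shells out to nvidia-docker on GPU workers, as the traceback suggests; the actual fix may look different):

    from subprocess import check_output

    # Pull images with the regular container engine. GPU access is requested at
    # `run` time via `--gpus` together with the NVIDIA Container Toolkit, so a
    # separate `nvidia-docker` binary is no longer needed for pulls.
    CONTAINER_ENGINE_EXECUTABLE = 'docker'  # assumption: normally set from configuration

    def pull_image(image_name):
        cmd = [CONTAINER_ENGINE_EXECUTABLE, 'pull', image_name]
        return check_output(cmd)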

@Didayolo
Copy link
Member

Now the CPU version is broken...

[2024-06-19 19:26:28,933: INFO/ForkPoolWorker-1] Connecting to wss://codabench-test.lri.fr/submission_input/6/797/d3673c8d-e345-4f34-aa44-73e01e63c530/
[2024-06-19 19:26:31,802: WARNING/ForkPoolWorker-1] WS: b'docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].\n'
[2024-06-19 19:26:31,802: INFO/ForkPoolWorker-1] Process exited with 125
[2024-06-19 19:26:31,802: INFO/ForkPoolWorker-1] Disconnecting from websocket wss://codabench-test.lri.fr/submission_input/6/797/d3673c8d-e345-4f34-aa44-73e01e63c530/
[2024-06-19 19:26:33,936: INFO/ForkPoolWorker-1] [exited with 125]
[2024-06-19 19:26:33,936: INFO/ForkPoolWorker-1] [stderr]
b'docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].\n'
[2024-06-19 19:26:33,936: INFO/ForkPoolWorker-1] Putting raw data b'docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].\n' in https://minio-test.lri.fr/codabench-private/submission_details/2024-06-19-1718825186/183b292a2ba8/scoring_stderr.txt?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=08Orj1ptjBxO7cguAlMCA83YwGs%3D&content-type=application%2Fzip&Expires=1718911586

The code in compute_worker.py:

        engine_cmd = [
            CONTAINER_ENGINE_EXECUTABLE,
            'run',
            # Remove it after run
            '--rm',
            f'--name={self.ingestion_container_name if kind == "ingestion" else self.program_container_name}',

            # Don't allow subprocesses to raise privileges
            '--security-opt=no-new-privileges',

            # GPU or not
            '--gpus', 
            'all' if os.environ.get("USE_GPU") else '0',

            # Set the volumes
            '-v', f'{self._get_host_path(program_dir)}:/app/program',
            '-v', f'{self._get_host_path(self.output_dir)}:/app/output',
            '-v', f'{self.data_dir}:/app/data:ro',

            # Start in the right directory
            '-w', '/app/program',

            # Don't buffer python output, so we don't lose any
            '-e', 'PYTHONUNBUFFERED=1',
        ]
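The --gpus flag is what makes Docker request the NVIDIA device driver, so passing it on a CPU-only host fails even with the value '0'. One plausible shape for the fix (the change actually merged may differ) is to append the flag only when USE_GPU is set:

    # Sketch only: build the base command without any GPU flag, then add --gpus
    # solely when the worker is configured for GPUs, so CPU-only hosts never ask
    # Docker for the NVIDIA device driver.
    engine_cmd = [
        CONTAINER_ENGINE_EXECUTABLE,
        'run',
        '--rm',
        '--security-opt=no-new-privileges',
        # ... container name, volumes, workdir, and env as in the block above ...
    ]
    if os.environ.get("USE_GPU"):
        engine_cmd += ['--gpus', 'all']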

@Didayolo (Member)

Fixed!
