Add support for running compute worker with other container engines #763
Conversation
Thank you for the pull request. Rest assured we will review it soon.
This is the podman invocation used to run the worker:
This is in preparation for generalizing container engine support to allow the use of podman.
Add support for configuring the container engine through an environment variable (CONTAINER_ENGINE_EXECUTABLE).
Docker will create them, but other container engines like podman may not.
This Containerfile allows rootless Podman in Podman (PINP).
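Selecting the engine then comes down to one setting in the worker's environment. A minimal sketch (the variable name comes from this PR; the default of docker and the use of an exported shell variable here are assumptions):

```sh
# Select the container engine the compute worker shells out to,
# e.g. in the worker's .env file or exported before starting it.
export CONTAINER_ENGINE_EXECUTABLE=podman

# If unset, the worker is assumed to fall back to docker.
```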
I can confirm that I have tested with the docker worker, to ensure that these changes don't break running with docker.
We get this kind of error when processing a submission. I think it is because the volume mapping /codabench/storage:/codabench mentioned here was missing. After adding the volume mapping option, somehow the volume
Can you please try the following invocation (from the above comment)? We don't need to bind mount from the host; we just use the folder from inside the worker container. The security options and device are important:
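A sketch of the kind of invocation meant here (the image name and env file are placeholders, not the exact command from this thread; the security options and device are the ones referred to above):

```sh
# Run the compute worker itself under podman (rootless podman-in-podman).
# Deliberately no bind mount from the host: submission data stays inside the worker container.
podman run -d \
  --name compute_worker \
  --env-file .env \
  --security-opt label=disable \
  --device /dev/fuse \
  codabench-compute-worker-podman:latest
```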
I tried exactly that command. However, the child container needs access to the temporary files generated in the original container launched by your command. In our case, we map the volume
maps the temporary files
@dtuantran I can confirm that my tests work; any chance we could jump on a quick call to see what is going on at your end? Just to confirm, you aren't bind mounting any volumes into the compute_worker container?
@dtuantran I think I see the issue with your setup, please try setting
I've added the Containerfile for building the image. The GPU container can detect and use the GPU. However, I got an error and had to add the option at codabench/compute_worker/compute_worker.py, line 520 (commit 43e01d4), and then it works.
@dtuantran Thanks for trying this. I would prefer not to use
I think this option perhaps creates a security flaw. That's why I didn't commit it to the PR. @cjh1: do you know if there is another solution?
Here is a write-up for rootless podman and nvidia; we should try this approach: https://github.com/henrymai/podman_wsl2_cuda_rootless
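Roughly, the approach boils down to something like this (a sketch using the NVIDIA Container Toolkit's CDI support; the CUDA image tag is just an example and the exact steps in the write-up may differ):

```sh
# On the host: generate a CDI spec describing the available GPUs (NVIDIA Container Toolkit).
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Expose the GPUs to a rootless podman container via the CDI device name.
podman run --rm \
  --device nvidia.com/gpu=all \
  --security-opt label=disable \
  docker.io/nvidia/cuda:12.4.1-base-ubi9 nvidia-smi
```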
I already did these configurations on the VM host before testing the GPU compute worker. However, it's strange that when testing with
Which
```
-l info \
-Q compute-worker \
-n compute-worker@%n \
--concurrency=1
```
We shouldn't duplicate this code. We should base the GPU version on the other compute worker image and just make the necessary changes.
I built the GPU version based on your Containerfile in order to validate the GPU case. You can remove it if it isn't necessary.
I use this one
No need to add
```Dockerfile
# Include deps
RUN curl -s -L https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo | sudo tee /etc/yum.repos.d/cuda.repo && \
    curl -s -L https://nvidia.github.io/nvidia-docker/rhel9.0/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo && \
```
@dtuantran Not sure how this built for you; sudo is not set up in the container. In any case these steps run as root, so sudo is not needed. I will try to fix it up.
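Something like this should be equivalent without sudo (a sketch; it writes the repo files directly with curl -o instead of piping to tee):

```sh
# The build steps already run as root, so write the repo files directly.
curl -s -L https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo \
  -o /etc/yum.repos.d/cuda.repo
curl -s -L https://nvidia.github.io/nvidia-docker/rhel9.0/nvidia-docker.repo \
  -o /etc/yum.repos.d/nvidia-docker.repo
```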
A brief description of the purpose of the changes contained in this PR.
This PR adds a new configuration environment variable (CONTAINER_ENGINE_EXECUTABLE) to allow the compute worker to be run with other container technologies such as podman. It also provides a rootless podman Containerfile.

Issues this PR resolves
Allow compute worker to run with podman
Checklist