feat: add CuBLAS support in Docker images #403
mudler merged 4 commits into mudler:master from sebastien-prudhomme:dockerfile-cublas
Signed-off-by: Sébastien Prud'homme <sebastien.prudhomme@gmail.com>
Hello there! How do you set how many layers are offloaded to the GPU?
Hi, you need to set "gpu_layers" in the model definition.
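As a rough sketch of what that could look like, the snippet below writes a minimal model definition with "gpu_layers" set. The file name, the models directory and the value 35 are assumptions made for illustration, and the key is assumed to sit at the top level of the definition next to "f16"; the other fields reuse values quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical location and file name, for illustration only.
MODELS_DIR="${MODELS_DIR:-./models}"
MODEL_FILE="$MODELS_DIR/manticore.yaml"
mkdir -p "$MODELS_DIR"

# Write a minimal model definition. The quoted heredoc keeps the YAML verbatim.
cat > "$MODEL_FILE" <<'EOF'
name: manticore
backend: llama
parameters:
  model: manticore
context_size: 1024
f16: true
# Number of layers to offload to the GPU (assumed top-level key).
# 35 is an arbitrary example; tune it to the available VRAM.
gpu_layers: 35
EOF

echo "wrote $MODEL_FILE"
```

A higher value offloads more layers and uses more VRAM, so it is usually worth starting low and increasing it until the card is full.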
@marianbastiUNRN regarding the problem you described on Discord: which CUDA version and NVIDIA driver version are you using? Try to build the image with the same CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION as your driver. CUDA and driver compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-compatibility
@sebastien-prudhomme that's amazing! thank you! this is looking good at a first pass, I'll review it later and try to give it a shot locally too |
Thanks for the reply! My config file looks like this (encoded in UTF-8 and validated):
---
name: gpt-3.5-turbo
description: |
  Manticore 13B - (previously Wizard Mega)
license: N/A
config_file: |
  backend: llama
  parameters:
    model: manticore
    top_k: 80
    temperature: 0.2
    top_p: 0.7
  context_size: 1024
  f16: true
  template:
    completion: manticore-completion
    chat: manticore-chat
prompt_templates:
- name: manticore-completion
  content: |
    ### Instruction: Complete the following sentence: {{.Input}}
    ### Assistant:
- name: manticore-chat
  content: |
    ### Instruction: {{.Input}}
    ### Assistant:
gpu_layers: 60
Other relevant comment of mine here
mudler
left a comment
fantastic, thanks @sebastien-prudhomme !
Description
This PR partially fixes #280. The CI also needs to be modified to build multiple Docker images, one for each set of compilation options.
Notes for Reviewers
I chose CUDA 11 as the default version. This can be changed when building the Docker image by using the CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION build args.
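For illustration, a build targeting a different CUDA version might look like the sketch below. The image tag and the 12.1 version are hypothetical, and it assumes BUILD_TYPE is passed as a build arg in the same way as the two CUDA args. The command is only printed so it can be checked before running it from the repository root.

```shell
#!/bin/sh
# Hypothetical values: pick the CUDA version that matches your driver.
CUDA_MAJOR_VERSION=12
CUDA_MINOR_VERSION=1
IMAGE_TAG="localai:cublas-cuda${CUDA_MAJOR_VERSION}"

# Assemble the build command as a string so it can be inspected first.
BUILD_CMD="docker build --build-arg BUILD_TYPE=cublas --build-arg CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} --build-arg CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} -t ${IMAGE_TAG} ."
echo "$BUILD_CMD"

# Once the values look right, run it from the repository root:
# eval "$BUILD_CMD"
```

Leaving out the two CUDA args should fall back to the CUDA 11 default described above.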
I also chose to install the needed libraries only when BUILD_TYPE is "cublas", and I adapted the "openblas" and "stablediffusion" options in the same way.
Be careful: the rebuild performed when the image starts can now only rebuild with the same options that were provided at build time.
For people who want to test on Linux: you need an NVIDIA card, a recent NVIDIA driver and the nvidia-container-toolkit. Then just launch the container with
docker run --gpus all ...
and don't forget to configure "gpu_layers" in your model definition. You should see VRAM offloading when the model is loaded.
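A minimal sketch of the test steps above, under stated assumptions: the image tag, the published port and the /models mount point are hypothetical and not taken from this PR. The script reports whether docker and nvidia-smi are on the PATH, then prints a candidate run command instead of executing it.

```shell
#!/bin/sh
# Hypothetical values: image tag, port and models path are assumptions.
IMAGE_TAG="${IMAGE_TAG:-localai:cublas}"
MODELS_DIR="${MODELS_DIR:-$PWD/models}"

# Report whether the host prerequisites are visible on PATH.
for tool in docker nvidia-smi; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done

# --gpus all exposes every GPU to the container (needs nvidia-container-toolkit).
RUN_CMD="docker run --gpus all -p 8080:8080 -v ${MODELS_DIR}:/models ${IMAGE_TAG}"
echo "$RUN_CMD"
```

Running nvidia-smi on the host while the model loads is one way to watch VRAM usage grow as layers are offloaded.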
Signed commits