Conversation
Force-pushed from 1997bf6 to 6a185ca |
Let's merge this to master as it's add-only and doesn't hurt as a starting point. I successfully built it on Colab, but I have no way to test this locally. I'll update the docs and we'll see what comes out of bug reports. |
|
Might be worth dropping this command in a README so folks can test that they have a valid, detectable GPU. Example output showing a valid GPU: |
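The exact command and sample output aren't preserved above. A common way to run this check on NVIDIA hardware (an assumption, not necessarily the command the comment refers to) is:

```sh
# List GPUs visible to the NVIDIA driver; a valid, detectable GPU shows up
# with its name, memory and driver version in the output table.
nvidia-smi
```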
|
Good stuff! Although it seems there's a catch: the following solves the issue, and it's necessary because otherwise llama.cpp compiles without cuBLAS support; with it, the build works. |
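The exact fix isn't preserved above. A plausible sketch, assuming the missing piece is forwarding the cuBLAS option to llama.cpp's cmake build (LLAMA_CUBLAS was the option name llama.cpp used at the time):

```sh
# Hypothetical: configure and build llama.cpp with cuBLAS enabled.
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
```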
good catch @Thireus! thanks! - do you also have a GPU at hand so you can test this out? also, do you feel like taking a stab at fixing it? otherwise I'll have a look soon |
|
Hey there! I've run into a couple of issues. My config:
- name: gpt-3.5-turbo
parameters:
model: Manticore-13B.ggmlv3.q4_0.bin
temperature: 0.3
context_size: 2048
threads: 6
backend: llama
stopwords:
- "USER:"
- "### Instruction:"
roles:
user: "USER:"
system: "ASSISTANT:"
assistant: "ASSISTANT:"
gpu_layers: 40
Using the provided YAML like in the model-gallery yields an error. Cheers! |
Depends on: go-skynet/go-llama.cpp#51
See upstream PR: ggml-org/llama.cpp#1412
Allows building LocalAI with the llama.cpp backend with cuBLAS/OpenBLAS:

cuBLAS
To build, run:
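The command itself isn't shown above; a sketch, assuming the Makefile exposes a BUILD_TYPE variable as later LocalAI releases do:

```sh
# Hypothetical: build LocalAI with the cuBLAS-enabled llama.cpp backend.
make BUILD_TYPE=cublas build
```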
OpenBLAS
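Likewise for OpenBLAS, under the same assumption about the BUILD_TYPE variable:

```sh
# Hypothetical: build LocalAI with the OpenBLAS-enabled llama.cpp backend.
make BUILD_TYPE=openblas build
```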
To set the number of GPU layers, in the config file:
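For example, mirroring the field from the config reported earlier in this thread:

```yaml
# Number of model layers to offload to the GPU.
gpu_layers: 40
```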
This also drops the "generic" build type, as I'm sunsetting it in favor of specific cmake parameters
Related to: #69