
Transformer backend error on CUDA  #1774

@fakezeta

Description


LocalAI version:

quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg

Environment, CPU architecture, OS, and Version:

Windows 11, Docker 25.03 with WSL2 backend
Kernel Version: 5.15.133.1-microsoft-standard-WSL2
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 15.62GiB
GPU NVidia 3060Ti 8GB VRAM

Describe the bug

Running intfloat/multilingual-e5-base with the transformers backend and cuda: true fails with RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select) in the logs.
To Reproduce

Request an embedding from AnythingLLM with the following embedding configuration:

name: text-embedding-ada-002
backend: transformers
cuda: true
embeddings: true
low_vram: true
f16: true
device: cuda:0
parameters:
  model: intfloat/multilingual-e5-base

Expected behavior

The embedding is generated.
Logs

8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr Server started. Listening on: 127.0.0.1:46411
8:26AM DBG GRPC Service Ready
8:26AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:intfloat/multilingual-e5-base ContextSize:0 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:true Embeddings:true NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:12 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/intfloat/multilingual-e5-base Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr Loading model intfloat/multilingual-e5-base to CUDA.
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr Traceback (most recent call last):
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/grpc/_server.py", line 552, in _call_behavior
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     response_or_iterator = behavior(argument, context)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/build/backend/python/transformers/transformers_server.py", line 112, in Embedding
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     model_output = self.model(**encoded_input)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return forward_call(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 830, in forward
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     embedding_output = self.embeddings(
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr                        ^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return self._call_impl(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return forward_call(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 126, in forward
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     inputs_embeds = self.word_embeddings(input_ids)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return self._call_impl(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return forward_call(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 162, in forward
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return F.embedding(
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr   File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/functional.py", line 2233, in embedding
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Additional context

I've implemented a fix locally and opened this Issue to track it.
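The traceback shows the model's embedding weights on cuda:0 while the tokenized input_ids stay on the CPU. A minimal sketch of the likely shape of the fix, using a tiny nn.Embedding as a hypothetical stand-in for the actual XLM-RoBERTa model (the real transformers_server.py code may differ):

```python
import torch
import torch.nn as nn

# Stand-in for the model loaded "to CUDA" in the logs.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
emb = nn.Embedding(10, 4).to(device)

# Tokenizers return CPU tensors by default; this reproduces the mismatch.
input_ids = torch.tensor([[1, 2, 3]])

# Calling emb(input_ids) here would raise the RuntimeError above when
# the model is on cuda:0. The fix: move inputs to the model's device first.
input_ids = input_ids.to(device)
out = emb(input_ids)
print(out.shape)  # torch.Size([1, 3, 4])
```

With the real backend, the same move can be applied to the whole encoded input (Hugging Face's BatchEncoding has a .to(device) method) before calling self.model(**encoded_input).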
