Description
LocalAI version:
quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
Environment, CPU architecture, OS, and Version:
Windows 11 Docker 25.03 with wsl2 backend
Kernel Version: 5.15.133.1-microsoft-standard-WSL2
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 15.62GiB
GPU NVidia 3060Ti 8GB VRAM
Describe the bug
Running intfloat/multilingual-e5-base with the transformers backend and cuda: true fails with RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select) in the logs.
To Reproduce
Request an embedding from AnythingLLM with the following model configuration:

```yaml
name: text-embedding-ada-002
backend: transformers
cuda: true
embeddings: true
low_vram: true
f16: true
device: cuda:0
parameters:
  model: intfloat/multilingual-e5-base
```
Expected behavior
An embedding is generated.
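For reference, the same request can be reproduced without AnythingLLM by posting directly to LocalAI's OpenAI-compatible /v1/embeddings endpoint. A minimal sketch (the base URL assumes a default local setup on port 8080):

```python
import json
import urllib.request


def build_embedding_request(model: str, text: str) -> dict:
    """OpenAI-compatible embeddings payload as served by LocalAI."""
    return {"model": model, "input": text}


def request_embedding(base_url: str, model: str, text: str) -> dict:
    """POST to LocalAI's /v1/embeddings endpoint and return the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(build_embedding_request(model, text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (assumes LocalAI listening on the default port):
# request_embedding("http://localhost:8080", "text-embedding-ada-002",
#                   "A test sentence to embed")
```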
Logs
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr Server started. Listening on: 127.0.0.1:46411
8:26AM DBG GRPC Service Ready
8:26AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:intfloat/multilingual-e5-base ContextSize:0 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:true Embeddings:true NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:12 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/intfloat/multilingual-e5-base Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr Loading model intfloat/multilingual-e5-base to CUDA.
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr Traceback (most recent call last):
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/grpc/_server.py", line 552, in _call_behavior
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr response_or_iterator = behavior(argument, context)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/build/backend/python/transformers/transformers_server.py", line 112, in Embedding
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr model_output = self.model(**encoded_input)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return forward_call(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 830, in forward
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr embedding_output = self.embeddings(
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return self._call_impl(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return forward_call(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 126, in forward
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr inputs_embeds = self.word_embeddings(input_ids)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return self._call_impl(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return forward_call(*args, **kwargs)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 162, in forward
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return F.embedding(
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr File "/opt/conda/envs/transformers/lib/python3.11/site-packages/torch/nn/functional.py", line 2233, in embedding
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8:26AM DBG GRPC(intfloat/multilingual-e5-base-127.0.0.1:46411): stderr RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
Additional context
I've implemented a fix locally and opened this Issue to track it.
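The root cause is the usual device mismatch: the model is loaded to CUDA, but the tokenizer output (`encoded_input`) stays on the CPU, so the embedding lookup at `self.model(**encoded_input)` receives CPU index tensors. The standard fix is to move the encoded input to the model's device before the forward pass. A minimal sketch, not the actual patch (the helper name and the `self.CUDA` flag are illustrative):

```python
def move_to_device(encoded_input, device):
    """Move every tensor in a tokenizer output to the target device.

    transformers' BatchEncoding also exposes .to(device) directly, so
    `encoded_input = encoded_input.to(device)` may be all that is needed;
    this dict comprehension is the framework-agnostic equivalent.
    """
    return {k: v.to(device) for k, v in encoded_input.items()}


# Sketch of how this would slot into the Embedding handler in
# transformers_server.py (`self.CUDA` is a hypothetical flag mirroring
# the `cuda: true` model option):
#
#   encoded_input = self.tokenizer(texts, padding=True, return_tensors="pt")
#   if self.CUDA:
#       encoded_input = move_to_device(encoded_input, "cuda:0")
#   model_output = self.model(**encoded_input)
```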