
Crash with --useclblast (GGML_ASSERT: ggml-opencl.cpp:1019: to_fp32_cl != nullptr) #222

@h3ndrik

Description


Sometimes koboldcpp crashes when using --useclblast

Not using BLAS, or using only OpenBLAS, works fine. It only crashes when I add --useclblast 0 0 to the command line.
I'm not sure if this has to do with the new quantization method. Some models work fine even when I use them with --useclblast. The first model I noticed this with was a ggmlv3 q4_K_M. But I also did a git pull and re-compiled koboldcpp, so I'm not sure what introduced the bug.
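My best guess at the failure mode (a rough sketch under that assumption, not the actual ggml-opencl.cpp code): the OpenCL backend seems to pick a dequantize-to-fp32 routine per tensor type, and the assert at ggml-opencl.cpp:1019 would trip whenever that lookup comes back empty, which could be the case for the new k-quant formats like Q4_K. All names below (fake_ggml_type, get_to_fp32_cl, the dequantize_* stubs) are made up for illustration:

#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for the real ggml tensor types (names are mine, not ggml's).
enum fake_ggml_type { TYPE_Q4_0, TYPE_Q4_1, TYPE_Q4_K };

typedef void (*to_fp32_fn)(const void * src, float * dst, int n);

static void dequantize_q4_0(const void *, float *, int) { /* old quant format: routine exists */ }
static void dequantize_q4_1(const void *, float *, int) { /* old quant format: routine exists */ }

// Lookup of a dequantize-to-fp32 routine per tensor type; anything without a
// registered routine falls through to nullptr.
static to_fp32_fn get_to_fp32_cl(fake_ggml_type type) {
    switch (type) {
        case TYPE_Q4_0: return dequantize_q4_0;
        case TYPE_Q4_1: return dequantize_q4_1;
        default:        return nullptr; // newer k-quant types (Q4_K, ...) not handled yet
    }
}

int main() {
    // A q4_K_M tensor would hit the default branch...
    to_fp32_fn to_fp32_cl = get_to_fp32_cl(TYPE_Q4_K);
    // ...and this assert is the moral equivalent of
    // "GGML_ASSERT: ggml-opencl.cpp:1019: to_fp32_cl != nullptr" followed by "Aborted".
    assert(to_fp32_cl != nullptr && "no OpenCL dequantize routine for this tensor type");
    std::printf("dequantize routine found\n");
    return 0;
}

If that is what's going on, it would also explain why older-format q4_0/q4_1 models keep working with --useclblast while the k-quant model aborts, which matches what I'm seeing.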

I'm running the latest git version (6635f7e) on Debian GNU/Linux, compiled with make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1.
Everything worked flawlessly for a long time.

Here is the cli output of a crash:

h3ndrik@pc:~/tmp/koboldcpp$ python3 koboldcpp.py --threads 2 --nommap --useclblast 0 0 models/nous-hermes-13b.ggmlv3.q4_K_M.bin                                                                                                      
Welcome to KoboldCpp - Version 1.29                                                                                   
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.                 
Initializing dynamic library: koboldcpp_clblast.so                                                                    
==========                                                                                                            
Loading model: /home/h3ndrik/tmp/koboldcpp/models/nous-hermes-13b.ggmlv3.q4_K_M.bin                                      
[Threads: 2, BlasThreads: 2, SmartContext: False]                                                                     
                                                                                                                      
---                                                                                                                   
Identified as LLAMA model: (ver 5)                                                                                    
Attempting to Load...                                                                                                 
---                                                                                                                   
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | 
F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |                                              
llama.cpp: loading model from /home/h3ndrik/tmp/koboldcpp/models/nous-hermes-13b.ggmlv3.q4_K_M.bin                       
llama_model_load_internal: format     = ggjt v3 (latest)                                                              
llama_model_load_internal: n_vocab    = 32001                                                                         
llama_model_load_internal: n_ctx      = 2048                                                                          
llama_model_load_internal: n_embd     = 5120                                                                          
llama_model_load_internal: n_mult     = 256                                                                           
llama_model_load_internal: n_head     = 40                                                                            
llama_model_load_internal: n_layer    = 40                                                                            
llama_model_load_internal: n_rot      = 128                                                                           
llama_model_load_internal: ftype      = 15 (mostly Q4_K - Medium)                                                     
llama_model_load_internal: n_ff       = 13824                                                                         
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7460.66 MB

Platform:0 Device:0  - Intel(R) OpenCL HD Graphics with Intel(R) Graphics Gen9 [0x191d]

ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Graphics Gen9 [0x191d]'
ggml_opencl: device FP16 support: true
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 9508.66 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 0 layers to GPU
llama_model_load_internal: total VRAM used: 0 MB
....................................................................................................
llama_init_from_file: kv self size  = 1600.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001
172.16.33.42 - - [08/Jun/2023 18:30:31] "GET / HTTP/1.1" 200 -
172.16.33.42 - - [08/Jun/2023 18:30:31] "GET /api/v1/model HTTP/1.1" 200 -
172.16.33.42 - - [08/Jun/2023 18:30:31] "GET /api/v1/info/version HTTP/1.1" 200 -

Input: {"n": 1, "max_context_length": 1024, "max_length": 80, "rep_pen": 1.1, "temperature": 0.7, "top_p": 0.5, "top_k
": 0, "top_a": 0.75, "typical": 0.19, "tfs": 0.97, "rep_pen_range": 1024, "rep_pen_slope": 0.7, "sampler_order": [6, 5
, 4, 3, 2, 1, 0], "prompt": "### Instruction:\n[Redacted]\n### Response:", "quiet": true}

Processing Prompt [BLAS] (276 / 276 tokens)GGML_ASSERT: ggml-opencl.cpp:1019: to_fp32_cl != nullptr
Aborted

$ uname -a

Linux pc 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

Python 3.9.2

A few libraries I installed: libblas-dev, libclblas-dev, libopenblas-dev, libmkl-intel-thread

(Edit: I probably won't be using CLBlast with an old Intel iGPU anyway, so feel free to close this issue if you don't want to fix it. Using CLBlast on this hardware seems to make everything slower, not faster.)

Metadata

    Labels

    help wanted (Extra attention is needed)
