
Crash with --useclblast (GGML_ASSERT: ggml-opencl.cpp:1019: to_fp32_cl != nullptr) #222

@h3ndrik

Description


Sometimes koboldcpp crashes when using --useclblast

Not using BLAS, or using only OpenBLAS, works fine. It only crashes when I add --useclblast 0 0 to the command line.
I'm not sure if this has to do with the new quantization method. Some models work fine even when I use them with --useclblast. The first model I noticed this with was a ggmlv3 q4_K_M. But I also did a git pull and re-compiled koboldcpp, so I'm not sure what introduced the bug.
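My best guess at the failure mode (a rough sketch under that assumption, not the actual ggml-opencl.cpp code): the OpenCL backend seems to pick a dequantize-to-fp32 routine per tensor type, and the assert at ggml-opencl.cpp:1019 would trip whenever that lookup comes back empty, which could be the case for the new k-quant formats like Q4_K. All names below (fake_ggml_type, get_to_fp32_cl, the dequantize_* stubs) are made up for illustration:

#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for the real ggml tensor types (names are mine, not ggml's).
enum fake_ggml_type { TYPE_Q4_0, TYPE_Q4_1, TYPE_Q4_K };

typedef void (*to_fp32_fn)(const void * src, float * dst, int n);

static void dequantize_q4_0(const void *, float *, int) { /* old quant format: routine exists */ }
static void dequantize_q4_1(const void *, float *, int) { /* old quant format: routine exists */ }

// Lookup of a dequantize-to-fp32 routine per tensor type; anything without a
// registered routine falls through to nullptr.
static to_fp32_fn get_to_fp32_cl(fake_ggml_type type) {
    switch (type) {
        case TYPE_Q4_0: return dequantize_q4_0;
        case TYPE_Q4_1: return dequantize_q4_1;
        default:        return nullptr; // newer k-quant types (Q4_K, ...) not handled yet
    }
}

int main() {
    // A q4_K_M tensor would hit the default branch...
    to_fp32_fn to_fp32_cl = get_to_fp32_cl(TYPE_Q4_K);
    // ...and this assert is the moral equivalent of
    // "GGML_ASSERT: ggml-opencl.cpp:1019: to_fp32_cl != nullptr" followed by "Aborted".
    assert(to_fp32_cl != nullptr && "no OpenCL dequantize routine for this tensor type");
    std::printf("dequantize routine found\n");
    return 0;
}

If that is what's going on, it would also explain why older-format q4_0/q4_1 models keep working with --useclblast while the k-quant model aborts, which matches what I'm seeing.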

I'm running the latest git version (6635f7e) on Debian GNU/Linux, compiled with make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1.
Everything worked flawlessly for a long time.

Here is the cli output of a crash:

h3ndrik@pc:~/tmp/koboldcpp$ python3 koboldcpp.py --threads 2 --nommap --useclblast 0 0 models/nous-hermes-13b.ggmlv3.q4_K_M.bin                                                                                                      
Welcome to KoboldCpp - Version 1.29                                                                                   
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.                 
Initializing dynamic library: koboldcpp_clblast.so                                                                    
==========                                                                                                            
Loading model: /home/h3ndrik/tmp/koboldcpp/models/nous-hermes-13b.ggmlv3.q4_K_M.bin                                      
[Threads: 2, BlasThreads: 2, SmartContext: False]                                                                     
                                                                                                                      
---                                                                                                                   
Identified as LLAMA model: (ver 5)                                                                                    
Attempting to Load...                                                                                                 
---                                                                                                                   
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | 
F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |                                              
llama.cpp: loading model from /home/h3ndrik/tmp/koboldcpp/models/nous-hermes-13b.ggmlv3.q4_K_M.bin                       
llama_model_load_internal: format     = ggjt v3 (latest)                                                              
llama_model_load_internal: n_vocab    = 32001                                                                         
llama_model_load_internal: n_ctx      = 2048                                                                          
llama_model_load_internal: n_embd     = 5120                                                                          
llama_model_load_internal: n_mult     = 256                                                                           
llama_model_load_internal: n_head     = 40                                                                            
llama_model_load_internal: n_layer    = 40                                                                            
llama_model_load_internal: n_rot      = 128                                                                           
llama_model_load_internal: ftype      = 15 (mostly Q4_K - Medium)                                                     
llama_model_load_internal: n_ff       = 13824                                                                         
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7460.66 MB

Platform:0 Device:0  - Intel(R) OpenCL HD Graphics with Intel(R) Graphics Gen9 [0x191d]

ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Graphics Gen9 [0x191d]'
ggml_opencl: device FP16 support: true
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 9508.66 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 0 layers to GPU
llama_model_load_internal: total VRAM used: 0 MB
....................................................................................................
llama_init_from_file: kv self size  = 1600.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001
172.16.33.42 - - [08/Jun/2023 18:30:31] "GET / HTTP/1.1" 200 -
172.16.33.42 - - [08/Jun/2023 18:30:31] "GET /api/v1/model HTTP/1.1" 200 -
172.16.33.42 - - [08/Jun/2023 18:30:31] "GET /api/v1/info/version HTTP/1.1" 200 -

Input: {"n": 1, "max_context_length": 1024, "max_length": 80, "rep_pen": 1.1, "temperature": 0.7, "top_p": 0.5, "top_k
": 0, "top_a": 0.75, "typical": 0.19, "tfs": 0.97, "rep_pen_range": 1024, "rep_pen_slope": 0.7, "sampler_order": [6, 5
, 4, 3, 2, 1, 0], "prompt": "### Instruction:\n[Redacted]\n### Response:", "quiet": true}

Processing Prompt [BLAS] (276 / 276 tokens)GGML_ASSERT: ggml-opencl.cpp:1019: to_fp32_cl != nullptr
Aborted

$ uname -a

Linux pc 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

Python 3.9.2

A few libraries I installed: libblas-dev, libclblas-dev, libopenblas-dev, libmkl-intel-thread

(Edit: I probably won't be using CLBlast with an old Intel iGPU anyway, so feel free to close this issue if you don't want to fix it. Using CLBlast on this hardware seems to make everything slower, not faster.)

Metadata

    Labels

    help wanted (Extra attention is needed)
