Compile bug: convert.cu #14834

### Git commit

221c0e0c5841a814e95b9bfd549de9d6ae00ac6e

### Operating systems

Linux

### GGML backends

CUDA

### Problem description & steps to reproduce

My daily build of llama.cpp from the latest source fails while compiling `convert.cu`. I can confirm that reverting to 18f3b5ff9e5eda4e7d04bceff8ffdccb0a696ed8 fixes the build.

### First Bad Commit

07a19e2

### Compile command

```shell
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_CUBLAS=OFF -DGGML_CUDA_FA_ALL_QUANTS=ON -DGGML_CUDA_F16=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j 8
```

### Relevant log output

```shell
[  9%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_1.cu.o
/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(34): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + 0] = v.x;
                 ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_0, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_0, dst_t=nv_bfloat16]" at line 764

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(35): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + y_offset] = v.y;
                        ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_0, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_0, dst_t=nv_bfloat16]" at line 764

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(34): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + 0] = v.x;
                 ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_1, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_1, dst_t=nv_bfloat16]" at line 766

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(35): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + y_offset] = v.y;
                        ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_1, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q4_1, dst_t=nv_bfloat16]" at line 766

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(34): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + 0] = v.x;
                 ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_0, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_0, dst_t=nv_bfloat16]" at line 768

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(35): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + y_offset] = v.y;
                        ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_0, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_0, dst_t=nv_bfloat16]" at line 768

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(34): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + 0] = v.x;
                 ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_1, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_1, dst_t=nv_bfloat16]" at line 770

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(35): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + y_offset] = v.y;
                        ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_1, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=2, dequantize_kernel=dequantize_q5_1, dst_t=nv_bfloat16]" at line 770

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(34): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + 0] = v.x;
                 ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=1, dequantize_kernel=dequantize_q8_0, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=1, dequantize_kernel=dequantize_q8_0, dst_t=nv_bfloat16]" at line 772

/devel/tools/llama.cpp/ggml/src/ggml-cuda/convert.cu(35): error: more than one operator "=" matches these operands:
            function "__nv_bfloat16::operator=(float)" (declared at line 305 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(short)" (declared at line 530 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned short)" (declared at line 534 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(int)" (declared at line 538 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned int)" (declared at line 542 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(long long)" (declared at line 546 of /usr/include/cuda_bf16.hpp)
            function "__nv_bfloat16::operator=(unsigned long long)" (declared at line 550 of /usr/include/cuda_bf16.hpp)
            operand types are: nv_bfloat16 = __half
      y[iy0 + y_offset] = v.y;
                        ^
          detected during:
            instantiation of "void dequantize_block<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t) [with qk=32, qr=1, dequantize_kernel=dequantize_q8_0, dst_t=nv_bfloat16]" at line 474
            instantiation of "void dequantize_block_cuda<qk,qr,dequantize_kernel,dst_t>(const void *, dst_t *, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, cudaStream_t) [with qk=32, qr=1, dequantize_kernel=dequantize_q8_0, dst_t=nv_bfloat16]" at line 772
```
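
Every error in the log has the same shape: a `__half` value (`v.x` or `v.y`) is assigned to an `nv_bfloat16` element, and the compiler reports several equally viable `operator=` overloads. The following host-side C++ sketch reproduces that pattern with hypothetical stand-in types. `MockHalf`, `MockBf16`, `store` and `run_example` are illustrative names, not part of CUDA or llama.cpp, and the sketch assumes (as the error text suggests) that the source type converts implicitly to several arithmetic types. It shows one possible way to disambiguate, an explicit cast to `float`, which is not necessarily the fix adopted upstream.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for __half: implicitly convertible to several
// arithmetic types (an assumption modelled on the error message).
struct MockHalf {
    float value;
    operator float() const { return value; }
    operator short() const { return static_cast<short>(value); }
    operator int()   const { return static_cast<int>(value); }
};

// Hypothetical stand-in for __nv_bfloat16: assignable from several
// arithmetic types, mirroring the overload list printed in the log.
struct MockBf16 {
    float stored = 0.0f;
    MockBf16 & operator=(float f) { stored = f; return *this; }
    MockBf16 & operator=(short s) { stored = static_cast<float>(s); return *this; }
    MockBf16 & operator=(int i)   { stored = static_cast<float>(i); return *this; }
};

// Direct assignment is ambiguous: each operator= overload is reachable
// through a different user-defined conversion, and none ranks higher.
static_assert(!std::is_assignable<MockBf16 &, MockHalf>::value,
              "MockBf16 = MockHalf should be ambiguous");
// Assigning a float selects operator=(float) unambiguously.
static_assert(std::is_assignable<MockBf16 &, float>::value,
              "MockBf16 = float should be well-formed");

// Generic store, loosely analogous to `y[iy0 + 0] = v.x;` in the log.
template <typename dst_t>
void store(dst_t & dst, const MockHalf & v) {
    // dst = v;                   // would not compile for dst_t = MockBf16
    dst = static_cast<float>(v);  // explicit intermediate type removes the ambiguity
}

// Small self-check exercising both destination types.
inline void run_example() {
    MockBf16 b;
    store(b, MockHalf{1.5f});
    assert(b.stored == 1.5f);

    float f = 0.0f;
    store(f, MockHalf{2.0f});
    assert(f == 2.0f);
}
```

In real CUDA code the analogous explicit step would presumably go through `__half2float` and `__float2bfloat16`, though whether that matches the project's preferred conversion helper is an open question here.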
